What can Venn diagrams teach us about doing data science better?

Cited by: 0
Authors
Sung Yang Ho
Sophia Tan
Chun Chau Sze
Limsoon Wong
Wilson Wen Bin Goh
Affiliations
[1] Nanyang Technological University, School of Biological Sciences
[2] Nanyang Technological University, Teaching Learning and Pedagogy Division
[3] National University of Singapore, Department of Computer Science
Source
International Journal of Data Science and Analytics | 2021 | Vol. 11
Keywords
Exploratory data analysis; Data science; Graph literacy; Visualization
DOI
Not available
Abstract
Data science is about deriving insight, learning and understanding from data. This process may be automated through advanced algorithms or scaffolded cognitively through graphs. While much emphasis is currently placed on machine learning, there is still much to learn about the role of the data scientist, in particular the thinking process by which they reach conclusions. This thinking process needs to be scaffolded, as the human brain is easily overwhelmed by many variables. Graphs are a form of data abstraction and an essential part of the data scientist's toolkit. They are also a viable scaffold on which the data scientist may gain familiarity with the data. But extracting insight from graphs is not always trivial or straightforward; it also requires interpretative logic. Generalizing from the example of a simple graph type, the Venn diagram, we discuss various logical fallacies that can be committed when interpreting it. Amid the various considerations that dictate how a graph should be tackled, we explain why context is the most important and should form the first guiding principle during data analysis.
Pages: 1–10
Page count: 9