Data mining and visualization tools are important for working with structured data models. In traditional relational databases, data linked through primary keys and foreign keys can be queried to yield different results (Jamsa, 2020, p. 77), but the information is usually presented in a static table format. Data mining and visualization can change how that information is presented by making the experience engaging, dynamic, and informative, so that users can drill down into the data and find insights.
“Hans Rosling’s 200 Countries, 200 Years, 4 Minutes” (BBC, 2010) is often cited as the gold standard of data visualization. The video is short but spans 200 countries and 200 years of data, and it shows how lifespan rises with wealth. Rosling’s video is engaging: it uses categorical coloring to represent regions of the world and circle size to represent quantitative population data. It even uses time as a fifth dimension of the visualization (HasGeek TV, 2015). The visuals are impressive but also extremely informative. Rosling might never have achieved the same effect with simple charts alone.
Visualization and mining tools can also help in drilling down into information. For example, cancer databases can describe the growth of tumors at a high level while also holding more detailed information, such as the shape and state of the cells (Stephanou et al., 2020). Visualizations from these models can be integrated with patient data to spot trends and provide better treatment. Genomic data is another example that can benefit from visualization and mining. Disease subtypes often appear as clusters in different data sets (Swanson et al., 2019), and unsupervised clustering algorithms can surface these groupings and improve understanding of different diseases. Visualization also helps with problems like Anscombe’s quartet, in which data sets share nearly identical descriptive statistics but look quite different when plotted (Revell et al., 2018).
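The Anscombe’s quartet point can be illustrated with a short Python sketch. It uses the four classic data sets published by Anscombe (1973) and only the standard library; the names `QUARTET`, `pearson`, `summarize`, and `SUMMARIES` are hypothetical helpers chosen for this illustration, not something taken from the cited sources.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: data sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Descriptive statistics for one data set, rounded to 2 decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "var_x": round(variance(xs), 2),  # sample variance
        "mean_y": round(mean(ys), 2),
        "var_y": round(variance(ys), 2),
        "r": round(pearson(xs, ys), 2),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, stats)
```

The printed summaries are close to indistinguishable across the four sets, which is why a table of statistics alone can mislead. Passing the same `xs` and `ys` to a scatter plot in any plotting library would show a roughly linear cloud, a curve, a line with one outlier, and a vertical stack with one extreme point.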
Data visualization has already made considerable headway and will continue to change how people present information. Many people absorb information more readily in visual form, and a visualization can often tell a more informative story than numbers alone. To keep pace, a data analyst will need to keep learning as new software and techniques emerge.
References
BBC. (2010, November 26). Hans Rosling’s 200 Countries, 200 Years, 4 Minutes – The Joy of Stats – BBC Four [Video file]. Retrieved from https://youtu.be/jbkSRLYSojo
HasGeek TV. (2015, July 17). Amit Kapoor – Visualising Multi Dimensional Data [Video file]. Retrieved from https://youtu.be/X8rNDvPNg30
Jamsa, K. (2020). Introduction to Data Mining and Analytics. Burlington, MA: Jones & Bartlett Learning.
Revell, L. J., Schliep, K., Valderrama, E., Richardson, J. E., & Blomberg, S. (2018). Graphs in phylogenetic comparative analysis: Anscombe’s quartet revisited. Methods in Ecology and Evolution, 9(10), 2145-2154.
Stephanou, A., Ballet, P., Powathil, G., & Volpert, V. (2020). Hybrid data-based modelling in oncology: successes, challenges and hopes. Mathematical Modelling of Natural Phenomena, 15, 21-12.
Swanson, D. M., Lien, T., Bergholtz, H., Sørlie, T., Frigessi, A., & Hancock, J. (2019). A Bayesian two-way latent structure model for genomic data integration reveals few pan-genomic cluster subtypes in a breast cancer cohort. Bioinformatics, 35(23), 4886-4897.