The trouble with data visualization is that it starts at the beginning of the data, information, knowledge, and wisdom (DIKW) hierarchy (Wegener & Petty, 1994, p. 6). Being at the beginning means that most of the data is raw and unrefined. Naturally, problems such as mistyped entries, duplicate entries, and other errors are common. This is why the data needs to be cleaned and normalized before it is used for analytics.
Data warehouses can play an important role in solving this problem. They can provide a service that makes sure the data is clean enough for analysis. Beyond correcting errors, a data warehouse should also exclude data that is not relevant, identify data that may be incomplete, and periodically test and recheck the data (Bramantoro, 2018, pp. 834-837). Making sure that the data is coded correctly and can be updated with fewer problems makes it easier for data scientists to analyze.
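The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service evaluated by Bramantoro (2018): the field names in `RELEVANT_FIELDS`, the `CleaningResult` container, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the only fields this analysis is assumed to need.
RELEVANT_FIELDS = ("customer_id", "email", "country")


@dataclass
class CleaningResult:
    clean: list = field(default_factory=list)       # rows ready for analysis
    incomplete: list = field(default_factory=list)  # rows missing a required value
    duplicates: int = 0                             # repeated rows that were dropped


def normalize(value):
    """Trim and lowercase strings so trivially different spellings compare equal."""
    if isinstance(value, str):
        return value.strip().lower()
    return value


def clean_rows(rows):
    """Drop irrelevant fields, set aside incomplete rows, and remove duplicates."""
    result = CleaningResult()
    seen = set()
    for row in rows:
        # Keep only the relevant fields; a missing field becomes None.
        record = {name: normalize(row.get(name)) for name in RELEVANT_FIELDS}
        # Set incomplete rows aside for review instead of silently dropping them.
        if any(value in (None, "") for value in record.values()):
            result.incomplete.append(record)
            continue
        # Two rows are duplicates when every normalized field matches.
        key = tuple(record[name] for name in RELEVANT_FIELDS)
        if key in seen:
            result.duplicates += 1
            continue
        seen.add(key)
        result.clean.append(record)
    return result


raw_rows = [
    {"customer_id": 1, "email": " Ann@Example.com ", "country": "US", "notes": "vip"},
    {"customer_id": 1, "email": "ann@example.com", "country": "us"},
    {"customer_id": 2, "email": "", "country": "CA"},
]
result = clean_rows(raw_rows)
print(len(result.clean), result.duplicates, len(result.incomplete))  # prints: 1 1 1
```

Incomplete rows are kept in their own list rather than deleted, so they can be rechecked later instead of disappearing from the warehouse.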
Quality control over data ensures that the data analyst can reuse it across many applications. With fewer defects and irregularities, the data can be readily used for different purposes. High-quality data also assures the analyst that the result is free of the “garbage in, garbage out” problem (Zhang et al., 2014, p. 1). The analyst can be confident that the result is actionable knowledge.
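One way to guard against "garbage in, garbage out" is to measure how defective a data set is before any analysis runs. The sketch below assumes a simple definition of a defect (a row with a missing required value) and an arbitrary 5% threshold. Both are illustrative choices, and the function names are hypothetical.

```python
def defect_rate(rows, required_fields):
    """Return the fraction of rows with at least one missing required value."""
    if not rows:
        return 0.0
    defective = sum(
        1
        for row in rows
        if any(row.get(name) in (None, "") for name in required_fields)
    )
    return defective / len(rows)


def passes_quality_gate(rows, required_fields, max_defect_rate=0.05):
    """Allow analysis only when the data is non-empty and below the defect threshold."""
    if not rows:
        return False  # assumption: an empty data set is not fit for analysis
    return defect_rate(rows, required_fields) <= max_defect_rate


sample = [
    {"id": 1, "score": 0.9},
    {"id": 2, "score": None},  # defective: missing score
    {"id": 3, "score": 0.4},
    {"id": 4, "score": 0.7},
]
print(defect_rate(sample, ("id", "score")))          # prints: 0.25
print(passes_quality_gate(sample, ("id", "score")))  # prints: False
```

A gate like this can be rerun on a schedule, so a data set that passed once is still tested again after each update.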
As time goes on, there will be more efficient and user-friendly ways to deal with data set problems. As with other computer applications, data warehouses will probably become much simpler to use and require less hardcoding (Jamsa, 2020, p. 11). Like the move from statement-based programming to visual programming, data will become easier to access and analyze.
References
Bramantoro, A. (2018). Data Cleaning Service for Data Warehouse: An Experimental Comparative Study on Local Data. Telkomnika, 16(2), 834-842.
Zhang, G. L., Sun, J., Chitkushev, L., & Brusic, V. (2014). Big Data Analytics in Immunology: A Knowledge-Based Approach. BioMed Research International, 2014, 1-9.
Jamsa, K. (2020). Data Mining and Analytics. Jones & Bartlett Learning.
Mastrian, K. G., & McGonigle, D. (2017). Informatics for Health Professionals. Jones & Bartlett Learning.