Raw data is data collected directly from a source that has not yet been processed or manipulated by any software or by the researcher. The researcher must process it before it becomes useful.
Reply 1: Why are the original/raw data not readily usable by analytic tasks?
Raw data is not used directly for analytic tasks because it is usually dirty and has to be cleaned and transformed into a format that can be analyzed. Cleaning data involves detecting inaccurate values in the recorded data and correcting them by deleting, modifying, or replacing them with accurate values so that the data is consistent with the rest of the system (Sharda, Delen, & Turban, 2020).
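To make the cleaning idea concrete, here is a minimal sketch in Python. The records, the field names, and the plausibility rule for ages are all assumptions made up for illustration, and replacing bad values with the median is only one of several reasonable choices.

```python
from statistics import median

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},     # inaccurate: negative age
    {"id": 3, "age": None},   # missing value
    {"id": None, "age": 41},  # unusable: no identifier
    {"id": 5, "age": 29},
]

def is_valid_age(age):
    """Assumed plausibility rule: an age is a number between 0 and 120."""
    return isinstance(age, (int, float)) and 0 <= age <= 120

def clean(records):
    # Delete records that cannot be identified at all (copy the rest).
    kept = [dict(r) for r in records if r["id"] is not None]
    # Replace inaccurate or missing ages with the median of the valid ones.
    valid_ages = [r["age"] for r in kept if is_valid_age(r["age"])]
    fill = median(valid_ages)
    for r in kept:
        if not is_valid_age(r["age"]):
            r["age"] = fill
    return kept

cleaned = clean(raw_records)
for record in cleaned:
    print(record)
```

The record without an id is deleted, and the two implausible ages are replaced, which covers the deleting and replacing actions described above. The original list is left untouched so the raw data can still be audited.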
Raw data is usually misaligned. It is not arranged or stored in a specific format that is consistent with the subject under research. The researcher has to align the data with the data architecture so that it matches the behavior of the rest of the data set. The data must be organized linearly so that it resembles the data already in the analytic systems (Sawall, Hahn, Maier, & Kachelrieß, 2018).
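As a rough illustration of aligning records to one format, the sketch below maps inconsistent field names and date formats onto a single layout. The alias table, the two date formats, and the sample records are hypothetical; a real data set would need its own mapping.

```python
from datetime import datetime

# Assumed mapping from field names seen in the raw data to one target schema.
FIELD_ALIASES = {
    "Name": "name",
    "full_name": "name",
    "DOB": "birth_date",
    "dob": "birth_date",
}

# Assumed date formats that appear in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def parse_date(text):
    """Try each known format and return the date as an ISO 8601 string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def align(record):
    """Rename fields to the target schema and standardize the date."""
    out = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    out["birth_date"] = parse_date(out["birth_date"])
    return out

raw = [
    {"Name": "Ada", "DOB": "1990-02-03"},
    {"full_name": "Lin", "dob": "03/02/1990"},
]
aligned = [align(r) for r in raw]
print(aligned)
```

After alignment both records use the same field names and the same date format, so they can be stored and compared alongside data already in the system.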
Raw data is usually made up of many pieces of information that are not straightforward to understand, which makes it overly complex. The complexity becomes apparent when the data proves difficult to analyze in a way that yields business value. The structure and size of the data are the major factors that make it complex unless it is broken down into simpler pieces that are easier to understand (Castle, 2017).
Raw data is also not used in analysis because it is usually inaccurate. It contains errors of omission and errors of commission that have to be rectified before it is used for analysis.
Reply 2: Data processing steps.
Raw data is processed in the following steps (Sharda et al., 2020).
- Data consolidation: data is collected from the sources, the necessary information is selected, and the selected information is merged.
- Data cleaning: values in the raw data are examined and corrected. The analyst identifies unusual values within variables using expert opinion.
- Data transformation: the raw data is transformed for easier processing. Values are normalized between a set minimum and maximum to reduce bias, and some values are categorized to make the data more amenable to processing by computers.
- Data reduction: the data is reduced to its most relevant subset, which makes it more manageable and ready to be visualized and used for prediction.
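The four steps described above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the two sources, the column names, and the spend threshold (standing in for an expert's judgment about unusual values) are hypothetical, and min-max scaling is used as the normalization, mapping each value v to (v - min) / (max - min).

```python
# Hypothetical sources; names and values are invented for illustration.
sales_source = [
    {"customer": "a", "spend": 120.0},
    {"customer": "b", "spend": 80.0},
    {"customer": "c", "spend": 9999.0},  # unusual value
]
profile_source = {
    "a": {"age": 30, "note": "x"},
    "b": {"age": 50, "note": "y"},
    "c": {"age": 40, "note": "z"},
}

def consolidate(sales, profiles):
    """Step 1: merge the two sources on the customer key."""
    return [{**row, **profiles[row["customer"]]}
            for row in sales if row["customer"] in profiles]

def clean(rows, max_spend=1000.0):
    """Step 2: drop rows whose spend falls outside the assumed plausible range."""
    return [r for r in rows if 0 <= r["spend"] <= max_spend]

def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def transform(rows, columns):
    """Step 3: normalize the chosen numeric columns."""
    out = [dict(r) for r in rows]
    for col in columns:
        scaled = min_max([r[col] for r in rows])
        for r, s in zip(out, scaled):
            r[col] = s
    return out

def reduce_columns(rows, keep):
    """Step 4: keep only the variables considered relevant."""
    return [{k: r[k] for k in keep} for r in rows]

merged = consolidate(sales_source, profile_source)
cleaned = clean(merged)
scaled = transform(cleaned, ["spend", "age"])
prepared = reduce_columns(scaled, ["customer", "spend", "age"])
print(prepared)
```

Each function takes the previous step's output, so the order of the steps is explicit and any one step can be replaced without touching the others.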
Importance of data processing steps
- Data consolidation helps ensure that good-quality data is available and makes the data easier to access and analyze.
- Data cleansing makes data sets consistent with other similar data sets in the system and removes errors that may be present in the data.
- Data transformation makes the data easier for both humans and computers to use. It is achieved by converting data from one format to another so that it is easier to analyze.
- Data reduction produces a reduced representation of the data that is smaller in size but still contains the required information from the whole.
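As one example of a reduced representation, the sketch below collapses individual transactions into a per-group summary. Aggregation is just one possible reduction technique, chosen here as an assumption, and the regions and amounts are invented.

```python
from collections import defaultdict

# Hypothetical transactions as (region, amount) pairs.
transactions = [
    ("north", 10.0),
    ("south", 4.0),
    ("north", 14.0),
    ("south", 6.0),
    ("north", 12.0),
]

def summarize(rows):
    """Reduce many rows to one count and mean per region."""
    totals = defaultdict(lambda: [0, 0.0])  # region -> [count, running sum]
    for region, amount in rows:
        totals[region][0] += 1
        totals[region][1] += amount
    return {region: {"count": n, "mean": total / n}
            for region, (n, total) in totals.items()}

summary = summarize(transactions)
print(summary)
```

Five rows become two summary entries, which is smaller but still answers questions about typical spending per region. Whether that loss of detail is acceptable depends on what the analysis needs.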
Sharda, R., Delen, D., & Turban, E. (2020). Analytics, Data Science, & Artificial Intelligence: Systems for Decision Support.
Sawall, S., Hahn, A., Maier, J., & Kachelrieß, M. (2018). Intrinsic raw data-based CT misalignment correction without redundant data.
Castle, E. (2017). Signs you are dealing with complex data.