Information systems and big data

 

Table of Contents

Introduction

Big data analysis

Characteristics

Techniques

Limitations

Examples

Conclusion

References

 

 

Introduction:-

Information systems and big data analytics are used to better understand the information available within an organization. They help predict market trends, customer demand, and behavioural patterns, which is essential for effective decision-making, and they extract new insights from existing data that the organization needs. Data analysts evaluate various alternatives and adjust the analysis approach whenever a pattern changes. The data itself is commonly divided into structured, semi-structured, and unstructured forms, and analysts run it through a processing engine. The data must be correctly organized and configured so that analysts can work with it efficiently.

 

 

Big data analysis:-

Organizations hold many hidden patterns in their data. Big data analysis makes it possible to uncover these patterns, which feed the organization's decision-making process. Extracting information from big data is demanding, so various techniques have been developed to make sense of it; the data is characterized by its source, size, and structure. Standard business intelligence queries alone cannot surface this information, so predictive models, statistical algorithms, and data visualization are put into action. With the growth of artificial intelligence, the Internet of Things, and social media, the volume of big data keeps fluctuating and expanding. The problem of big data emerged in the 1990s and gave rise to big data analytics; the term itself appeared when organizations began analyzing numbers in very large spreadsheets. With the spread of search engines and mobile devices, the volume of data kept growing, and the speed at which data arrived made it increasingly difficult to handle. Relational databases remove redundant data through normalization, support exporting, importing, and backing up data, and are flexible enough to allow tables to be inserted, changed, or deleted while the database is running and queries are executing. However, extracting all the necessary information from big data was not an easy task for relational databases. Examples of big data analytics technologies are discussed later in this report.
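As a simple illustration of the relational approach described above, the following Python sketch uses the standard sqlite3 module; the table names, columns, and values are hypothetical examples, not data from any real organization.

import sqlite3

# Hypothetical example: a small normalized schema for customers and orders.
conn = sqlite3.connect(":memory:")          # in-memory database for illustration
cur = conn.cursor()

cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, "
            "FOREIGN KEY (customer_id) REFERENCES customers(id))")

cur.execute("INSERT INTO customers VALUES (1, 'Alice')")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 120.0), (2, 1, 80.5)])

# A typical structured query: total spend per customer.
cur.execute("""SELECT c.name, SUM(o.amount)
               FROM customers c JOIN orders o ON o.customer_id = c.id
               GROUP BY c.name""")
print(cur.fetchall())                        # [('Alice', 200.5)]
conn.close()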

 

Characteristics:-

The characteristics of big data are as follows:-

 

Variety refers to the wide range of data, which is enormous in quantity. The data available from various websites, social media, and review sites is vast and has to be processed and analyzed. There is no limit to the data analysts can collect, and it spans structured, semi-structured, and unstructured formats.

 

Velocity refers to the speed at which data is obtained from different sources. Data arrives ever faster, and as the rate increases the amount of information multiplies as well, which becomes a serious problem for many analysts. Examples of such high-velocity data are Facebook posts, messages, e-commerce transactions, credit card swipes, etc.

 

Veracity indicates the quality and accuracy of the data collected. To draw sound conclusions, it is essential to work with correct data, so accurate data has to be identified among everything available and used accordingly. It has been estimated that poor-quality and improper data costs US companies over $3.1 trillion a year through bad decision-making.

 

Value describes the worth that the collected data adds to the smooth running of the business. It determines how effectively the organization turns its data into real outcomes. It is through data that customer preferences can be understood and decisions made wisely; Amazon, for example, uses the value of its data to grow its business globally.

 

Volume refers to the sheer size of the data. These vast volumes require specialized techniques and technologies, well beyond an ordinary laptop or desktop processor, to gather accurate information. Facebook currently has 2.2 billion active users, so the data it generates is also massive, and this volume has to be analyzed properly.

 

Techniques of big data analysis:-

The techniques of big data analysis are as follows:-

 

The A/B testing data analysis technique is used to compare a control group with one or more test groups. Based on the comparison, the best alternative can be implemented, and the pattern can be changed if required (Azevedo et al., 2020).
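A minimal sketch of such a comparison is given below; the visitor and conversion counts for the control and test groups are hypothetical, and the calculation is a standard two-proportion z-test using only the Python standard library.

from math import sqrt
from statistics import NormalDist

# Hypothetical results: conversions out of visitors for control (A) and test (B).
conv_a, n_a = 120, 2400      # control group
conv_b, n_b = 150, 2380      # test group

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled conversion rate
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided test

print(f"control={p_a:.3%}, test={p_b:.3%}, z={z:.2f}, p={p_value:.4f}")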

 

Data fusion and data integration merge data from multiple sources; this is far more valuable than analyzing a single source of data in isolation.

 

The data mining technique is used to extract patterns from the wide range of available data with the help of machine learning (Wang et al., 2020). It is used, for example, to determine customer preferences from customer data.
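The sketch below illustrates one common data mining step, clustering customers into segments; it assumes the scikit-learn library is available, and the customer figures are hypothetical.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer records: [annual spend, number of orders].
customers = np.array([
    [200,  2], [250,  3], [1200, 15],
    [1100, 12], [30,   1], [40,   1],
])

# Group customers into segments whose members behave similarly.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
for row, label in zip(customers, model.labels_):
    print(row, "-> segment", label)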

 

Artificial intelligence and machine learning are among the most popular ways to analyze data. They are essential for drawing inferences from data, and machines make predictions possible at a scale that would be impossible for human data analysts alone.
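A small sketch of such a machine-made prediction is shown below; it assumes scikit-learn is available and uses hypothetical monthly sales figures to forecast demand with a simple linear regression.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly sales history (month index -> units sold).
months = np.arange(1, 13).reshape(-1, 1)
sales = np.array([100, 110, 115, 130, 128, 140, 150, 155, 160, 172, 180, 190])

model = LinearRegression().fit(months, sales)
next_month = model.predict([[13]])           # forecast for the following month
print(f"predicted demand for month 13: {next_month[0]:.0f} units")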

 

The Natural Language Processing (NLP) data analysis technique uses algorithms that allow computers to process and analyze human language.
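The following minimal sketch shows one very small NLP step, tokenizing a piece of review text and counting term frequencies; the review text and stop-word list are hypothetical.

from collections import Counter
import re

# Hypothetical customer review text.
review = "The delivery was fast and the product quality is great. Great value!"

# Basic NLP steps: normalise case, tokenise into words, count term frequency.
tokens = re.findall(r"[a-z']+", review.lower())
stopwords = {"the", "was", "and", "is", "a", "to"}
counts = Counter(t for t in tokens if t not in stopwords)

print(counts.most_common(3))    # e.g. [('great', 2), ('delivery', 1), ('fast', 1)]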

 

Statistical analysis techniques are used to collect and organize data from the various surveys and experiments conducted.
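A short sketch of organizing survey responses into summary statistics is given below; it assumes the pandas library is available, and the regions and scores are hypothetical.

import pandas as pd

# Hypothetical survey responses: satisfaction score (1-5) per region.
responses = pd.DataFrame({
    "region":       ["north", "north", "south", "south", "east"],
    "satisfaction": [4, 5, 3, 2, 4],
})

# Organise the raw responses into summary statistics per region.
summary = responses.groupby("region")["satisfaction"].agg(["count", "mean"])
print(summary)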

 

Limitations of big data analysis:-

 

It is crucial to maintain the quality of the data. Since data is collected from many sources, it is essential to keep it accurate; when the quality of information is not up to the mark, it affects the decision-making process. A proper analysis process has to be put in place to avoid this.

 

Organizations often focus only on the data collected and fail to create a data-driven culture throughout the workflow, which is one of the main limitations of big data analysis. According to NewVantage Partners' surveys, only a few organizations manage to embed such a culture across the whole company.

 

Compliance is another major issue: data must be handled according to government rules and regulations. While collecting and handling data, it is crucial to meet these standards, yet many organizations struggle to do so.

 

The data is often personal and sensitive, so it must be stored securely. Cyber attackers frequently attempt to violate these protections and exploit the data.

 

Technology changes rapidly, so an organization cannot afford to invest in a single system only; as soon as one tool is deployed, a newer one emerges. Investing in only one machine is therefore not advisable.

 

Although big data analysis can reduce software costs, investment is still required in training, staffing, maintenance, and hardware. These costs can be enormous and can be a drawback for the organization.

 

Examples of big data technologies:-

 

The new machines require experience, human intelligence, and skills to operate. Artificial intelligence is becoming an essential branch of science (Niazi et al., 2019). The ability to reason and make decisions is a vital aspect of artificial intelligence, which is also used for drug treatment and for assisting surgery in the operating theatre.

 

NoSQL databases are used to build modern applications within an organization. They not only retrieve data but also accumulate it. They improve performance and, unlike relational databases, do not require a fixed table schema.
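A minimal sketch of working with a document-oriented NoSQL store is shown below; it assumes the pymongo driver is installed and a MongoDB server is running locally, and the database, collection, and documents are hypothetical.

from pymongo import MongoClient

# Assumes a MongoDB server running locally; names and documents are hypothetical.
client = MongoClient("mongodb://localhost:27017")
products = client["shop"]["products"]

# Documents in the same collection do not need a fixed schema.
products.insert_one({"name": "laptop", "price": 899, "tags": ["electronics"]})
products.insert_one({"name": "novel", "price": 12, "author": "J. Doe"})

print(products.find_one({"name": "laptop"}))
client.close()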

 

R is a programming language used for statistical computing, building analysis environments, and visualization. It is free, open-source software and is considered one of the most prominent languages in this field, used by many data miners and analysts to work with data efficiently.

Unstructured data cannot always be transformed into structured data at the moment it is saved. Data lakes allow raw data to be stored as it is accumulated and transferred into other data structures later, which opens up better opportunities for the functioning of the organization and makes it easier to bring new systems into the analytics workflow.

 

Apache Spark is famous for its fast data processing. It is a quick, general-purpose engine that supports Java, Scala, Python, and other languages, and it shortens the period between decision-making and the execution of decisions. It can also run on top of Hadoop with the intention of increasing processing speed.
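The following sketch shows a typical Spark aggregation; it assumes PySpark is installed, and the input file name and column names are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes PySpark is installed; the input path and columns are hypothetical.
spark = SparkSession.builder.appName("sales-demo").getOrCreate()

df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Distributed aggregation: revenue per product, computed in parallel across the cluster.
revenue = df.groupBy("product").agg(F.sum("amount").alias("revenue"))
revenue.orderBy(F.desc("revenue")).show(10)

spark.stop()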

 

Blockchain provides security for the data collected: once data has been entered, it can neither be deleted nor changed later (Dai et al., 2019). It is well suited to the banking, retail, finance, healthcare, and insurance industries. Various companies, such as IBM and Microsoft, have already implemented blockchain technology, and the technology itself continues to improve.
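The sketch below illustrates the underlying idea of a hash chain, in which each record stores the hash of the previous one so that tampering is detectable; it is a simplified Python illustration, not a real blockchain network, and the payment records are hypothetical.

import hashlib, json, time

# Each block stores the hash of the previous block, so changing any earlier
# record breaks the chain and is detected on verification.
def make_block(data, prev_hash):
    block = {"timestamp": time.time(), "data": data, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block({"payment": 0}, prev_hash="0" * 64)
block1 = make_block({"payment": 250, "from": "A", "to": "B"}, prev_hash=genesis["hash"])

# Verification: recompute the hash and compare; any tampering is detected.
check = dict(block1)
stored = check.pop("hash")
recomputed = hashlib.sha256(json.dumps(check, sort_keys=True).encode()).hexdigest()
print("chain intact:", recomputed == stored)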

 

The Hadoop ecosystem provides a platform that can address the problems surrounding big data analysis. It includes Apache open-source projects and other commercial tools (Glushkova et al., 2019); examples of the open-source components are Spark, Hive, and Oozie, and the core services it provides include HDFS, YARN, and MapReduce.
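The word-count example below illustrates the MapReduce style of processing in plain Python; the input text "splits" are hypothetical, and the map, shuffle, and reduce phases are shown explicitly.

from collections import defaultdict

# Hypothetical input "splits" held on two nodes; a word count in the MapReduce style.
splits = ["big data needs big tools", "data tools for big data"]

# Map phase: emit (word, 1) pairs for each split.
mapped = [(word, 1) for split in splits for word in split.split()]

# Shuffle phase: group emitted values by key.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the counts for each word.
reduced = {word: sum(counts) for word, counts in grouped.items()}
print(reduced)   # {'big': 3, 'data': 3, ...}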

 

Conclusion:-

According to the demands of the IT industry, new big data technologies keep emerging as the days pass, and they deliver real benefits within an organization. This report has presented detailed research on big data analysis, described the technologies in recent use, and discussed the techniques of big data analysis. However, every technology has both strengths and weaknesses, and the disadvantages of big data analysis have also been covered above. Various techniques have been identified to manage the wide variety of data collected by analysts. With proper implementation, analysts can anticipate situations and make decisions wisely to meet any emergency. Big data analysis gathers the necessary information from an organization's existing data and provides accurate insights; if implemented correctly, it can prove to be a success for the organization.

References:-

Glushkova, D., Jovanovic, P. and Abelló, A., 2019. MapReduce performance model for Hadoop 2.x. Information Systems, 79, pp.32-43.

Dai, H.N., Zheng, Z. and Zhang, Y., 2019. Blockchain for Internet of Things: A survey. IEEE Internet of Things Journal, 6(5), pp.8076-8094.

Niazi, M.K.K., Parwani, A.V. and Gurcan, M.N., 2019. Digital pathology and artificial intelligence. The Lancet Oncology, 20(5), pp.e253-e261.

Pandit, V., Amiriparian, S., Schmitt, M., Mousa, A. and Schuller, B., 2019. Big data multimedia mining: feature extraction facing volume, velocity, and variety. Big Data Analytics for Large-Scale Multimedia Search, 61.

Azevedo, E.M., Deng, A., Montiel Olea, J.L., Rao, J. and Weyl, E.G., 2020. A/B testing with fat tails. Journal of Political Economy, 128(12), pp.000-000.

Wang, S., Cao, J. and Yu, P., 2020. Deep learning for spatio-temporal data mining: A survey. IEEE Transactions on Knowledge and Data Engineering.
