Final Project Research Paper
Abstract
The current business world is characterized by intense competition, and every firm must strive to remain at the edge of the competition or be rendered obsolete. Data and information are among the assets that determine the continuity, survival, and profitability of the firm. This research paper primarily discusses some essential data-related topics, namely data warehousing and big data. The concept and implementation of data warehousing simplify a firm's analysis and reporting by consolidating cumulative and historical data from single or multiple sources. Data warehouses are typically subject-oriented, integrated, time-variant, and non-volatile data repositories that facilitate a single version of the truth for an organization's forecasting and decision making. I will also cover the process of changing the format, structure, and values of data (data transformation) and some of the significant trends in data warehousing. Due to the availability of rich sources of information such as the internet and social media platforms, modern-day organizations can collect massive amounts of data for analytical purposes. The analysis of big data provides useful insights into current and future market conditions as well as consumer requirements. In the field of software engineering, big data provides valuable information concerning end-user requirements. Lastly, I will discuss the significance of IT green computing, one of the disruptive topics that is revolutionizing computing by helping firms adopt eco-friendly options.
Introduction
In the current information age, data and information are the most valuable assets that any organization can possess. A data warehouse is a section of an information system that acts as a data repository and is used by the firm to store data from single or multiple sources. The primary role of a data warehouse is to simplify the process of analyzing data and generating useful insights. A data warehouse is constructed by incorporating data from multiple heterogeneous sources that support analytical reporting, decision making, and structured or ad hoc queries. In simple terms, data warehousing is the process of constructing and using a data warehouse for the sole purpose of storing and analyzing data.
Advancement in data warehousing has led to the collection of massive amounts of data that cannot be analyzed using traditional analytical tools. These enormous datasets are referred to as big data and are sourced from rich data sources such as the internet and social media platforms like Facebook and Instagram. The analysis of big data has significantly improved the functionality and operations of firms by helping management make more intelligent decisions. Working as a software engineer, I have first-hand experience of the significance of big data and its importance in decision making.
Every industry across the globe is working towards conserving the environment, and the IT industry is no different. PricewaterhouseCoopers is a perfect example of an IT firm that has incorporated the principles of green computing in its operations; for instance, the firm has significantly reduced its carbon footprint and power consumption.
Prompt 1 Data Warehouse Architecture
Over the years, the data warehouse has been the business-insights workhorse of business computing. There are three main types of data warehouse architectures: single-tier, two-tier, and three-tier. The three-tier architecture is the most widely used and consists of a bottom tier, a middle tier, and a top tier. A data warehouse is based on an RDBMS server that acts as a central data repository, surrounded by several crucial components that make the entire environment accessible, functional, and manageable. The following are the main components of a data warehouse.
Data Warehouse Database.
The central database is the core of the entire data warehousing environment. The primary database is implemented on RDBMS technology, whose functionality is limited by the fact that a traditional RDBMS is designed and optimized for transactional processing rather than for data warehousing. The limitations imposed by the relational data model are overcome using multidimensional databases (MDDBs) such as Oracle Essbase. In addition, new index structures are used to improve speed and bypass full relational table scans. Relational databases are also deployed in parallel, using shared-memory or shared-nothing models on various multiprocessor configurations, to ease analysis and allow for scalability.
Sourcing, Acquisition, Clean-Up And Transformation Tools (ETL)
The data sourcing, migration, and transformation tools are used to summarize, transform, and perform all the changes needed to encode the data into a unified format in the data warehouse. These tools are commonly known as Extract, Transform, and Load (ETL) tools. ETL tools are also helpful in maintaining the metadata. The following are some of the functions of these tools:
- Eliminating unnecessary and unwanted data before loading into the data warehouse.
- Calculating summaries and derived data.
- Anonymizing data as per regulatory stipulations.
- Populating missing data with default values.
- Normalizing and de-duplicating repeated data that arrives from multiple sources.
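The ETL functions listed above can be sketched in Python. This is a minimal illustration, not any particular ETL product: the record fields, default values, and filter rule are all hypothetical.

```python
import hashlib

# Hypothetical raw rows arriving from multiple source systems.
raw_rows = [
    {"id": 1, "customer": "Alice", "amount": 120.0, "region": "NA"},
    {"id": 1, "customer": "Alice", "amount": 120.0, "region": "NA"},  # duplicate
    {"id": 2, "customer": "Bob",   "amount": None,  "region": "EU"},  # missing value
    {"id": 3, "customer": "Eve",   "amount": -5.0,  "region": "EU"},  # invalid value
]

def etl(rows):
    seen, out = set(), []
    for row in rows:
        # Eliminate unwanted data: drop rows with negative amounts.
        if row["amount"] is not None and row["amount"] < 0:
            continue
        # De-duplicate repeated rows arriving from multiple sources.
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        clean = dict(row)
        # Populate missing data with a default value.
        if clean["amount"] is None:
            clean["amount"] = 0.0
        # Anonymize personally identifiable fields (hashing as one option).
        clean["customer"] = hashlib.sha256(clean["customer"].encode()).hexdigest()[:8]
        out.append(clean)
    # Calculate a derived summary alongside the detail rows.
    total = sum(r["amount"] for r in out)
    return out, total

rows, total = etl(raw_rows)
```

In a production pipeline each of these steps would be driven by configurable rules rather than hard-coded conditions, but the sequence — filter, de-duplicate, fill defaults, anonymize, summarize — mirrors the list above.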
Metadata
Metadata is the information that describes the data in a data warehouse. It is used for creating, building, maintaining, and managing the data warehouse, and it defines the source, values, features, and usage of the warehouse data. Metadata provides answers to questions such as: Where did the data come from? What attributes, keys, and tables does the data warehouse contain? How many times has the data been reloaded?
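As a small illustration, the questions above could be answered by a metadata record like the following; the table name, source system, and counts are hypothetical.

```python
# A hypothetical metadata record for one warehouse table.
table_metadata = {
    "table": "fact_sales",
    "source_system": "orders_oltp",     # where did the data come from?
    "columns": ["sale_id", "date_key", "amount"],
    "primary_key": "sale_id",           # what keys does the table have?
    "reload_count": 42,                 # how many times was it reloaded?
}

def describe(meta):
    """Summarize a table's metadata in one line."""
    return (f"{meta['table']}: {len(meta['columns'])} columns, "
            f"sourced from {meta['source_system']}, "
            f"reloaded {meta['reload_count']} times")
```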
Query Tools
These tools are used to provide information to the business to make strategic decisions. These tools fall into four categories.
- Query and reporting tools – designed for end-users to perform analysis and generate reports.
- Application development tools – used to develop custom reports.
- Data mining tools – used to automate the discovery of meaningful new correlations, trends, and patterns in large amounts of data.
- OLAP tools – allow users to analyze the data through complex dimensional views.
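The dimensional views that OLAP tools provide boil down to aggregating a measure over chosen dimensions (a "roll-up"). A minimal standard-library sketch, using hypothetical sales facts:

```python
from collections import defaultdict

# Hypothetical fact rows: one per sale, with two dimensions and one measure.
facts = [
    {"region": "NA", "year": 2023, "sales": 100},
    {"region": "NA", "year": 2024, "sales": 150},
    {"region": "EU", "year": 2023, "sales": 80},
    {"region": "EU", "year": 2024, "sales": 90},
]

def rollup(rows, dims, measure):
    """Aggregate a measure over the given dimension columns (an OLAP roll-up)."""
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        cube[key] += row[measure]
    return dict(cube)

by_region = rollup(facts, ["region"], "sales")            # coarse view
by_region_year = rollup(facts, ["region", "year"], "sales")  # finer view
```

Real OLAP engines precompute and index such aggregates across many dimensions; the sketch only shows the underlying grouping idea.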
Data Marts
A data mart is an access layer used to deliver data to users. It takes less time and fewer financial resources to build, making it an attractive alternative to a full-size data warehouse.
Data Transformation
Data transformation refers to the process of changing the structure, values, and format of data. ETL tools are used to transform data during the initial stages of storage and other processes, such as mining. Data transformation can be:
- Constructive – such as adding, copying and replicating
- Destructive – such as deleting
- Aesthetic – such as standardizing
- Structural – such as moving, renaming, and combining columns.
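The four categories above can be demonstrated on a single hypothetical record; the field names and rules here are illustrative only.

```python
# Hypothetical source records with inconsistent naming and formatting.
records = [{"first": "Ada", "last": "Lovelace", "CITY": " london "}]

def transform(rows):
    out = []
    for row in rows:
        r = dict(row)
        # Structural: combine and rename columns.
        r["full_name"] = f"{r['first']} {r['last']}"
        # Destructive: delete the now-redundant source columns.
        del r["first"], r["last"]
        # Aesthetic: standardize casing and whitespace.
        r["city"] = r.pop("CITY").strip().title()
        # Constructive: add a derived field.
        r["has_city"] = bool(r["city"])
        out.append(r)
    return out

clean = transform(records)
```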
New Trends In Data Warehousing
As cloud computing continues to expand and evolve, data warehousing technologies are growing exponentially. Cloud computing is slowly replacing current data repository technologies due to the ever-increasing on-demand availability of storage, higher-level services, and resources. The following are some of the emerging trends and technologies in data warehousing.
- Using managed services – these are the type of higher-level services, in which the cloud automatically processes most of the challenging issues for particular users. These issues include reliability, performance, efficiency, scalability, and security. Since the cloud providers bill most of these high-level services on-demand, managed services create possibilities to reduce costs.
- Data marts for production lines – these are essential for analyzing data from different production lines. They also serve as an intermediate data source, allowing each business unit to analyze its data in isolation from the central data warehouse.
- Using columnar storage – columnar storage organizes data from various sources by column rather than by row, improving disk performance for sophisticated analytical queries compared to row-based storage.
- In-memory analytical engines – these engines perform analytics and reporting on massive amounts of data by processing them in parallel in memory, enabling fast responses and visualization.
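The columnar-storage trend above can be illustrated with a toy layout comparison. Real columnar engines add compression and vectorized execution; this sketch, with hypothetical data, only shows why a column scan touches less data than a row scan.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 20.0},
    {"id": 3, "amount": 30.0},
]

# Column-oriented layout: each column is stored contiguously, so an
# analytical query over "amount" reads only that one array.
columns = {
    "id":     [r["id"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

total = sum(columns["amount"])  # scans a single column, not whole rows
```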
Prompt 2 Big Data
The availability of rich data sources such as the internet has enabled firms to acquire massive amounts of both structured and unstructured data. The defining characteristic of big data is that it is difficult to process using traditional database technologies and analytical techniques. Big data analytics examines massive amounts of data to uncover hidden patterns, correlations, and other insights. Modern technologies such as cloud computing enable firms to answer most of the challenges they face today and in the future. Advancements in mining tools such as RStudio and the Python programming language have also enhanced the significance of big data. The key benefits that big data analytics brings to the table are speed and efficiency: the ability to operate faster and stay agile gives businesses a competitive edge in a hostile business world.
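One simple example of the pattern-finding described above is measuring correlation between two metrics. The sketch below computes a Pearson correlation coefficient in plain Python; the social-mentions and sales figures are hypothetical.

```python
import statistics

# Hypothetical daily metrics: social media mentions of a product vs. unit sales.
mentions = [10, 20, 30, 40, 50]
sales    = [12, 22, 29, 41, 52]

def pearson(x, y):
    """Pearson correlation coefficient: +1 is a perfect positive linear trend."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(mentions, sales)  # close to 1.0 for these hypothetical data
```

At big-data scale the same computation would be distributed across a cluster or pushed into a warehouse engine, but the statistical insight it yields is identical.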
Working as a software engineer, I have to analyze end-user requirements to determine what kind of software to build. The three primary sources of big data for the organization are social data from likes, tweets, and general media; the internet; and transactional data from daily transactions. Despite the availability of massive amounts of data, real business value comes from the ability to combine this data in ways that generate decisions, insights, and actions. However, big data places significant demands on the organization and its data management technology, including human resources, computing power, and analytical capability. The organization has invested in cloud computing as well as social media data mining tools. In addition, big data management is a complex task that requires the know-how of a data specialist. As a result, the management and analysis of big data is a complex and expensive process that requires time as well as human and financial resources.
Prompt 3 Green Computing
Green computing is the eco-friendly and environmentally responsible use of computers and their resources. In other terms, green computing is the study of designing, manufacturing, using, and disposing of computing devices in ways that reduce their environmental impact. Numerous environmental activists have raised concerns over the effect of computing on the environment, mainly due to carbon emissions and radioactive waste. To address these challenges, firms have adopted green computing and environmentally friendly use of computers. PricewaterhouseCoopers is a perfect example of a firm that has implemented green computing in its operations. One of its most notable achievements is cutting its carbon footprint by 20% in just two years. PricewaterhouseCoopers incorporated both high-profile projects, such as building a LEED Gold-certified data center, and low-profile projects, such as rolling out multifunction printers configured with two-sided printing as the default, which helped reduce the firm's office energy use by 18%. The following are some of the activities that have driven the implementation of green computing at PricewaterhouseCoopers.
- Building a new data center in Georgia that has reduced power consumption by 20 million kilowatt-hours and cut operational costs by $2 million.
- Reducing travel by 30% and opting for environmentally friendly means of communication, such as video conferencing.
The following are some of the ways in which organizations can make their data centers “green.”
- Purchasing from Environmentally Committed Companies.
- Participating in Electronic Recycling Programs.
- Deploying Virtual Technologies.
- Limiting Printing and Recycling Paper.
Conclusion
The topics discussed in this paper are some of the revolutionary ideas disrupting the operations and functionality of IT. The future of data warehousing and big data analysis promises to help organizations succeed in a hostile business world. Continued advancement in technology and the exponential growth of technologies such as cloud computing keep improving our capability to manage data and information. All the new trends in big data management and data warehousing are driven by the need to reduce operating costs, increase efficiency, and promote green computing. Techniques such as shared memory and virtual computing have significantly reduced the amount of power needed to run a data center as well as the carbon emitted. Recycling old computing devices and disposing of them safely are additional mechanisms that promote green computing.