We are now generating massive volumes of data at an accelerating rate. To meet business needs, respond to changing market dynamics, and improve decision-making, organizations need sophisticated analysis of data drawn from disparate sources. The challenge is that capturing, storing, and modeling these massive pools of data effectively is difficult in traditional relational databases.
Big data is not a fad. We are just at the beginning of a revolution that will touch every business and every life on this planet.
Every second we create new data. For example, we perform 40,000 search queries every second (on Google alone), which works out to 3.5 billion searches per day and 1.2 trillion searches per year.
Why migrate to Hadoop?
Let's consider a shopping website. It needs to maintain product information, transaction data, and product reviews. That data can easily be stored in a relational database management system (RDBMS). But as the volume of comments grows, the tables must be altered to accommodate it. Such changes are needed in near real time, and data modeling becomes very challenging because of the time and resources required to complete them. Any change to the RDBMS schema may also affect the performance of the production database.
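The schema-rigidity problem above is what "schema-on-read" storage avoids: each record describes itself, so a new field needs no ALTER TABLE and no downtime. A minimal sketch in plain Python (the review fields here are hypothetical, just to illustrate the idea):

```python
import json

# Semi-structured review records. The second record carries a new
# "verified" field that the first one predates -- no migration needed.
raw_records = [
    '{"product_id": 1, "rating": 5, "comment": "Great!"}',
    '{"product_id": 2, "rating": 3, "comment": "OK", "verified": true}',
]

def read_reviews(lines):
    """Schema-on-read: interpret each record at query time,
    supplying a default for fields that older records lack."""
    for line in lines:
        rec = json.loads(line)
        yield {
            "product_id": rec["product_id"],
            "rating": rec["rating"],
            "verified": rec.get("verified", False),  # default for old rows
        }

reviews = list(read_reviews(raw_records))
```

The schema lives in the reading code, not in the storage layer, which is exactly why adding a field is cheap.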
There are many similar scenarios in which changes to the RDBMS schema are required because of the nature and volume of the information stored in the database. These challenges can be addressed with toolsets from the Hadoop ecosystem.
Workflow
MySQL
First of all, if you are coming from MySQL (which most companies currently use) or any other relational database, Hadoop may look different. Very different. Hadoop is, in effect, the opposite of a relational database: instead of a set of tables and indexes, Hadoop works with a set of files, and there are no indexes at all. Yes, this may be shocking, but all scans are sequential.
So, when does Hadoop make sense?
Hadoop is great if you need to store huge amounts of data (we are talking petabytes) and that data does not require real-time (millisecond) response times. Hadoop works as a cluster of nodes (similar to MySQL Cluster), and all data is spread across the cluster with redundancy, so it provides both high availability (if implemented correctly) and scalability. The data-retrieval process (MapReduce) is parallel, so the more data nodes you add to the cluster, the faster the processing becomes.
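The retrieval model described above can be sketched as a toy word count in the MapReduce style. This is single-process Python purely for illustration; in a real cluster, each map call would run on the node that holds that block of data, and the framework would shuffle the intermediate pairs between nodes:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit (word, 1) for every word in one input split.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key (the framework's job).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return key, sum(values)

lines = ["big data is big", "data is everywhere"]
intermediate = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["big"])  # -> 2
```

Because every map call is independent, adding nodes speeds up the map phase almost linearly, which is the scalability property mentioned above.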
Sqoop
Sqoop imports and exports data between MySQL and the Hadoop ecosystem.
sqoop import --connect "jdbc:mysql://ip-172-31-20-247:3306/dbname" --table pipeline --username sqoopuser -P --target-dir /path
Apache Spark
For the analysis of big data, the industry extensively uses Apache Spark. Hadoop enables a flexible, scalable, cost-effective, and fault-tolerant computing solution, but the main concern is maintaining speed while processing big data. The industry needed a powerful engine that can respond in under a second, perform in-memory processing, and handle stream processing as well as batch processing. This is what brought Apache Spark into existence.
Apache Spark is a powerful open-source framework that provides interactive, real-time stream, batch, and in-memory processing at very high speed, with a standard interface and ease of use.
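At the heart of Spark's speed is its evaluation model: transformations such as map and filter are recorded lazily, and nothing executes until an action is called, with intermediate data kept in memory. Here is a deliberately tiny imitation of that model in plain Python (this is not the PySpark API, just a sketch of the lazy-pipeline idea):

```python
class ToyRDD:
    """A toy imitation of Spark's lazy, in-memory RDD model."""

    def __init__(self, data, ops=None):
        self.data = data      # source kept in memory
        self.ops = ops or []  # transformations recorded, not yet run

    def map(self, fn):
        # Transformation: just record the step (lazy).
        return ToyRDD(self.data, self.ops + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self.data, self.ops + [("filter", pred)])

    def collect(self):
        # Action: only now does the recorded pipeline actually run.
        out = list(self.data)
        for kind, fn in self.ops:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

rdd = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = rdd.collect()  # nothing ran until this call
print(result)  # -> [0, 4, 16, 36, 64]
```

Laziness lets the engine see the whole pipeline before running it, which is what allows Spark to optimize and to keep intermediate results in memory instead of writing them to disk between steps, as classic MapReduce does.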
Why Data Visualisation?
As data grows day by day, it becomes a challenge to visualize it and produce useful results in a short amount of time. Data visualization comes to the rescue: it conveys concepts in a universal manner and makes it possible to experiment with different scenarios by making slight adjustments. It:
- Helps identify areas that need attention or improvement
- Clarifies which factors influence customer behavior
- Helps to understand which fields to place where
- Helps predict scenarios, and more
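Even a crude visual encoding makes a distribution easier to scan than a column of raw numbers. As a dependency-free sketch (the page names and counts below are made up for illustration):

```python
def bar_chart(counts, width=20):
    """Render a dict of counts as horizontal text bars,
    scaled so the largest value fills the full width."""
    peak = max(counts.values())
    rows = []
    for label, value in counts.items():
        bar = "#" * round(width * value / peak)
        rows.append(f"{label:<10} {bar} {value}")
    return "\n".join(rows)

page_views = {"Home": 120, "Product": 80, "Checkout": 20}
print(bar_chart(page_views))
```

At a glance the bars show where users drop off, which is the kind of "area that needs attention" the list above refers to.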
Data visualization not only makes data more beautiful but also provides insight into complex data sets by communicating their key aspects in more intuitive, meaningful ways.