What is Big Data?

As the name suggests, big data is the huge volume of data generated by IoT devices, apps, and real-time application data collection.

• Walmart handles more than 1 million customer transactions every hour.

• Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data.

• 230+ million tweets are created every day.

• More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.


The three different formats of big data are:

1. Structured: Organised data format with a fixed schema. Ex: RDBMS

2. Semi-Structured: Partially organised data which does not have a fixed format. Ex: XML, JSON

3. Unstructured: Unorganised data with an unknown schema. Ex: Audio, video files etc.
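
The three formats above can be illustrated with a small Python sketch. The sample records, field names, and the `describe` helper are made up for illustration and are not taken from any real dataset. The sketch only shows how much a program can assume about each format: fixed columns for structured data, per-record fields for semi-structured data, and nothing beyond raw bytes for unstructured data.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema, as in an RDBMS table.
# (Hypothetical sample data.)
structured_csv = "order_id,customer,amount\n1,alice,25.50\n2,bob,12.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: JSON records describe themselves, but fields vary per record.
semi_structured = [
    json.loads('{"user": "alice", "tags": ["sale", "mobile"]}'),
    json.loads('{"user": "bob", "location": {"city": "Pune"}}'),
]

# Unstructured: raw bytes (audio, video, ...) expose no schema to rely on.
unstructured = bytes([0x52, 0x49, 0x46, 0x46, 0x00, 0x01])


def describe(rows, records, blob):
    """Summarise what can be known up front about each format."""
    return {
        "structured_columns": sorted(rows[0].keys()),
        "semi_structured_fields": [sorted(r.keys()) for r in records],
        "unstructured_size_bytes": len(blob),
    }


summary = describe(rows, semi_structured, unstructured)
print(summary)
```

Note that the two JSON records share only the `user` field, while every CSV row has exactly the same columns.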

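The core problems that big data technology handles are the 3 Vs:

  • Volume: the sheer amount of data
  • Velocity: the speed at which data arrives
  • Variety: the range of data formats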



Big data supports predictive and prescriptive analytics.


Hadoop implementations

  • Amazon EMR
  • MapR
  • Cloudera (merged with Hortonworks): administered through Cloudera Manager; the distribution was CDH earlier and is now CDP
  • Azure HDInsight


Apache Oozie: a workflow scheduler used to manage jobs in Hadoop tech stacks
