Posts

Data Science with BIGDATA

Data Science -

Machine Learning using Spark

Machine learning is part of a broader umbrella known as artificial intelligence. It refers to the study of statistical models that solve specific problems by relying on patterns and inference. These models are “trained” for a specific problem using training data drawn from the problem space.

Category

Supervised learning works with a data set that contains both the inputs and the desired output, for instance a data set containing various characteristics of a property and the expected rental income. Supervised learning is further divided into two broad sub-categories, classification and regression:
■ Classification algorithms deal with a categorical output, such as whether a property is occupied or not
■ Regression algorithms deal with a continuous output range, such as the value of a property

Unsupervised learning, on the other hand, works with a set of data...
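To make the two supervised sub-categories concrete, here is a minimal plain-Python sketch (no Spark involved). The property data, the function names, and the choice of techniques (a one-feature least-squares line for regression, a nearest-centroid rule for classification) are all assumptions made for illustration, not something the post prescribes.

```python
from statistics import mean

# Hypothetical training data, invented for illustration.
areas = [50.0, 70.0, 90.0, 110.0]          # input: floor area
rents = [1000.0, 1400.0, 1800.0, 2200.0]   # continuous output -> regression
occupied = [0, 0, 1, 1]                    # categorical output -> classification


def fit_line(xs, ys):
    """Least-squares fit for a single feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def fit_centroids(xs, labels):
    """Nearest-centroid classifier: the mean feature value of each class."""
    return {
        label: mean([x for x, l in zip(xs, labels) if l == label])
        for label in set(labels)
    }


def predict_rent(model, area):
    """Regression prediction: a number on a continuous scale."""
    slope, intercept = model
    return slope * area + intercept


def predict_occupied(centroids, area):
    """Classification prediction: the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(area - centroids[label]))


rent_model = fit_line(areas, rents)
occupancy_model = fit_centroids(areas, occupied)
```

Both models are trained on the same inputs; only the kind of output differs. `predict_rent(rent_model, 80.0)` returns a number, while `predict_occupied(occupancy_model, 65.0)` returns one of the two labels.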

Learning Spark

Apache Spark is one of the best-known big data processing engines. It is a framework for real-time data analytics in a distributed computing environment. Spark is written in Scala and was originally developed at the University of California, Berkeley. It performs computations in memory to process data faster than MapReduce. By exploiting in-memory computation and other optimizations, it can run up to 100x faster than Hadoop MapReduce on some large-scale workloads. As a result, it also needs more computing resources, particularly memory, than MapReduce. Spark provides high-level APIs for R, SQL, Python, Scala, and Java. These standard libraries make it easier to integrate Spark into complex workflows. In addition, components such as MLlib, GraphX, Spark SQL and DataFrames, and Spark Streaming extend its capabilities.

An RDD (Resilient Distributed Dataset) is a fundamental data structure of Spark.
■ It is an immutable d...
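The immutability mentioned above can be illustrated with a small plain-Python analogy. This is a sketch only: `ImmutableDataset` is a hypothetical toy class, not Spark's RDD API, and it runs on a single machine with nothing distributed. The method names were chosen to echo the `map`, `filter`, and `collect` operations that RDDs expose.

```python
class ImmutableDataset:
    """Toy stand-in for an immutable collection (hypothetical, not Spark)."""

    def __init__(self, items):
        # A tuple cannot be modified in place once it is created.
        self._items = tuple(items)

    def map(self, fn):
        # Builds and returns a NEW dataset; self is left untouched.
        return ImmutableDataset(fn(x) for x in self._items)

    def filter(self, predicate):
        # Also returns a new dataset rather than removing items from self.
        return ImmutableDataset(x for x in self._items if predicate(x))

    def collect(self):
        # Hands back a copy of the contents as a list.
        return list(self._items)


numbers = ImmutableDataset([1, 2, 3, 4])
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
```

After these calls, `numbers` still holds its original four values, and `evens_squared` is a separate dataset derived from it. Every transformation produces a new object instead of changing an existing one.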