DataScience

Posts

Machine learning is part of the broader field of artificial intelligence. It is the study of statistical models that solve specific problems by finding patterns and drawing inferences from data. These models are “trained” for a specific problem using training data drawn from the problem space. Supervised learning works with a data set that contains both the inputs and the desired output, for instance a data set containing various characteristics of a property and the expected rental income. Supervised learning is further divided into two broad sub-categories, classification and regression. Classification algorithms predict a categorical output, such as whether a property is occupied or not. Regression algorithms predict a continuous output, such as the value of a property. Unsupervised learning, on the other hand, works with a set of data...
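To make the classification/regression split in the excerpt above concrete, here is a minimal sketch in plain Python. The property data, the feature choices (floor area, distance to the city centre) and the helper names `fit_line` and `classify` are all hypothetical, and the two methods (a least-squares line and a nearest-centroid rule) are only simple stand-ins for whatever algorithms the full post goes on to discuss.

```python
from statistics import mean

# --- Regression: continuous output (hypothetical rental income) ---
# Made-up training data: floor area in square metres -> monthly rent.
areas = [30.0, 45.0, 60.0, 80.0]
rents = [600.0, 900.0, 1200.0, 1600.0]

def fit_line(xs, ys):
    """Ordinary least squares for a single input feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(areas, rents)
predicted_rent = slope * 50.0 + intercept  # estimate for an unseen 50 m^2 property

# --- Classification: categorical output (occupied or vacant) ---
# Made-up training data: distance to the city centre in km, grouped by label.
training = {
    "occupied": [1.0, 2.0, 3.0],
    "vacant": [8.0, 9.0, 10.0],
}
centroids = {label: mean(values) for label, values in training.items()}

def classify(distance_km):
    """Nearest-centroid rule: pick the label whose mean is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - distance_km))

print(round(predicted_rent, 2))
print(classify(4.0), classify(7.0))
```

In both halves the model is fitted from labelled examples (inputs paired with the desired output), which is what makes them supervised; the only difference is whether the prediction is a number or a category.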

Learning Spark

- May 17, 2020

Apache Spark is one of the most popular engines for big data processing. It is a framework for real-time data analytics in a distributed computing environment. Spark is written in Scala and was originally developed at the University of California, Berkeley. It performs computations in memory, which speeds up data processing compared with MapReduce. Spark is often cited as being up to 100x faster than Hadoop MapReduce for large-scale data processing, thanks to in-memory computation and other optimizations. In return, it needs more processing resources, memory in particular, than MapReduce. Spark provides high-level APIs for R, SQL, Python, Scala and Java, which make it easier to integrate into complex workflows. On top of this, components such as MLlib, GraphX, Spark SQL with DataFrames, and Spark Streaming extend its capabilities. The RDD is the fundamental data structure of Spark. ■ It is an immutable d...
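The immutability mentioned at the end of the excerpt above can be illustrated with a toy model in plain Python. This is not Spark's API: `ToyDataset` and its `map` and `filter` methods are hypothetical names that only mimic the idea that a transformation returns a new in-memory dataset and leaves the original untouched.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable, Tuple

@dataclass(frozen=True)
class ToyDataset:
    """Hypothetical stand-in for an immutable collection held in memory."""
    items: Tuple[int, ...]

    def map(self, fn: Callable[[int], int]) -> "ToyDataset":
        # Build a new dataset; self.items is never modified.
        return ToyDataset(tuple(fn(x) for x in self.items))

    def filter(self, pred: Callable[[int], bool]) -> "ToyDataset":
        # Likewise returns a new dataset holding only the matching items.
        return ToyDataset(tuple(x for x in self.items if pred(x)))

numbers = ToyDataset((1, 2, 3, 4))
squares = numbers.map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0)

print(numbers.items)       # the original is unchanged by the transformations
print(even_squares.items)

try:
    numbers.items = (0,)   # reassigning a field on a frozen dataclass fails
except FrozenInstanceError:
    print("ToyDataset is immutable")
```

Because every step produces a fresh value, earlier results such as `numbers` and `squares` can be reused safely by later computations, which is the property the excerpt attributes to RDDs.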

