Data Mining: Handling Large Data Sets

Data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems.

Data mining is the process of finding anomalies, patterns, and correlations within large data sets in order to predict outcomes. With a broad range of techniques, you can apply this information to increase revenue, cut costs, improve customer relationships, reduce risk, and more.

Data mining involves exploring and analyzing large blocks of information to glean meaningful patterns and trends. It can be used in a variety of ways, such as database marketing, credit risk management, fraud detection, spam email filtering, or even discerning the sentiment or opinions of users.

Types:

Data mining comes in several types, including pictorial data mining, text mining, social media mining, web mining, and audio and video mining, among others. One example of data mining applied to business intelligence comes from the retail sector: retailers segment customers into Recency, Frequency, Monetary (RFM) groups and target marketing and promotions to each of those groups.
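
To make the RFM idea concrete, here is a minimal Python sketch that turns a raw transaction log into RFM codes. The customer names, dates, amounts, the `as_of` reference date, and the rank-based 1-to-3 scoring are all hypothetical assumptions for illustration; real retailers typically use more data and finer bins (for example quintiles).

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("alice", date(2024, 6, 1), 120.0),
    ("alice", date(2024, 6, 20), 80.0),
    ("bob", date(2024, 1, 15), 40.0),
    ("carol", date(2024, 6, 25), 300.0),
    ("carol", date(2024, 5, 2), 150.0),
    ("carol", date(2024, 3, 9), 60.0),
]


def rfm_table(transactions, as_of):
    """Aggregate transactions into (recency_days, frequency, monetary) per customer."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_seen[customer] = max(last_seen.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


def rank_scores(values, higher_is_better, bins=3):
    """Score each customer from 1 (worst) to `bins` (best) by rank."""
    # Sort worst-first so that rank 0 gets score 1; ties keep input order.
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {c: 1 + rank * bins // n for rank, c in enumerate(ordered)}


def rfm_segments(transactions, as_of, bins=3):
    """Return a three-digit R/F/M code per customer, e.g. '333' = best on all."""
    table = rfm_table(transactions, as_of)
    # Fewer days since the last purchase is better, so recency is inverted.
    r = rank_scores({c: v[0] for c, v in table.items()}, False, bins)
    f = rank_scores({c: v[1] for c, v in table.items()}, True, bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, True, bins)
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in table}


if __name__ == "__main__":
    print(rfm_segments(TRANSACTIONS, as_of=date(2024, 7, 1)))
    # {'alice': '222', 'bob': '111', 'carol': '333'}
```

Each customer ends up with a code such as `333` (recent, frequent, high-spending) or `111` (lapsed, infrequent, low-spending), and promotions can then be targeted per code. Ties are broken by input order here, which a production version would need to handle more carefully.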

Data mining techniques:

·         Regression (predictive)

·         Association Rule Discovery (descriptive)

·         Classification (predictive)

·         Clustering (descriptive)
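
Association rule discovery is the easiest of these techniques to show in a few lines. The following minimal Python sketch, assuming a hypothetical list of shopping baskets, computes support (the fraction of baskets containing an itemset) and confidence (how often the right-hand item appears when the left-hand item does), then brute-forces all one-item rules. It is not Apriori or FP-growth, which prune the search for larger data sets, and the default thresholds are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical market-basket data: each set is one customer's purchase.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of baskets that contain `itemset`."""
    return count(itemset, baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Estimate of P(rhs | lhs) as count(lhs and rhs) / count(lhs)."""
    return count(lhs | rhs, baskets) / count(lhs, baskets)


def pair_rules(baskets, min_support=0.5, min_confidence=0.8):
    """Brute-force all one-item rules A -> B that pass both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # an infrequent pair cannot produce a rule above min_support
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, baskets)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf in pair_rules(BASKETS):
        print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

With these thresholds only `beer -> diapers` survives: beer and diapers appear together in 3 of 5 baskets (support 0.6), and every basket with beer also has diapers (confidence 1.0). The reverse rule `diapers -> beer` has confidence 0.75 and is filtered out.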

Organizations use data mining to turn raw data into useful information. By using software to find patterns in large data sets, they can learn more about their customers, which helps them develop more efficient business strategies, boost sales, and reduce costs.

Data mining offers many advantages when applied to a specific industry. It also has disadvantages, such as concerns about privacy, security, and the misuse of information. To make use of the mountains of data organizations collect, we need to extract useful information by digging through them and looking for sense among the bytes. This process is called data mining.

Data mining is a five-step process:

·      Identifying the source information

·      Picking the data points that need to be analyzed

·      Extracting the relevant information from the data

·      Identifying the key values from the extracted data set

·      Interpreting and reporting the results
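
The five steps can be sketched as a tiny Python pipeline. Everything here is a hypothetical assumption for illustration: the inline CSV stands in for a real source, and the column names (`region`, `amount`), the rule of skipping rows with missing values, and the per-region totals are just one possible choice at each step.

```python
import csv
import io
from collections import defaultdict

# Step 1: identify the source. This hypothetical inline CSV stands in for a
# real database table or file export.
RAW_CSV = """order_id,region,amount,notes
1,north,120.50,
2,south,80.00,gift
3,north,45.25,
4,east,,missing amount
5,south,200.00,
"""

# Step 2: pick the data points (columns) that need to be analyzed.
FIELDS = ("region", "amount")


def load_source():
    """Open a fresh file-like view of the source on each call."""
    return io.StringIO(RAW_CSV)


def extract(source, fields):
    """Step 3: keep only the chosen fields and skip rows with missing values."""
    for row in csv.DictReader(source):
        values = {f: (row[f] or "").strip() for f in fields}
        if all(values.values()):
            yield values


def key_values(records):
    """Step 4: identify the key values, here the total amount per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += float(record["amount"])
    return dict(totals)


def report(totals):
    """Step 5: interpret and report, ranking regions by total amount."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(f"{region}: {total:.2f}" for region, total in ranked)


def run_pipeline():
    return report(key_values(extract(load_source(), FIELDS)))


if __name__ == "__main__":
    print(run_pipeline())
```

The `east` order is dropped in step 3 because its amount is missing, so the report ranks `south` (280.00) ahead of `north` (165.75). Keeping each step as a separate function makes it easy to swap in a different source or a different notion of "key values" without touching the rest.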
