Data mining is the process of extracting available data and discovering new relationships and connections in it in order to acquire new information and knowledge. It can be used in all business sectors and relies on a variety of methods and IT solutions.
Can you give me some examples of data mining? Why do we need it?
Every company that uses multiple applications or enterprise software solutions has its data scattered across multiple databases and sources. Even when all data is stored in one place, it is often not cleaned (that is, not of good quality) and does not provide the quality information needed for decision-making. In such cases, companies have to obtain that information by extracting it from different sources. Data mining is most often used in sales and marketing, production, fraud detection, and the detection of potential risks.
In marketing and sales, data mining is typically used for market segmentation or to estimate customers' future purchasing behavior. This makes it possible, for instance, to target an advertising campaign. Customer behavior can be estimated, for example, by analyzing the previous purchases of similar customers.
* Identifying customers who are likely to buy
* Customer segmentation (dividing customers into segments)
* Shopping cart analysis
* Identifying consumer behavior
* Analyzing customers' purchasing behavior
* Tracking risks and how they develop
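Shopping cart analysis can be illustrated with a minimal sketch that counts how often pairs of items are bought together. It assumes hypothetical example transactions and reports, for each frequent pair, its support (the share of carts containing both items) and its confidence (how often the second item appears in carts that contain the first). The function name `pair_rules` and the `min_support` threshold are illustrative choices, not part of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical example transactions: each set is the content of one shopping cart.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
    {"milk", "cereal", "bread"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter(item for cart in transactions for item in cart)
    pair_counts = Counter(
        pair for cart in transactions for pair in combinations(sorted(cart), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # A frequent pair yields a rule in each direction: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for a, b, support, confidence in pair_rules(transactions):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "butter -> bread" with high confidence suggests that customers buying butter are good candidates for a bread promotion. Real market basket tools use algorithms such as Apriori or FP-Growth to handle larger item sets efficiently.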
In churn management, data mining is used to identify customers who are likely to leave for competitors, since such customers tend to show similar behavior patterns.
* Identifying customers who may be ready to switch to competitors
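Because customers who leave tend to behave similarly, one simple way to sketch churn detection is to compare each active customer with customers who have already left. The example below assumes hypothetical behavior features (logins per month, support tickets per month, months since last purchase) and flags customers whose cosine similarity to any past churner exceeds a threshold. The data, feature choice, and the 0.9 threshold are all assumptions for illustration; a production system would normally scale the features and train a proper classification model.

```python
import math

# Hypothetical behavior features per customer:
# (logins per month, support tickets per month, months since last purchase)
past_churners = [
    (2.0, 5.0, 6.0),
    (1.0, 4.0, 8.0),
]
active_customers = {
    "alice": (20.0, 0.0, 1.0),
    "bob": (3.0, 4.0, 7.0),
    "carol": (15.0, 1.0, 2.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def churn_risk(customers, churners, threshold=0.9):
    """Flag customers whose behavior closely resembles any past churner."""
    flagged = {}
    for name, features in customers.items():
        score = max(cosine(features, c) for c in churners)
        if score >= threshold:
            flagged[name] = round(score, 3)
    return flagged


print(churn_risk(active_customers, past_churners))
```

Here only "bob", who rarely logs in, opens many support tickets, and has not bought anything for months, is flagged as resembling past churners.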
In production, data mining is mainly used to improve quality because it makes it possible to detect trouble spots. For example, data analysis may reveal sources of errors or inefficiency in the production process.
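Detecting trouble spots can be sketched as comparing defect rates across production stations. The example assumes a hypothetical quality-check log of `(station, passed)` records and flags stations whose defect rate is more than `factor` times the overall defect rate; the station names, the data, and the factor of 1.5 are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical production log: (station, passed_quality_check)
log = [
    ("cutting", True), ("cutting", True), ("cutting", False), ("cutting", True),
    ("welding", False), ("welding", False), ("welding", True), ("welding", False),
    ("painting", True), ("painting", True), ("painting", True), ("painting", False),
]


def trouble_spots(log, factor=1.5):
    """Return stations whose defect rate exceeds `factor` times the overall rate."""
    totals = defaultdict(int)
    defects = defaultdict(int)
    for station, ok in log:
        totals[station] += 1
        if not ok:
            defects[station] += 1
    overall = sum(defects.values()) / len(log)
    return {
        station: defects[station] / totals[station]
        for station in totals
        if defects[station] / totals[station] > factor * overall
    }


print(trouble_spots(log))  # → {'welding': 0.75}
```

The overall defect rate here is 5/12, so the threshold is about 0.63 and only the welding station (3 defects out of 4) stands out as a likely source of errors.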
Data mining can also help in risk management and fraud detection, where data analysis helps to identify connections and predict future developments.
* Searching for links between products, people, bank accounts, movements of funds, and so on
* Searching for fraudulent behavior and potential fraudsters
* Estimating future credit risk
* Estimating the risk of late payments
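Searching for links between bank accounts and movements of funds can be sketched as finding groups of accounts that are connected, directly or indirectly, by transfers. The example below uses a union-find (disjoint-set) structure over hypothetical transfer records; the account IDs and the function name `linked_groups` are assumptions for illustration. Large connected groups are only a starting point for an investigation, not evidence of fraud on their own.

```python
# Hypothetical fund transfers between bank accounts: (from_account, to_account)
transfers = [
    ("A1", "A2"), ("A2", "A3"), ("B1", "B2"), ("A3", "A4"), ("C1", "C2"),
]


def linked_groups(edges):
    """Group accounts connected directly or indirectly by transfers (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in edges:
        union(a, b)

    groups = {}
    for account in parent:
        groups.setdefault(find(account), set()).add(account)
    # Largest groups first; ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


for group in linked_groups(transfers):
    print(group)
```

Union-find is used here because it groups linked records in near-linear time, which matters when there are millions of transactions; a graph database or graph library would be a common alternative.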
What is the data mining process?
It is a process of cleaning and analyzing data. Before it begins, however, its goals (what kind of information or material is expected to support decision-making) have to be clearly stated. All the phases listed below are then adapted to these goals.
- Identifying data sources defines where the business data is stored and how it will be used in the data mining process to achieve the goals
- Data preparation is an indispensable phase for further processing; it includes preparing the sources, extracting data from them, and preparing the data for the data mining tool
- Data cleaning is usually done in a specialized tool and aims to consolidate the data and prepare it for analytical processing
- Data analysis starts with choosing the method and model of analysis, continues with parameterizing the mathematical models, and ends with the analysis of the data itself
- Data presentation is the last stage, in which the analyzed data is prepared for its consumers so that they can make the right decisions
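The phases above can be sketched as a small pipeline of functions, one per phase. Everything here is a hypothetical assumption: two in-memory "sources" stand in for a CRM and a web shop, cleaning normalizes e-mail addresses, drops incomplete records, and removes duplicates, and the "analysis" is deliberately just a total spend per region. Real projects would use dedicated data preparation and analysis tools for each step.

```python
# Hypothetical customer records from two sources (e.g. a CRM and a web shop).
crm_rows = [
    {"email": "Ann@Example.com ", "region": "north", "spend": "120"},
    {"email": "bob@example.com", "region": "south", "spend": "80"},
]
shop_rows = [
    {"email": "ann@example.com", "region": "north", "spend": "120"},  # duplicate of Ann
    {"email": "cid@example.com", "region": None, "spend": "45"},      # missing region
    {"email": "dan@example.com", "region": "south", "spend": "200"},
]


def prepare(*sources):
    """Data preparation: extract rows from every source into one list."""
    return [dict(row) for source in sources for row in source]


def clean(rows):
    """Data cleaning: normalize fields, drop incomplete rows, remove duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        if not row.get("email") or not row.get("region"):
            continue  # incomplete record; dropped rather than imputed in this sketch
        email = row["email"].strip().lower()
        if email in seen:
            continue  # same customer already seen in another source
        seen.add(email)
        cleaned.append({"email": email, "region": row["region"], "spend": float(row["spend"])})
    return cleaned


def analyze(rows):
    """Data analysis: total spend per region (a deliberately simple model)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["spend"]
    return totals


def present(totals):
    """Data presentation: a sorted, human-readable summary."""
    for region, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{region}: {total:.2f}")


present(analyze(clean(prepare(crm_rows, shop_rows))))
```

Keeping each phase in its own function mirrors the process: the goal (here, spend per region) determines what the cleaning step must keep, and any phase can be swapped for a more capable tool without changing the others.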
What are the methods used in data mining?
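Data mining involves a wide range of methods and ways of working. Well-known process models include SEMMA (Sample, Explore, Modify, Model, Assess) and CRISP-DM (Cross-Industry Standard Process for Data Mining).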