Have you ever heard the term data mining? You may also have heard of data science. So what is data mining, what are its functions and methods, and what stages does it involve? This article covers each of these in more detail.
Table of Contents
- What is Data Mining?
- Data Mining Functions
- Data Mining Methods
- The problem in Data Mining
- Example of Application of Data Mining
What is Data Mining?
Data mining is the process of extracting important information from large sets of data. It often relies on statistical and mathematical methods and makes use of artificial intelligence technology.
Alternative names include knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, and business intelligence.
Many concepts and techniques are used in data mining, as the KDD process illustrates. The process takes several steps to arrive at the desired knowledge.
The KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation.
Data Mining Functions
Data mining has many functions. The two main ones are the descriptive function and the predictive function. Other functions are discussed below.
Descriptive
The descriptive function helps you understand the observed data. By analyzing the data, you can learn how it behaves and use that to determine its characteristics.
With the descriptive function, you can find patterns hidden in the data. A pattern counts as a characteristic of the data if it repeats and is of value.
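As a minimal sketch of the descriptive function, the snippet below summarizes a small list of numbers and reports the value that repeats most often. The function name `describe` and the sample order counts are assumptions made for illustration, not part of any particular data mining tool.

```python
from collections import Counter
from statistics import mean, median, pstdev

def describe(values):
    """Summarize a list of numbers: center, spread, and most frequent value."""
    most_common_value, frequency = Counter(values).most_common(1)[0]
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": pstdev(values),  # population standard deviation
        "most_common": most_common_value,
        "most_common_count": frequency,
    }

# Hypothetical daily order counts for one week
orders = [12, 15, 12, 20, 12, 18, 16]
summary = describe(orders)
for name, value in summary.items():
    print(name, value)
```

Here the repeating value (12 orders, seen on three days) is the kind of simple recurring pattern that a descriptive analysis brings to the surface.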
Predictive
The predictive function concerns how a process finds a particular pattern in the data. The variables in the data are what reveal these patterns.
Once a pattern has been found, it can be used to predict other variables whose value or type is unknown.
The function is called predictive, and is used for predictive analysis, because it can forecast variables that are not present in the data.
This makes it valuable to anyone who needs accurate predictions to support important decisions.
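One simple way to illustrate the predictive function is a least-squares line: a pattern is fitted to known values and then used to estimate an unknown one. This is only a sketch under assumptions. The function names `fit_line` and `predict` and the advertising and sales figures are hypothetical, and real projects usually need more careful modeling.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Use the fitted pattern to estimate y for a new x."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical data: advertising spend and resulting sales
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

model = fit_line(ad_spend, sales)
forecast = predict(model, 6)  # estimate sales for a spend value not in the data
print(model, forecast)
```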
Other functions include characterization, discrimination, association, classification, clustering, and outlier and trend analysis:
- Multidimensional concept description: characterization and discrimination, which generalize, summarize, and contrast data characteristics.
- Frequent patterns, associations, and correlations.
- Classification and prediction: building models (functions) that describe and distinguish classes or concepts for future prediction. For example, classifying countries by climate or cars by gas mileage.
- Cluster analysis: grouping data to form new classes, for example by maximizing intra-class similarity and minimizing inter-class similarity.
- Outlier analysis: finding data objects that do not fit the general behavior of the data. This is useful in fraud detection and rare event analysis.
- Trend and evolution analysis: trend and deviation (e.g. regression analysis), sequential pattern mining (e.g. digital camera), periodicity analysis, and similarity-based analysis.
- Other pattern-directed or statistical analyses
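Cluster analysis, as listed above, can be sketched with a very small one-dimensional k-means loop: each value joins its nearest center, and each center then moves to the mean of its group. The function name `kmeans_1d`, the starting centers, and the sample values are assumptions chosen to keep the example short. k-means is only one of many clustering methods.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers using k-means."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the closest center
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical values with two visible groups
values = [1, 2, 3, 10, 11, 12]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

Values inside each resulting group are close to one another and far from the other group, which is the intra-class and inter-class similarity goal described in the list.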
Data Mining Methods
Data mining relies on a set of methods that help in the process of finding information. These methods cover the work from the initial idea through to the final implementation.
Data Collection Process
How does the data collection process work? As explained earlier, KDD stands for knowledge discovery (mining) in databases. The KDD process is what you follow to extract knowledge from data.
The stages start from raw data and end with processed knowledge or information. They are as follows:
- Data cleaning: incomplete, erroneous, and inconsistent data is removed from the data collection. For more on data processing, see also data lifecycle management.
- Data integration: data from multiple sources is combined.
- Data selection: data relevant to the analysis is retrieved from the existing data collections.
- Data transformation: the selected data is transformed into a form suitable for mining, through methods such as data aggregation.
- Data mining: the most important stage, in which various techniques are applied to extract potentially useful patterns.
- Pattern evaluation: the interesting patterns that have been found are identified based on a given measure.
- Knowledge presentation: the final stage, in which visualization techniques help the user understand and interpret the results of data mining.
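The stages above can be sketched as a chain of small functions, one per stage. This is an illustrative outline only: the function names, the record layout, and the two store lists are hypothetical, and each stage is far simpler here than it would be in practice.

```python
from collections import Counter

def clean(records):
    """Data cleaning: drop records that have a missing (None) field."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine records from several sources into one list."""
    return [record for source in sources for record in source]

def select(records, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: normalize city names so equal cities compare equal."""
    return [{**r, "city": r["city"].strip().lower()} for r in records]

def mine(records):
    """Data mining: count how often each city occurs (a very simple pattern)."""
    return Counter(r["city"] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet a given measure."""
    return {city: n for city, n in patterns.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: format the surviving patterns for a reader."""
    return [f"{city}: {n} purchases" for city, n in sorted(patterns.items())]

# Hypothetical raw purchase records from two stores
store_a = [
    {"id": 1, "city": "Jakarta ", "amount": 10},
    {"id": 2, "city": None, "amount": 5},  # incomplete record
]
store_b = [
    {"id": 3, "city": "jakarta", "amount": 7},
    {"id": 4, "city": "Bandung", "amount": 3},
]

combined = integrate(clean(store_a), clean(store_b))
prepared = transform(select(combined, ["city"]))
report = present(evaluate(mine(prepared), min_count=2))
print(report)
```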
Techniques In The Data Mining Process
What techniques can be used in the data mining process?
- Predictive modeling: there are two techniques, classification and value prediction.
- Database segmentation: partitioning the database into a number of segments, or clusters, of similar records.
- Link analysis: a technique for establishing relationships between individual records or groups of records in a database.
- Deviation detection: a technique for identifying outliers, which express a deviation from previously known expectations.
- Nearest neighbor: a technique that predicts the group a record belongs to from the records most similar to it. It is one of the oldest techniques used in data mining.
- Clustering: a technique for grouping data based on the characteristics of each record.
- Decision tree: a next-generation technique in which the predictive model can be depicted as a tree. Each node in the tree represents a question that is used to classify the data.
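To make the nearest neighbor technique from the list above more concrete, here is a minimal k-nearest-neighbors sketch: a new point takes the label that is most common among its k closest labeled points. The function name `knn_predict`, the labels, and the sample points are assumptions for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(training, point, k=3):
    """Predict a label by majority vote among the k nearest training points.

    training is a list of (features, label) pairs.
    """
    neighbors = sorted(training, key=lambda pair: dist(pair[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical customers described by two numeric features, labeled by spending level
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 8.5), "high"),
    ((8.5, 9.5), "high"),
]

print(knn_predict(training, (2.0, 2.0)))
print(knn_predict(training, (8.0, 9.0)))
```

The choice of k and of the distance measure (Euclidean here) strongly affects the result, so both are usually tuned to the data at hand.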
The problem in Data Mining
Collecting information and performing data mining that will prove useful in the future is not easy. Many problems can arise along the way.
One problem concerns the reliability of the hardware or VPS server used to run the data mining. Choosing the server is an important consideration because it affects the speed of data processing.
Mining Methodology
- Mining different kinds of knowledge from different types of data
- Performance: efficiency, effectiveness, and scalability
- Pattern evaluation: the interestingness problem
- Incorporating background knowledge
- Handling noise and incomplete data
- Parallel, distributed, and incremental mining methods
- Integration of discovered knowledge with existing knowledge: knowledge fusion
User interactions
- Data mining query language and ad-hoc mining
- Expression and visualization of data mining results
- Interactive mining of knowledge at multiple levels of abstraction
Applications and social impacts
- Domain-specific data mining & invisible data mining
- Protection of data security, integrity, and privacy
Example of Application of Data Mining
Various sectors, such as business, management, and finance, can use data mining. The following are examples of its application in several sectors:
1. Market Analysis and Management
The marketing sector typically uses data mining for target marketing, customer relationship management (CRM), market analysis, cross-selling, and market segmentation.
- Target marketing: for example, finding “model” customer groups that share the same characteristics (interests, income levels, shopping habits, etc.), or determining customer buying patterns over time.
- Market traffic analysis: finding relationships between product sales and making predictions based on these associations.
- Customer profiling: what types of customers buy what products (clustering or classification).
- Analysis of customer needs: for example, identifying the best products for different customer groups, predicting what factors will attract new customers, and providing summary information such as multidimensional summary reports and statistical summaries (central tendency and variation of the data).
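Relationships between product sales, as mentioned in the list above, are commonly measured with support and confidence. The sketch below computes both for a handful of shopping baskets. The function names and the basket contents are hypothetical, and this is not a full association rule mining algorithm.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print(support(baskets, {"bread", "butter"}))
print(confidence(baskets, {"bread"}, {"butter"}))
```

A rule such as "customers who buy bread also buy butter" is usually kept only when both its support and its confidence exceed thresholds chosen by the analyst.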
2. Corporate Analysis & Risk Management
Companies usually use data mining for prediction, customer retention, improved underwriting, quality control, and competitive analysis.
- Financial planning and asset evaluation: for example, cash flow analysis and prediction, contingent claims analysis to evaluate assets, and cross-sectional and time series analysis (financial ratios, trend analysis, etc.).
- Resource planning: for example, summarizing and comparing resources and expenditures.
- Competition: for example, monitoring competitors and market direction, grouping customers into classes with class-based pricing procedures, and setting pricing strategies in highly competitive markets.
3. Fraud Detection & Mining Unusual Patterns
Data mining is also used to find and detect fraud in a system. With data mining, you can examine millions of incoming transactions.
- Approach: Clustering & model construction for fraud, outlier analysis
- Applications: health care, retail, credit card services, and telecom. Examples include auto insurance, money laundering, health insurance, telecommunications, the retail industry, and the analysis of patterns that deviate from the expected norm.
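The outlier analysis approach mentioned above can be sketched with a simple z-score check: a transaction is flagged when it lies far from the mean relative to the standard deviation. The function name `find_outliers`, the threshold of 2.0, and the transaction amounts are assumptions for illustration. Real fraud detection combines many signals and is considerably more involved.

```python
from statistics import mean, pstdev

def find_outliers(amounts, threshold=2.0):
    """Return amounts whose z-score (distance from the mean in standard deviations) exceeds threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical transaction amounts with one unusual value
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
print(find_outliers(amounts))
```

A flagged value is only a candidate for review, not proof of fraud, since legitimate transactions can also be unusual.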
That covers the basics of data mining. You can study data mining further to gather and extract information that will be useful in the future.