Part One Introduction
1 Introduction
1.1 Basic Data Mining Tasks
1.1.1 Classification
1.1.2 Regression
1.1.3 Time Series Analysis
1.1.4 Prediction
1.1.5 Clustering
1.1.6 Summarization
1.1.7 Association Rules
1.1.8 Sequence Discovery
1.2 Data Mining Versus Knowledge Discovery in Databases
1.2.1 The Development of Data Mining
1.3 Data Mining Issues
1.4 Data Mining Metrics
1.5 Social Implications of Data Mining
1.6 Data Mining from a Database Perspective
1.7 The Future
1.8 Exercises
1.9 Bibliographic Notes
2 Related Concepts
2.1 Database/OLTP Systems
2.2 Fuzzy Sets and Fuzzy Logic
2.3 Information Retrieval
2.4 Decision Support Systems
2.5 Dimensional Modeling
2.5.1 Multidimensional Schemas
2.5.2 Indexing
2.6 Data Warehousing
2.7 OLAP
2.8 Web Search Engines
2.9 Statistics
2.10 Machine Learning
2.11 Pattern Matching
2.12 Summary
2.13 Exercises
2.14 Bibliographic Notes
3 Data Mining Techniques
3.1 Introduction
3.2 A Statistical Perspective on Data Mining
3.2.1 Point Estimation
3.2.2 Models Based on Summarization
3.2.3 Bayes Theorem
3.2.4 Hypothesis Testing
3.2.5 Regression and Correlation
3.3 Similarity Measures
3.4 Decision Trees
3.5 Neural Networks
3.5.1 Activation Functions
3.6 Genetic Algorithms
3.7 Exercises
3.8 Bibliographic Notes
Part Two Core Topics
4 Classification
4.1 Introduction
4.1.1 Issues in Classification
4.2 Statistical-Based Algorithms
4.2.1 Regression
4.2.2 Bayesian Classification
4.3 Distance-Based Algorithms
4.3.1 Simple Approach
4.3.2 K Nearest Neighbors
4.4 Decision Tree-Based Algorithms
4.4.1 ID3
4.4.2 C4.5 and C5.0
4.4.3 CART
4.4.4 Scalable DT Techniques
4.5 Neural Network-Based Algorithms
4.5.1 Propagation
4.5.2 NN Supervised Learning
4.5.3 Radial Basis Function Networks
4.5.4 Perceptrons
4.6 Rule-Based Algorithms
4.6.1 Generating Rules from a DT
4.6.2 Generating Rules from a Neural Net
4.6.3 Generating Rules Without a DT or NN
4.7 Combining Techniques
4.8 Summary
4.9 Exercises
4.10 Bibliographic Notes
5 Clustering
5.1 Introduction
5.2 Similarity and Distance Measures
5.3 Outliers
5.4 Hierarchical Algorithms
5.4.1 Agglomerative Algorithms
5.4.2 Divisive Clustering
5.5 Partitional Algorithms
5.5.1 Minimum Spanning Tree
5.5.2 Squared Error Clustering Algorithm
5.5.3 K-Means Clustering
5.5.4 Nearest Neighbor Algorithm
5.5.5 PAM Algorithm
5.5.6 Bond Energy Algorithm
5.5.7 Clustering with Genetic Algorithms
5.5.8 Clustering with Neural Networks
5.6 Clustering Large Databases
5.6.1 BIRCH
5.6.2 DBSCAN
5.6.3 CURE Algorithm
5.7 Clustering with Categorical Attributes
5.8 Comparison
5.9 Exercises
5.10 Bibliographic Notes
6 Association Rules
6.1 Introduction
6.2 Large Itemsets
6.3 Basic Algorithms
6.3.1 Apriori Algorithm
6.3.2 Sampling Algorithm
6.3.3 Partitioning
6.4 Parallel and Distributed Algorithms
6.4.1 Data Parallelism
6.4.2 Task Parallelism
6.5 Comparing Approaches
6.6 Incremental Rules
6.7 Advanced Association Rule Techniques
6.7.1 Generalized Association Rules
6.7.2 Multiple-Level Association Rules
6.7.3 Quantitative Association Rules
6.7.4 Using Multiple Minimum Supports
6.7.5 Correlation Rules
6.8 Measuring the Quality of Rules
6.9 Exercises
6.10 Bibliographic Notes
Part Three Advanced Topics
7 Web Mining
7.1 Introduction
7.2 Web Content Mining
7.2.1 Crawlers
7.2.2 Harvest System
7.2.3 Virtual Web View
7.2.4 Personalization
7.3 Web Structure Mining
7.3.1 PageRank
7.3.2 Clever
7.4 Web Usage Mining
7.4.1 Preprocessing
7.4.2 Data Structures
7.4.3 Pattern Discovery
7.4.4 Pattern Analysis
7.5 Exercises
7.6 Bibliographic Notes
8 Spatial Mining
8.1 Introduction
8.2 Spatial Data Overview
8.2.1 Spatial Queries
8.2.2 Spatial Data Structures
8.2.3 Thematic Maps
8.2.4 Image Databases
8.3 Spatial Data Mining Primitives
8.4 Generalization and Specialization
8.4.1 Progressive Refinement
8.4.2 Generalization
8.4.3 Nearest Neighbor
8.4.4 STING
8.5 Spatial Rules
8.5.1 Spatial Association Rules
8.6 Spatial Classification Algorithm
8.6.1 ID3 Extension
8.6.2 Spatial Decision Tree
8.7 Spatial Clustering Algorithms
8.7.1 CLARANS Extensions
8.7.2 SD(CLARANS)
8.7.3 DBCLASD
8.7.4 BANG
8.7.5 WaveCluster
8.7.6 Approximation
8.8 Exercises
8.9 Bibliographic Notes
9 Temporal Mining
9.1 Introduction
9.2 Modeling Temporal Events
9.3 Time Series
9.3.1 Time Series Analysis
9.3.2 Trend Analysis
9.3.3 Transformation
9.3.4 Similarity
9.3.5 Prediction
9.4 Pattern Detection
9.4.1 String Matching
9.5 Sequences
9.5.1 AprioriAll
9.5.2 SPADE
9.5.3 Generalization
9.5.4 Feature Extraction
9.6 Temporal Association Rules
9.6.1 Intertransaction Rules
9.6.2 Episode Rules
9.6.3 Trend Dependencies
9.6.4 Sequence Association Rules
9.6.5 Calendric Association Rules
9.7 Exercises
9.8 Bibliographic Notes
APPENDICES
A Data Mining Products
A.1 Bibliographic Notes
B Bibliography
Index
About the Author