preface
chapter 1 introduction
1.1 is pattern recognition important?
1.2 features, feature vectors, and classifiers
1.3 supervised versus unsupervised pattern
recognition
1.4 outline of the book
chapter 2 classifiers based on bayes decision theory
2.1 introduction
2.2 bayes decision theory
2.3 discriminant functions and decision surfaces
2.4 bayesian classification for normal distributions
2.5 estimation of unknown probability density
functions
2.5.1 maximum likelihood parameter estimation
2.5.2 maximum a posteriori probability
estimation
2.5.3 bayesian inference
2.5.4 maximum entropy estimation
2.5.5 mixture models
2.5.6 nonparametric estimation
2.6 the nearest neighbor rule
chapter 3 linear classifiers
3.1 introduction
3.2 linear discriminant functions and decision
hyperplanes
3.3 the perceptron algorithm
3.4 least squares methods
3.4.1 mean square error estimation
3.4.2 stochastic approximation and the lms
algorithm
3.4.3 sum of error squares estimation
3.5 mean square estimation revisited
3.5.1 mean square error regression
3.5.2 mse estimates posterior class probabilities
3.5.3 the bias-variance dilemma
3.6 support vector machines
3.6.1 separable classes
3.6.2 nonseparable classes
chapter 4 nonlinear classifiers
4.1 introduction
4.2 the xor problem
4.3 the two-layer perceptron
4.3.1 classification capabilities of the two-layer
perceptron
4.4 three-layer perceptrons
4.5 algorithms based on exact classification of the
training set
4.6 the backpropagation algorithm
4.7 variations on the backpropagation theme
4.8 the cost function choice
4.9 choice of the network size
4.10 a simulation example
4.11 networks with weight sharing
4.12 generalized linear classifiers
4.13 capacity of the l-dimensional space in linear
dichotomies
4.14 polynomial classifiers
4.15 radial basis function networks
4.16 universal approximators
4.17 support vector machines: the nonlinear case
4.18 decision trees
4.18.1 set of questions
4.18.2 splitting criterion
4.18.3 stop-splitting rule
4.18.4 class assignment rule
4.19 discussion
chapter 5 feature selection
5.1 introduction
5.2 preprocessing
5.2.1 outlier removal
5.2.2 data normalization
5.2.3 missing data
5.3 feature selection based on statistical hypothesis
testing
5.3.1 hypothesis testing basics
5.3.2 application of the t-test in feature
selection
5.4 the receiver operating characteristics (roc) curve
5.5 class separability measures
5.5.1 divergence
5.5.2 chernoff bound and
bhattacharyya distance
5.5.3 scatter matrices
5.6 feature subset selection
5.6.1 scalar feature selection
5.6.2 feature vector selection
5.7 optimal feature generation
5.8 neural networks and feature generation/selection
5.9 a hint on the vapnik-chervonenkis learning
theory
chapter 6 feature generation i: linear transforms
6.1 introduction
6.2 basis vectors and images
6.3 the karhunen-loève transform
6.4 the singular value decomposition
6.5 independent component analysis
6.5.1 ica based on second- and fourth-order
cumulants
6.5.2 ica based on mutual information
6.5.3 an ica simulation example
6.6 the discrete fourier transform (dft)
6.6.1 one-dimensional dft
6.6.2 two-dimensional dft
6.7 the discrete cosine and sine transforms
6.8 the hadamard transform
6.9 the haar transform
6.10 the haar expansion revisited
6.11 discrete time wavelet transform (dtwt)
6.12 the multiresolution interpretation
6.13 wavelet packets
6.14 a look at two-dimensional generalizations
6.15 applications
chapter 7 feature generation ii
7.1 introduction
7.2 regional features
7.2.1 features for texture characterization
7.2.2 local linear transforms for texture
feature extraction
7.2.3 moments
7.2.4 parametric models
7.3 features for shape and size characterization
7.3.1 fourier features
7.3.2 chain codes
7.3.3 moment-based features
7.3.4 geometric features
7.4 a glimpse at fractals
7.4.1 self-similarity and fractal dimension
7.4.2 fractional brownian motion
chapter 8 template matching
8.1 introduction
8.2 measures based on optimal path searching
techniques
8.2.1 bellman's optimality principle and
dynamic programming
8.2.2 the edit distance
8.2.3 dynamic time warping in speech
recognition
8.3 measures based on correlations
8.4 deformable template models
chapter 9 context-dependent classification
9.1 introduction
9.2 the bayes classifier
9.3 markov chain models
9.4 the viterbi algorithm
9.5 channel equalization
9.6 hidden markov models
9.7 training markov models via neural networks
9.8 a discussion of markov random fields
chapter 10 system evaluation
10.1 introduction
10.2 error counting approach
10.3 exploiting the finite size of the data set
10.4 a case study from medical imaging
chapter 11 clustering: basic concepts
11.1 introduction
11.1.1 applications of cluster analysis
11.1.2 types of features
11.1.3 definitions of clustering
11.2 proximity measures
11.2.1 definitions
11.2.2 proximity measures between two points
11.2.3 proximity functions between a point and
a set
11.2.4 proximity functions between two sets
chapter 12 clustering algorithms i: sequential
algorithms
12.1 introduction
12.1.1 number of possible clusterings
12.2 categories of clustering algorithms
12.3 sequential clustering algorithms
12.3.1 estimation of the number of clusters
12.4 a modification of bsas
12.5 a two-threshold sequential scheme
12.6 refinement stages
12.7 neural network implementation
12.7.1 description of the architecture
12.7.2 implementation of the bsas algorithm
chapter 13 clustering algorithms ii: hierarchical
algorithms
13.1 introduction
13.2 agglomerative algorithms
13.2.1 definition of some useful quantities
13.2.2 agglomerative algorithms based on
matrix theory
13.2.3 monotonicity and crossover
13.2.4 implementational issues
13.2.5 agglomerative algorithms based on
graph theory
13.2.6 ties in the proximity matrix
13.3 the cophenetic matrix
13.4 divisive algorithms
13.5 choice of the best number of clusters
chapter 14 clustering algorithms iii:
schemes based on function optimization
14.1 introduction
14.2 mixture decomposition schemes
14.2.1 compact and hyperellipsoidal clusters
14.2.2 a geometrical interpretation
14.3 fuzzy clustering algorithms
14.3.1 point representatives
14.3.2 quadric surfaces as representatives
14.3.3 hyperplane representatives
14.3.4 combining quadric and hyperplane
representatives
14.3.5 a geometrical interpretation
14.3.6 convergence aspects of the fuzzy
clustering algorithms
14.3.7 alternating cluster estimation
14.4 possibilistic clustering
14.4.1 the mode-seeking property
14.4.2 an alternative possibilistic scheme
14.5 hard clustering algorithms
14.5.1 the isodata or k-means or c-means
algorithm
14.6 vector quantization
chapter 15 clustering algorithms iv
15.1 introduction
15.2 clustering algorithms based on graph theory
15.2.1 minimum spanning tree algorithms
15.2.2 algorithms based on regions of influence
15.2.3 algorithms based on directed trees
15.3 competitive learning algorithms
15.3.1 basic competitive learning algorithm
15.3.2 leaky learning algorithm
15.3.3 conscientious competitive learning
algorithms
15.3.4 competitive learning-like algorithms
associated with cost functions
15.3.5 self-organizing maps
15.3.6 supervised learning vector quantization
15.4 branch and bound clustering algorithms
15.5 binary morphology clustering algorithms (bmcas)
15.5.1 discretization
15.5.2 morphological operations
15.5.3 determination of the clusters in a discrete
binary set
15.5.4 assignment of feature vectors to clusters
15.5.5 the algorithmic scheme
15.6 boundary detection algorithms
15.7 valley-seeking clustering algorithms
15.8 clustering via cost optimization (revisited)
15.8.1 simulated annealing
15.8.2 deterministic annealing
15.9 clustering using genetic algorithms
15.10 other clustering algorithms
chapter 16 cluster validity
16.1 introduction
16.2 hypothesis testing revisited
16.3 hypothesis testing in cluster validity
16.3.1 external criteria
16.3.2 internal criteria
16.4 relative criteria
16.4.1 hard clustering
16.4.2 fuzzy clustering
16.5 validity of individual clusters
16.5.1 external criteria
16.5.2 internal criteria
16.6 clustering tendency
16.6.1 tests for spatial randomness
appendix a
hints from probability and statistics
appendix b
linear algebra basics
appendix c
cost function optimization
appendix d
basic definitions from linear systems theory
index