Preface
Acknowledgments
1 Introduction
1.1 Well-Posed Learning Problems
1.2 Designing a Learning System
1.2.1 Choosing the Training Experience
1.2.2 Choosing the Target Function
1.2.3 Choosing a Representation for the Target Function
1.2.4 Choosing a Function Approximation Algorithm
1.2.5 The Final Design
1.3 Perspectives and Issues in Machine Learning
1.3.1 Issues in Machine Learning
1.4 How to Read This Book
1.5 Summary and Further Reading
Exercises
References
2 Concept Learning and the General-to-Specific Ordering
2.1 Introduction
2.2 A Concept Learning Task
2.2.1 Notation
2.2.2 The Inductive Learning Hypothesis
2.3 Concept Learning as Search
2.3.1 General-to-Specific Ordering of Hypotheses
2.4 FIND-S: Finding a Maximally Specific Hypothesis
2.5 Version Spaces and the CANDIDATE-ELIMINATION
Algorithm
2.5.1 Representation
2.5.2 The LIST-THEN-ELIMINATE Algorithm
2.5.3 A More Compact Representation for Version Spaces
2.5.4 CANDIDATE-ELIMINATION Lsarning Algorithm
2.5.5 AR Illustrative Example
2.6 Remarks on Version Spaces and CANOIDATE-ELIMINATION
2.6.1 Will the CANDIDATE-ELIMINATION Algorithm
Converge to the Correct Hypothesis?
2.6.2 What Training Example Should the Leamer Request
Next?
2.6.3 How Can Panially Leamed Concepts Be Used?
2.7 Inductive Bias
2.7.1 A Biased Hypothesis Space
2.7.2 An Unbiased Learner
2.7.3 The Futility of Bias-Free Learning
2.8 Summary and Further Reading
Excercises
References
3 Decision Tree Learning
3.1 Introduction
3.2 Decision Tree Representation
3.3 Appropriate problems for Decision Tree Learning
3.4 The Basic Decision Tree Leaming Algorithm
3.4.1 Which Attribute Is the Best Classifier?
3.4.2 An Illustrative Example
3.5 Hypothesis Space Search in Decision Tree Leaming
3.6 Inductive Bias in Decision Tree Learning
3.6.1 Restriction Biases and Preference Biases
3.6.2 Why Prefer Shon Hypotheses?
3.7 Issues in Decision Tree Lsarning
3.7.1 Avoiding Overfitting the Data
3.7.2 Incorporating Continuous-Valued Attributes
3.7.3 Alternative Measures for Selecting Attributes
3.7.4 Handling Training Examples witll Missing Attribute
Values
3.7.5 Handling Attributes with Differing Costs
3.8 Summary and Further Reading
Exercises
References
4 Artificial Neural Networks
4.1 Introduction
4.1.1 Biological Motivation
4.2 Neural Network Representations
4.3 Appropriate Ptoblems for Neural Network Learning
4.4 Perceptrons
4.4.1 Represenational Power of Perceptrons
4.4.2 The Perceptron Training Rule
4.4.3 Gradient Descent and the Delta Rule
4.4.4 Remarks
4.5 Multilayer Networks and the BACKPROPAOATION Algorithm
4.5.1 A Differentiable Threshold Unit
4.5.2 The BACKPROPAGAUON Algorithm
4.5.3 Derivation of the BACKPROPAGATION Rule
4.6 Remarks on the BACKPROPAGATION Algorithm
4.6.1 Convergence and Local Minima
4.6.2 Representational Power of Feedforward Networks
4.6.3 Hypothesis Space Search and Inductive Bias
4.6.4 Hidden Layer Representations
4.6.5 Generalization, Overfitting, and Stopping Criterion
4.7 An Illusuative Example: Face Recognition
4.7.1 The Task
4.7.2 Design Choices
4.7.3 Lsarned Hidden Representations
4.8 Advanced Topics in Artificial Neural Networks
4.8.1 Altemative Error Functions
4.8.2 Altemative Error Minimization Procedures
4.8.3 Recument Networks
4.8.4 Dynamically Modifying Network Structure
4.9 Summary and Further Reading
Exercises
References
5 Evaluating Hypotheses
5.1 Motivation
5.2 Estimating Hypothesis Accuracy
5.2.1 Sample Error and True Error
5.2.2 Confidence Intervals for Discrete-Valued Hypotheses
5.3 Basics of Sampling Theory
5.3.1 Error Estimation and Estimating Binomial Proportions
5.3.2 The Binomial Distribution
5.3.3 Mean and Variance
5.3.4 Estimators, Bias, altd Variance
5.3.5 Confidence Intervals
5.3.6 Two-Sided and One-Sided Bounds
5.4 A General Approach for Deriving Confidence Intervals
5.4.1 Central Limit Theorem
5.5 Difference in Error of Two Hypotheses
5.5.1 Hypothesis Testing
5.6 Comparing Learning Algorithms
5.6.1 Paired t Tests
5.6.2 Practical Considerations
5.7 Summary and Further Reading
Exercises
References
6 Bayesian Learning
6.1 Introduction
6.2 Bayes Theorem
6.2.1 An Example
6.3 Bayes Theorem and Concept Learning
6.3.1 Brute-Force Bayes Concept Learning
6.3.2 MAP Hypotheses and Consistent Lsarners
6.4 Maximum Likelihood and Least-Squared Error Hypotheses
6.5 Maximum Likelihood Hypotheses for Predicting Probabilities
6.5.1 Gradient Search to Maximize Likelihood in a Neural Net
6.6 Minimum Description Length Principle
6.7 Bayes Optimal Classifier
6.8 Gibbs Algorithm
6.9 Naive Bayes Classifier
6.9.1 An Illustrative Example
6.10 An Example: Learning to Classify Text
6.10.1 Experimental Results
6.11 Bayesian Belief Networks
6.11.1 Conditional Independence
6.11.2 Representation
6.11.3 Inference
6.11.4 Leaming Bayesian Belief Networks
6.11.5 Gradient Ascent Training of Bayesian Networks
6.11.6 Leanling the Suucture of Bayesian Networks
6.12 The EM Algorithm
6.12.1 Estimating Means of k Gaussians
6.12.2 Oeneral Statement of EM Algorithm
6.12.3 Derivation of the k Means Algorithm
6.13 Summary and Further Reading
Exercises
References
7 Computational Leaming Theory
7.1 Introduction
7.2 Probably Learning an Approximately Correct Hypothesis
7.2.1 The Problem Setting
7.2.2 Error of a Hypothesis
7.2.3 PAC Leamability
7.3 Sample Complexity for Finite Hypothesis Spaces
7.3.1 Agnostic Leaming and Inconsistent Hypotheses
7.3.2 Conjunctions of Boolean Literals Are PAC-Learnable
7.3.3 PAC-Learnability of Other Concept Classes
7.4 Sample Complexity for Infinite Hypothesis Spaces
7.4.1 Shattering a Set of Instances
7.4.2 The Vapnik-Chervonenkis Dimension
7.4.3 Sample Complexity and the VC Dimension
7.4.4 VC Dimension for Neural Networks
7.5 The Mistake Bound Model of Learning
7.5.1 Mistake Bound for the FIND-S Algorithm
7.5.2 Mistake Bound for the HALVING Algorithm
7.5.3 Optimal Mistake Bounds
7.5.4 WEIGHTED-MAJORITY Algorithm
7.6 Summary and Further Reading
Exercises
References
8 Instance-Based Learning
8.1 Introduction
8.2 k-NEAREST NEIGHBOR LEARNING
8.2.1 Distance-Weighted NEAREST NEIGHBOR Algorithm
8.2.2 Remarks on k-NEAREST NEIGHBOR Algorithm
8.2.3 A Note on Terminology
8.3 Locally Weighted Regression
8.3.1 Locally Weighted Linear Regression
8.3.2 Remarks on Locally Weighted Regression
8.4 Radial Basis Functions
8.5 Case-Based Reasoning
8.6 Remarks on Lazy and Eager Learning
8.7 Summary an
Exercises
References
9 Genetic Algorithms
9.1 Modvadon
9.2 Genetic Algorithms
9.2.1 Representing Hypotheses
9.2.2 Genetic Operators
9.2.3 Fialess Function and Selection
9.3 An Illusuative Example
9.3.1 Extensions
9.4 Hypothesis Space Search
9.4.1 Population Evolution and the Schema Theorem
9.5 Oenetic Programming
9.5.1 Representing Programs
9.5.2 Illustrative Example
9.5.3 Remarks on Genetic Programming
9.6 Models of Evolution and Learning
9.6.1 Lamarckian Evolution
9.6.2 Baldwin Effect
9.7 Parallelizing Genetic Algorithms
9.8 Summary and Furaler Reading
Exercises
References
10 Learning Sets of Rules
10.1 Introduction
10.2 Sequential Covering Algorithms
10.2.1 General to Specific Beam Search
10.2.2 Variations
10.3 Learning Rule Sets: Summary
10.4 Learning First-Order Rules
10.4.1 First-Order Horn Clauses
10.4.2 Terminology
10.5 Learning Sets of First-Order Rules: FOIL
10.5.1 Generating Candidate Specializations in FOIL
10.5.2 Guiding the Search in FOIL
10.5.3 Learning Recursive Rule Sets
10.5.4 Summary of FOIL
10.e Induction as Invened Deduction
10.7 Inverting Resolution
10.7.1 First-Order Resolution
10.7.2 Inverting Resolution: First-Order Case
10.7.3 Summary of Inverse Resolution
10.7.4 Generalization, 0-Subsumption, and Entailment
10.7.5 PROGOL
10.8 Summary and Further Reading
Exercises
References
11 Analytical Leaming
11.1 Introduction
11.1.1 Inductive and Analytical Leaming Problems
11.2 Learning with Perfect Domain Theories: PROLOG-EBG
11.2.1 An Illustrative Trace
11.3 Remarks on Explanation-Based Learning
11.3.1 Discovering New Features
11.3.2 Deductive Learning
11.3.3 Inductive Bias in Explanation-Based Learning
11.3.4 Knowledge Level Learning
11.4 Explanation-Based Learning of Search Control Knowledge
11.5 Summary and Further Reading
Exercises
References
12 Combining Inductive and Analytical Learning
12.1 Motivation
12.2 Inductive-Analytical Approaches to Learning
12.2.1 The Learning Problem
12.2.2 Hypothesis Space Search
12.3 Using Prior Knowledge to lnitialize the Hypothesis
12.3.1 The KBANN Algorithm
12.3.2 An Illustrative Example
12.3.3 Remarks
12.4 Using prior Knowledge to Alter the Search Objective
12.4.1 The TANGENTPROP Algorithm
12.4.2 An Illustrative Example
12.4.3 Remarks
12.4.4 The EBNN Algorithm
12.4.5 Remarks
12.5 Using prior Knowledge to Augment Search
Operators
12.5.1 The FOCL Algorithm
12.5.2 Remarks
12.6 State of the Art
12.7 Summary and Further Reading
Exercises
References
13 Reinforcement Learning
13.1 Introduction
13.2 The Learning Task
13.3 Q Learning
13.3.1 The Q Function
13.3.2 An Algorithm for Learning Q
13.3.3 An Illustrative Example
13.3.4 Convergence
13.3.5 Experimentation Strategies
13.3.6 Updating Sequence
13.4 Nondeterministic Rewards and Actions
13.5 Temporal Difference Learning
13.6 Oeneralizing from Examples
13.7 Relationship to Dynamic Pro
13.8 Summary and Further Reading
Exercises
References
Appendix Notation
Indexes
Author Index
Subject Index