Foreword
Preface
Part 1. Foundation
Chapter 1. Introduction to Data Warehousing
1.1 Why All the Excitement?
1.2 The Need for Data Warehousing
1.3 Paradigm Shift
1.3.1 Computing Paradigm
1.3.2 Business Paradigm
1.4 Business Problem Definition
1.5 Operational and Informational Data Stores*
1.6 Data Warehouse Definition and Characteristics
1.7 Data Warehouse Architecture
1.8 Chapter Summary
Chapter 2. Client/Server Computing Model and Data Warehousing
2.1 Overview of Client/Server Architecture
2.1.1 Host-Based Processing
2.1.2 Master-Slave Processing
2.1.3 First-Generation Client/Server Processing
2.1.4 Second-Generation Client/Server Processing
2.2 Server Specialization in Client/Server Computing Environments
2.3 Server Functions
2.4 Server Hardware Architecture
2.5 System Considerations
2.6 RISC versus CISC
2.7 Multiprocessor Systems
2.7.1 SMP Design
2.7.2 SMP Features
2.7.3 SMP Operating Systems
2.8 SMP Implementations
Chapter 3. Parallel Processors and Cluster Systems
3.1 Distributed-Memory Architecture
3.1.1 Shared-Nothing Architectures
3.1.2 Shared-Disk Systems
3.2 Research Issues
3.3 Cluster Systems
3.4 Advances in Multiprocessing Architectures
3.5 Optimal Hardware Architecture for Query Scalability*
3.5.1 Uniformity of Data Access Times
3.5.2 System Architecture Taxonomy and Query Execution
3.6 Server Operating Systems
3.6.1 Operating System Requirements
3.6.2 Microkernel Technology
3.7 Operating System Implementations
3.7.1 UNIX
3.7.2 Windows/NT
3.7.3 OS/2
3.7.4 NetWare
3.7.5 OS Summary
Chapter 4. Distributed DBMS Implementations
4.1 Implementation Trends and Features of Distributed Client/Server DBMS
4.1.1 RDBMS Architecture for Scalability
4.1.2 RDBMS Performance and Efficiency Features
4.1.3 Types of Parallelism
4.2 DBMS Connectivity
4.3 Advanced RDBMS Features
4.4 RDBMS Reliability and Availability
4.4.1 Robustness, Transactions Recovery, and Consistency
4.4.2 Fault Tolerance
4.5 RDBMS Administration
Chapter 5. Client/Server RDBMS Solutions
5.1 State-of-the-Market Overview
5.2 Oracle
5.2.1 System Management
5.2.2 Oracle Universal Server
5.2.3 Oracle ConText Option
5.2.4 Oracle Spatial Data Option
5.3 Informix
5.3.1 Features
5.3.2 Informix Universal Server
5.4 Sybase
5.4.1 SYBASE SQL Server
5.4.2 Performance Improvements in SYBASE System 11
5.5 IBM
5.5.1 Background
5.5.2 DB2 Universal Database
5.6 Microsoft
5.6.1 Background
5.6.2 MS SQL Server
5.6.3 Data Warehousing and Market Positioning
Part 2. Data Warehousing
Chapter 6. Data Warehousing Components
6.1 Overall Architecture
6.2 Data Warehouse Database
6.3 Sourcing, Acquisition, Cleanup, and Transformation Tools
6.4 Metadata
6.5 Access Tools
6.5.1 Query and Reporting Tools
6.5.2 Applications
6.5.3 OLAP
6.5.4 Data Mining
6.5.5 Data Visualization
6.6 Data Marts
6.7 Data Warehouse Administration and Management
6.8 Information Delivery System
Chapter 7. Building a Data Warehouse
7.1 Business Considerations: Return on Investment
7.1.1 Approach
7.1.2 Organizational Issues
7.2 Design Considerations
7.2.1 Data Content
7.2.2 Metadata
7.2.3 Data Distribution
7.2.4 Tools
7.2.5 Performance Considerations
7.2.6 Nine Decisions in the Design of a Data Warehouse
7.3 Technical Considerations
7.3.1 Hardware Platforms
7.3.2 Data Warehouse and DBMS Specialization
7.3.3 Communications Infrastructure
7.4 Implementation Considerations
7.4.1 Access Tools
7.4.2 Data Extraction, Cleanup, Transformation, and Migration
7.4.3 Data Placement Strategies
7.4.4 Metadata
7.4.5 User Sophistication Levels
7.5 Integrated Solutions
7.6 Benefits of Data Warehousing
7.6.1 Tangible Benefits
7.6.2 Intangible Benefits
Chapter 8. Mapping the Data Warehouse to a Multiprocessor Architecture
8.1 Relational Database Technology for Data Warehouse
8.1.1 Types of Parallelism
8.1.2 Data Partitioning
8.2 Database Architectures for Parallel Processing
8.2.1 Shared-Memory Architecture
8.2.2 Shared-Disk Architecture
8.2.3 Shared-Nothing Architecture
8.2.4 Combined Architecture
8.3 Parallel RDBMS Features
8.4 Alternative Technologies
8.5 Parallel DBMS Vendors
8.5.1 Oracle
8.5.2 Informix
8.5.3 IBM
8.5.4 Sybase
8.5.5 Microsoft
8.5.6 Other RDBMS Products
8.5.7 Specialized Database Products
Chapter 9. DBMS Schemas for Decision Support
9.1 Data Layout for Best Access
9.2 Multidimensional Data Model
9.3 Star Schema
9.3.1 DBA Viewpoint
9.3.2 Potential Performance Problems with Star Schemas
9.3.3 Solutions to Performance Problems
9.4 STARjoin and STARindex
9.5 Bitmapped Indexing
9.5.1 SYBASE IQ
9.5.2 Conclusion
9.6 Column Local Storage
9.7 Complex Data Types
Chapter 10. Data Extraction, Cleanup, and Transformation Tools
10.1 Tool Requirements
10.2 Vendor Approaches
10.3 Access to Legacy Data
10.4 Vendor Solutions
10.4.1 Prism Solutions
10.4.2 SAS Institute
10.4.3 Carleton Corporation's Passport and MetaCenter
10.4.4 Vality Corporation
10.4.5 Evolutionary Technologies
10.4.6 Information Builders
10.5 Transformation Engines
10.5.1 Informatica
10.5.2 Constellar
Chapter 11. Metadata
11.1 Metadata Defined
11.2 Metadata Interchange Initiative
11.3 Metadata Repository
11.4 Metadata Management
11.5 Implementation Examples
11.5.1 Platinum Repository
11.5.2 R&O: The ROCHADE Repository
11.5.3 Prism Solutions
11.5.4 LogicWorks Universal Directory
11.6 Metadata Trends
Part 3. Business Analysis
Chapter 12. Reporting and Query Tools and Applications
12.1 Tool Categories
12.1.1 Reporting Tools
12.1.2 Managed Query Tools
12.1.3 Executive Information System Tools
12.1.4 OLAP Tools
12.1.5 Data Mining Tools
12.2 The Need for Applications
12.3 Cognos Impromptu
12.4 Applications
12.4.1 PowerBuilder
12.4.2 Forte
12.4.3 Information Builders
Chapter 13. On-Line Analytical Processing (OLAP)
13.1 Need for OLAP
13.2 Multidimensional Data Model
13.3 OLAP Guidelines
13.4 Multidimensional versus Multirelational OLAP
13.5 Categorization of OLAP Tools
13.5.1 MOLAP
13.5.2 ROLAP
13.5.3 Managed Query Environment (MQE)
13.6 State of the Market
13.6.1 Cognos PowerPlay
13.6.2 IBI FOCUS Fusion
13.6.3 Pilot Software
13.7 OLAP Tools and the Internet
13.8 Conclusion
Chapter 14. Patterns and Models
14.1 Definitions
14.1.1 What Is a Pattern? What Is a Model?
14.1.2 Visualizing a Pattern
14.2 A Note on Terminology
14.3 Where Are Models Used?
14.3.1 Problem 1: Selection
14.3.2 Problem 2: Acquisition
14.3.3 Problem 3: Retention
14.3.4 Problem 4: Extension
14.4 What Is the "Right" Model?
14.4.1 The Perfect Model
14.4.2 Missing Data
14.5 Sampling
14.5.1 The Necessity of Sampling
14.5.2 Random Sampling
14.6 Experimental Design
14.6.1 Avoiding Bias
14.6.2 More on Sampling
14.7 Computer-Intensive Statistics
14.7.1 Cross-Validation
14.7.2 Jackknife and Bootstrap Resampling
14.8 Picking the Best Model
Chapter 15. Statistics
15.1 Data, Counting, and Probability
15.1.1 Histograms
15.1.2 Types of Categorical Predictors
15.1.3 Probability
15.1.4 Bayes' Theorem
15.1.5 Independence
15.1.6 Causality and Collinearity
15.1.7 Simplifying the Predictors
15.2 Hypothesis Testing
15.2.1 Hypothesis Testing on a Real-World Problem
15.2.2 Hypothesis Testing, P Values, and Alpha
15.2.3 Making Mistakes in Rejecting the Null Hypothesis
15.2.4 Degrees of Freedom
15.3 Contingency Tables, the Chi Square Test, and Noncausal Relationships
15.3.1 Contingency Tables
15.3.2 The Chi Square Test
15.3.3 Sometimes Strong Relationships Are Not Causal
15.4 Prediction
15.4.1 Linear Regression
15.4.2 Other Forms of Regression
15.5 Some Current Offerings of Statistics Tools
15.5.1 SAS Institute
15.5.2 SPSS
15.5.3 MathSoft
Chapter 16. Artificial Intelligence
16.1 Defining Artificial Intelligence
16.2 Expert Systems
16.3 Fuzzy Logic
16.4 The Rise and Fall of AI
Part 4. Data Mining
Chapter 17. Introduction to Data Mining
17.1 Data Mining Has Come of Age
17.2 The Motivation for Data Mining Is Tremendous
17.3 Learning from Your Past Mistakes
17.4 Data Mining? Don't Need It--I've Got Statistics
17.5 Measuring Data Mining Effectiveness: Accuracy, Speed, and Cost
17.6 Embedding Data Mining into Your Business Process
17.7 The More Things Change, the More They Remain the Same
17.8 Discovery versus Prediction
17.8.1 Gold in Them Thar Hills
17.8.2 Discovery--Finding Something You Weren't Looking For
17.8.3 Prediction
17.9 Overfitting
17.10 State of the Industry
17.10.1 Targeted Solutions
17.10.2 Business Tools
17.10.3 Business Analyst Tools
17.10.4 Research Analyst Tools
17.11 Comparing the Technologies
17.11.1 Business Score Card
17.11.2 Applications Score Card
17.11.3 Algorithmic Score Card
Chapter 18. Decision Trees
18.1 What Is a Decision Tree?
18.2 Business Score Card
18.3 Where to Use Decision Trees
18.3.1 Exploration
18.3.2 Data Preprocessing
18.3.3 Prediction
18.3.4 Applications Score Card
18.4 The General Idea
18.4.1 Growing the Tree
18.4.2 When Does the Tree Stop Growing?
18.4.3 Why Would a Decision Tree Algorithm Prevent the Tree from Growing If There Weren't Enough Data?
18.4.4 Decision Trees Aren't Necessarily Finished after They Are Fully Grown
18.4.5 Are the Splits at Each Level of the Tree Always Binary Yes/No Splits?
18.4.6 Picking the Best Predictors
18.4.7 Picking the Right Predictor Value for the Split
18.5 How the Decision Tree Works
18.5.1 Handling High-Cardinality Predictors in ID3
18.5.2 C4.5 Enhances ID3
18.5.3 CART Definition
18.5.4 Predictors Are Picked as They Decrease the Disorder of the Data
18.5.5 CART Splits Unordered Predictors by Imposing Order on Them
18.5.6 CART Automatically Validates the Tree
18.5.7 CART Surrogates Handle Missing Data
18.5.8 CHAID
18.6 Case Study: Predicting Wireless Telecommunications Churn with CART
18.7 Strengths and Weaknesses
18.7.1 Algorithm Score Card
18.7.2 State of the Industry
Chapter 19. Neural Networks
19.1 What Is a Neural Network?
19.1.1 Don't Neural Networks Learn to Make Better Predictions?
19.1.2 Are Neural Networks Easy to Use?
19.1.3 Business Score Card
19.2 Where to Use Neural Networks
19.2.1 Neural Networks for Clustering
19.2.2 Neural Networks for Feature Extraction
19.2.3 Applications Score Card
19.3 The General Idea
19.3.1 What Does a Neural Network Look Like?
19.3.2 How Does a Neural Network Make a Prediction?
19.3.3 How Is the Neural Network Model Created?
19.3.4 How Complex Can the Neural Network Model Become?
19.3.5 Hidden Nodes Are Like Trusted Advisors to the Output Nodes
19.3.6 Design Decisions in Architecting a Neural Network
19.3.7 Different Types of Neural Networks
19.3.8 Kohonen Feature Maps
19.3.9 How Does the Neural Network Resemble the Human Brain?
19.3.10 A Neural Network Learns to Speak
19.3.11 A Neural Network Learns to Drive
19.3.12 The Human Brain Is Still Much More Powerful
19.4 How the Neural Network Works
19.4.1 How Predictions Are Made
19.4.2 How Backpropagation Learning Works
19.4.3 Data Preparation
19.4.4 Combatting Overfitting
19.4.5 Applying and Training the Neural Network
19.4.6 Explaining the Network
19.5 Case Study: Predicting Currency Exchange Rates
19.5.1 The Problem
19.5.2 Implementation
19.5.3 The Results
19.6 Strengths and Weaknesses
19.6.1 Algorithm Score Card
19.6.2 Some Current Market Offerings
19.6.3 Radial-Basis-Function Networks
19.6.4 Genetic Algorithms and Neural Networks
19.6.5 Simulated Annealing and Neural Networks
Chapter 20. Nearest Neighbor and Clustering
20.1 Business Score Card
20.2 Where to Use Clustering and Nearest-Neighbor Prediction
20.2.1 Clustering for Clarity
20.2.2 Clustering for Outlier Analysis
20.2.3 Nearest Neighbor for Prediction
20.2.4 Applications Score Card
20.3 The General Idea
20.3.1 There Is No Best Way to Cluster
20.3.2 How Are Tradeoffs Made When Determining Which Records Fall into Which Cluster?
20.3.3 Clustering Is the Happy Medium between Homogeneous Clusters and the Lowest Number of Clusters
20.3.4 What Is the Difference between Clustering and Nearest-Neighbor Prediction?
20.3.5 What Is an n-Dimensional Space?
20.3.6 How Is the Space for Clustering and Nearest Neighbor Defined?
20.4 How Clustering and Nearest-Neighbor Prediction Work
20.4.1 Looking at an n-Dimensional Space
20.4.2 How Is "Nearness" Defined?
20.4.3 Weighting the Dimensions: Distance with a Purpose
20.4.4 Calculating Dimension Weights
20.4.5 Hierarchical and Nonhierarchical Clustering
20.4.6 Nearest-Neighbor Prediction
20.4.7 K Nearest Neighbors--Voting Is Better
20.4.8 Generalizing the Solution: Prototypes and Sentries
20.5 Case Study: Image Recognition for Human Handwriting
20.5.1 The Problem
20.5.2 Solution Using Nearest-Neighbor Techniques
20.6 Strengths and Weaknesses
20.6.1 Algorithm Score Card
20.6.2 Predicting Future Trends
Chapter 21. Genetic Algorithms
21.1 What Are Genetic Algorithms?
21.1.1 How Do They Relate to Evolution?
21.1.2 Genetic Algorithms, Artificial Life, and Simulated Evolution
21.1.3 How Can They Be Used in Business?
21.1.4 Business Score Card
21.2 Where to Use Genetic Algorithms
21.2.1 Genetic Algorithms for Optimization
21.2.2 Genetic Algorithms for Data Mining
21.2.3 Applications Score Card
21.3 The General Idea
21.3.1 Do Genetic Algorithms Guess the Right Answer?
21.3.2 Are Genetic Algorithms Fully Automated?
21.3.3 Cost Minimization: Traveling Salesperson
21.3.4 Cooperation Strategies: Prisoner's Dilemma
21.4 How the Genetic Algorithm Works
21.4.1 The Overall Process
21.4.2 Survival of the Fittest
21.4.3 Mutation
21.4.4 Sexual Reproduction and Crossover
21.4.5 Exploration versus Exploitation
21.4.6 The Schema Theorem
21.4.7 Epistasis
21.4.8 Classifier Systems
21.4.9 Remaining Challenges
21.4.10 Sharing: A Solution to Premature Convergence
21.4.11 Metalevel Evolution: The Automation of Parameter Choice
21.4.12 Parallel Implementation
21.5 Case Study: Optimizing Predictive Customer Segments
21.6 Strengths and Weaknesses
21.6.1 Algorithm Score Card
21.6.2 State of the Marketplace
21.6.3 Predicting Future Trends
Chapter 22. Rule Induction
22.1 Business Score Card
22.2 Where to Use Rule Induction
22.2.1 What Is a Rule?
22.2.2 What to Do with a Rule
22.2.3 Caveat: Rules Do Not Imply Causality
22.2.4 Types of Databases Used for Rule Induction
22.2.5 Discovery
22.2.6 Prediction
22.2.7 Applications Score Card
22.3 The General Idea
22.3.1 How to Evaluate the Rule
22.3.2 Conjunctions and Disjunctions
22.3.3 Defining "Interestingness"
22.3.4 Other Measures of Usefulness
22.3.5 Rules versus Decision Trees
22.4 How Rule Induction Works
22.4.1 Constructing Rules
22.4.2 A Brute-Force Algorithm for Generating Rules
22.4.3 Combining Evidence
22.5 Case Study: Classifying U.S. Census Returns
22.6 Strengths and Weaknesses
22.7 Current Offerings and Future Improvements
Chapter 23. Selecting and Using the Right Technique
23.1 Using the Right Technique
23.1.1 The Data Mining Process
23.1.2 What All the Data Mining Techniques Have in Common
23.1.3 Cases in Which Decision Trees Are Like Nearest Neighbors
23.1.4 Rule Induction Is Like Decision Trees
23.1.5 Could You Do Link Analysis with a Neural Network?
23.2 Data Mining in the Business Process
23.2.1 Avoiding Some Big Mistakes in Data Mining
23.2.2 Understanding the Data
23.3 The Case for Embedded Data Mining
23.3.1 The Cost of a Distributed Business Process
23.3.2 The Best Way to Measure a Data Mining Tool
23.3.3 The Case for Embedded Data Mining
23.4 How to Measure Accuracy, Explanation, and Integration
23.4.1 Measuring Accuracy
23.4.2 Measuring Explanation
23.4.3 Measuring Integration
23.5 What the Future Holds for Embedded Data Mining
Part 5. Data Visualization and Overall Perspective
Chapter 24. Data Visualization
24.1 Data Visualization Principles
24.2 Parallel Coordinates
24.3 Visualizing Neural Networks
24.4 Visualization of Trees
24.5 State of the Industry
24.5.1 Advanced Visual Systems
24.5.2 Alta Analytics
24.5.3 Business Objects
24.5.4 IBM
24.5.5 Pilot Software
24.5.6 Silicon Graphics
Chapter 25. Putting It All Together
25.1 Design for Scalability
25.2 Data Quality
25.3 Implementation Notes
25.3.1 Operational Data Stores
25.3.2 Data Marts
25.3.3 Star Schema
25.4 Making the Most of Your Warehouse
25.5 The Data Warehousing Market
25.6 Costs and Benefits
25.6.1 Big Data--Bigger Returns
25.6.2 Law of Diminishing Returns
25.7 A Unifying View of Business Information
25.8 What's Next
25.8.1 Distributed Warehouse Environments
25.8.2 Using the Internet or Intranet for Information Delivery
25.8.3 Object-Relational Databases
25.8.4 Very Large Databases (VLDBs)
25.9 Conclusion
Appendix A. Glossary
Appendix B. Big Data--Better Returns: Leveraging Your Hidden Data Assets to Improve ROI
Appendix C. Dr. E. F. Codd's 12 Guidelines for OLAP
Appendix D. 10 Mistakes for Data Warehousing Managers to Avoid
Bibliography
Index