强化学习与最优控制（国际知名大学原版教材英文版）

定　价：¥149.00

作　者：	（美）德梅萃·P.博赛卡斯
出版社：	清华大学出版社
丛编项：	信息技术学科与电气工程学科系列
标　签：	暂缺

购买这本书可以去

京东 (¥149.00)

ISBN：	9787302540328	出版时间：	2020-06-01	包装：
开本：	16开	页数：	392	字数：

内容简介

　　《强化学习与最优控制（英文版）》的目的是考虑大型且具有挑战性的多阶段决策问题，这些问题原则上可以通过动态规划和优控制来解决，但它们的解决方案在计算上是难以处理的。《强化学习与最优控制（英文版）》讨论依赖于近似的解决方法，以产生具有足够性能的次优策略。这些方法统称为增强学习，也可以叫做近似动态规划和神经动态规划等。《强化学习与最优控制（英文版）》的主题产生于优控制和人工智能思想的相互作用。《强化学习与最优控制（英文版）》的目的之一是探索这两个领域之间的共同边界，并架设一座具有任一领域背景的人士都可以访问的桥梁。

作者简介

　　Dimitri P. Bertseka，美国MIT终身教授，美国国家工程院院士，清华大学复杂与网络化系统研究中心客座教授。电气工程与计算机科学领域国际知名作者，著有《非线性规划》《网络优化》《凸优化》等十几本畅销教材和专著。

图书目录

Contents

1. Exact Dynamic Programming

1.1. DeterministicDynamicProgramming . . . . . . . . . . . p. 2

1.1.1. DeterministicProblems . . . . . . . . . . . . . . p. 2

1.1.2. TheDynamicProgrammingAlgorithm . . . . . . . . p. 7

1.1.3. Approximation inValue Space . . . . . . . . . . . p. 12

1.2. StochasticDynamicProgramming . . . . . . . . . . . . . p. 14

1.3. Examples,Variations, and Simplifications . . . . . . . . . p. 18

1.3.1. Deterministic ShortestPathProblems . . . . . . . . p. 19

1.3.2. DiscreteDeterministicOptimization . . . . . . . . . p. 21

1.3.3. Problemswith aTermination State . . . . . . . . . p. 25

1.3.4. Forecasts . . . . . . . . . . . . . . . . . . . . . p. 26

1.3.5. Problems with Uncontrollable State Components . . . p. 29

1.3.6. PartialState Information andBelief States . . . . . . p. 34

1.3.7. LinearQuadraticOptimalControl . . . . . . . . . . p. 38

1.3.8. SystemswithUnknownParameters -Adaptive . . . . . .

Control . . . . . . . . . . . . . . . . . . . . . p. 40

1.4. ReinforcementLearning andOptimalControl - Some . . . . . .

Terminology . . . . . . . . . . . . . . . . . . . . . . p. 43

1.5. Notes and Sources . . . . . . . . . . . . . . . . . . . p. 45

2. Approximation in Value Space

2.1. ApproximationApproaches inReinforcementLearning . . . . p. 50

2.1.1. General Issues ofApproximation inValue Space . . . . p. 54

2.1.2. Off-Line andOn-LineMethods . . . . . . . . . . . p. 56

2.1.3. Model-Based Simplification of the Lookahead . . . . . .

Minimization . . . . . . . . . . . . . . . . . . . p. 57

2.1.4. Model-Free off-Line Q-Factor Approximation . . . . p. 58

2.1.5. Approximation inPolicy Space onTop of . . . . . . . .

ApproximationinValue Space . . . . . . . . . . . p. 61

2.1.6. When is Approximation in Value Space Effective? . . . p. 62

2.2. Multistep Lookahead . . . . . . . . . . . . . . . . . . p. 64

??ii

viii Contents

2.2.1. Multistep Lookahead and Rolling Horizon . . . . . . p. 65

2.2.2. Multistep Lookahead and Deterministic Problems . . . p. 67

2.3. Problem Approximation . . . . . . . . . . . . . . . . . p. 69

2.3.1. Enforced Decomposition . . . . . . . . . . . . . . p. 69

2.3.2. Probabilistic Approximation - Certainty Equivalent . . . .

Control . . . . . . . . . . . . . . . . . . . . . p. 76

2.4. Rollout and the Policy Improvement Principle . . . . . . . p. 83

2.4.1. On-Line Rollout for Deterministic Discrete . . . . . . . .

Optimization . . . . . . . . . . . . . . . . . . . p. 84

2.4.2. Stochastic Rollout and Monte Carlo Tree Search . . . p. 95

2.4.3. Rollout with an Expert . . . . . . . . . . . . . p. 104

2.5. On-Line Rollout for Deterministic Infinite-Spaces Problems - . . .

Optimization Heuristics . . . . . . . . . . . . . . . . p. 106

2.5.1. Model Predictive Control . . . . . . . . . . . . . p. 108

2.5.2. Target Tubes and the Constrained Controllability . . . . .

Condition . . . . . . . . . . . . . . . . . . . p. 115

2.5.3. Variants of Model Predictive Control . . . . . . . p. 118

2.6. Notes and Sources . . . . . . . . . . . . . . . . . . p. 120

3. Parametric Approximation

3.1. Approximation Architectures . . . . . . . . . . . . . . p. 126

3.1.1. Linear and Nonlinear Feature-Based Architectures . . p. 126

3.1.2. Training of Linear and Nonlinear Architectures . . . p. 134

3.1.3. Incremental Gradient and Newton Methods . . . . . p. 135

3.2. Neural Networks . . . . . . . . . . . . . . . . . . . p. 149

3.2.1. Training of Neural Networks . . . . . . . . . . . p. 153

3.2.2. Multilayer and Deep Neural Networks . . . . . . . p. 157

3.3. Sequential Dynamic Programming Approximation . . . . . p. 161

3.4. Q-Factor Parametric Approximation . . . . . . . . . . . p. 162

3.5. Parametric Approximation in Policy Space by . . . . . . . . .

Classification . . . . . . . . . . . . . . . . . . . . . p. 165

3.6. Notes and Sources . . . . . . . . . . . . . . . . . . p. 171

4. Infinite Horizon Dynamic Programming

4.1. An Overview of Infinite Horizon Problems . . . . . . . . p. 174

4.2. Stochastic Shortest Path Problems . . . . . . . . . . . p. 177

4.3. Discounted Problems . . . . . . . . . . . . . . . . . p. 187

4.4. Semi-Markov Discounted Problems . . . . . . . . . . . p. 192

4.5. Asynchronous Distributed Value Iteration . . . . . . . . p. 197

4.6. Policy Iteration . . . . . . . . . . . . . . . . . . . p. 200

4.6.1. Exact Policy Iteration . . . . . . . . . . . . . . p. 200

4.6.2. Optimistic and Multistep Lookahead Policy . . . . . . .

Iteration . . . . . . . . . . . . . . . . . . . . p. 205

4.6.3. Policy Iteration for Q-factors . . . . . . . . . . . p. 208

Contents i??

4.7. Notes and Sources . . . . . . . . . . . . . . . . . . p. 209

4.8. Appendix: MathematicalAnalysis . . . . . . . . . . . p. 211

4.8.1. Proofs for Stochastic ShortestPathProblems . . . . p. 212

4.8.2. Proofs forDiscountedProblems . . . . . . . . . . p. 217

4.8.3. ConvergenceofExact andOptimistic . . . . . . . . . .

Policy Iteration . . . . . . . . . . . . . . . . p. 218

5. Infinite Horizon Reinforcement Learning

5.1. Approximation in Value Space - Performance Bounds . . . p. 222

5.1.1. LimitedLookahead . . . . . . . . . . . . . . . p. 224

5.1.2. Rollout and Approximate Policy Improvement . . . p. 227

5.1.3. ApproximatePolicy Iteration . . . . . . . . . . . p. 232

5.2. FittedValue Iteration . . . . . . . . . . . . . . . . . p. 235

5.3. Simulation-BasedPolicy IterationwithParametric . . . . . . .

Approximation . . . . . . . . . . . . . . . . . . . . p. 239

5.3.1. Self-Learning andActor-CriticMethods . . . . . . p. 239

5.3.2. Model-Based Variant of a Critic-Only Method . . . p. 241

5.3.3. Model-FreeVariant of aCritic-OnlyMethod . . . . p. 243

5.3.4. Implementation Issues ofParametricPolicy . . . . . . .

Iteration . . . . . . . . . . . . . . . . . . . . p. 246

5.3.5. Convergence Issues ofParametricPolicy Iteration - . . . .

Oscillations . . . . . . . . . . . . . . . . . . . p. 249

5.4. Q-Learning . . . . . . . . . . . . . . . . . . . . . p. 253

5.4.1. Optimistic Policy Iteration with Parametric Q-Factor . . .

Approximation- SARSAandDQN . . . . . . . . p. 255

5.5. AdditionalMethods -TemporalDifferences . . . . . . . p. 256

5.6. Exact andApproximateLinearProgramming . . . . . . p. 267

5.7. Approximation inPolicy Space . . . . . . . . . . . . . p. 270

5.7.1. Training byCostOptimization -PolicyGradient, . . . . .

Cross-Entropy,andRandomSearchMethods . . . . p. 276

5.7.2. Expert-BasedSupervisedLearning . . . . . . . . p. 286

5.7.3. ApproximatePolicy Iteration,Rollout, and . . . . . . .

ApproximationinPolicySpace . . . . . . . . . . p. 288

5.8. Notes and Sources . . . . . . . . . . . . . . . . . . p. 293

5.9. Appendix: MathematicalAnalysis . . . . . . . . . . . p. 298

5.9.1. Performance Bounds for Multistep Lookahead . . . . p. 299

5.9.2. Performance Bounds for Rollout . . . . . . . . . . p. 301

5.9.3. Performance Bounds for Approximate Policy . . . . . . .

Iteration . . . . . . . . . . . . . . . . . . . . p. 304

6. Aggregation

6.1. AggregationwithRepresentativeStates . . . . . . . . . p. 308

6.1.1. Continuous State and Control Space Discretization . p. 314

6.1.2. Continuous State Space - POMDP Discretization . . p. 315

?? Contents

6.2. AggregationwithRepresentativeFeatures . . . . . . . . p. 317

6.2.1. Hard Aggregation and Error Bounds . . . . . . . . p. 320

6.2.2. AggregationUsingFeatures . . . . . . . . . . . . p. 322

6.3. Methods for Solving theAggregateProblem . . . . . . . p. 328

6.3.1. Simulation-BasedPolicy Iteration . . . . . . . . . p. 328

6.3.2. Simulation-Based Value Iteration . . . . . . . . . p. 331

6.4. Feature-BasedAggregationwith aNeuralNetwork . . . . p. 332

6.5. BiasedAggregation . . . . . . . . . . . . . . . . . . p. 334

6.6. Notes and Sources . . . . . . . . . . . . . . . . . . p. 337

6.7. Appendix: MathematicalAnalysis . . . . . . . . . . . p. 340

References . . . . . . . . . . . . . . . . . . . . . . . p. 345

Index . . . . . . . . . . . . . . . . . . . . . . . . . . p. 369

强化学习与最优控制（国际知名大学原版教材英文版）

购买这本书可以去

内容简介

作者简介

图书目录

本目录推荐

电气工程自动化与新能源利用研究…

物联网和工业4.0实用指南：物流…

协作机器人集成应用

航空航天遥感（第2版）

智能机器人：从技能传递到人机协…

网络与人工智能法治：跨学科的研…

自动控制原理与系统（第3版）

电气自动化专业骨干教师培训教程…

基于FPGA的深度学习加速器的设计…

铁路区间自动控制系统维护

强化学习与最优控制（国际知名大学原版教材 英文版）

购买这本书可以去

内容简介

作者简介

图书目录

本目录推荐

强化学习与最优控制（国际知名大学原版教材英文版）