Contents
Preface xv
CHAPTERS
Computer Abstractions and Technology 2
1.1 Introduction 3
1.2 Below Your Program 10
1.3 Under the Covers 13
1.4 Performance 26
1.5 The Power Wall 39
1.6 The Sea Change: The Switch from Uniprocessors to Multiprocessors 41
1.7 Real Stuff: Manufacturing and Benchmarking the AMD Opteron X4 44
1.8 Fallacies and Pitfalls 51
1.9 Concluding Remarks 54
1.10 Historical Perspective and Further Reading 55
1.11 Exercises 56
Instructions: Language of the Computer 74
2.1 Introduction 76
2.2 Operations of the Computer Hardware 77
2.3 Operands of the Computer Hardware 80
2.4 Signed and Unsigned Numbers 86
2.5 Representing Instructions in the Computer 93
2.6 Logical Operations 100
2.7 Instructions for Making Decisions 104
2.8 Supporting Procedures in Computer Hardware 113
2.9 Communicating with People 122
2.10 ARM Addressing for 32-Bit Immediates and More Complex Addressing Modes 127
2.11 Parallelism and Instructions: Synchronization 133
2.12 Translating and Starting a Program 135
2.13 A C Sort Example to Put It All Together 143
: This icon identi.es material on the CD
2.14 Arrays versus Pointers 152
2.15 Advanced Material: Compiling C and Interpreting Java 156
2.16 Real Stuff: MIPS Instructions 156
2.17 Real Stuff: x86 Instructions 161
2.18 Fallacies and Pitfalls 170
2.19 Concluding Remarks 171
2.20 Historical Perspective and Further Reading 174
2.21 Exercises 174
Arithmetic for Computers 214
3.1 Introduction 216
3.2 Addition and Subtraction 216
3.3 Multiplication 220
3.4 Division 226
3.5 Floating Point 232
3.6 Parallelism and Computer Arithmetic: Associativity 258
3.7 Real Stuff: Floating Point in the x86 259
3.8 Fallacies and Pitfalls 262
3.9 Concluding Remarks 265
3.10 Historical Perspective and Further Reading 268
3.11 Exercises 269
The Processor 284
4.1 Introduction 286
4.2 Logic Design Conventions 289
4.3 Building a Datapath 293
4.4 A Simple Implementation Scheme 302
4.5 An Overview of Pipelining 316
4.6 Pipelined Datapath and Control 330
4.7 Data Hazards: Forwarding versus Stalling 349
4.8 Control Hazards 361
4.9 Exceptions 370
4.10 Parallelism and Advanced Instruction-Level Parallelism 377
4.11 Real Stuff: the AMD Opteron X4 (Barcelona) Pipeline 390
4.12 Advanced Topic: an Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 392
4.13 Fallacies and Pitfalls 393
4.14 Concluding Remarks 394
4.15 Historical Perspective and Further Reading 395
4.16 Exercises 395
Large and Fast: Exploiting Memory Hierarchy 436
5.1 Introduction 438
5.2 The Basics of Caches 443
5.3 Measuring and Improving Cache Performance 461
5.4 Virtual Memory 478
5.5 A Common Framework for Memory Hierarchies 504
5.6 Virtual Machines 511
5.7 Using a Finite-State Machine to Control a Simple Cache 515
5.8 Parallelism and Memory Hierarchies: Cache Coherence 520
5.9 Advanced Material: Implementing Cache Controllers 524
5.10 Real Stuff: the AMD Opteron X4 (Barcelona) and Intel Nehalem Memory Hierarchies 525
5.11 Fallacies and Pitfalls 529
5.12 Concluding Remarks 533
5.13 Historical Perspective and Further Reading 534
5.14 Exercises 534
Storage and Other I/O Topics 554
6.1 Introduction 556
6.2 Dependability, Reliability, and Availability 559
6.3 Disk Storage 561
6.4 Flash Storage 566
6.5 Connecting Processors, Memory, and I/O Devices 568
6.6 Interfacing I/O Devices to the Processor, Memory, and Operating System 572
6.7 I/O Performance Measures: Examples from Disk and File Systems 582
6.8 Designing an I/O System 584
6.9 Parallelism and I/O: Redundant Arrays of Inexpensive Disks 585
6.10 Real Stuff: Sun Fire x4150 Server 592
6.11 Advanced Topics: Networks 598
6.12 Fallacies and Pitfalls 599
6.13 Concluding Remarks 603
6.14 Historical Perspective and Further Reading 604
6.15 Exercises 605
Multicores, Multiprocessors, and Clusters 616
7.1 Introduction 618
7.2 The Dif.culty of Creating Parallel Processing Programs 620
7.3 Shared Memory Multiprocessors 624
7.4 Clusters and Other Message-Passing Multiprocessors 627
7.5 Hardware Multithreading 631
7.6 SISD, MIMD, SIMD, SPMD, and Vector 634
7.7 Introduction to Graphics Processing Units 640
7.8 Introduction to Multiprocessor Network Topologies 646
7.9 Multiprocessor Benchmarks 650
7.10 Roo.ine: A Simple Performance Model 653
7.11 Real Stuff: Benchmarking Four Multicores Using the Roo. ine Model 661
7.12 Fallacies and Pitfalls 670
7.13 Concluding Remarks 672
7.14 Historical Perspective and Further Reading 674
7.15 Exercises 674 Index I-1
CD-ROM CONTENT
Graphics and Computing GPUs A-2
A.1 Introduction A-3
A.2 GPU System Architectures A-7
A.3 Scalable Parallelism – Programming GPUs A-12
A.4 Multithreaded Multiprocessor Architecture A-25
A.5 Parallel Memory System G.6 Floating Point A-36
A.6 Floating Point Arithmetic A-41
A.7 Real Stuff: The NVIDIA GeForce 8800 A-46
A.8 Real Stuff: Mapping Applications to GPUs A-55
A.9 Fallacies and Pitfalls A-72
A.10 Concluding Remarks A-76
A.11 Historical Perspective and Further Reading A-77
ARM and Thumb Assembler Instructions B1-2
B1.1 Using This Appendix B1-3 B1.2 Syntax B1-4 B1.3 Alphabetical List of ARM and Thumb Instructions B1-8 B1.4 ARM Assembler Quick Reference B1-49 B1.5 GNU Assembler Quick Reference B1-60
ARM and Thumb Instruction Encodings B2-2
B2.1 ARM Instruction Set Encodings B2-3
B2.2 Thumb Instruction Set Encodings B2-9
B2.3 Program Status Registers B2-11
Instruction Cycle Timings B3-2
B3.1 Using the Instruction Set Cycle Timing Tables B3-3 B3.2 ARM7TDMI Instruction Cycle Timings B3-5 B3.3 ARM9TDMI Instruction Cycle Timings B3-6 B3.4 StrongARM1 Instruction Cycle Timings B3-8 B3.5 ARM9E Instruction Cycle Timings B3-9 B3.6 ARM10E Instruction Cycle Timings B3-11 B3.7 Intel XScale Instruction Cycle Timings B3-12 B3.8 ARM11 Cycle Timings B3-14
C The Basics of Logic Design C-2
C.1 Introduction C-3
C.2 Gates, Truth Tables, and Logic Equations C-4
C.3 Combinational Logic C-9
C.4 Using a Hardware Description Language C-20
C.5 Constructing a Basic Arithmetic Logic Unit C-26
C.6 Faster Addition: Carry Lookahead C-38
C.7 Clocks C-48
C.8 Memory Elements: Flip-Flops, Latches, and Registers C-50
C.9 Memory Elements: SRAMs and DRAMs C-58
C.10 Finite-State Machines C-67
C.11 Timing Methodologies C-72
C.12 Field Programmable Devices C-78
C.13 Concluding Remarks C-79
C.14 Exercises C-80
D Mapping Control to Hardware D-2
D.1 Introduction D-3
D.2 Implementing Combinational Control Units D-4
D.3 Implementing Finite-State Machine Control D-8
D.4 Implementing the Next-State Function with a Sequencer D-22
D.5 Translating a Microprogram to Hardware D-28
D.6 Concluding Remarks D-32
D.7 Exercises D-33
ADVANCED CONTENT
Section 2.15 Compiling C and Interpreting Java Section 4.12 An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations Section 5.9 Implementing Cache Controllers Section 6.11 Networks
HISTORICAL PERSPECTIVES & FURTHER READING
Chapter 1 Computer Abstractions and Technology: Section 1.10 Chapter 2 Instructions: Language of the Computer: Section 2.20 Chapter 3 Arithmetic for Computers: Section 3.10 Chapter 4 The Processor: Section 4.15 Chapter 5 Large and Fast: Exploiting Memory Hierarchy: Section 5.13 Chapter 6 Storage and Other I/O Topics: Section 6.14 Chapter 7 Multicores, Multiprocessors, and Clusters: Section 7.14 Appendix A Graphics and Computing GPUs: Section A.11
TUTORIALS
VHDL
Verilog
SOFTWARE
Xilinx FPGA Design, Simulation and Synthesis Software QEMU http://www.nongnu.org/qemu/about.html
Glossary G-1 Index I-1 Further Reading FR-1