
An Introduction to Parallel Programming (English Edition)


Price: ¥65.00

Author: Peter S. Pacheco (USA)
Publisher: China Machine Press
Tags: Programming

ISBN: 9787111358282  Publication date: 2011-10-01  Binding: Paperback
Format: 16mo  Pages: 370

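About the Book

  Written as a tutorial, the book starts with short programming examples and works step by step toward more challenging programs. It focuses on designing, debugging, and evaluating the performance of distributed-memory and shared-memory programs. It uses the MPI, Pthreads, and OpenMP programming models and emphasizes hands-on development of parallel programs. Parallel programming is no longer a discipline only for specialists. To fully exploit the computing power of clusters and multicore processors, it is essential to learn both distributed-memory and shared-memory parallel programming techniques. An Introduction to Parallel Programming (English Edition), by Peter S. Pacheco, shows step by step how to develop efficient parallel programs with MPI, Pthreads, and OpenMP, and teaches readers how to develop, debug, and evaluate the performance of distributed-memory and shared-memory programs.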

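About the Author

  Peter S. Pacheco holds a PhD in mathematics from Florida State University. He previously served as chair of the computer science department at the University of San Francisco and is currently chair of its mathematics department. For nearly 20 years he has taught parallel computing courses to undergraduate and graduate students.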

Table of Contents

CHAPTER 1 Why Parallel Computing?
1.1 Why We Need Ever-Increasing Performance
1.2 Why We're Building Parallel Systems
1.3 Why We Need to Write Parallel Programs
1.4 How Do We Write Parallel Programs?
1.5 What We'll Be Doing
1.6 Concurrent, Parallel, Distributed
1.7 The Rest of the Book
1.8 A Word of Warning
1.9 Typographical Conventions
1.10 Summary
1.11 Exercises
CHAPTER 2 Parallel Hardware and Parallel Software
2.1 Some Background
2.1.1 The von Neumann architecture
2.1.2 Processes, multitasking, and threads
2.2 Modifications to the von Neumann Model
2.2.1 The basics of caching
2.2.2 Cache mappings
2.2.3 Caches and programs: an example
2.2.4 Virtual memory
2.2.5 Instruction-level parallelism
2.2.6 Hardware multithreading
2.3 Parallel Hardware
2.3.1 SIMD systems
2.3.2 MIMD systems
2.3.3 Interconnection networks
2.3.4 Cache coherence
2.3.5 Shared-memory versus distributed-memory
2.4 Parallel Software
2.4.1 Caveats
2.4.2 Coordinating the processes/threads
2.4.3 Shared-memory
2.4.4 Distributed-memory
2.4.5 Programming hybrid systems
2.5 Input and Output
2.6 Performance
2.6.1 Speedup and efficiency
2.6.2 Amdahl's law
2.6.3 Scalability
2.6.4 Taking timings
2.7 Parallel Program Design
2.7.1 An example
2.8 Writing and Running Parallel Programs
2.9 Assumptions
2.10 Summary
2.10.1 Serial systems
2.10.2 Parallel hardware
2.10.3 Parallel software
2.10.4 Input and output
2.10.5 Performance
2.10.6 Parallel program design
2.10.7 Assumptions
2.11 Exercises
CHAPTER 3 Distributed-Memory Programming with MPI
3.1 Getting Started
3.1.1 Compilation and execution
3.1.2 MPI programs
3.1.3 MPI_Init and MPI_Finalize
3.1.4 Communicators, MPI_Comm_size and MPI_Comm_rank
3.1.5 SPMD programs
3.1.6 Communication
3.1.7 MPI_Send
3.1.8 MPI_Recv
3.1.9 Message matching
3.1.10 The status_p argument
3.1.11 Semantics of MPI_Send and MPI_Recv
3.1.12 Some potential pitfalls
3.2 The Trapezoidal Rule in MPI
3.2.1 The trapezoidal rule
3.2.2 Parallelizing the trapezoidal rule
3.3 Dealing with I/O
3.3.1 Output
3.3.2 Input
3.4 Collective Communication
3.4.1 Tree-structured communication
3.4.2 MPI_Reduce
3.4.3 Collective vs. point-to-point communications
3.4.4 MPI_Allreduce
3.4.5 Broadcast
3.4.6 Data distributions
3.4.7 Scatter
3.4.8 Gather
3.4.9 Allgather
3.5 MPI Derived Datatypes
3.6 Performance Evaluation of MPI Programs
3.6.1 Taking timings
3.6.2 Results
3.6.3 Speedup and efficiency
3.6.4 Scalability
3.7 A Parallel Sorting Algorithm
3.7.1 Some simple serial sorting algorithms
3.7.2 Parallel odd-even transposition sort
3.7.3 Safety in MPI programs
3.7.4 Final details of parallel odd-even sort
3.8 Summary
3.9 Exercises
3.10 Programming Assignments
CHAPTER 4 Shared-Memory Programming with Pthreads
4.1 Processes, Threads, and Pthreads
4.2 Hello, World
4.2.1 Execution
4.2.2 Preliminaries
4.2.3 Starting the threads
4.2.4 Running the threads
4.2.5 Stopping the threads
4.2.6 Error checking
4.2.7 Other approaches to thread startup
4.3 Matrix-Vector Multiplication
4.4 Critical Sections
4.5 Busy-Waiting
4.6 Mutexes
4.7 Producer-Consumer Synchronization and Semaphores
4.8 Barriers and Condition Variables
4.8.1 Busy-waiting and a mutex
4.8.2 Semaphores
4.8.3 Condition variables
4.8.4 Pthreads barriers
4.9 Read-Write Locks
4.9.1 Linked list functions
4.9.2 A multi-threaded linked list
4.9.3 Pthreads read-write locks
4.9.4 Performance of the various implementations
4.9.5 Implementing read-write locks
4.10 Caches, Cache Coherence, and False Sharing
4.11 Thread-Safety
4.11.1 Incorrect programs can produce correct output
4.12 Summary
4.13 Exercises
4.14 Programming Assignments
CHAPTER 5 Shared-Memory Programming with OpenMP
5.1 Getting Started
5.1.1 Compiling and running OpenMP programs
5.1.2 The program
5.1.3 Error checking
5.2 The Trapezoidal Rule
5.2.1 A first OpenMP version
5.3 Scope of Variables
5.4 The Reduction Clause
5.5 The parallel for Directive
5.5.1 Caveats
5.5.2 Data dependences
5.5.3 Finding loop-carried dependences
5.5.4 Estimating π
5.5.5 More on scope
5.6 More About Loops in OpenMP: Sorting
5.6.1 Bubble sort
5.6.2 Odd-even transposition sort
5.7 Scheduling Loops
5.7.1 The schedule clause
5.7.3 The dynamic and guided schedule types
5.7.4 The runtime schedule type
5.7.5 Which schedule?
5.8 Producers and Consumers
5.8.1 Queues
5.8.2 Message-passing
5.8.3 Sending messages
5.8.4 Receiving messages
5.8.5 Termination detection
5.8.6 Startup
5.8.7 The atomic directive
5.8.8 Critical sections and locks
5.8.9 Using locks in the message-passing program
5.8.10 critical directives, atomic directives, or locks?
5.8.11 Some caveats
5.9 Caches, Cache Coherence, and False Sharing
5.10 Thread-Safety
5.10.1 Incorrect programs can produce correct output
5.11 Summary
5.12 Exercises
5.13 Programming Assignments
CHAPTER 6 Parallel Program Development
6.1 Two n-Body Solvers
6.1.1 The problem
6.1.2 Two serial programs
6.1.3 Parallelizing the n-body solvers
6.1.4 A word about I/O
6.1.5 Parallelizing the basic solver using OpenMP
6.1.6 Parallelizing the reduced solver using OpenMP
6.1.7 Evaluating the OpenMP codes
6.1.8 Parallelizing the solvers using pthreads
6.1.9 Parallelizing the basic solver using MPI
6.1.10 Parallelizing the reduced solver using MPI
6.1.11 Performance of the MPI solvers
6.2 Tree Search
6.2.1 Recursive depth-first search
6.2.2 Nonrecursive depth-first search
6.2.3 Data structures for the serial implementations
6.2.6 A static parallelization of tree search using pthreads
6.2.7 A dynamic parallelization of tree search using pthreads
6.2.8 Evaluating the pthreads tree-search programs
6.2.9 Parallelizing the tree-search programs using OpenMP
6.2.10 Performance of the OpenMP implementations
6.2.11 Implementation of tree search using MPI and static partitioning
6.2.12 Implementation of tree search using MPI and dynamic partitioning
6.3 A Word of Caution
6.4 Which API?
6.5 Summary
6.5.1 Pthreads and OpenMP
6.5.2 MPI
6.6 Exercises
6.7 Programming Assignments
CHAPTER 7 Where to Go from Here
References
Index
