Parallel Processors from Client to Cloud

50 questions available

Questions

Question 1

What is the primary challenge in parallel programming that is analogous to reporters spending too much time communicating instead of writing their own parts of a story?

Question 2

To achieve a speed-up of 90 times with 100 processors, what is the maximum allowable percentage of the original computation that can be sequential, according to Amdahl's Law?
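
The arithmetic behind this question can be sketched by rearranging Amdahl's Law, S = 1 / (f + (1 - f) / P), to solve for the sequential fraction f. This is a minimal sketch: the function names are illustrative, and it assumes the parallel portion divides perfectly evenly across the processors.

```python
def amdahl_speedup(seq_fraction, processors):
    """Speed-up predicted by Amdahl's Law for a given sequential fraction."""
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / processors)


def max_sequential_fraction(target_speedup, processors):
    """Largest sequential fraction that still reaches the target speed-up.

    Derived from S = 1 / (f + (1 - f) / P)  =>  f = (P / S - 1) / (P - 1).
    """
    return (processors / target_speedup - 1.0) / (processors - 1.0)


f = max_sequential_fraction(target_speedup=90, processors=100)
print(f"maximum sequential fraction: {f:.4%}")
print(f"check, speed-up at that fraction: {amdahl_speedup(f, 100):.2f}")
```

The result is roughly a tenth of one percent, which shows how little sequential work a near-linear speed-up can tolerate.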

Question 3

What is the key difference between strong scaling and weak scaling in the context of parallel performance measurement?

Question 4

In the Flynn taxonomy of computer architectures, which category represents a conventional uniprocessor with a single instruction stream and a single data stream?

Question 5

What is the primary purpose of hardware multithreading in a processor?

Question 6

How does coarse-grained multithreading differ from fine-grained multithreading?

Question 7

What is the defining characteristic of a shared memory multiprocessor (SMP)?

Question 8

What is the primary difference between Uniform Memory Access (UMA) and Nonuniform Memory Access (NUMA) multiprocessors?

Question 9

Which characteristic is a primary way in which a Graphics Processing Unit (GPU) differs architecturally from a Central Processing Unit (CPU)?

Question 10

What is the term for the programming model that uses a single program running on all processors of a MIMD computer, with conditional statements to differentiate behavior?

Question 11

What is the primary motivation behind the Single Instruction, Multiple Thread (SIMT) architecture used in GPUs?

Question 12

In the context of Warehouse-Scale Computers (WSCs), what is meant by Request-Level Parallelism?

Question 13

What is the bisection bandwidth of a ring network topology with P processors, where each link has a bandwidth of B?
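
One way to reason about this question is to build the ring explicitly and count the links that cross a cut separating the nodes into two equal halves. The sketch below assumes an even P of at least 4, bidirectional links counted once each, and a contiguous split, which for a ring is a minimum cut. All names are illustrative.

```python
def ring_links(p):
    """Links of a ring: node i connects to node (i + 1) mod p."""
    return [(i, (i + 1) % p) for i in range(p)]


def crossing_links(links, half):
    """Count links with exactly one endpoint inside the given half."""
    return sum(1 for a, b in links if (a in half) != (b in half))


def ring_bisection_bandwidth(p, link_bandwidth):
    """Bandwidth across a cut that splits the ring into two equal halves."""
    half = set(range(p // 2))  # nodes 0 .. p/2 - 1 form one half
    return crossing_links(ring_links(p), half) * link_bandwidth


for p in (4, 8, 64):
    print(p, ring_bisection_bandwidth(p, link_bandwidth=1.0))
```

The count of crossing links stays the same as P grows, so the bisection bandwidth of a ring does not scale with the number of processors.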

Question 14

In the Roofline performance model, what does the term 'arithmetic intensity' represent?

Question 15

In the comparison of the Intel Core i7 960 and the NVIDIA GTX 280, what was a key architectural feature present in the GPU but missing from the CPU's SIMD extensions that provided a significant performance advantage for the GJK kernel?

Question 16

When optimizing the DGEMM routine on a 16-core system, what was the performance impact of adding threads for a small matrix size (32x32) that fits entirely in the first-level data cache?

Question 17

Why is the statement 'Amdahl's Law doesn't apply to parallel computers' considered a fallacy?

Question 18

What is the key advantage of a vector architecture's approach to memory access for adjacent elements compared to a scalar architecture?

Question 19

What does the OpenMP pragma '#pragma omp parallel for' accomplish when placed before a for loop in C?

Question 20

What is the primary reason that clusters, which are composed of independent computers, have become the dominant architecture for large-scale internet services?

Question 21

In a hypothetical speed-up scenario involving a matrix sum, increasing the matrix dimension from 10x10 to 20x20 resulted in a much better speed-up on 40 processors. What concept does this illustrate?

Question 22

What is a primary advantage of vector instructions over the multimedia SIMD extensions found in architectures like the x86?

Question 23

On a single core of an Intel i7 processor with Simultaneous Multithreading (SMT), what was the average speed-up and energy efficiency improvement for the PARSEC benchmarks?

Question 24

What is the primary function of the `__syncthreads()` intrinsic in the CUDA programming model?

Question 25

In the context of the Roofline Model for an AMD Opteron X2, a kernel with an arithmetic intensity of 0.5 FLOPs/byte is primarily limited by what factor?
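
The Roofline bound is the smaller of peak floating-point throughput and arithmetic intensity times peak memory bandwidth. The sketch below applies that rule. The question does not state the machine's peaks, so the 17.6 GFLOP/s and 16 GB/s values used here are assumptions for illustration, and the function names are hypothetical.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: min(compute roof, intensity * memory-bandwidth roof)."""
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def limiting_factor(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Name the roof that the kernel hits at this arithmetic intensity."""
    ridge_point = peak_gflops / peak_bandwidth_gbs  # FLOPs/byte where roofs meet
    if arithmetic_intensity < ridge_point:
        return "memory bandwidth"
    return "peak floating-point performance"


# Assumed machine parameters (not given in the question itself).
PEAK_GFLOPS = 17.6
PEAK_BANDWIDTH_GBS = 16.0

ai = 0.5  # FLOPs per byte, from the question
print(attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS))
print(limiting_factor(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS))
```

Under these assumed peaks the ridge point sits slightly above 1 FLOP/byte, so a kernel at 0.5 FLOPs/byte falls on the slanted part of the roofline.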

Question 26

What is the term for a multiprocessor in which the latency to any word in main memory is approximately the same regardless of which processor initiates the access?

Question 27

What is the primary purpose of a 'reduction' operation in parallel programming?

Question 28

In the CUDA memory model, which memory space is private to each individual thread and is used for purposes like stack frames and register spilling?

Question 29

What is the primary drawback of a fully connected network topology in a multiprocessor system?

Question 30

Which of these is NOT one of the 'three Cs' used as a model for classifying cache misses?

Question 31

In a load balancing scenario where a 400t parallel workload is distributed among 40 processors, what is the execution time if one processor has 12.5 percent of the load and the sequential overhead is 10t?
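
The effect of load imbalance in this scenario can be worked through numerically. The sketch below assumes the sequential part runs by itself before the parallel part, that the remaining load is split evenly over the other processors, and that the parallel phase ends when the slowest processor finishes. Times are in units of t, and the function name is illustrative.

```python
def execution_time(parallel_work, processors, heaviest_share, sequential_work):
    """Total time when one processor holds `heaviest_share` of the parallel load."""
    heaviest = heaviest_share * parallel_work
    # The rest of the load is assumed to be spread evenly over the others.
    per_other = (parallel_work - heaviest) / (processors - 1)
    # The parallel phase is as slow as its most heavily loaded processor.
    return sequential_work + max(heaviest, per_other)


imbalanced = execution_time(400, 40, heaviest_share=0.125, sequential_work=10)
balanced = execution_time(400, 40, heaviest_share=1 / 40, sequential_work=10)
print(f"imbalanced: {imbalanced}t, balanced: {balanced}t")
```

Comparing the two results shows how a single overloaded processor dominates the parallel phase even though the other 39 sit mostly idle.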

Question 32

The DAXPY loop from the Linpack benchmark is used as an example of vectorization. What does the acronym DAXPY stand for?

Question 33

What is the primary advantage that Simultaneous Multithreading (SMT) has over both fine-grained and coarse-grained multithreading on a superscalar processor?

Question 34

What is a 'thread block' in the CUDA programming model?

Question 35

In the MapReduce framework, what are the two primary programmer-supplied functions?

Question 36

What does a single CUDA thread correspond to in the SIMT hardware execution model?

Question 37

Which of these is NOT a key architectural difference between multicore CPUs with SIMD and GPUs as summarized in Figure 6.11?

Question 38

What is the term for the collection of all data transfer instructions, arithmetic/logical instructions, control instructions, and floating-point instructions that a processor can execute?

Question 39

What is the main purpose of register windows, a feature unique to the SPARC architecture?

Question 40

In the ARM architecture, what is the purpose of the BX (Branch and Exchange) instruction?

Question 41

Which statement accurately describes the characteristics of the Thumb instruction set compared to the ARM instruction set?

Question 42

What is the primary trade-off in SIMD multiprocessors, as exemplified by the comparison between the Connection Machine and the Illiac IV?

Question 43

In the scenario of baking three blueberry pound cakes with only one oven, one bowl, and one mixer, what is the primary bottleneck?

Question 44

In the context of the loop `for (j=2;j<=1000;j++) D[j] = D[j-1]+D[j-2];`, what is the term for the dependency of one iteration's calculation on the results of previous iterations?
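
To see why the iterations of this loop cannot simply be handed to different processors, the sketch below mirrors the C loop in Python on a short array and compares it with a hypothetical variant in which every iteration reads only the original input values, as it would if all iterations ran at once with no ordering. The function names and the six-element array are illustrative.

```python
def fill_in_order(d):
    """Same recurrence as the C loop: each step reads the two previous results."""
    d = list(d)
    for j in range(2, len(d)):
        d[j] = d[j - 1] + d[j - 2]
    return d


def fill_as_if_independent(d):
    """Hypothetical: every iteration sees only the original, stale values."""
    original = list(d)
    result = list(d)
    for j in range(2, len(d)):
        result[j] = original[j - 1] + original[j - 2]
    return result


start = [1, 1, 0, 0, 0, 0]
print(fill_in_order(start))           # in-order result
print(fill_as_if_independent(start))  # differs once stale values are read
```

The two results diverge from the second computed element onward, because each iteration needs values that an earlier iteration produces.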

Question 45

What is the primary purpose of using 'Software as a Service (SaaS)' in the context of cloud computing and WSCs?

Question 46

In a symmetric multicore processor (SMP) with four cores, if Core 1 executes `x = 2;` and Core 2 executes `y = 2;` while Core 4 executes `z = x + y;`, which of the following is a possible resulting value for z if there is no synchronization?

Question 47

What is the primary reason the Illiac IV, an early SIMD supercomputer, is considered to have failed as a computer project?

Question 48

The final optimized C version of the DGEMM routine for the Intel Core i7, incorporating subword parallelism, instruction-level parallelism, cache blocking, and thread-level parallelism, was how many times faster than the unoptimized version for a 960x960 matrix?

Question 49

What does the term 'vector lane' refer to in the context of a modern vector processor architecture?

Question 50

What is the primary reason that a multicore processor with a single physical address space is referred to as a Shared Memory Multiprocessor (SMP)?
