Parallel Processors from Client to Cloud
50 questions available
Questions
What is the primary challenge in parallel programming that is analogous to reporters spending too much time communicating instead of writing their own parts of a story?
To achieve a speed-up of 90 times with 100 processors, what is the maximum allowable percentage of the original computation that can be sequential, according to Amdahl's Law?
What is the key difference between strong scaling and weak scaling in the context of parallel performance measurement?
In the Flynn taxonomy of computer architectures, which category represents a conventional uniprocessor with a single instruction stream and a single data stream?
What is the primary purpose of hardware multithreading in a processor?
How does coarse-grained multithreading differ from fine-grained multithreading?
What is the defining characteristic of a shared memory multiprocessor (SMP)?
What is the primary difference between Uniform Memory Access (UMA) and Nonuniform Memory Access (NUMA) multiprocessors?
Which characteristic is a primary architectural difference between a Graphics Processing Unit (GPU) and a Central Processing Unit (CPU)?
What is the term for the programming model that uses a single program running on all processors of a MIMD computer, with conditional statements to differentiate behavior?
What is the primary motivation behind the Single Instruction, Multiple Thread (SIMT) architecture used in GPUs?
In the context of Warehouse-Scale Computers (WSCs), what is meant by Request-Level Parallelism?
What is the bisection bandwidth of a ring network topology with P processors, where each link has a bandwidth of B?
In the Roofline performance model, what does the term 'arithmetic intensity' represent?
In the comparison of the Intel Core i7 960 and the NVIDIA GTX 280, what was a key architectural feature present in the GPU but missing from the CPU's SIMD extensions that provided a significant performance advantage for the GJK kernel?
When optimizing the DGEMM routine on a 16-core system, what was the performance impact of adding threads for a small matrix size (32x32) that fits entirely in the first-level data cache?
What is the primary fallacy associated with the statement 'Amdahl's Law doesn’t apply to parallel computers'?
What is the key advantage of a vector architecture's approach to memory access for adjacent elements compared to a scalar architecture?
What does the OpenMP pragma '#pragma omp parallel for' accomplish when placed before a for loop in C?
What is the primary reason that clusters, which are composed of independent computers, have become the dominant architecture for large-scale internet services?
In a hypothetical speed-up scenario involving a matrix sum, increasing the matrix dimension from 10x10 to 20x20 resulted in a much better speed-up on 40 processors. What concept does this illustrate?
What is a primary advantage of vector instructions over the multimedia SIMD extensions found in architectures like the x86?
On a single core of an Intel i7 processor with Simultaneous Multithreading (SMT), what were the average speed-up and energy efficiency improvement for the PARSEC benchmarks?
What is the primary function of the `__syncthreads()` intrinsic in the CUDA programming model?
In the context of the Roofline Model for an AMD Opteron X2, a kernel with an arithmetic intensity of 0.5 FLOPs/byte is primarily limited by what factor?
What is the term for a multiprocessor in which the latency to any word in main memory is approximately the same regardless of which processor initiates the access?
What is the primary purpose of a 'reduction' operation in parallel programming?
In the CUDA memory model, which memory space is private to each individual thread and is used for purposes like stack frames and register spilling?
What is the primary drawback of a fully connected network topology in a multiprocessor system?
Which of these is NOT one of the 'three Cs' used as a model for classifying cache misses?
In a load balancing scenario where a 400t parallel workload is distributed among 40 processors, what is the execution time if one processor has 12.5 percent of the load and the sequential overhead is 10t?
The DAXPY loop from the Linpack benchmark is used as an example of vectorization. What does the acronym DAXPY stand for?
What is the primary advantage that Simultaneous Multithreading (SMT) has over both fine-grained and coarse-grained multithreading on a superscalar processor?
What is a 'thread block' in the CUDA programming model?
In the MapReduce framework, what are the two primary programmer-supplied functions?
What does a single CUDA thread correspond to in the SIMT hardware execution model?
Which of these is NOT a key architectural difference between multicore CPUs with SIMD and GPUs as summarized in Figure 6.11?
What is the term for the collection of all data transfer instructions, arithmetic/logical instructions, control instructions, and floating-point instructions that a processor can execute?
What is the main purpose of register windows, a distinctive feature of the SPARC architecture?
In the ARM architecture, what is the purpose of the BX (Branch and Exchange) instruction?
Which statement accurately describes the characteristics of the Thumb instruction set compared to the ARM instruction set?
What is the primary trade-off in SIMD multiprocessors, as exemplified by the comparison between the Connection Machine and the Illiac IV?
In the scenario of baking three blueberry pound cakes with only one oven, one bowl, and one mixer, what is the primary bottleneck?
In the context of the loop `for (j=2;j<=1000;j++) D[j] = D[j-1]+D[j-2];`, what is the term for the dependency of one iteration's calculation on the results of previous iterations?
What is the primary purpose of using 'Software as a Service (SaaS)' in the context of cloud computing and WSCs?
In a four-core shared memory multiprocessor (SMP), if Core 1 executes `x = 2;` and Core 2 executes `y = 2;` while Core 4 executes `z = x + y;`, which of the following is a possible resulting value for z if there is no synchronization?
What is the primary reason the Illiac IV, an early SIMD supercomputer, is considered to have failed as a computer project?
The final optimized C version of the DGEMM routine for the Intel Core i7, incorporating subword parallelism, instruction-level parallelism, cache blocking, and thread-level parallelism, was how many times faster than the unoptimized version for a 960x960 matrix?
What does the term 'vector lane' refer to in the context of a modern vector processor architecture?
What is the primary reason that a multicore processor with a single physical address space is referred to as a Shared Memory Multiprocessor (SMP)?