(Due in Class on September 30th)
Q. 1 A program on a computer has an average datacache hit rate of 96%. The cache hit time is 1 cycle; the cache miss latency is 20 cycles. Assume 25% of instructions access memory on average, and nonmemory instructions are all initiated at a rate of 1 per cycle without stalls. If the program runtime would have been 1 unit in a hypothetical situation without any cache misses, what is the runtime with the cache misses?
Q. 2. Suppose you have a machine with a 16 entry direct-mapped cache. Each cache entry can hold 256 bytes of data. A program is run on this machine that makes the following memory accesses in hexadecimal (each memory address addresses a byte of data):
3F8E, 4F8E, 4F8D, 4FC2, 0000, 0000, 2F01, 4F8E, 2F67, 2268, 2F69, 2E32
a. How many bits wide is the cache's tag field?
b. Assuming the cache is initially empty, what are the final contents of the cache after the program runs? Draw a diagram of the cache similar to the one on page 267 of the textbook, and fill in the tag fields.
c. Suppose a cache hit takes 20 ns, a cache miss takes 200 ns, and a memory access without a cache takes 160 ns. How much time is saved by this cache on this program over an equivalent cacheless machine running this program?
d. Give an example each of spatial and temporal memory access locality exhibited by this program.
Q. 3. An 1.5Ghz computer contains a disk which has a average seek time of 4ms (milliseconds), a rotational speed of 7200 rotations per minute, and a transfer time per 4Kbyte sector of 2ms.
a. What is the average access time of this disk for reading one sector?
b. The computer containing the above disk loads a program of size 20Kb stored in sequential sectors on the same track. No disk cache is present. Thereafter it executes the program for 500 million cycles. What percentage of the total (disk + CPU) time is spent accessing the disk? Assume an average seek time and rotational latency.
Q. 4. a. Write a DLX assembly code fragment (not a complete program) that sums all the elements of an integer array. Use register indirect addressing only for the array. Assume the base address of the array has been assigned to register R1 before the entry to the fragment, and the number of elements in the array has been assigned to register R2. Compute the sum of all the elements into register R3. Use no additional registers in your code beyond R1, R2, R3, and R4. Remember that integers are 4 bytes long. You can assume that the array contains at least one element.
b. Re-do part a, but this time using only "based-index addressing" for the array. Assume such a mode is added to DLX using a load instruction of the form LW Rx, Ry(Rz) having the semantic Rx <- MEM[Ry+Rz]. All the register constraints in part a still apply. [Hint: Your answer should have fewer instructions inside the loop as compared to part a.]