ENEE350H - Fall, 2004

Homework 5

(Due in class on Tuesday, November 2)

Practice questions (do not submit your solution):

1.   What is the main difference between a trap and an interrupt?

2.   In most computer systems, the L1 cache is not on the system bus; instead, it has a separate dedicated link to the CPU.  By considering what happens during a DMA transfer, describe one advantage of having a separate link instead of a bus connection (2-3 sentences max).

3.   In order to perform a disk or network access, it is typically necessary for a user to have the operating system communicate with the disk or network controllers.  Suppose that in a particular 500 MHz computer, it takes 10,000 cycles to trap to the OS, 20 ms for the OS to perform a disk access, and 25 us for the OS to perform a network access.

(a)  In a disk access, what percentage of the delay time is spent in trapping to the OS?  How about in a network access?

(b)  Suppose that we can somehow reduce the time for the OS to communicate with the disk controller by 60%, and the time for the OS to communicate with the network controller by 40%.  By what percentage can we reduce the total time for a network access?  By what percentage can we reduce the total time for a disk access?  Is it worthwhile to spend a lot of effort improving the OS interrupt latency in a computer that performs many disk accesses?  How about in a computer that performs many network accesses?
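The arithmetic in question 3 can be checked with a short calculation. The Python sketch below assumes that a complete access costs the trap time plus the OS access time, and that the reductions in part (b) shrink only the OS access component while the trap time stays fixed. Both are interpretations rather than statements from the problem, and the constant and function names are illustrative.

```python
# Sketch of the arithmetic in question 3.
# Assumptions (not stated in the problem): total access time = trap time +
# OS access time, and the part (b) reductions apply only to the OS access
# component; the trap time is unchanged.

CLOCK_HZ = 500e6            # 500 MHz processor
TRAP_CYCLES = 10_000        # cycles to trap to the OS
DISK_ACCESS_S = 20e-3       # 20 ms for the OS to perform a disk access
NET_ACCESS_S = 25e-6        # 25 us for the OS to perform a network access


def trap_time_s() -> float:
    """Trap latency in seconds: cycle count divided by clock frequency."""
    return TRAP_CYCLES / CLOCK_HZ


def trap_fraction(access_s: float) -> float:
    """Fraction of the total (trap + access) time spent trapping."""
    t = trap_time_s()
    return t / (t + access_s)


def total_reduction(access_s: float, access_cut: float) -> float:
    """Fractional reduction in total time when only the OS access time
    shrinks by `access_cut` (e.g. 0.6 for 60%) and the trap time is fixed."""
    t = trap_time_s()
    before = t + access_s
    after = t + access_s * (1.0 - access_cut)
    return (before - after) / before


if __name__ == "__main__":
    print(f"trap time: {trap_time_s() * 1e6:.1f} us")
    print(f"disk: {100 * trap_fraction(DISK_ACCESS_S):.3f}% of time trapping")
    print(f"network: {100 * trap_fraction(NET_ACCESS_S):.1f}% of time trapping")
    print(f"disk total reduction: {100 * total_reduction(DISK_ACCESS_S, 0.6):.2f}%")
    print(f"network total reduction: {100 * total_reduction(NET_ACCESS_S, 0.4):.2f}%")
```

Comparing the two `trap_fraction` values shows why the answer to the last part of (b) depends on how long the underlying device access is relative to the fixed trap cost.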

4.  A 32-bit DLX computer transfers a 512-byte sector (128 words) from the hard disk to main memory using DMA.  The disk reads the sector from the platter into a special memory called the disk cache, which is part of the disk controller.  The data is then transmitted from the disk cache to main memory over the bus using DMA.  Compute the total DMA transfer time.  The known parameters are:

• The bus is 4 words wide and the time to propagate a signal on the bus is 4 processor cycles.

• The disk cache is 1 word wide and has an access time of 1 processor cycle.

• The main memory is 16 words wide and has an access time of 80 processor cycles.

The transfer proceeds as follows.  First, four words are read from the disk cache into hardware registers.  When this is done, they are sent together on the bus in one transfer.  Once four successive bus transfers (16 words) have been received at the memory, they are written to memory together.  Different stages of this process are overlapped (pipelined) to the extent possible; think carefully about what can be overlapped.
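The overlap in question 4 can be explored with a small timing model. The Python sketch below treats the transfer as a three-stage pipeline (disk-cache read, bus transfer, memory write) using the parameters listed above. It assumes, which the problem does not state, that buffering between stages is unlimited, so a stage waits only for its input data and for its own previous operation and is never blocked by a full downstream buffer. The constant and function names are illustrative.

```python
# Sketch of a pipelined timing model for the DMA transfer in question 4.
# Assumption (not stated in the problem): unlimited buffering between stages,
# so each stage waits only for its input and for its own previous operation.

WORDS = 128                 # 512-byte sector of 32-bit words
GROUP = 4                   # words read from the disk cache per bus transfer
READ_PER_WORD = 1           # disk cache: 1 word wide, 1-cycle access
BUS_CYCLES = 4              # one 4-word bus transfer
MEM_CYCLES = 80             # one 16-word main-memory write
GROUPS_PER_WRITE = 4        # four bus transfers fill one 16-word memory write


def dma_cycles() -> int:
    """Cycle at which the last 16-word memory write completes."""
    n_groups = WORDS // GROUP
    read_done = 0
    bus_done = 0
    mem_done = 0
    for g in range(n_groups):
        # Stage 1: read four words, one per cycle, into hardware registers.
        read_done += GROUP * READ_PER_WORD
        # Stage 2: bus transfer starts once the group is read and the bus is free.
        bus_done = max(read_done, bus_done) + BUS_CYCLES
        # Stage 3: after every fourth bus transfer, write 16 words to memory,
        # starting once the data has arrived and the previous write is done.
        if (g + 1) % GROUPS_PER_WRITE == 0:
            mem_done = max(bus_done, mem_done) + MEM_CYCLES
    return mem_done


if __name__ == "__main__":
    print(f"DMA transfer time: {dma_cycles()} cycles")
```

With tighter buffering (for example, a single register set that cannot be refilled until its contents are on the bus, or a memory that cannot accept bus transfers while it is writing), the stage-start rules change and so does the result; adjust the `max(...)` terms to match whichever overlap you argue is possible.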

5.    Consider the same computer as in question 4, but now the sectors are transferred from the disk cache to main memory by processor-involved software transfers instead of DMA.  The transfers are done by an assembly loop of the form:

_loop:  LW    ...          /* load word from memory-mapped disk cache address */
        SW    ...          /* store word to main memory */
        ADDI  ...          /* increment the main-memory address register */
        BNEZ  ..., _loop   /* branch back to LW for the next word */

Memory-mapped locations are not cached by the processor caches.  Further, the processor caches use write-through for main memory locations.  The branch predictor is perfect.  What is the total software transfer time for the sector?  How many times faster is the DMA transfer of question 4 than this software transfer?

6.    A DLX compiler wants to perform a conditional branch of the form ‘BEQZ Rx, long_disp’, where long_disp is a 26-bit PC-relative displacement constant.  Unfortunately, DLX restricts conditional branch displacements to 16 bits, so this is not directly possible.  Write a sequence of DLX instructions that effectively performs a 26-bit PC-relative conditional branch to long_disp.  (Hint: use the J unconditional jump instruction as part of your sequence.  It allows displacements of up to 26 bits.)
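One way to reason about question 6 is to compute the displacements of a two-instruction expansion: an inverted short branch (BNEZ) that skips over an unconditional J. The Python sketch below is hypothetical: the function names are illustrative, and it assumes 4-byte instructions, no branch delay slot, and that every PC-relative displacement (including long_disp for the original BEQZ) is measured from the address of the following instruction (PC + 4). It emits numeric displacements rather than labels purely to make the offset arithmetic visible.

```python
# Hypothetical helper: expand 'BEQZ Rx, long_disp' into two DLX instructions.
# Assumptions: 4-byte instructions, no branch delay slot, and all PC-relative
# displacements measured from the next instruction's address (PC + 4).

INSTR_BYTES = 4


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a `bits`-bit two's-complement number."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def expand_long_beqz(rx: str, long_disp: int) -> list[str]:
    """Return BNEZ + J instruction text equivalent to 'BEQZ rx, long_disp'."""
    # The J sits one instruction after the BNEZ (where the BEQZ would have
    # been), so its PC + 4 is 4 bytes further along; shrink the displacement.
    j_disp = long_disp - INSTR_BYTES
    if not fits_signed(j_disp, 26):
        raise ValueError("displacement does not fit in 26 bits")
    # If rx != 0 the original branch is not taken: a displacement of 4 from
    # the BNEZ's PC + 4 lands on the instruction just after the J.
    return [
        f"BNEZ {rx}, {INSTR_BYTES}",
        f"J {j_disp}",
    ]


if __name__ == "__main__":
    for line in expand_long_beqz("R1", 0x100000):
        print(line)
```

For example, `expand_long_beqz("R1", 0x100000)` yields `BNEZ R1, 4` followed by `J 1048572`. The 4-byte adjustment to the J displacement matters because the J sits one instruction later than the original BEQZ would have; if your DLX variant uses a branch delay slot, the sequence and offsets must be adjusted accordingly.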