## Computer Architecture I Midterm II

Chinese Name:

Pinyin Name:

E-Mail ... @shanghaitech.edu.cn:

| Question | Points | Score |
|----------|--------|-------|
| 1        | 18     |       |
| 2        | 28     |       |
| 3        | 20     |       |
| 4        | 16     |       |
| 5        | 18     |       |
| Total:   | 100    |       |

- This test contains 7 numbered pages, including the cover page. The back of each page is blank and can be used for scratch-work, but will not be looked at for grading.
- Put your pinyin name on the top of every page.
- Please turn **off** all cell phones, smartwatches, and other mobile devices. Remove all hats and headphones. Put everything in your backpack. Place your backpacks, laptops and jackets under your seat.
- You have 85 minutes to complete this exam. The exam is closed book; no computers, phones, or calculators are allowed. You may use one A4 page (front and back) of notes in addition to the provided green sheet.
- The estimated time needed for each of the 5 topics is given in parentheses. The total estimated time is 75 minutes.
- There may be partial credit for incomplete answers; write as much of the solution as you can. We will deduct points if your solution is far more complicated than necessary. When we provide a blank, please fit your answer within the space provided.
- Answer all questions in English. Answers in Chinese get 50% of the score deducted.

## 1. Synchronous Digital Systems (20 minutes)

Consider the finite state machine circuit shown below. It has one input, x, and one output, y. A clock signal (not shown in the circuit diagram) is connected to each flip-flop and has a period of 6 ns.



Figure 1

The xor gate and the inverter each have a propagation delay of 1ns. Flip-flops are *positive* edge-triggered and have a set-up time and clock-to-q delay of 1ns each.

(a) Assume the flip-flops both start out storing a logic 0. An input signal is applied as shown below. Neatly draw the waveforms for the output signal, y, and the signal at the node labeled s1. (You might want to practice first. This might take a little time to get right, so maybe do this last.)




(b) In the diagram below, finish drawing in the state transition diagram for the circuit shown on the previous page (reproduced above). The bubbles below represent the possible states of the circuit as represented by the values held in the flip-flops, s1s0.

The arcs indicate how the finite state machine transitions from one state to another on each clock cycle, based on the input value. Each arc is labeled with the input value (x) that causes the transition (on the rising edge of the clock). For instance, if the circuit is in the 00 state and x=1, then on the rising edge of the clock the circuit moves to state 01.

| Curr1Curr0 | input | Next1Next0 |
|------------|-------|------------|
| 00         | 0     | 10         |
| 00         | 1     | 01         |
| 01         | 0     | 00         |
| 01         | 1     | 11         |
| 10         | 0     | 11         |
| 10         | 1     | 00         |
| 11         | 0     | 01         |
| 11         | 1     | 10         |
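The transition table can be checked with a short table-driven simulation. This is a minimal sketch and not part of the exam: the helper names `step`, `run`, and `formula` are hypothetical, and the closed form in `formula` is only inferred from the table rows, because the circuit figure is not reproduced here. The row for state 00 with input 0 is taken to lead to 10, since the text states that input 1 leads to 01.

```python
# Transition table keyed by (current state s1s0, input x) -> next state s1s0.
# Assumption: the (00, 0) entry is the only input value left for state 00.
TRANSITIONS = {
    ("00", 0): "10", ("00", 1): "01",
    ("01", 0): "00", ("01", 1): "11",
    ("10", 0): "11", ("10", 1): "00",
    ("11", 0): "01", ("11", 1): "10",
}


def step(state, x):
    """Return the next state after one rising clock edge."""
    return TRANSITIONS[(state, x)]


def run(inputs, state="00"):
    """Apply a sequence of input bits and return every visited state."""
    visited = [state]
    for x in inputs:
        state = step(state, x)
        visited.append(state)
    return visited


def formula(state, x):
    """Hypothetical closed form that reproduces every row of the table."""
    s1, s0 = int(state[0]), int(state[1])
    next1 = 1 - (s0 ^ x)  # inverted xor of s0 and x
    next0 = s1 ^ x        # xor of s1 and x
    return f"{next1}{next0}"


print(run([1, 1, 0, 1]))
```

Comparing `formula` against every entry of `TRANSITIONS` is a quick way to confirm that the table is internally consistent.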

(c) What is the maximum clock frequency for the correct operation of this circuit? Assume that on all cycles where the input signal (x) changes, it changes 1 ns after the rising edge of the clock.

(c) <u>250 MHz</u>: the circuit needs to wait 4 ns per cycle, and 1 / (4 ns) = 250 MHz.
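The 4 ns figure can be reproduced by summing delays along the longest path. This is a sketch under an assumption: the critical path is taken to run from a flip-flop output (or from x, which changes 1 ns after the clock edge) through the xor gate and the inverter to a flip-flop input. The figure is not reproduced here, so that path is inferred, not confirmed.

```python
# Delays in nanoseconds, taken from the problem statement.
CLK_TO_Q = 1
XOR_DELAY = 1
INVERTER_DELAY = 1
SETUP = 1

# Assumed longest path: flip-flop output -> xor -> inverter -> flip-flop input.
min_period_ns = CLK_TO_Q + XOR_DELAY + INVERTER_DELAY + SETUP

# A period of 1 ns corresponds to 1000 MHz.
max_freq_mhz = 1000 / min_period_ns

print(min_period_ns, max_freq_mhz)
```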


## 2. MIPS Datapath (15 minutes):



Figure 2: datapath stages

Consider adding the following instruction to MIPS (disregard any existing definitions you may see on the green sheet):

| Instruction      | Operation                                     |
|------------------|-----------------------------------------------|
| movz rd, rs, rt  | if (RF[rs] == 0) RF[rd] $\leftarrow$ RF[rt]   |
| movnz rd, rs, rt | if (RF[rs] != 0) RF[rd] $\leftarrow$ RF[rt]   |

(a) Translate the following C code using movz and movnz. Do not use branches.

| C code                       | MIPS                   |
|------------------------------|------------------------|
| // a->\$s0, b->\$s1, c->\$s2 | slt \$t0, \$s1, \$s2   |
| int a = b < c ? b : c;       | movnz \$s0, \$t0, \$s1 |
|                              | movz \$s0, \$t0, \$s2  |
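The branch-free translation can be sanity-checked with a small register-file model. This is an illustrative sketch only: the Python functions `slt`, `movz`, `movnz`, and `select_smaller` are hypothetical stand-ins that mirror the operation column of the instruction table, and the register file is modeled as a plain dictionary.

```python
def slt(rf, rd, rs, rt):
    """slt rd, rs, rt: RF[rd] = 1 if RF[rs] < RF[rt], else 0."""
    rf[rd] = 1 if rf[rs] < rf[rt] else 0


def movz(rf, rd, rs, rt):
    """movz rd, rs, rt: copy RF[rt] into RF[rd] only when RF[rs] == 0."""
    if rf[rs] == 0:
        rf[rd] = rf[rt]


def movnz(rf, rd, rs, rt):
    """movnz rd, rs, rt: copy RF[rt] into RF[rd] only when RF[rs] != 0."""
    if rf[rs] != 0:
        rf[rd] = rf[rt]


def select_smaller(b, c):
    """Run the three-instruction sequence with b in $s1 and c in $s2."""
    rf = {"$s0": 0, "$s1": b, "$s2": c, "$t0": 0}
    slt(rf, "$t0", "$s1", "$s2")    # $t0 = (b < c)
    movnz(rf, "$s0", "$t0", "$s1")  # a = b when b < c
    movz(rf, "$s0", "$t0", "$s2")   # a = c otherwise
    return rf["$s0"]


print(select_smaller(3, 7), select_smaller(7, 3), select_smaller(5, 5))
```

Exactly one of the two conditional moves writes `$s0`, which is why no branch is needed.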

(b) Implement <u>movz</u> (but not movnz) in the datapath. Choose the correct implementation for (a), (b), and (c). Note that you do not need to use all the signals provided to each box, and that the control signal MOVZ is 1 if and only if the instruction is <u>movz</u>.

(a) <u>3</u>



(b)<u>2</u>

(c)<u>1</u>






- (c) Name each datapath stage in Fig. 2 (fill in the boxes in Fig. 2).
- (d) Implement the next-PC logic in Fig. 2 by drawing the circuit. You may want to try it first on a blank piece of paper. You can get a new page 4 with the datapath if you need it, but be sure to submit only one page!
- (e) Generate the control signals for **movz**. Each value should be 0, 1, or X (don't care). You **must** use don't-care terms where possible.

| MOVZ | RegDst | ExtOp | RegWr | ALUSrc | ALUCtr              | MEMWr | MemToReg | Jump | Branch |
|------|--------|-------|-------|--------|---------------------|-------|----------|------|--------|
| 1    | 1      | X     | 0     | 0      | 0001, 0010, or 0110 | 0     | X        | 0    | 0      |

This table shows the ALUCtr values for each operation of the ALU:

| Operation | AND  | OR   | ADD  | SUB  | SLT  | NOR  |
|-----------|------|------|------|------|------|------|
| ALUCtr    | 0000 | 0001 | 0010 | 0110 | 0111 | 1100 |

## 3. Cache Operations (10 minutes)
- (a) Consider a 32-bit physical memory space. We have 2 possible data caches: cache X is a 64 KB 4-way set-associative cache with LRU replacement, while cache Y is a 64 KB direct-mapped cache. Both caches have a block size of 32 B. Write the number of bits in the tag, index, and offset fields in the table below.

Solution:

| Cache | Tag bits | Index bits | Offset bits |
|-------|----------|------------|-------------|
| X     | 18       | 9          | 5           |
| Y     | 16       | 11         | 5           |
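The field widths follow from the cache geometry: offset bits come from the block size, index bits from the number of sets, and the tag takes whatever remains of the 32-bit address. A minimal sketch of that arithmetic, assuming all sizes are powers of two (the function name `cache_fields` is made up for this example):

```python
def cache_fields(address_bits, cache_bytes, block_bytes, ways):
    """Return (tag, index, offset) bit counts for a set-associative cache.

    Assumes cache_bytes, block_bytes, and ways are powers of two.
    """
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1  # log2 of the block size
    index_bits = num_sets.bit_length() - 1      # log2 of the number of sets
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


print(cache_fields(32, 64 * 1024, 32, ways=4))  # cache X
print(cache_fields(32, 64 * 1024, 32, ways=1))  # cache Y (direct-mapped)
```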


- (b) Assume we are using cache X in part (a).
  - `int ARRAY_SIZE = 64*1024;`
  - `int arr[ARRAY_SIZE]; // *arr is aligned to a cache block`
  - `/* loop 1 */ for (int i = 0; i < ARRAY_SIZE; i += 4) arr[i] = i;`
  - `/* loop 2 */ for (int i = ARRAY_SIZE - 4; i >= 0; i -= 4) arr[i+1] = arr[i];`
  - (a) What is the hit rate of loop 1? What types of misses (of the 3Cs), if any, occur as a result of loop 1?

Solution: 50% hit rate, Compulsory misses.

(b) What is the hit rate of loop 2? What types of misses (of the 3Cs), if any, occur as a result of loop 2?

Solution: 13/16 hit rate, Capacity misses.
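Both hit rates can be reproduced by replaying the array accesses against a small model of cache X. The sketch below rests on assumptions not stated in the problem: `int` is 4 bytes, the cache is write-allocate, the loop variables stay in registers, and `arr` starts at byte address 0. The helper names are hypothetical.

```python
from collections import OrderedDict

BLOCK_BYTES = 32
WAYS = 4
NUM_SETS = 64 * 1024 // (BLOCK_BYTES * WAYS)  # 512 sets for cache X
ARRAY_SIZE = 64 * 1024
INT_BYTES = 4  # assumption: 4-byte int


def make_cache():
    """One OrderedDict per set; insertion order tracks recency (oldest first)."""
    return [OrderedDict() for _ in range(NUM_SETS)]


def access(cache, addr):
    """Touch one byte address. Return True on a hit; a miss allocates with LRU eviction."""
    block = addr // BLOCK_BYTES
    lines = cache[block % NUM_SETS]
    tag = block // NUM_SETS
    if tag in lines:
        lines.move_to_end(tag)     # mark as most recently used
        return True
    if len(lines) == WAYS:
        lines.popitem(last=False)  # evict the least recently used line
    lines[tag] = True
    return False


def hit_rate(cache, addresses):
    results = [access(cache, a) for a in addresses]
    return sum(results) / len(results)


cache = make_cache()

# loop 1: one write to arr[i] per iteration.
loop1 = [INT_BYTES * i for i in range(0, ARRAY_SIZE, 4)]

# loop 2: a read of arr[i] followed by a write to arr[i+1].
loop2 = []
for i in range(ARRAY_SIZE - 4, -1, -4):
    loop2.append(INT_BYTES * i)
    loop2.append(INT_BYTES * (i + 1))

rate1 = hit_rate(cache, loop1)
rate2 = hit_rate(cache, loop2)
print(rate1, rate2)
```

Loop 2 runs on the cache state left behind by loop 1. The last quarter of the array is still resident, so it hits on all four accesses per block, while the remaining three quarters hit on three of four: 1/4 + 3/4 × 3/4 = 13/16.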

## 4. AMAT Calculation (10 minutes)

Suppose your system consists of:

- An L1 cache that hits in 2 cycles and has a local miss rate of 20%.
- An L2 cache that hits in 10 cycles and has a global miss rate of 5%.

Main memory hits in 100 cycles.

(a) What is the local miss rate of L2 cache? (write down how you calculated it - not just a number)

**Solution:** Local miss rate = global miss rate of L2 / miss rate of L1 = 5% / 20% = 0.25 = 25%.

(b) What is the AMAT of the system? (write down how you calculated it - not just a number)

**Solution:** AMAT =  $2 + 20\% \times 10 + 5\% \times 100 = 9$ 

(c) Suppose we want to reduce the AMAT of the system to 8 or lower by adding in a L3 cache. If the L3 cache has a local miss rate of 30%, what is the largest hit time that the L3 cache can have?

**Solution:** Let H = hit time of the L3 cache. Using the AMAT equation, we can write: $2 + 20\% \times (10 + 25\% \times (H + 30\% \times 100)) \le 8$. Solving for H, we find that $H \le 50$. So the largest hit time is 50 cycles.
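The three AMAT answers can be reproduced with exact rational arithmetic. This is a minimal sketch: the constant and function names are chosen for the example, and `Fraction` is used only to avoid floating-point rounding in the comparisons.

```python
from fractions import Fraction

L1_HIT = 2
L1_LOCAL_MISS = Fraction(20, 100)
L2_HIT = 10
L2_GLOBAL_MISS = Fraction(5, 100)
MEM_TIME = 100

# (a) local miss rate of L2 = global miss rate of L2 / miss rate of L1
l2_local_miss = L2_GLOBAL_MISS / L1_LOCAL_MISS

# (b) AMAT of the two-level system
amat = L1_HIT + L1_LOCAL_MISS * (L2_HIT + l2_local_miss * MEM_TIME)


def amat_with_l3(l3_hit, l3_local_miss=Fraction(30, 100)):
    """AMAT after inserting an L3 cache between L2 and main memory."""
    l3_penalty = l3_hit + l3_local_miss * MEM_TIME
    return L1_HIT + L1_LOCAL_MISS * (L2_HIT + l2_local_miss * l3_penalty)


def max_l3_hit(target, l3_local_miss=Fraction(30, 100)):
    """(c) Largest L3 hit time that keeps the AMAT at or below target."""
    budget = (target - L1_HIT) / L1_LOCAL_MISS  # cycles allowed after an L1 miss
    budget = (budget - L2_HIT) / l2_local_miss  # cycles allowed after an L2 miss
    return budget - l3_local_miss * MEM_TIME


print(l2_local_miss, amat, max_l3_hit(8))
```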


## 5. Number Representation (20 minutes)
- (a) Given 0b10110100, what is this number in decimal if we are using:

| Representation   | Value |
|------------------|-------|
| Unsigned         | 180   |
| Signed magnitude | -52   |
| 1's complement   | -75   |
| 2's complement   | -76   |
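The four readings of the same bit pattern can be computed directly. The sketch below uses a hypothetical helper, `interpretations`, and assumes an 8-bit width by default.

```python
def interpretations(bits, width=8):
    """Decode one bit pattern under four integer representations."""
    sign = bits >> (width - 1)                   # most significant bit
    magnitude = bits & ((1 << (width - 1)) - 1)  # remaining width-1 bits
    return {
        "unsigned": bits,
        "signed magnitude": -magnitude if sign else magnitude,
        # 1's complement: a negative pattern equals bits - (2^width - 1)
        "1's complement": bits - ((1 << width) - 1) if sign else bits,
        # 2's complement: a negative pattern equals bits - 2^width
        "2's complement": bits - (1 << width) if sign else bits,
    }


print(interpretations(0b10110100))
```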

(b) We are defining 8-bit floating-point precision (cs110-precision), with the following format:

Sign (1 bit) | Exponent (3 bits) | Fraction (4 bits)

Assume that cs110-precision follows the same conventions as the single and double precision formats defined by the IEEE 754 standard.

(a) What should be the bias for this 3-bit exponent in cs110-precision? Leave your answer in decimal.

Solution: 3

(b) What is the binary representation of the smallest cs110-precision float which is strictly larger than 1? What are its values in binary and decimal?

0b <u>0011 0001</u> =  $1 + 2^{-4} = 1.0625$ 

(c) Conversions between decimal and cs110-precision.

 $1.25_{10} = \underline{0b0011\ 0100} \qquad 0b1001\ 0100 = \underline{-1.25 \times 2^{-2} = -0.3125}$ 
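The conversions above can be checked with a small decoder for the 8-bit format. This is a sketch under assumptions: the bias is 3, an all-zero exponent encodes denormalized values, and an all-one exponent encodes infinity or NaN, mirroring IEEE 754 conventions. The function name `decode_cs110` is invented for this example.

```python
BIAS = 3  # 2^(3-1) - 1 for a 3-bit exponent


def decode_cs110(bits):
    """Decode an 8-bit pattern: 1 sign bit, 3 exponent bits, 4 fraction bits."""
    sign = -1 if (bits >> 7) & 1 else 1
    exponent = (bits >> 4) & 0b111
    fraction = bits & 0b1111
    if exponent == 0b111:
        # Assumed IEEE-style special values.
        return sign * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:
        # Assumed IEEE-style denormals: no implicit leading 1.
        return sign * (fraction / 16) * 2.0 ** (1 - BIAS)
    return sign * (1 + fraction / 16) * 2.0 ** (exponent - BIAS)


print(decode_cs110(0b00110001))
print(decode_cs110(0b00110100))
print(decode_cs110(0b10010100))
```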

(d) What numerical errors can occur when representing numbers as floating-point (name two)?

Solution: rounding error, overflow, underflow

(e) Do you still remember that we discussed the general method of converting a decimal number into floating point in the first discussion? Can you then convert  $13.45_{10}$  into cs110-precision in binary? (This is an optional problem, but you will get a bonus if you give the correct answer or show the method.)

**Solution:** See the first discussion.