## CS 110 Computer Architecture Lecture 11: *Finite State Machines, Functional Units*

Instructors: Sören Schwertfeger & Chundong Wang

School of Information Science and Technology (SIST)

ShanghaiTech University

Slides based on UC Berkeley's CS61C

## Midterm I

- Postponed to April 19
  - Re-evaluate situation on April 12...
- Contents stay the same:
  - Everything up to and including Datapath (next lecture)

#### Levels of Representation/Interpretation



## Outline

- Timing and Critical Path
- Finite State Machine
- Admin
- Functional Units
  - Multiplexer
  - ALU
  - Adder/Subtractor

## Representations of Combinational Logic (groups of logic gates)



## Type of Circuits

- Synchronous Digital Systems consist of two basic types of circuits:
  - Combinational Logic (CL) circuits
    - Output is a function of the inputs only, not the history of its execution
    - E.g., circuits to add A, B (ALUs)
  - Sequential Logic (SL)
    - Circuits that "remember" or store information
    - aka "State Elements"
    - E.g., memories and registers (Registers)

## Model for Synchronous Systems



- Collection of Combinational Logic blocks separated by registers
- Feedback is optional
- Clock signal(s) connects only to clock input of registers
- Clock (CLK): steady square wave that synchronizes the system
- Register: several bits of state that samples on rising edge of CLK (positive edge-triggered) or falling edge (negative edge-triggered)

## **Accumulator Example**

Why do we need to control the flow of information?



Assume:

- Each X value is applied in succession, one per cycle
- After n cycles the sum is present on S

## First Try: Does this work?



#### No!

- Reason #1: How to control the next iteration of the 'for' loop?
- Reason #2: How do we say: 'S=0'?

## Second Try: How About This?



## **Register Internals**



- n instances of a "Flip-Flop"
- Called a "flip-flop" because the output flips and flops between 0 and 1
- D is "data input", Q is "data output"
- Also called "D-type Flip-Flop"

## **Flip-Flop Operation**

- Edge-triggered D-type flip-flop
  - This one is "positive edge-triggered"
- "On the rising edge of the clock, the input d is sampled and transferred to the output. At all other times, the input d is ignored."



How a flip flop works: <u>https://www.youtube.com/watch?v=-aQH0ybMd3U</u>
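The quoted behavior can be sketched as a small behavioral model. This is only an illustration: the class name `DFlipFlop`, the `tick` method, and the sample waveforms are assumptions made here, and setup, hold, and CLK-to-Q delays are ignored.

```python
class DFlipFlop:
    """Positive edge-triggered D flip-flop (behavioral sketch, no timing)."""

    def __init__(self):
        self.q = 0          # stored bit, visible on the output Q
        self._prev_clk = 0  # last clock level seen, used to detect edges

    def tick(self, clk, d):
        """Present clock level `clk` and data input `d`; return output Q."""
        if self._prev_clk == 0 and clk == 1:
            self.q = d      # rising edge: sample D and transfer it to Q
        self._prev_clk = clk  # at all other times D is ignored
        return self.q


# Hypothetical waveforms, one sample per time step.
ff = DFlipFlop()
clk_wave = [0, 1, 1, 0, 0, 1, 1, 0]
d_wave = [1, 1, 0, 0, 1, 0, 1, 1]
q_wave = [ff.tick(c, d) for c, d in zip(clk_wave, d_wave)]
print(q_wave)  # [0, 1, 1, 1, 1, 0, 0, 0]
```

Q changes only at the two rising edges (steps 1 and 5), even though D toggles in between.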

## Flip-Flop Timing

- Edge-triggered D-type flip-flop
  - This one is "positive edge-triggered"
- "On the rising edge of the clock, the input d is sampled and transferred to the output. At all other times, the input d is ignored."
- Example waveforms (more detail):



## Hardware Timing Terms

- Setup Time: how long the input must be stable before the edge of the CLK
- Hold Time: how long the input must be stable after the edge of the CLK
- "CLK-to-Q" Delay: how long it takes the output to change, measured from the edge of the CLK

## Accumulator Timing 1/2



- Reset input to register is used to force it to all zeros (takes priority over D input).
- S<sub>i-1</sub> holds the result of the (i-1)<sup>th</sup> iteration.
- Analyze circuit timing starting at the output of the register.
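A cycle-level sketch of the accumulator, assuming a register with a synchronous reset that takes priority over D. The names `Register`, `clock_edge`, and `accumulate` are illustrative, not from the slides, and all gate delays are ignored.

```python
class Register:
    """State element: output q changes only on a clock edge."""

    def __init__(self):
        self.q = 0

    def clock_edge(self, d, reset=False):
        # Reset takes priority over the D input and forces all zeros.
        self.q = 0 if reset else d


def accumulate(xs):
    """Apply one X value per cycle; return S after len(xs) cycles."""
    reg = Register()
    reg.clock_edge(d=0, reset=True)   # this is how we say 'S = 0'
    for x in xs:
        s_i = reg.q + x               # adder: combinational, uses S_(i-1)
        reg.clock_edge(d=s_i)         # register captures S_i on the edge
    return reg.q


print(accumulate([3, 1, 4, 1, 5]))  # 14
```

The register is what separates one loop iteration from the next: the adder output only becomes the new S<sub>i-1</sub> at a clock edge.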



## Accumulator Timing 2/2


- Reset signal shown.
- Also, in practice X might not arrive at the adder at the same time as S<sub>i-1</sub>.
- S<sub>i</sub> is temporarily wrong, but the register always captures the correct value.
- In good circuits, instability never happens around the rising edge of CLK.

Delays annotated in the waveform: t<sub>add</sub> and t<sub>CLK-to-Q</sub>.

## Maximum Clock Frequency

- What is the maximum frequency of this circuit?



Hint: Frequency = 1/Period

Max Delay = CLK-to-Q Delay + CL Delay + Setup Time
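As a worked example of the two formulas above, here is a minimal sketch. The function name and the delay values are made up for illustration and are not the values of the circuit in the figure.

```python
def max_frequency_ghz(t_clk_to_q_ns, t_cl_ns, t_setup_ns):
    """Return the maximum clock frequency in GHz for delays given in ns.

    Max Delay = CLK-to-Q delay + combinational logic delay + setup time,
    and Frequency = 1 / Period (1 / ns = GHz).
    """
    max_delay_ns = t_clk_to_q_ns + t_cl_ns + t_setup_ns
    return 1.0 / max_delay_ns


# Hypothetical numbers: 0.5 ns CLK-to-Q, 3 ns of logic, 0.5 ns setup.
f_max = max_frequency_ghz(t_clk_to_q_ns=0.5, t_cl_ns=3.0, t_setup_ns=0.5)
print(f_max)  # 0.25 GHz, i.e. 250 MHz
```

Note that the hold time does not appear in this calculation.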

## **Critical Paths**



Note: delay of 1 clock cycle from input to output. Clock period limited by propagation delay of adder/shifter.

#### Pipelining to improve performance Timing...



- Insertion of register allows higher clock frequency.
- More outputs per second (higher bandwidth)
- But each individual result takes longer (greater latency)
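The trade-off in these bullets can be checked with a small calculation. The delays below (4 ns adder, 3 ns shifter, 1 ns CLK-to-Q, 1 ns setup) are assumed values chosen only to make the arithmetic easy, and the function names are illustrative.

```python
def clock_period_ns(stage_delays_ns, t_clk_to_q_ns, t_setup_ns):
    """Minimum clock period: the slowest stage between registers sets it."""
    return t_clk_to_q_ns + max(stage_delays_ns) + t_setup_ns


def latency_ns(stage_delays_ns, t_clk_to_q_ns, t_setup_ns):
    """Time for one result: one clock cycle per pipeline stage."""
    period = clock_period_ns(stage_delays_ns, t_clk_to_q_ns, t_setup_ns)
    return len(stage_delays_ns) * period


T_CLK_TO_Q, T_SETUP = 1, 1       # assumed register timing, in ns
T_ADD, T_SHIFT = 4, 3            # assumed logic delays, in ns

unpipelined = [T_ADD + T_SHIFT]  # adder and shifter in one stage
pipelined = [T_ADD, T_SHIFT]     # register inserted between them

for name, stages in (("unpipelined", unpipelined), ("pipelined", pipelined)):
    period = clock_period_ns(stages, T_CLK_TO_Q, T_SETUP)
    latency = latency_ns(stages, T_CLK_TO_Q, T_SETUP)
    print(f"{name}: period {period} ns, latency {latency} ns")
# unpipelined: period 9 ns, latency 9 ns
# pipelined: period 6 ns, latency 12 ns
```

With these numbers the pipelined circuit delivers a result every 6 ns instead of every 9 ns, but each individual result takes 12 ns instead of 9 ns.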

## Recap of Timing Terms

- Clock (CLK): steady square wave that synchronizes the system
- Setup Time: how long the input must be stable <u>before</u> the rising edge of the CLK
- Hold Time: how long the input must be stable <u>after</u> the rising edge of the CLK
- "CLK-to-Q" Delay: how long it takes the output to change, measured from the rising edge of the CLK
- Flip-flop: one bit of state that samples every rising edge of the CLK (positive edge-triggered)
- Register: several bits of state that samples on rising edge of CLK or on LOAD (positive edge-triggered)

## Question

- Assert (select) the statements that are true:
- A. The Hold Time is essential to calculating the Max Delay of a circuit.
- B. Modern Processors use a lower V<sub>dd</sub>, because it allows a higher clock frequency
- C. Sequential Logic has elements that store information
- D. A Register is a memory element in a CPU

## **Problems with Clocking**

- The clock period *must be* longer than the critical path
  - Otherwise, you will get the wrong answers
  - But it can be even longer than that
- Critical path:
  - clk->q time (necessary to get the output of the registers)
  - Worst-case combinational logic delay
  - Setup time for the next register
- Must meet all of these to be correct

## Hold-Time Violations...

- An alternate problem can occur...
  - Clk->Q + best case combinational delay < Hold time...
- What happens?
  - Clk->Q + data propagates…
  - And now you don't hold the input to the flip flop long enough
- Solution:
  - Add delay on the best-case path (e.g. two inverters)
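Both clocking constraints can be written as simple inequalities. The sketch below uses a hypothetical helper `check_timing` and made-up delays (arbitrary units) to show a hold-time violation being fixed by adding delay to the best-case path.

```python
def check_timing(period, t_clk_to_q, t_cl_max, t_cl_min, t_setup, t_hold):
    """Return (setup_ok, hold_ok) for one register-to-register path."""
    # Critical path: clk->q + worst-case logic + setup must fit in a period.
    setup_ok = t_clk_to_q + t_cl_max + t_setup <= period
    # Hold: new data must not reach the next register before t_hold has passed.
    hold_ok = t_clk_to_q + t_cl_min >= t_hold
    return setup_ok, hold_ok


# Assumed values: a fast path with no logic at all violates the hold time.
before = check_timing(period=6, t_clk_to_q=1, t_cl_max=3, t_cl_min=0,
                      t_setup=1, t_hold=2)

# Two inverters (assumed 1 unit each) on the best-case path only.
after = check_timing(period=6, t_clk_to_q=1, t_cl_max=3, t_cl_min=2,
                     t_setup=1, t_hold=2)

print(before)  # (True, False)
print(after)   # (True, True)
```

Slowing the clock does not help here: the hold check does not depend on the period.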

## Finite State Machines (FSM) Intro

- A convenient way to conceptualize computation over time
- We start at a state and given an input, we follow some edge to another (or the same) state
- The function can be represented with a "state transition diagram".
- With combinational logic and registers, any FSM can be implemented in hardware.



#### FSM Example: 3 ones...

FSM to detect the occurrence of 3 consecutive 1's in the input.

Draw the FSM... Edges are labeled Input/output (e.g. 1/1, 0/0).

(Figure: example INPUT and OUTPUT bit sequences.)

Assume state transitions are controlled by the clock: on each clock cycle the machine checks the inputs and moves to a new state and produces a new output...

## Hardware Implementation of FSM

... Therefore a register is needed to hold a representation of which state the machine is in. Use a unique bit pattern for each state.



## **FSM Combinational Logic**

#### Specify CL using a truth table



#### Truth table...

| PS | Input | NS | Output |
|----|-------|----|--------|
| 00 | 0     | 00 | 0      |
| 00 | 1     | 01 | 0      |
| 01 | 0     | 00 | 0      |
| 01 | 1     | 10 | 0      |
| 10 | 0     | 00 | 0      |
| 10 | 1     | 00 | 1      |
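The truth table can be run directly as a lookup table, with a variable standing in for the state register. This is a software sketch only; the names `TRANSITIONS` and `run_fsm` and the input sequence are chosen here for illustration.

```python
# (present state, input) -> (next state, output), copied from the truth table.
TRANSITIONS = {
    ("00", 0): ("00", 0),
    ("00", 1): ("01", 0),
    ("01", 0): ("00", 0),
    ("01", 1): ("10", 0),
    ("10", 0): ("00", 0),
    ("10", 1): ("00", 1),
}


def run_fsm(inputs, state="00"):
    """Feed one input bit per clock cycle; return the output bit per cycle."""
    outputs = []
    for bit in inputs:
        # Combinational logic computes next state and output;
        # the assignment to `state` models the register update.
        state, out = TRANSITIONS[(state, bit)]
        outputs.append(out)
    return outputs


print(run_fsm([1, 1, 1, 0, 1, 1, 1, 1, 0]))
# [0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Because the table sends state 10 back to 00 after a detection, a fourth consecutive 1 starts a new count rather than producing another 1 on the output.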

## Representations of Combinational Logic (groups of logic gates)



## **Building Standard Functional Units**

- Data multiplexers
- Arithmetic and Logic Unit
- Adder/Subtractor



## N instances of 1-bit-wide mux

How many rows in TT?

$$
\begin{aligned}
c &= \overline{s}a\overline{b} + \overline{s}ab + s\overline{a}b + sab \\
  &= \overline{s}(a\overline{b}+ab) + s(\overline{a}b+ab) \\
  &= \overline{s}(a(\overline{b}+b)) + s((\overline{a}+a)b) \\
  &= \overline{s}(a(1)) + s((1)b) \\
  &= \overline{s}a+sb
\end{aligned}
$$

## How do we build a 1-bit-wide mux?

 $\overline{s}a + sb$ 
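The expression can be checked exhaustively against the intended behavior (select a when s = 0, b when s = 1). The function names below are illustrative, and bits are modeled as Python integers 0 and 1.

```python
from itertools import product


def mux2(s, a, b):
    """1-bit 2-to-1 mux: c = (NOT s AND a) OR (s AND b)."""
    return ((1 - s) & a) | (s & b)


def mux2_wide(s, a_bits, b_bits):
    """N-bit-wide mux built from N instances of the 1-bit mux."""
    return [mux2(s, a, b) for a, b in zip(a_bits, b_bits)]


# All 8 rows of the truth table: the output follows a when s=0, b when s=1.
for s, a, b in product((0, 1), repeat=3):
    assert mux2(s, a, b) == (b if s else a)

print(mux2_wide(1, [0, 0, 1, 1], [1, 0, 1, 0]))  # [1, 0, 1, 0]
```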





Another way to build 4-1 mux?
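One hierarchical option, sketched here under the assumption that the select input is split into bits `s1` (high) and `s0` (low), is to build the 4-to-1 mux from three 2-to-1 muxes. The input names a, b, c, d and their ordering are an illustrative choice.

```python
from itertools import product


def mux2(s, a, b):
    """1-bit 2-to-1 mux: output a when s=0, b when s=1."""
    return ((1 - s) & a) | (s & b)


def mux4(s1, s0, a, b, c, d):
    """4-to-1 mux from three 2-to-1 muxes; s1 s0 = 00,01,10,11 picks a,b,c,d."""
    lower = mux2(s0, a, b)   # first level: choose within each pair using s0
    upper = mux2(s0, c, d)
    return mux2(s1, lower, upper)  # second level: choose the pair using s1


# Exhaustive check over both select bits and all four data inputs.
for s1, s0, a, b, c, d in product((0, 1), repeat=6):
    assert mux4(s1, s0, a, b, c, d) == (a, b, c, d)[2 * s1 + s0]
```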




## Arithmetic and Logic Unit

- Most processors contain a special logic block called the "Arithmetic and Logic Unit" (ALU)
- We'll show you an easy one that does ADD, SUB, bitwise AND, bitwise OR



- when S=00, R=A+B
- when S=01, R=A-B
- when S=10, R=A AND B
- when S=11, R=A OR B
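A behavioral sketch of this selection logic follows. The slide does not state the operand width, so the `width` parameter (default 32) is an assumption, as is wrapping the result to that many bits.

```python
def alu(s, a, b, width=32):
    """Simple ALU: the 2-bit select s chooses ADD, SUB, AND or OR."""
    mask = (1 << width) - 1      # keep the result to `width` bits
    if s == 0b00:
        r = a + b
    elif s == 0b01:
        r = a - b
    elif s == 0b10:
        r = a & b
    elif s == 0b11:
        r = a | b
    else:
        raise ValueError("s must be a 2-bit value")
    return r & mask


print(alu(0b00, 3, 5, width=8))            # 8
print(alu(0b01, 3, 5, width=8))            # 254 (two's complement of -2)
print(alu(0b10, 0b1100, 0b1010, width=8))  # 8  (0b1000)
print(alu(0b11, 0b1100, 0b1010, width=8))  # 14 (0b1110)
```

In hardware all four results are computed in parallel and a mux driven by S picks one, which is why the mux is introduced first.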

## **Our simple ALU**



## Question

Convert the truth table to a boolean expression (no need to simplify):

| x | y | F(x,y) |
|---|---|--------|
| 0 | 0 | 0      |
| 0 | 1 | 1      |
| 1 | 0 | 0      |
| 1 | 1 | 1      |

- A: $F = xy + x(\sim y)$
- B: $F = xy + (\sim x)y + (\sim x)(\sim y)$
- C: $F = (\sim x)y + x(\sim y)$
- D: $F = xy + (\sim x)y$
- E: $F = (x+y)(\sim x+\sim y)$

## How to design Adder/Subtractor?

- Truth-table, then determine canonical form, then minimize and implement as we've seen before
- Look at breaking the problem down into smaller pieces that we can cascade or hierarchically layer

## Adder/Subtractor – One-bit adder LSB...

|   |       |       |       |       |
|---|-------|-------|-------|-------|
|   | $a_3$ | $a_2$ | $a_1$ | $a_0$ |
| + | $b_3$ | $b_2$ | $b_1$ | $b_0$ |
|   | $s_3$ | $s_2$ | $s_1$ | $s_0$ |

| $a_0$ | $b_0$ | $s_0$ | $c_1$ |
|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     |
| 0     | 1     | 1     | 0     |
| 1     | 0     | 1     | 0     |
| 1     | 1     | 0     | 1     |

$$s_0 = a_0 \oplus b_0 \qquad c_1 = a_0 b_0$$

## Adder/Subtractor – One-bit adder (1/2)...

|   |       |       |       |       |
|---|-------|-------|-------|-------|
|   | $c_3$ | $c_2$ | $c_1$ |       |
|   | $a_3$ | $a_2$ | $a_1$ | $a_0$ |
| + | $b_3$ | $b_2$ | $b_1$ | $b_0$ |
|   | $s_3$ | $s_2$ | $s_1$ | $s_0$ |

| $a_i$ | $b_i$ | $c_i$ | $s_i$ | $c_{i+1}$ |
|-------|-------|-------|-------|-----------|
| 0     | 0     | 0     | 0     | 0         |
| 0     | 0     | 1     | 1     | 0         |
| 0     | 1     | 0     | 1     | 0         |
| 0     | 1     | 1     | 0     | 1         |
| 1     | 0     | 0     | 1     | 0         |
| 1     | 0     | 1     | 0     | 1         |
| 1     | 1     | 0     | 0     | 1         |
| 1     | 1     | 1     | 1     | 1         |

$$s_i =$$

$$c_{i+1} =$$

## Adder/Subtractor – One-bit adder (2/2)



 $s_i = XOR(a_i, b_i, c_i)$

 $c_{i+1} = MAJ(a_i, b_i, c_i) = a_i b_i + a_i c_i + b_i c_i$ 
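The carry equation above, together with the sum bit read off the one-bit adder truth table (the sum is 1 when an odd number of the three inputs are 1), can be checked against ordinary integer addition. The function name `full_adder` is chosen here for illustration.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit adder: returns (s_i, c_(i+1)) from a_i, b_i and carry-in c_i."""
    s = a ^ b ^ c                          # XOR: odd number of 1 inputs
    c_out = (a & b) | (a & c) | (b & c)    # MAJ: at least two inputs are 1
    return s, c_out


# All 8 rows: the two output bits together encode a + b + c.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    assert a + b + c == s + 2 * c_out

print(full_adder(1, 0, 1))  # (0, 1)
```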

#### N 1-bit adders => 1 N-bit adder
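A sketch of the cascade, assuming the carry simply ripples from each one-bit adder to the next. The helpers `to_bits` and `from_bits` and the least-significant-bit-first list convention are choices made for this example.

```python
def full_adder(a, b, c):
    """One-bit adder: returns (sum bit, carry out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_add(a_bits, b_bits, c0=0):
    """Chain N one-bit adders. Bit lists are least significant bit first.

    Returns (sum_bits, carries) where carries[i] is c_i, so carries[-1] is c_n.
    """
    carries = [c0]
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, c_next = full_adder(a, b, carries[-1])  # carry-in from stage below
        sum_bits.append(s)
        carries.append(c_next)
    return sum_bits, carries


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


s_bits, carries = ripple_add(to_bits(6, 4), to_bits(7, 4))
print(from_bits(s_bits), carries[-1])  # 13 0
```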



#### What about overflow? Overflow = c<sub>n</sub>?
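For unsigned numbers the carry out c<sub>n</sub> does signal overflow, but for two's complement numbers the standard test is c<sub>n</sub> XOR c<sub>n-1</sub>. The sketch below assumes the usual adder/subtractor construction, where a SUB control bit is XORed into every b<sub>i</sub> and also fed in as c<sub>0</sub>; the function name and the 4-bit default are illustrative.

```python
def add_sub(a, b, sub, n=4):
    """n-bit adder/subtractor.

    Returns (result, c_n, signed_overflow) with signed_overflow = c_n XOR c_(n-1).
    """
    carry = sub                 # c_0 = SUB supplies the +1 of two's complement
    carry_into_msb = 0
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ sub   # invert b when subtracting
        if i == n - 1:
            carry_into_msb = carry   # this is c_(n-1)
        s_i = a_i ^ b_i ^ carry
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
        result |= s_i << i
    return result, carry, carry ^ carry_into_msb


print(add_sub(0b0111, 0b0001, 0))  # (8, 0, 1): 7 + 1 overflows signed 4-bit
print(add_sub(0b1111, 0b0001, 0))  # (0, 1, 0): -1 + 1 = 0, carry out but no overflow
print(add_sub(0b0101, 0b0011, 1))  # (2, 1, 0): 5 - 3 = 2
```

The first two lines show that c<sub>n</sub> alone gives the wrong answer for signed operands in both directions.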








#### **Domino Adder**



- Explanation: <u>https://www.youtube.com/watch?v=INuPy-r1GuQ</u>
- 4-bit adder: <u>https://www.youtube.com/watch?v=OpLU\_bhu2w&t=0s</u>

## Question



- Clock->Q: 1 ns
- Setup: 1 ns
- Hold: 1 ns
- AND delay: 1 ns

What is maximum clock frequency?

- A: 5 GHz
- B: 500 MHz
- C: 200 MHz
- D: 250 MHz
- E: 1/6 GHz

## In Conclusion

- Finite State Machines have clocked state elements plus combinational logic to describe transition between states
  - Clocks synchronize D-FF change (Setup and Hold times important!)
- Standard combinational functional unit blocks built hierarchically from subcomponents
- Next lecture: use these blocks to build datapath!