# CS 110 Computer Architecture Lecture 10:

#### Finite State Machines, Functional Units

Instructors:

Sören Schwertfeger & Chundong Wang

https://robotics.shanghaitech.edu.cn/courses/ca/20s/

School of Information Science and Technology SIST

ShanghaiTech University

Slides based on UC Berkeley's CS61C

# Levels of Representation/Interpretation



#### Outline

- Timing and Critical Path
- Finite State Machine
- Admin
- Functional Units
  - Multiplexer
  - ALU
  - Adder/ Subtractor

### **Combinational Logic Symbols**

 Common combinational logic systems have standard symbols called logic gates





[Figure: logic gate symbols, e.g. AND and NAND, with inputs A, B and output Z]





Inverting versions (NOT, NAND, NOR) are the easiest to implement with CMOS transistors (the switches we have available and use most).

#### Truth Table Example #2: 2-bit Adder





How Many Rows?

# Truth Table Example #3: 32-bit Unsigned Adder

| A     | B     | C      |
|-------|-------|--------|
| 000…0 | 000…0 | 000…00 |
| 000…0 | 000…1 | 000…01 |
| ⋮     | ⋮     | ⋮      |
| 111…1 | 111…1 | 111…10 |

How Many Rows?
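The "How many rows?" questions can be answered by counting input bits: a truth table has one row per input combination. A minimal sketch, assuming two n-bit operands and no carry-in (the function name `truth_table_rows` is only illustrative):

```python
def truth_table_rows(input_bits: int) -> int:
    """One row per possible combination of the input bits."""
    return 2 ** input_bits

# Two n-bit operands A and B give 2n input bits in total.
rows_2bit_adder = truth_table_rows(2 * 2)    # 2-bit adder
rows_32bit_adder = truth_table_rows(2 * 32)  # 32-bit adder

print(rows_2bit_adder)   # 16
print(rows_32bit_adder)  # 18446744073709551616
```

The 32-bit case is why truth tables are not a practical way to specify wide arithmetic blocks.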

# Boolean Algebra: Circuit & Algebraic Simplification



original circuit

$$\downarrow$$

equation derived from original circuit

$$y = ((ab) + a) + c$$

algebraic simplification

$$\begin{aligned} y &= ab + a + c \\ &= a(b+1) + c \\ &= a(1) + c \\ &= a + c \end{aligned}$$

$$\downarrow$$

simplified circuit
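One way to double-check an algebraic simplification like this is to compare both expressions on every input combination. A small sketch (the function names `original` and `simplified` are illustrative, not from the slides):

```python
from itertools import product

def original(a, b, c):
    # y = ((ab) + a) + c, writing AND as `and` and OR as `or`
    return ((a and b) or a) or c

def simplified(a, b, c):
    # y = a + c
    return a or c

# Exhaustive check over all 2^3 input combinations
equivalent = all(
    bool(original(a, b, c)) == bool(simplified(a, b, c))
    for a, b, c in product([False, True], repeat=3)
)
print(equivalent)  # True
```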

# Representations of Combinational Logic (groups of logic gates)



#### Type of Circuits

- Synchronous Digital Systems consist of two basic types of circuits:
  - Combinational Logic (CL) circuits
    - Output is a function of the inputs only, not the history of its execution
    - E.g., circuits to add A, B (ALUs)
  - Sequential Logic (SL)
    - Circuits that "remember" or store information
    - aka "State Elements"
    - E.g., memories and registers (Registers)

# Model for Synchronous Systems



- Collection of Combinational Logic blocks separated by registers
- Feedback is optional
- Clock signal(s) connect only to the clock input of registers
- Clock (CLK): steady square wave that synchronizes the system
- Register: several bits of state that sample on the rising edge of CLK (positive edge-triggered) or on the falling edge (negative edge-triggered)

#### **Accumulator Example**

Why do we need to control the flow of information?



Want: `S = 0; for (i = 0; i < n; i++) S = S + X_i;`

#### Assume:

- Each X value is applied in succession, one per cycle
- After n cycles the sum is present on S

#### First Try: Does this work?



#### No!

Reason #1: How to control the next iteration of the 'for' loop?

Reason #2: How do we say: 'S=0'?

### Second Try: How About This?



#### Register Internals



- n instances of a "Flip-Flop"
- Called a "flip-flop" because the output flips and flops between 0 and 1
- D is "data input", Q is "data output"
- Also called "D-type Flip-Flop"

### Flip-Flop Operation

- Edge-triggered d-type flip-flop
  - This one is "positive edge-triggered"



 "On the rising edge of the clock, the input d is sampled and transferred to the output. At all other times, the input d is ignored."



How a flip flop works: <a href="https://www.youtube.com/watch?v=-aQH0ybMd3U">https://www.youtube.com/watch?v=-aQH0ybMd3U</a>

## Flip-Flop Timing

- Example waveforms (more detail):





#### Hardware Timing Terms

- Setup Time: how long the input must be stable before the edge of the CLK
- Hold Time: how long the input must be stable after the edge of the CLK
- "CLK-to-Q" Delay: how long it takes the output to change, measured from the edge of the CLK

### Accumulator Timing 1/2



- Reset input to register is used to force it to all zeros (takes priority over D input).
- S<sub>i-1</sub> holds the result of the (i-1)<sup>th</sup> iteration.
- Analyze circuit timing starting at the output of the register.
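The behavior of the accumulator can be sketched at the cycle level, ignoring gate delays entirely. This is a simplified model under the assumption that reset is applied once before the first cycle; the function name `accumulate` is illustrative:

```python
def accumulate(xs):
    """Cycle-level model: one loop iteration per rising clock edge."""
    register = 0                    # reset forces the register to all zeros
    trace = []
    for x in xs:                    # one X value is applied per cycle
        adder_out = register + x    # combinational logic: S_i = S_(i-1) + X_i
        register = adder_out        # rising edge: the register captures S_i
        trace.append(register)
    return trace

print(accumulate([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

After n cycles the last entry of the trace is the sum of all n inputs.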



### Accumulator Timing 2/2



- reset signal shown.
- Also, in practice X might not arrive at the adder at the same time as S<sub>i-1</sub>
- S<sub>i</sub> is temporarily wrong, but the register always captures the correct value.
- In good circuits, instability never happens around rising edge of clk.



#### Maximum Clock Frequency

What is the maximum frequency of this circuit?



Max Delay = CLK-to-Q Delay + CL Delay + Setup Time
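The maximum frequency is the reciprocal of this maximum delay. A minimal sketch with made-up delay values (the numbers below are hypothetical and do not come from the circuit in the figure):

```python
def max_clock_frequency_mhz(clk_to_q_ns, cl_delay_ns, setup_ns):
    """Max Delay = CLK-to-Q Delay + CL Delay + Setup Time; f_max = 1 / Max Delay."""
    max_delay_ns = clk_to_q_ns + cl_delay_ns + setup_ns
    return 1000.0 / max_delay_ns   # a 1 ns period is 1000 MHz

# Hypothetical delays: 2 ns CLK-to-Q, 5 ns combinational logic, 3 ns setup
print(max_clock_frequency_mhz(2, 5, 3))  # 100.0
```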

#### **Critical Paths**



Note: delay of 1 clock cycle from input to output. Clock period limited by propagation delay of adder/shifter.

#### Pipelining to improve performance



- Insertion of register allows higher clock frequency.
- More outputs per second (higher bandwidth)
- But each individual result takes longer (greater latency)
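The trade-off in the bullets above can be put into numbers. The sketch below assumes each stage is followed by a register and lumps CLK-to-Q delay and setup time into a single hypothetical `register_overhead_ns`; all delay values are invented for illustration:

```python
def pipeline_timing(stage_delays_ns, register_overhead_ns):
    """Return (clock period, latency) in ns for a chain of registered CL stages."""
    # The slowest stage plus the register overhead sets the clock period.
    period = max(stage_delays_ns) + register_overhead_ns
    # A result needs one clock cycle per stage.
    latency = period * len(stage_delays_ns)
    return period, latency

print(pipeline_timing([8], 2))     # (10, 10): one 8 ns block of logic
print(pipeline_timing([4, 4], 2))  # (6, 12): same logic split into two stages
```

With these numbers the clock period drops from 10 ns to 6 ns (more results per second), while a single result now takes 12 ns instead of 10 ns.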

#### Recap of Timing Terms

- Clock (CLK): steady square wave that synchronizes the system
- Setup Time: how long the input must be stable <u>before</u> the rising edge of the CLK
- Hold Time: how long the input must be stable <u>after</u> the rising edge of the CLK
- "CLK-to-Q" Delay: how long it takes the output to change, measured from the rising edge of the CLK
- Flip-flop: one bit of state that samples every rising edge of the CLK (positive edge-triggered)
- Register: several bits of state that sample on the rising edge of CLK or on LOAD (positive edge-triggered)

### **Problems with Clocking**

- The clock period must be longer than the critical path
  - Otherwise, you will get the wrong answers
  - But it can be even longer than that
- Critical path:
  - clk->q time (necessary to get the output of the registers)
  - Worst-case combinational logic delay
  - Setup time for the next register
- Must meet all of these to be correct

#### Hold-Time Violations...

- An alternate problem can occur...
  - Clk->Q + best case combinational delay < Hold time...
- What happens?
  - Clk->Q + data propagates...
  - And now you don't hold the input to the flip flop long enough
- Solution:
  - Add delay on the best-case path (e.g. two inverters)
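The hold-time condition can be written as a simple check on the fastest path. A sketch with invented delay values (the function name `hold_time_ok` and all numbers are illustrative):

```python
def hold_time_ok(clk_to_q_ns, best_case_cl_ns, hold_ns):
    """New data must not reach the next flip-flop before its hold window ends."""
    return clk_to_q_ns + best_case_cl_ns >= hold_ns

# Hypothetical numbers: 1 ns Clk->Q, 2 ns hold time
print(hold_time_ok(1, 0, 2))  # False: the short path violates the hold time
print(hold_time_ok(1, 2, 2))  # True: added delay on the short path fixes it
```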

## Finite State Machines (FSM) Intro

 A convenient way to conceptualize computation over time

 We start at a state and given an input, we follow some edge to another (or the same) state

- The function can be represented with a "state transition diagram".
- With combinational logic and registers, any FSM can be implemented in hardware.



#### FSM Example: 3 ones...

FSM to detect the occurrence of 3 consecutive 1's in the input.



Assume state transitions are controlled by the clock: on each clock cycle the machine checks the inputs and moves to a new state and produces a new output...

### Hardware Implementation of FSM

... Therefore a register is needed to hold a representation of which state the machine is in. Use a unique bit pattern for each state.



Combinational logic circuit is used to implement a function that maps from present state and input to next state and output.



#### **FSM Combinational Logic**

#### Specify CL using a truth table



#### Truth table...

| PS | Input | NS | Output |
|----|-------|----|--------|
| 00 | 0     | 00 | 0      |
| 00 | 1     | 01 | 0      |
| 01 | 0     | 00 | 0      |
| 01 | 1     | 10 | 0      |
| 10 | 0     | 00 | 0      |
| 10 | 1     | 00 | 1      |
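The truth table can be simulated directly: a dictionary plays the role of the combinational logic and a variable plays the role of the state register. This is a sketch that assumes the machine starts in state 00; the names `TRANSITIONS` and `run_fsm` are illustrative:

```python
# (present state, input) -> (next state, output), copied from the truth table
TRANSITIONS = {
    (0b00, 0): (0b00, 0),
    (0b00, 1): (0b01, 0),
    (0b01, 0): (0b00, 0),
    (0b01, 1): (0b10, 0),
    (0b10, 0): (0b00, 0),
    (0b10, 1): (0b00, 1),
}

def run_fsm(bits):
    state = 0b00                 # state register, assumed to start at 00
    outputs = []
    for bit in bits:             # one input bit per clock cycle
        state, out = TRANSITIONS[(state, bit)]
        outputs.append(out)
    return outputs

print(run_fsm([1, 1, 1, 0, 1, 1, 1, 1]))  # [0, 0, 1, 0, 0, 0, 1, 0]
```

Because the table sends state 10 back to 00 on a 1, a fourth consecutive 1 starts a new count rather than producing another output of 1.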




#### **Admin**

- P1.1 due tomorrow
- P1.2 will be published this week
- HW3 is out

- Midterm I
  - April 6 during lecture hours
- Contents:
  - Everything up to and including Datapath (next lecture)

#### Midterm I

- Switch cell phones off! (not silent mode – off!)
  - Put them in your bags.
- Bags in the front. On the table: nothing but: pen, 1 drink, 1 snack, your student ID card and your cheat sheet!
- The RISC-V green card will be provided
- No other electronic devices are allowed!
  - No ear plugs, music, smartwatch...
- Anybody touching any electronic device will FAIL the course!
- Anybody found cheating (copying your neighbor's answers, using additional material, ...) will FAIL the course!







[Figure: cover of "Computer Organization and Design: The Hardware/Software Interface" by David A. Patterson and John L. Hennessy]

#### **Cheat Sheet**

- 1 A4 Cheat Sheet allowed (double sided)
  - Midterm II: 2 pages
  - Final: 3 pages
- Rules:
  - Hand-written not printed!
  - Your <u>name</u> in pinyin on the top!
  - Cheat sheets not complying with these rules will be confiscated!










### **Building Standard Functional Units**

- Data multiplexers
- Arithmetic and Logic Unit
- Adder/ Subtractor

# Data Multiplexer ("Mux") (here 2-to-1, n-bit-wide)



#### N instances of 1-bit-wide mux

**How many rows in TT?** 



$$c = \overline{s}a\overline{b} + \overline{s}ab + s\overline{a}b + sab$$

$$= \overline{s}(a\overline{b} + ab) + s(\overline{a}b + ab)$$

$$= \overline{s}(a(\overline{b} + b)) + s((\overline{a} + a)b)$$

$$= \overline{s}(a(1)) + s((1)b)$$

$$= \overline{s}a + sb$$
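The simplified mux equation can be checked against the intended behavior on every row of the truth table. A minimal sketch using 0/1 integers (the name `mux2` is illustrative):

```python
from itertools import product

def mux2(s, a, b):
    # c = (NOT s AND a) OR (s AND b)
    return ((1 - s) & a) | (s & b)

# Three 1-bit inputs (s, a, b) give 2^3 truth-table rows.
rows = list(product([0, 1], repeat=3))
# The output should follow a when s = 0 and b when s = 1.
mux_ok = all(mux2(s, a, b) == (b if s else a) for s, a, b in rows)
print(len(rows), mux_ok)  # 8 True
```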

#### How do we build a 1-bit-wide mux?



## 4-to-1 multiplexer?



$$e = \overline{s_1}\overline{s_0}a + \overline{s_1}s_0b + s_1\overline{s_0}c + s_1s_0d$$

## Another way to build 4-1 mux?



**Answer: Hierarchically!** 
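A sketch of the hierarchical construction: two 2-to-1 muxes choose within each pair using s0, and a third chooses between the pairs using s1. The check compares it with the flat equation for e; function names are illustrative:

```python
from itertools import product

def mux2(s, a, b):
    return ((1 - s) & a) | (s & b)

def mux4(s1, s0, a, b, c, d):
    low = mux2(s0, a, b)         # first level: choose between a and b
    high = mux2(s0, c, d)        # first level: choose between c and d
    return mux2(s1, low, high)   # second level: choose between the two pairs

def mux4_flat(s1, s0, a, b, c, d):
    # Direct sum-of-products form of the 4-to-1 equation for e
    n1, n0 = 1 - s1, 1 - s0
    return (n1 & n0 & a) | (n1 & s0 & b) | (s1 & n0 & c) | (s1 & s0 & d)

same = all(mux4(*v) == mux4_flat(*v) for v in product([0, 1], repeat=6))
print(same)  # True
```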

## **Arithmetic and Logic Unit**

- Most processors contain a special logic block called the "Arithmetic and Logic Unit" (ALU)
- We'll show you an easy one that does ADD, SUB, bitwise AND, bitwise OR



- when S=00, R=A+B
- when S=01, R=A-B
- when S=10, R=A AND B
- when S=11, R=A OR B
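A behavioral sketch of this ALU, with the select input S choosing among the four operations. The 32-bit default width and the wrap-around of results are assumptions made here for illustration; they are not stated in the text above:

```python
def alu(s, a, b, width=32):
    """Simple ALU: s selects the operation; the result wraps to `width` bits."""
    mask = (1 << width) - 1
    if s == 0b00:
        r = a + b
    elif s == 0b01:
        r = a - b
    elif s == 0b10:
        r = a & b
    elif s == 0b11:
        r = a | b
    else:
        raise ValueError("s must be a 2-bit value")
    return r & mask

print(alu(0b00, 6, 3))  # 9
print(alu(0b01, 6, 3))  # 3
print(alu(0b10, 6, 3))  # 2
print(alu(0b11, 6, 3))  # 7
```

In hardware all four results are computed in parallel and a mux controlled by S picks one; the `if` chain here only models the selection.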

## Our simple ALU



#### Question

Convert the truth table to a boolean expression (no need to simplify):

A: $$F = xy + x({\sim}y)$$

B: $$F = xy + ({\sim}x)y + ({\sim}x)({\sim}y)$$

C: $$F = ({\sim}x)y + x({\sim}y)$$

D: $$F = xy + ({\sim}x)y$$

E: $$F = (x+y)({\sim}x+{\sim}y)$$

| X | y | F(x,y) |
|---|---|--------|
| 0 | 0 | 0      |
| 0 | 1 | 1      |
| 1 | 0 | 0      |
| 1 | 1 | 1      |
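The conversion from a truth table to a canonical sum-of-products can be done mechanically: emit one product term for every row where F = 1. A sketch for the two-variable case, writing `~` for NOT (the function name `sum_of_products` is illustrative):

```python
def sum_of_products(table):
    """Build a canonical sum-of-products string from {(x, y): F} rows."""
    terms = []
    for (x, y), f in table.items():
        if f == 1:                       # one product term per row where F = 1
            tx = "x" if x else "(~x)"    # a 0 input appears complemented
            ty = "y" if y else "(~y)"
            terms.append(tx + ty)
    return " + ".join(terms)

table = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
print(sum_of_products(table))  # (~x)y + xy
```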

## How to design Adder/Subtractor?

- Truth-table, then determine canonical form, then minimize and implement as we've seen before
- Look at breaking the problem down into smaller pieces that we can cascade or hierarchically layer

## Adder/Subtractor – One-bit adder LSB...

| $a_0$ | $b_0$ | $s_0$ | $c_1$ |
|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     |
| 0     | 1     | 1     | 0     |
| 1     | 0     | 1     | 0     |
| 1     | 1     | 0     | 1     |

$$s_0 = XOR(a_0, b_0) \qquad c_1 = a_0 b_0$$

## Adder/Subtractor – One-bit adder (1/2)...

$$\begin{array}{ccccc} & c_3 & c_2 & c_1 & \\ & a_3 & a_2 & a_1 & a_0 \\ + & b_3 & b_2 & b_1 & b_0 \\ \hline & s_3 & s_2 & s_1 & s_0 \end{array}$$

| $a_i$ | $b_i$ | $c_i$ | $s_i$ | $c_{i+1}$ |
|-------|-------|-------|-------|-----------|
| 0     | 0     | 0     | 0     | 0         |
| 0     | 0     | 1     | 1     | 0         |
| 0     | 1     | 0     | 1     | 0         |
| 0     | 1     | 1     | 0     | 1         |
| 1     | 0     | 0     | 1     | 0         |
| 1     | 0     | 1     | 0     | 1         |
| 1     | 1     | 0     | 0     | 1         |
| 1     | 1     | 1     | 1     | 1         |

$$s_i = \;? \qquad c_{i+1} = \;?$$

#### Adder/Subtractor – One-bit adder (2/2)





$$s_i = XOR(a_i, b_i, c_i)$$

$$c_{i+1} = MAJ(a_i, b_i, c_i) = a_i b_i + a_i c_i + b_i c_i$$
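Both equations can be checked against ordinary addition: for every input combination, the pair (c_{i+1}, s_i) should be the 2-bit binary value of a_i + b_i + c_i. A small sketch (the name `full_adder` is illustrative):

```python
from itertools import product

def full_adder(a, b, c):
    s = a ^ b ^ c                          # XOR(a_i, b_i, c_i)
    c_out = (a & b) | (a & c) | (b & c)    # MAJ(a_i, b_i, c_i)
    return s, c_out

# c_out has weight 2 and s has weight 1.
full_adder_ok = all(
    2 * full_adder(a, b, c)[1] + full_adder(a, b, c)[0] == a + b + c
    for a, b, c in product([0, 1], repeat=3)
)
print(full_adder_ok)  # True
```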

#### N 1-bit adders => 1 N-bit adder



What about overflow? Overflow =  $c_n$ ?
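The overflow question can be explored with a ripple-carry model built from 1-bit adders. The sketch below reports both c_n and c_n XOR c_(n-1): c_n alone signals overflow for unsigned operands, while the XOR of the top two carries is the standard test for two's-complement operands (a well-known result, not taken from the slide text). Names and the 4-bit width are illustrative:

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def ripple_add(a, b, n, c0=0):
    """Add two n-bit numbers bit by bit; returns (sum, c_n, c_(n-1))."""
    carries = [c0]
    total = 0
    for i in range(n):
        s_i, c_next = full_adder((a >> i) & 1, (b >> i) & 1, carries[i])
        total |= s_i << i          # place sum bit i
        carries.append(c_next)     # the carry ripples to the next bit
    return total, carries[n], carries[n - 1]

s, cn, cn1 = ripple_add(0b1111, 0b0001, 4)
print(s, cn, cn ^ cn1)  # 0 1 0: unsigned 15 + 1 overflows, signed -1 + 1 = 0 does not
s, cn, cn1 = ripple_add(0b0111, 0b0001, 4)
print(s, cn, cn ^ cn1)  # 8 0 1: no carry out, but signed 7 + 1 overflows
```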

#### **Extremely Clever Subtractor:**

$$s = a + (-b)$$
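In two's complement, -b equals the bitwise complement of b plus 1, so the adder can be reused for subtraction. The sketch below assumes the common construction in which a control signal `sub` is XORed with every bit of b and also fed into the carry-in; since the figure is not reproduced here, treat this as one plausible reading rather than the exact circuit. The 4-bit width is illustrative:

```python
def add_sub(a, b, sub, n=4):
    """n-bit adder/subtractor: a + b when sub = 0, a - b when sub = 1."""
    mask = (1 << n) - 1
    b_in = (b ^ (mask if sub else 0)) & mask   # XOR each bit of b with sub
    return (a + b_in + sub) & mask             # sub is also the carry-in: -b = ~b + 1

print(add_sub(5, 3, sub=0))  # 8
print(add_sub(5, 3, sub=1))  # 2
print(add_sub(3, 5, sub=1))  # 14, i.e. -2 in 4-bit two's complement
```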



#### **Domino Adder**



Explanation: <a href="https://www.youtube.com/watch?v=INuPy-r1GuQ">https://www.youtube.com/watch?v=INuPy-r1GuQ</a>

4-bit adder: <a href="https://www.youtube.com/watch?v=OpLU">https://www.youtube.com/watch?v=OpLU</a> bhu2w&t=0s

#### Question



- Clock->Q: 1 ns
- Setup: 1 ns
- Hold: 1 ns
- AND delay: 1 ns

What is maximum clock frequency?

- A: 5 GHz
- B: 500 MHz
- C: 200 MHz
- D: 250 MHz
- E: 1/6 GHz

#### In Conclusion

- Finite State Machines have clocked state elements plus combinational logic to describe transition between states
  - Clocks synchronize D-FF change (Setup and Hold times important!)
- Standard combinational functional unit blocks built hierarchically from subcomponents