#### CS 110 Computer Architecture Lecture 6: *RISC-V Instruction Formats*

Instructors: Sören Schwertfeger & Chundong Wang

https://robotics.shanghaitech.edu.cn/courses/ca/20s/

School of Information Science and Technology SIST

ShanghaiTech University

Slides based on UC Berkeley's CS61C

## RISC-V ISA so far...

- Registers we know so far (All of them!)
  - a0-a7 for function arguments, a0-a1 for return values
  - sp (stack pointer), ra (return address)
  - s0-s11 saved registers
  - t0-t6 temporaries
  - zero
- Instructions we know:
  - Arithmetic: add, addi, sub
  - Logical: sll, srl, slli, srli, srai, and, or, xor, andi, ori, xori
  - Decision: beq, bne, blt, bge
  - Unconditional branches (jumps): j, jr
  - Functions called with jal, return with jr ra.
- The stack is your friend: Use it to save anything you need. Just leave it the way you found it!

## 12 Shift Instructions...

- Two versions of all shift instructions. Shift amount via:
  - Register
  - Immediate
- (On RV64: additional "word" version of each instruction, which operates only on the rightmost 32 bits of the 64-bit register)
- Shift Left
- Shift Right Arithmetic: Fill upper bits with msb
- Shift Right Logical: Fill upper bits with 0's
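The difference between the logical and the arithmetic right shift can be sketched in C. This is a minimal model for 32-bit values only; the function names `sll32`, `srl32` and `sra32` are made up for illustration, and the arithmetic shift is built by hand because right-shifting a negative signed value in C is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Shift Left: lower bits are filled with 0s. */
uint32_t sll32(uint32_t value, unsigned shamt) {
    assert(shamt < 32);                 /* RV32 shift amounts are 0..31 */
    return value << shamt;
}

/* Shift Right Logical: upper bits are filled with 0s. */
uint32_t srl32(uint32_t value, unsigned shamt) {
    assert(shamt < 32);
    return value >> shamt;
}

/* Shift Right Arithmetic: upper bits are filled with copies of the msb. */
uint32_t sra32(uint32_t value, unsigned shamt) {
    assert(shamt < 32);
    uint32_t result = value >> shamt;   /* start from the logical shift */
    if ((value & 0x80000000u) != 0) {
        /* msb was 1: set the bits vacated by the shift */
        result |= (uint32_t)~(UINT32_MAX >> shamt);
    }
    return result;
}
```

For example, shifting `0x80000000` right by 4 gives `0x08000000` with `srl32` and `0xF8000000` with `sra32`.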

| Mnemonic     | Format | Name                          | Description                 | Notes |
|--------------|--------|-------------------------------|-----------------------------|-------|
| sll, sllw    | R      | Shift Left (Word)             | $R[rd] = R[rs1] \ll R[rs2]$ | 1)    |
| slli, slliw  | I      | Shift Left Immediate (Word)   | $R[rd] = R[rs1] \ll imm$    | 1)    |
| sra, sraw    | R      | Shift Right Arithmetic (Word) | $R[rd] = R[rs1] \gg R[rs2]$ | 1,5)  |
| srai, sraiw  | I      | Shift Right Arith Imm (Word)  | $R[rd] = R[rs1] \gg imm$    | 1,5)  |
| srl, srlw    | R      | Shift Right (Word)            | $R[rd] = R[rs1] \gg R[rs2]$ | 1)    |
| srli, srliw  | I      | Shift Right Immediate (Word)  | $R[rd] = R[rs1] \gg imm$    | 1)    |

Notes: 1) The Word version only operates on the rightmost 32 bits of a 64-bit register. 5) Replicates the sign bit to fill in the leftmost bits of the result during right shift.



When a procedure ends, its stack frame is tossed off the stack; this frees memory for future stack frames.

#### Leaf Function Example

```
int Leaf
  (int g, int h, int i, int j)
{
    int f;
    f = (g + h) - (i + j);
    return f;
}
```

- Parameter variables g, h, i, and j in argument registers a0, a1, a2, and a3, and f in s0
- Assume we need one temporary register, s1

#### **RISC-V Code for Leaf()**

Leaf:

| Instruction       | Comment                          |
|-------------------|----------------------------------|
| `addi sp, sp, -8` | adjust stack for 2 items         |
| `sw s1, 4(sp)`    | save s1 for use afterwards       |
| `sw s0, 0(sp)`    | save s0 for use afterwards       |
| `add s0, a0, a1`  | f = g + h                        |
| `add s1, a2, a3`  | s1 = i + j                       |
| `sub a0, s0, s1`  | return value (g + h) - (i + j)   |
| `lw s0, 0(sp)`    | restore register s0 for caller   |
| `lw s1, 4(sp)`    | restore register s1 for caller   |
| `addi sp, sp, 8`  | adjust stack to delete 2 items   |
| `jr ra`           | jump back to calling routine     |

### Nested Procedures (1/2)

- int sumSquare(int x, int y) {
   return mult(x,x)+ y;
  }
- Something called sumSquare, now sumSquare is calling mult
- So there's a value in ra that sumSquare wants to jump back to, but this will be overwritten by the call to mult

- Need to save the return address of **sumSquare** before the call to **mult**

### Nested Procedures (2/2)

- In general, may need to save some other info in addition to ra.
- When a C program is run, there are 3 important memory areas allocated:
  - Static: Variables declared once per program, cease to exist only after execution completes - e.g., C globals
  - Heap: Variables declared dynamically via **malloc**
  - Stack: Space to be used by procedure during execution; this is where we can save register values

#### The "ABI" Conventions & Mnemonic Registers

- The "Application Binary Interface" defines our 'calling convention'
  - How to call other functions
- A critical portion is "what do registers mean by convention"
  - We have 32 registers, but how are they used
- Who is responsible for saving registers?
  - ABI defines a contract: When you call another function, that function promises *not* to overwrite certain registers
- We also have more convenient names based on this
  - So going forward, no more x3, x6... type notation

### Register Conventions (1/2)

- Calle<u>R</u>: the calling function
- Calle<u>E</u>: the function being called
- When callee returns from executing, the caller needs to know which registers may have changed and which are guaranteed to be unchanged.
- Register Conventions: A set of generally accepted rules as to which registers will be unchanged after a procedure call (jal) and which may be changed.

## Register Conventions (2/2)

To reduce expensive loads and stores from spilling and restoring registers, RISC-V function-calling convention divides registers into two categories:

1. Preserved across function call
   - Caller can rely on values being unchanged
   - sp, gp, tp, "saved registers" s0-s11 (s0 is also fp)
2. Not preserved across function call
   - Caller *cannot* rely on values being unchanged
   - Argument/return registers a0-a7, ra, "temporary registers" t0-t6

#### **RISC-V Symbolic Register Names**

Numbers: hardware understands

## REGISTER NAME, USE, CALLING CONVENTION

| REGISTER | NAME   | USE                              | SAVER  |
|----------|--------|----------------------------------|--------|
| x0       | zero   | The constant value 0             | N.A.   |
| x1       | ra     | Return address                   | Caller |
| x2       | sp     | Stack pointer                    | Callee |
| x3       | gp     | Global pointer                   |        |
| x4       | tp     | Thread pointer                   |        |
| x5-x7    | t0-t2  | Temporaries                      | Caller |
| x8       | s0/fp  | Saved register/Frame pointer     | Callee |
| x9       | s1     | Saved register                   | Callee |
| x10-x11  | a0-a1  | Function arguments/Return values | Caller |
| x12-x17  | a2-a7  | Function arguments               | Caller |
| x18-x27  | s2-s11 | Saved registers                  | Callee |
| x28-x31  | t3-t6  | Temporaries                      | Caller |

Human-friendly symbolic names in assembly code
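As a small sketch, the register table can be written down as a C lookup. The names `abi_name` and `is_callee_saved` are hypothetical helpers, not part of any toolchain; the "callee saved" set simply mirrors the SAVER column above (sp, s0/fp, s1, s2-s11).

```c
#include <assert.h>
#include <stdbool.h>

/* ABI names indexed by register number x0..x31, following the table above. */
static const char *const ABI_NAMES[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

/* Returns the symbolic name for register number xreg (0..31). */
const char *abi_name(unsigned xreg) {
    assert(xreg < 32);
    return ABI_NAMES[xreg];
}

/* True for registers marked "Callee" in the table: x2, x8, x9, x18-x27. */
bool is_callee_saved(unsigned xreg) {
    assert(xreg < 32);
    return xreg == 2 || xreg == 8 || xreg == 9 || (xreg >= 18 && xreg <= 27);
}
```

With this, `abi_name(10)` is `"a0"` and `is_callee_saved(10)` is false, so a caller cannot rely on a0 surviving a call.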

#### **RISC-V Green Card**


#### PSEUDO INSTRUCTIONS

| MNEMONIC       | NAME           | DESCRIPTION                          | USES   |
|----------------|----------------|--------------------------------------|--------|
| beqz           | Branch = zero  | if(R[rs1]==0) PC=PC+{imm,1b'0}       | beq    |
| bnez           | Branch ≠ zero  | if(R[rs1]!=0) PC=PC+{imm,1b'0}       | bne    |
| fabs.s, fabs.d | Absolute Value | F[rd] = (F[rs1]<0) ? -F[rs1] : F[rs1] | fsgnjx |
| fmv.s, fmv.d   | FP Move        | F[rd] = F[rs1]                       | fsgnj  |
| fneg.s, fneg.d | FP negate      | F[rd] = -F[rs1]                      | fsgnjn |
| j              | Jump           | PC = {imm,1b'0}                      | jal    |
| jr             | Jump register  | PC = R[rs1]                          | jalr   |
| la             | Load address   | R[rd] = address                      | auipc  |
| li             | Load imm       | R[rd] = imm                          | addi   |
| mv             | Move           | R[rd] = R[rs1]                       | addi   |
| neg            | Negate         | R[rd] = -R[rs1]                      | sub    |
| nop            | No operation   | R[0] = R[0]                          | addi   |
| not            | Not            | R[rd] = ~R[rs1]                      | xori   |
| ret            | Return         | PC = R[1]                            | jalr   |
| seqz           | Set = zero     | R[rd] = (R[rs1]== 0) ? 1 : 0         | sltiu  |
| snez           | Set ≠ zero     | R[rd] = (R[rs1]!= 0) ? 1 : 0         | sltu   |

#### ARITHMETIC CORE INSTRUCTION SET

RV64M Multiply Extension

| MNEMONIC    | FMT | NAME                         | DESCRIPTION (in Verilog)          | NOTE |
|-------------|-----|------------------------------|-----------------------------------|------|
| mul, mulw   | R   | MULtiply (Word)              | R[rd] = (R[rs1] * R[rs2])(63:0)   | 1)   |
| mulh        | R   | MULtiply High                | R[rd] = (R[rs1] * R[rs2])(127:64) |      |
| mulhu       | R   | MULtiply High Unsigned       | R[rd] = (R[rs1] * R[rs2])(127:64) | 2)   |
| mulhsu      | R   | MULtiply upper Half Sign/Uns | R[rd] = (R[rs1] * R[rs2])(127:64) | 6)   |
| div, divw   | R   | DIVide (Word)                | R[rd] = (R[rs1] / R[rs2])         | 1)   |
| divu        | R   | DIVide Unsigned              | R[rd] = (R[rs1] / R[rs2])         | 2)   |
| rem, remw   | R   | REMainder (Word)             | R[rd] = (R[rs1] % R[rs2])         | 1)   |
| remu, remuw | R   | REMainder Unsigned (Word)    | R[rd] = (R[rs1] % R[rs2])         | 1,2) |

RV64A Atomic Extension

| MNEMONIC             | FMT | NAME              | DESCRIPTION (in Verilog)                                           | NOTE |
|----------------------|-----|-------------------|--------------------------------------------------------------------|------|
| amoadd.w, amoadd.d   | R   | ADD               | R[rd] = M[R[rs1]], M[R[rs1]] = M[R[rs1]] + R[rs2]                  | 9)   |
| amoand.w, amoand.d   | R   | AND               | R[rd] = M[R[rs1]], M[R[rs1]] = M[R[rs1]] & R[rs2]                  | 9)   |
| amomax.w, amomax.d   | R   | MAXimum           | R[rd] = M[R[rs1]], if (R[rs2] > M[R[rs1]]) M[R[rs1]] = R[rs2]      | 9)   |
| amomaxu.w, amomaxu.d | R   | MAXimum Unsigned  | R[rd] = M[R[rs1]], if (R[rs2] > M[R[rs1]]) M[R[rs1]] = R[rs2]      | 2,9) |
| amomin.w, amomin.d   | R   | MINimum           | R[rd] = M[R[rs1]], if (R[rs2] < M[R[rs1]]) M[R[rs1]] = R[rs2]      | 9)   |
| amominu.w, amominu.d | R   | MINimum Unsigned  | R[rd] = M[R[rs1]], if (R[rs2] < M[R[rs1]]) M[R[rs1]] = R[rs2]      | 2,9) |
| amoor.w, amoor.d     | R   | OR                | R[rd] = M[R[rs1]], M[R[rs1]] = M[R[rs1]] \| R[rs2]                 | 9)   |
| amoswap.w, amoswap.d | R   | SWAP              | R[rd] = M[R[rs1]], M[R[rs1]] = R[rs2]                              | 9)   |
| amoxor.w, amoxor.d   | R   | XOR               | R[rd] = M[R[rs1]], M[R[rs1]] = M[R[rs1]] ^ R[rs2]                  | 9)   |
| lr.w, lr.d           | R   | Load Reserved     | R[rd] = M[R[rs1]], reservation on M[R[rs1]]                        |      |
| sc.w, sc.d           | R   | Store Conditional | if reserved, M[R[rs1]] = R[rs2], R[rd] = 0; else R[rd] = 1         |      |

#### CORE INSTRUCTION FORMATS

| Format | Fields, from bit 31 down to bit 0 |
|--------|-----------------------------------|
| R      | funct7 (31:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · rd (11:7) · opcode (6:0) |
| I      | imm[11:0] (31:20) · rs1 (19:15) · funct3 (14:12) · rd (11:7) · opcode (6:0) |
| S      | imm[11:5] (31:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · imm[4:0] (11:7) · opcode (6:0) |
| SB     | imm[12\|10:5] (31:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · imm[4:1\|11] (11:7) · opcode (6:0) |
| U      | imm[31:12] (31:12) · rd (11:7) · opcode (6:0) |
| UJ     | imm[20\|10:1\|11\|19:12] (31:12) · rd (11:7) · opcode (6:0) |

| REGISTER | NAME     | USE                                 | SAVER  |
|----------|----------|-------------------------------------|--------|
| x0       | zero     | The constant value 0                | N.A.   |
| x1       | ra       | Return address                      | Caller |
| x2       | sp       | Stack pointer                       | Callee |
| x3       | gp       | Global pointer                      |        |
| x4       | tp       | Thread pointer                      |        |
| x5-x7    | t0-t2    | Temporaries                         | Caller |
| x8       | s0/fp    | Saved register/Frame pointer        | Callee |
| x9       | s1       | Saved register                      | Callee |
| x10-x11  | a0-a1    | Function arguments/Return values    | Caller |
| x12-x17  | a2-a7    | Function arguments                  | Caller |
| x18-x27  | s2-s11   | Saved registers                     | Callee |
| x28-x31  | t3-t6    | Temporaries                         | Caller |
| f0-f7    | ft0-ft7  | FP Temporaries                      | Caller |
| f8-f9    | fs0-fs1  | FP Saved registers                  | Callee |
| f10-f11  | fa0-fa1  | FP Function arguments/Return values | Caller |
| f12-f17  | fa2-fa7  | FP Function arguments               | Caller |
| f18-f27  | fs2-fs11 | FP Saved registers                  | Callee |
| f28-f31  | ft8-ft11 | FP Temporaries                      | Caller |

#### IEEE 754 FLOATING-POINT STANDARD

$(-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$ where Half-Precision Bias = 15, Single-Precision Bias = 127, Double-Precision Bias = 1023, Quad-Precision Bias = 16383

#### IEEE Half-, Single-, Double-, and Quad-Precision Formats:

| Precision | S (bit) | Exponent (bits) | Fraction (bits) |
|-----------|---------|-----------------|-----------------|
| Half      | 15      | 14:10           | 9:0             |
| Single    | 31      | 30:23           | 22:0            |
| Double    | 63      | 62:52           | 51:0            |
| Quad      | 127     | 126:112         | 111:0           |
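As a worked instance of the IEEE 754 formula, the sketch below decodes a single-precision bit pattern (sign in bit 31, exponent in bits 30:23, fraction in bits 22:0, bias 127). The function name `decode_single` is made up for illustration, and the sketch only handles normalized numbers: exponent fields 0 and 255 (zero, subnormals, infinities, NaNs) are rejected.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluates (-1)^S * (1 + Fraction) * 2^(Exponent - Bias) for a
   normalized single-precision bit pattern. */
double decode_single(uint32_t bits) {
    uint32_t sign     = bits >> 31;            /* bit 31 */
    uint32_t exponent = (bits >> 23) & 0xFFu;  /* bits 30:23 */
    uint32_t fraction = bits & 0x7FFFFFu;      /* bits 22:0 */
    assert(exponent != 0 && exponent != 255);  /* normalized values only */

    /* The 23 fraction bits represent fraction / 2^23. */
    double significand = 1.0 + (double)fraction / 8388608.0;

    /* Remove the bias, then build 2^power by repeated doubling/halving. */
    int power = (int)exponent - 127;
    double scale = 1.0;
    for (int i = 0; i < power; i++) scale *= 2.0;
    for (int i = 0; i > power; i--) scale /= 2.0;

    double magnitude = significand * scale;
    return sign ? -magnitude : magnitude;
}
```

For example, `0xC0200000` has S = 1, Exponent = 128 and Fraction = 0.25, so it decodes to -(1.25 × 2) = -2.5.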

#### MEMORY ALLOCATION, STACK FRAME

[Figure: memory allocation, from higher to lower addresses: Stack (SP), Dynamic Data, Static Data (from 0000 0000 1000 0000<sub>hex</sub>), Text (PC starts at 0000 0000 0040 0000<sub>hex</sub>), Reserved.]

[Figure: stack frame, from higher to lower memory addresses: ..., Argument 9, Argument 8, Saved Registers (FP), Local Variables (SP).]

#### SIZE PREFIXES AND SYMBOLS

| SIZE             | PREFIX | SYMBOL | SIZE  | PREFIX | SYMBOL |
|------------------|--------|--------|-------|--------|--------|
| 10<sup>3</sup>   | Kilo-  | K      | 2<sup>10</sup>   | Kibi-  | Ki     |
| 10<sup>6</sup>   | Mega-  | M      | 2<sup>20</sup>   | Mebi-  | Mi     |
| 10<sup>9</sup>   | Giga-  | G      | 2<sup>30</sup>   | Gibi-  | Gi     |
| 10<sup>12</sup>  | Tera-  | T      | 2<sup>40</sup>   | Tebi-  | Ti     |
| 10<sup>15</sup>  | Peta-  | P      | 2<sup>50</sup>   | Pebi-  | Pi     |
| 10<sup>18</sup>  | Exa-   | E      | 2<sup>60</sup>   | Exbi-  | Ei     |
| 10<sup>21</sup>  | Zetta- | Z      | 2<sup>70</sup>   | Zebi-  | Zi     |
| 10<sup>24</sup>  | Yotta- | Y      | 2<sup>80</sup>   | Yobi-  | Yi     |
| 10<sup>-3</sup>  | milli- | m      | 10<sup>-15</sup> | femto- | f      |
| 10<sup>-6</sup>  | micro- | μ      | 10<sup>-18</sup> | atto-  | a      |
| 10<sup>-9</sup>  | nano-  | n      | 10<sup>-21</sup> | zepto- | z      |
| 10<sup>-12</sup> | pico-  | p      | 10<sup>-24</sup> | yocto- | y      |

#### Question

- Which statement is FALSE?
  - A: RISC-V uses jal to invoke a function and jr to return from a function
  - B: jal saves PC+1 in ra
  - C: The callee can use temporary registers (ti) without saving and restoring them
  - D: The caller can rely on saved registers (si) without fear of callee changing them

#### Leaf() from before:

Leaf:

| Instruction       | Comment                          |
|-------------------|----------------------------------|
| `addi sp, sp, -8` | adjust stack for 2 items         |
| `sw s1, 4(sp)`    | save s1 for use afterwards       |
| `sw s0, 0(sp)`    | save s0 for use afterwards       |
| `add s0, a0, a1`  | f = g + h                        |
| `add s1, a2, a3`  | s1 = i + j                       |
| `sub a0, s0, s1`  | return value (g + h) - (i + j)   |
| `lw s0, 0(sp)`    | restore register s0 for caller   |
| `lw s1, 4(sp)`    | restore register s1 for caller   |
| `addi sp, sp, 8`  | adjust stack to delete 2 items   |
| `jr ra`           | jump back to calling routine     |

#### We could have optimized...

We could have just as easily used t0 and t1 instead...

#### Allocating Space on Stack

- C has two storage classes: automatic and static
  - Automatic variables are local to function and discarded when function exits
  - Static variables exist across exits from and entries to procedures
- Use stack for automatic (local) variables that don't fit in registers
- *Procedure frame* or *activation record*: segment of stack with saved registers and local variables
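The two storage classes can be contrasted with a short sketch. The function names are made up for illustration; the only difference between the two functions is the `static` keyword.

```c
/* Static variable: one copy in the static data area, initialized once,
   and it keeps its value across exits from and entries to the function. */
int count_with_static(void) {
    static int calls = 0;
    calls += 1;
    return calls;
}

/* Automatic variable: created on each entry (in a register or on the
   stack) and discarded when the function exits. */
int count_with_automatic(void) {
    int calls = 0;
    calls += 1;
    return calls;
}
```

Calling `count_with_static()` three times returns 1, 2, 3, while `count_with_automatic()` returns 1 every time.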

#### Stack Before, During, After Function



## Using the Stack (1/2)

- We have a register **sp** which always points to the last used space in the stack.
- To use stack, we decrement this pointer by the amount of space we need and then fill it with info.
- So, how do we compile this?

int sumSquare(int x, int y) {
 return mult(x,x)+ y; }

#### Using the Stack (2/2)

int sumSquare(int x, int y) {
 return mult(x,x)+ y; }

sumSquare:

| Phase  | Instruction       | Comment        |
|--------|-------------------|----------------|
| "push" | `addi sp, sp, -8` | space on stack |
| "push" | `sw ra, 4(sp)`    | save ret addr  |
| "push" | `sw a1, 0(sp)`    | save y         |
|        | `mv a1, a0`       | mult(x,x)      |
|        | `jal mult`        | call mult      |
|        | `lw a1, 0(sp)`    | restore y      |
|        | `add a0, a0, a1`  | mult()+y       |
| "pop"  | `lw ra, 4(sp)`    | get ret addr   |
| "pop"  | `addi sp, sp, 8`  | restore stack  |
|        | `jr ra`           |                |

mult: ...
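To see why ra and y have to be saved, the sumSquare sequence can be mimicked in C with simulated registers and a tiny simulated stack. Everything here is an assumption made for illustration: the 64-byte stack, the made-up return address `0x2000`, and the stand-in `mult_sim`, which deliberately trashes a1 the way a real callee is allowed to.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u                  /* hypothetical tiny stack: 64 bytes */
#define STACK_TOP   (STACK_WORDS * 4u)   /* one past the highest byte address */

static uint32_t stack_mem[STACK_WORDS];  /* backing store for the stack */
uint32_t sp = STACK_TOP;                 /* simulated registers */
uint32_t ra, a0, a1;

static void sw(uint32_t value, uint32_t addr) {   /* store word */
    assert(addr % 4 == 0 && addr < STACK_TOP);
    stack_mem[addr / 4] = value;
}

static uint32_t lw(uint32_t addr) {               /* load word */
    assert(addr % 4 == 0 && addr < STACK_TOP);
    return stack_mem[addr / 4];
}

/* Stand-in for mult: result in a0; a1 is not preserved. */
static void mult_sim(void) {
    a0 = a0 * a1;
    a1 = 0xDEADBEEFu;
}

/* One C statement per instruction of sumSquare. */
void sum_square_sim(void) {
    sp -= 8;             /* addi sp, sp, -8 */
    sw(ra, sp + 4);      /* sw ra, 4(sp)    */
    sw(a1, sp + 0);      /* sw a1, 0(sp)    */
    a1 = a0;             /* mv a1, a0       */
    ra = 0x2000u;        /* jal mult overwrites ra (made-up address) */
    mult_sim();
    a1 = lw(sp + 0);     /* lw a1, 0(sp)    */
    a0 = a0 + a1;        /* add a0, a0, a1  */
    ra = lw(sp + 4);     /* lw ra, 4(sp)    */
    sp += 8;             /* addi sp, sp, 8  */
}                        /* jr ra           */
```

Starting with a0 = 3, a1 = 4 and ra = 0x1000, the call leaves a0 = 13, with ra and sp back at their original values.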

#### **Basic Structure of a Function**

**Prologue**

- `entry_label:`
  - `addi sp, sp, -framesize`
  - `sw ra, framesize-4(sp)` # save ra
  - save other regs if need be

**Body** ... (call other functions...)

**Epilogue**

- restore other regs if need be
  - `lw ra, framesize-4(sp)` # restore ra
  - `addi sp, sp, framesize`
  - `jr ra`

#### Where is the Stack in Memory?

- RV32 convention (RV64 and RV128 have different memory layouts)
- Stack starts in high memory and grows down
  - Hexadecimal: bfff\_fff0<sub>hex</sub>
  - Stack must be aligned on 16-byte boundary (not true in examples above)
- RV32 programs (*text segment*) in low end
  - 0001\_0000<sub>hex</sub>
- Static data segment (constants and other static variables) above text
  - RISC-V convention: *global pointer* (**gp**) points to static data
  - RV32 gp = 1000\_0000<sub>hex</sub>
- Heap above static for data structures that grow and shrink; grows up to high addresses



#### Frame Pointer!?

- As a reminder, we shove all the C local variables etc. on the stack...
  - Combined with space for all the saved registers
  - This is called the "activation record" or "call frame" or "call record"
- But a naive compiler may cause the stack pointer to bounce up and down during a function call
  - Can be a lot simpler to have a compiler do a bunch of pushes and pops when it needs a bit of temporary space: more so on a CISC rather than a RISC however
- Plus: not all programming languages can store all activation records on the stack:
  - The use of lambda in Scheme, Python, Go, etc. requires that some call frames are allocated on the heap since variables may last beyond the function call!

#### Convention: Use **s0** as a Frame Pointer (**fp**)

- At the start, save **s0 (x8)** and then have the frame pointer point to one below the sp when you were called...
  - `addi sp sp -20` # Initially grabbing 5 words of space
  - `sw ra 16(sp)`
  - `sw fp 12(sp)` # save fp/s0/x8
  - `addi fp sp 20` # Points to the start of this call record
  - ...

- Now we can address local variables off the frame pointer rather than the stack pointer
  - Simplifies the compiler
    - Since it can now move the stack up and down easily
  - Simplifies the *debugger*

#### But note...

- It isn't necessary in C...
  - Most C compilers have a `-fomit-frame-pointer` option on most architectures
    - It just fubars debugging a bit
- So for our hand-written assembly, we will generally ignore the frame pointer
- The calling convention says it doesn't matter if you use a frame pointer or not!
  - It is just a callee saved register, so if you use it as a frame pointer...
    - It will be preserved just like any other saved register
    - But if you just use it as **s0**, that makes no difference!

#### The Stack Is Also For Local Variables...

- e.g. `char foo[20];`
- Requires enough space on the stack
  - May need padding
- So then to pass foo to something in a0...
  - `addi a0 sp offset-for-foo-off-sp`
  - If you are using the frame pointer: `addi a0 fp offset-for-foo-off-fp`

#### The Stack Is Also For Arguments

- Arguments 1-8 are passed in a0-a7
- But what about a 9th argument or more?
- But what about complex structs as arguments?
  - Pass those on the stack!
  - When the function is called,
    - **0(sp)** -> arg #9
    - **4(sp)** -> arg #10...
- ALWAYS keep sp the lowest address used!



#### Stack Before, During, After Function



#### **Register Allocation**

- We have some set of registers that are useful for local variables, temporaries that last across function calls, etc...
- We have some other set of registers that are just for temporary use
- Which ones do we use? What do we instead save on the stack?
- This is the "Register Allocation" problem
  - Experience it in great detail in CS 131 Compilers ...
- Can either be trivial or NP-complete!

#### Levels of Representation/Interpretation



## Big Idea: Stored-Program Computer

[Figure: title page of *First Draft of a Report on the EDVAC* by John von Neumann, Contract No. W–670–ORD–4926 between the United States Army Ordnance Department and the University of Pennsylvania, Moore School of Electrical Engineering, June 30, 1945]

- Instructions are represented as bit patterns; we can think of these as numbers
- Therefore, entire programs can be stored in memory to be read or written just like data
- Can reprogram quickly (seconds), don't have to rewire computer (days)
- Known as the "von Neumann" computers after widely distributed tech report on EDVAC project
  - Wrote up discussions of Eckert and Mauchly
  - Anticipated earlier by Turing and Zuse

#### **Consequence #1: Everything Addressed**

- Since all instructions and data are stored in memory, everything has a memory address: instructions, data words
  - both branches and jumps use these
- C pointers are just memory addresses: they can point to anything in memory
  - Unconstrained use of addresses can lead to nasty bugs; up to you in C; limited in Java by language design
- One register keeps address of instruction being executed: "Program Counter" (PC)
  - Basically a pointer to memory: Intel calls it Instruction Pointer (a better name)
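A small C sketch of "everything has an address": both a data word and a function can be reached through pointers. The names `global_counter`, `add_one` and `call_through_pointers` are made up for illustration.

```c
int global_counter = 7;        /* data: stored at some memory address */

int add_one(int value) {       /* code: its instructions also sit at some address */
    return value + 1;
}

/* Reads data through a data pointer and calls code through a function pointer. */
int call_through_pointers(void) {
    int *data_ptr = &global_counter;   /* address of a data word */
    int (*code_ptr)(int) = add_one;    /* address of the first instruction of add_one */
    return code_ptr(*data_ptr);        /* load via one address, jump via the other */
}
```

Here `call_through_pointers()` returns 8. On RISC-V, a call through `code_ptr` would typically compile to a jump-and-link through a register holding that address.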

#### **Consequence #2: Binary Compatibility**

- Programs are distributed in binary form
  - Programs bound to specific instruction set
  - Different version for ARM (phone) and PCs
- New machines want to run old programs ("binaries") as well as programs compiled to new instructions
- Leads to "backward-compatible" instruction set evolving over time
- Selection of Intel 8086 in 1981 for 1<sup>st</sup> IBM PC is major reason latest PCs still use 80x86 instruction set; could still run program from 1981 PC today

## Instructions as Numbers (1/2)

- Currently most data we work with is in words (32-bit chunks):
  - Each register is a word.
  - lw and sw both access memory one word at a time.
- So how do we represent instructions?
  - Remember: Computer only understands 1s and 0s, so "add x10,x11,x0" is meaningless.
  - RISC-V seeks simplicity: since data is in words, make instructions be fixed-size 32-bit words, too
    - Same 32-bit instructions used for RV32, RV64, RV128

### Instructions as Numbers (2/2)

- One word is 32 bits, so divide instruction word into "fields".
- Each field tells processor something about instruction.
- We could define different fields for each instruction, but RISC-V seeks simplicity, so define 6 basic types of instruction formats:
  - R-format for register-register arithmetic operations
  - I-format for register-immediate arithmetic operations and loads
  - S-format for stores
  - B-format for branches (minor variant of S-format, called SB before)
  - U-format for 20-bit upper immediate instructions
  - J-format for jumps (minor variant of U-format, called UJ before)

# Summary of RISC-V Instruction Formats

| Type   | Fields, from bit 31 down to bit 0 |
|--------|-----------------------------------|
| R-type | funct7 (31:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · rd (11:7) · opcode (6:0) |
| I-type | imm[11:0] (31:20) · rs1 (19:15) · funct3 (14:12) · rd (11:7) · opcode (6:0) |
| S-type | imm[11:5] (31:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · imm[4:0] (11:7) · opcode (6:0) |
| B-type | imm[12\|10:5] (31:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · imm[4:1\|11] (11:7) · opcode (6:0) |
| U-type | imm[31:12] (31:12) · rd (11:7) · opcode (6:0) |
| J-type | imm[20\|10:1\|11\|19:12] (31:12) · rd (11:7) · opcode (6:0) |

#### **R-Format Instruction Layout**



- 32-bit instruction word divided into six fields of varying numbers of bits each: 7+5+5+3+5+7 = 32
- Examples
  - opcode is a 7-bit field that lives in bits 6-0 of the instruction
  - rs2 is a 5-bit field that lives in bits 24-20 of the instruction
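The field positions above translate directly into shifts and masks. The following is a minimal sketch; the `RFields` struct and the helpers `bits` and `decode_r` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The six R-format fields, from most to least significant. */
typedef struct {
    uint32_t funct7, rs2, rs1, funct3, rd, opcode;
} RFields;

/* Returns bits hi..lo (inclusive) of a 32-bit instruction word. */
uint32_t bits(uint32_t word, unsigned hi, unsigned lo) {
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? UINT32_MAX : ((1u << width) - 1u);
    return (word >> lo) & mask;
}

/* Splits an instruction word into its R-format fields: 7+5+5+3+5+7 = 32 bits. */
RFields decode_r(uint32_t word) {
    RFields f;
    f.funct7 = bits(word, 31, 25);
    f.rs2    = bits(word, 24, 20);
    f.rs1    = bits(word, 19, 15);
    f.funct3 = bits(word, 14, 12);
    f.rd     = bits(word, 11, 7);
    f.opcode = bits(word, 6, 0);
    return f;
}
```

For instance, `decode_r(0x00000033)` yields opcode 0x33 (0110011<sub>two</sub>) with every other field zero.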

#### R-Format Instructions opcode/funct fields

| Bits  | 31:25  | 24:20 | 19:15 | 14:12  | 11:7 | 6:0    |
|-------|--------|-------|-------|--------|------|--------|
| Field | funct7 | rs2   | rs1   | funct3 | rd   | opcode |
| Width | 7      | 5     | 5     | 3      | 5    | 7      |

- opcode: partially specifies what instruction it is
  - Note: This field is equal to 0110011<sub>two</sub> for all R-Format register-register arithmetic instructions
- funct7+funct3: combined with opcode, these two fields describe what operation to perform
- Question: You have been professing simplicity, so why aren't opcode and funct7 and funct3 a single 17-bit field?
  - We'll answer this later

#### **R-Format Instructions register specifiers**

| Bits  | 31:25  | 24:20 | 19:15 | 14:12  | 11:7 | 6:0    |
|-------|--------|-------|-------|--------|------|--------|
| Field | funct7 | rs2   | rs1   | funct3 | rd   | opcode |
| Width | 7      | 5     | 5     | 3      | 5    | 7      |

- <u>rs1</u> (Source Register #1): specifies register containing first operand
- <u>rs2</u> (Source Register #2): specifies second register operand
- <u>rd</u> (Destination Register): specifies register which will receive result of computation
- Each register field holds a 5-bit unsigned integer (0-31) corresponding to a register number (x0-x31)

### **R-Format Example**

 RISC-V Assembly Instruction: add x18,x19,x10

| 31-25   | 24-20  | 19-15  | 14-12  | 11-7  | 6-0        |
|---------|--------|--------|--------|-------|------------|
| funct7  | rs2    | rs1    | funct3 | rd    | opcode     |
| 7       | 5      | 5      | 3      | 5     | 7          |
| 0000000 | 01010  | 10011  | 000    | 10010 | 0110011    |
| add     | rs2=10 | rs1=19 | add    | rd=18 | Reg-Reg OP |
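
The field values in the table can be packed into one 32-bit word with shifts and ORs. This is a minimal sketch, not a full assembler; the function name `encode_r` is an assumption made for this example.

```python
def encode_r(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack the six R-format fields into a 32-bit instruction word."""
    return (
        (funct7 << 25)   # bits 31-25
        | (rs2 << 20)    # bits 24-20
        | (rs1 << 15)    # bits 19-15
        | (funct3 << 12) # bits 14-12
        | (rd << 7)      # bits 11-7
        | opcode         # bits 6-0
    )


# add x18, x19, x10
word = encode_r(0b0000000, 10, 19, 0b000, 18, 0b0110011)
print(f"{word:032b}")
print(f"0x{word:08X}")
```

Grouping the printed bits as 7+5+5+3+5+7 gives back the row of binary values in the table.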

#### All RV32 R-format instructions

| funct7  | rs2 | rs1 | funct3 | rd | opcode  | Instruction |
|---------|-----|-----|--------|----|---------|-------------|
| 0000000 | rs2 | rs1 | 000    | rd | 0110011 | add         |
| 0100000 | rs2 | rs1 | 000    | rd | 0110011 | sub         |
| 0000000 | rs2 | rs1 | 001    | rd | 0110011 | sll         |
| 0000000 | rs2 | rs1 | 010    | rd | 0110011 | slt         |
| 0000000 | rs2 | rs1 | 011    | rd | 0110011 | sltu        |
| 0000000 | rs2 | rs1 | 100    | rd | 0110011 | xor         |
| 0000000 | rs2 | rs1 | 101    | rd | 0110011 | srl         |
| 0100000 | rs2 | rs1 | 101    | rd | 0110011 | sra         |
| 0000000 | rs2 | rs1 | 110    | rd | 0110011 | or          |
| 0000000 | rs2 | rs1 | 111    | rd | 0110011 | and         |

Different encoding in funct7 + funct3 selects different operations
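
Going the other way, a decoder can use the (funct7, funct3) pair as a lookup key once the opcode has identified the word as R-format. The sketch below mirrors the table above; the names `R_OPS` and `decode_r` are invented for this illustration.

```python
# (funct7, funct3) -> mnemonic, taken from the RV32 R-format table
R_OPS = {
    (0b0000000, 0b000): "add",
    (0b0100000, 0b000): "sub",
    (0b0000000, 0b001): "sll",
    (0b0000000, 0b010): "slt",
    (0b0000000, 0b011): "sltu",
    (0b0000000, 0b100): "xor",
    (0b0000000, 0b101): "srl",
    (0b0100000, 0b101): "sra",
    (0b0000000, 0b110): "or",
    (0b0000000, 0b111): "and",
}


def decode_r(word: int) -> str:
    """Turn a 32-bit R-format word back into assembly text."""
    if word & 0x7F != 0b0110011:
        raise ValueError("not a register-register arithmetic instruction")
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    funct7 = (word >> 25) & 0x7F
    name = R_OPS.get((funct7, funct3))
    if name is None:
        raise ValueError("unknown funct7/funct3 combination")
    return f"{name} x{rd}, x{rs1}, x{rs2}"


print(decode_r(0x00A98933))
```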

# Question

- What is the correct encoding of add x4, x3, x2?
  - A: 4021 8233<sub>hex</sub>
  - B: 0021 82b3<sub>hex</sub>
  - C: 4021 82b3<sub>hex</sub>
  - D: 0021 8233<sub>hex</sub>
  - E: 0021 8234<sub>hex</sub>

| 31-25   | 24-20 | 19-15 | 14-12 | 11-7 | 6-0     | Instruction |
|---------|-------|-------|-------|------|---------|-------------|
| 0000000 | rs2   | rs1   | 000   | rd   | 0110011 | add         |
| 0100000 | rs2   | rs1   | 000   | rd   | 0110011 | sub         |
| 0000000 | rs2   | rs1   | 100   | rd   | 0110011 | xor         |
| 0000000 | rs2   | rs1   | 110   | rd   | 0110011 | or          |
| 0000000 | rs2   | rs1   | 111   | rd   | 0110011 | and         |



# Admin

- HW 2: due in about a week, so start early!
- Project 1.1 will be posted today!
  - Work together with your partner!
  - Push to gitlab often, every day you work on the project, even if it doesn't compile!
  - We will evaluate each partner's contribution based on gitlab statistics!
- Venus tutorial videos by last year's TA Ze Song are available on the website.

# Admin

- Lecture schedule slightly changed...
- Midterm I and II Dates:
  - April 6
  - May 11
  - During lecture hours (10:15-12:15)
  - Rooms: tbd.
- Midterm I content:
  - Everything till (including): RISC-V Datapath
  - Material: 1 A4 cheat-sheet handwritten by you

# **I-Format Instructions**

- What about instructions with immediates?
  - 5-bit field only represents numbers up to the value 31: immediates may be much larger than this
  - Ideally, RISC-V would have only one instruction format (for simplicity): unfortunately, we need to compromise
- Define new instruction format that is mostly consistent with R-format
  - Notice that if an instruction has an immediate, it uses at most 2 registers (one source, one destination)

#### **I-Format Instruction Layout**

| 31-20     | 19-15 | 14-12  | 11-7 | 6-0    |
|-----------|-------|--------|------|--------|
| imm[11:0] | rs1   | funct3 | rd   | opcode |
| 12        | 5     | 3      | 5    | 7      |

- Only one field differs from R-format: rs2 and funct7 are replaced by a 12-bit signed immediate, imm[11:0]
- Remaining fields (rs1, funct3, rd, opcode) same as before
- imm[11:0] can hold values in range [-2048<sub>ten</sub> , +2047<sub>ten</sub>]
- Immediate is always sign-extended to 32 bits before use in an arithmetic operation
- We'll later see how to handle immediates wider than 12 bits
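
Sign extension can be illustrated with a small helper. This is a sketch under the assumption that the field arrives as an unsigned bit pattern; the name `sign_extend` is made up for this example. It copies bit 11 into all upper bits, which is what keeps the value range at [-2048, +2047].

```python
def sign_extend(value: int, bits: int = 12) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1          # keep only the field
    sign_bit = 1 << (bits - 1)        # bit 11 for a 12-bit immediate
    return (value ^ sign_bit) - sign_bit


for pattern in (0x7FF, 0x800, 0xFCE):
    signed = sign_extend(pattern)
    as_register = signed & 0xFFFFFFFF  # the 32-bit pattern the ALU sees
    print(f"{pattern:012b} -> {signed:6d} -> 0x{as_register:08X}")
```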

### I-Format Example

• RISC-V Assembly Instruction:

addi x15,x1,-50

| 31-20        | 19-15 | 14-12  | 11-7  | 6-0     |
|--------------|-------|--------|-------|---------|
| imm[11:0]    | rs1   | funct3 | rd    | opcode  |
| 12           | 5     | 3      | 5     | 7       |
| 111111001110 | 00001 | 000    | 01111 | 0010011 |
| imm=-50      | rs1=1 | add    | rd=15 | OP-Imm  |
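
The same packing idea works for I-format, with the negative immediate first reduced to its 12-bit two's-complement pattern. A minimal sketch, where `encode_i` is a name chosen for this example only:

```python
def encode_i(imm: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack the five I-format fields into a 32-bit instruction word."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return (
        ((imm & 0xFFF) << 20)  # imm[11:0] in bits 31-20
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )


# addi x15, x1, -50
word = encode_i(-50, 1, 0b000, 15, 0b0010011)
print(f"{word:032b}")
print(f"0x{word:08X}")
```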

# All RV32 I-format Arithmetic Instructions

| imm[11:0]     | rs1 | funct3 | rd | opcode  | Instruction |
|---------------|-----|--------|----|---------|-------------|
| imm[11:0]     | rs1 | 000    | rd | 0010011 | addi        |
| imm[11:0]     | rs1 | 010    | rd | 0010011 | slti        |
| imm[11:0]     | rs1 | 011    | rd | 0010011 | sltiu       |
| imm[11:0]     | rs1 | 100    | rd | 0010011 | xori        |
| imm[11:0]     | rs1 | 110    | rd | 0010011 | ori         |
| imm[11:0]     | rs1 | 111    | rd | 0010011 | andi        |
| 0000000 shamt | rs1 | 001    | rd | 0010011 | slli        |
| 0000000 shamt | rs1 | 101    | rd | 0010011 | srli        |
| 0100000 shamt | rs1 | 101    | rd | 0010011 | srai        |

- One of the higher-order immediate bits is used to distinguish "shift right logical" (SRLI) from "shift right arithmetic" (SRAI)
- "Shift-by-immediate" instructions only use lower 5 bits of the immediate value for shift amount (can only shift by 0-31 bit positions)
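
In the table, the srli and srai rows differ only in the upper seven immediate bits (0000000 versus 0100000), which is bit 30 of the instruction word. The sketch below shows how a decoder might use that; `encode_shift_imm` and `decode_shift_imm` are hypothetical helper names, and opcode checking is left out for brevity.

```python
# mnemonic -> (upper 7 immediate bits, funct3), from the I-format table
SHIFT_IMM = {
    "slli": (0b0000000, 0b001),
    "srli": (0b0000000, 0b101),
    "srai": (0b0100000, 0b101),
}


def encode_shift_imm(name: str, rd: int, rs1: int, shamt: int) -> int:
    """Encode a shift-by-immediate instruction (shamt must be 0-31)."""
    if not 0 <= shamt <= 31:
        raise ValueError("shift amount must fit in 5 bits")
    upper, funct3 = SHIFT_IMM[name]
    return (upper << 25) | (shamt << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | 0b0010011


def decode_shift_imm(word: int):
    """Return (mnemonic, shamt) for a shift-by-immediate word."""
    funct3 = (word >> 12) & 0x7
    shamt = (word >> 20) & 0x1F        # only the low 5 immediate bits
    arithmetic = (word >> 30) & 1      # the bit that separates srai from srli
    if funct3 == 0b001:
        return "slli", shamt
    if funct3 == 0b101:
        return ("srai" if arithmetic else "srli"), shamt
    raise ValueError("not a shift-by-immediate instruction")


print(decode_shift_imm(encode_shift_imm("srai", 5, 6, 3)))
print(decode_shift_imm(encode_shift_imm("srli", 5, 6, 3)))
```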

### Load Instructions are also I-Type

| 31-20        | 19-15 | 14-12  | 11-7 | 6-0    |
|--------------|-------|--------|------|--------|
| imm[11:0]    | rs1   | funct3 | rd   | opcode |
| 12           | 5     | 3      | 5    | 7      |
| offset[11:0] | base  | width  | dest | LOAD   |

- The 12-bit signed immediate is added to the base address in register rs1 to form the memory address
  - This is very similar to the add-immediate operation, but it is used to create an address rather than a final result
- The value loaded from memory is stored in register rd

#### I-Format Load Example

• RISC-V Assembly Instruction:

lw x14, 8(x2)

| 31-20        | 19-15 | 14-12          | 11-7  | 6-0     |
|--------------|-------|----------------|-------|---------|
| imm[11:0]    | rs1   | funct3         | rd    | opcode  |
| 12           | 5     | 3              | 5     | 7       |
| offset[11:0] | base  | width          | dest  | LOAD    |
| 000000001000 | 00010 | 010            | 01110 | 0000011 |
| imm=+8       | rs1=2 | lw (load word) | rd=14 | LOAD    |
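
A load is encoded like any other I-format instruction, and at run time the sign-extended offset is added to the value held in the base register. The sketch below illustrates both steps; `encode_load` and `effective_address` are names invented here, and the base register value 0x1000 is an arbitrary example.

```python
def encode_load(offset: int, base: int, width: int, dest: int) -> int:
    """Pack an I-format load: offset[11:0], base (rs1), width (funct3), dest (rd)."""
    if not -2048 <= offset <= 2047:
        raise ValueError("offset does not fit in 12 signed bits")
    return ((offset & 0xFFF) << 20) | (base << 15) | (width << 12) | (dest << 7) | 0b0000011


def effective_address(base_value: int, offset: int) -> int:
    """Memory address formed by the load: base register value plus signed offset."""
    return (base_value + offset) & 0xFFFFFFFF  # wrap to 32 bits


# lw x14, 8(x2)
word = encode_load(8, 2, 0b010, 14)
print(f"0x{word:08X}")

# If x2 held 0x1000 (example value), the word would be read from this address
print(hex(effective_address(0x1000, 8)))
```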

# All RV32 Load Instructions

| imm[11:0] | rs1 | funct3 | rd | opcode  | Instruction |
|-----------|-----|--------|----|---------|-------------|
| imm[11:0] | rs1 | 000    | rd | 0000011 | lb          |
| imm[11:0] | rs1 | 001    | rd | 0000011 | lh          |
| imm[11:0] | rs1 | 010    | rd | 0000011 | lw          |
| imm[11:0] | rs1 | 100    | rd | 0000011 | lbu         |
| imm[11:0] | rs1 | 101    | rd | 0000011 | lhu         |

funct3 field encodes size and 'signedness' of load data

- LBU is "load unsigned byte"
- LH is "load halfword", which loads 16 bits (2 bytes) and sign-extends to fill destination 32-bit register
- LHU is "load unsigned halfword", which zero-extends 16 bits to fill destination 32-bit register
- There is no LWU in RV32, because there is no sign/zero extension needed when copying 32 bits from a memory location into a 32-bit register
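
The difference between the sign-extending and zero-extending loads can be seen by reading the same bytes with each instruction. This is a rough model only: memory is a Python `bytes` object, byte order is assumed little-endian (the RISC-V default), and `load` is a helper name made up for this example.

```python
# mnemonic -> (size in bytes, sign-extend?)
LOADS = {
    "lb": (1, True),
    "lbu": (1, False),
    "lh": (2, True),
    "lhu": (2, False),
    "lw": (4, False),  # fills the whole register, no extension needed
}


def load(memory: bytes, addr: int, name: str) -> int:
    """Return the 32-bit register value produced by the given load."""
    size, signed = LOADS[name]
    raw = int.from_bytes(memory[addr:addr + size], "little")
    if signed and raw & (1 << (8 * size - 1)):
        raw -= 1 << (8 * size)       # top bit set: value is negative
    return raw & 0xFFFFFFFF           # view as a 32-bit register pattern


memory = bytes([0x80, 0xFF, 0x34, 0x12])
for name in ("lb", "lbu", "lh", "lhu", "lw"):
    print(f"{name:3s} -> 0x{load(memory, 0, name):08X}")
```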

### **S-Format Used for Stores**

| 31-25        | 24-20 | 19-15 | 14-12  | 11-7        | 6-0    |
|--------------|-------|-------|--------|-------------|--------|
| imm[11:5]    | rs2   | rs1   | funct3 | imm[4:0]    | opcode |
| 7            | 5     | 5     | 3      | 5           | 7      |
| offset[11:5] | src   | base  | width  | offset[4:0] | STORE  |

- Store needs to read two registers, rs1 for base memory address and rs2 for data to be stored, as well as an immediate offset!
- Can't have both rs2 and immediate in same place as other instructions!
- Note that stores don't write a value to the register file, *no rd*!
- RISC-V design decision is to move the low 5 bits of the immediate to where the rd field was in other instructions, and to keep the rs1/rs2 fields in the same place
  - register names more critical than immediate bits in hardware design
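
The split immediate can be sketched as follows. The helper names `encode_s` and `decode_s_imm` are made up for this illustration, and two values are taken from the RISC-V base ISA rather than from the slides above: the STORE opcode 0100011 and width 010 for a word store. The instruction `sw x14, 8(x2)` is a made-up example that mirrors the earlier load.

```python
def encode_s(imm: int, rs2: int, rs1: int, funct3: int, opcode: int = 0b0100011) -> int:
    """Pack an S-format store; the 12-bit immediate is split into two fields."""
    if not -2048 <= imm <= 2047:
        raise ValueError("offset does not fit in 12 signed bits")
    imm &= 0xFFF
    return (
        ((imm >> 5) << 25)      # imm[11:5] in bits 31-25
        | (rs2 << 20)           # data register, same place as in R-format
        | (rs1 << 15)           # base register, same place as in R-format
        | (funct3 << 12)
        | ((imm & 0x1F) << 7)   # imm[4:0] where rd would otherwise be
        | opcode
    )


def decode_s_imm(word: int) -> int:
    """Reassemble and sign-extend the split S-format immediate."""
    imm = (((word >> 25) & 0x7F) << 5) | ((word >> 7) & 0x1F)
    return (imm ^ 0x800) - 0x800


# sw x14, 8(x2)
word = encode_s(8, 14, 2, 0b010)
print(f"0x{word:08X}")
print(decode_s_imm(encode_s(-50, 14, 2, 0b010)))
```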