## CS 110 Computer Architecture Lecture 6: *RISC-V Instruction Formats*

Instructors: Sören Schwertfeger & Chundong Wang

https://robotics.shanghaitech.edu.cn/courses/ca/20s/

School of Information Science and Technology SIST

ShanghaiTech University

Slides based on UC Berkeley's CS61C

## Admin



- Add/drop is over, so participation will be checked more rigorously now!
- Midterm I will be canceled.
- Adjustment of grading scaling
  - Still might change in the future!



## **Course Grading**

- Projects: 30% (-3%)
- Homework: 15% (-2%)
- Lab: 5%
- Exams: 30%
  - Midterm 1: canceled (-10%)
  - Midterm 2: 10%
  - Final: 20%
- Participation: 10% (+5%)
- Quizzes: 10% (+10%)

## Participation

- Participation: 10%
  - Maybe each item 2%:
  - Video Lecture Poll Participation
  - Online Lecture Participation (join zoom lecture)
  - Online Lecture Quiz Participation
  - Lab, Homework, Project Attendance
  - Piazza statistics:

#### Student Participation Report

| Name, Email | days online | posts viewed* | contributions** |
|-------------|-------------|---------------|-----------------|
|             | 37          | 160           | 20              |
|             | 32          | 159           | 30              |
|             | 25          | 159           | 55              |
|             |             |               | 4               |

## Quiz



- Quizzes: 10%
  - Video Lecture piazza polls (maybe 3%)
  - Online Lecture Quizzes (maybe 7%)
- Online Lecture Quizzes Grading:
  - Graded generously
  - Test of paying attention & understanding basic concepts
  - Submitting (almost) empty solution 15 minutes before deadline will get 0 points.

## Admin



- Head TA Yanjie Song keeps track of students with technical difficulties – contact him if there are such problems! We will find a solution.
- HW3 is published and due March 27: start early!
- Project 1.1 will be published today!
- Never share your code with anybody!

## RISC-V ISA so far...

- Registers we know so far (All of them!)
  - a0-a7 for function arguments, a0-a1 for return values
  - sp (stack pointer), ra (return address)
  - s0-s11 saved registers
  - t0-t6 temporaries
  - zero
- Instructions we know:
  - Arithmetic: add, addi, sub
  - Logical: sll, srl, sra, slli, srli, srai, and, or, xor, andi, ori, xori
  - Decision: beq, bne, blt, bge
  - Unconditional branches (jumps): j, jr
  - Functions called with jal, return with jr ra.
- The stack is your friend: Use it to save anything you need. Just leave it the way you found it!

# 12 Shift Instructions...

- Two versions of all shift instructions. Shift amount via:
  - Register
  - Immediate
- (On RV64: additional "word" version of each instruction: only works on the lower 32 bits of the 64-bit register)
- Shift Left
- Shift Right Arithmetic: Fill upper bits with msb
- Shift Right Logical: Fill upper bits with 0's

| Instruction | Format | Name | Description | Notes |
|-------------|--------|------|-------------|-------|
| sll, sllw | R | Shift Left (Word) | $R[rd] = R[rs1] \ll R[rs2]$ | 1) |
| slli, slliw | I | Shift Left Immediate (Word) | $R[rd] = R[rs1] \ll imm$ | 1) |
| sra, sraw | R | Shift Right Arithmetic (Word) | $R[rd] = R[rs1] \gg R[rs2]$ | 1,5) |
| srai, sraiw | I | Shift Right Arith Imm (Word) | $R[rd] = R[rs1] \gg imm$ | 1,5) |
| srl, srlw | R | Shift Right (Word) | $R[rd] = R[rs1] \gg R[rs2]$ | 1) |
| srli, srliw | I | Shift Right Immediate (Word) | $R[rd] = R[rs1] \gg imm$ | 1) |

Notes:

- 1) The Word version only operates on the rightmost 32 bits of a 64-bit register
- 5) Replicates the sign bit to fill in the leftmost bits of the result during right shift
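The difference between the three shift flavours is easiest to see on a value whose msb is set. The following is a minimal Python sketch, not part of the original slides: the helper names `sll`, `srl`, and `sra` are chosen here to mirror the instruction names, values are modelled as 32-bit patterns (RV32), and only the low 5 bits of the shift amount are used.

```python
MASK32 = 0xFFFFFFFF  # model registers as 32-bit patterns (RV32 assumption)

def sll(x, shamt):
    """Shift left logical: low bits are filled with 0, result truncated to 32 bits."""
    return (x << (shamt & 31)) & MASK32

def srl(x, shamt):
    """Shift right logical: upper bits are filled with 0."""
    return (x & MASK32) >> (shamt & 31)

def sra(x, shamt):
    """Shift right arithmetic: upper bits are filled with copies of the msb."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x  # reinterpret as signed
    return (signed >> (shamt & 31)) & MASK32         # Python's >> keeps the sign

print(hex(srl(0x80000000, 4)))  # 0x8000000
print(hex(sra(0x80000000, 4)))  # 0xf8000000
print(hex(sll(0x80000001, 1)))  # 0x2
```

For a value with msb 0, `srl` and `sra` give the same result; they only differ for negative numbers.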

## Frame Pointer!?

- As a reminder, we shove all the C local variables etc. on the stack...
  - Combined with space for all the saved registers
  - This is called the "activation record" or "call frame" or "call record"
- But a naive compiler may cause the stack pointer to bounce up and down during a function call
  - Can be a lot simpler to have a compiler do a bunch of pushes and pops when it needs a bit of temporary space: more so on a CISC rather than a RISC however
- Plus: not all programming languages can store all activation records on the stack:
  - The use of lambda in Scheme, Python, Go, etc. requires that some call frames are allocated on the heap since variables may last beyond the function call!

## Convention: Use **s0** as a Frame Pointer (**fp**)

- At the start, save **s0 (x8)** and then have the frame pointer point to one below the sp when you were called...

      addi sp sp -20   # Initially grabbing 5 words of space
      sw ra 16(sp)     # save ra
      sw fp 12(sp)     # save fp/s0/x8
      addi fp sp 20    # Points to the start of this call record
      ...

- Now we can address local variables off the frame pointer rather than the stack pointer
  - Simplifies the compiler
    - Since it can now move the stack up and down easily
  - Simplifies the *debugger*

## But note...

- It isn't necessary in C...
  - Most C compilers have an -fomit-frame-pointer option on most architectures
    - It just messes up debugging a bit
- So for our hand-written assembly, we will generally ignore the frame pointer
- The calling convention says it doesn't matter if you use a frame pointer or not!
  - It is just a callee saved register, so if you use it as a frame pointer...
    - It will be preserved just like any other saved register
    - But if you just use it as **s0**, that makes no difference!

## The Stack Is Also For Local Variables...

- e.g. char foo[20];
- Requires enough space on the stack
  - May need padding
- So then to pass foo to something in a0...
  - addi a0 sp offset-for-foo-off-sp
  - If you are using the frame pointer...
    - addi a0 fp offset-for-foo-off-fp

## The Stack Is Also For Arguments

- Arguments 1-8 are passed in a0-a7
- But what about a 9th argument or more?
- But what about complex structs as arguments?
  - Pass those on the stack!
  - When the function is called,
    - **0(sp)** -> arg #9, **4(sp)** -> arg #10, ...
- ALWAYS keep sp the lowest address used!



#### Stack Before, During, After Function



## **Register Allocation**

- We have some set of registers that are useful for local variables, temporaries that last across function calls, etc...
- We have some other set of registers that are just for temporary use
- Which ones do we use? What do we instead save on the stack?
- This is the "Register Allocation" problem
  - Experience it in great detail in CS 131 Compilers ...
- Can either be trivial or NP-complete!

#### Levels of Representation/Interpretation



## Big Idea: Stored-Program Computer

(Figure: title page of "First Draft of a Report on the EDVAC" by John von Neumann, Moore School of Electrical Engineering, University of Pennsylvania, June 30, 1945)

- Instructions are represented as bit patterns; we can think of these as numbers
- Therefore, entire programs can be stored in memory to be read or written just like data
- Can reprogram quickly (seconds), don't have to rewire computer (days)
- Known as "von Neumann" computers after the widely distributed tech report on the EDVAC project
  - Wrote up discussions of Eckert and Mauchly
  - Anticipated earlier by Turing and Zuse

#### **Consequence #1: Everything Addressed**

- Since all instructions and data are stored in memory, everything has a memory address: instructions, data words
  - both branches and jumps use these
- C pointers are just memory addresses: they can point to anything in memory
  - Unconstrained use of addresses can lead to nasty bugs; up to you in C; limited in Java by language design
- One register keeps address of instruction being executed: "Program Counter" (PC)
  - Basically a pointer to memory: Intel calls it Instruction Pointer (a better name)

#### **Consequence #2: Binary Compatibility**

- Programs are distributed in binary form
  - Programs bound to specific instruction set
  - Different version for ARM (phone) and PCs
- New machines want to run old programs ("binaries") as well as programs compiled to new instructions
- Leads to "backward-compatible" instruction set evolving over time
- Selection of the Intel 8088 in 1981 for the 1<sup>st</sup> IBM PC is a major reason the latest PCs still use the 80x86 instruction set; they could still run a program from a 1981 PC today

## Instructions as Numbers (1/2)

- Currently most data we work with is in words (32-bit chunks):
  - Each register is a word.
  - lw and sw both access memory one word at a time.
- So how do we represent instructions?
  - Remember: Computer only understands 1s and 0s, so "add x10, x11, x0" is meaningless.
  - RISC-V seeks simplicity: since data is in words, make instructions be fixed-size 32-bit words, too
    - Same 32-bit instructions used for RV32, RV64, RV128

## Instructions as Numbers (2/2)

- One word is 32 bits, so divide instruction word into "fields".
- Each field tells processor something about instruction.
- We could define different fields for each instruction, but RISC-V seeks simplicity, so define 6 basic types of instruction formats:
  - R-format for register-register arithmetic operations
  - I-format for register-immediate arithmetic operations and loads
  - S-format for stores
  - B-format for branches (minor variant of S-format, called SB before)
  - U-format for 20-bit upper immediate instructions
  - J-format for jumps (minor variant of U-format, called UJ before)

## Summary of RISC-V Instruction Formats

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 | |
|-------|-------|-------|-------|------|-----|---|
| funct7 | rs2 | rs1 | funct3 | rd | opcode | R-type |
| imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode | I-type |
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode | S-type |
| imm[12 10:5] | rs2 | rs1 | funct3 | imm[4:1 11] | opcode | B-type |
| imm[31:25] | imm[24:20] | imm[19:15] | imm[14:12] | rd | opcode | U-type |
| imm[20 10:5] | imm[4:1 11] | imm[19:15] | imm[14:12] | rd | opcode | J-type |

In I-type, imm[11:0] is one contiguous field in bits 31-20; in U-type, imm[31:12] is one field in bits 31-12; in J-type, bits 31-12 hold imm[20 10:1 11 19:12]. These fields are split above only to line up with the column boundaries.

#### **R-Format Instruction Layout**



- 32-bit instruction word divided into six fields of varying numbers of bits each: 7+5+5+3+5+7 = 32
- Examples
  - opcode is a 7-bit field that lives in bits 6-0 of the instruction
  - rs2 is a 5-bit field that lives in bits 24-20 of the instruction

#### R-Format Instructions opcode/funct fields

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 |
|-------|-------|-------|-------|------|-----|
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
| 7 | 5 | 5 | 3 | 5 | 7 |

- opcode: partially specifies what instruction it is
  - Note: This field is equal to 0110011<sub>two</sub> for all R-Format register-register arithmetic instructions
- funct7+funct3: combined with opcode, these two fields describe what operation to perform
- Question: You have been professing simplicity, so why aren't opcode and funct7 and funct3 a single 17-bit field?
  - We'll answer this later

#### **R-Format Instructions register specifiers**

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 |
|-------|-------|-------|-------|------|-----|
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
| 7 | 5 | 5 | 3 | 5 | 7 |

- <u>rs1</u> (Source Register #1): specifies register containing first operand
- <u>rs2</u> (Source Register #2): specifies register containing second operand
- <u>rd</u> (Destination Register): specifies register which will receive result of computation
- Each register field holds a 5-bit unsigned integer (0-31) corresponding to a register number (x0-x31)
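Since each field sits at a fixed bit position, pulling the fields out of a 32-bit word is just shifting and masking. The sketch below is an illustration in Python, not part of the original slides; `decode_r` is a hypothetical helper name, and the sample word 0x00A98933 is the machine encoding of add x18, x19, x10.

```python
def decode_r(inst):
    """Split a 32-bit R-format word into its six fields."""
    return {
        "funct7": (inst >> 25) & 0x7F,  # bits 31-25
        "rs2":    (inst >> 20) & 0x1F,  # bits 24-20
        "rs1":    (inst >> 15) & 0x1F,  # bits 19-15
        "funct3": (inst >> 12) & 0x07,  # bits 14-12
        "rd":     (inst >> 7) & 0x1F,   # bits 11-7
        "opcode": inst & 0x7F,          # bits 6-0
    }

fields = decode_r(0x00A98933)  # add x18, x19, x10
print(fields["rd"], fields["rs1"], fields["rs2"])  # 18 19 10
print(f"{fields['opcode']:07b}")                   # 0110011
```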

#### **R-Format Example**

- RISC-V Assembly Instruction: add x18,x19,x10

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 |
|-------|-------|-------|-------|------|-----|
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
| 7 | 5 | 5 | 3 | 5 | 7 |
| 0000000 | 01010 | 10011 | 000 | 10010 | 0110011 |
| add | rs2=10 | rs1=19 | add | rd=18 | Reg-Reg OP |
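The same packing can be written as a few shifts and ORs. This is a small Python sketch under the field layout shown above; the function name `encode_r` is made up for illustration and does not check that the arguments fit their fields.

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-format fields into one 32-bit instruction word."""
    return (
        (funct7 << 25)    # bits 31-25
        | (rs2 << 20)     # bits 24-20
        | (rs1 << 15)     # bits 19-15
        | (funct3 << 12)  # bits 14-12
        | (rd << 7)       # bits 11-7
        | opcode          # bits 6-0
    )

# add x18, x19, x10: rd=18, rs1=19, rs2=10
word = encode_r(0b0000000, 10, 19, 0b000, 18, 0b0110011)
print(f"0x{word:08X}")  # 0x00A98933
```

Reading the hex result back in binary gives exactly the bit row of the table: 0000000 01010 10011 000 10010 0110011.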

#### All RV32 R-format instructions

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 | |
|---------|-----|-----|-----|----|---------|------|
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 | add  |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 | sub  |
| 0000000 | rs2 | rs1 | 001 | rd | 0110011 | sll  |
| 0000000 | rs2 | rs1 | 010 | rd | 0110011 | slt  |
| 0000000 | rs2 | rs1 | 011 | rd | 0110011 | sltu |
| 0000000 | rs2 | rs1 | 100 | rd | 0110011 | xor  |
| 0000000 | rs2 | rs1 | 101 | rd | 0110011 | srl  |
| 0100000 | rs2 | rs1 | 101 | rd | 0110011 | sra  |
| 0000000 | rs2 | rs1 | 110 | rd | 0110011 | or   |
| 0000000 | rs2 | rs1 | 111 | rd | 0110011 | and  |

Different encoding in funct7 + funct3 selects different operations

## Question

- What is correct encoding of add x4, x3, x2 ?
  - A: 4021 8233<sub>hex</sub>
  - B: 0021 82b3<sub>hex</sub>
  - C: 4021 82b3<sub>hex</sub>
  - D: 0021 8233<sub>hex</sub>
  - E: 0021 8234<sub>hex</sub>



| 31 25   | 24 20 | 19 15 | 14 12 | 11 7 | 6 0     | -   |
|---------|-------|-------|-------|------|---------|-----|
| 0000000 | rs2   | rs1   | 000   | rd   | 0110011 | add |
| 0100000 | rs2   | rs1   | 000   | rd   | 0110011 | sub |
| 0000000 | rs2   | rs1   | 100   | rd   | 0110011 | xor |
| 0000000 | rs2   | rs1   | 110   | rd   | 0110011 | or  |
| 0000000 | rs2   | rs1   | 111   | rd   | 0110011 | and |

## **I-Format Instructions**

- What about instructions with immediates?
  - 5-bit field only represents numbers up to the value 31: immediates may be much larger than this
  - Ideally, RISC-V would have only one instruction format (for simplicity): unfortunately, we need to compromise
- Define new instruction format that is mostly consistent with R-format
  - Notice if instruction has immediate, then uses at most 2 registers (one source, one destination)

#### **I-Format Instruction Layout**

| 31 20 | 19 15 | 14 12 | 11 7 | 6 0 |
|-------|-------|-------|------|-----|
| imm[11:0] | rs1 | funct3 | rd | opcode |
| 12 | 5 | 3 | 5 | 7 |

- Only one field is different from R-format: rs2 and funct7 are replaced by a 12-bit signed immediate, imm[11:0]
- Remaining fields (rs1, funct3, rd, opcode) same as before
- imm[11:0] can hold values in range [-2048<sub>ten</sub> , +2047<sub>ten</sub>]
- Immediate is always sign-extended to 32 bits before use in an arithmetic operation
- We'll later see how to handle immediates > 12 bits

#### I-Format Example

- RISC-V Assembly Instruction: addi x15,x1,-50

| 31 20 | 19 15 | 14 12 | 11 7 | 6 0 |
|-------|-------|-------|------|-----|
| imm[11:0] | rs1 | funct3 | rd | opcode |
| 12 | 5 | 3 | 5 | 7 |
| 111111001110 | 00001 | 000 | 01111 | 0010011 |
| imm=-50 | rs1=1 | add | rd=15 | OP-Imm |
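A negative immediate is stored as its 12-bit two's-complement pattern. The sketch below shows one way to compute the encoding in Python; `encode_i` is a hypothetical helper written for this example, and the range check reflects the [-2048, +2047] limit of the 12-bit field.

```python
def encode_i(imm, rs1, funct3, rd, opcode):
    """Pack an I-format instruction; imm is a signed 12-bit value."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return (
        ((imm & 0xFFF) << 20)  # two's-complement pattern in bits 31-20
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )

# addi x15, x1, -50
word = encode_i(-50, 1, 0b000, 15, 0b0010011)
print(f"{-50 & 0xFFF:012b}")  # 111111001110
print(f"0x{word:08X}")        # 0xFCE08793
```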

## All RV32 I-format Arithmetic Instructions

| 31 20 | 19 15 | 14 12 | 11 7 | 6 0 | |
|---------------|-----|-----|----|---------|-------|
| imm[11:0]     | rs1 | 000 | rd | 0010011 | addi  |
| imm[11:0]     | rs1 | 010 | rd | 0010011 | slti  |
| imm[11:0]     | rs1 | 011 | rd | 0010011 | sltiu |
| imm[11:0]     | rs1 | 100 | rd | 0010011 | xori  |
| imm[11:0]     | rs1 | 110 | rd | 0010011 | ori   |
| imm[11:0]     | rs1 | 111 | rd | 0010011 | andi  |
| 0000000 shamt | rs1 | 001 | rd | 0010011 | slli  |
| 0000000 shamt | rs1 | 101 | rd | 0010011 | srli  |
| 0100000 shamt | rs1 | 101 | rd | 0010011 | srai  |

- One of the higher-order immediate bits is used to distinguish "shift right logical" (SRLI) from "shift right arithmetic" (SRAI)
- "Shift-by-immediate" instructions only use the lower 5 bits of the immediate value for the shift amount (can only shift by 0-31 bit positions)
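To make the SRLI/SRAI distinction concrete, the following Python sketch encodes both with the same registers and shift amount and compares the results. `encode_shift_imm` is an illustrative helper, and the register numbers (rd=10, rs1=11) and shift amount 3 are arbitrary example values.

```python
OP_IMM = 0b0010011

def encode_shift_imm(upper7, shamt, rs1, funct3, rd):
    """Encode slli/srli/srai: upper 7 bits select the variant, low 5 bits are shamt."""
    if not 0 <= shamt <= 31:
        raise ValueError("RV32 shift amount must be 0-31")
    return (upper7 << 25) | (shamt << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OP_IMM

srli = encode_shift_imm(0b0000000, 3, 11, 0b101, 10)  # srli x10, x11, 3
srai = encode_shift_imm(0b0100000, 3, 11, 0b101, 10)  # srai x10, x11, 3

# The two words differ in exactly one bit: instruction bit 30.
print(f"{srli ^ srai:032b}")  # 01000000000000000000000000000000
```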

### Load Instructions are also I-Type

| 31 20 | 19 15 | 14 12 | 11 7 | 6 0 |
|-------|-------|-------|------|-----|
| imm[11:0] | rs1 | funct3 | rd | opcode |
| 12 | 5 | 3 | 5 | 7 |
| offset[11:0] | base | width | dest | LOAD |

- The 12-bit signed immediate is added to the base address in register rs1 to form the memory address
  - This is very similar to the add-immediate operation but used to create address not to create final result
- The value loaded from memory is stored in register rd

#### I-Format Load Example

- RISC-V Assembly Instruction: lw x14, 8(x2)

| 31 20 | 19 15 | 14 12 | 11 7 | 6 0 |
|-------|-------|-------|------|-----|
| imm[11:0] | rs1 | funct3 | rd | opcode |
| 12 | 5 | 3 | 5 | 7 |
| offset[11:0] | base | width | dest | LOAD |
| 000000001000 | 00010 | 010 | 01110 | 0000011 |
| imm=+8 | rs1=2 | lw (load word) | rd=14 | LOAD |

## All RV32 Load Instructions

| 31 20 | 19 15 | 14 12 | 11 7 | 6 0 | |
|-----------|-----|-----|----|---------|-----|
| imm[11:0] | rs1 | 000 | rd | 0000011 | lb  |
| imm[11:0] | rs1 | 001 | rd | 0000011 | lh  |
| imm[11:0] | rs1 | 010 | rd | 0000011 | lw  |
| imm[11:0] | rs1 | 100 | rd | 0000011 | lbu |
| imm[11:0] | rs1 | 101 | rd | 0000011 | lhu |

funct3 field encodes size and 'signedness' of load data

- LBU is "load unsigned byte"
- LH is "load halfword", which loads 16 bits (2 bytes) and sign-extends to fill the destination 32-bit register
- LHU is "load unsigned halfword", which zero-extends 16 bits to fill destination 32-bit register
- There is no LWU in RV32, because there is no sign/zero extension needed when copying 32 bits from a memory location into a 32-bit register
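The effect of the size and signedness choices can be tried out on a few bytes of memory. This is a minimal Python sketch, assuming little-endian byte order; the `load` helper and the memory contents are made up for illustration.

```python
def load(mem, addr, width, signed):
    """Read `width` bytes at `addr` and extend the value to a 32-bit register pattern."""
    raw = int.from_bytes(mem[addr:addr + width], "little")
    if signed and raw & (1 << (8 * width - 1)):
        raw -= 1 << (8 * width)  # msb of the loaded value is 1: sign-extend
    return raw & 0xFFFFFFFF

mem = bytes([0x80, 0xFF, 0x34, 0x12])  # hypothetical memory contents

print(hex(load(mem, 0, 1, True)))   # lb  -> 0xffffff80
print(hex(load(mem, 0, 1, False)))  # lbu -> 0x80
print(hex(load(mem, 0, 2, True)))   # lh  -> 0xffffff80
print(hex(load(mem, 0, 2, False)))  # lhu -> 0xff80
print(hex(load(mem, 0, 4, True)))   # lw  -> 0x1234ff80
```

For the 4-byte case the `signed` flag makes no difference, which is the reason there is no LWU in RV32.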

## **S-Format Used for Stores**

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 |
|-------|-------|-------|-------|------|-----|
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
| 7 | 5 | 5 | 3 | 5 | 7 |
| offset[11:5] | src | base | width | offset[4:0] | STORE |

- Store needs to read two registers, rs1 for the base memory address and rs2 for the data to be stored, as well as an immediate offset!
- Can't have both rs2 and immediate in same place as other instructions!
- Note that stores don't write a value to the register file, *no rd*!
- RISC-V design decision is to move the low 5 bits of the immediate to where the rd field was in other instructions, keeping the rs1/rs2 fields in the same place
  - register names more critical than immediate bits in hardware design

# Keeping Registers always in the Same Place...

- The critical path for *all operations* includes fetching values from the registers
- By always placing the read sources in the same place, the register file can read without hesitation
  - If the data ends up being unnecessary (e.g. I-Type), it can be ignored
- Other RISCs have had slightly different encodings
  - Necessitating the logic to look at the instruction to determine which registers to read
- Example of one of the (many) little tweaks done in RISC-V to make things work better

### S-Format Example

• RISC-V Assembly Instruction:

sw x14, 8(x2)
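Working this example through by hand: the offset 8 is split into imm[11:5] = 0000000 and imm[4:0] = 01000, with rs2 = 14 (the data), rs1 = 2 (the base), funct3 = 010 (word) and opcode 0100011. The Python sketch below does the same packing; `encode_s` is a hypothetical helper name used only for this illustration.

```python
def encode_s(imm, rs2, rs1, funct3, opcode=0b0100011):
    """Pack an S-format store; the 12-bit immediate is split across two fields."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    imm &= 0xFFF
    return (
        ((imm >> 5) << 25)     # imm[11:5] in bits 31-25
        | (rs2 << 20)          # register holding the data to store
        | (rs1 << 15)          # register holding the base address
        | (funct3 << 12)       # width
        | ((imm & 0x1F) << 7)  # imm[4:0] in bits 11-7 (where rd normally lives)
        | opcode
    )

word = encode_s(8, 14, 2, 0b010)  # sw x14, 8(x2)
print(f"{word:032b}")             # 00000000111000010010010000100011
print(f"0x{word:08X}")            # 0x00E12423
```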



### All RV32 Store Instructions

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 | |
|-----------|-----|-----|-----|----------|---------|----|
| imm[11:5] | rs2 | rs1 | 000 | imm[4:0] | 0100011 | sb |
| imm[11:5] | rs2 | rs1 | 001 | imm[4:0] | 0100011 | sh |
| imm[11:5] | rs2 | rs1 | 010 | imm[4:0] | 0100011 | sw |

- funct3 encodes the width: store byte, halfword, word



### Q & A



# Very nice Venus Tutorial

Ze Song

Will be available shortly after the lecture.



### Quiz

### Prepare for another Programming PDF Quiz on Thursday!

# Translate Machine Instruction to RISC-V Assembly



Piazza: "Online Lecture 6 Quiz"

Select ALL Assembly instructions that produce this machine instruction!

- 1,074,332,851<sub>ten</sub>
- 0xFFF34313

1. add s4 x18 zero
2. sub s4 s2 x0
3. add s1 x18 zero
4. sub s1 s2 x0
5. neg s4 x18
6. neg s1 s2
7. mv s4 x18
8. mv s1 s2
9. For 13-18: instead of t0 use t1
10. For 13-18: instead of t0 use t2
11. For 13-18: instead of t0 use s0
12. For 13-18: instead of t0 use s1
13. xor t0 t0 -1
14. xori t0 t0 -1
15. xor t0 t0 0xFFF
16. xori t0 t0 0xFFF
17. neg t0 t0
18. not t0 t0

# CS 110 Computer Architecture Lecture 6: Branch Formats Video 2


## **RISC-V Conditional Branches**

- E.g., BEQ x1, x2, Label
- Branches read two registers but don't write a register (similar to stores)
- How to encode label, i.e., where to branch to?

# **Branching Instruction Usage**

- Branches typically used for loops (if-else, while, for)
  - Loops are generally small (< 50 instructions)
  - Function calls and unconditional jumps handled with jump instructions (J-Format)
- **Recall:** Instructions stored in a localized area of memory (Code/Text)
  - Largest branch distance limited by size of code
  - Address of current instruction stored in the program counter (PC)

## **PC-Relative Addressing**

- PC-Relative Addressing: Use the immediate field as a two's-complement offset to PC
  - Branches generally change the PC by a small amount
  - Can specify  $\pm 2^{11}$  'unit' addresses from the PC
- Why not use byte as a unit of offset from PC?
  - Because instructions are 32 bits (4 bytes)
  - We don't branch into middle of instruction

# Scaling Branch Offset

- One idea: To improve the reach of a single branch instruction, multiply the offset by four bytes before adding to PC
- This would allow one branch instruction to reach ± 2<sup>11</sup> × 32-bit instructions either side of PC
  - Four times greater reach than using byte offset

### RISC-V Feature, n×16-bit instructions

- Extensions to RISC-V base ISA support 16-bit compressed instructions and also variable-length instructions that are multiples of 16-bits in length
- To enable this, RISC-V scales the branch offset by 2 bytes even when there are no 16-bit instructions
- Reduces branch reach by half and means that ½ of possible targets will be errors on RISC-V processors that only support 32-bit instructions (as used in this class)
- RISC-V conditional branches can only reach ± 2<sup>10</sup> × 32-bit instructions on either side of PC

# **Branch Calculation**

- If we don't take the branch:
  - PC = PC + 4 (i.e., next instruction)
- If we do take the branch:
  - PC = PC + immediate\*2
- Observations:
  - immediate is the number of 2-byte units to jump (remember, RISC-V scales the branch offset by 2 bytes, so this is twice the number of 32-bit instructions), either forward (+) or backwards (-)

### **RISC-V B-Format for Branches**

| 31      | 30 25     | 24 20 | 19 15 | 14 12  | 11 8     | 7       | 6 0    |
|---------|-----------|-------|-------|--------|----------|---------|--------|
| imm[12] | imm[10:5] | rs2   | rs1   | funct3 | imm[4:1] | imm[11] | opcode |
| 1       | 6         | 5     | 5     | 3      | 4        | 1       | 7      |
| offset[12] | offset[10:5] | rs2 | rs1 | funct3 | offset[4:1] | offset[11] | BRANCH |

- B-format is mostly same as S-Format, with two register sources (rs1/rs2) and a 12-bit immediate imm[12:1]
- But now immediate represents values -4096 to +4094 in 2-byte increments
- The 12 immediate bits encode even 13-bit signed byte offsets (lowest bit of offset is always zero, so no need to store it)

## Branch Example, Determine Offset

- RISC-V Code:

| Label | Instruction | Count of instructions from branch |
|-------|-------------|-----------------------------------|
| Loop: | beq x19,x10,End | 0 |
|       | add x18,x18,x10 | 1 |
|       | addi x19,x19,-1 | 2 |
|       | j Loop | 3 |
| End:  | # target instruction | 4 |

- Branch offset = 4×32-bit instructions = 16 bytes
- (Branch with offset of 0, branches to itself)
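The counting above is plain arithmetic on instruction positions. A short Python sketch follows; `branch_byte_offset` is an illustrative helper, and it assumes that every instruction is 4 bytes long, as in this class.

```python
def branch_byte_offset(branch_index, target_index, instruction_bytes=4):
    """Byte distance from a branch to its target, given their instruction positions."""
    return (target_index - branch_index) * instruction_bytes

offset = branch_byte_offset(0, 4)  # beq is instruction 0, End is instruction 4
stored = offset // 2               # the B-format immediate counts 2-byte units
print(offset, stored)              # 16 8

print(branch_byte_offset(3, 0))    # a backward transfer from "j Loop" to Loop: -12
```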

### Branch Example, Determine Offset

| ??????? | 01010  | 10011  | 000 | ????? | 1100011 |
|---------|--------|--------|-----|-------|---------|
| imm     | rs2=10 | rs1=19 | BEQ | imm   | BRANCH  |

## Branch Example, Encode Offset

- RISC-V Code: the same loop, with beq x19,x10,End as the branch and End four instructions later
- offset = 16 bytes = 8×2 bytes

| ??????? | 01010  | 10011  | 000 | ????? | 1100011 |
|---------|--------|--------|-----|-------|---------|
| imm     | rs2=10 | rs1=19 | BEQ | imm   | BRANCH  |

## **RISC-V** Immediate Encoding

#### Instruction encodings, inst[31:0]

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 | |
|-------|-------|-------|-------|------|-----|---|
| funct7 | rs2 | rs1 | funct3 | rd | opcode | R-type |
| imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode | I-type |
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode | S-type |
| imm[12 10:5] | rs2 | rs1 | funct3 | imm[4:1 11] | opcode | B-type |

In I-type, imm[11:0] is one contiguous field in bits 31-20; it is split above only to line up with the column boundaries.

#### 32-bit immediates produced, imm[31:0]

| 31 12 | 11 | 10 5 | 4 1 | 0 | |
|-------|----|------|-----|---|---|
| inst[31] (replicated) | inst[31] | inst[30:25] | inst[24:21] | inst[20] | I-imm. |
| inst[31] (replicated) | inst[31] | inst[30:25] | inst[11:8] | inst[7] | S-imm. |
| inst[31] (replicated) | inst[7] | inst[30:25] | inst[11:8] | 0 | B-imm. |

- Upper bits sign-extended from inst[31] always
- Only bit 7 of instruction changes role in immediate between S and B
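The immediate-generation rules in the table translate directly into bit slicing. The Python sketch below is an illustration only: the helper names `sign_extend`, `imm_i`, `imm_s`, and `imm_b` are made up here, and the sample words are the encodings of addi x15,x1,-50, sw x14,8(x2), and beq x19,x10 with a 16-byte offset.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def imm_i(inst):
    return sign_extend(inst >> 20, 12)                 # inst[31:20]

def imm_s(inst):
    raw = ((inst >> 25) << 5) | ((inst >> 7) & 0x1F)   # inst[31:25], inst[11:7]
    return sign_extend(raw, 12)

def imm_b(inst):
    raw = (
        (((inst >> 31) & 0x1) << 12)    # inst[31]    -> imm[12]
        | (((inst >> 7) & 0x1) << 11)   # inst[7]     -> imm[11]
        | (((inst >> 25) & 0x3F) << 5)  # inst[30:25] -> imm[10:5]
        | (((inst >> 8) & 0xF) << 1)    # inst[11:8]  -> imm[4:1], imm[0] is always 0
    )
    return sign_extend(raw, 13)

print(imm_i(0xFCE08793))  # -50
print(imm_s(0x00E12423))  # 8
print(imm_b(0x00A98863))  # 16
```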

### Branch Example, complete encoding

#### beq x19,x10, offset = 16 bytes
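With offset = 16, the 13-bit value is 0 0000 0001 0000, so imm[12] = 0, imm[11] = 0, imm[10:5] = 000000 and imm[4:1] = 1000. A Python sketch of the full packing follows; `encode_b` is a hypothetical helper written for this example, using the B-format field positions from the table above.

```python
def encode_b(offset, rs2, rs1, funct3, opcode=0b1100011):
    """Pack a B-format branch; offset is a signed, even byte offset."""
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF
    return (
        (((imm >> 12) & 0x1) << 31)    # imm[12]
        | (((imm >> 5) & 0x3F) << 25)  # imm[10:5]
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (((imm >> 1) & 0xF) << 8)    # imm[4:1]
        | (((imm >> 11) & 0x1) << 7)   # imm[11]
        | opcode
    )

word = encode_b(16, 10, 19, 0b000)  # beq x19, x10, +16 bytes
print(f"{word:032b}")               # 00000000101010011000100001100011
print(f"0x{word:08X}")              # 0x00A98863
```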



### All RISC-V Branch Instructions

| 31 25 | 24 20 | 19 15 | 14 12 | 11 7 | 6 0 | |
|--------------|-----|-----|-----|-------------|---------|------|
| imm[12 10:5] | rs2 | rs1 | 000 | imm[4:1 11] | 1100011 | BEQ  |
| imm[12 10:5] | rs2 | rs1 | 001 | imm[4:1 11] | 1100011 | BNE  |
| imm[12 10:5] | rs2 | rs1 | 100 | imm[4:1 11] | 1100011 | BLT  |
| imm[12 10:5] | rs2 | rs1 | 101 | imm[4:1 11] | 1100011 | BGE  |
| imm[12 10:5] | rs2 | rs1 | 110 | imm[4:1 11] | 1100011 | BLTU |
| imm[12 10:5] | rs2 | rs1 | 111 | imm[4:1 11] | 1100011 | BGEU |

# **Questions on PC-addressing**

- Does the value in branch immediate field change if we move the code?
  - If moving individual lines of code, then yes
  - If moving all of code, then no ('position-independent code')
- What do we do if destination is > 2<sup>10</sup> instructions away from branch?
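  - Other instructions save us: invert the branch condition and branch around an unconditional jump (`j`), which has a much longer range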
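
To make the 2<sup>10</sup> limit concrete, here is a minimal sketch of a range check for branch offsets. The function name `branch_reaches` is hypothetical; the limits follow from the 13-bit signed byte offset whose lowest bit is always 0.

```python
def branch_reaches(offset_bytes):
    """True if a B-type instruction can encode this PC-relative byte offset."""
    return offset_bytes % 2 == 0 and -(1 << 12) <= offset_bytes <= (1 << 12) - 2

print(branch_reaches(4 * 1023))   # 1023 instructions forward
print(branch_reaches(4 * 1024))   # 1024 instructions forward: too far
print(branch_reaches(-4 * 1024))  # 1024 instructions backward
```

With 32-bit instructions the reachable window is 2<sup>10</sup> instructions backward and 2<sup>10</sup> - 1 instructions forward.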

## U-Format for "Upper Immediate" Instructions

| 31 12              | 11 7 | 6 0    |
|--------------------|------|--------|
| imm[31:12]         | rd   | opcode |
| 20                 | 5    | 7      |
| U-immediate[31:12] | dest | LUI    |
| U-immediate[31:12] | dest | AUIPC  |

- Has 20-bit immediate in upper 20 bits of 32-bit instruction word
- One destination register, rd
- Used for two instructions
  - LUI: Load Upper Immediate
  - AUIPC: Add Upper Immediate to PC

## LUI to Create Long Immediates

- LUI writes the upper 20 bits of the destination with the immediate value, and clears the lower 12 bits.
- Together with an ADDI to set low 12 bits, can create any 32-bit value in a register using two instructions (LUI/ADDI).

`LUI x10, 0x87654 # x10 = 0x87654000`

`ADDI x10, x10, 0x321 # x10 = 0x87654321`

### **One Corner Case**

How to set 0xDEADBEEF?

`LUI x10, 0xDEADB # x10 = 0xDEADB000`

`ADDI x10, x10, 0xEEF # x10 = 0xDEADAEEF`

The ADDI 12-bit immediate is always sign-extended. If its top bit is set, the addition subtracts 1 from the upper 20 bits.

## Solution

```
How to set 0xDEADBEEF?
LUI x10, 0xDEADC # x10 = 0xDEADC000
ADDI x10, x10, 0xEEF # x10 = 0xDEADBEEF
```

Pre-increment the value placed in the upper 20 bits if the sign bit of the lower 12-bit immediate will be set.

The assembler pseudo-op handles all of this:

`li x10, 0xDEADBEEF # Creates two instructions`

### **Actually: Important!**

The assembler treats the number provided for ADDI as a signed number. To get the bit pattern 0xEEF, we have to provide the corresponding negative number. So actually, only this works:

`ADDI x10, x10, -273 # -273 = 0xFFFFFEEF`
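
The pre-increment rule can be written down as a small Python sketch. The names `split_hi_lo` and `combine` are made up for this illustration; the arithmetic mirrors what LUI followed by ADDI computes on a 32-bit register.

```python
def split_hi_lo(value):
    """Split a 32-bit value into (LUI immediate, signed ADDI immediate)."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:            # sign bit of the 12-bit immediate is set
        lo -= 0x1000           # ADDI will see a negative number ...
    hi = ((value - lo) >> 12) & 0xFFFFF   # ... so the upper part is pre-incremented
    return hi, lo

def combine(hi, lo):
    """What LUI hi followed by ADDI lo leaves in the register."""
    return ((hi << 12) + lo) & 0xFFFFFFFF

hi, lo = split_hi_lo(0xDEADBEEF)
print(hex(hi), lo)             # LUI immediate and ADDI immediate
print(hex(combine(hi, lo)))
```

For 0xDEADBEEF this yields the LUI immediate 0xDEADC and the ADDI immediate -273, matching the two instructions above. For 0x87654321 no adjustment is needed.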

### AUIPC

- Adds upper immediate value to PC and places result in destination register
- Used for PC-relative addressing

`Label: AUIPC x10, 0 # Puts address of Label in x10`
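
A minimal model of what AUIPC computes, assuming a 32-bit PC. The function name `auipc` and the example addresses are chosen only for illustration.

```python
def auipc(pc, imm20):
    """Value written to rd: PC plus the 20-bit immediate shifted into the upper bits."""
    return (pc + ((imm20 & 0xFFFFF) << 12)) & 0xFFFFFFFF

print(hex(auipc(0x00400010, 0)))        # immediate 0: rd = address of the AUIPC itself
print(hex(auipc(0x00400010, 0x12345)))  # rd = PC + 0x12345000
```

With an immediate of 0 the result is simply the address of the AUIPC instruction, which is why the example above puts the address of `Label` in x10.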

### J-Format for Jump Instructions



- JAL saves PC+4 in register rd (the return address)
  - Assembler "j" jump is pseudo-instruction, uses JAL but sets rd=x0 to discard return address
- Set PC = PC + offset (PC-relative jump)
- Target somewhere within ±2<sup>19</sup> locations, 2 bytes apart
  - ±2<sup>18</sup> 32-bit instructions
- Immediate encoding optimized similarly to branch instruction to reduce hardware cost
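
The J-format immediate is scrambled as imm[20|10:1|11|19:12] in instruction bits 31:12. The following sketch packs a JAL instruction by hand; `encode_jal` is a hypothetical helper name, not an existing tool.

```python
def encode_jal(rd, offset):
    """Pack a JAL instruction; offset is the PC-relative byte offset."""
    if offset % 2 != 0 or not -(1 << 20) <= offset <= (1 << 20) - 2:
        raise ValueError("jump offset must be even and fit in 21 signed bits")
    imm = offset & 0x1FFFFF                    # 21-bit two's complement
    return (((imm >> 20) & 0x1) << 31          # imm[20]
            | ((imm >> 1) & 0x3FF) << 21       # imm[10:1]
            | ((imm >> 11) & 0x1) << 20        # imm[11]
            | ((imm >> 12) & 0xFF) << 12       # imm[19:12]
            | rd << 7
            | 0b1101111)                       # JAL opcode

print(hex(encode_jal(1, -44)))   # jal x1, -44
print(hex(encode_jal(0, 48)))    # j 48  (jal x0, 48)
```

The range check reflects the ±2<sup>19</sup> two-byte locations stated above: the offset has 21 signed bits with bit 0 fixed at 0.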

### Uses of JAL

- `j` pseudo-instruction: `j Label` = `jal x0, Label` (discards the return address)
- Call a function within 2<sup>18</sup> instructions of PC: `jal ra, FuncName`

# JALR Instruction (I-Format)

| 31 20        | 19 15 | 14 12  | 11 7 | 6 0    |
|--------------|-------|--------|------|--------|
| imm[11:0]    | rs1   | funct3 | rd   | opcode |
| 12           | 5     | 3      | 5    | 7      |
| offset[11:0] | base  | 0      | dest | JALR   |

- JALR rd, rs, immediate
  - Writes PC+4 to rd (return address)
  - Sets PC = rs + immediate
  - Uses same immediates as arithmetic and loads
    - *no* multiplication by 2 bytes
    - In contrast to branches and JAL
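
Because JALR uses the plain I-format, its encoding needs no bit shuffling. A minimal sketch, with `encode_jalr` as an illustrative name:

```python
def encode_jalr(rd, rs1, imm):
    """Pack a JALR instruction (I-format, funct3 = 000)."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20    # imm[11:0], used as a byte offset without scaling
            | rs1 << 15
            | 0b000 << 12          # funct3
            | rd << 7
            | 0b1100111)           # JALR opcode

# ret = jalr x0, ra, 0  (ra is x1)
print(hex(encode_jalr(0, 1, 0)))
```

The `ret` pseudo-instruction therefore assembles to 0x00008067.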

### Uses of JALR

- `ret` and `jr` pseudo-instructions: `ret` = `jr ra` = `jalr x0, ra, 0`
- Call function at any 32-bit absolute address:
  - `lui x1, <hi20bits>`
  - `jalr ra, x1, <lo12bits>`
- Jump PC-relative with 32-bit offset:
  - `auipc x1, <hi20bits>`
  - `jalr x0, x1, <lo12bits>`

### Summary of RISC-V Instruction Formats

An empty cell means the field to its left extends across it.

| 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 | Format |
|-------|-------|-------|-------|------|-----|--------|
| funct7 | rs2 | rs1 | funct3 | rd | opcode | R-type |
| imm[11:0] | | rs1 | funct3 | rd | opcode | I-type |
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode | S-type |
| imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode | B-type |
| imm[31:12] | | | | rd | opcode | U-type |
| imm[20\|10:1\|11\|19:12] | | | | rd | opcode | J-type |

#### CORE INSTRUCTION FORMATS

An empty cell means the field to its left extends across it.

| Format | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|--------|-------|-------|-------|-------|------|-----|
| R  | funct7 | rs2 | rs1 | funct3 | rd | opcode |
| I  | imm[11:0] | | rs1 | funct3 | rd | opcode |
| S  | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
| SB | imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode |
| U  | imm[31:12] | | | | rd | opcode |
| UJ | imm[20\|10:1\|11\|19:12] | | | | rd | opcode |
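
Since opcode, rd, funct3, rs1, rs2 and funct7 sit at the same bit positions in every format that has them, a disassembler can extract them before it even knows the format. A minimal sketch, with `decode_fields` as a made-up helper name:

```python
def decode_fields(inst):
    """Extract the fixed-position fields of a 32-bit RISC-V instruction word."""
    return {
        "opcode": inst & 0x7F,          # inst[6:0]
        "rd":     (inst >> 7) & 0x1F,   # inst[11:7]
        "funct3": (inst >> 12) & 0x7,   # inst[14:12]
        "rs1":    (inst >> 15) & 0x1F,  # inst[19:15]
        "rs2":    (inst >> 20) & 0x1F,  # inst[24:20]
        "funct7": (inst >> 25) & 0x7F,  # inst[31:25]
    }

fields = decode_fields(0x00A98863)      # beq x19, x10, 16
print(fields)
```

Which of these fields are meaningful depends on the format selected by the opcode. For a B-type word such as the one above, the `rd` and `funct7` slots actually hold immediate bits.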

### Complete RV32I ISA

An empty cell means the field to its left extends across it.

| 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 | Instruction |
|-------|-------|-------|-------|------|-----|-------------|
| imm[31:12] | | | | rd | 0110111 | LUI |
| imm[31:12] | | | | rd | 0010111 | AUIPC |
| imm[20\|10:1\|11\|19:12] | | | | rd | 1101111 | JAL |
| imm[11:0] | | rs1 | 000 | rd | 1100111 | JALR |
| imm[12\|10:5] | rs2 | rs1 | 000 | imm[4:1\|11] | 1100011 | BEQ |
| imm[12\|10:5] | rs2 | rs1 | 001 | imm[4:1\|11] | 1100011 | BNE |
| imm[12\|10:5] | rs2 | rs1 | 100 | imm[4:1\|11] | 1100011 | BLT |
| imm[12\|10:5] | rs2 | rs1 | 101 | imm[4:1\|11] | 1100011 | BGE |
| imm[12\|10:5] | rs2 | rs1 | 110 | imm[4:1\|11] | 1100011 | BLTU |
| imm[12\|10:5] | rs2 | rs1 | 111 | imm[4:1\|11] | 1100011 | BGEU |
| imm[11:0] | | rs1 | 000 | rd | 0000011 | LB |
| imm[11:0] | | rs1 | 001 | rd | 0000011 | LH |
| imm[11:0] | | rs1 | 010 | rd | 0000011 | LW |
| imm[11:0] | | rs1 | 100 | rd | 0000011 | LBU |
| imm[11:0] | | rs1 | 101 | rd | 0000011 | LHU |
| imm[11:5] | rs2 | rs1 | 000 | imm[4:0] | 0100011 | SB |
| imm[11:5] | rs2 | rs1 | 001 | imm[4:0] | 0100011 | SH |
| imm[11:5] | rs2 | rs1 | 010 | imm[4:0] | 0100011 | SW |
| imm[11:0] | | rs1 | 000 | rd | 0010011 | ADDI |
| imm[11:0] | | rs1 | 010 | rd | 0010011 | SLTI |
| imm[11:0] | | rs1 | 011 | rd | 0010011 | SLTIU |
| imm[11:0] | | rs1 | 100 | rd | 0010011 | XORI |
| imm[11:0] | | rs1 | 110 | rd | 0010011 | ORI |
| imm[11:0] | | rs1 | 111 | rd | 0010011 | ANDI |
| 0000000 | shamt | rs1 | 001 | rd | 0010011 | SLLI |
| 0000000 | shamt | rs1 | 101 | rd | 0010011 | SRLI |
| 0100000 | shamt | rs1 | 101 | rd | 0010011 | SRAI |
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 | ADD |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 | SUB |
| 0000000 | rs2 | rs1 | 001 | rd | 0110011 | SLL |
| 0000000 | rs2 | rs1 | 010 | rd | 0110011 | SLT |
| 0000000 | rs2 | rs1 | 011 | rd | 0110011 | SLTU |
| 0000000 | rs2 | rs1 | 100 | rd | 0110011 | XOR |
| 0000000 | rs2 | rs1 | 101 | rd | 0110011 | SRL |
| 0100000 | rs2 | rs1 | 101 | rd | 0110011 | SRA |
| 0000000 | rs2 | rs1 | 110 | rd | 0110011 | OR |
| 0000000 | rs2 | rs1 | 111 | rd | 0110011 | AND |
| 0000 \| pred \| succ | | 00000 | 000 | 00000 | 0001111 | FENCE |
| 0000 \| 0000 \| 0000 | | 00000 | 001 | 00000 | 0001111 | FENCE.I |
| 000000000000 | | 00000 | 000 | 00000 | 1110011 | ECALL |
| 000000000001 | | 00000 | 000 | 00000 | 1110011 | EBREAK |
| csr | | rs1 | 001 | rd | 1110011 | CSRRW |
| csr | | rs1 | 010 | rd | 1110011 | CSRRS |
| csr | | rs1 | 011 | rd | 1110011 | CSRRC |
| csr | | zimm | 101 | rd | 1110011 | CSRRWI |
| csr | | zimm | 110 | rd | 1110011 | CSRRSI |
| csr | | zimm | 111 | rd | 1110011 | CSRRCI |

# "And in Conclusion..."

- Simplification works for RISC-V: Instructions are the same size as a data word (one word), so they can use the same memory.
- Computer actually stores programs as a series of these 32-bit numbers.
- We have covered all RISC-V instructions and registers
  - R-type, I-type, S-type, B-type, U-type and J-type instructions
  - Practice assembling and disassembling

### **Question:**

#### Piazza: "Lecture 6 RISC-V poll"

- Select (check) the machine instructions that correctly correspond to the Assembly instruction
- Venus is your friend!
  - But solve at least I (jal x1 -44) by hand!

| Option | Machine code | Assembly instruction |
|--------|--------------|----------------------|
| A | 0xFF010113 | addi sp sp -16 |
| B | 0x00050413 | mv t0 a0 |
| C | 0x00000513 | mv a0 x0 |
| D | 0x00112623 | sw x1 16(x2) |
| E | 0x00812423 | sw s0 8(sp) |
| F | 0x01212023 | sw x18 0(x4) |
| G | 0x00031663 | bne x10 x0 12 |
| H | 0x0300006F | jal x1 48 |
| I | 0xFD5FF0EF | jal x1 -44 |
| J | 0x000102B7 | li t0 65536 |
| K | 0x00008067 | ret |