Computer Architecture I ShanghaiTech University
In Project 1.2, you will implement a disassembler that supports a subset of the RISC-V instructions. The disassembler translates 32-bit machine code into RISC-V assembly code. This project is easy as long as you took the course, Lab 3 and Lab 4 seriously.
Implementing a real disassembler that supports all RISC-V instructions is exhausting, so in this project we only need to disassemble the subset of RISC-V instructions introduced in Project 1.1. The instruction set is listed below.
INSTRUCTION | TYPE | OPCODE | FUNCT3 | FUNCT7 / IMM | OPERATION |
---|---|---|---|---|---|
add | R | 0x33 | 0x0 | 0x00 | R[rd] ← R[rs1] + R[rs2] |
or | R | 0x33 | 0x6 | 0x00 | R[rd] ← R[rs1] \| R[rs2] |
slt | R | 0x33 | 0x2 | 0x00 | R[rd] ← (R[rs1] < R[rs2]) ? 1 : 0 |
sltu | R | 0x33 | 0x3 | 0x00 | R[rd] ← (U(R[rs1]) < U(R[rs2])) ? 1 : 0 |
sll | R | 0x33 | 0x1 | 0x00 | R[rd] ← R[rs1] << R[rs2] |
jalr | I | 0x67 | 0x0 | | R[rd] ← PC + 4; PC ← R[rs1] + imm |
addi | I | 0x13 | 0x0 | | R[rd] ← R[rs1] + imm |
ori | I | 0x13 | 0x6 | | R[rd] ← R[rs1] \| imm |
lb | I | 0x03 | 0x0 | | R[rd] ← SignExt(Mem(R[rs1] + offset, byte)) |
lbu | I | 0x03 | 0x4 | | R[rd] ← U(Mem(R[rs1] + offset, byte)) |
lw | I | 0x03 | 0x2 | | R[rd] ← Mem(R[rs1] + offset, word) |
sb | S | 0x23 | 0x0 | | Mem(R[rs1] + offset) ← R[rs2][7:0] |
sw | S | 0x23 | 0x2 | | Mem(R[rs1] + offset) ← R[rs2] |
beq | SB | 0x63 | 0x0 | | if(R[rs1] == R[rs2]) PC ← PC + {offset, 1b’0} |
bne | SB | 0x63 | 0x1 | | if(R[rs1] != R[rs2]) PC ← PC + {offset, 1b’0} |
blt | SB | 0x63 | 0x4 | | if(R[rs1] < R[rs2]) PC ← PC + {offset, 1b’0} |
bge | SB | 0x63 | 0x5 | | if(R[rs1] >= R[rs2]) PC ← PC + {offset, 1b’0} |
jal | UJ | 0x6f | | | R[rd] ← PC + 4; PC ← PC + {imm, 1b’0} |
lui | U | 0x37 | | | R[rd] ← {offset, 12b’0} |
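As a way to cross-check your assembly against the table above, here is a minimal host-side sketch in Python (not part of the project template). It assumes that the opcode alone identifies jal and lui and that the (opcode, funct3) pair identifies every other instruction in this subset. The names MNEMONICS and mnemonic are hypothetical.

```python
# Hypothetical lookup table transcribed from the instruction table above.
# Keys are (opcode, funct3); funct3 is None for formats without a funct3 field.
MNEMONICS = {
    (0x33, 0x0): "add", (0x33, 0x6): "or", (0x33, 0x2): "slt",
    (0x33, 0x3): "sltu", (0x33, 0x1): "sll",
    (0x67, 0x0): "jalr",
    (0x13, 0x0): "addi", (0x13, 0x6): "ori",
    (0x03, 0x0): "lb", (0x03, 0x4): "lbu", (0x03, 0x2): "lw",
    (0x23, 0x0): "sb", (0x23, 0x2): "sw",
    (0x63, 0x0): "beq", (0x63, 0x1): "bne",
    (0x63, 0x4): "blt", (0x63, 0x5): "bge",
    (0x6F, None): "jal", (0x37, None): "lui",
}


def mnemonic(word):
    """Return the mnemonic for a valid 32-bit machine word of this subset."""
    opcode = word & 0x7F          # bits 6:0
    funct3 = (word >> 12) & 0x7   # bits 14:12
    if (opcode, None) in MNEMONICS:
        return MNEMONICS[(opcode, None)]
    return MNEMONICS[(opcode, funct3)]
```

Because every R-type instruction in this subset has funct7 equal to 0x00, the sketch never inspects funct7. A decoder for the full ISA would have to.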
For further reference, here are the bit lengths of the instruction components.
inst rd, rs1, rs2
R-TYPE | funct7 | rs2 | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|---|
Bits | 7 | 5 | 5 | 3 | 5 | 7 |
I-TYPE | imm[11:0] | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|
Bits | 12 | 5 | 3 | 5 | 7 |
S-TYPE | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
---|---|---|---|---|---|---|
Bits | 7 | 5 | 5 | 3 | 5 | 7 |
SB-TYPE | imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode |
---|---|---|---|---|---|---|---|---|
Bits | 1 | 6 | 5 | 5 | 3 | 4 | 1 | 7 |
U-TYPE | imm[31:12] | rd | opcode |
---|---|---|---|
Bits | 20 | 5 | 7 |
UJ-TYPE | imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode |
---|---|---|---|---|---|---|
Bits | 1 | 10 | 1 | 8 | 5 | 7 |
The reference is the RISC-V Green Card. If there are any mistakes above, the RISC-V Green Card prevails.
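The scattered immediate bits of the S, SB and UJ formats are the easiest part to get wrong. The following Python sketch reassembles and sign-extends them according to the field layouts above. It is a reference model for checking your assembly, not the required implementation. All function names are hypothetical, the input is assumed to be an unsigned 32-bit integer, and returning the U-type immediate as an unsigned 20-bit value is an assumption about the expected output.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_i(w):
    # imm[11:0] lives in bits 31:20
    return sign_extend(w >> 20, 12)


def imm_s(w):
    # imm[11:5] in bits 31:25, imm[4:0] in bits 11:7
    return sign_extend(((w >> 25) << 5) | ((w >> 7) & 0x1F), 12)


def imm_sb(w):
    raw = (((w >> 31) & 0x1) << 12      # imm[12]
           | ((w >> 7) & 0x1) << 11     # imm[11]
           | ((w >> 25) & 0x3F) << 5    # imm[10:5]
           | ((w >> 8) & 0xF) << 1)     # imm[4:1]; imm[0] is always 0
    return sign_extend(raw, 13)


def imm_uj(w):
    raw = (((w >> 31) & 0x1) << 20      # imm[20]
           | ((w >> 12) & 0xFF) << 12   # imm[19:12]
           | ((w >> 20) & 0x1) << 11    # imm[11]
           | ((w >> 21) & 0x3FF) << 1)  # imm[10:1]; imm[0] is always 0
    return sign_extend(raw, 21)


def imm_u(w):
    # imm[31:12], returned as an unsigned 20-bit value (assumption)
    return (w >> 12) & 0xFFFFF
```

In assembly, the same effect is usually achieved with shifts: shifting a field left until its top bit reaches bit 31 and then shifting right arithmetically sign-extends it.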
When you disassemble the RISC-V instructions, you have to use named registers. The register names are defined below. If you don’t use the defined names, you may fail some test cases.
REGISTER | x0 | x1 | x2 | x3 | x4 | x5-x7 | x8 | x9 | x10-x11 | x12-x17 | x18-x27 | x28-x31 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
NAME | x0 | ra | sp | gp | tp | t0-t2 | s0 | s1 | a0-a1 | a2-a7 | s2-s11 | t3-t6 |
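The register table above can be expressed as a 32-entry lookup, which in assembly would typically become a table of strings in the .data section. The Python sketch below only shows the mapping itself. ABI_NAMES and reg_name are hypothetical names, and note that register 0 is printed as x0, following the table.

```python
# Register names in index order, following the table above.
ABI_NAMES = (
    ["x0", "ra", "sp", "gp", "tp"]          # x0-x4
    + ["t0", "t1", "t2"]                    # x5-x7
    + ["s0", "s1"]                          # x8-x9
    + [f"a{i}" for i in range(8)]           # x10-x17
    + [f"s{i}" for i in range(2, 12)]       # x18-x27
    + [f"t{i}" for i in range(3, 7)]        # x28-x31
)


def reg_name(number):
    """Map a 5-bit register number to the name required in the output."""
    return ABI_NAMES[number & 0x1F]
```

For example, the rd field of 0x000502B3 is 5, so reg_name(5) gives t0, matching the first line of the sample input.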
Here is a very simple template to begin with.
A reference input is already provided to you in the input.S file. The final tests’ inputs have the same format as the provided input; only the number of machine instructions and the contents of the machine code differ. You may find that the input format is very similar to that of Homework 3.
.data
# Constant integer specifying the lines of machine codes
# DO NOT MODIFY THIS VARIABLE
.globl lines_of_machine_codes
lines_of_machine_codes:
.word 8
# 32-bits machine codes
# A 32-bits hexadecimal number represents one line of machine code.
# You can suppose all of the input machine codes are valid.
# DO NOT MODIFY THIS VARIABLE
.globl machine_codes
machine_codes:
.word 0x000502B3 # add t0, a0, x0
.word 0x00100313 # addi t1, x0, 1
.word 0x00028863 # beq t0, x0, 16
.word 0x01DE13B3 # sll t2, t3, t4
.word 0xFFF28293 # addi t0, t0, -1
.word 0xFF5FF06F # jal x0, -12 Here we use offset instead of label.
.word 0x00600533 # add a0, x0, t1
.word 0x00008067 # jalr x0, ra, 0
You may assume that all the machine codes are valid and each of them can be disassembled to one of the instructions mentioned above.
It’s usually the duty of the supervisor (operating system) to deal with input/output and halting program execution. Venus, being a simple emulator, does not offer us such luxury, but supports a list of primitive environmental calls. A snippet of assembly showing how to exit with error code 0 is already provided to you in disassembler.S. The output line ‘Exited with error code 0’ is necessary, so please use the ID 17 (exit2) environmental call instead of the ID 10 (exit) environmental call.
In this project, you need to output RISC-V instructions. We will compare your output to the correct RISC-V code. Although the output format isn’t very strict, there are some rules.
Your output must be recognized as correct input by venus. Depending on the test case, your output code may not be runnable in venus, but venus must be able to translate your output to machine code.
Each line can contain only one instruction. Don’t put a semicolon at the end of an instruction.
When dealing with SB-Type instructions, you don’t need to create labels. Just put the offset there instead of a label, e.g. bge a0, x0, -12.
The commas between registers and immediates are necessary, e.g. addi x0, a1, 4 and lw s0, 4(sp) are accepted; addi x0 a1 4 and lw s0 4(sp) will not be accepted.
Immediates and offsets must be written as decimal numbers. Other representations such as hex and binary won’t be accepted.
For R-Type (inst rd, rs1, rs2), I-Type (inst rd, rs1, imm / inst rd, offset(rs1)), UJ-Type (inst rd, imm) and U-Type (inst rd, imm), there must be at least one whitespace character between inst and rd. Besides that, you can add any whitespace before or after rd, rs1, rs2, offset and imm for your convenience, e.g. inst rd, offset( rs1) and inst rd, rs1 , imm will both be accepted.
For S-Type (inst rs2, offset(rs1)) and SB-Type (inst rs1, rs2, offset), there must be at least one whitespace character between inst and the first register (rs2 for S-Type, rs1 for SB-Type). The other rules are the same as above.
When we test your output, we will transform it to our predefined format. This work is done by us, so don’t worry too much about the format. The formats we will use are R-Type (inst rd, rs1, rs2), I-Type (inst rd, rs1, imm / inst rd, offset(rs1)), UJ-Type (inst rd, imm), U-Type (inst rd, imm), SB-Type (inst rs1, rs2, offset) and S-Type (inst rs2, offset(rs1)). This means that we may only remove some whitespace from your output.
Examples of the transformation to the final format (this work is done by us):
add x0, a1, a2 -> add x0,a1,a2
addi a1, a2 , 4 -> addi a1,a2,4
lw a2, 4( a3 ) -> lw a2,4(a3)
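If you want to approximate this transformation locally before running diff, the following Python sketch may help. It is only a guess at the behaviour described above, not the graders' actual script: it keeps a single space after the mnemonic and removes all other whitespace. The function name normalize is hypothetical.

```python
def normalize(line):
    """Collapse an instruction line to 'inst op1,op2,...' (assumed final format)."""
    parts = line.split(None, 1)   # split off the mnemonic at the first whitespace run
    if not parts:
        return ""                 # blank line
    if len(parts) == 1:
        return parts[0]           # mnemonic without operands
    inst, operands = parts
    # Remove every whitespace character from the operand list.
    return inst + " " + "".join(operands.split())
```

Applying normalize to each line of your output and comparing against a hand-written expected file is a cheap way to check that whitespace differences alone will not cause a mismatch.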
The command that we use to test your program’s correctness is
diff <your_transformed_output> <reference_output>
You can also test your result using this command.
Make sure that venus-jvm-latest.jar, disassembler.S and input.S reside in the same directory. To run your program locally and write the output to disassembler.output:
java -jar venus-jvm-latest.jar disassembler.S >> disassembler.output
To debug your program online, you might want to replace .import input.S in disassembler.S with the content of input.S.
You can use any RISC-V instruction as long as venus can recognize them.
To interact with the input file, you can use the global labels defined in input.S. For example, you can get the address of lines_of_machine_codes with the following code:
la a7, lines_of_machine_codes
Handwritten assembly files use the extension .S to distinguish them from compiler-generated assembly files (.s).
You can learn how to output a string, int or char from Lab 3 and Lab 4.
Almost everything you need can be learnt from the venus Wiki.
Learn how to load from and store to memory in RISC-V.
Be careful about the calling convention. Following it will make life easier.
Write comments.
The test cases are very friendly! Don’t focus too much on the edge cases; focus on the correctness of the common cases.
We will test your program using the RISC-V emulator venus. You probably want to read this before you start.
.autolab.txt, as in Project 1.1. From your gitlab we will only use disassembler.S. Do NOT include venus or any other large files in the git repository. You are welcome to use git to share your test input and other small support files with your teammate.
Last Modified: Mon Mar 29 23:11:13 CST 2021