Project 2.2 - Computer Architecture I - ShanghaiTech University

Project 2.2: CPU

Computer Architecture I ShanghaiTech University

Overview

MAKE SURE TO CHECK YOUR CIRCUITS WITH THE GIVEN HARNESSES TO SEE IF THEY FIT! YOU WILL FAIL ALL OUR TESTS IF THEY DO NOT.
(This also means that you should not be moving around given inputs and outputs in the circuits).
Sample tests for a completed alu, regfile, mem and cpu are included in proj2.2_framework. Given the current directory structure, you can run the bash script (long-test.sh) with your *.circ files in the same directory and it will run the autograder. We recommend running the sample tests locally, but note that the autograder only works with Python 2.7. These tests are NOT comprehensive; you will need to do further testing on your own.
You are allowed to use any of Logisim's built-in blocks for all parts of this project.
Save often. Logisim can be buggy and the last thing you want is to lose some of your hard work.

In this project you will be using MIPS-Logisim to create a 32-bit two-cycle processor. It is similar to MIPS, except that memory addresses represent 32-bit words instead of 8-bit bytes (word-addressed instead of byte-addressed). Also, all addresses are 24 bits wide instead of 32 bits, due to limitations in Logisim. Throughout the implementation of this project, we'll be making design choices that make it compatible with machine code outputs from MARS and your Project 1!

Please read this document CAREFULLY as there are key differences between the processor we studied in class and the processor you will be designing for this project.

Pipelining

Your processor will have a 2-stage pipeline:

Instruction Fetch: An instruction is fetched from the instruction memory.
Execute: The instruction is decoded, executed, and committed (written back). This is a combination of the remaining stages of a normal MIPS pipeline.

You should note that data hazards do NOT pose a problem for this design, since all accesses to all sources of data happen in a single pipeline stage. However, there are still control hazards to deal with. Our ISA does not expose branch delay slots to software. This means that the instruction immediately after a branch or jump is not necessarily executed if the branch is taken. This makes your task a bit more complex. By the time you have figured out that a branch or jump is in the execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need to "kill" instructions that are being fetched if the instruction under execution is a jump or a taken branch. Instruction kills for this project MUST be accomplished by MUXing a nop into the instruction stream and sending the nop into the Execute stage instead of using the fetched instruction. Notice that the all-zero word 0x00000000 is a nop instruction; please use this, as it will simplify grading and testing. You should only kill if a branch is taken (do not kill otherwise), but do kill on every type of jump.
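The kill rule can be sketched as a small behavioural model in Python. This is only an illustration under assumptions: instructions are represented as hypothetical tuples, NOP stands for the all-zero instruction word, and the name run_pipeline is invented for this sketch. It models what the Execute stage sees each cycle and does not prescribe how the MUX is wired in your circuit.

```python
NOP = ("nop",)  # stands in for the all-zero instruction word

def run_pipeline(program, cycles):
    """Return what the Execute stage sees on each cycle.

    program is a list of hypothetical tuples:
      ("j", target)          always redirects the fetch
      ("br", target, taken)  redirects the fetch only when taken is True
      anything else          falls through to the next address
    """
    fetch_pc = 0      # address requested from instruction memory
    ex_instr = NOP    # the instruction register is zero (a nop) after reset
    trace = []
    for _ in range(cycles):
        trace.append(ex_instr)
        fetched = program[fetch_pc] if fetch_pc < len(program) else NOP
        redirect = None
        if ex_instr[0] == "j":
            redirect = ex_instr[1]
        elif ex_instr[0] == "br" and ex_instr[2]:
            redirect = ex_instr[1]
        if redirect is not None:
            ex_instr = NOP        # kill the instruction that was just fetched
            fetch_pc = redirect
        else:
            ex_instr = fetched    # no kill: the fetched instruction moves on
            fetch_pc += 1
    return trace

if __name__ == "__main__":
    prog = [("add",), ("j", 3), ("sub",), ("or",)]
    for cycle, instr in enumerate(run_pipeline(prog, 5)):
        print(cycle, instr)
```

In this model the instruction fetched behind a jump or a taken branch never reaches Execute, while the instruction behind a not-taken branch does.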

Because all of the control and execution is handled in the Execute stage, your processor should be more or less indistinguishable from a single-cycle implementation, barring the one-cycle startup latency and the branch/jump delays. However, we will be enforcing the two-stage pipeline design. If you are unsure about pipelining, it is perfectly fine (maybe even recommended) to first implement a single-cycle processor. This will allow you to first verify that your instruction decoding, control signals, arithmetic operations, and memory accesses are all working properly. From a single-cycle processor you can then split off the Instruction Fetch stage with a few additions and a few logical tweaks. Some things to consider:

Will the IF and EX stages have the same or different PC values?
Do you need to store the PC between the pipelining stages?
To MUX a nop into the instruction stream, do you place it before or after the instruction register?
What address should be requested next while the EX stage executes a nop? Is this different than normal?

You might also notice a bootstrapping problem here: during the first cycle, the instruction register sitting between the pipeline stages won't contain an instruction loaded from memory. How do we deal with this? It happens that Logisim automatically sets registers to zero on reset; the instruction register will then contain a nop. We will allow you to depend on this behavior of Logisim. Remember to go to Simulate --> Reset Simulation (Ctrl+R or Command+R) to reset your processor.

Deliverables

Approach this project like you would any coding assignment: construct it piece by piece and test each component early and often!

Tidiness and readability will be a large factor in grading your circuit if there are any issues, so please make it as neat as possible!

1) Obtaining the Files

We have provided a framework: proj2.2_framework. Please copy your regfile.circ and alu.circ from project 2.1 to this framework. Also notice that you should use the MIPS-logisim.jar in this framework rather than the one in proj2.1_framework; they are different!

2) Processor

We have provided a skeleton for your processor in cpu.circ along with a testing harness in run.circ. Your completed processor should implement the ISA detailed below in the section Instruction Set Architecture (ISA) using a two-cycle pipeline. Your processor will contain an instance of your ALU, Data Memory, and Register File. You are also responsible for constructing the entire datapath and control from scratch. It will interact with our harness through the 4 inputs and 10 outputs listed in the tables below.

One important thing to notice is that we have two different locations with registers now, both in the regfile you created and in run.circ. If an instruction uses $t0-$t3, then pipe these requests to your own regfile. However, if an instruction requires any of the other 28 registers, output those requests to run.circ. Do note that the numbering system for all 32 registers is the same as in MIPS. Also, the registers in run.circ have been initialized to random numbers. In addition, $0, $t0-$t3 registers in run.circ are not changeable - nothing happens if you write to those registers.
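The register split can be summarised in a few lines of Python. This is a minimal sketch, assuming the standard MIPS numbering in which $t0-$t3 are registers 8 through 11; the function name register_source and the strings it returns are made up for illustration.

```python
T_REGS = range(8, 12)  # $t0-$t3 are registers 8..11 in standard MIPS numbering

def register_source(reg_num):
    """Say which register store serves a given 5-bit register number."""
    if not 0 <= reg_num <= 31:
        raise ValueError("register number must fit in 5 bits")
    # Your own regfile holds $t0-$t3; every other register lives in run.circ.
    return "regfile" if reg_num in T_REGS else "harness"

if __name__ == "__main__":
    for n in (0, 8, 11, 12, 31):
        print(n, register_source(n))
```

The same decision applies to both read ports and to the write port.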

NOTE: You also need to have an LED unit which lights up to signify signed overflow. This indicator should be wired to the signed overflow port of your ALU. This should be viewable in your main circuit.

Your processor will get its program from the processor harness we have provided in run.circ. It will send the address of instruction memory it wants to access to the harness through an output, and accept the instruction at that address as an input. Inspect run.circ to see exactly what's going on. Your processor has 4 inputs that come from the harness:

Input Name	Bit Width	Description
RS_READ_VALUE	32	Driven with the value of the register specified in RS
RT_READ_VALUE	32	Driven with the value of the register specified in RT
INSTRUCTION	32	Driven with the instruction at the instruction memory address identified by the FETCH_ADDRESS (see below).
CLOCK	1	The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).

Your processor must provide 10 outputs to the harness:

Output Name	Bit Width	Description
$t0	32	Driven with the contents of $t0.
$t1	32	Driven with the contents of $t1.
$t2	32	Driven with the contents of $t2.
$t3	32	Driven with the contents of $t3.
RS	5	Determines which register's value is sent to RS_READ_VALUE (see above).
RT	5	Determines which register's value is sent to RT_READ_VALUE (see above).
Write Register	5	Determines which register is set to RD_WRITE_VALUE on the next rising edge of the clock, assuming that RD_WRITE_ENABLE is asserted.
RD_WRITE_VALUE	32	Determines what data is written to the register identified by the Write Register output on the next rising edge of the clock, assuming that RD_WRITE_ENABLE is asserted.
RD_WRITE_ENABLE	1	Determines whether data is written on the next rising edge of the clock.
FETCH_ADDRESS	24	This output is used to select which instruction is presented to the processor on the INSTRUCTION input.

Follow the same instructions as the register file and ALU regarding rearranging inputs and outputs of the processor. In particular, you should ensure that your processor is correctly loaded by a fresh copy of run.circ before you submit.

3) Data Memory

You will build your Data Memory on your own using a RAM module. Note that this is different than a ROM module. Logisim RAM modules can be found in the built-in Memory library/folder.

Although the input address is 32 bits, due to limitations in Logisim, you should only be using 24 bits to address your RAM module. Consider carefully which 24 bits you want to use, given that the input is byte addressed. We will be losing a few bits of information by doing this, but that's okay for the purposes of this assignment.

In addition, you can assume that the .data base address is 0x10010000, just like in MARS. Notice that this means the instructions you have will refer to memory starting at 0x10010000, which should translate to 0x000000 for your RAM modules.
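One possible reading of the two paragraphs above, written as Python arithmetic. This is a sketch under assumptions, not the required circuit: it removes the .data base, drops the two byte-offset bits, and keeps 24 bits. The name ram_word_address is hypothetical, and in hardware the same effect may come down to a choice of which address bits you wire to the RAM.

```python
DATA_BASE = 0x10010000  # .data base address, as in MARS

def ram_word_address(byte_addr):
    """Map a 32-bit byte address to a 24-bit word address for the RAM module."""
    offset = (byte_addr - DATA_BASE) & 0xFFFFFFFF  # distance from .data base, in bytes
    return (offset >> 2) & 0xFFFFFF                # bytes -> words, keep 24 bits

if __name__ == "__main__":
    for addr in (0x10010000, 0x10010004, 0x10010007):
        print(hex(addr), "->", hex(ram_word_address(addr)))
```

All four byte addresses inside one word map to the same RAM address; the low two bits are only needed to pick a byte for lb/lbu.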

For lw/sw, you can assume that the addresses are properly aligned, as per MIPS instructions. For lb/lbu, make sure you are only returning the specified byte, loaded into the least significant byte of Data Mem Out. You will need to deal with sign/zero-extending the output in your CPU.

NOTE: Your data memory should use little endian.
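Little-endian byte selection and the later extension step can be sketched in Python as follows. The function names are invented for this example; the split between memory (select the byte) and CPU (extend it) follows the text above.

```python
def select_byte(word, byte_addr):
    """Return the addressed byte of a 32-bit word in the low 8 bits (little endian)."""
    shift = 8 * (byte_addr & 0x3)  # byte 0 is the least significant byte
    return (word >> shift) & 0xFF

def sign_extend_byte(b):
    """lb: copy bit 7 into the upper 24 bits, giving a 32-bit pattern."""
    return (b | 0xFFFFFF00) if (b & 0x80) else b

def zero_extend_byte(b):
    """lbu: the upper 24 bits are zero."""
    return b & 0xFF

if __name__ == "__main__":
    word = 0x11223380
    b = select_byte(word, 0x10010000)
    print(hex(b), hex(sign_extend_byte(b)), hex(zero_extend_byte(b)))
```

With the word 0x11223380, byte offset 0 selects 0x80 and byte offset 3 selects 0x11.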

We have provided a skeleton for your data memory in mem.circ along with a testing harness in mem-harness.circ. We will be testing your memory module separately for correctness. It has 5 inputs:

Input Name	Bit Width	Description
Data Mem Addr	32	Byte-addressed address to read from or write to.
Data Mem In	32	Determines what data to write into the address identified by "Data Mem Addr" on the next rising edge of the clock, assuming that MEM_WRITE is asserted.
ACCESS_BYTE	1	1 iff we are loading or storing a byte.
MEM_WRITE	1	Determines whether data is written on the next rising edge of the clock.
CLOCK	1	Input for the clock. This can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).

The Data Memory also has 1 output:

Output Name	Bit Width	Description
Data Mem Out	32	Driven with the value at the address identified by "Data Mem Addr".

For those unfamiliar with the RAM module, the pictures above show a good way to wire up a circuit to use RAM. You are not required to implement Data Memory as shown above and you can use a memory with separate read and write ports if you should so desire.

Here are a few things to know about the RAM module before you get started:

"clk" provides synchronization for memory writes. Be sure to use the same clock here as you do for your Register File.
"sel" determines whether or not the RAM module is active. We will probably not run into any cases where we need to turn our RAM off, so you can wire a constant 1 to this.
"A" chooses which address will be accessed.
"clr" will instantly set all contents of memory to 0 if high. You should wire a constant 0 to this port, as we will not be using it.
"ld" determines whether we are reading or writing to memory. If "ld" is high, then "D" will be driven with the contents of memory at address "A" (left image). If "ld" is low, then the contents of "D" will be stored in memory at address "A" (right image).
"D" acts as both data in and data out for this module. This means you have to be careful not to drive this line from two conflicting sources, which in this case are DataIn and the output of the memory. You can solve this by using a controlled buffer (a.k.a. a tri-state buffer) on the "D" port of the RAM module. By wiring logic to the "ld" port and the valve port of the controlled buffer together so that they are always opposite values (as in the pictures above), we can prevent conflicts between data being driven in and the contents of memory coming out.
The "poke" tool can be used to modify the contents of the memory. You can also use right-click --> Load Image... to load an image from a file.

The best way to learn how these work is simply to play with them. You can also refer to Logisim documentation on RAM modules here.

Instruction Set Architecture (ISA)

You will be implementing a simple 32-bit two-cycle processor with 32 registers, but your regfile will only be responsible for four of them ($t0 - $t3). The numeric values for these registers are the same as on the MIPS green sheet. It will have separate data and instruction memory. Just like MIPS, each of the four registers that you will be implementing is big enough to hold ONE word.

Your processor will be similar to MIPS, except for memory addressing. Memory addresses will represent 32-bit words instead of 8-bit bytes. This means that the memory modules are word-addressed instead of byte-addressed. However, note that your instructions will be using byte-addressing, as it should be normal MIPS code. Make sure you keep track of which addresses are byte-addressed and which are word-addressed when thinking about MIPS instruction addressing, the instruction memory module, MIPS data addressing, and the data memory module!

IMPORTANT: Because of the limitations of Logisim, our memory address will be 24 bits, unlike the normal 32-bit memory address in MIPS. Which bits would we need to truncate so that as many translations of MIPS code are supported as possible?

The instructions we will be looking at are listed below. Your processor will pull out a 32-bit value from instruction memory and determine the meaning of that instruction by looking at the opcode (the top 6 bits, which are bits 31-26). If the instruction is an R-type (i.e. opcode == 0), then you must also look at the funct field.
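The field extraction described above can be written out in Python as shifts and masks. This sketch uses the standard MIPS field positions; the function name decode and the dictionary keys are chosen only for illustration. In Logisim the same thing is done with splitters rather than arithmetic.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its standard fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26
        "rs":     (instr >> 21) & 0x1F,  # bits 25-21
        "rt":     (instr >> 16) & 0x1F,  # bits 20-16
        "rd":     (instr >> 11) & 0x1F,  # bits 15-11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,   # bits 10-6  (R-type)
        "funct":  instr & 0x3F,          # bits 5-0   (R-type)
        "imm":    instr & 0xFFFF,        # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,     # bits 25-0  (J-type)
    }

if __name__ == "__main__":
    fields = decode(0x01095020)  # add $t2, $t0, $t1
    if fields["opcode"] == 0:
        print("R-type, funct =", hex(fields["funct"]))
```

Every field is extracted for every instruction; the opcode (and funct, for R-type) then decides which of them are meaningful.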

Notice how we do not use all the instructions in MIPS. Your project only has to work on these specified instructions, most of which you should have seen in project 1 along with a few extra ones, although we have taken out sb to simplify your memory file. This way the project is shorter and easier.

Instruction	Format
Add	add $rd, $rs, $rt
Add Unsigned	addu $rd, $rs, $rt
Sub	sub $rd, $rs, $rt
Sub Unsigned	subu $rd, $rs, $rt
And	and $rd, $rs, $rt
Or	or $rd, $rs, $rt
Set Less Than	slt $rd, $rs, $rt
Set Less Than Unsigned	sltu $rd, $rs, $rt
Jump Register	jr $rs
Shift Left Logical	sll $rd, $rt, shamt
Shift Right Logical	srl $rd, $rt, shamt
Shift Right Arithmetic	sra $rd, $rt, shamt
Add Immediate Unsigned	addiu $rt, $rs, immediate
And Immediate	andi $rt, $rs, immediate
Or Immediate	ori $rt, $rs, immediate
Load Upper Immediate	lui $rt, immediate
Load Byte	lb $rt, offset($rs)
Load Byte Unsigned	lbu $rt, offset($rs)
Load Word	lw $rt, offset($rs)
Store Word	sw $rt, offset($rs)
Branch on Equal	beq $rs, $rt, label
Branch on Not Equal	bne $rs, $rt, label
Jump	j label
Jump and Link	jal label
Count 1s	cnto $rd, $rs, $rt
Bit Palindrome	bitpal $rd, $rs

Some specifics on selected instructions:

Jumping

The argument to the jump and jal instructions is a pseudoabsolute address, similar to MIPS. The target address is an unsigned number representing the lower 26 bits of the next instruction to be executed. Because of the limitations of Logisim, we will have to shorten our address to 24 bits, so we will cut off the upper 2 bits of our address. We do NOT concatenate any zeroes to the bottom of our address like we would in MIPS. This is because our processor is word-addressed, so every possible address holds a valid 32-bit instruction.
Remember that the assembler/linker we're using, MARS (and your project 1), will represent absolute addresses of the .text section starting from a base address of 0x00400000, byte-addressed. However, your instruction memory starts this section at 0x000000, word-addressed, so make sure you account for this offset while calculating your address for jumps (j, jr, jal).
Note that you should kill the next instruction after a jump, jr, or jal even if that is the instruction you are going to be jumping to.
On a jal the address of the next instruction should be written into $ra. Don't forget to add the offset to the address of the next instruction!
While we want to follow MIPS as much as possible, we have chosen to deviate from the default execution concerning instruction addresses within the CPU. Rather than converting everything back and forth between byte and word addressing, we will assume word-addressed addresses when working with jal and jr. The addresses jal stores should be word-addressed and jr assumes that the registers have word-addressed addresses.
Given the previous assumption, la would not work properly if you compiled it in MIPS. Instead, if you were to use lui and ori to load an address into a register, you should load in the word-addressed address.
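The address arithmetic for jumps can be sketched in Python. This is one interpretation of the points above and rests on stated assumptions: the .text base 0x00400000 in bytes is 0x100000 in words, "the offset" for jal is read as that word base, and all function names are hypothetical. Check the behaviour against the provided tests rather than relying on this sketch.

```python
TEXT_BASE_WORDS = 0x00400000 >> 2  # 0x100000: the MARS .text base, in words

def jump_fetch_address(target_field):
    """j/jal: 26-bit target field -> 24-bit instruction memory address."""
    addr24 = target_field & 0xFFFFFF              # cut off the upper 2 bits
    return (addr24 - TEXT_BASE_WORDS) & 0xFFFFFF  # memory starts at 0x000000

def jal_link_value(current_pc):
    """Assumed $ra value: word address of the next instruction, plus the base."""
    return (current_pc + 1 + TEXT_BASE_WORDS) & 0xFFFFFFFF

def jr_fetch_address(reg_value):
    """jr: word-addressed register value -> 24-bit instruction memory address."""
    return (reg_value - TEXT_BASE_WORDS) & 0xFFFFFF

if __name__ == "__main__":
    # A jump to byte address 0x00400008 is assembled with target field 0x100002.
    print(jump_fetch_address(0x00400008 >> 2))
    print(jr_fetch_address(jal_link_value(5)))
```

Under these assumptions a jal at instruction memory address 5 followed by jr $ra resumes at address 6.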

Branching

The argument to the beq and bne instructions is a signed offset relative to the next instruction to be executed if we don't take the branch, which is similar to MIPS. Note that the address of this next instruction is PC+1 rather than PC+4 because our processor is word-addressed. Here, currPC means the address of the branch instruction. We can write beq as the following:

	if $rs == $rt
		nextPC = currPC+1 + offset
	else
		increment PC like normal

Think! There's a reason we write "increment PC like normal" here instead of just "currPC+1".
The bne instruction differs only by the conditional in the if statement: replace the == with !=.
Note that you should not kill the next instruction if the branch is not taken. If the branch is taken you should always kill the instruction.
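The taken-branch case can be checked with a few lines of Python. This sketch covers only the taken path, treats the PC as a 24-bit word address, and uses invented function names; what "increment PC like normal" means in the not-taken path is left for you to work out.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def taken_branch_target(curr_pc, imm):
    """nextPC = currPC + 1 + offset, wrapped to a 24-bit word address."""
    return (curr_pc + 1 + sign_extend16(imm)) & 0xFFFFFF

if __name__ == "__main__":
    # A branch at word address 4 with offset field 0xffff (that is, -1).
    print(taken_branch_target(4, 0xFFFF))
```

A branch at address 4 with offset -1 targets address 4 again, which is why such a branch loops on itself.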

Immediates

Note that the immediate field is only 16 bits wide, so we must perform some kind of extension on it before passing it to the ALU. If an immediate is supposed to be unsigned, be sure to zero-extend it. If an immediate is signed, be sure to sign-extend it. These rules should match the specifications on the MIPS green sheet.
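The two kinds of extension can be written as bit patterns in Python. This is a minimal sketch with made-up function names; on the MIPS green sheet, andi and ori use the zero-extended immediate while addiu uses the sign-extended one.

```python
def zero_extend16(imm):
    """Upper 16 bits become 0."""
    return imm & 0xFFFF

def sign_extend16(imm):
    """Upper 16 bits become copies of bit 15 (result is a 32-bit pattern)."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm

if __name__ == "__main__":
    for imm in (0x4444, 0x8000, 0xFFFF):
        print(hex(imm), hex(zero_extend16(imm)), hex(sign_extend16(imm)))
```

The two results differ only when bit 15 of the immediate is set.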

Logisim Notes

If you are having trouble with Logisim, RESTART IT and RELOAD your circuit! Don't waste your time chasing a bug that is not your fault. However, if restarting doesn't solve the problem, it is more likely that the bug is a flaw in your project. Please post to Piazza about any crazy bugs that you find and we will investigate.

Things to Look Out For

Do NOT gate the clock! This is very bad design practice when making real circuits, so we will discourage you from doing this by heavily penalizing your project if you gate your clock.
BE CAREFUL with copying and pasting from different Logisim windows. Logisim has been known to have trouble with this in the past.
When you import another file (Project --> Load Library --> Logisim Library...), it will appear as a folder in the left-hand viewing pane. The skeleton files should have already imported necessary files.
Changing attributes before placing a component changes the default settings for that component. So if you are about to place many 32-bit pins, this might be desirable. If you only want to change that particular component, place it first before changing the attributes.
When you change the inputs & outputs of a sub-circuit that you have already placed in main, Logisim will automatically add/remove the ports when you return to main and this sometimes shifts the block itself. If there were wires attached, Logisim will do its automatic moving of these as well, which can be extremely dumb in some cases. Before you change the inputs and outputs of a block, it can sometimes be easier to first disconnect all wires from it.
Error signals (red wires) are obviously bad, but they tend to appear in complicated wiring jobs such as the one you will be implementing here. It's good to be aware of the common causes while debugging:

Testing

Once you've implemented your processor, you can test its correctness by writing programs to run on it! First, try this simple program as a sanity check: halt.s. This program loads the same immediate into two different registers using lui/ori and then branches back one instruction (offset = -1) if these registers are equal.

             Assembly:               Binary:
             ========                ======
             lui $t0, 0x3333         3c083333
             ori $t0, $t0, 0x4444    35084444
             lui $t1, 0x3333         3c093333
             ori $t1, $t1, 0x4444    35294444
       self: beq $t0, $t1, self      1109ffff

For practice, verify that the assembly on the left matches the translated binary on the right. This program effectively "halts" the processor by putting it into an infinite loop, so you can observe the outputs as well as memory and register state. Of course, you could do this "halt" with only the beq line, but it is very important that you test your lui/ori or the programs we will use during grading will not work.
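If you want to check your hand translation, the I-type encoding can be reproduced with a few lines of Python. This is an independent sketch, not the provided assembler.py; the tables and the function name encode_i are invented here and cover only the three opcodes and two registers used by halt.s.

```python
REG = {"$t0": 8, "$t1": 9}                     # standard MIPS register numbers
OPCODE = {"lui": 0x0F, "ori": 0x0D, "beq": 0x04}

def encode_i(op, rs, rt, imm):
    """Pack an I-type instruction: opcode | rs | rt | 16-bit immediate."""
    return (OPCODE[op] << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

words = [
    encode_i("lui", 0, REG["$t0"], 0x3333),
    encode_i("ori", REG["$t0"], REG["$t0"], 0x4444),
    encode_i("lui", 0, REG["$t1"], 0x3333),
    encode_i("ori", REG["$t1"], REG["$t1"], 0x4444),
    encode_i("beq", REG["$t0"], REG["$t1"], -1),  # offset -1 branches to itself
]

if __name__ == "__main__":
    for w in words:
        print(format(w, "08x"))
```

The printed words should match the Binary column of the listing above.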

To test your processor, open run.circ. Find the Instruction Memory RAM and right click --> Load Image... Select the assembled program (.hex file - see details on the Assembler below) to load it and then start clock ticks.

Assembler

We've provided a basic assembler (assembler.py) in the start kit to make writing your programs easier so you can use assembly instead of machine code. You should try writing a few by hand before using this, mainly because it's good practice and makes you feel cooler.

The assembler takes files of the following form (this is halt.s, which is included in the start kit):

         
             lui $t0, 0x3333          #3c083333
             ori $t0, $t0, 0x4444     #35084444
             lui $t1, 0x3333          #3c093333
             ori $t1, $t1, 0x4444     #35294444
       self: beq $t0, $t1, self       #1109ffff

Commas are optional but the '$' is not. '#' starts a comment. The assembler can be invoked with the following command:

   $ python assembler.py input.s -o output.hex

The output file is input.hex if not explicitly set - that is, the same name as the input file but with a .hex extension. Use the -o option to change the output file name arbitrarily.

As an alternative to the assembler.py, you can also use MARS command line utilities to assemble your file. This will also allow you to create .hex files for your memory, although it won't assemble the new instructions we added to your processor. You can look at this link for specifics, but a sample script has been written in mars-assem.sh.

In addition, you are welcome to use your project 1 assembler and linker to create these .hex files! Try it out and marvel at having created 3/4 of the CALL process. Be wary of bugs in your project 1, though.

Submission

Put your alu.circ, regfile.circ, cpu.circ and mem.circ in an empty folder named proj2.2, then use the command below to compress it.

tar czvf proj2.2.tar proj2.2

Finally, submit the compressed file to Autolab.