Project 2.2 - Computer Architecture I - ShanghaiTech University

Project 2.2: CPU


Overview

MAKE SURE TO CHECK YOUR CIRCUITS WITH THE GIVEN HARNESSES TO SEE IF THEY FIT! YOU WILL FAIL ALL OUR TESTS IF THEY DO NOT.
(This also means that you should not be moving around given inputs and outputs in the circuits).
This is a PARTNER project. Use the gradebot repositories accordingly!
Sample tests for a completed alu, regfile, mem and cpu have been included in the proj2-2StartKit. Given the current directory structure, you can run the bash script (long-test.sh) with your *.circ files in the same directory and it will run the autograder. We recommend running the sample tests locally, but note that the autograder only works with Python 2.7. These tests are NOT comprehensive; you will need to do further testing on your own.
You are allowed to use any of Logisim's built-in blocks for all parts of this project.
Save often. Logisim can be buggy and the last thing you want is to lose some of your hard work.

In this project you will be using MIPS-Logisim to create a 32-bit two-cycle processor. It is similar to MIPS, except that memory addresses represent 32-bit words instead of 8-bit bytes (word-addressed instead of byte-addressed). Also, all addresses are 24-bits wide instead of 32-bits, due to limitations in Logisim. Throughout the implementation of this project, we'll be making design choices that make it compatible with machine code outputs from MARS and your Project 1!

Please read this document CAREFULLY as there are key differences between the processor we studied in class and the processor you will be designing for this project.

Pipelining

Your processor will have a 2-stage pipeline:

Instruction Fetch: An instruction is fetched from the instruction memory.
Execute: The instruction is decoded, executed, and committed (written back). This is a combination of the remaining stages of a normal MIPS pipeline.

You should note that data hazards do NOT pose a problem for this design, since all accesses to all sources of data happen in a single pipeline stage. However, there are still control hazards to deal with. Our ISA does not expose branch delay slots to software. This means that the instruction immediately after a branch or jump is not necessarily executed if the branch is taken. This makes your task a bit more complex. By the time you have figured out that a branch or jump is in the Execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need to "kill" instructions that are being fetched if the instruction under execution is a jump or a taken branch. Instruction kills for this project MUST be accomplished by MUXing a nop into the instruction stream and sending the nop into the Execute stage instead of using the fetched instruction. Notice that 0x00000000 (the all-zero instruction word) is a nop instruction; please use this, as it will simplify grading and testing. You should only kill if a branch is taken (do not kill otherwise), but do kill on every type of jump.
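The kill rule above can be written down as a small behavioral model. This is only a sketch in Python, not a circuit: the function name and its boolean arguments are made up for illustration, and where the MUX sits relative to the instruction register is still your design decision.

```python
NOP = 0x00000000  # all-zero word; in MIPS this decodes as sll $0, $0, 0


def instruction_for_execute(fetched, ex_is_jump, ex_is_branch, ex_branch_taken):
    """Pick what the Execute stage should see next (hypothetical helper).

    fetched          -- the word just read from instruction memory
    ex_is_jump       -- True if the instruction now in Execute is j, jal or jr
    ex_is_branch     -- True if the instruction now in Execute is beq or bne
    ex_branch_taken  -- True if that branch's condition holds
    """
    kill = ex_is_jump or (ex_is_branch and ex_branch_taken)
    return NOP if kill else fetched


# A not-taken branch lets the fetched instruction through unchanged.
print(hex(instruction_for_execute(0x3C083333, False, True, False)))
```

Note that the model kills on every jump regardless of any condition, but kills on a branch only when it is taken, matching the rule in the paragraph above.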

Because all of the control and execution is handled in the Execute stage, your processor should be more or less indistinguishable from a single-cycle implementation, barring the one-cycle startup latency and the branch/jump delays. However, we will be enforcing the two-stage pipeline design. If you are unsure about pipelining, it is perfectly fine (maybe even recommended) to first implement a single-cycle processor. This will allow you to first verify that your instruction decoding, control signals, arithmetic operations, and memory accesses are all working properly. From a single-cycle processor you can then split off the Instruction Fetch stage with a few additions and a few logical tweaks. Some things to consider:

Will the IF and EX stages have the same or different PC values?
Do you need to store the PC between the pipelining stages?
To MUX a nop into the instruction stream, do you place it before or after the instruction register?
What address should be requested next while the EX stage executes a nop? Is this different than normal?

You might also notice a bootstrapping problem here: during the first cycle, the instruction register sitting between the pipeline stages won't contain an instruction loaded from memory. How do we deal with this? It happens that Logisim automatically sets registers to zero on reset; the instruction register will then contain a nop. We will allow you to depend on this behavior of Logisim. Remember to go to Simulate --> Reset Simulation (Ctrl+R) to reset your processor.

Deliverables

Approach this project like you would any coding assignment: construct it piece by piece and test each component early and often!

Tidiness and readability will be a large factor in grading your circuit if there are any issues, so please make it as neat as possible! If we can't comprehend your circuit, you will probably receive no partial credit.

As with the first part, we will be distributing the project files through GitHub. You can look back at Step 0 of Project 1 for more specific steps.

In the repository add a remote repo that contains the framework files:
git remote add framework http://shtech.org/course/ca/projects/proj2.2.git
Go and fetch the files:
git fetch framework
Now merge those files with your master branch:
git merge framework/master
The rest of the git commands work as usual.
Copy regfile.circ and alu.circ from proj2.1 to your new gradebot repo!

2) Processor

We have provided a skeleton for your processor in cpu.circ along with a testing harness in run.circ. Your completed processor should implement the ISA detailed below in the section Instruction Set Architecture (ISA) using a two-cycle pipeline. Your processor will contain an instance of your ALU, Data Memory, and Register File. You are also responsible for constructing the entire datapath and control from scratch. It will interact with our harness through the 4 inputs and 10 outputs listed below.

One important thing to notice is that we have two different locations with registers now, both in the regfile you created and in run.circ. If an instruction uses $t0-$t3, then pipe these requests to your own regfile. However, if an instruction requires any of the other 28 registers, output those requests to run.circ. Do note that the numbering system for all 32 registers is the same as in MIPS. Also, the registers in run.circ have been initialized to random numbers. In addition, $0, $t0-$t3 registers in run.circ are not changeable - nothing happens if you write to those registers.
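To make the register split concrete, here is a small Python sketch of the routing decision. The helper name is hypothetical; the only facts it relies on are the ones stated above plus the standard MIPS numbering, in which $t0-$t3 are registers 8 through 11.

```python
T_REGS = range(8, 12)  # $t0-$t3 are registers 8..11 in the MIPS numbering


def register_home(reg_num):
    """Return which register storage serves a 5-bit register number.

    "regfile"  -- your own four-register file ($t0-$t3)
    "run.circ" -- the harness, which holds the other 28 registers
    """
    if not 0 <= reg_num <= 31:
        raise ValueError("register numbers are 5 bits wide")
    return "regfile" if reg_num in T_REGS else "run.circ"


for name, num in [("$0", 0), ("$a0", 4), ("$t0", 8), ("$t3", 11), ("$t4", 12), ("$ra", 31)]:
    print(name, "->", register_home(num))
```

In the circuit, the same decision selects between your regfile's read ports and RS_READ_VALUE / RT_READ_VALUE.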

NOTE: You also need to have an LED unit which lights up to signify signed overflow. This indicator should be wired to the signed overflow port of your ALU. This should be viewable in your main circuit.

Your processor will get its program from the processor harness we have provided in run.circ. It will send the address of instruction memory it wants to access to the harness through an output, and accept the instruction at that address as an input. Inspect run.circ to see exactly what's going on. Your processor has 4 inputs that come from the harness:

Input Name	Bit Width	Description
RS_READ_VALUE	32	Driven with the value of the register specified in RS
RT_READ_VALUE	32	Driven with the value of the register specified in RT
INSTRUCTION	32	Driven with the instruction at the instruction memory address identified by the FETCH_ADDRESS (see below).
CLOCK	1	The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).

Your processor must provide 10 outputs to the harness:

Output Name	Bit Width	Description
$t0	32	Driven with the contents of $t0.
$t1	32	Driven with the contents of $t1.
$t2	32	Driven with the contents of $t2.
$t3	32	Driven with the contents of $t3.
RS	5	Determines which register's value is sent to RS_READ_VALUE (see above).
RT	5	Determines which register's value is sent to RT_READ_VALUE (see above).
Write Register	5	Determines which register to set to the value of RD_WRITE_VALUE on the next rising edge of the clock, assuming that RD_WRITE_ENABLE is asserted.
RD_WRITE_VALUE	32	Determines what data to write to the register identified by the Write Register output on the next rising edge of the clock, assuming that RD_WRITE_ENABLE is asserted.
RD_WRITE_ENABLE	1	Determines whether data is written on the next rising edge of the clock.
FETCH_ADDRESS	24	This output is used to select which instruction is presented to the processor on the INSTRUCTION input.

Follow the same instructions as the register file and ALU regarding rearranging inputs and outputs of the processor. In particular, you should ensure that your processor is correctly loaded by a fresh copy of run.circ before you submit.

3) Data Memory

You will build your Data Memory on your own using a RAM module. Note that this is different than a ROM module. Logisim RAM modules can be found in the built-in Memory library/folder.

Although the input address is 32 bits, due to limitations in Logisim, you should only be using 24 bits to address your RAM module. Consider carefully which 24 bits you want to use, given that the input is byte addressed. We will be losing a few bits of information by doing this, but that's okay for the purposes of this assignment.

In addition, you can assume that the .data base address is 0x10010000, just like in MARS. Notice that this means the instructions you have will refer to memory starting at 0x10010000, which should translate to 0x000000 for your RAM modules.
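As a sanity check for your address wiring, the following Python sketch shows one possible reading of the translation: remove the .data base, then drop the two byte-offset bits to get a word index and keep 24 bits. The subtraction and the helper names are our assumptions for illustration; you may reach the same mapping with different logic.

```python
DATA_BASE = 0x10010000  # .data base address used by MARS
RAM_ADDR_MASK = 0xFFFFFF  # the Logisim RAM is addressed with 24 bits


def ram_word_address(byte_addr):
    """Map a byte address from the program to a 24-bit RAM word address."""
    offset = (byte_addr - DATA_BASE) & 0xFFFFFFFF  # byte offset into .data
    return (offset >> 2) & RAM_ADDR_MASK           # 4 bytes per RAM word


def byte_lane(byte_addr):
    """Which of the four bytes inside the word the address points at."""
    return byte_addr & 0x3


print(hex(ram_word_address(0x10010000)))  # first word of .data
print(hex(ram_word_address(0x10010004)))  # second word of .data
```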

For lw/sw, you can assume that the addresses are properly aligned, as per MIPS instructions. For lb/lbu, make sure you are only returning the specified byte, loaded into the least significant byte of Data Mem Out. You will need to deal with sign/zero-extending the output in your CPU.

NOTE: Your data memory should use little endian.
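The byte selection for lb/lbu can be modeled as below. This is a Python sketch under the little-endian rule stated above (byte offset 0 is the least significant byte of the word); the function names are hypothetical, and the extension step belongs in your CPU rather than in mem.circ.

```python
def select_byte(word, byte_addr):
    """Return the addressed byte of a 32-bit word, little endian.

    The result sits in the least significant byte, as Data Mem Out expects.
    """
    shift = 8 * (byte_addr & 0x3)
    return (word >> shift) & 0xFF


def extend_byte(byte, signed):
    """Sign-extend (lb) or zero-extend (lbu) a byte to 32 bits."""
    if signed and (byte & 0x80):
        return byte | 0xFFFFFF00
    return byte


word = 0xAABBCCDD
print(hex(select_byte(word, 0)))                     # least significant byte
print(hex(extend_byte(select_byte(word, 0), True)))  # what lb would load
```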

EXTRA FOR EXPERTS: Implement logic for sb in your data memory. Make sure you don't overwrite any other byte that's part of the same word.
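If you attempt sb, the key idea is a read-modify-write of the whole word. The Python sketch below illustrates it; it assumes the byte to store arrives in the least significant byte of Data Mem In, which is our assumption and not something the harness description states.

```python
def store_byte(old_word, byte_addr, data_in):
    """Return the new 32-bit word after storing one byte, little endian.

    Only the addressed byte changes; the other three bytes of old_word
    are preserved.
    """
    shift = 8 * (byte_addr & 0x3)
    mask = 0xFF << shift
    kept = old_word & ~mask & 0xFFFFFFFF   # clear the target byte
    new_byte = (data_in & 0xFF) << shift   # move the new byte into place
    return kept | new_byte


print(hex(store_byte(0xAABBCCDD, 1, 0x12345678)))
```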

We have provided a skeleton for your data memory in mem.circ along with a testing harness in mem-harness.circ. We will be testing your memory module separately for correctness. It has 5 inputs:

Input Name	Bit Width	Description
Data Mem Addr	32	Byte-addressed address to read from or write to.
Data Mem In	32	Determines what data to write into the address identified by "Data Mem Addr" on the next rising edge of the clock, assuming that MEM_WRITE is asserted.
ACCESS_BYTE	1	1 iff we are loading or storing a byte.
MEM_WRITE	1	Determines whether data is written on the next rising edge of the clock.
CLOCK	1	Input for the clock. This can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).

The Data Memory also has 1 output:

Output Name	Bit Width	Description
Data Mem Out	32	Driven with the value at the address identified by "Data Mem Addr".

For those unfamiliar with the RAM module, the pictures above show a good way to wire up a circuit to use RAM. You are not required to implement Data Memory as shown above and you can use a memory with separate read and write ports if you should so desire.

Here are a few things to know about the RAM module before you get started:

"clk" provides synchronization for memory writes. Be sure to use the same clock here as you do for your Register File.
"sel" determines whether or not the RAM module is active. We will probably not run into any cases where we need to turn our RAM off, so you can wire a constant 1 to this.
"A" chooses which address will be accessed.
"clr" will instantly set all contents of memory to 0 if high. You should wire a constant 0 to this port, as we will not be using it.
"ld" determines whether we are reading or writing to memory. If "ld" is high, then "D" will be driven with the contents of memory at address "A" (left image). If "ld" is low, then the contents of "D" will be stored in memory at address "A" (right image).
"D" acts as both data in and data out for this module. This means you have to be careful not to drive this line from two conflicting sources, which in this case are DataIn and the output of the memory. You can solve this by using a controlled buffer (a.k.a. a tri-state buffer) on the "D" port of the RAM module. By wiring logic to the "ld" port and the valve port of the controlled buffer together so that they are always opposite values (as in the pictures above), we can prevent conflicts between data being driven in and the contents of memory coming out.
The "poke" tool can be used to modify the contents of the memory. You can also use right-click --> Load Image... to load an image from a file.

The best way to learn how these work is simply to play with them. You can also refer to the Logisim documentation on RAM modules.

4) Test Code

Since you are building a processor, you can run actual programs on it!

There are more details about testing your processor and the provided assembler in the Testing section, but in particular you will be REQUIRED to write and submit the following program:

Write a program that calls lfsr, beginning with a given input, until it reaches a value that is a palindrome. The seed for lfsr will be stored at register $a0. You must save the final palindrome value in register $v0. If you reach your seed value without finding a palindrome, save the seed value in register $v0. Your function must be labeled: "LfsrPalindrome:". Save the assembler source in a file called lfsrpalindrome.s. Remember, at the end of the function, you must jump back to the caller by calling jr $ra.

Write this function as you would a normal MIPS function, remembering to stay within your processor's ISA. You cannot assume anything about the values in the registers when your function is called. It is recommended that you write your own main functions to set up the registers, though your submitted code should contain ONLY the functions themselves.

We will test your code by appending it to our own main functions for different test cases, assembling using assembler.py, and then running it on both your submitted processor and a known working processor. This is why it is important that you include only your function and not other testing statements. Our main functions will all end in a halt (an instruction that jumps or branches to itself indefinitely - see Testing for details) in order to avoid executing your code an extra time. Feel free to set something similar up when testing your own code.
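Before writing the assembly, it can help to pin down the control flow in a high-level model. The Python sketch below is one reading of the specification, not a reference solution: lfsr_step stands in for whatever your lfsr instruction computes (it is passed in as a parameter because this document does not define it), "palindrome" is assumed to mean that the 32-bit pattern reads the same in both directions, and we assume the seed itself is not tested before the first lfsr call.

```python
def is_bit_palindrome(value):
    """True if the 32-bit pattern equals its own bit reversal (assumed meaning)."""
    bits = format(value & 0xFFFFFFFF, "032b")
    return bits == bits[::-1]


def lfsr_palindrome(seed, lfsr_step):
    """Model of LfsrPalindrome: seed plays the role of $a0, the result of $v0.

    lfsr_step is a hypothetical stand-in for the lfsr instruction.
    """
    value = lfsr_step(seed)
    while value != seed and not is_bit_palindrome(value):
        value = lfsr_step(value)
    return value  # either the first palindrome reached, or the seed again


# Toy step function with a short cycle, only to exercise the loop.
toy_cycle = {1: 2, 2: 0x80000001, 0x80000001: 1}
print(hex(lfsr_palindrome(1, toy_cycle.get)))
```

The loop has two exits, which map directly onto two branches in your assembly: one comparing against the saved seed and one testing the palindrome check's result.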

Instruction Set Architecture (ISA)

You will be implementing a simple 32-bit two-cycle processor with 32 registers, but your regfile will only be responsible for four of them ($t0 - $t3). The numeric values for these registers are the same as the green sheet. It will have separate data and instruction memory. Just like MIPS, each of the four registers that you will be implementing is big enough to hold ONE word.

Your processor will be similar to MIPS, except for memory addressing. Memory addresses will represent 32-bit words instead of 8-bit bytes. This means that the memory modules are word-addressed instead of byte-addressed. However, note that your instructions will be using byte-addressing, as it should be normal MIPS code. Make sure you keep track of which addresses are byte-addressed and which are word-addressed when thinking about MIPS instruction addressing, the instruction memory module, MIPS data addressing, and the data memory module!

IMPORTANT: Because of the limitations of Logisim, our memory address will be 24 bits, unlike the normal 32-bit memory address in MIPS. Which bits would we need to truncate so that as many translations of MIPS code as possible are supported?

The instructions we will be looking at are listed below. Your processor will pull out a 32-bit value from instruction memory and determine the meaning of that instruction by looking at the opcode (the top 6 bits, which are bits 31-26). If the instruction is an R-type (i.e. opcode == 0), then you must also look at the funct field.
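The field boundaries are the standard MIPS ones, so the decode step can be sketched in Python as plain shifts and masks. The function and dictionary key names are ours; in Logisim each line corresponds to a splitter or a bit selector on the instruction word.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its standard fields."""
    fields = {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26
        "rs": (instr >> 21) & 0x1F,      # bits 25-21
        "rt": (instr >> 16) & 0x1F,      # bits 20-16
        "rd": (instr >> 11) & 0x1F,      # bits 15-11 (R-type)
        "shamt": (instr >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct": instr & 0x3F,           # bits 5-0   (R-type)
        "imm": instr & 0xFFFF,           # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,     # bits 25-0  (J-type)
    }
    fields["is_rtype"] = fields["opcode"] == 0
    return fields


d = decode(0x3C083333)  # lui $t0, 0x3333
print(hex(d["opcode"]), d["rt"], hex(d["imm"]))
```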

Notice how we do not use all the instructions in MIPS. Your project only has to work on these specified instructions, most of which you should have seen in project 1 along with a few extra ones, although we have taken out sb to simplify your memory file. This way the project is shorter and easier.

Instruction	Format
Add	add $rd, $rs, $rt
Add Unsigned	addu $rd, $rs, $rt
Sub	sub $rd, $rs, $rt
Sub Unsigned	subu $rd, $rs, $rt
And	and $rd, $rs, $rt
Or	or $rd, $rs, $rt
Set Less Than	slt $rd, $rs, $rt
Set Less Than Unsigned	sltu $rd, $rs, $rt
Jump Register	jr $rs
Shift Left Logical	sll $rd, $rt, shamt
Shift Right Logical	srl $rd, $rt, shamt
Shift Right Arithmetic	sra $rd, $rt, shamt
Add Immediate Unsigned	addiu $rt, $rs, immediate
And Immediate	andi $rt, $rs, immediate
Or Immediate	ori $rt, $rs, immediate
Load Upper Immediate	lui $rt, immediate
Load Byte	lb $rt, offset($rs)
Load Byte Unsigned	lbu $rt, offset($rs)
Load Word	lw $rt, offset($rs)
Store Word	sw $rt, offset($rs)
Branch on Equal	beq $rs, $rt, label
Branch on Not Equal	bne $rs, $rt, label
Jump	j label
Jump and Link	jal label
Bit Palindrome	bitpal $rd, $rs
LFSR	lfsr $rd, $rs

Some specifics on selected instructions:

Jumping

The argument to the j and jal instructions is a pseudoabsolute address, similar to MIPS. The target address is an unsigned number representing the lower 26 bits of the address of the next instruction to be executed. Because of the limitations of Logisim, we will have to shorten our address to 24 bits, so we will cut off the upper 2 bits of our address. We do NOT concatenate any zeroes to the bottom of our address like we would in MIPS. This is because our processor is word-addressed, so every possible address holds a valid 32-bit instruction.
Remember that the assembler/linker we're using, MARS (and your project 1), will represent absolute addresses of the .text section starting from a base address of 0x00400000, byte-addressed. However, your instruction memory starts this section at 0x000000, word-addressed, so make sure you account for this offset while calculating your address for jumps (j, jr, jal).
Note that you should kill the next instruction after a jump, jr, or jal even if that is the instruction you are going to be jumping to.
On a jal the address of the next instruction should be written into $ra. Don't forget to add the offset to the address of the next instruction!
While we want to follow MIPS as much as possible, we have chosen to deviate from the default execution concerning instruction addresses within the CPU. Rather than converting everything back and forth between byte and word addressing, we will assume word-addressed addresses when working with jal and jr. The addresses jal stores should be word-addressed and jr assumes that the registers have word-addressed addresses.
Given the previous assumption, la would not work properly if you compiled it in MIPS. Instead, if you were to use lui and ori to load an address into a register, you should load in the word-addressed address.
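The offset arithmetic for j and jal can be checked with a short Python sketch. It assumes the standard MIPS encoding, in which the 26-bit target field already holds the byte address shifted right by two, so the .text base 0x00400000 appears in the field as 0x100000. The constant and function names are ours, and how you fold the offset into jr and into the value jal writes to $ra is left to the rules above.

```python
TEXT_BASE_BYTES = 0x00400000             # MARS .text base, byte-addressed
TEXT_BASE_WORDS = TEXT_BASE_BYTES >> 2   # same base, word-addressed: 0x100000
ADDR_MASK = 0xFFFFFF                     # FETCH_ADDRESS is 24 bits wide


def jump_fetch_address(target26):
    """Turn the 26-bit target field of j/jal into a FETCH_ADDRESS (sketch)."""
    addr24 = target26 & ADDR_MASK                  # cut off the upper 2 bits
    return (addr24 - TEXT_BASE_WORDS) & ADDR_MASK  # account for the .text base


# A jump to the first instruction of .text has target field 0x100000.
print(hex(jump_fetch_address(0x100000)))
```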

Branching

The argument to the beq and bne instructions is a signed offset relative to the next instruction to be executed if we don't take the branch, which is similar to MIPS. Note that the address of this next instruction is PC+1 rather than PC+4 because our processor is word-addressed. Here, currPC means the address of the branch instruction. We can write beq as the following:

	if $rs == $rt
		nextPC = currPC+1 + offset
	else
		increment PC like normal

Think! There's a reason we write "increment PC like normal" here instead of just "currPC+1".
The bne instruction differs only by the conditional in the if statement: replace the == with !=.
Note that you should not kill the next instruction if the branch is not taken. If the branch is taken you should always kill the instruction.
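For the taken case, the target computation can be sketched in Python as follows. Wrapping the result to 24 bits is our assumption based on the width of FETCH_ADDRESS, and the not-taken case is deliberately left out so that you still have to think about what "increment PC like normal" means in your pipeline.

```python
ADDR_MASK = 0xFFFFFF  # FETCH_ADDRESS is 24 bits wide


def signed16(imm):
    """Interpret a 16-bit field as a two's-complement integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def taken_branch_target(curr_pc, imm16):
    """nextPC for a taken beq/bne; curr_pc is the branch's own word address."""
    return (curr_pc + 1 + signed16(imm16)) & ADDR_MASK


# A branch at word address 4 with offset -1 (0xFFFF) targets itself.
print(taken_branch_target(4, 0xFFFF))
```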

Immediates

Note that the immediate field is only 16 bits wide, so we must perform some kind of extension on it before passing it to the ALU. If an immediate is supposed to be unsigned, be sure to zero-extend it. If an immediate is signed, be sure to sign-extend it. This should match the specifications on the MIPS green sheet.
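The two kinds of extension look like this in a Python sketch (helper names are ours). On the green sheet, andi and ori use the zero-extended immediate, while addiu, the loads and stores, and the branches use the sign-extended one.

```python
def zero_extend16(imm):
    """Zero-extend a 16-bit immediate to 32 bits (upper half becomes 0)."""
    return imm & 0xFFFF


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to 32 bits (bit 15 is copied upward)."""
    imm &= 0xFFFF
    if imm & 0x8000:
        return imm | 0xFFFF0000
    return imm


print(hex(zero_extend16(0xFFFF)), hex(sign_extend16(0xFFFF)))
```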

Logisim Notes

While you may use Logisim 2.7.1 for developing your alu.circ, regfile.circ, mem.circ, and cpu.circ, do note that you have to open run.circ with MIPS-Logisim.

If you are having trouble with Logisim, RESTART IT and RELOAD your circuit! Don't waste your time chasing a bug that is not your fault. However, if restarting doesn't solve the problem, it is more likely that the bug is a flaw in your project. Please post to Piazza about any crazy bugs that you find and we will investigate.

Things to Look Out For

Do NOT gate the clock! This is very bad design practice when making real circuits, so we will discourage you from doing this by heavily penalizing your project if you gate your clock.
BE CAREFUL with copying and pasting from different Logisim windows. Logisim has been known to have trouble with this in the past.
When you import another file (Project --> Load Library --> Logisim Library...), it will appear as a folder in the left-hand viewing pane. The skeleton files should have already imported necessary files.
Changing attributes before placing a component changes the default settings for that component. So if you are about to place many 32-bit pins, this might be desirable. If you only want to change that particular component, place it first before changing the attributes.
When you change the inputs & outputs of a sub-circuit that you have already placed in main, Logisim will automatically add/remove the ports when you return to main and this sometimes shifts the block itself. If there were wires attached, Logisim will do its automatic moving of these as well, which can be extremely dumb in some cases. Before you change the inputs and outputs of a block, it can sometimes be easier to first disconnect all wires from it.
Error signals (red wires) are obviously bad, but they tend to appear in complicated wiring jobs such as the one you will be implementing here. It's good to be aware of the common causes while debugging:

Logisim's Combinational Analysis Feature

Logisim offers some functionality for automating circuit implementation given a truth table, or vice versa. Though not disallowed (enforcing such a requirement is impractical), use of this feature is discouraged. Remember that you will not be allowed to have a laptop running Logisim on the final.

Testing

Once you've implemented your processor, you can test its correctness by writing programs to run on it! First, try this simple program as a sanity check: halt.s. This program loads the same immediate into two different registers using lui/ori and then branches back one instruction (offset = -1) if these registers are equal.

             Assembly:               Binary:
             ========                ======
             lui $t0, 0x3333         3c083333
             ori $t0, $t0, 0x4444    35084444
             lui $t1, 0x3333         3c093333
             ori $t1, $t1, 0x4444    35294444
       self: beq $t0, $t1, self      1109ffff

For practice, verify that the assembly on the left matches the translated binary on the right. This program effectively "halts" the processor by putting it into an infinite loop, so you can observe the outputs as well as memory and register state. Of course, you could do this "halt" with only the beq line, but it is very important that you test your lui/ori or the programs we will use during grading will not work.
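If you want to check your hand translation, the short Python sketch below packs the I-type fields for the five instructions of halt.s using the standard MIPS opcodes (lui = 0x0F, ori = 0x0D, beq = 0x04) and register numbers ($t0 = 8, $t1 = 9). The helper name encode_i is ours and is not part of assembler.py.

```python
def encode_i(opcode, rs, rt, imm):
    """Pack an I-type instruction: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


T0, T1 = 8, 9
program = [
    encode_i(0x0F, 0, T0, 0x3333),   # lui $t0, 0x3333
    encode_i(0x0D, T0, T0, 0x4444),  # ori $t0, $t0, 0x4444
    encode_i(0x0F, 0, T1, 0x3333),   # lui $t1, 0x3333
    encode_i(0x0D, T1, T1, 0x4444),  # ori $t1, $t1, 0x4444
    encode_i(0x04, T0, T1, -1),      # self: beq $t0, $t1, self (offset -1)
]

for word in program:
    print(format(word, "08x"))
```

The printed words should match the Binary column above line by line.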

To test your processor, open run.circ. Find the Instruction Memory RAM and right click --> Load Image... Select the assembled program (.hex file - see details on the Assembler below) to load it and then start clock ticks.

As described in the Deliverables, you are REQUIRED to write and submit the sample program to test your processor (lfsrpalindrome.s), but you should also write others to test all your instructions.

Remember: Debugging Sucks. Testing Rocks.

Assembler

We've provided a basic assembler to make writing your programs easier so you can use assembly instead of machine code. You should try writing a few by hand before using this, mainly because it's good practice and makes you feel cooler. This assembler.py supports all of the instructions for your processor.

The assembler is included in the start kit (the one you pulled from the repo using the earlier instructions) or can be downloaded from the link above. The standard assembler is a work in progress, so please report bugs to Piazza!

The assembler takes files of the following form (this is halt.s, which is included in the start kit):

         #Comments are great!
             lui $t0, 0x3333          #3c083333
             ori $t0, $t0, 0x4444     #35084444
             lui $t1, 0x3333          #3c093333
             ori $t1, $t1, 0x4444     #35294444
       self: beq $t0, $t1, self       #1109ffff

Commas are optional but the '$' is not. '#' starts a comment. The assembler can be invoked with the following command:

   $ python assembler.py input.s [-o output.hex]

The output file is input.hex if not explicitly set - that is, the same name as the input file but with a .hex extension. Use the -o option to change the output file name arbitrarily.

As an alternative to the assembler.py, you can also use MARS command line utilities to assemble your file. This will also allow you to create .hex files for your memory, although it won't assemble the new instructions we added to your processor. You can look at this link for specifics, but a sample script has been written in mars-assem.sh.

In addition, you are welcome to use your project 1 assembler and linker to create these .hex files! Try it out and marvel at having created three quarters of the CALL process. Be wary of bugs in your project 1, though.

Submission

Use gradebot to commit your code. We will actually test and grade your CPU by hand, so be sure to create your own extensive test cases!