Project 2.2: CPU

Computer Architecture I ShanghaiTech University
Overview | Deliverables | ISA | Logisim | Testing | Submission

Overview

In this project you will be using MIPS-Logisim to create a 32-bit two-cycle processor. It is similar to MIPS, except that memory addresses represent 32-bit words instead of 8-bit bytes (word-addressed instead of byte-addressed). Also, all addresses are 24 bits wide instead of 32 bits, due to limitations in Logisim. Throughout the implementation of this project, we'll be making design choices that make it compatible with machine code output from MARS and your Project 1!

Please read this document CAREFULLY as there are key differences between the processor we studied in class and the processor you will be designing for this project.

Pipelining

Your processor will have a 2-stage pipeline:

  1. Instruction Fetch: An instruction is fetched from the instruction memory.
  2. Execute: The instruction is decoded, executed, and committed (written back). This is a combination of the remaining stages of a normal MIPS pipeline.

You should note that data hazards do NOT pose a problem for this design, since all accesses to all sources of data happen in a single pipeline stage. However, there are still control hazards to deal with. Our ISA does not expose branch delay slots to software. This means that the instruction immediately after a branch or jump is not necessarily executed if the branch is taken. This makes your task a bit more complex. By the time you have figured out that a branch or jump is in the Execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need to "kill" instructions that are being fetched if the instruction under execution is a jump or a taken branch. Instruction kills for this project MUST be accomplished by MUXing a nop into the instruction stream and sending the nop into the Execute stage instead of using the fetched instruction. Notice that 0x00000000 is a nop instruction (it encodes sll $0, $0, 0); please use this, as it will simplify grading and testing. You should only kill if a branch is taken (do not kill otherwise), but do kill on every type of jump.
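The kill rule above can be modeled in a few lines of Python. This is only a behavioral sketch of the MUX in front of the instruction register, not a circuit; the function and signal names (next_instruction_register, is_jump, is_branch, branch_taken) are made up for illustration and do not correspond to anything in the framework.

```python
NOP = 0x00000000  # all-zero word, which decodes as sll $0, $0, 0


def next_instruction_register(fetched, is_jump, is_branch, branch_taken):
    """Return the word latched into the IF/EX instruction register.

    fetched      -- instruction word coming back from instruction memory
    is_jump      -- the instruction now in Execute is a jump of any kind
    is_branch    -- the instruction now in Execute is a branch
    branch_taken -- that branch's condition evaluated to true
    """
    # Kill on every jump, but on a branch only when it is actually taken.
    kill = is_jump or (is_branch and branch_taken)
    return NOP if kill else fetched
```

A branch that is not taken leaves the fetched instruction untouched, which matches the "do not kill otherwise" requirement.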

Because all of the control and execution is handled in the Execute stage, your processor should be more or less indistinguishable from a single-cycle implementation, barring the one-cycle startup latency and the branch/jump delays. However, we will be enforcing the two-stage pipeline design. If you are unsure about pipelining, it is perfectly fine (maybe even recommended) to first implement a single-cycle processor. This will allow you to first verify that your instruction decoding, control signals, arithmetic operations, and memory accesses are all working properly. From a single-cycle processor you can then split off the Instruction Fetch stage with a few additions and a few logical tweaks. Some things to consider:

You might also notice a bootstrapping problem here: during the first cycle, the instruction register sitting between the pipeline stages won't contain an instruction loaded from memory. How do we deal with this? It happens that Logisim automatically sets registers to zero on reset; the instruction register will then contain a nop. We will allow you to depend on this behavior of Logisim. Remember to go to Simulate --> Reset Simulation (Ctrl+R or Command+R) to reset your processor.



Deliverables

Approach this project like you would any coding assignment: construct it piece by piece and test each component early and often!

Tidiness and readability will be a large factor in grading your circuit if there are any issues, so please make it as neat as possible!

1) Obtaining the Files

We have provided a framework: proj2.2_framework. Please copy your regfile.circ and alu.circ from project 2.1 to this framework. Also notice that you should use the MIPS-logisim.jar in this framework rather than the one in proj2.1_framework; they are different!

2) Processor

We have provided a skeleton for your processor in cpu.circ along with a testing harness in run.circ. Your completed processor should implement the ISA detailed below in the section Instruction Set Architecture (ISA) using a two-stage pipeline. Your processor will contain an instance of your ALU, Data Memory, and Register File. You are also responsible for constructing the entire datapath and control from scratch. It will interact with our harness through the 4 inputs and 10 outputs listed below.

One important thing to notice is that we have two different locations with registers now, both in the regfile you created and in run.circ. If an instruction uses $t0-$t3, then pipe these requests to your own regfile. However, if an instruction requires any of the other 28 registers, output those requests to run.circ. Do note that the numbering system for all 32 registers is the same as in MIPS. Also, the registers in run.circ have been initialized to random numbers. In addition, $0, $t0-$t3 registers in run.circ are not changeable - nothing happens if you write to those registers.
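The register split can be pictured with the following Python sketch. It relies on the standard MIPS numbering, in which $t0-$t3 are registers 8 through 11; the helper names (uses_own_regfile, read_register) and the list standing in for your regfile are illustrative assumptions, not part of the framework.

```python
OWN_REGS = range(8, 12)  # $t0-$t3 are registers 8-11 in standard MIPS numbering


def uses_own_regfile(reg_num):
    """True if this register lives in your regfile rather than in run.circ."""
    return reg_num in OWN_REGS


def read_register(reg_num, own_regfile, harness_value):
    """Pick the source of a register read.

    own_regfile   -- hypothetical 4-entry list holding $t0-$t3
    harness_value -- value the harness drives on RS_READ_VALUE / RT_READ_VALUE
    """
    if uses_own_regfile(reg_num):
        return own_regfile[reg_num - 8]
    return harness_value
```

In the circuit, the same decision amounts to a MUX selected by a comparison on the 5-bit register number.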

NOTE: You also need to have an LED unit which lights up to signify signed overflow. This indicator should be wired to the signed overflow port of your ALU. This should be viewable in your main circuit.

Your processor will get its program from the processor harness we have provided in run.circ. It will send the address of instruction memory it wants to access to the harness through an output, and accept the instruction at that address as an input. Inspect run.circ to see exactly what's going on. Your processor has 4 inputs that come from the harness:

Input Name | Bit Width | Description
RS_READ_VALUE | 32 | Driven with the value of the register specified in RS.
RT_READ_VALUE | 32 | Driven with the value of the register specified in RT.
INSTRUCTION | 32 | Driven with the instruction at the instruction memory address identified by FETCH_ADDRESS (see below).
CLOCK | 1 | The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.).

Your processor must provide 10 outputs to the harness:

Output Name | Bit Width | Description
$t0 | 32 | Driven with the contents of $t0.
$t1 | 32 | Driven with the contents of $t1.
$t2 | 32 | Driven with the contents of $t2.
$t3 | 32 | Driven with the contents of $t3.
RS | 5 | Determines which register's value is sent to RS_READ_VALUE (see above).
RT | 5 | Determines which register's value is sent to RT_READ_VALUE (see above).
Write Register | 5 | Determines which register is set to RD_WRITE_VALUE on the next rising edge of the clock, assuming that RD_WRITE_ENABLE is asserted.
RD_WRITE_VALUE | 32 | Determines what data to write to the register identified by Write Register on the next rising edge of the clock, assuming that RD_WRITE_ENABLE is asserted.
RD_WRITE_ENABLE | 1 | Determines whether data is written on the next rising edge of the clock.
FETCH_ADDRESS | 24 | Selects which instruction is presented to the processor on the INSTRUCTION input.

Follow the same instructions as the register file and ALU regarding rearranging inputs and outputs of the processor. In particular, you should ensure that your processor is correctly loaded by a fresh copy of run.circ before you submit.

3) Data Memory

You will build your Data Memory on your own using a RAM module. Note that this is different from a ROM module. Logisim RAM modules can be found in the built-in Memory library/folder.

Although the input address is 32 bits, due to limitations in Logisim, you should only be using 24 bits to address your RAM module. Consider carefully which 24 bits you want to use, given that the input is byte addressed. We will be losing a few bits of information by doing this, but that's okay for the purposes of this assignment.

In addition, you can assume that the .data base address is 0x10010000, just like in MARS. Notice that this means the instructions you have will refer to memory starting at 0x10010000, which should translate to 0x000000 for your RAM modules.
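One way to read the translation described above is sketched below in Python. It assumes the base address is subtracted first, the two byte-offset bits are then dropped, and the low 24 bits of the result are kept; this is only one possible interpretation, and the names DATA_BASE and ram_word_address are invented for the example.

```python
DATA_BASE = 0x10010000  # .data base address used by MARS


def ram_word_address(byte_addr):
    """Map a byte address from a MIPS program to a 24-bit RAM word address.

    Assumption: the offset from DATA_BASE is converted to a word index by
    discarding the two low (byte-within-word) bits, then truncated to 24 bits.
    """
    return ((byte_addr - DATA_BASE) >> 2) & 0xFFFFFF
```

With this mapping, 0x10010000 lands on RAM word 0x000000 and 0x10010004 on word 0x000001, and all four bytes of a word share one RAM location.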

For lw/sw, you can assume that the addresses are properly aligned, as per MIPS instructions. For lb/lbu, make sure you are only returning the specified byte, loaded into the least significant byte of Data Mem Out. You will need to deal with sign/zero-extending the output in your CPU.

NOTE: Your data memory should use little endian.
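The byte handling for lb/lbu can be sketched in Python as follows. In a little-endian layout, the byte at the lowest address occupies the least significant bits of the word, so the two low address bits choose how far to shift. The function names are illustrative only; the split between byte selection (in the memory) and extension (in the CPU) follows the description above.

```python
def select_byte(word, byte_addr):
    """Memory side: return the addressed byte in the low 8 bits of the result."""
    shift = 8 * (byte_addr & 0x3)  # byte offset 0 is the least significant byte
    return (word >> shift) & 0xFF


def sign_extend_byte(b):
    """CPU side, lb: replicate bit 7 into bits 31..8."""
    return (b | 0xFFFFFF00) if (b & 0x80) else b


def zero_extend_byte(b):
    """CPU side, lbu: the upper 24 bits stay zero."""
    return b & 0xFF
```

For the stored word 0x11223344, byte offset 0 yields 0x44 and byte offset 3 yields 0x11.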

We have provided a skeleton for your data memory in mem.circ along with a testing harness in mem-harness.circ. We will be testing your memory module separately for correctness. It has 5 inputs:

Input Name | Bit Width | Description
Data Mem Addr | 32 | Byte-addressed address to read from or write to.
Data Mem In | 32 | Determines what data to write into the address identified by "Data Mem Addr" on the next rising edge of the clock, assuming that MEM_WRITE is asserted.
ACCESS_BYTE | 1 | 1 iff we are loading or storing a byte.
MEM_WRITE | 1 | Determines whether data is written on the next rising edge of the clock.
CLOCK | 1 | Input for the clock. This can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.).

The Data Memory also has 1 output:

Output Name | Bit Width | Description
Data Mem Out | 32 | Driven with the value at the address identified by "Data Mem Addr".

For those unfamiliar with the RAM module, the pictures above show a good way to wire up a circuit to use RAM. You are not required to implement Data Memory as shown above and you can use a memory with separate read and write ports if you should so desire.

Here are a few things to know about the RAM module before you get started:

The best way to learn how these work is simply to play with them. You can also refer to Logisim documentation on RAM modules here.



Instruction Set Architecture (ISA)

You will be implementing a simple 32-bit two-cycle processor with 32 registers, but your regfile will only be responsible for four of them ($t0 - $t3). The numeric values for these registers are the same as the green sheet. It will have separate data and instruction memory. Just like MIPS, each of the four registers that you will be implementing is big enough to hold ONE word.

Your processor will be similar to MIPS, except for memory addressing. Memory addresses will represent 32-bit words instead of 8-bit bytes. This means that the memory modules are word-addressed instead of byte-addressed. However, note that your instructions will be using byte-addressing, as they should be normal MIPS code. Make sure you keep track of which addresses are byte-addressed and which are word-addressed when thinking about MIPS instruction addressing, the instruction memory module, MIPS data addressing, and the data memory module!

IMPORTANT: Because of the limitations of Logisim, our memory addresses will be 24 bits wide, unlike the normal 32-bit memory addresses in MIPS. Which bits would we need to truncate so that as many translations of MIPS code as possible are supported?
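As a way to explore the byte-addressed versus word-addressed distinction for instruction fetch, the Python sketch below shows one candidate mapping from a byte-addressed PC to a 24-bit word address. It assumes the two low bits (always zero for aligned instructions) are dropped and the next 24 bits are kept; whether this matches what run.circ expects is something to check against the harness. The name fetch_address is made up for the example.

```python
def fetch_address(pc):
    """Candidate mapping from a byte-addressed PC to a 24-bit word address.

    Assumption: instructions are word-aligned, so bits 1..0 carry no
    information and bits 25..2 are the ones worth keeping.
    """
    return (pc >> 2) & 0xFFFFFF
```

Under this mapping, consecutive instructions (PC, PC + 4, PC + 8) occupy consecutive memory words.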

The instructions we will be looking at are listed below. Your processor will pull out a 32-bit value from instruction memory and determine the meaning of that instruction by looking at the opcode (the top 6 bits, which are bits 31-26). If the instruction is an R-type (i.e. opcode == 0), then you must also look at the funct field.
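The field extraction described above can be sketched in Python using the standard MIPS field positions. In Logisim the same thing is done with splitters; the function name decode_fields is only illustrative.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its standard fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26
        "rs": (instr >> 21) & 0x1F,      # bits 25-21
        "rt": (instr >> 16) & 0x1F,      # bits 20-16
        "rd": (instr >> 11) & 0x1F,      # bits 15-11 (R-type)
        "shamt": (instr >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct": instr & 0x3F,           # bits 5-0   (R-type)
        "imm": instr & 0xFFFF,           # bits 15-0  (I-type)
    }


fields = decode_fields(0x012A4020)  # add $t0, $t1, $t2
is_r_type = fields["opcode"] == 0   # R-type, so funct selects the operation
```

All fields are extracted unconditionally; the opcode then decides which of them are meaningful for a given instruction.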

Notice how we do not use all the instructions in MIPS. Your project only has to work on these specified instructions, most of which you should have seen in project 1 along with a few extra ones, although we have taken out sb to simplify your memory file. This way the project is shorter and easier.

Instruction | Format
Add | add $rd, $rs, $rt
Add Unsigned | addu $rd, $rs, $rt
Sub | sub $rd, $rs, $rt
Sub Unsigned | subu $rd, $rs, $rt
And | and $rd, $rs, $rt
Or | or $rd, $rs, $rt
Set Less Than | slt $rd, $rs, $rt
Set Less Than Unsigned | sltu $rd, $rs, $rt
Jump Register | jr $rs
Shift Left Logical | sll $rd, $rt, shamt
Shift Right Logical | srl $rd, $rt, shamt
Shift Right Arithmetic | sra $rd, $rt, shamt
Add Immediate Unsigned | addiu $rt, $rs, immediate
And Immediate | andi $rt, $rs, immediate
Or Immediate | ori $rt, $rs, immediate
Load Upper Immediate | lui $rt, immediate
Load Byte | lb $rt, offset($rs)
Load Byte Unsigned | lbu $rt, offset($rs)
Load Word | lw $rt, offset($rs)
Store Word | sw $rt, offset($rs)
Branch on Equal | beq $rs, $rt, label
Branch on Not Equal | bne $rs, $rt, label
Jump | j label
Jump and Link | jal label
Count 1s | cnto $rd, $rs, $rt
Bit Palindrome | bitpal $rd, $rs

Some specifics on selected instructions:

Jumping

Branching

Immediates



Logisim Notes

If you are having trouble with Logisim, RESTART IT and RELOAD your circuit! Don't waste your time chasing a bug that is not your fault. However, if restarting doesn't solve the problem, it is more likely that the bug is a flaw in your project. Please post to Piazza about any crazy bugs that you find and we will investigate.

Things to Look Out For



Testing

Once you've implemented your processor, you can test its correctness by writing programs to run on it! First, try this simple program as a sanity check: halt.s. This program loads the same immediate into two different registers using lui/ori and then branches back one instruction (offset = -1) if these registers are equal.

             Assembly:               Binary:
             ========                ======
             lui $t0, 0x3333         3c083333
             ori $t0, $t0, 0x4444    35084444
             lui $t1, 0x3333         3c093333
             ori $t1, $t1, 0x4444    35294444
       self: beq $t0, $t1, self      1109ffff

For practice, verify that the assembly on the left matches the translated binary on the right. This program effectively "halts" the processor by putting it into an infinite loop, so you can observe the outputs as well as memory and register state. Of course, you could do this "halt" with only the beq line, but it is very important that you test your lui/ori or the programs we will use during grading will not work.
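If you want to check your hand translation, the short Python sketch below re-encodes halt.s with the standard MIPS I-type layout (opcode, rs, rt, 16-bit immediate). The helper encode_i_type is written for this example and is not part of the provided assembler.py; the opcodes used are the standard ones for lui (0x0F), ori (0x0D), and beq (0x04).

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack an I-type instruction; imm is truncated to 16 bits (two's complement)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


T0, T1 = 8, 9  # register numbers of $t0 and $t1

program = [
    encode_i_type(0x0F, 0, T0, 0x3333),   # lui $t0, 0x3333
    encode_i_type(0x0D, T0, T0, 0x4444),  # ori $t0, $t0, 0x4444
    encode_i_type(0x0F, 0, T1, 0x3333),   # lui $t1, 0x3333
    encode_i_type(0x0D, T1, T1, 0x4444),  # ori $t1, $t1, 0x4444
    encode_i_type(0x04, T0, T1, -1),      # self: beq $t0, $t1, self
]

for word in program:
    print(f"{word:08x}")
```

The printed words should match the Binary column above, including the 0xffff immediate that encodes the branch offset of -1.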

To test your processor, open run.circ. Find the Instruction Memory RAM and right click --> Load Image... Select the assembled program (.hex file - see details on the Assembler below) to load it and then start clock ticks.

Assembler

We've provided a basic assembler (assembler.py) in the starter kit to make writing your programs easier so you can use assembly instead of machine code. You should try writing a few by hand before using this, mainly because it's good practice and makes you feel cooler.

The assembler takes files of the following form (this is halt.s, which is included in the starter kit):

         
             lui $t0, 0x3333          #3c083333
             ori $t0, $t0, 0x4444     #35084444
             lui $t1, 0x3333          #3c093333
             ori $t1, $t1, 0x4444     #35294444
       self: beq $t0, $t1, self       #1109ffff

Commas are optional but the '$' is not. '#' starts a comment. The assembler can be invoked with the following command:

   $ python assembler.py input.s -o output.hex

The output file is input.hex if not explicitly set - that is, the same name as the input file but with a .hex extension. Use the -o option to change the output file name arbitrarily.

As an alternative to the assembler.py, you can also use MARS command line utilities to assemble your file. This will also allow you to create .hex files for your memory, although it won't assemble the new instructions we added to your processor. You can look at this link for specifics, but a sample script has been written in mars-assem.sh.

In addition, you are welcome to use your project 1 assembler and linker to create these .hex files! Try it out and marvel at having created 3/4 of the CALL process. Be wary of bugs in your project 1, though.



Submission

Put your alu.circ, regfile.circ, cpu.circ and mem.circ in an empty folder named proj2.2, then use the command below to compress it.

tar czvf proj2.2.tar proj2.2

Finally, submit the compressed file to Autolab.