Project 2.2: CPU

Computer Architecture I ShanghaiTech University
Project 2.1 Project 2.2
Overview | Deliverables | ISA | Logisim | Testing | Submission

Overview

In this project you will be using MIPS-Logisim to create a 32-bit two-cycle processor. It is similar to MIPS, except that memory addresses represent 32-bit words instead of 8-bit bytes (word-addressed instead of byte-addressed). Also, all addresses are 24-bits wide instead of 32-bits, due to limitations in Logisim. Throughout the implementation of this project, we'll be making design choices that make it compatible with machine code outputs from MARS and your Project 1!

Please read this document CAREFULLY as there are key differences between the processor we studied in class and the processor you will be designing for this project.

Pipelining

Your processor will have a 2-stage pipeline:

  1. Instruction Fetch: An instruction is fetched from the instruction memory.
  2. Execute: The instruction is decoded, executed, and committed (written back). This is a combination of the remaining stages of a normal MIPS pipeline.

You should note that data hazards do NOT pose a problem for this design, since all accesses to all sources of data happens only in a single pipeline stage. However, there are still control hazards to deal with. Our ISA does not expose branch delay slots to software. This means that the instruction immediately after a branch or jump is not necessarily executed if the branch is taken. This makes your task a bit more complex. By the time you have figured out that a branch or jump is in the execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need to "kill" instructions that are being fetched if the instruction under execution is a jump or a taken branch. Instruction kills for this project MUST be accomplished by MUXing a nop into the instruction stream and sending the nop into the Execute stage instead of using the fetched instruction. Notice that 0x0000 is a nop instruction; please use this, as it will simplify grading and testing. You should only kill if a branch is taken (do not kill otherwise), but do kill on every type of jump.

Because all of the control and execution is handled in the Execute stage, your processor should be more or less indistinguishable from a single-cycle implementation, barring the one-cycle startup latency and the branch/jump delays. However, we will be enforcing the two-pipeline design. If you are unsure about pipelining, it is perfectly fine (maybe even recommended) to first implement a single-cycle processor. This will allow you to first verify that your instruction decoding, control signals, arithmetic operations, and memory accesses are all working properly. From a single-cycle processor you can then split off the Instruction Fetch stage with a few additions and a few logical tweaks. Some things to consider:

You might also notice a bootstrapping problem here: during the first cycle, the instruction register sitting between the pipeline stages won't contain an instruction loaded from memory. How do we deal with this? It happens that Logisim automatically sets registers to zero on reset; the instruction register will then contain a nop. We will allow you to depend on this behavior of Logisim. Remember to go to Simulate --> Reset Simulation (Ctrl+R) to reset your processor.


Overview | Deliverables | ISA | Logisim | Testing | Submission

Deliverables

Approach this project like you would any coding assignment: construct it piece by piece and test each component early and often!

Tidyness and readability will be a large factor in grading your circuit if there are any issues, so please make it as neat as possible! If we can't comprehend your circuit, you will probably receive no partial credit.

Similarly to the first part, we will be distributing the project files through Github. You can look back on the step 0 for project 1 for more specific steps.

  1. In the repository add a remote repo that contains the framework files:
    git remote add framework http://shtech.org/course/ca/17s/projects/2.2/proj2.2.git
  2. Go and fetch the files:
    git fetch framework
  3. Now merge those files with your master branch:
    git merge framework/master
  4. The rest of the git commands work as usual.
  5. Copy regfile.circ and alu.circ from proj2.1 to your new gradebot repo!
  6. Also copy the MIPS-logisim.jar into the repo and into the "test-files" folder - but don't commit them to the repo!

2) Processor

We have provided a skeleton for your processor in cpu.circ along with a testing harness in run.circ. Your completed processor should implement the ISA detailed below in the section Instruction Set Architecture (ISA) using a two-cycle pipeline. Your processor will contain an instance of your ALU, Data Memory, and Register File. You are also responsible for constructing the entire datapath and control from scratch. It will interact with our harness through 5 inputs and 10 outputs.

One important thing to notice is that we have two different locations with registers now, both in the regfile you created and in run.circ. If an instruction uses $t0-$t3, then pipe these requests to your own regfile. However, if an instruction requires any of the other 28 registers, output those requests to run.circ. Do note that the numbering system for all 32 registers is the same as in MIPS. Also, the registers in run.circ have been initialized to random numbers. In addition, $0, $t0-$t3 registers in run.circ are not changeable - nothing happens if you write to those registers.

NOTE: You also need to have an LED unit which lights up to signify signed overflow. This indicator should be wired to the signed overflow port of your ALU. This should be viewable in your main circuit.

Your processor will get its program from the processor harness we have provided in run.circ. It will send the address of instruction memory it wants to access to the harness through an output, and accept the instruction at that address as an input. Inspect run.circ to see exactly what's going on. Your processor has 4 inputs that come from the harness:

Input Name Bit Width Description
RS_READ_VALUE 32 Driven with the value of the register specified in RS
RT_READ_VALUE 32 Driven with the value of the register specified in RT
INSTRUCTION 32 Driven with the instruction at the instruction memory address identified by the FETCH_ADDRESS (see below).
CLOCK 1 The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).

Your processor must provide 10 outputs to the harness:

Output Name Bit Width Description
$t0 32 Driven with the contents of $t0.
$t1 32 Driven with the contents of $t1.
$t2 32 Driven with the contents of $t2.
$t3 32 Driven with the contents of $t3.
RS 5 Determines which register's value is sent to RS_READ_VALUE (see above).
RT 5 Determines which register's value is sent to RT_READ_VALUE (see above).
Write Register 5 Determines which register to set to Write Data on the next rising edge of the clock, assuming that RegWrite is asserted.
RD_WRITE_VALUE 32 Determines what data to write to the register identified by the Write Register input on the next rising edge of the clock, assuming that RegWrite is asserted.
RD_WRITE_ENABLE 1 Determines whether data is written on the next rising edge of the clock.
FETCH_ADDRESS 24 This output is used to select which instruction is presented to the processor on the INSTRUCTION input.

Follow the same instructions as the register file and ALU regarding rearranging inputs and outputs of the processor. In particular, you should ensure that your processor is correctly loaded by a fresh copy of run.circ before you submit.

3) Data Memory

You will build your Data Memory on your own using a RAM module. Note that this is different than a ROM module. Logisim RAM modules can be found in the built-in Memory library/folder.

Although the input address is 32 bits, due to limitations in Logisim, you should only be using 24 bits to address your RAM module. Consider carefully which 24 bits you want to use, given that the input is byte addressed. We will be losing a few bits of information by doing this, but that's okay for the purposes of this assignment.

In addition, you can assume that the .data base address is 0x10010000, just like in MARS. Notice that this means the instructions you have will refer to memory starting at 0x10010000, which should translate to 0x000000 for your RAM modules.

For lw/sw, you can assume that the addresses are properly aligned, as per MIPS instructions. For lb/lbu, make sure you are only returning the specified byte, loaded into the least significant byte of Data Mem Out. You will need to deal with sign/zero-extending the output in your CPU.

NOTE: Your data memory should use little endian.

EXTRA FOR EXPERTS: Implement logic for sb in your data memory. Make sure you don't overwrite any other byte that's part of the same word.

We have provided a skeleton for your data memory in mem.circ along with a testing harness in mem-harness.circ. We will be testing your memory module separately for correctness. It has 5 inputs:

Input Name Bit Width Description
Data Mem Addr 32 Byte-addressed address to read from or write to.
Data Mem In 32 Determines what data to write into the address indentified by "Data Mem Addr" on the next rising edge of the clock, assuming that MEM_WRITE is asserted.
ACCESS_BYTE 1 1 iff we are loading or storing a byte.
MEM_WRITE 1 Determines whether data is written on the next rising edge of the clock.
CLOCK 1 Input for the clock. This can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).

The Data Memory also has 1 output:

Output Name Bit Width Description
Data Mem Out 32 Driven with the value at the address identified by "Data Mem Addr".

For those unfamiliar with the RAM module, the pictures above show a good way to wire up a circuit to use RAM. You are not required to implement Data Memory as shown above and you can use a memory with separate read and write ports if you should so desire.

Here are a few things to know about the RAM module before you get started:

The best way to learn how these work is simply to play with them. You can also refer to Logisim documentation on RAM modules here.

4) Test Code

Since you are building a processor, you can run actual programs on it!

There are more details about testing your processor and the provided assembler in the Testing section, but in particular you will be REQUIRED to write and submit the following program:

  1. Write a program that calls lfsr, beginning with a given input, until it reaches a value that is a palindrome. The seed for lfsr will be stored at register $a0. You must save the final palindrome value in register $v0. If you reach your seed value without finding a palindrome, save the seed value in register $v0. Your function must be labeled: "LfsrPalindrome:". Save the assembler source in a file called lfsrpalindrome.s. Remember, at the end of the function, you must jump back to the caller by calling jr $ra.

Write this function as you would a normal MIPS function, remembering to stay within your processor's ISA. You cannot assume anything about the values in the registers when your function is called. It is recommended that you write your own main functions to set up the registers, though your submitted code should contain ONLY the functions themselves.

We will test your code by appending it to our own main functions for different test cases, assembling using assembler.py, and then running it on both your submitted processor and a known working processor. This is why it is important that you include only your function and not other testing statements. Our main functions will all end in a halt (an instruction that jumps or branches to itself indefinitely - see Testing for details) in order to avoid executing your code an extra time. Feel free to set something similar up when testing your own code.


Overview | Deliverables | ISA | Logisim | Testing | Submission

Instruction Set Architecture (ISA)

You will be implementing a simple 32-bit two-cycle processor with 32 registers, but your regfile will only be responsible for four of them ($t0 - $t3). The numeric values for these registers are the same as the green sheet. It will have separate data and instruction memory. Just like MIPS, each of the four registers that you will be implementing is big enough to hold ONE word.

Your processor will be similiar to MIPS, except for memory addressing. Memory addresses will represent 32-bit words instead of 8-bit bytes. This means that the memory modules are word-addressed instead of byte-addressed. However, note that your instructions will be using byte-addressing, as it should be normal MIPS code. Make sure you keep track of which addresses are byte-addressed and which are word-addressed when thinking about MIPS instruction addressing, the instruction memory module, MIPS data addressing, and the data memory module!

IMPORTANT: Because of the limitations of Logisim, our memory addess will be 24 bits, unlike the normal 32 bit memory address in MIPS. Which bits would we need to truncate so that as many translations of MIPS code is supported as possible?

The instructions we will be looking at is below. Your processor will pull out a 32-bit value from instruction memory and determine the meaning of that instruction by looking at the opcode (the top 6 bits, which are bits 31-26). If the instruction is an R-type (i.e. opcode == 0), then you must also look at the funct field.

Notice how we do not use all the instructions in MIPS. Your project only has to work on these specified instructions, most of which you should have seen in project 1 along with a few extra ones, although we have taken out sb to simplify your memory file. This way the project is shorter and easier.

Instruction Format
Add add $rd, $rs, $rt
Add Unsigned addu $rd, $rs, $rt
Sub sub $rd, $rs, $rt
Sub Unsigned subu $rd, $rs, $rt
And and $rd, $rs, $rt
Or or $rd, $rs, $rt
Set Less Than slt $rd, $rs, $rt
Set Less Than Unsigned sltu $rd, $rs, $rt
Jump Register jr $rs
Shift Left Logical sll $rd, $rt, shamt
Shift Right Logical srl $rd, $rt, shamt
Shift Right Arithmetic sra $rd, $rt, shamt
Add Immediate Unsigned addiu $rt, $rs, immediate
And Immediate andi $rt, $rs, immediate
Or Immediate ori $rt, $rs, immediate
Load Upper Immediate lui $rt, immediate
Load Byte lb $rt, offset($rs)
Load Byte Unsigned lbu $rt, offset($rs)
Load Word lw $rt, offset($rs)
Store Word sw $rt, offset($rs)
Branch on Equal beq $rs, $rt, label
Branch on Not Equal bne $rs, $rt, label
Jump j label
Jump and Link jal label
Count 1s cnto $rd, $rs, $rt
Bit Palindrome bitpal $rd, $rs
LFSR lfsr $rd, $rs

Some specifics on selected instructions:

Jumping

Branching

Immediates


Overview | Deliverables | ISA | Logisim | Testing | Submission

Logisim Notes

While you may use Logisim 2.7.1 for developing your alu.circ, regfile.circ, mem.circ, and cpu.circ, do note that you have to open run.circ with the MIPS-logisim file.

If you are having trouble with Logisim, RESTART IT and RELOAD your circuit! Don't waste your time chasing a bug that is not your fault. However, if restarting doesn't solve the problem, it is more likely that the bug is a flaw in your project. Please post to Piazza about any crazy bugs that you find and we will investigate.

Things to Look Out For

Logisim's Combinational Analysis Feature

Logisim offers some functionality for automating circuit implementation given a truth table, or vice versa. Though not disallowed (enforcing such a requirement is impractical), use of this feature is discouraged. Remember that you will not be allowed to have a laptop running Logisim on the final.


Overview | Deliverables | ISA | Logisim | Testing | Submission

Testing

Once you've implemented your processor, you can test its correctness by writing programs to run on it! First, try this simple program as a sanity check: halt.s. This program loads the same immediate into two different registers using lui/ori and then branches back one instruction (offset = -1) if these registers are equal.

             Assembly:               Binary:
             ========                ======
             lui $t0, 0x3333         3c083333
             ori $t0, $t0, 0x4444    35084444
             lui $t1, 0x3333         3c093333
             ori $t1, $t1, 0x4444    35294444
       self: beq $t0, $t1, self      1109ffff

For practice, verify that the assembly on the left matches the translated binary on the right. This program effectively "halts" the processor by putting it into an infinite loop, so you can observe the outputs as well as memory and register state. Of course, you could do this "halt" with only the beq line, but it is very important that you test your lui/ori or the programs we will use during grading will not work.

To test your processor, open run.circ. Find the Instruction Memory RAM and right click --> Load Image... Select the assembled program (.hex file - see details on the Assembler below) to load it and then start clock ticks.

As described in the Deliverables, you are REQUIRED to write and submit the sample program to test your processor (lfsrpalindrome.s), but you should also write others to test all your instructions.

Remember: Debugging Sucks. Testing Rocks.

Assembler

We've provided a basic assembler to make writing your programs easier so you can use assembly instead of machine code. You should try writing a few by hand before using this, mainly because it's good practice and makes you feel cooler. This assembler.py supports all of the instructions for your processor.

The assembler is included in the start kit (one you pull from the repo with earlier instruction) or can be downloaded from the link above. The standard assembler is a work in progress, so please report bugs to Piazza!

The assembler takes files of the following form (this is halt.s, which is included in the start kit):

         #Comments are great!
             lui $t0, 0x3333          #3c083333
             ori $t0, $t0, 0x4444     #35084444
             lui $t1, 0x3333          #3c093333
             ori $t1, $t1, 0x4444     #35294444
       self: beq $t0, $t1, self       #1109ffff

Commas are optional but the '$' is not. '#' starts a comment. The assembler can be invoked with the following command:

   $ python assembler.py input.s [-o output.hex]

The output file is input.hex if not explicitly set - that is, the same name as the input file but with a .hex extension. Use the -o option to change the output file name arbitrarily.

As an alternative to the assembler.py, you can also use MARS command line utilities to assemble your file. This will also allow you to create .hex files for your memory, although it won't assemble the new instructions we added to your processor. You can look at this link for specifics, but a sample script has been written in mars-assem.sh.

In addition, you are welcome to use your project 1 assembler and linker to create these .hex file! Try it out and marvel at having created 3/4th of the CALL process. Although, be wary of bugs in your project 1.


Overview | Deliverables | ISA | Logisim | Testing | Submission

Submission

Use gradebot to commit your code. Gradebot will also be available for autograding soon!