Project 2: ULNAv2-A

This project is due on Sunday, May 17, at 11:59:59 PM (Eastern Daylight Time). You must use submit to turn in your homework like so:

    submit cs411_jtang proj2 proj2.circ proj2.S

Each submitted file must contain your name and assignment number. For the Logisim file, place that information in a text label on the main circuit. For the assembly file, place that information in a comment at the top of the file.

This assignment builds upon the logic components and assembly code from Homework 4, and your C code from Homework 3. Thus, you must have completed those homework assignments before attempting this assignment.

In this assignment, you will finally build the ULNAv2-A processor in Logisim. To test your processor, you will translate a portion of your floating-point multiplier code from Homework 3 into ULNAv2-A.

Part 1: Data Memory

To begin, create a directory for your assignment and download the following files into that directory via wget:

http://www.csee.umbc.edu/~jtang/cs411.s20/homework/proj2/proj2.S: Partial translation of HW3's half_float_mult() into ULNAv2-A. You will need to complete the assembly code.
http://www.csee.umbc.edu/~jtang/cs411.s20/homework/proj2/uint16_mult.circ: Logisim circuit that implements the shift-add multiplier (version 3). You will need this circuit if you are attempting the extra credit.

If you have not already, download a copy of the ULNAv2-A specification from Homework 4. Also download and build the ULNAv2-A assembler and emulator; you will need them for this project. Furthermore, duplicate your hw4.circ, renaming that copy as proj2.circ.

To your Main circuit, add a Data Memory (a RAM device) component, configured with separate load and store ports. Set the Data Memory's address width to 16 bits, and data width also to 16 bits. Join its clock line, so that changes to data memory occur on falling edges. For now, connect the memory's address line to the ALU output, and input data line to WOut. Add tunnels from your Decoder's MemRead and MemWrite control signals to this memory. Add another tunnel DataOut that will eventually feed back into the register file.

Part 2: Implement ALU A, B, and W Buses

See the extra credit below before starting this part. Your implementations of this part and Part 4 will differ if you attempt the extra credit.

Next, break the connections between the register file and ALU. Set the ALU's A input to the tunnel ABus, and likewise set the B input to BBus.

You then need to add the logic to set the values going into ABus, BBus, and WBus. Create three more subcircuits, ABus Selector, BBus Selector, and WIn Selector. These subcircuits take control signals from the Decoder to determine which values to propagate.

For example, the BBus Selector has within it a mux to select one of (at least) four values: BOut, Imm5 (sign extended), Imm8 (sign extended), or Imm8 (zero extended). Depending upon your implementation, you may need additional inputs into the mux.
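
As a software reference for what the BBus Selector computes, here is a minimal C sketch of sign extension, zero extension, and the four-way selection. The function names and the enum encoding of the select signal are hypothetical; your Decoder's actual select encoding is your own design choice.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to 16 bits (1 <= bits <= 16). */
uint16_t sign_extend(uint16_t value, unsigned bits)
{
    uint16_t mask  = (uint16_t)((1u << bits) - 1u);  /* keeps only the field */
    uint16_t sign  = (uint16_t)(1u << (bits - 1));   /* the field's top bit  */
    uint16_t field = value & mask;
    /* XOR-then-subtract copies the field's sign bit into the upper bits. */
    return (uint16_t)((field ^ sign) - sign);
}

/* Zero-extend the low `bits` bits of `value` to 16 bits (1 <= bits <= 16). */
uint16_t zero_extend(uint16_t value, unsigned bits)
{
    return value & (uint16_t)((1u << bits) - 1u);
}

/* Hypothetical select encoding for the BBus mux (an assumption). */
enum bbus_src { BBUS_BOUT, BBUS_IMM5_SX, BBUS_IMM8_SX, BBUS_IMM8_ZX };

/* Software model of the mux: pick one of the four candidate values. */
uint16_t bbus_select(enum bbus_src sel, uint16_t bout,
                     uint16_t imm5, uint16_t imm8)
{
    switch (sel) {
    case BBUS_IMM5_SX: return sign_extend(imm5, 5);
    case BBUS_IMM8_SX: return sign_extend(imm8, 8);
    case BBUS_IMM8_ZX: return zero_extend(imm8, 8);
    case BBUS_BOUT:
    default:           return bout;
    }
}
```

In Logisim, the same behavior comes from Bit Extender components feeding a multiplexer; the C version is only useful for checking expected values by hand.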

Likewise, the subcircuit ABus Selector sets ABus using a mux. You will find that AOut is not the only possible value for the ALU's A input.

WIn Selector sets WIn using a mux. The inputs to this mux are at least ALUOut and DataOut. This mux is selected by MemToReg, and its output feeds into WIn.

Hint 1: Note that the ALU does not perform bit shifting (ash, lsh, or rot), but a shifted output is one of the possible values for WIn. The easiest approach is to construct a dedicated subcircuit, Logic Operations, that takes AOut and Imm5 and performs the different shift/rotate operations. You will then need to route this subcircuit's output back to the main circuit.
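
To check the shifter's outputs by hand, the following C sketch models the 16-bit primitives such a subcircuit is built from. The function names are hypothetical, and the sketch deliberately does not model how ash, lsh, and rot choose a direction or interpret Imm5; take that from the ULNAv2-A specification.

```c
#include <stdint.h>

/* Logical shift left: zeros enter from the right. */
uint16_t shift_left(uint16_t a, unsigned n)
{
    return n >= 16 ? 0 : (uint16_t)((uint32_t)a << n);
}

/* Logical shift right: zeros enter from the left. */
uint16_t shift_right_logical(uint16_t a, unsigned n)
{
    return n >= 16 ? 0 : (uint16_t)(a >> n);
}

/* Arithmetic shift right: copies of the sign bit enter from the left. */
uint16_t shift_right_arith(uint16_t a, unsigned n)
{
    uint32_t fill = (a & 0x8000u) ? 0xFFFFu : 0u;
    if (n >= 16)
        return (uint16_t)fill;
    if (n == 0)
        return a;
    return (uint16_t)((a >> n) | (fill << (16 - n)));
}

/* Rotate left: bits leaving the top re-enter at the bottom. */
uint16_t rotate_left(uint16_t a, unsigned n)
{
    n %= 16;
    if (n == 0)
        return a;
    return (uint16_t)(((uint32_t)a << n) | (a >> (16 - n)));
}
```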

Hint 2: Use your spreadsheet from Homework 4.

Part 3: Add Branch Control Logic

In Homework 4, you created the subcircuit PC Control Unit and added it to Main. Recall that PCSel was used to select between PC+1 and the constant 411. In this part, you will build the full PC Control Unit.

Study the ULNAv2-A RTN. Create a new signal BranchType, a multi-bit value that encodes the various branch types (unconditional, conditional branch if less than, conditional branch if not equal, etc.).
Create a new subcircuit, Branch Control Unit. This circuit determines if a branch should be taken or not. It takes 3 inputs:

Branch Requested (1 bit): 1 if a branch is requested, 0 for non-branching instructions.
Condition Codes (4 bits): Current set of condition codes, from the Condition Codes register.
Branch Type (multiple bits): Encoding of the branch type, from the previous step.

If the branch type is unconditional and if a branch is requested, Branch Control Unit outputs 1 regardless of the Condition Codes. If the branch type is conditional and if the Condition Codes match, Branch Control Unit also outputs 1. For all other cases, including if a branch is not requested, the subcircuit outputs 0.
Integrate your Branch Control Unit into the PC Control Unit, replacing the old PCSel control signal with the output from Branch Control Unit. Re-test your main circuit by manually poking the clock line. Each clock cycle should still increment the program counter by 1.
Remove the constant 411, replacing it with the various possible destinations for a branch instruction. For example, the unconditional branch instruction b jumps to CurPC + SignExtend(Imm11).
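
The steps above can be modeled in C as a rough reference. Everything about the encodings here is an assumption: the BranchType values, the condition-code bit positions, and the flag names (Z, N, V, C) are hypothetical, and the signed less-than test uses the conventional N != V rule, which may not match the ULNAv2-A RTN. Only the unconditional target, CurPC + SignExtend(Imm11), and the PC + 1 fall-through come from this handout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical BranchType encoding; pick your own in the Decoder. */
enum branch_type { BR_ALWAYS, BR_EQ, BR_NE, BR_LT };

/* Hypothetical bit positions within the 4-bit Condition Codes value. */
#define CC_Z 0x1u  /* zero     */
#define CC_N 0x2u  /* negative */
#define CC_V 0x4u  /* overflow */
#define CC_C 0x8u  /* carry    */

/* Model of the Branch Control Unit: 1 if the branch should be taken. */
bool branch_taken(bool requested, uint8_t cc, enum branch_type type)
{
    bool z = (cc & CC_Z) != 0;
    bool n = (cc & CC_N) != 0;
    bool v = (cc & CC_V) != 0;

    if (!requested)
        return false;            /* non-branching instruction */
    switch (type) {
    case BR_ALWAYS: return true; /* ignores the condition codes */
    case BR_EQ:     return z;
    case BR_NE:     return !z;
    case BR_LT:     return n != v;
    }
    return false;
}

/* Model of the PC update for the unconditional branch b. */
uint16_t next_pc(uint16_t cur_pc, bool taken, uint16_t imm11)
{
    uint16_t field  = imm11 & 0x07FFu;
    /* Sign-extend the 11-bit field to 16 bits. */
    uint16_t offset = (uint16_t)((field ^ 0x0400u) - 0x0400u);
    return taken ? (uint16_t)(cur_pc + offset) : (uint16_t)(cur_pc + 1u);
}
```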

Part 4: Finish ULNAv2-A Processor

This part is by far the most difficult part of the entire semester. You are to complete building the Instruction Decoder from the fourth homework and tie everything together. You will need to do the following:

Add control signals that set ALUASrc, ALUBSrc, and MemToReg.
Decode Imm5, Imm8, and Imm11. Route them to the B Bus and other places throughout.
Generate control signals for your Branch Control Unit.

You may need to do more, depending upon how you design the rest of your datapath.

Hint 3: Some control signals will be obvious. For example, MemRead is true when executing ldw or ldwi instructions. Those two instructions' numbers intentionally have a very similar bit pattern.

Hint 4: Build yourself Karnaugh maps for the more complex control signals. Use your spreadsheet to analyze all of the instructions.

Test your decoder thus far with adder.S and bosrt.S from Homework 4. Load the assembled image into Instruction Memory, then poke the clock several times. Check that all signals are generated correctly, and that the program goes into an infinite loop when it executes the halt instruction. Compare your circuit's state to what the emulator reports.

Part 5: ULNAv2-A Floating Point Multiplication

Now that you (hopefully) have a functional ULNAv2-A processor, it is time to write the final assembly code. Take your unsigned multiplication code from HW4's multu16.S and copy it into the indicated portion of proj2.S. Then write ULNAv2-A assembly that matches this C function:

    /**
     * Determine the sign when f1 is multiplied by f2. Both f1 and f2
     * are half-precision floating point values.
     *
     * @param[in] f1 First half-precision floating point operand
     * @param[in] f2 Second half-precision floating point operand
     * @return 0 if the product would be positive, 0x8000 if negative
     */
    extern uint16_t calc_sign(uint16_t f1, uint16_t f2);
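
For reference, one possible C definition of this function is sketched below. It assumes the IEEE 754 half-precision layout, where bit 15 is the sign bit; your own Homework 3 code may be written differently and still be correct.

```c
#include <stdint.h>

/* The product is negative exactly when the two sign bits differ,
 * so XOR the operands and keep only bit 15. */
uint16_t calc_sign(uint16_t f1, uint16_t f2)
{
    return (uint16_t)((f1 ^ f2) & 0x8000u);
}
```

The assembly version needs only the equivalent of one exclusive-or and one mask.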

Assemble the binary. Load your assembled image, then enable the clock. After the system halts, inspect your Data Memory. Compare its contents with the comments at the end of proj2.S. If you get the same results, you most likely will score very well for this project!

Finally, use the ULNAv2-A emulator to calculate how many cycles are needed to execute the program. Add a comment near the top of proj2.S with your observation.

Other Hints and Notes

Ask plenty of questions on the Blackboard discussion board.
At the top of your proj2.circ circuit, list any help you received as well as web pages you consulted. Please do not use any URL shorteners, such as goo.gl or TinyURL. Also, do not cite shared data services, such as Pastebin, Dropbox, or Google Drive.
You are free to construct your circuit differently than the images above. Specifically, the diagrams above intentionally omitted some control signals.

Extra Credit

You may earn an additional 10% credit for this assignment by integrating a hardware multiplication unit. Examine the Logisim file uint16_mult.circ. This circuit implements a shift-add multiplier (Version 3). Using this circuit as a basis (or build your own, if you feel so inclined), add these two new operations to your ULNAv2-A processor:

Instruction  Instruction Class  Instruction Number  RTN (both of these also do PC ← PC + 1)
mul          A                  6                   Product ← R[A] UMUL R[B]; R[W] ← Product[15:0]
movup        C                  17                  R[W] ← LogicalShiftRight(Product[31:16], 16)

The first instruction, mul, takes two registers and performs 16-bit unsigned multiplication. The lower 16 bits of the resulting product are written to a third register. The second instruction, movup, writes the upper 16 bits of the product from the most recent mul instruction to a register. (It moves the upper half of the product to a register.) If no mul has yet executed, movup writes 0 to the target register.
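
The intended behavior of the two instructions can be sketched as a small C model. The struct and function names are hypothetical; the only state is the 32-bit Product register, which must start at zero so that movup returns 0 before any mul has executed.

```c
#include <stdint.h>

/* Hypothetical model of the multiplication unit's architectural state. */
struct mul_unit {
    uint32_t product;  /* Product register; zero until the first mul */
};

/* mul: Product <- R[A] UMUL R[B]; returns Product[15:0] for R[W]. */
uint16_t exec_mul(struct mul_unit *u, uint16_t ra, uint16_t rb)
{
    u->product = (uint32_t)ra * (uint32_t)rb;
    return (uint16_t)(u->product & 0xFFFFu);
}

/* movup: returns the upper 16 bits of Product for R[W]. */
uint16_t exec_movup(const struct mul_unit *u)
{
    return (uint16_t)(u->product >> 16);
}
```

For example, multiplying 0xFFFF by 0xFFFF gives the 32-bit product 0xFFFE0001, so mul writes 0x0001 and a following movup writes 0xFFFE.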

You are to add a multiplication unit to your circuit, then update your datapath to handle these two new instructions. Note that the given uint16_mult.circ implements multiplication as a multi-cycle instruction. That means it takes several cycles to perform the multiplication; while it is busy the CPU should not attempt to execute any other instruction.
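
To see why the multiplication takes several cycles, consider this C sketch of a shift-add multiplier that does one add-and-shift per clock. It assumes "Version 3" means the common textbook refinement in which the multiplier shares the lower half of the product register; the provided circuit may be organized differently, and all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct shift_add_mul {
    uint16_t multiplicand;
    uint32_t product;     /* upper half: running sum; lower half: unused multiplier bits */
    unsigned steps_left;  /* remaining clock cycles */
};

/* Load the operands; the multiplier starts in the product's lower half. */
void mul_start(struct shift_add_mul *m, uint16_t a, uint16_t b)
{
    m->multiplicand = a;
    m->product = b;
    m->steps_left = 16;
}

/* While this is true, the CPU should not execute other instructions. */
bool mul_busy(const struct shift_add_mul *m)
{
    return m->steps_left > 0;
}

/* One clock cycle: conditionally add, then shift right by one. */
void mul_step(struct shift_add_mul *m)
{
    uint64_t wide = m->product;  /* 33 bits are needed to hold the carry */

    if (m->steps_left == 0)
        return;
    if (wide & 1u)               /* current multiplier bit is set */
        wide += (uint64_t)m->multiplicand << 16;
    m->product = (uint32_t)(wide >> 1);
    m->steps_left--;
}
```

In this model a multiplication always takes 16 steps, which is the window during which your processor must hold the program counter and suppress register and memory writes.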

Finally, update your proj2.S. Comment out your original software-based multiplication algorithm uint16_mult, replacing it with mul and movup. Test the updated assembly code in both the ULNAv2-A emulator and your Logisim circuit. Note the significant decrease in execution time. In a comment near the top of the file, list the CPU cycle counts for both the original software-only uint16_mult and the hardware-accelerated version.

If you choose to perform this extra credit, put a comment near the top of your proj2.S, alerting the grader.

CMSC 411: Computer Architecture