Homework 3: Floating Point Arithmetic

This homework is due on Tuesday, April 7, at 11:59:59 PM (Eastern Daylight Time). You must use submit to turn in your homework like so:

submit cs411_jtang hw3 hw3.c hw3_asm.S hw3.circ

The grader will use the supplied Makefile to compile your work. In addition, each submitted source code file must have a file header comment, as described on the coding conventions page. For the Logisim file, place your name and assignment number in a text label on the main circuit.

You can only complete this assignment within an ARMv8-A development environment, and thus you must have completed the first homework before attempting this assignment. In addition, you must have a working 16-bit ALU from the first project.

Back in the old days, computers were built to perform only integer arithmetic, because the CPU lacked support for floating point calculations. For example, the original Intel systems deferred floating point to an optional coprocessor, the Intel 8087 chip. Users who did not have an Intel 8087 relied upon specialized software libraries to implement floating point operations. These became known as soft floating point systems.

In this assignment, you will build a portion of a soft floating point library. You will write code, in C, to parse the bitwise representation of floating point numbers. You will then implement, in assembly, unsigned multiplication. Using that multiplier, you will then manually calculate the product of two floating point values. Finally, you will add more logic components to your Logisim file to support the second project.

Part 1: Floating Point Parser

To begin, create a directory for your assignment and download the following files into that directory via wget:

http://www.csee.umbc.edu/~jtang/cs411.s20/homework/hw3/hw3.c: Skeleton code for this assignment.
http://www.csee.umbc.edu/~jtang/cs411.s20/homework/hw3/hw3_asm.S: Skeleton code for your assembly code.
http://www.csee.umbc.edu/~jtang/cs411.s20/homework/hw3/Makefile: Builds the code for this assignment, by simply running make. Also included is a clean target to remove all built objects. You do not need to modify this file, nor should you submit it with your work.

Furthermore, copy your proj1.circ file from the first project, and rename that copy as hw3.circ. You will modify hw3.circ in Part 4.

Now run make to compile everything. The program takes two integer parameters. The given skeleton code converts those parameters into two 16-bit values.

Your first job is to implement half_float_parse(). This function interprets its 16-bit incoming parameter val as an IEEE 754 half floating point value. That is, given val, display the sign bit, exponent, and significand bits. If val is a special value, then display that special value.

The special values you need to handle are:

Negative Zero
Denormalized
Infinity, both positive and negative
Not a Number, both positive and negative

In your output, show the sign bit. Then show the decimal form of the exponent and its actual magnitude (as a decimal). Then display the bits associated with the significand, as a hexadecimal value. Finally, if the bits within val represent a special value, then state it as such. The function returns true if the value is normal, false otherwise.
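The field extraction described above can be sketched as follows. This is a minimal illustration rather than the skeleton's code: the struct half_fields and the helpers half_split(), half_is_normal(), and half_significand() are hypothetical names, and the sketch assumes the usual half float layout of 1 sign bit, 5 exponent bits (bias 15), and 10 fraction bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three raw fields of a half float. */
struct half_fields {
    uint16_t sign;      /* bit 15 */
    uint16_t exponent;  /* bits 14..10, still biased by 15 */
    uint16_t fraction;  /* bits 9..0, without the implicit leading one */
};

/* Split a 16-bit pattern into its raw fields using unsigned shifts and masks. */
struct half_fields half_split(uint16_t val)
{
    struct half_fields f;
    f.sign = (uint16_t)((val >> 15) & 0x1u);
    f.exponent = (uint16_t)((val >> 10) & 0x1Fu);
    f.fraction = (uint16_t)(val & 0x3FFu);
    return f;
}

/* True when the exponent field is neither all zeros nor all ones. */
bool half_is_normal(struct half_fields f)
{
    return f.exponent != 0 && f.exponent != 0x1F;
}

/* Significand with the implicit leading one added for normal values only. */
uint16_t half_significand(struct half_fields f)
{
    return half_is_normal(f) ? (uint16_t)(f.fraction | 0x400u) : f.fraction;
}
```

With these helpers, 0x4110 splits into sign 0, exponent 16, and significand 0x510, which agrees with the sample output. An exponent field of 0 or 31 marks the special cases, and the fraction bits then distinguish zero from denormalized and infinity from Not a Number.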

As an aid to your parsing, the given C code displays val as if it were a float. Be careful with your shifting and bit masks. Be aware of what C's ">>" really does, and use correct variable types to hold intermediary values.
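The warning about ">>" can be illustrated with a short sketch. Right-shifting a negative signed integer is implementation-defined in C, and most compilers copy the sign bit into the vacated positions, whereas shifting an unsigned value always fills with zeros. The helper names below are hypothetical and exist only for this comparison.

```c
#include <stdint.h>

/* Shift held in an unsigned type: vacated high bits are always zero. */
uint16_t top_bits_unsigned(uint16_t val)
{
    return (uint16_t)(val >> 10);
}

/* Same bits held in a signed type: the result for negative values is
 * implementation-defined, and an arithmetic shift smears the sign bit
 * across the upper bits of the result. */
int16_t top_bits_signed(uint16_t val)
{
    int16_t s = (int16_t)val;   /* 0xfc00 typically becomes a negative number */
    return (int16_t)(s >> 10);
}
```

Keeping val in a uint16_t and masking after the shift, as in (val >> 10) & 0x1F, isolates only the intended field regardless of the sign bit.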

Part 2: Unsigned Integer Multiplication

The next task is to implement uint16_mult(). Read the function comments in hw3.c, and then modify hw3_asm.S. Implement a shift-add multiplication algorithm (or Booth's algorithm for extra credit), directly in ARMv8-A assembly. Store the upper 16 bits of the product at the memory address pointed to by register X2, and the lower 16 bits at the address pointed to by X3.

Note that you are performing 16-bit unsigned multiplication. Use bitfield move instructions (BFM, SBFM, and UBFM) to copy bits from one register to another. See sections 6.2.3 and 6.2.4 of the ARM Cortex-A Series Programmer's Guide for examples on using the bitwise operations.

Restriction 1: You may not use the built-in multiplication or division instructions for this assignment. That would be cheating, wouldn't it? You are limited to only adds, subtracts, shifts, rotates, bitmasks, bitwise logic, compares, and branches.

Restriction 2: You are limited to using only the first eight registers (X0 through X7). The rationale will be clearer when you see the next assignment.

Restriction 3: ARMv8-A registers are 64 bits wide, but for this assignment you are limited to only accessing the lower 16 bits of each register. Because the product will have 32 bits, you will have to split that value across two registers, and then shift bits between the two. More will become evident in the next assignment.
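Before writing the assembly, it may help to model the shift-add algorithm in C using only 16-bit variables, mirroring Restriction 3. The following is a sketch under assumptions, not the required implementation: the function name shift_add_mult16() and its parameter order are hypothetical, and the real uint16_mult() must follow the comments in hw3.c.

```c
#include <stdint.h>

/* Shift-add multiplication of two 16-bit unsigned values, keeping the
 * 32-bit product in two 16-bit halves. No multiply operator is used. */
void shift_add_mult16(uint16_t a, uint16_t b,
                      uint16_t *upper, uint16_t *lower)
{
    uint16_t hi = 0;   /* accumulates the upper half of the product */
    uint16_t lo = b;   /* multiplier, consumed one bit per iteration */

    for (int i = 0; i < 16; i++) {
        uint16_t carry = 0;
        if (lo & 1u) {
            uint16_t sum = (uint16_t)(hi + a);
            carry = (uint16_t)(sum < hi);   /* wrapped past 16 bits */
            hi = sum;
        }
        /* Shift the carry:hi:lo chain right by one bit. */
        lo = (uint16_t)((lo >> 1) | (uint16_t)(hi << 15));
        hi = (uint16_t)((hi >> 1) | (uint16_t)(carry << 15));
    }
    *upper = hi;
    *lower = lo;
}
```

For the first sample run, multiplying 16656 by 1057 this way leaves 0x010c in the upper half and 0xa310 in the lower half, matching the product 0x010ca310. The carry out of the 16-bit add is the detail most easily lost when the halves live in separate registers.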

Part 3: Floating Point Multiplication

The final code to implement is half_float_mult(). This function takes two half floating point values, multiplies them, then returns the resulting product.

Part of the code is given to you. You are responsible for calculating the new sign bit and new exponent bits. You are also to calculate the product of the significands; do not neglect to consider the leading implicit one bit. You also need to normalize the product, adjusting exponent bits as necessary. The skeleton code will then reassemble the parts into a single half floating point.
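The sign and exponent calculations can be sketched as below. The helper names product_sign() and product_exponent() are hypothetical, the sketch assumes both operands are normal values with the half float bias of 15, and it does not check for exponent overflow or underflow.

```c
#include <stdint.h>

/* Sign of a product: negative exactly when the operand signs differ. */
uint16_t product_sign(uint16_t sign_a, uint16_t sign_b)
{
    return (uint16_t)((sign_a ^ sign_b) & 1u);
}

/* Biased exponent of the product, before normalization. Each input
 * carries a bias of 15, so one bias is subtracted from the sum. */
int16_t product_exponent(uint16_t exp_a, uint16_t exp_b)
{
    return (int16_t)(exp_a + exp_b - 15);
}
```

In the first sample run the exponent fields are 16 and 1, giving 16 + 1 - 15 = 2, which is the exponent field of the expected product 0x093a. In the second run the fields 10 and 25 give 20, and normalization of the significands' product raises it to the 21 found in 0xd521.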

Restriction 4: As with uint16_mult(), you may not use the real multiplication operator anywhere within this function. Instead you must use your uint16_mult() implementation to calculate the significands' product.

Restriction 5: uint16_mult() returns the product as two values. You may only work with those halves as two separate uint16_t variables. You will need to shift bits between the two halves. Because the significands are each 11 bits (including leading one), the significands' product takes 22 bits of storage. The lower 20 bits are to the right of the binary point, while the upper 2 bits are left of the binary point. Thus the bit representations are:

variable	bits within variable	significands' product bits
`product_upper`	15 to 6	unused
`product_upper`	5 to 0	21 to 16
`product_lower`	15 to 0	15 to 0
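Given that bit layout, normalization amounts to checking product bit 21 and, if it is set, shifting the pair of halves right by one. The sketch below uses hypothetical helper names, normalize_product() and top_significand(), and it truncates the low bits rather than rounding them.

```c
#include <stdint.h>

/* Normalize a 22-bit significand product held in two 16-bit halves.
 * If product bit 21 (bit 5 of *upper) is set, the value is 2.0 or more,
 * so shift the pair right by one and bump the exponent.
 * Returns the adjusted exponent. */
int16_t normalize_product(uint16_t *upper, uint16_t *lower, int16_t exponent)
{
    if (*upper & 0x0020u) {
        /* Bit 0 of the upper half moves into bit 15 of the lower half. */
        *lower = (uint16_t)((*lower >> 1) | (uint16_t)(*upper << 15));
        *upper = (uint16_t)(*upper >> 1);
        exponent = (int16_t)(exponent + 1);
    }
    return exponent;
}

/* Extract product bits 20..10 (leading one plus 10 fraction bits),
 * discarding the lower 10 bits without rounding. */
uint16_t top_significand(uint16_t upper, uint16_t lower)
{
    return (uint16_t)(((upper & 0x001Fu) << 6) | (lower >> 10));
}
```

For the second sample run, 0x7cd times 0x543 is 0x290ba7, so the halves start as 0x0029 and 0x0ba7; after normalization the top significand is 0x521, matching the expected product 0xd521. For the first run the truncated significand is 0x539 while the expected product 0x093a carries 0x53a, which suggests that rounding is applied somewhere; check the skeleton code to see where that happens.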

In the next assignment, you will rewrite uint16_mult() and half_float_mult() in assembly. It is in your best interest to implement them cleanly, with minimal branching and local variables.

Part 4: Logic Components

In the first project you created a 16-bit ALU. Prepare your hw3.circ for the final project by removing the subcircuits Main, 2-bit Counter, Reel, and any extra credit portions, keeping only those needed for your 16-bit ALU. Then create a new Main subcircuit as the first circuit, and move Main to be the top-most circuit.

Create a new subcircuit, Register File. This register file must hold eight 16-bit registers. It has these seven inputs:

ASel (3 bits): Selects a register to route out through the A Bus.
BSel (3 bits): Selects a register to route out through the B Bus.
WSel (3 bits): Selects a register to route out through the W Bus, and also which register to update.
WEn (1 bit): If true and Clk is true, then update the register specified by WSel.
WData (16 bits): Value to store into the register file, when WEn and Clk are true.
0 (1 bit): Reset line. If true, unconditionally reset all register values to zero.
Clk (1 bit): If true and WEn is true, then update the register specified by WSel using the value at WData.

The register file has three outputs: A Bus, B Bus, and W Bus. Each is 16 bits wide, and their values are selected by ASel, BSel, and WSel, respectively.

Caution: The register file is only to be updated when both WEn and Clk are true. Many previous students were tempted to connect Clk to both the register's enable and clock ports. This is incorrect. Clk only connects to a register's clock port.
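The intended behavior of the register file can be modeled in C as a rough sketch. This is only a behavioral illustration, not a description of the Logisim wiring: the struct regfile and its functions are hypothetical names, each call to regfile_clock() stands for one clock event, and the reset is modeled as a separate call because it acts unconditionally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical behavioral model of the eight-entry, 16-bit register file. */
struct regfile {
    uint16_t regs[8];
};

/* Reset line: clears every register regardless of WEn or Clk. */
void regfile_reset(struct regfile *rf)
{
    for (int i = 0; i < 8; i++)
        rf->regs[i] = 0;
}

/* One clock event: the register chosen by wsel is written only when the
 * write enable is true. The clock itself never acts as an enable. */
void regfile_clock(struct regfile *rf, bool wen, uint8_t wsel, uint16_t wdata)
{
    if (wen)
        rf->regs[wsel & 0x7u] = wdata;
}

/* Read port: the A, B, and W buses each select a register this way. */
uint16_t regfile_read(const struct regfile *rf, uint8_t sel)
{
    return rf->regs[sel & 0x7u];
}
```

In this model a clock event with WEn false leaves every register unchanged, which is the behavior lost if Clk is also wired to the enable port.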

Your system will have a 16-bit Program Counter register that is incremented by 1 after most clock cycles. Create another subcircuit, PC Control Unit, based upon the Instruction Fetch Unit from Lecture 12. This subcircuit has these inputs:

CurPC (16 bits): Current Program Counter value.
Imm (16 bits): Alternative program address.
PCSel (1 bit): If false, select CurPC, incremented by 1. Otherwise, select Imm.
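The selection performed by the PC Control Unit can be written as a one-line C model. The function name pc_next() is hypothetical, and the wrap-around at 0xffff is an assumption that follows from using a 16-bit adder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Next program counter value: CurPC + 1 when PCSel is false, Imm otherwise. */
uint16_t pc_next(uint16_t cur_pc, uint16_t imm, bool pc_sel)
{
    return pc_sel ? imm : (uint16_t)(cur_pc + 1);
}
```

In circuit terms this corresponds to an incrementer feeding one input of a two-way multiplexer, with Imm on the other input and PCSel as the select line.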

In your Main, add a Program Counter, Register File, 16-bit ALU, and PC Control Unit. Store the condition codes in a 4-bit register, with its own enable control. Add a splitter to that register's output, to easily monitor their values. Note the use of tunnels to make the subcircuit more understandable.

For now, add a dummy constant input to PC Control Unit. Test that your PC increments by repeatedly poking the clock line and changing PCSel. Ensure the PC is updated on a falling edge.

The connections between your register file, ALU, PC control unit, and clock line will be modified further in the next assignment.

Part 5: Required Documentation

Add a comment block to the top of hw3.c file that answers these questions:

Assume that every instruction in your hw3_asm.S takes exactly one cycle to execute. What is the fewest number of cycles it will take to perform your 16-bit multiplication, from the time that execution enters your function up to (and including) the final ret instruction?
What is the worst case (most number of cycles) possible for your uint16_mult() implementation?

Sample Output

Here is a sample output from running the program. The grader will use different values to test your submission.

$ ./hw3 0x4110 0x0421
For the bit pattern 0x4110 (half float value: 2.53125):
  Sign bit: 0
  Exponent bits: 16 (actual magnitude: 1)
  Significand: 0x510
For the bit pattern 0x0421 (half float value: 6.30021e-05):
  Sign bit: 0
  Exponent bits: 1 (actual magnitude: -14)
  Significand: 0x421
Part 2: multiplying 16656 and 1057:
  Correct product: 0x010ca310 (uint32_t value: 17605392)
   Part 2 product: 0x010ca310 (uint32_t value: 17605392)
Part 3: multiplying 2.53125 and 6.30021e-05:
  Correct product: 0x093a (half float value: 0.000159502)
   Part 3 product: 0x093a (half float value: 0.000159502)

$ ./hw3 0xabcd 0x6543
For the bit pattern 0xabcd (half float value: -0.0609436):
  Sign bit: 1
  Exponent bits: 10 (actual magnitude: -5)
  Significand: 0x7cd
For the bit pattern 0x6543 (half float value: 1347):
  Sign bit: 0
  Exponent bits: 25 (actual magnitude: 10)
  Significand: 0x543
Part 2: multiplying 43981 and 25923:
  Correct product: 0x43f4d7a7 (uint32_t value: 1140119463)
   Part 2 product: 0x43f4d7a7 (uint32_t value: 1140119463)
Part 3: multiplying -0.0609436 and 1347:
  Correct product: 0xd521 (half float value: -82.0625)
   Part 3 product: 0xd521 (half float value: -82.0625)

$ ./hw3 0xfc00 0x0300
For the bit pattern 0xfc00 (half float value: -inf):
  Sign bit: 1
  Exponent bits: 31 (actual magnitude: 16)
  Significand: 0x000
    * negative infinity
For the bit pattern 0x0300 (half float value: 4.57764e-05):
  Sign bit: 0
  Exponent bits: 0 (actual magnitude: -14)
  Significand: 0x300
    * denormalized
Part 2: multiplying 64512 and 768:
  Correct product: 0x02f40000 (uint32_t value: 49545216)
   Part 2 product: 0x02f40000 (uint32_t value: 49545216)

Other Hints and Notes

Ask plenty of questions on the Blackboard discussion board.
At the top of your submitted files, list any help you received as well as web pages you consulted. Please do not use any URL shorteners, such as goo.gl or TinyURL. Also, do not cite shared data services, such as Pastebin, Dropbox, or Google Drive.
C99 introduced fixed-width integer types. This assignment intentionally uses them, to force the compiler to use certain register assignments.
During lecture the Program Counter was always incremented by 4, but for this homework increment it instead by 1. This is due to a limitation of Logisim, and will be further explained in the next assignment.

Extra Credit

You may earn an additional 10% credit for this assignment by implementing a more difficult form of uint16_mult(). Instead of using a shift-add algorithm, implement Booth's algorithm in ARMv8-A assembly. As before, you may only use 16 bits from each of the registers, and you may only use registers X0 through X7. You only need to handle unsigned integer operands.

For Part 5, calculate the cycle count using your Booth's algorithm implementation.

If you choose to perform this extra credit, put a comment at the top of your hw3.c file, alerting the grader.

CMSC 411: Computer Architecture