Homework 3: Floating Point Arithmetic
This homework is due on Tuesday, April 7, at 11:59:59 PM
(Eastern daylight time). You must use submit to
turn in your homework like so:
submit cs411_jtang hw3 hw3.c hw3_asm.S hw3.circ
The grader will use the supplied Makefile to compile your
work. In addition, each submitted source code file must have a file
header comment, as described on the coding
conventions page. For the Logisim file, place your name and
assignment number in a text label on the main circuit.
You can only complete this assignment within an ARMv8-A development environment, and thus you must have completed the first homework before attempting this assignment. In addition, you must have a working 16-bit ALU from the first project.
Back in the old days, computers were built to perform only integer arithmetic, because the CPU lacked support for floating point calculations. For example, the original Intel systems deferred floating point to an optional coprocessor, the Intel 8087 chip. Users who did not have an Intel 8087 relied upon specialized software libraries to implement floating point operations. These became known as soft floating point systems.
In this assignment, you will build a portion of a soft floating point library. You will write code, in C, to parse the bitwise representation of floating point numbers. You will then implement, in assembly, unsigned multiplication. Using that multiplier, you will then manually calculate the product of two floating point values. Finally, you will add more logic components to your Logisim file to support the second project.
Part 1: Floating Point Parser
To begin, create a directory for your assignment and download the following files into that directory via wget:
- http://www.csee.umbc.edu/~jtang/cs411.s20/homework/hw3/hw3.c
- Skeleton code for this assignment.
- http://www.csee.umbc.edu/~jtang/cs411.s20/homework/hw3/hw3_asm.S
- Skeleton code for your assembly code.
- http://www.csee.umbc.edu/~jtang/cs411.s20/homework/hw3/Makefile
- Builds the code for this assignment, by simply running make. Also included is a clean target to remove all built objects. You do not need to modify this file, nor should you submit it with your work.
Now run make to compile everything. The program takes two integer parameters. The given skeleton code converts those parameters into two 16-bit values.
Your first job is to implement half_float_parse(). This function interprets its 16-bit incoming parameter val as an IEEE 754 half-precision floating point value. That is, given val, display the sign bit, exponent, and significand bits. If val is a special value, then display that special value.
The special values you need to handle are:
- Negative Zero
- Denormalized
- Infinity, both positive and negative
- Not a Number, both positive and negative
In your output, show the sign bit. Then show the decimal form of the exponent and its actual magnitude (as a decimal). Then display the bits associated with the significand, as a hexadecimal value. Finally, if the bits within val represent a special value, then state it as such. The function returns true if the value is normal, false otherwise.
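The field layout described above can be sketched in C as follows. This is only an illustration of the bit masks, not the required half_float_parse() implementation: every function and enum name here is hypothetical, positive zero is given its own category even though the assignment's list does not mention it, and both NaN signs are folded into one category (the sign is still available from half_sign()). The significand helper follows the sample output, where normal values include the implicit leading one.

```c
#include <stdint.h>

/* Hypothetical categories; not part of the provided skeleton. */
enum half_kind {
    HALF_NORMAL, HALF_ZERO, HALF_NEG_ZERO, HALF_DENORMAL,
    HALF_POS_INF, HALF_NEG_INF, HALF_NAN
};

/* Half precision layout: 1 sign bit, 5 exponent bits, 10 fraction bits. */
unsigned half_sign(uint16_t val)     { return (val >> 15) & 0x1u; }
unsigned half_exp_bits(uint16_t val) { return (val >> 10) & 0x1fu; }
unsigned half_fraction(uint16_t val) { return val & 0x3ffu; }

/* Exponent with the bias of 15 removed; exponent bits of 0 use -14. */
int half_exp_magnitude(uint16_t val)
{
    unsigned e = half_exp_bits(val);
    return (e == 0) ? -14 : (int)e - 15;
}

enum half_kind half_classify(uint16_t val)
{
    unsigned s = half_sign(val);
    unsigned e = half_exp_bits(val);
    unsigned f = half_fraction(val);

    if (e == 0) {
        if (f != 0)
            return HALF_DENORMAL;
        return s ? HALF_NEG_ZERO : HALF_ZERO;
    }
    if (e == 31) {
        if (f != 0)
            return HALF_NAN;
        return s ? HALF_NEG_INF : HALF_POS_INF;
    }
    return HALF_NORMAL;
}

/* Normal values carry an implicit leading one above the 10 stored bits. */
unsigned half_significand(uint16_t val)
{
    if (half_classify(val) == HALF_NORMAL)
        return half_fraction(val) | 0x400u;
    return half_fraction(val);
}
```

For the bit pattern 0x4110 from the sample output, these helpers give a sign of 0, exponent bits of 16, a magnitude of 1, and a significand of 0x510.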
As an aid to your parsing, the given C code displays val as if it were a float. Be careful with your shifting and bit masks. Be aware of what C's ">>" really does, and use correct variable types to hold intermediary values.
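The hint about ">>" presumably refers to the difference between shifting unsigned and signed values. A minimal sketch, with hypothetical function names: shifting an unsigned 16-bit value right always fills with zeros, while right-shifting a negative signed value is implementation-defined in C and on most compilers copies the sign bit.

```c
#include <stdint.h>

/* Unsigned operand: vacated bits are filled with zeros. */
unsigned top_bit_unsigned(uint16_t val)
{
    return (unsigned)(val >> 15);
}

/* Signed operand: for negative values the result is implementation-defined.
 * Most compilers perform an arithmetic shift and return -1 here, not 1. */
int top_bit_signed(int16_t val)
{
    return val >> 15;
}
```

With the bit pattern 0xfc00, top_bit_unsigned() returns 1. The same bits held in an int16_t would typically come back as -1, which is why the variable type holding an intermediate value matters.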
Part 2: Unsigned Integer Multiplication
The next task is to implement uint16_mult(). Read the function comments in hw3.c, and then modify hw3_asm.S. Implement a shift-add multiplication algorithm (or Booth's algorithm for extra credit), directly in ARMv8-A assembly. Store the upper 16 bits of the product at the memory address pointed to by register X2, and the lower 16 bits at the address pointed to by X3.
Note that you are performing 16-bit unsigned multiplication. Use bitfield move instructions (BFM, SBFM, and UBFM) to copy bits from one register to another. See sections 6.2.3 and 6.2.4 of the ARM Cortex-A Series Programmer's Guide for examples on using the bitwise operations.
Restriction 1: You may not use the built-in multiplication or division instructions for this assignment. That would be cheating, wouldn't it? You are limited to only adds, subtracts, shifts, rotates, bitmasks, bitwise logic, compares, and branches.
Restriction 2: You are limited to using only the first eight registers (X0 through X7). The rationale will be clearer when you see the next assignment.
Restriction 3: ARMv8-A registers are 64 bits wide, but for this assignment you are limited to accessing only the lower 16 bits of each register. Because the product will have 32 bits, you will have to split that value across two registers, and then shift bits between the two. More will become evident in the next assignment.
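To illustrate the algorithm only, here is a shift-add multiplier sketched in C rather than in the required ARMv8-A assembly. It keeps the running product in two 16-bit halves and shifts bits between them, in the spirit of Restriction 3. The function name and variable names are hypothetical, and this is one common formulation of shift-add, not necessarily the one presented in lecture.

```c
#include <stdint.h>

/* Shift-add multiplication using only 16-bit quantities.
 * hi:lo forms a 32-bit product register; lo starts out holding the
 * multiplier, whose bits are consumed from the right as product bits
 * shift in from the left. */
void shift_add_mult16(uint16_t a, uint16_t b,
                      uint16_t *upper, uint16_t *lower)
{
    uint16_t hi = 0;
    uint16_t lo = b;

    for (int i = 0; i < 16; i++) {
        uint16_t carry = 0;

        if (lo & 1u) {
            uint16_t sum = (uint16_t)(hi + a);
            /* If the 16-bit sum wrapped around, a carry left bit 15. */
            carry = (uint16_t)(sum < hi);
            hi = sum;
        }

        /* Shift carry:hi:lo right by one bit as a single unit. */
        lo = (uint16_t)((lo >> 1) | ((hi & 1u) << 15));
        hi = (uint16_t)((hi >> 1) | (carry << 15));
    }

    *upper = hi;
    *lower = lo;
}
```

Multiplying 16656 by 1057 this way yields an upper half of 0x010c and a lower half of 0xa310, matching the 0x010ca310 product in the sample output. No multiplication or division operator appears anywhere in the sketch.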
Part 3: Floating Point Multiplication
The final code to implement is half_float_mult(). This function takes two half floating point values, multiplies them, then returns the resulting product.
Part of the code is given to you. You are responsible for calculating the new sign bit and new exponent bits. You are also to calculate the product of the significands; do not neglect to consider the leading implicit one bit. You also need to normalize the product, adjusting exponent bits as necessary. The skeleton code will then reassemble the parts into a single half floating point.
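As a rough sketch of the sign and exponent steps, assuming both operands are normal values: the product's sign is the exclusive-or of the operand signs, and the biased exponents add with one copy of the bias (15 for half precision) subtracted. The function names are hypothetical, and the sketch ignores special values, overflow, underflow, and the later normalization adjustment.

```c
#include <stdint.h>

/* Sign of the product: 1 only when exactly one operand is negative. */
unsigned product_sign(uint16_t a, uint16_t b)
{
    return (unsigned)((a ^ b) >> 15) & 0x1u;
}

/* Biased exponent of the product before normalization.
 * Each operand's exponent bits already include the bias of 15,
 * so one copy of the bias is removed from the sum. */
int product_exp_bits(uint16_t a, uint16_t b)
{
    int ea = (a >> 10) & 0x1f;
    int eb = (b >> 10) & 0x1f;
    return ea + eb - 15;
}
```

For 0xabcd and 0x6543 from the sample output, the sign is 1 and the exponent bits before normalization are 10 + 25 - 15 = 20. The sample product 0xd521 has exponent bits of 21, because normalization adds one in that case.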
Restriction 4: As with uint16_mult(), you may not use the real multiplication operator anywhere within this function. Instead you must use your uint16_mult() implementation to calculate the significands' product.
Restriction 5: uint16_mult() returns the product as two values. You may only work with those halves as two separate uint16_t variables. You will need to shift bits between the two halves. Because the significands are each 11 bits (including the leading one), the significands' product takes 22 bits of storage. The lower 20 bits are to the right of the binary point, while the upper 2 bits are left of the binary point. Thus the bit representations are:
| variable | bits within variable | significands' product bits |
|---|---|---|
| product_upper | 15 to 6 | unused |
| product_upper | 5 to 0 | 21 to 16 |
| product_lower | 15 to 0 | 15 to 0 |
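Given that layout, product bit 21 sits at bit 5 of product_upper. A minimal sketch of the normalization step, assuming normal operands and using a hypothetical function name: when bit 21 is set, the product is 2 or greater, so the 32-bit pair is shifted right by one and the caller increments the exponent. Rounding is deliberately left out, and the sketch does not claim to match what the skeleton code expects.

```c
#include <stdint.h>

/* Returns 1 if the product was shifted (the caller should then add one
 * to the exponent), or 0 if it was already normalized. */
int normalize_product(uint16_t *upper, uint16_t *lower)
{
    /* Product bit 21 is bit 5 of the upper half. */
    if ((*upper & 0x0020u) == 0)
        return 0;

    /* Move the lowest bit of upper into the top bit of lower. */
    *lower = (uint16_t)((*lower >> 1) | ((*upper & 1u) << 15));
    *upper = (uint16_t)(*upper >> 1);
    return 1;
}
```

For the significands 0x7cd and 0x543 from the sample output, the product halves are 0x0029 and 0x0ba7. Bit 5 of the upper half is set, so the pair becomes 0x0014 and 0x85d3, and the exponent increases by one.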
In the next assignment, you will rewrite uint16_mult() and half_float_mult() in assembly. It is in your best interest to implement them cleanly, with minimal branching and local variables.
Part 4: Logic Components
In the first project you created a 16-bit ALU. Prepare your hw3.circ for the final project by removing the subcircuits Main, 2-bit Counter, Reel, and any extra credit portions, keeping only those needed for your 16-bit ALU. Then create a new Main subcircuit as the first circuit, and move Main to be the top-most circuit.
Create a new subcircuit, Register File. This register file must hold eight 16-bit registers. It has these seven inputs:
- ASel (3 bits)
- Selects a register to route out through the A Bus.
- BSel (3 bits)
- Selects a register to route out through the B Bus.
- WSel (3 bits)
- Selects a register to route out through the W Bus, and also which register to update.
- WEn (1 bit)
- If true and Clk is true, then update the register specified by WSel.
- WData (16 bits)
- Value to store into the register file, when WEn and Clk are true.
- 0 (1 bit)
- Reset line. If true, unconditionally reset all register values to zero.
- Clk (1 bit)
- If true and WEn is true, then update the register specified by WSel using the value at WData.
Caution: The register file is only to be updated when both WEn and Clk are true. Many previous students are tempted to connect Clk to both the register's enable and clock ports. This is incorrect. Clk only connects to a register's clock port.
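The write rule in the caution above can be expressed as a small behavioral model in C. This is only a sketch of the intended logic, not a Logisim design: the struct and function names are hypothetical, the reset input is called reset here, and the clock is modeled as a level for simplicity even though Logisim registers trigger on a clock edge.

```c
#include <stdbool.h>
#include <stdint.h>

/* Eight 16-bit registers. */
struct regfile {
    uint16_t reg[8];
};

/* Apply one evaluation of the control inputs.
 * Reset wins unconditionally; otherwise a write happens only when
 * both clk and wen are true. */
void regfile_update(struct regfile *rf, bool clk, bool wen, bool reset,
                    unsigned wsel, uint16_t wdata)
{
    if (reset) {
        for (int i = 0; i < 8; i++)
            rf->reg[i] = 0;
        return;
    }
    if (clk && wen)
        rf->reg[wsel & 7u] = wdata;
}

/* Read port shared by the A, B, and W buses: a 3-bit select. */
uint16_t regfile_read(const struct regfile *rf, unsigned sel)
{
    return rf->reg[sel & 7u];
}
```

In this model, a clock with WEn false leaves every register unchanged, which is the behavior lost if Clk is also wired to a register's enable port.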
Your system will have a 16-bit Program Counter register that is incremented by 1 after most clock cycles. Create another subcircuit, PC Control Unit, based upon the Instruction Fetch Unit from Lecture 12. This subcircuit has these inputs:
- CurPC (16 bits)
- Current Program Counter value.
- Imm (16 bits)
- Alternative program address.
- PCSel (1 bit)
- If false, select CurPC, incremented by 1. Otherwise, select Imm.
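The selection logic of the three inputs above amounts to a two-way multiplexer fed by an incrementer. A minimal C model, with a hypothetical function name, assuming the 16-bit counter wraps around on overflow:

```c
#include <stdbool.h>
#include <stdint.h>

/* Next program counter value: Imm when PCSel is true,
 * otherwise CurPC + 1 truncated to 16 bits. */
uint16_t pc_next(uint16_t cur_pc, uint16_t imm, bool pc_sel)
{
    return pc_sel ? imm : (uint16_t)(cur_pc + 1);
}
```

For example, with CurPC at 5 and Imm at 0x100, the result is 6 when PCSel is false and 0x100 when it is true.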
In your Main, add a Program Counter, Register File, 16-bit ALU, and PC Control Unit. Store the condition codes in a 4-bit register, with its own enable control. Add a splitter to that register's output, to easily monitor the individual condition code values. Note the use of tunnels to make the subcircuit more understandable.
For now, add a dummy constant input to PC Control Unit. Test that your PC increments by repeatedly poking the clock line and changing PCSel. Ensure the PC is updated on a falling edge.
The connections between your register file, ALU, PC control unit, and clock line will be modified further in the next assignment.
Part 5: Required Documentation
Add a comment block to the top of your hw3.c file that answers these questions:
- Assume that every instruction in your hw3_asm.S takes exactly one cycle to execute. What is the fewest number of cycles it will take to perform your 16-bit multiplication, from the time that execution enters your function up to (and including) the final ret instruction?
- What is the worst case (most number of cycles) possible for your uint16_mult() implementation?
Sample Output
Here is a sample output from running the program. The grader will use different values to test your submission.
$ ./hw3 0x4110 0x0421
For the bit pattern 0x4110 (half float value: 2.53125):
Sign bit: 0
Exponent bits: 16 (actual magnitude: 1)
Significand: 0x510
For the bit pattern 0x0421 (half float value: 6.30021e-05):
Sign bit: 0
Exponent bits: 1 (actual magnitude: -14)
Significand: 0x421
Part 2: multiplying 16656 and 1057:
Correct product: 0x010ca310 (uint32_t value: 17605392)
Part 2 product: 0x010ca310 (uint32_t value: 17605392)
Part 3: multiplying 2.53125 and 6.30021e-05:
Correct product: 0x093a (half float value: 0.000159502)
Part 3 product: 0x093a (half float value: 0.000159502)

$ ./hw3 0xabcd 0x6543
For the bit pattern 0xabcd (half float value: -0.0609436):
Sign bit: 1
Exponent bits: 10 (actual magnitude: -5)
Significand: 0x7cd
For the bit pattern 0x6543 (half float value: 1347):
Sign bit: 0
Exponent bits: 25 (actual magnitude: 10)
Significand: 0x543
Part 2: multiplying 43981 and 25923:
Correct product: 0x43f4d7a7 (uint32_t value: 1140119463)
Part 2 product: 0x43f4d7a7 (uint32_t value: 1140119463)
Part 3: multiplying -0.0609436 and 1347:
Correct product: 0xd521 (half float value: -82.0625)
Part 3 product: 0xd521 (half float value: -82.0625)

$ ./hw3 0xfc00 0x0300
For the bit pattern 0xfc00 (half float value: -inf):
Sign bit: 1
Exponent bits: 31 (actual magnitude: 16)
Significand: 0x000
* negative infinity
For the bit pattern 0x0300 (half float value: 4.57764e-05):
Sign bit: 0
Exponent bits: 0 (actual magnitude: -14)
Significand: 0x300
* denormalized
Part 2: multiplying 64512 and 768:
Correct product: 0x02f40000 (uint32_t value: 49545216)
Part 2 product: 0x02f40000 (uint32_t value: 49545216)
Other Hints and Notes
- Ask plenty of questions on the Blackboard discussion board.
- At the top of your submitted files, list any help you received as well as web pages you consulted. Please do not use any URL shorteners, such as goo.gl or TinyURL. Also, do not cite shared data services, such as Pastebin, Dropbox, or Google Drive.
- C99 introduced fixed-width integer types. This assignment intentionally uses them, to force the compiler to use certain register assignments.
- During lecture the Program Counter was always incremented by 4, but for this homework increment it instead by 1. This is due to a limitation of Logisim, and will be further explained in the next assignment.
Extra Credit
You may earn an additional 10% credit for this assignment by implementing a more difficult form of uint16_mult(). Instead of using a shift-add algorithm, implement Booth's algorithm in ARMv8-A assembly. As before, you may only use 16 bits from each of the registers, and you may only use registers X0 through X7. You only need to handle unsigned integer operands.
For Part 5, calculate the cycle count using your Booth's algorithm implementation.
If you choose to perform this extra credit, put a comment at the top of your hw3.c file, alerting the grader.