Description
Subset of the instructions:
Consider a simplified MIPS CPU that follows the pipeline design of
Tomasulo’s algorithm discussed in class and in the textbook, while
accepting only the following instructions:
Instruction Class | Instruction Mnemonic
Data Transfers | LW, L.D, SW, S.D
Arithmetic | DADD, DSUB, ADD.D, SUB.D, MUL.D, DIV.D
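The instruction subset above can be captured as a small lookup. This is only a sketch: the names `INSTRUCTION_CLASSES` and `instruction_class` are hypothetical, and treating mnemonics as case-insensitive is an assumption the assignment does not state.

```python
# Instruction classes and mnemonics exactly as listed in the table above.
INSTRUCTION_CLASSES = {
    "Data Transfers": {"LW", "L.D", "SW", "S.D"},
    "Arithmetic": {"DADD", "DSUB", "ADD.D", "SUB.D", "MUL.D", "DIV.D"},
}


def instruction_class(mnemonic):
    """Return the class name of a mnemonic, or raise ValueError if unsupported.

    Assumption: mnemonics are matched case-insensitively.
    """
    op = mnemonic.strip().upper()
    for name, members in INSTRUCTION_CLASSES.items():
        if op in members:
            return name
    raise ValueError(f"unsupported instruction: {mnemonic!r}")
```

Rejecting anything outside the subset early keeps the rest of the analyzer from having to handle unknown opcodes.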
Tomasulo-based pipeline analyzer.
1) The program should read the instructions from a trace file.
2) It should then take each instruction through all the stages of the
pipeline, with the proper number of cycles in each stage, according to
Tomasulo’s scheme.
3) The program does not need to consider the actual effect of the
instructions.
4) Assume the availability of 8 integer and 8 floating-point registers.
5) The program should ignore multiple white space characters and use
“,” as a separator between operands.
6) As a result, the program outputs some statistics related to the
execution of the trace, whose format will be given later.
7) The algorithm description on page 193 of the textbook (Computer
Architecture) is a good reference.
Configuration of the pipeline:
Function units:
# Integer operations: 2 # reservation stations per unit: 4
# Floating point add & sub: 3 # reservation stations per unit: 3
# Floating point multiplier: 1 # reservation stations per unit: 2
# Floating point division: 1 # reservation stations per unit: 2
# Data memory unit: 1 # reservation stations per unit: 3
# Common data bus (CDB): 1
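One possible way to hold the function-unit configuration in the analyzer is a small table of records. The sketch below is illustrative only: `UnitConfig`, `PIPELINE_CONFIG`, `CDB_COUNT`, `total_stations`, and the key names are all hypothetical, while the numbers are those listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitConfig:
    units: int              # number of function units of this kind
    stations_per_unit: int  # reservation stations attached to each unit


# Values copied from the configuration above; key names are made up here.
PIPELINE_CONFIG = {
    "integer": UnitConfig(units=2, stations_per_unit=4),
    "fp_add_sub": UnitConfig(units=3, stations_per_unit=3),
    "fp_mul": UnitConfig(units=1, stations_per_unit=2),
    "fp_div": UnitConfig(units=1, stations_per_unit=2),
    "data_memory": UnitConfig(units=1, stations_per_unit=3),
}
CDB_COUNT = 1  # a single common data bus


def total_stations(kind):
    """Total reservation stations for one unit kind (units x stations per unit)."""
    cfg = PIPELINE_CONFIG[kind]
    return cfg.units * cfg.stations_per_unit
```

For example, `total_stations("integer")` is 2 x 4 = 8 and `total_stations("fp_add_sub")` is 3 x 3 = 9.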
Latencies:
Instruction producing result | Instruction using result | Latency in clock cycles
Integer operations | ALU operation | 0
Integer operations | Store | 0
ADD.D, SUB.D | ALU operation | 2
ADD.D, SUB.D | Store | 1
MUL.D | ALU operation | 10
MUL.D | Store | 9
DIV.D | ALU operation | 40
DIV.D | Store | 39
Load | any | 1
Assume latencies for instructions not listed above are 0. Assume
there are no limitations on resources other than those listed above.
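The latency rules above amount to a lookup keyed by the producing and the consuming instruction. The following is a minimal sketch under stated assumptions: the names `PRODUCER_KIND`, `LATENCY_TABLE`, and `latency` are hypothetical; DADD and DSUB are taken to be the "Integer operations", LW and L.D the "Load" row, and every arithmetic instruction is counted as an "ALU operation" when it consumes a result. Any pair not in the table falls back to 0, as the text specifies.

```python
# Assumed mapping from mnemonic to the producer rows of the latency table.
PRODUCER_KIND = {
    "DADD": "integer", "DSUB": "integer",
    "ADD.D": "fp_add_sub", "SUB.D": "fp_add_sub",
    "MUL.D": "fp_mul", "DIV.D": "fp_div",
    "LW": "load", "L.D": "load",
}
ALU_OPS = {"DADD", "DSUB", "ADD.D", "SUB.D", "MUL.D", "DIV.D"}
STORES = {"SW", "S.D"}

# (producer kind, consumer kind) -> latency in clock cycles, from the table.
LATENCY_TABLE = {
    ("integer", "alu"): 0, ("integer", "store"): 0,
    ("fp_add_sub", "alu"): 2, ("fp_add_sub", "store"): 1,
    ("fp_mul", "alu"): 10, ("fp_mul", "store"): 9,
    ("fp_div", "alu"): 40, ("fp_div", "store"): 39,
}


def latency(producer_op, consumer_op):
    """Latency between a producing and a consuming instruction."""
    kind = PRODUCER_KIND.get(producer_op)
    if kind == "load":
        return 1  # "Load | any | 1"
    if consumer_op in STORES:
        consumer = "store"
    elif consumer_op in ALU_OPS:
        consumer = "alu"
    else:
        consumer = "other"  # not listed in the table
    return LATENCY_TABLE.get((kind, consumer), 0)  # unlisted pairs are 0
```

For instance, `latency("MUL.D", "DIV.D")` gives 10 and `latency("SUB.D", "S.D")` gives 1.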
Result
Example: Consider the following input assembly program:
L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F2, F6
DIV.D F10, F0, F6
ADD.D F6, F8, F2
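Trace lines like these can be parsed following the rules stated earlier (ignore repeated white space, use "," as the operand separator). This is a rough sketch rather than a required design: `Instruction` and `parse_line` are hypothetical names, and upper-casing the mnemonic and skipping blank lines are assumptions.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    op: str          # mnemonic, e.g. "MUL.D"
    operands: tuple  # operand strings, e.g. ("F0", "F2", "F4")


def parse_line(line):
    """Parse one trace line; return None for a blank line (an assumption)."""
    parts = line.strip().split(None, 1)  # mnemonic, then the operand text
    if not parts:
        return None
    op = parts[0].upper()
    rest = parts[1] if len(parts) > 1 else ""
    # Split on "," and drop any white space inside each operand.
    operands = tuple("".join(tok.split()) for tok in rest.split(",") if tok.strip())
    return Instruction(op, operands)


# The example program above, with irregular spacing added on purpose.
TRACE = """
L.D   F6,  34(R2)
L.D F2,45(R3)
MUL.D F0 , F2,   F4
"""
program = [ins for ins in map(parse_line, TRACE.splitlines()) if ins is not None]
```

Here `program[0]` is `Instruction(op="L.D", operands=("F6", "34(R2)"))`; the memory operand is left as a single string, and splitting it into offset and base register is left to the caller.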
The output of the analyzer should be:
The longest instruction waiting time at a reservation station: 11
The clock cycle for which the largest number of units simultaneously
read a value on CDB: 4
The total number of clock cycles for the trace to complete execution: 57
The analyzer should also output the pipeline cycles.