Pipelining
Consider this sequence of MIPS instructions
ADD $s1, $s1, $s2
SUBI $s3, $s1, #1
BNEZ $s3, Offset
Timing
Using the standard 5-stage MIPS pipeline, show the pipeline timing for these 3 instructions. Draw arrows between any stages where forwarding occurs. Circle the stage when the branch outcome is known.
Instructions ↓ | Cycles → | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
ADD $s1, $s1, $s2 |
IF | ID | EX | DM | WB | ||||||
SUBI $s3, $s1, #1 |
|||||||||||
BNEZ $s3, Offset |
Pipeline changes
Draw the a dataflow diagram for this architecture, with the changes necessary to support a new "branch on not equal" that compares with an immediate value:
ADD $s1, $s1, $s2
BNEQI $s1, #1, Offset
Do not make any changes that would increase the clock cycle time. Remember that the register fetch happens in the second half of the clock cycle, so there is not time to do an arithmetic operation with a register in the same stage as the register fetch.
New timing
Show the pipeline timing using your new pipeline. Draw arrows between any stages where forwarding occurs. Circle the stage when the branch outcome is known
Instructions ↓ | Cycles → | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
ADD $s1, $s1, $s2 |
IF | ID | EX | DM | WB | ||||||
BNEQI $s1, #1, Offset |
Encoding
This new instruction requires two immediate values (the immediate used for comparison, and the branch offset). Assume we support this by introducing a new MIPS instruction encoding format as shown below. The opcode takes 6 bits, the register takes 5 bits, leaving 21 to split between the two immediates. How would you decide how many bits to allow for each?
R-Type | opcode(6) | Rs(5) | Rt(5) | Rd(5) | shift(5) | func(6) | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
I-Type | opcode(6) | Rs(5) | Rt(5) | immediate(16) | ||||||||||||||||||||||||||||
J-Type | opcode(6) | address(26) | ||||||||||||||||||||||||||||||
New | opcode(6) | Rs(5) | immediate(??) | offset(??) |
Branch Prediction
Tired of dealing with all of the issues predicting conditional jumps, you decide to make JR
, the unconditional jump to an address in a register, the only means for performing conditional flow instruction in your new MIPS-like ISA. You figure that JR
, in combination with a conditional move instruction MOVZ
, can do anything the other types of branches can. "MOVZ R1, R2, R3
" is equivalent to this C code, "if (R3 == 0) R1 = R2
", but as a pure ALU operation without branching or control hazards.
Code Conversion
Give an instruction sequence to accomplish the following, but replacing BNEZ
with some combination of MOVZ
, JR
and possibly other operations, assuming additional registers are available for your use:
BNEZ R1, Loop
NonLoop: ...
CPI using BNEZ
You measure the standard MIPS pipeline (diagram above) for an application typical of your expected users. From that you get the following statistics: 15% of total instructions are branches, 65% of all branches are taken. What is the expected CPI using BNEZ
and a predict not taken strategy?
MOVZ
/JR
compared to BNEZ
Accounting for the change in instruction count and the fact that branches are always taken on your new architecture, what is the expected speedup (or slowdown) of the architecture with MOVZ
/JR
as compared to the architecture with BNEZ
?
Discussion
Spoiler alert: the answer to the previous question gives a slowdown for using MOVZ
/JR
. Why?