The MIPS R4000 had an eight-stage pipeline as shown here in abbreviated form:
Pipeline Speedup
(10 points)
Pipelined execution without hazards is shown in this pipeline timing diagram:

    Instruction  Cycle ->  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
    0: add $1, $2, $3      IF IS RF EX DF DS TC WB
    1: add $4, $5, $6         IF IS RF EX DF DS TC WB
    2: add $7, $8, $9            IF IS RF EX DF DS TC WB
    3: add $10,$11,$12              IF IS RF EX DF DS TC WB
    4: add $13,$14,$15                 IF IS RF EX DF DS TC WB
    5: add $16,$17,$18                    IF IS RF EX DF DS TC WB
    6: add $19,$20,$21                       IF IS RF EX DF DS TC WB
    7: add $22,$23,$24                          IF IS RF EX DF DS TC WB
    8: add $25,$26,$27                             IF IS RF EX DF DS TC WB
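The diagram above follows one rule: with no hazards, instruction i enters IF in cycle i and advances one stage per cycle. If you would rather generate such diagrams than type them by hand, a minimal Python sketch is shown here. The function name `timing_diagram` and the column width are illustrative choices, not part of the assignment, and the sketch does not model stalls or flushes.

```python
# Stage names of the abbreviated R4000 pipeline used in this assignment.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

def timing_diagram(instructions, stages=STAGES, width=3):
    """Return a hazard-free timing diagram as a multi-line string.

    Instruction i starts in cycle i, so its row is shifted right by
    i columns. Stalls and flushes are NOT modeled (assumption).
    """
    n_cycles = len(instructions) + len(stages) - 1
    label_w = max([len("Cycle")] + [len(s) for s in instructions]) + 2
    header = "Cycle".ljust(label_w) + "".join(
        str(c).ljust(width) for c in range(n_cycles)
    )
    lines = [header.rstrip()]
    for i, text in enumerate(instructions):
        row = text.ljust(label_w) + " " * (width * i)
        row += "".join(s.ljust(width) for s in stages)
        lines.append(row.rstrip())
    return "\n".join(lines)

print(timing_diagram([
    "0: add $1, $2, $3",
    "1: add $4, $5, $6",
    "2: add $7, $8, $9",
]))
```

For the control hazard diagrams later in this assignment you would still need to insert the flushed or stalled cycles by hand.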
- What is the ideal pipeline speedup for this processor?
- What are three assumptions of the ideal pipeline speedup?
Data Hazards
(45 points)
For each data hazard that can happen between R-type instructions with this MIPS R4000 organization:
- Show a sequence of MIPS instructions that has that hazard and no others.
- Explain what forwarding would be necessary to avoid it (e.g. in cycle 2, forward from the ALU result pipeline register between EX and DF to the ALU input in EX).
- Show the necessary forwarding path(s) for just this hazard on the abbreviated datapath.
Finally, draw a version of the abbreviated datapath with all of the R-type instruction forwarding paths.
Control Hazards
(45 points)
Consider this sequence of instructions, implementing y = |x|+1:
    if (x<=0) x = -x;
    y = x + 1;
If x is in register $1 and y in register $2, this could correspond to the following assembly code:
            bgtz $1, endif
            sub  $1, $0, $1
    endif:  addi $2, $1, #1
Assume the branch target can only be resolved at the end of the EX stage, and the processor always predicts not taken.
- Show a pipeline timing diagram for the MIPS R4000 when $1 is 5.
- Show a second pipeline timing diagram for when $1 is -5.
- What is the branch penalty?
- If 20% of the instructions in a program are branches, and 40% of the branches are taken, what is the expected average CPI?
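As a hint for the last question, average CPI with a predict-not-taken scheme is commonly modeled as the base CPI plus the stall cycles contributed by taken branches. The Python sketch below shows that model under stated assumptions: the base CPI is 1, branches are the only source of stalls, and only taken branches pay the penalty. The function name `average_cpi` and the example numbers are hypothetical and deliberately differ from the values in this problem.

```python
def average_cpi(base_cpi, branch_fraction, taken_fraction, penalty_cycles):
    """Average CPI when only taken branches stall the pipeline.

    Assumes predict-not-taken and no other hazards (assumption,
    not stated by the assignment beyond the prediction scheme).
    """
    stall_cycles_per_instruction = (
        branch_fraction * taken_fraction * penalty_cycles
    )
    return base_cpi + stall_cycles_per_instruction

# Hypothetical values, NOT the ones from this problem:
# 10% branches, 50% of them taken, 2-cycle penalty.
print(average_cpi(1.0, 0.10, 0.50, 2))
```

You still need to determine the branch penalty for this pipeline from your timing diagrams before applying the formula.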
Extra credit
(25 points)
iAPX 86,88 (Intel Advanced Processor Architecture 8086/8088) was the predecessor to the modern x86 Intel CPU architectures. It is a two-operand machine (op dest, src), supporting source/destination operand combinations of register/register, register/memory, memory/register, immediate/register, and immediate/memory. Consider the following code segment and instruction set reference table. Assume the initial value for ARRAY[100] is 128 and for ARRAY[200] is 2048.

            MOV AX, ARRAY[100]
            ADD AX, 128
            MOV CX, 4
            MUL CX
            MOV ARRAY[100], AX
    AGAIN:  MOV AX, ARRAY[200]
            SUB AX, 256
            MOV ARRAY[200], AX
            MOV CX, AX
            MOV AX, ARRAY[100]
            SUB CX, AX
            JCXZ AGAIN
Instruction | Operands | Clock Cycles |
---|---|---|
MOV dest, src | reg, reg | 2 |
MOV dest, src | reg, imm | 4 |
MOV dest, src | reg, mem | 12 |
MOV dest, src | mem, reg | 13 |
ADD dest, src; SUB dest, src | reg, reg | 3 |
ADD dest, src; SUB dest, src | reg, imm | 4 |
ADD dest, src; SUB dest, src | reg, mem | 13 |
ADD dest, src; SUB dest, src | mem, reg | 24 |
ADD dest, src; SUB dest, src | mem, imm | 25 |
MUL src (AX is dest) | reg | 118 |
JCXZ (jump if CX==0) | label | 18 |
Compute the CPI and expected execution time of this code segment on a 5 MHz 8086.
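If you want to check your arithmetic, CPI is the total cycle count divided by the number of instructions executed, and execution time is the total cycle count divided by the clock frequency. The Python sketch below applies those two definitions to a list of per-instruction cycle counts. The function name `cpi_and_time` and the three-instruction trace are hypothetical; you must build the real list from the instructions that actually execute in the code segment above.

```python
def cpi_and_time(cycle_counts, clock_hz):
    """Return (CPI, execution time in seconds) for an executed trace.

    cycle_counts holds one entry per executed instruction, so an
    instruction inside a loop appears once per iteration.
    """
    total_cycles = sum(cycle_counts)
    cpi = total_cycles / len(cycle_counts)
    seconds = total_cycles / clock_hz
    return cpi, seconds

# Hypothetical trace, NOT the assignment's code:
# MOV reg,reg (2 cycles), ADD reg,imm (4 cycles), MOV reg,mem (12 cycles)
cpi, seconds = cpi_and_time([2, 4, 12], 5_000_000)
print(f"CPI = {cpi:.2f}, time = {seconds * 1e6:.2f} us")
```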
Submitting
Follow the class git instructions to submit. You can do your work electronically (e.g. use LaTeX for written content and equations and something like Inkscape or Adobe Illustrator for any drawings), on paper that you scan or photograph, or on a tablet. Submit your work in the hw4 directory, and commit, tag, and push your final submission before the deadline. Be sure to edit the hw4/readme.txt file to tell us which files contain the answers to which problems (especially if there is more than one file), and what tools you used.