The goal of the semester project is to design and simulate a pipelined RISC CPU. Major components will be the pipelined ALU data path, the instruction decoder, hazard detection with its associated forwarding and stall logic, and the cache memory controller.
The project is to be submitted in three incremental parts:
submit cs411 part1 part1.e
submit cs411 part2 part2.e
submit cs411 part3 part3.e
The files you submit are the starter files with the additions you made
to make the circuit work, not the unmodified starter files.
PART1: Handle lw, sw, add, sub, ai, shl, shr and nop with no hazards.
(nop's will be inserted to prevent hazards.)
See opcodes.txt for detailed instruction formats and definitions.
You should use pipe2.e as a start for coding your circuit.
You can do your own shift circuit or use the bshift.e component.
Get add32.e if yours from HW4 is not working.
Copy pipe2.e to part1.e, then work on the project in part1.e
ecomp add32.e bshift.e part1.e -o part1.net
esim < part1.run > part1.out
diff part1.out part1.chk    should show no or few differences
some "RD" may be zero
some ir_s2, ir_s3, ir_s4 may be zero
no stalls, timing should be exact
For grading reasons, keep the signal names *_s2, *_s3, *_s4 that
are pipeline registers and the component/memory names
inst_mem.mr, greg.mr, dmem.mr .
Before you check the results in registers and memory:
Did you compute your values of wr_reg and wr_mem?
These should be computed in the appropriate stage.
Did you compute alusrc, memtoreg, regdst, cin, left, and shft?
Did you add signal log <= #b1;
The resulting registers should be:
Register 1 is 11111111 resulting from load word
Register 2 is 44444444 resulting from add
Register 3 is 22222222 resulting from subtract
Register 4 is 04444444 resulting from right shift 4
Register 5 is 11112500 from add immediate and then left shift 8
Memory location 2 is 11111111 from store word
no other memory changed!
General registers at end of simulation
greg 0- 3= 00000000 11111111 44444444 22222222
greg 4- 7= 04444444 11112500 00000000 00000000
greg 8-11= 00000000 00000000 00000000 00000000
greg12-15= 00000000 00000000 00000000 00000000
Data Memory at end of simulation
dmem 0- 3= 00112233 11111111 11111111 33333333
dmem 4- 7= 44444444 55555555 66666666 77777777
dmem 8-11= 88888888 00000000 00000000 00000000
dmem12-15= 00000000 00000000 00000000 00000000
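The expected tables above can also be checked mechanically. The following is a minimal Python sketch, not part of the course tools, that assumes part1.out contains greg and dmem dump lines in exactly the format shown above. The names parse_dump and differences are made up for this illustration.

```python
import re

# Expected final state, copied from the tables above.
EXPECTED = """\
greg 0- 3= 00000000 11111111 44444444 22222222
greg 4- 7= 04444444 11112500 00000000 00000000
greg 8-11= 00000000 00000000 00000000 00000000
greg12-15= 00000000 00000000 00000000 00000000
dmem 0- 3= 00112233 11111111 11111111 33333333
dmem 4- 7= 44444444 55555555 66666666 77777777
dmem 8-11= 88888888 00000000 00000000 00000000
dmem12-15= 00000000 00000000 00000000 00000000
"""

# Matches lines such as "greg 0- 3= ..." and "dmem12-15= ...".
DUMP_LINE = re.compile(r"^(greg|dmem)\s*(\d+)-\s*(\d+)=\s*(.*)$")


def parse_dump(text):
    """Return {(name, index): value} for every greg/dmem dump line in text."""
    state = {}
    for line in text.splitlines():
        m = DUMP_LINE.match(line.strip())
        if not m:
            continue  # skip every line that is not a register/memory dump
        name, first, last = m.group(1), int(m.group(2)), int(m.group(3))
        words = m.group(4).split()
        for offset, word in enumerate(words[: last - first + 1]):
            state[(name, first + offset)] = int(word, 16)
    return state


def differences(actual_text, expected_text=EXPECTED):
    """List (location, actual, expected) for every location that disagrees."""
    actual = parse_dump(actual_text)
    expected = parse_dump(expected_text)
    return sorted(
        (key, actual.get(key), want)
        for key, want in expected.items()
        if actual.get(key) != want
    )
```

A possible use is differences(open("part1.out").read()); an empty list means every register and memory word matches the tables. A location missing from the output is reported with None as its actual value.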
Check the results in part1.out to be sure the instructions
worked. You can follow each instruction through the pipeline
by following the instruction register, ir_s* and check the
a, b, and c signals for correct values at each stage.
It is possible that your part1.out does not agree with
part1.chk but you should
be able to explain why. (Probably you have a timing problem.)
You may want to copy part1.run to another file and add more
'puts' statements to print out more internal signal names
in order to help debug your circuit.
Submit all components and your main circuit as one plain text
file using submit. No makefiles or run files or output is to be
submitted. Partial credit will be given based on number of
instructions simulated correctly. The starter file pipe2.e
only simulates lw.
PART2: Handle hazards. Detect hazards, prevent wrong results by data
forwarding where possible and then stall when necessary. Handle
jump and beq instructions as well as all in part1.
Note: jump and beq are followed by a delayed branch slot that
contains an instruction that is always executed. jump can not
cause a stall. If beq does not get data forwarding, then it
can stall, and stall, and stall. Add data forwarding for beq
by adding two mux's in the ID STAGE that get inputs from later
stages.
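The forwarding and stall decisions described above can be prototyped in software before they are wired into the circuit. The sketch below follows the usual textbook rules (forward from the newest producer first, stall one cycle on a load followed by a dependent instruction). It is an illustration only: StageInfo, forward_select and load_use_stall are hypothetical names, and treating register 0 as never forwarded is an assumption, not something this handout states.

```python
from dataclasses import dataclass


@dataclass
class StageInfo:
    """Hypothetical summary of the control bits held in one pipeline register."""
    wr_reg: bool = False    # the instruction in this stage writes a register
    mem_read: bool = False  # the instruction in this stage is a load (lw)
    dest: int = 0           # destination register number


def forward_select(src, ex_mem, mem_wb):
    """Choose where an ALU operand comes from: 'EX_MEM', 'MEM_WB' or 'REG'."""
    # Assumption: register 0 is never a forwarding source.
    if src != 0 and ex_mem.wr_reg and ex_mem.dest == src:
        return "EX_MEM"  # newest result wins
    if src != 0 and mem_wb.wr_reg and mem_wb.dest == src:
        return "MEM_WB"
    return "REG"         # no hazard, use the register file value


def load_use_stall(id_rs, id_rt, id_ex):
    """True when the instruction in ID reads a register that a load in EX
    has not yet fetched from memory, so forwarding alone cannot help."""
    return id_ex.mem_read and id_ex.dest != 0 and id_ex.dest in (id_rs, id_rt)
```

The same selection idea applies to the two beq muxes in the ID stage, with the later stages as inputs. The check on id_rt is conservative for instructions that do not read a second register, which can only cost an extra stall, never a wrong result.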
Data forwarding paths must cover at least those in Fig 6.51, p499.
Additional insight may be gained from a comparison of the
pipeline stages with and without data forwarding. See.
Implement your circuit assuming that software has correctly
filled the delayed branch slot and implement the branch in
the ID pipeline phase (e.g. Fig 6.51, Page 499) as modified for
this class project.
For grading reasons, keep the signal names *_s2, *_s3, *_s4 that
are pipeline registers and the component/memory names
inst_mem.mr, greg.mr, dmem.mr and pc for program counter.
Run your circuit with part2.run and
part2a.run and part2b.run to be sure it works!
Download files part2.chk and
part2a.chk and
part2b.chk to check answers:
ecomp add32.e bshift.e part2.e -o part2.net
esim < part2.run > part2.out
diff part2.out part2.chk
Then repeat for part2a and b which test branching (beq and jump)
Submit all components and your main circuit as one plain text
file using 'submit'. No makefiles or run files or output is to be
submitted. Partial credit will be given based on number of
data forwards, jump, beq, and hazard stalls handled correctly.
Do implement data forwarding into stage 1 (ID) for the beq
instruction.
Your circuit will not be tested with jump or branch addresses wider
than 15 bits, although this probably does not matter.
You may not get exactly the .chk results. Memory and registers
should agree. Your stalls might be different. Points will
be deducted for memory or register differences or grossly long
stalls. It may be an improvement if you stall less than the .chk, but
be sure to analyze your results. (Applies to Part2 and Part3)
Corrections for a few internal signals are in these check files:
part2.chknew
part2a.chknew
PART3: Put a cache in the instruction memory (read only) and a cache
in the data memory (read/write)
Put the caches inside the inst_mem and dmem components.
Use the existing mr as the main memory.
Make a miss on the instruction cache cause a four cycle stall.
four 200ns cycles = 800ns
Make a miss on the data cache cause an eight cycle stall.
eight 200ns cycles = 1600ns
(remember a memory read can have "after 1600ns")
Fig 7.10, page 557 is a possible read only cache for inst_mem.
(75% credit if everything works to this point.)
You may submit this as part3a.e
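The instruction cache behavior and the stall arithmetic can be modeled in a few lines of Python before writing the circuit. This is a rough sketch under stated assumptions: the direct-mapped organization, the four-line size, the one-word blocks and the class name DirectMappedICache are choices made for illustration, not requirements of this project. Only the 200ns cycle and the four and eight cycle penalties come from the text above.

```python
CYCLE_NS = 200          # clock period stated in the handout
ICACHE_MISS_CYCLES = 4  # instruction cache miss penalty
DCACHE_MISS_CYCLES = 8  # data cache miss penalty


def miss_delay_ns(cycles, cycle_ns=CYCLE_NS):
    """Convert a stall length in cycles to nanoseconds."""
    return cycles * cycle_ns


class DirectMappedICache:
    """Read-only cache model. Line count and one-word blocks are assumptions."""

    def __init__(self, main_memory, lines=4):
        self.mem = main_memory        # stands in for the existing mr memory
        self.lines = lines
        self.valid = [False] * lines
        self.tag = [0] * lines
        self.data = [0] * lines

    def read(self, addr):
        """Return (word, stall_cycles) for a word address."""
        index, tag = addr % self.lines, addr // self.lines
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index], 0          # hit: no stall
        # Miss: fetch from main memory and replace the line.
        self.valid[index], self.tag[index] = True, tag
        self.data[index] = self.mem[addr]
        return self.data[index], ICACHE_MISS_CYCLES
```

With these constants, miss_delay_ns(4) gives 800 and miss_delay_ns(8) gives 1600, matching the delays listed above. Reading the same address twice in a row stalls only the first time.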
Do a write through cache for the data memory.
(It must work to the point that results in main memory are
correct at the end of the run, partial credit for
partial functionality)
You may submit this as part3b.e
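A write-through data cache can be modeled the same way. The sketch below is one possible policy, written in Python for illustration: every store updates main memory, a store that hits also updates the cached copy, and a store that misses does not allocate a line. The no-allocate choice, the four-line direct-mapped layout and the name WriteThroughDCache are assumptions. Stall timing for writes is not modeled here.

```python
class WriteThroughDCache:
    """Direct-mapped, write-through, no-write-allocate model (assumed policy)."""

    def __init__(self, main_memory, lines=4, miss_cycles=8):
        self.mem = main_memory          # stands in for the existing mr memory
        self.lines = lines
        self.miss_cycles = miss_cycles  # eight-cycle read miss penalty
        self.valid = [False] * lines
        self.tag = [0] * lines
        self.data = [0] * lines

    def _locate(self, addr):
        """Split a word address into (line index, tag)."""
        return addr % self.lines, addr // self.lines

    def read(self, addr):
        """Return (word, stall_cycles) for a word address."""
        index, tag = self._locate(addr)
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index], 0
        self.valid[index], self.tag[index] = True, tag
        self.data[index] = self.mem[addr]
        return self.data[index], self.miss_cycles

    def write(self, addr, value):
        """Write through: main memory is always updated."""
        self.mem[addr] = value
        index, tag = self._locate(addr)
        if self.valid[index] and self.tag[index] == tag:
            self.data[index] = value    # keep the cached copy consistent
```

Because main memory is written on every store, its contents are correct at the end of the run regardless of what is still held in the cache, which is the property the grading relies on.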
For grading reasons, keep the signal names *_s2, *_s3, *_s4 that
are pipeline registers and the component/memory names
inst_mem.mr, greg.mr, dmem.mr, pc, cntr .
Run your circuit with part3.run and
check against part3.chk
to be sure it works!
Test first with only instruction cache.
(save this file as part3a.e, test with part3a.run and part3a.chk)
Submit instruction cache only as part3a.e
Test with both instruction and data cache.
Submit this as part3b.e (Also OK as just part3.e)
(test with part3b.run and part3b.chk)
Submit all components and your main circuit as one plain text
file by using 'submit'. No makefiles or run files or output is to be
submitted. Partial credit will be given based on number of
instructions simulated correctly, number of hazards handled
correctly and proper operation of Icache and Dcache.
Expect waiting= some-big-number rather than 1,
because of big delays on memory read or write signals.
Last updated 4/29/99