CMSC 411 Computer Architecture Project

The goal of the semester project is to design and simulate a pipelined RISC CPU. Major components will be the pipelined ALU data path, the instruction decoder, hazard detection with its associated forwarding and stall logic, and the cache memory controller.

 The project is to be submitted in three incremental parts:
   submit cs411 part1 part1.e
   submit cs411 part2 part2.e
   submit cs411 part3 part3.e

 The files you submit are not the unmodified starter files but the
 starter files plus the additions you made to make them work.

 PART1: Handle lw, sw, add, sub, ai, shl, shr and nop with no hazards.
        (nop's will be inserted to prevent hazards.)
        See opcodes.txt for detailed instruction formats and definitions.
        You should use pipe2.e as a start for coding your circuit.
        You can do your own shift circuit or use the bshift.e component.
        Get add32.e if your own from HW4 is not working.

        Copy pipe2.e to part1.e, then work on the project in part1.e:
        ecomp add32.e bshift.e part1.e -o part1.net
        esim < part1.run > part1.out
        diff part1.out part1.chk        should be no or few differences
                                        some "RD" may be zero
                                        some ir_s2, ir_s3, ir_s4 may be zero
                                        no stalls, timing should be exact

        For grading reasons, keep the signal names *_s2, *_s3, *_s4 that
        are pipeline registers and the component/memory names
        inst_mem.mr, greg.mr, dmem.mr .


        Before you check the results in registers and memory:
        Did you compute your values of wr_reg and wr_mem?
        These should be computed in the appropriate stage.
        Did you compute alusrc, memtoreg, regdst, cin, left, and shft?
        Did you add  signal log <= #b1;  ?
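
        As a checklist aid, the control signals named above can be
        tabulated per instruction. The sketch below is in Python, not
        the project's .e language, and every value in the table is an
        assumption based on common textbook conventions (alusrc picks
        the immediate, regdst picks the destination field, cin=1 for
        subtract, shft and left pick the shifter and its direction).
        opcodes.txt and pipe2.e remain the authority.

```python
# Hypothetical per-instruction control values, following common textbook
# conventions. They are NOT taken from opcodes.txt or pipe2.e.
FIELDS = ("wr_reg", "wr_mem", "alusrc", "memtoreg",
          "regdst", "cin", "left", "shft")

CONTROL = {
    #       wr_reg wr_mem alusrc memtoreg regdst cin left shft
    "lw":  (1,     0,     1,     1,       0,     0,  0,   0),
    "sw":  (0,     1,     1,     0,       0,     0,  0,   0),
    "add": (1,     0,     0,     0,       1,     0,  0,   0),
    "sub": (1,     0,     0,     0,       1,     1,  0,   0),
    "ai":  (1,     0,     1,     0,       0,     0,  0,   0),
    "shl": (1,     0,     0,     0,       1,     0,  1,   1),
    "shr": (1,     0,     0,     0,       1,     0,  0,   1),
    "nop": (0,     0,     0,     0,       0,     0,  0,   0),
}

def control_for(op):
    """Return the control signals for one instruction as a dict."""
    return dict(zip(FIELDS, CONTROL[op]))

# Print which signals are asserted for each instruction.
for op in CONTROL:
    active = [name for name, value in control_for(op).items() if value]
    print(f"{op:4s} {' '.join(active) or '(none)'}")
```

        Whatever values your design uses, two properties are worth
        checking: only sw asserts wr_mem, and nop asserts nothing.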

        The resulting registers should be:
        Register 1 is 11111111 resulting from load word
        Register 2 is 44444444 resulting from add
        Register 3 is 22222222 resulting from subtract
        Register 4 is 04444444 resulting from right shift 4
        Register 5 is 11112500 from add immediate and then left shift 8

        Memory location 2 is 11111111 from store word
        no other memory changed!

        General registers at end of simulation 
        greg 0- 3= 00000000 11111111 44444444 22222222
        greg 4- 7= 04444444 11112500 00000000 00000000
        greg 8-11= 00000000 00000000 00000000 00000000
        greg12-15= 00000000 00000000 00000000 00000000
        Data Memory at end of simulation 
        dmem 0- 3= 00112233 11111111 11111111 33333333
        dmem 4- 7= 44444444 55555555 66666666 77777777
        dmem 8-11= 88888888 00000000 00000000 00000000
        dmem12-15= 00000000 00000000 00000000 00000000
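
        The register values above can be sanity checked with 32-bit
        arithmetic. This is a minimal Python sketch, not part of the
        project files. The operands are hypothetical, chosen only so
        that the results match the list above; the real operands are
        in the test program. The right shift is assumed to be logical.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def add32(a, b, cin=0):
    """32-bit add with carry-in; the carry out of bit 31 is discarded."""
    return (a + b + cin) & MASK

def sub32(a, b):
    """a - b computed as a + ~b + 1: an adder with b inverted and cin=1."""
    return add32(a, ~b & MASK, 1)

def shl32(a, n):
    """Left shift; bits shifted past bit 31 are lost."""
    return (a << n) & MASK

def shr32(a, n):
    """Logical right shift; zeros enter from the left (assumption)."""
    return (a & MASK) >> n

# Hypothetical operands, chosen only to reproduce the listed results.
r1 = 0x11111111
print(f"{add32(0x33333333, r1):08X}")       # 44444444
print(f"{sub32(0x33333333, r1):08X}")       # 22222222
print(f"{shr32(0x44444444, 4):08X}")        # 04444444
print(f"{shl32(add32(r1, 0x14), 8):08X}")   # 11112500
```

        Note how the left shift by 8 drops the top byte, which is why
        register 5 ends in 00 and no longer starts with 11111.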


        Check the results in part1.out to be sure the instructions
        worked. You can follow each instruction through the pipeline
        by following the instruction register, ir_s* and check the
        a, b, and c signals for correct values at each stage.
        It is possible that your part1.out does not agree with
        part1.chk, but you should be able to explain why.
        (Probably you have a timing problem.)
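
        Following an instruction through ir_s2, ir_s3 and ir_s4 can be
        pictured with the small Python model below. It is only a
        sketch: it assumes each pipeline register takes the value of
        its predecessor on every clock and that empty slots hold nop.
        The "fetch" column is a hypothetical name for the instruction
        entering the pipeline, not a signal from pipe2.e.

```python
def trace(program, cycles):
    """Return one row per clock: (clock, fetched, ir_s2, ir_s3, ir_s4)."""
    ir_s2 = ir_s3 = ir_s4 = "nop"
    rows = []
    for clk in range(cycles):
        fetched = program[clk] if clk < len(program) else "nop"
        rows.append((clk, fetched, ir_s2, ir_s3, ir_s4))
        # On the clock edge each register takes its predecessor's value.
        ir_s2, ir_s3, ir_s4 = fetched, ir_s2, ir_s3
    return rows

for row in trace(["lw", "nop", "nop", "add"], 7):
    print("clk=%d  fetch=%-4s ir_s2=%-4s ir_s3=%-4s ir_s4=%-4s" % row)
```

        With no stalls, an instruction fetched at clock n should show
        up in ir_s2 at n+1, ir_s3 at n+2 and ir_s4 at n+3. A mismatch
        with part1.chk in these columns usually points at timing.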

        You may want to copy part1.run to another file and add more
        'puts' statements to print out more internal signal names
        in order to help debug your circuit.

        Submit all components and your main circuit as one plain text
        file using submit. No makefiles or run files or output is to be
        submitted. Partial credit will be given based on number of
        instructions simulated correctly. The starter file pipe2.e
        only simulates lw.

 PART2: Handle hazards. Detect hazards, prevent wrong results by data
        forwarding where possible and then stall when necessary. Handle
        jump and beq instructions as well as all in part1.
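
        The "forward where possible, stall when necessary" rule can be
        sketched as two small decisions. The Python below is a sketch
        in the usual textbook style, not the required circuit. The
        stage labels EX/MEM and MEM/WB and all parameter names are
        assumptions; map them onto your own *_s2, *_s3, *_s4 signals.
        No special handling of register 0 is modeled.

```python
def forward_select(src, exmem_dst, exmem_wr, memwb_dst, memwb_wr):
    """Choose where an ALU operand in the EX stage should come from."""
    if exmem_wr and exmem_dst == src:
        return "EX/MEM"      # the most recent producer wins
    if memwb_wr and memwb_dst == src:
        return "MEM/WB"
    return "REGFILE"

def load_use_stall(ex_is_lw, ex_dst, id_src_a, id_src_b):
    """A lw result is not available until after the memory stage,
    so an instruction that needs it right away must wait one cycle."""
    return ex_is_lw and ex_dst in (id_src_a, id_src_b)

# add r2 <- ... is one stage ahead; the next instruction reads r2.
print(forward_select(2, exmem_dst=2, exmem_wr=True,
                     memwb_dst=0, memwb_wr=False))
# lw r1 is in EX while the instruction in ID reads r1.
print(load_use_stall(True, 1, 1, 3))
```

        Checking the nearer stage first matters: when two older
        instructions both write the same register, the younger of the
        two holds the value the program expects.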
        
        Note: jump and beq are followed by a delayed branch slot that
        contains an instruction that is always executed. jump cannot
        cause a stall. If beq does not get data forwarding, then it
        can stall, and stall, and stall. Add data forwarding for beq
        by adding two muxes in the ID STAGE that get inputs from later
        stages.
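
        One possible select policy for each of the two beq muxes is
        sketched below in Python. It is an illustration under stated
        assumptions, not the required design: the Stage tuple, the
        select names and the choice to stall when the producer is
        still in EX, or is a lw in MEM, are all hypothetical.

```python
from collections import namedtuple

# What the instruction currently in a later stage will do.
Stage = namedtuple("Stage", "dst writes_reg is_lw")
EMPTY = Stage(0, False, False)   # a nop or bubble

def beq_mux_select(src, ex, mem, wb):
    """Pick the source for one beq operand while beq is in ID."""
    if ex.writes_reg and ex.dst == src:
        return "STALL"           # the result has not been computed yet
    if mem.writes_reg and mem.dst == src:
        # An ALU result can be forwarded; load data has not arrived yet.
        return "STALL" if mem.is_lw else "FWD_MEM"
    if wb.writes_reg and wb.dst == src:
        return "FWD_WB"
    return "REGFILE"

sub_r3 = Stage(3, True, False)
lw_r1 = Stage(1, True, True)
print(beq_mux_select(3, EMPTY, sub_r3, EMPTY))   # FWD_MEM
print(beq_mux_select(1, EMPTY, lw_r1, EMPTY))    # STALL
print(beq_mux_select(1, EMPTY, EMPTY, lw_r1))    # FWD_WB
```

        beq stalls if either operand returns STALL. Without the two
        forwarding cases, every dependent beq would wait until the
        producer has written the register file.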

        Data forwarding paths must cover at least those in Fig 6.51, p499.
        Additional insight may be gained from a comparison of the
        pipeline stages with and without data forwarding.  See. 

        Implement your circuit assuming that software has correctly
        filled the delayed branch slot and implement the branch in
        the ID pipeline phase (e.g. Fig 6.51, Page 499) as modified for
        this class project.

        For grading reasons, keep the signal names *_s2, *_s3, *_s4 that
        are pipeline registers and the component/memory names
        inst_mem.mr, greg.mr, dmem.mr and pc for program counter.

        Run your circuit with part2.run, part2a.run and part2b.run
        to be sure it works!
        Download files part2.chk, part2a.chk and part2b.chk
        to check answers:
          ecomp add32.e bshift.e part2.e -o part2.net
          esim < part2.run > part2.out
          diff part2.out part2.chk
  
        Then repeat for part2a and part2b, which test branching (beq and jump).
        Submit all components and your main circuit as one plain text
        file using 'submit'. No makefiles or run files or output is to be
        submitted. Partial credit will be given based on number of
        data forwards, jump, beq, and hazard stalls handled correctly.

        Do implement data forwarding into stage 1 (ID) for the beq
        instruction.
        Your circuit will not be tested with jump or branch addresses greater
        than 15 bits, although this probably does not matter.

        You may not get exactly the .chk results. Memory and registers
        should agree. Your stalls might be different. Points will
        be deducted for memory or register differences or grossly long
        stalls. It may be an improvement if you stall less than the .chk, but
        be sure to analyze your results. (Applies to Part2 and Part3)
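
        Since stalls may legitimately differ, it can help to compare
        only the final registers and memory. The Python sketch below
        assumes the dump lines look like the Part 1 listing
        ("greg 0- 3= ..." and "dmem 0- 3= ..."); the function names
        are hypothetical and the .chk files for Part 2 and Part 3 may
        be laid out differently.

```python
import re

# Matches e.g. "greg 4- 7= 04444444 11112500 00000000 00000000".
DUMP_LINE = re.compile(r"^\s*(greg|dmem)\s*(\d+)-\s*(\d+)=\s*(.*)$")

def final_state(text):
    """Map ('greg', 5) -> 0x11112500 etc. from the dump lines in text.
    If a location is dumped more than once, the last dump wins."""
    state = {}
    for line in text.splitlines():
        m = DUMP_LINE.match(line)
        if not m:
            continue              # timing and signal lines are ignored
        kind, first, last = m.group(1), int(m.group(2)), int(m.group(3))
        words = m.group(4).split()
        for index, word in zip(range(first, last + 1), words):
            state[(kind, index)] = int(word, 16)
    return state

def state_differences(out_text, chk_text):
    """Return the sorted locations whose final values disagree."""
    out, chk = final_state(out_text), final_state(chk_text)
    return sorted(k for k in out.keys() | chk.keys()
                  if out.get(k) != chk.get(k))

chk = ("greg 0- 3= 00000000 11111111 44444444 22222222\n"
       "greg 4- 7= 04444444 11112500 00000000 00000000\n")
out = "some extra stall line\n" + chk.replace("11112500", "11111125")
print(state_differences(out, chk))   # [('greg', 5)]
```

        An empty list means registers and memory agree even if the
        full diff shows timing differences.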

      Corrections for a few internal signals are in these check files:
        part2.chknew
        part2a.chknew

 PART3: Put a cache in the instruction memory (read only) and a cache
        in the data memory (read/write)

        Put the caches inside the inst_mem and dmem components.
        Use the existing mr as the main memory. 
        Make a miss on the instruction cache cause a four cycle stall.
                           four 200ns cycles = 800ns
        Make a miss on the data cache cause an eight cycle stall.
                           eight 200ns cycles = 1600ns
                           (remember a memory read can have "after 1600ns")

        Fig 7.10, page 557 is a possible read only cache for inst_mem.
        (75% credit if everything works to this point.)
        You may submit this as part3a.e
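
        The behavior wanted from the instruction cache can be modeled
        in a few lines of Python before building it in the circuit.
        This is a sketch under assumptions: direct mapped, four lines,
        one word per line and word addressing are all hypothetical
        choices, and the organization in the textbook figure may
        differ. Only the four cycle miss stall comes from the text
        above.

```python
class ICache:
    """Direct-mapped, read-only cache in front of a main memory list."""
    MISS_STALL_CYCLES = 4        # four 200 ns cycles = 800 ns
    CYCLE_NS = 200

    def __init__(self, main_memory, lines=4):
        self.mem = main_memory
        self.lines = lines
        self.valid = [False] * lines
        self.tag = [0] * lines
        self.data = [0] * lines

    def read(self, addr):
        """Return (word, stall_cycles) for a word address."""
        index, tag = addr % self.lines, addr // self.lines
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index], 0               # hit
        # Miss: fill the line from main memory and report the stall.
        self.valid[index], self.tag[index] = True, tag
        self.data[index] = self.mem[addr]
        return self.data[index], self.MISS_STALL_CYCLES

memory = list(range(100, 116))
icache = ICache(memory)
stalls = [icache.read(a)[1] for a in (0, 1, 0, 4, 0)]
print(stalls, sum(stalls) * ICache.CYCLE_NS, "ns")
```

        Addresses 0 and 4 map to the same line here, so the last fetch
        of address 0 misses again after address 4 replaced it.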

        Do a write through cache for the data memory.
        (It must work to the point that results in main memory are
         correct at the end of the run, partial credit for
         partial functionality)
        You may submit this as part3b.e
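
        A write through data cache can be modeled the same way. The
        Python sketch below is an illustration, not the required
        design: the direct-mapped organization, the four line size and
        the choice not to allocate a line on a write miss are
        assumptions, and write timing is not modeled. The point it
        shows is that every write also goes to main memory, so main
        memory is correct at the end of the run.

```python
class DCache:
    """Direct-mapped write-through cache in front of a main memory list."""
    MISS_STALL_CYCLES = 8        # eight 200 ns cycles = 1600 ns

    def __init__(self, main_memory, lines=4):
        self.mem = main_memory
        self.lines = lines
        self.valid = [False] * lines
        self.tag = [0] * lines
        self.data = [0] * lines

    def _locate(self, addr):
        return addr % self.lines, addr // self.lines

    def read(self, addr):
        """Return (word, stall_cycles) for a word address."""
        index, tag = self._locate(addr)
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index], 0
        self.valid[index], self.tag[index] = True, tag
        self.data[index] = self.mem[addr]
        return self.data[index], self.MISS_STALL_CYCLES

    def write(self, addr, word):
        """Write through: main memory is always updated."""
        self.mem[addr] = word
        index, tag = self._locate(addr)
        if self.valid[index] and self.tag[index] == tag:
            self.data[index] = word   # keep the cached copy in step
        # On a write miss no line is allocated (assumed policy).

memory = [0] * 16
memory[1] = 0x11111111
dcache = DCache(memory)
word, stall = dcache.read(1)     # lw: miss, eight cycle stall
dcache.write(2, word)            # sw: goes straight to main memory
print(f"{memory[2]:08X}", stall)
```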

        For grading reasons, keep the signal names *_s2, *_s3, *_s4 that
        are pipeline registers and the component/memory names
        inst_mem.mr, greg.mr, dmem.mr, pc, cntr .

        Run your circuit with  part3.run  and
        check against  part3.chk
        to be sure it works!
        Test first with only instruction cache.
        (save this file as part3a.e, test with part3a.run and part3a.chk)
        Submit instruction cache only as part3a.e

        Test with both instruction and data cache.
        Submit this as part3b.e  (Also OK as just part3.e)
        (test with part3b.run and part3b.chk)

        Submit all components and your main circuit as one plain text
        file by using 'submit'. No makefiles or run files or output is to be
        submitted. Partial credit will be given based on number of
        instructions simulated correctly, number of hazards handled
        correctly and proper operation of Icache and Dcache.

        Expect  waiting= some-big-number  rather than 1,
        because of big delays on memory read or write signals.
Last updated 4/29/99