CMSC 411 Homework 1-6 Fall 2006

CS411 Details of homework assignments HW1..HW6 and Midterm

Fall 2006

Homework details for HW7..HW12 are on a separate page.

    The most important item on all homework is YOUR NAME!
    Print. No readable name, no credit.
    Staple or clip pages together.

    Homework must be submitted when due.  You lose 10%, one grade,
    the first day homework is late. Then 10% each week thereafter.
    Max 50% off. A zero really hurts your average!
    Paper or EMail to squire@umbc.edu  ONLY PLAIN TEXT.
    I can NOT accept OCTET/STREAM. .doc .gif .jpg .rtf ...
    If I can not read or understand your homework, you do
    not get credit.  Type or print if your handwriting is bad. 
    Homework is always due on a scheduled class day within 15 minutes
    after the start of the class.  If class is canceled then homework
    is due the next time the class meets. EMailed homework has until
    midnight on the day of the last section's due date.
    No "matching". No comparing. Do your own homework.

  EMail only plain text! No word processor formats.
       You may use a word processor or other software tools and
       print the results and turn in paper.
       Put CS411 and HW number in subject line on EMail.
       Put your name inside any EMail attachments.

Email HW1,2,3,5,7,8,9,10,11,12 BUT use "submit" for HW4, HW6 and part1-part3


 The "submit" facility only works on the "gl" machines.
 The student commands are:
    submit   cs411 HW4 file   puts your "file" into cs411 HW4
    submitrm cs411 HW4 file   removes your "file" from cs411 HW4
    submitls cs411 HW4        lists your files in cs411 HW4

 Note: For this semester the 'HW4' can be HW4, HW6, part1, part2 or part3.
       a) you must have your userid registered for "submit"
          send mail from a gl machine to squire if your submit fails
       b) you have to be logged onto a gl machine, putty or ssh are OK
       c) everything is case sensitive, sorry about the uppercase HW.

Do your own homework!

You can discuss homework with other class members but DO NOT COPY!

All parties involved in copying get zero on that assignment.

HW1 Terminology 25 points

     The answer is just two columns. The first column is the numbers
     1 through 26, the second column is the answer letter.
     Use each letter from the set {a-z} only once. Find the best fit.
     Match the letter list with the number list.
     a  abstraction        n  DRAM
     b  assembler          o  implementation
     c  msb                p  instruction
     d  bit                q  instruction set architecture
     e  CPU                r  integrated circuit
     f  cache              s  operating system
     g  ALU                t  Memory
     h  compiler           u  processor
     i  computer family    v  semiconductor
     j  control            w  supercomputer
     k  datapath           x  transistor
     l  die                y  VLSI
     m  defect             z  yield

     1  central processing unit
     2  very large scale integration
     3  another name for a computer
     4  flaw (as in a wafer)
     5  an amplifier (solid state device)
     6  a wafer is cut up into many
     7  a place for instructions and data
     8  percent of chips that are good
     9  dynamic random access memory
    10  a device made by putting impurities into silicon
    11  high level description of the important information
    12  converts a source line to a machine instruction
    13  most significant bit, often the sign bit
    14  memory on the CPU holding recent instructions and data
    15  binary digit
    16  ISA
    17  IC
    18  the basic unit of a computer program
    19  large group of processors used as one computer
    20  arithmetic and logic unit
    21  OS
    22  circuits that direct the flow in datapaths
    23  converts statements to computer instructions
    24  upward compatible computers
    25  the paths where data flows
    26  the result of building a design

HW2 Evaluating Benchmarks 25 points

       You do not have to copy the questions, but show the
       computation and clearly indicate the answers.
       Be sure to label the answers with the part number.

 You are the lead designer of a new processor. The processor design
 and compiler are complete, and now you must decide whether to
 produce the current design as it stands or spend additional time
 to improve it. You discuss this problem with your engineering
 team and arrive at the following options:

 a) Leave the design as it stands. Call this the base machine, MBASE.
    It has a clock rate of 1.5GHz and the following measurements have
    been made using a simulator:

   instruction class  CPI  Frequency of use
             A         2     30%
             B         4     30%
             C         3     30% 
             D         5     10%

 b) Optimize the hardware. The hardware team claims they can improve
    the processor design to give a clock rate of 2.2GHz. Call this
    machine MOPT. The compiler team has made changes for this machine. 
    The following measurements have been made using a simulator:

   instruction class  CPI  Frequency of use
            A          2     30%
            B          3     40%
            C          4     20%
            D          5     10%

Part1: What is the CPI of MBASE?
                   CPI of MOPT? 

Part2: What is the MIPS rating of MBASE?
                   MIPS rating of MOPT?

Part3: How much faster is MOPT compared to MBASE?
       "How much faster" is a dimensionless ratio   faster/slower,
       this is called "speed up" always greater than 1.0,
       else "slow down" if less than 1.0
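
A minimal sketch of the arithmetic behind Part1 through Part3, assuming the
usual textbook definitions: average CPI is the frequency-weighted sum of the
class CPIs, MIPS is clock rate divided by (CPI times one million), and the
speedup ratio equals the MIPS ratio only when both machines execute the same
number of instructions. The two machines and all numbers below are made up
for illustration; they are NOT MBASE or MOPT.

```python
def average_cpi(mix):
    """mix is a list of (cpi, fraction) pairs; the fractions should sum to 1.0."""
    return sum(cpi * fraction for cpi, fraction in mix)


def mips_rating(clock_hz, cpi):
    """Millions of instructions per second for a given clock rate and CPI."""
    return clock_hz / (cpi * 1.0e6)


# Hypothetical machines (assumed values, not the homework's).
slow_mix = [(1, 0.50), (2, 0.25), (4, 0.25)]   # runs at 1.0 GHz
fast_mix = [(1, 0.50), (2, 0.50)]              # runs at 1.2 GHz

slow_cpi = average_cpi(slow_mix)               # 0.5 + 0.5 + 1.0 = 2.0
fast_cpi = average_cpi(fast_mix)               # 0.5 + 1.0 = 1.5

slow_mips = mips_rating(1.0e9, slow_cpi)       # 500 MIPS
fast_mips = mips_rating(1.2e9, fast_cpi)       # 800 MIPS

# Dimensionless ratio faster/slower, assuming equal instruction counts.
speedup = fast_mips / slow_mips                # 1.6

print(slow_cpi, fast_cpi, slow_mips, fast_mips, speedup)
```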

Part4: Using Amdahl's law:
       Suppose we enhance a machine to make all floating point
       instructions run five times faster. Look at how speedup
       behaves when we incorporate faster floating point hardware.
       If the execution time of some benchmark before floating
       point enhancement is 11 seconds total, what is the speedup
       if one-fifth of the 11 seconds was spent executing
       floating point instructions?

Part5: What is the speedup if one-half of the 11 seconds was spent
       executing floating point instructions?

Part6: How many total seconds did Part5 run?

Part7: Using Amdahl's law:
       You are going to enhance a machine and there are two possible
       improvements: Either make multiply instructions run four
       times faster than before, or make memory access instructions
       run two times faster than before. A program takes 100 seconds
       to execute before enhancement. 25% of the time is used by
       multiply instructions, 45% of the time is used by memory
       access instructions and the remaining 30% is used by other
       instructions.

   What is the speedup from just improving multiply instructions?
   What is the speedup from just improving memory access instructions?
   What is the speedup from improving both?
   (Show your work, the 100 seconds does not need to be used,
    you may just write the numbers in Amdahl's law format.)
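
A small sketch of "Amdahl's law format", assuming the enhanced parts of the
program do not overlap: the new time is the unenhanced fraction plus each
enhanced fraction divided by its factor, and the speedup is old time over
new time. The function name and the percentages below are invented for
illustration and are not the homework's numbers.

```python
def amdahl_speedup(enhancements):
    """enhancements is a list of (fraction_of_old_time, speedup_factor) pairs.

    The fractions must refer to non-overlapping parts of the run time.
    Returns old_time / new_time.
    """
    enhanced_fraction = sum(fraction for fraction, _ in enhancements)
    new_time = (1.0 - enhanced_fraction) + sum(
        fraction / factor for fraction, factor in enhancements
    )
    return 1.0 / new_time


# Hypothetical program: 40% of the time is sped up 2x.
one = amdahl_speedup([(0.40, 2.0)])                 # 1 / (0.6 + 0.2) = 1.25

# The same program with a second part, 20% of the time, sped up 4x.
both = amdahl_speedup([(0.40, 2.0), (0.20, 4.0)])   # 1 / (0.4 + 0.2 + 0.05)

# New run time if the old run took 20 seconds (assumed value).
new_seconds = 20.0 / one                            # 16 seconds

print(one, both, new_seconds)
```

Note that the combined speedup is not the product of the individual
speedups; each fraction is taken from the original run time.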

HW3 Analyzing assembly and machine code 25 pts

Use the program matmul2.c. Copy the downloadable source:
cp /afs/umbc.edu/users/s/q/squire/pub/download/matmul2.c . # the dot is part of the command

On a GL SGI machine, MIPS architecture only, irix.gl.umbc.edu
The Textbook and Project require you know a few SGI instructions
and their formats.

Part 1. Compare the assembly language printed by two compilers.
Part 2. Compare the assembly language printed by the compiler vs
the instructions in memory at execution time.

Note: The answers are not unique. It depends on which
compiler is used, which specific machine is used and
which options are used.

This assignment must be run on a GL SGI machine using:
c89 -g3 -O3 SGI compiler
gcc -g3 -O3 gnu compiler (much different on MIPS architecture)
         ^_____ letter upper case oh, NOT zero !

--------------------------------------------------------------------

Part 1
To get the assembly language source code into a file matmul2.s:

gcc -g3 -O3 -S matmul2.c (creates matmul2.s)

mv matmul2.s matmul2gcc.s (save, next clobbers.)

c89 -g3 -O3 -S matmul2.c (creates matmul2.s differently)

mv matmul2.s matmul2sgi.s

Now, look in the files matmul2gcc.s and matmul2sgi.s
Ignore all lines where the first character is a dot "."

a) How many mul.d instructions in matmul2gcc.s ?
b) About how many mul.d instructions in matmul2sgi.s ?

------------------------------------------------------------------

Part 2

When running with redirection, ">", first test without redirection
to be sure you can type the correct input and it works. Then
type carefully or use a script to make the redirected run.
Extra "enter" keys may be needed at various places.
Ignore warning messages from debugger.

Remember memory addresses are in bytes, instructions take 4 bytes.
(Even in the 64 bit machine!)
In hex.out, use an address to find the same word in the hex dump and in the disassembly.
In the following sequences of commands, blank lines are typed as "enter"
Ignore information and error messages. Type very carefully!

c89 -g3 -O3 matmul2.c # need debug, -g3, for "stop main" to work
dbx -d a.out > hex.out
stop main
rerun
list 1,26

(#1)/100X

(#1)/100i

The file hex.out has the source listing with line numbers,
the hex address and hex instructions as loaded in memory and
the disassembly with hex address and decoded instruction.

An instruction field format is on page 207 of textbook.
mul.d is the MIPS=SGI double precision floating point multiply, "R" format.
Watch out for where the register values are placed.
(R2000 instructions differ from IRIX.GL.UMBC.EDU that are R??000.)

Most of the instructions in the loop are "housekeeping", there are various
instructions for loading and storing data, l.d and s.d are just one pair.

a) Do all the instructions have the same names in matmul2sgi.s and hex.out ?
b) Find a mul.d instruction in hex.out [use this for c) and d) ]
Write an assembly language line, write the machine address.
c) From the machine address, look up and write the mul.d instruction
from b) as hexadecimal
d) Write the hexadecimal as six decimal integers for the
fields 6,5,5,5,5,6 bits

Attach your "hex.out" file on paper or EMail (as plain text).
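
A sketch of how a 32 bit instruction word splits into fields of 6,5,5,5,5,6
bits, as asked for in d). The example word is the integer instruction
add $t0,$s1,$s2 rather than a mul.d, and the function name is my own. The
integer "R" format names the fields op, rs, rt, rd, shamt, funct; floating
point instructions use the same bit widths but place their registers
differently, so check the textbook format before labeling the fields.

```python
def split_6_5_5_5_5_6(word):
    """Split a 32 bit instruction word into six fields, most significant first."""
    widths = (6, 5, 5, 5, 5, 6)
    fields = []
    shift = 32
    for width in widths:
        shift -= width                      # position of this field's low bit
        mask = (1 << width) - 1
        fields.append((word >> shift) & mask)
    return tuple(fields)


# add $t0,$s1,$s2 encodes as hexadecimal 02324020.
fields = split_6_5_5_5_5_6(0x02324020)
print(fields)        # (0, 17, 18, 8, 0, 32)

# Addresses are in bytes and instructions take 4 bytes, so two addresses
# that differ by 0x10 are 4 instructions apart.
print(0x10 // 4)     # 4
```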

HW4 Use VHDL on a 32 bit PG adder 25 pts

  Computer Engineering Majors: You must design your own adder.
  The interface is the same and the file name must be add32.vhdl.
  Your adder must be an order square root of N adder. See hints
  in the handouts. Stage 1 has one delay, stage 2 has two delays,
  etc. Have exactly 32 bits of sum and carry out. The same test,
  given below, works for your adder. Possible hints: replace
  entity and architecture pg4 with fadd, a full adder and
  replace entity and architecture add4pg with c_sas that uses
  two fadd and some gates. Code main adder circuit down in 
  architecture add32. See end of Lecture 7 

  Everyone else:
  First:Get yourself set up to use a VHDL compiler/simulator.
        To use the Cadence VHDL on linux.gl.umbc.edu,
        Follow the instructions exactly, or figure out a variation yourself.
        Be on some computer with ssh, Putty, TeraTerm. Type commands:

  ssh  your-user-name@linux.gl.umbc.edu #or use Putty or TeraTerm
              (type in your password when asked)
  cp /afs/umbc.edu/users/s/q/squire/pub/download/cs411.tar  .
  tar -xvf cs411.tar
  cd vhdl
  cp Makefile.cadence Makefile  # only first time
  tcsh # or csh; the following vhdl_cshrc will not work in bash
  source vhdl_cshrc
  make
  more add32_test.out
  make clean              # saves a lot of disk quota

      Now you need more starter files to do HW4:

  cp /afs/umbc.edu/users/s/q/squire/pub/download/add4pg.vhdl  .
  cp add4pg.vhdl add32.vhdl
  cp /afs/umbc.edu/users/s/q/squire/pub/download/pg4.vhdl  .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/tadd32.vhdl  .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/tadd32.run   .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/tadd32.chk   .
 
  If you are using a browser, rather than 'cp' then:
  Build a four bit PG adder component or download and include
   add4pg.vhdl 
  Build the Propagate Generate component or download and include
   pg4.vhdl 


      Now complete HW4.
      When finished with HW4 "submit" a single file named add32.vhdl
      that is a PG 32 bit adder.

  submit cs411 HW4 add32.vhdl

    You will use the add32.vhdl file in the project, don't trash it.
    It is not important what the signal names are inside add32.vhdl,
    but keep the same interface, the entity declaration.

    Next: concatenate  pg4.vhdl to add32.vhdl

    Then: concatenate the following to add32.vhdl

    library IEEE;
    use IEEE.std_logic_1164.all;
    entity add32 is
      port(a    : in  std_logic_vector(31 downto 0);
           b    : in  std_logic_vector(31 downto 0);
           cin  : in  std_logic; 
           sum  : out std_logic_vector(31 downto 0);
           cout : out std_logic);
    end entity add32;

    architecture circuits of add32 is
      signal P0, P1, P2, P3, P4, P5, P6, P7: std_logic;
      ...
    begin
      a01: entity WORK.add4pg port map(a(3 downto 0),
                                       b(3 downto 0),
                                       cin,
                                       sum(3 downto 0),
                                       P0,
                                       G0);
      ...
    end architecture circuits; -- of add32


  Now fill in the "..." to finish HW4.
  If you do not know how to concatenate files, get  add32pg_start.vhdl 

  You need a 32 bit adder, so use eight instances of add4pg and
  two instances of pg4 in an add32 architecture.
  
  Connect the circuit per the handout. The two 16 bit adders
  are connected end to end to make a 32 bit adder.
  cin goes into the first pg4, the carry out from the second
  pg4 gets the signal name  cout.
  Use unique signal names or unique subscripts. All connections with
  the same name are tied together and have the same value.
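
  A behavioral sketch, in Python rather than VHDL, that may help when
  checking expected sums by hand. It assumes the standard textbook
  propagate/generate equations (p = a xor b, g = a and b, carry out =
  g or (p and carry in)); the handout's equations may differ in detail.
  The function names mirror the entity names, but the bodies are my own
  model, and the carries between blocks are computed one after another
  here, whereas the pg4 hardware produces them in parallel.

```python
def add4pg(a, b, cin):
    """Model of a 4 bit PG adder block: returns (sum, P, G)."""
    carry = cin
    total = 0
    block_p = 1          # block propagate: every bit position propagates
    block_g = 0          # block generate: a carry is created inside the block
    for i in range(4):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi
        g = ai & bi
        total |= (p ^ carry) << i
        carry = g | (p & carry)
        block_p &= p
        block_g = g | (p & block_g)
    return total, block_p, block_g


def add32(a, b, cin):
    """Model of eight add4pg blocks: returns (32 bit sum, cout)."""
    carry = cin
    total = 0
    for k in range(8):
        s, block_p, block_g = add4pg((a >> (4 * k)) & 0xF,
                                     (b >> (4 * k)) & 0xF, carry)
        total |= s << (4 * k)
        carry = block_g | (block_p & carry)   # carry into the next block
    return total, carry


print(add32(0xFFFFFFFF, 0x00000001, 0))   # (0, 1)
```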

  

  For testing your  add32  component download tadd32.vhdl and tadd32.run


  Use these commands to set up VHDL, then compile and simulate:

  On  linux.gl.umbc.edu  (use ssh to get there using your UMBC account.)

      You must ssh linux.gl.umbc.edu  because the Cadence
      software is licensed to some specific machines.
      Each time you log on to do VHDL, type the commands:

         cd vhdl
         tcsh
         source vhdl_cshrc

      Then do your VHDL homework or project.

      Then do your own thing with Makefile for HW4, then HW6, project
      You can most easily use this directory for HW4, HW6, and
      the five parts of the project.

      (Modify  Makefile  as shown below.)
      Add at end of the "all" list   tadd32.out

      somewhere with preceding and trailing blank lines

tadd32.out: add32.vhdl tadd32.vhdl tadd32.run
      ncvhdl -v93 add32.vhdl
      ncvhdl -v93 tadd32.vhdl
      ncelab -v93 tadd32:circuits
      ncsim -batch -logfile tadd32.out -input tadd32.run tadd32

      Note: be sure commands are preceded by a tab, not spaces

      Check the file tadd32.out to be sure your adder worked.
      The answers are in tadd32.chk

      You can check your output with the command

         diff -iw tadd32.out tadd32.chk

  Submit  ONE file  add32.vhdl  that has the entity  add32  in it.

      submit cs411 HW4 add32.vhdl

  Your circuits must run. Incorrect results lose points.

  Debugging: Find the lowest bit that is wrong in the first output
  that does not compare. Proofread the numbers and signal names.
  If 'cout' is a 'U' rather than '0' or '1', there is a break in
  the adder chain. Unfortunately, the hex output converts 'U'
  undefined and 'X' don't know to zero, so you do not see the error.
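
  A small helper sketch for the first debugging step: given the hex sum
  your adder printed and the hex sum from the check file, it reports the
  lowest bit that differs. The function name and the sample values are
  made up; paste in the actual values from tadd32.out and tadd32.chk.

```python
def lowest_wrong_bit(got_hex, expected_hex):
    """Return the index of the lowest differing bit, or None if equal."""
    diff = int(got_hex, 16) ^ int(expected_hex, 16)
    if diff == 0:
        return None
    # diff & -diff isolates the lowest set bit of diff.
    return (diff & -diff).bit_length() - 1


bit = lowest_wrong_bit("FFFF0FFF", "FFFFFFFF")   # made-up sample values
print(bit, bit // 4)    # 12 3  (bit 12 is in the 4 bit block covering bits 15 downto 12)
```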

  You should include a few comments so anyone reading your circuits can
  understand them. Put in references to book rather than do a lot
  of typing.

  Follow the links below to Project and Download for more information.
  See the writeups on VHDL and sample circuits.
  The building blocks may become part of your final project.

  Special instruction for using VHDL on a PC in windows:
  Download and install Symphony Simili. (See VHDL Resource link.)

  Use ftp or scp to get /afs/umbc.edu/users/s/q/squire/pub/download/make.bat
  tadd32.vhdl
  tadd32.chks
  vhdlp.txt  (these explain command line)
  vhdle.txt


  vhdlp -x add32.vhdl
  vhdlp -x tadd32.vhdl
  vhdle -p -t 63ns tadd32

  Then,   vhdle -p -t 63ns tadd32 > tadd32.out
          fc  tadd32.out  tadd32.chks

  Ignore all differences except on simulation output, e.g. sum

HW5 Five questions 25 pts

 
It is best if you do not minimize. The grader has only the plain answers.

  1. Write two VHDL statements that implement the truth table below
     Just use  "and"   "or"   and  "not"  with parentheses.
     the answer starts   x <=
                         y <=

        a b c | x y
        ------+----
        0 0 0 | 0 0
        0 0 1 | 0 0
        0 1 0 | 0 1
        0 1 1 | 1 0
        1 0 0 | 1 1
        1 0 1 | 0 1
        1 1 0 | 1 0
        1 1 1 | 0 0
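
     A sketch of the unminimized approach: write one "and" term for each
     row where the output is 1, then "or" the terms together. The Python
     below builds such a statement for a made-up two input table (an
     exclusive or), not for the table above; the function and signal
     names are my own.

```python
from itertools import product


def sum_of_products(names, outputs, target):
    """Build an unminimized VHDL assignment from a truth table column.

    names   : input signal names, leftmost column first
    outputs : output bits, one per row, rows in binary counting order
    target  : name of the signal being assigned
    """
    terms = []
    for bits, out in zip(product((0, 1), repeat=len(names)), outputs):
        if out:
            literals = [n if bit else "not " + n for n, bit in zip(names, bits)]
            terms.append("(" + " and ".join(literals) + ")")
    rhs = " or ".join(terms) if terms else "'0'"
    return target + " <= " + rhs + ";"


# Made-up table: rows p q = 00, 01, 10, 11 give z = 0, 1, 1, 0.
statement = sum_of_products(("p", "q"), [0, 1, 1, 0], "z")
print(statement)     # z <= (not p and q) or (p and not q);
```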

  2. Write the VHDL statement that implements the logic diagram

          +----+
      a --|AND |____
      b --|    |   |
          +----+   | +----+
                   --|OR  |
          +----+     |    |
      c --|OR  |_____|    |__
      d --|    |     |    |  |
          +----+     |    |  |
                   --|    |  |
          +----+   | |    |  |
      e --|NOT |---| +----+  |  +----+
          +----+             |--|AND |
                                |    |-- g
      f ------------------------|    |
                                +----+

     Be sure to include the semicolon in VHDL statements,
     else you lose one point for each that is missing.

  3. Draw the logic diagram that represents the VHDL statement

       g <= ((not a or b) xor (b and d and not e)) and (e or not f);

  4. For the following schematic, Ripple Carry wiring:
     Use a, b, e and f  all as four ones. e.g. a <= "1111"   etc.
     4a) what is the six bit result s.
     4b) given that the time from any input to any output in the
         full adder is  2T, how much time does the longest
         path require?   the answer is   ____ T




  5. For the following schematic, Carry Save wiring:
     Use a, b, e and f  all as four ones. e.g. a <= "1111"   etc.
     5a) what is the six bit result s.
     5b) given that the time from any input to any output in the
         full adder is  2T, how much time does the longest
         path require?   the answer is   ____ T




remember, basic digital logic

HW6 Parallel Multiply Simulation 25 points

 
  Computer Engineering Majors: Design this homework for a 16 by 16
  multiplier that has a 32 bit product. Modify the test file for
  your design. Submit both your design as pmul16.vhdl and your test
  as pmul16_test.vhdl. Time in pmul16.run is 8704 per equation in
  pmul16_test.vhdl down near end.

  All others:
  This homework requires the creation of two small VHDL entities
  and the corresponding architectures with other changes to create
  an 8 bit by 8 bit parallel multiplier that produces an unsigned
  16 bit product. For starting the homework you are given a 4 bit
  by 4 bit parallel multiplier that produces an 8 bit unsigned
  product. Most of the 4 x 4 code is parameterized using VHDL
  generate statements, thus converting to 8 x 8 code is supposed
  to be relatively easy. Be sure to change all comments also.

  The 4 bit by 4 bit multiply to produce an 8 bit unsigned product is

  
  

  The component  madd  circuit is

   

   The VHDL source code for  pmul4  is

-- pmul4.vhdl parallel multiply 4 bit x 4 bit to get 8 bit unsigned product
--              uses VHDL 'generate' to have fewer statements
--              see diagram madd.jpg for madd schematic
--              see diagram pmul4.ps for pmul4 schematic

library IEEE;
use IEEE.std_logic_1164.all;

entity madd is      -- multiplying full adder stage
  port(c    : in  std_logic;   -- one input, think carry in
       b    : in  std_logic;   -- one input, think previous sum
       m    : in  std_logic;   -- multiplier bit
       a    : in  std_logic;   -- multiplicand bit
       sum  : out std_logic;   -- carry save sum out
       cout : out std_logic);  -- carry save carry out
end entity madd;

architecture circuits of madd is  -- multiplying full adder stage
  signal aa: std_logic;
begin
  aa <= a and m; -- logic could be reduced, yet probably circuit designed
  sum <= (aa and b and c) or (aa and not b and not c) or
         (not aa and b and not c) or (not aa and not b and c) after 1 ns;
  cout <= (aa and b) or (aa and c) or (b and c) after 1 ns;
end architecture circuits; -- of madd


library IEEE;
use IEEE.std_logic_1164.all;

entity pmul4 is  -- 4 x 4 = 8 bit unsigned product multiplier
  port(a : in  std_logic_vector(3 downto 0);  -- multiplicand
       b : in  std_logic_vector(3 downto 0);  -- multiplier
       p : out std_logic_vector(7 downto 0)); -- product
end pmul4;

architecture circuits of pmul4 is
  constant N  : integer := 3;     -- last row number
  constant NP : integer := N+1;   -- last row plus 1
  constant NM : integer := N-1;   -- last row minus 1
  type arr is array(0 to NP) of std_logic_vector(N downto 0);
  signal s    : arr; -- partial sums
  signal c    : arr; -- partial carries
  signal zero : std_logic := '0';
begin  -- circuits of pmul4
  -- the internal part of the multiplier is nested generate
  -- special case generate is needed for the top row,
  -- the bottom row, the left column and
  -- connecting to the product outputs.
  
  -- center 
  gmaddi: for i in 1 to N generate
    gmaddj: for j in 0 to NM generate
      maddij: entity WORK.madd
              port map(s(i-1)(j+1), c(i-1)(j), b(i), a(j), s(i)(j), c(i)(j));
    end generate gmaddj;  
  end generate gmaddi;  

  -- top row
  gmadd0j: for j in 0 to N generate
    madd0j: entity WORK.madd
            port map(zero, zero, b(0), a(j), s(0)(j), c(0)(j));
  end generate gmadd0j;

  -- left column
  gmaddiN: for i in 1 to N generate
    maddiN: entity WORK.madd
            port map(zero, c(i-1)(N), b(i), a(N), s(i)(N), c(i)(N));
  end generate gmaddiN;

  -- bottom row
  maddNP0: entity WORK.madd
           port map(s(N)(1), c(N)(0), '1', '0', s(NP)(0), c(NP)(0));
  maddNPN: entity WORK.madd
           port map(zero, c(N)(N), '1', c(NP)(NM), s(NP)(N), c(NP)(N));
  gmaddNP: for j in 1 to NM generate
    maddNPj: entity WORK.madd
             port map(s(N)(j+1), c(N)(j), '1', c(NP)(j-1), s(NP)(j), c(NP)(j));
  end generate gmaddNP;
  
  
  -- connect outputs
  gp0i: for i in 0 to N generate
    p0i: p(i) <= s(i)(0);
    pNi: p(i+NP) <= s(NP)(i);
  end generate gp0i;
  
end architecture circuits; -- of pmul4


  Notice that the only component used to build the multiplier
  is "madd" and some uses of "madd" have constants as inputs.
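
  A behavioral sketch in Python that mirrors the generate structure of the
  listing above, cell for cell, so the wiring can be checked against
  ordinary multiplication. It ignores the 1 ns delays, and the function
  name pmul and its bits parameter are my own; bits=4 corresponds to
  N = 3 in the VHDL, and a larger bits value shows how the same wiring
  scales when N changes.

```python
def madd(c, b, m, a):
    """Same logic as entity madd: returns (sum, cout)."""
    aa = a & m
    return aa ^ b ^ c, (aa & b) | (aa & c) | (b & c)


def pmul(a_val, b_val, bits=4):
    """Unsigned array multiplier wired like the pmul4 architecture."""
    N = bits - 1        # last row number
    NP = N + 1          # last row plus 1
    NM = N - 1          # last row minus 1
    a = [(a_val >> j) & 1 for j in range(bits)]   # multiplicand bits
    b = [(b_val >> i) & 1 for i in range(bits)]   # multiplier bits
    s = [[0] * bits for _ in range(NP + 1)]       # partial sums
    c = [[0] * bits for _ in range(NP + 1)]       # partial carries

    for j in range(bits):                         # top row
        s[0][j], c[0][j] = madd(0, 0, b[0], a[j])
    for i in range(1, NP):
        for j in range(N):                        # center, j in 0 to NM
            s[i][j], c[i][j] = madd(s[i-1][j+1], c[i-1][j], b[i], a[j])
        s[i][N], c[i][N] = madd(0, c[i-1][N], b[i], a[N])   # left column

    # bottom row: ripples the remaining carries
    s[NP][0], c[NP][0] = madd(s[N][1], c[N][0], 1, 0)
    for j in range(1, N):                         # j in 1 to NM
        s[NP][j], c[NP][j] = madd(s[N][j+1], c[N][j], 1, c[NP][j-1])
    s[NP][N], c[NP][N] = madd(0, c[N][N], 1, c[NP][NM])

    p = 0                                         # connect outputs
    for i in range(bits):
        p |= s[i][0] << i
        p |= s[NP][i] << (i + NP)
    return p


print(pmul(3, 3))    # 9
```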

  Copy  pmul4.vhdl  to  pmul8.vhdl
  Edit  pmul8.vhdl and replace all  "pmul4"  with "pmul8"

  Look through the VHDL and change inputs from
  (3 downto 0)  to  (7 downto 0) and output from (7 downto 0)
  to (15 downto 0). Note: 4 bit numbers are (3 downto 0) in VHDL,
  and 16 bit numbers are (15 downto 0) in VHDL. Thus, the
  constant N goes from 3 to 7 when going from 4 bits to 8 bits.

  You must choose two different uses of "madd" with constant input(s)
  and code a simplified VHDL entity and architecture. Hint: make two
  copies of "madd" entity and architecture, give them different names,
  simplify by removing the constant input and delete the unneeded
  circuits. Replace the instantiations of your new entities in
  the pmul8 architecture. (Keep the original "madd" unchanged.)

  Look at the difference between samples.html sqrt8 and sqrt8m
  Notice how the  "Sm"  component was simplified for  "S0"  and  "S1".
  This is the idea for your simplifying  "madd".

  For testing your  pmul8  component download pmul8_test.vhdl, pmul8_test.run
  and pmul8_test.chk

  cp /afs/umbc.edu/users/s/q/squire/pub/download/pmul8_test.vhdl  .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/pmul8_test.run   .
  cp /afs/umbc.edu/users/s/q/squire/pub/download/pmul8_test.chk   .

      (Modify  Makefile as shown below.)
      Add at end of the "all" list   pmul8_test.out

      somewhere with preceding and trailing blank lines

pmul8_test.out: pmul8.vhdl pmul8_test.vhdl pmul8_test.run
      ncvhdl -v93 pmul8.vhdl
      ncvhdl -v93 pmul8_test.vhdl
      ncelab -v93 pmul8_test:circuits
      ncsim -batch -logfile pmul8_test.out -input pmul8_test.run pmul8_test

  If you have not typed these lines since logging in, type them now.
      tcsh
      source vhdl_cshrc

  Now run the simulation by typing   make
  Then check by typing   diff -iw pmul8_test.out pmul8_test.chk
  Everything is correct if there are no differences.

  submit cs411 HW6 pmul8.vhdl


  Using Symphony EDA the commands are:
  vhdlp -x pmul8.vhdl           all your code in this file
  vhdlp -x pmul8_test.vhdl      my test program
  vhdle -p -t 4608ns pmul8_test

  vhdle -p -t 4608ns pmul8_test > pmul8_test.out
  fc pmul8_test.out pmul8_test.chks

  (ignore differences other than from pmul8_test,
   e.g. ignore times, dates, versions, etc.)

Midterm exam. 15% of course grade

  Closed book. Multiple choice questions based on reading assignments,
  lectures, handouts and homework.
  Exam covers book: 1.1-1.5   common sense questions, not dates or people
                    4.1-4.5
                    page 97-101
                    3.1-3.6, B.5-6
                    5.1-5.4
      just instructions covered in class
      (nop, j, beq, add, sub, and, addi,
       sll, srl, cmpl, lw, sw)

  Exam covers homework: HW1-HW5
  Be sure to go over handouts,
  no questions on current events handouts