CMSC 411 Lecture 24, Virtual Memory 2



A little history from the current man page for  gcc.
Note: the terms "text" and "text segment" refer to instructions,
that is, executable code.

From  man gcc , searching with  /segment :

-fwritable-strings
    Store string constants in the writable data segment and don't
    uniquize them.  This is for compatibility with old programs which
    assume they can write into string constants.

    Writing into string constants is a very bad idea; ''constants''
    should be constant.

    This option is deprecated.

-fconserve-space
    Put uninitialized or runtime-initialized global variables into the
    common segment, as C does.  This saves space in the executable at
    the cost of not diagnosing duplicate definitions.  If you compile
    with this flag and your program mysteriously crashes after "main()"
    has completed, you may have an object that is being destroyed twice
    because two definitions were merged.

    This option is no longer useful on most targets, now that support
    has been added for putting variables into BSS without making them
    common.

-msep-data
    Generate code that allows the data segment to be located in a
    different area of memory from the text segment.  This allows for
    execute in place in an environment without virtual memory
    management.  This option implies -fPIC.

-mno-sep-data
    Generate code that assumes that the data segment follows the text
    segment.  This is the default.

-mid-shared-library
    Generate code that supports shared libraries via the library ID
    method.  This allows for execute in place and shared libraries in
    an environment without virtual memory management.  This option
    implies -fPIC.


An example of building a self-contained executable from a static  .a
library, and an executable that needs a shared object  .so  available
at run time:

First, the two main programs and the four small C library functions,
each of which prints its name when executed:

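 /* ax.c  for  libax.a  test */
 #include <stdio.h>
 void abc(void);   /* defined in abc.c, found in libax.a */
 void xyz(void);   /* defined in xyz.c, found in libax.a */
 int main(void)
 {
   printf("In ax main \n");
   abc();
   xyz();
   return 0;
 }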

 /* abc.c for libax.a test */
 #include <stdio.h>
 void abc()
 { printf("In abc \n"); }

 /* xyz.c  for libax.a test */
 #include <stdio.h>
 void xyz()
 { printf("In xyz \n"); }

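 /* ab.c  for  libab.so  test */
 #include <stdio.h>
 void aaa(void);   /* defined in aaa.c, found in libab.so */
 void bbb(void);   /* defined in bbb.c, found in libab.so */
 int main(void)
 {
   printf("In ab main \n");
   aaa();
   bbb();
   return 0;
 }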

 /* aaa.c for libab.so test */
 #include <stdio.h>
 void aaa()
 { printf("In aaa \n"); }

 /* bbb.c for libab.so test */
 #include <stdio.h>
 void bbb()
 { printf("In bbb \n"); }

 Then, the Makefile, Makefile_so (run it with  make -f Makefile_so ):
 # Makefile_so  demo  ar  and  ld  and  shared library .so

 all: ax ab

 ax : ax.c  abc.c  xyz.c
	gcc -c abc.c               # compile for library
	gcc -c xyz.c
	ar crv libax.a abc.o xyz.o # build library
	ranlib libax.a
	rm -f *.o
	gcc -o ax ax.c -L. -lax    # use library
	./ax

 ab : ab.c aaa.c bbb.c
	gcc -c -fpic -shared aaa.c  # compile for library
	gcc -c -fpic -shared bbb.c
	ld  -o libab.so -shared aaa.o bbb.o -lm -lc
	rm -f *.o
	gcc -o ab ab.c -L. -lab    # use links to library
	LD_LIBRARY_PATH=. ./ab  # the loader must find libab.so in this directory

 abg : ab.c aaa.c bbb.c  # uses /usr/local/lib needs root priv
	gcc -c -fpic -shared aaa.c
	gcc -c -fpic -shared bbb.c
	ld  -o libab.so -shared aaa.o bbb.o -lm -lc
	rm -f *.o
	cp libab.so /usr/local/lib
	rm -f libab.so
	ldconfig
	gcc -o abg ab.c -lab
	./abg   # any user has access to  libab.so

 clean:
	rm -f ax ab abg
	rm -f *.a *.so

To see what is inside, run  gcc -S -g3 ax.c  and read the assembly
language file it writes, ax.s


Addressing can be examined in assembly code and in .o or .obj files,
and then in executable a.out or .exe files through the debugger. The
"relocatable" addresses in object files are converted to "virtual"
addresses when the program is linked and loaded; during execution, the
virtual addresses are converted to "physical" (RAM) addresses.
Examples are coming soon to a Web page near you.

To get the memory map output (yuk), add  -Wl,-M  to the  gcc -o ...  command;
-Wl,-M passes -M to the linker, which then prints its link map.

Example map for ax:  ax.map



Information that might help with Project part3

Some of you are ready to implement part3 of the project.
See the Part3 description.

CE majors will implement the cache as hardware using multiplexers,
comparators (equal26), and gates. Behavioral code may be used for
the cache memory itself, based on the behavioral code below.

Others may use a completely behavioral solution: just code the
hit/miss process you did by hand in Homework 8. This may be
based on the code below.


        Put the caches inside the instruction memory (part3a) and
        data memory (part3b) components (entity and architecture).
        You will need to pass a few extra signals in and out.

        Use the existing shared memory data as the main memory.
        Make a miss on the instruction cache cause a two-cycle stall.
        Make a miss on the data cache cause a three-cycle stall.
        The stalls from part2b must still work.

        Both the instruction cache and the data cache hold 16 words,
        organized as four blocks of four words. Remember that the VHDL
        memory is addressed by word address, the MIPS/SGI memory is
        addressed by byte address, and a cache is addressed by
        block number.

        The cache schematic for the instruction cache was handed out
        in class and is shown in icache.jpg

        The cache may be implemented either in behavioral VHDL (basically
        writing sequential code in VHDL) or by connecting hardware.



        Possible (not required) behavioral VHDL to set up the start of a cache
        (no partial credit for just putting this in your cache):

          -- add in or out signals to entity instruction_memory as needed
          -- for example, 'clk'  'clear'  'miss'  

          architecture behavior of instruction_memory is
            subtype block_type is std_logic_vector(154 downto 0);
            type cache_type is array (0 to 3) of block_type;
            signal cache : cache_type := (others=>(others=>'0'));
            -- now we have a cache memory initialized to zero
          begin  -- behavior
            inst_mem:
            process ... -- whatever, does not have to be just 'addr'
              variable quad_word_address : natural;  -- for memory fetch
              variable cblock : block_type;-- the shaded block in the cache
              variable index : natural;   -- index into cache to get a block
              variable word : natural;    -- select a word
              variable my_line : line;    -- for debug printout
              variable W0 : std_logic_vector(31 downto 0);
              ...
            begin
              ...
              index := to_integer(addr(5 downto 4));
              word  := to_integer(addr(3 downto 2));
              cblock := cache(index);  -- has valid (154), tag (153 downto 128)
                                       -- W0 (127 downto 96), W1(95 downto 64)
                                       -- W2(63 downto 32), W3 (31 downto 0)
                                       -- cblock is the shaded block in handout
              ...
              quad_word_address := to_integer(addr(13 downto 4));
              W0 := memory(quad_word_address*4+0);
              W1 := memory(quad_word_address*4+1); -- ...
                                       -- fill in cblock with new words, then
              cache(index) <= cblock after 30 ns; -- 3 clock delay
              miss <= '1', '0' after 30 ns;       -- miss is '1' for 30 ns
              ...
              -- the part3a.chk file has 'inst' set to zero while 'miss' is 1
              -- not required but cleans up the "diff"

        A possible (not required) hardware way to set up the start of a cache
        (no partial credit for just putting this in your cache)
        is to use a memory entity for the cache, such as:

-- cache memory for hardware solution to part3a
--              this is just the memory part, you implement the cache

library IEEE;
use IEEE.std_logic_1164.all;

entity cache_memory is
  port(index        : in  std_logic_vector (1 downto 0);
       clear        : in  std_logic;
       write_data   : in  std_logic_vector (154 downto 0);
       write_enable : in  std_logic;  -- rising clock and enable
       write_clk    : in  std_logic;  -- required to write
       out_data     : out std_logic_vector (154 downto 0));
end entity cache_memory;

library IEEE;
use IEEE.std_logic_textio.all;
use WORK.util_pkg.all;
use STD.textio.all;

architecture behavior of cache_memory is
  subtype block_type is std_logic_vector(154 downto 0);
  type cache_type is array (0 to 3) of block_type;
  signal cache_ram : cache_type := (others=>(others=>'0'));
begin  -- behavior
  cache_mem: process(index, clear, write_clk)
               variable block_addr : natural;  -- index
             begin
               block_addr := to_integer(index);
               if clear='1' then
                 out_data <= (others=>'0');
               elsif write_enable='1' and write_clk='1' then
                 cache_ram(block_addr) <= write_data;  -- write cache
                 out_data <= write_data;
               else
                 out_data <= cache_ram(block_addr) after 250 ps; -- read cache
               end if;
             end process cache_mem;


  debug:  process -- used to show cache
            variable my_line : LINE;   -- not part of working circuit
          begin
            wait for 9.5 ns;         -- just before rising clock
            for I in 0 to 3 loop
               write(my_line, string'("line="));
               write(my_line, I);
               write(my_line, string'("  V="));
               write(my_line, cache_ram(I)(154));
               write(my_line, string'("  tag="));
               hwrite(my_line, cache_ram(I)(151 downto 128)); -- omit top 2 tag bits, hwrite needs a multiple of 4 bits
               write(my_line, string'("  w0="));
               hwrite(my_line, cache_ram(I)(127 downto 96));
               write(my_line, string'("  w1="));
               hwrite(my_line, cache_ram(I)(95 downto 64));
               write(my_line, string'("  w2="));
               hwrite(my_line, cache_ram(I)(63 downto 32));
               write(my_line, string'("  w3="));
               hwrite(my_line, cache_ram(I)(31 downto 0));
               writeline(output, my_line);
            end loop;
            writeline(output, my_line);  -- blank line
            wait for 0.5 ns;         -- rest of clock
          end process debug;
end architecture behavior;  -- of cache_memory

        Then use gates and multiplexers around this memory to implement
        the cache. The cache implementation goes inside the
        instruction_memory entity. Any additional entities you need must
        precede the instruction_memory entity in the file part3a.vhdl.

        For debugging your cache, you might find it convenient to add the
        following 'debug' print process inside the instruction_memory
        architecture, then check the output with  diff -iw part3a.out part3a_print.chk

  debug:  process -- used to print contents of I cache
            variable my_line : LINE;   -- not part of working circuit
          begin
            wait for 9.5 ns;         -- just before rising clock
            for I in 0 to 3 loop
               write(my_line, string'("line="));
               write(my_line, I);
               write(my_line, string'("  V="));
               write(my_line, cache(I)(154));
               write(my_line, string'("  tag="));
               hwrite(my_line, cache(I)(151 downto 128));  -- ignore top bits
               write(my_line, string'("  w0="));
               hwrite(my_line, cache(I)(127 downto 96));
               write(my_line, string'("  w1="));
               hwrite(my_line, cache(I)(95 downto 64));
               write(my_line, string'("  w2="));
               hwrite(my_line, cache(I)(63 downto 32));
               write(my_line, string'("  w3="));
               hwrite(my_line, cache(I)(31 downto 0));
               writeline(output, my_line);
            end loop;
            wait for 0.5 ns;         -- rest of clock
          end process debug;

        See part3a_print.chk for the expected output with debug enabled.

        You may print out signals such as 'miss' using  prtmiss  from
        debug.txt
        
        Change  MEMread : std_logic := '1'; to
                MEMread : std_logic := '0';  for part3b.

        You submit on GL using:  submit cs411 part3 part3a.vhdl

        Implement a write-through cache for the data memory.
        (It must work well enough that the results in main memory are
         correct at the end of the run and the timing is correct;
         partial credit is given for partial functionality.)
        Submit this as part3b.vhdl
