Cache (65 points)
Consider a processor with 64-bit words and the following cache stats when running the SPEC CPU 2006 benchmarks (miss rates based on actual data for the Intel Core i7):
| Level | Data Size | Associativity | Block Size | Cache Blocks | Tag Bits | Index Bits | Offset Bits | Hit Time | Miss Rate | Hit Rate | Miss Penalty |
|---|---|---|---|---|---|---|---|---|---|---|---|
| L1 Inst | 32 KiB | 4-way | 32 B | | | | | 3 cycles | 0.4% | | |
| L1 Data | 32 KiB | 8-way | 32 B | | | | | 4 cycles | 9.5% | | |
| L2 | 256 KiB | 8-way | 64 B | | | | | 10 cycles | 4% | | |
| L3 | 8 MiB | 16-way | 64 B | | | | | 40 cycles | 1% | | |
| Memory | — | — | — | — | — | — | — | 200 cycles | — | — | — |
Other Stats (26 points)
Provide the missing data from the table (the yellow boxes, left blank in the table).
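The missing columns follow from the given ones: cache blocks = data size / block size, sets = blocks / associativity, offset bits = log2(block size), index bits = log2(sets), and the tag gets the remaining address bits; hit rate is 1 − miss rate. A minimal C++ sketch of that arithmetic, assuming power-of-two sizes. The names `geometry`, `CacheGeometry`, and `log2_pow2` are hypothetical, and the address width is left as a parameter because the tag width depends on it:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: log2 of a power of two (asserts if x is not one).
unsigned log2_pow2(uint64_t x) {
    assert(x != 0 && (x & (x - 1)) == 0);  // must be a power of two
    unsigned bits = 0;
    while (x > 1) { x >>= 1; ++bits; }
    return bits;
}

struct CacheGeometry {
    uint64_t blocks;       // total blocks = data size / block size
    uint64_t sets;         // sets = blocks / associativity
    unsigned offset_bits;  // log2(block size)
    unsigned index_bits;   // log2(number of sets)
    unsigned tag_bits;     // remaining high-order address bits
};

// size_bytes: data capacity; ways: associativity; block_bytes: block size;
// addr_bits: width of the address being split (an assumption you choose).
CacheGeometry geometry(uint64_t size_bytes, uint64_t ways,
                       uint64_t block_bytes, unsigned addr_bits) {
    CacheGeometry g;
    g.blocks = size_bytes / block_bytes;
    g.sets = g.blocks / ways;
    g.offset_bits = log2_pow2(block_bytes);
    g.index_bits = log2_pow2(g.sets);
    g.tag_bits = addr_bits - g.index_bits - g.offset_bits;
    return g;
}
```

For a hypothetical 16 KiB, 2-way cache with 64 B blocks and 32-bit addresses, `geometry(16 * 1024, 2, 64, 32)` yields 256 blocks, 128 sets, 6 offset bits, 7 index bits, and 19 tag bits.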
Provide the average memory access time (AMAT) for instructions and for data.
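AMAT applies hit time + miss rate × miss penalty level by level, where a level's miss penalty is the AMAT of the level below it and main memory ends the chain. The sketch assumes the table's miss rates are local (misses per access that actually reaches that level); `Level` and `amat` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One cache level: hit time in cycles and local miss rate (fraction of
// accesses reaching this level that miss). Treating the rates as local
// is an assumption of this sketch.
struct Level {
    double hit_time;
    double miss_rate;
};

// AMAT = hit_time + miss_rate * miss_penalty, applied from the last-level
// cache upward; the penalty past the last cache is the memory access time.
double amat(const std::vector<Level>& levels, double memory_time) {
    double penalty = memory_time;
    for (std::size_t i = levels.size(); i-- > 0;) {
        assert(levels[i].miss_rate >= 0.0 && levels[i].miss_rate <= 1.0);
        penalty = levels[i].hit_time + levels[i].miss_rate * penalty;
    }
    return penalty;  // after the loop this is the AMAT seen at L1
}
```

For a made-up two-level hierarchy (L1: 1 cycle, 10% misses; L2: 10 cycles, 50% misses; memory: 100 cycles), `amat({{1.0, 0.1}, {10.0, 0.5}}, 100.0)` evaluates 1 + 0.1 × (10 + 0.5 × 100) = 7 cycles.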
Cache Addressing (24 points)
- Give the index, tag, and offset for instruction address 0x00000001060cff30 at each of L1, L2, and L3.
- Give the index, tag, and offset for data address 0x00007ffee9b309d8 at each of L1, L2, and L3.
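Once each level's block size and set count are known, the fields are bit slices of the address: the low log2(block size) bits are the offset, the next log2(sets) bits are the index, and the remaining high bits are the tag. A minimal sketch, assuming power-of-two block sizes and set counts; `split_address` and `AddressFields` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

struct AddressFields {
    uint64_t tag;
    uint64_t index;
    uint64_t offset;
};

// Split addr for a cache with block_bytes-byte blocks and `sets` sets
// (both powers of two): offset = low bits, index = next bits, tag = rest.
AddressFields split_address(uint64_t addr, uint64_t block_bytes, uint64_t sets) {
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
    assert(sets != 0 && (sets & (sets - 1)) == 0);
    unsigned offset_bits = 0, index_bits = 0;
    while ((1ull << offset_bits) < block_bytes) ++offset_bits;
    while ((1ull << index_bits) < sets) ++index_bits;
    AddressFields f;
    f.offset = addr & (block_bytes - 1);
    f.index = (addr >> offset_bits) & (sets - 1);
    f.tag = addr >> (offset_bits + index_bits);
    return f;
}
```

For a hypothetical cache with 64 sets and 64 B blocks, `split_address(0x12345678, 64, 64)` gives offset 0x38, index 0x19, and tag 0x12345.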
Impact on Speedup (15 points)
Assume the L1 access times for instructions and data are built into the 20-stage pipeline, and that the ideal CPI without stalls is 1. What is the expected CPI including instruction and data memory stalls, if 20% of the instructions access data memory? What is the pipeline speedup accounting for these stalls?
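Because the L1 hit time is already part of the pipeline, only the time beyond an L1 hit stalls it: on average AMAT − L1 hit time cycles per access. Every instruction makes an instruction fetch, but only the given fraction makes a data access. The speedup function assumes the common simplification that an unpipelined machine needs `depth` times as many cycles per instruction as the ideal pipeline; all names are hypothetical:

```cpp
#include <cassert>

// Average stall cycles per access when the L1 hit time is hidden in the
// pipeline: everything beyond the L1 hit time is a stall.
double stall_per_access(double amat_cycles, double l1_hit_time) {
    return amat_cycles - l1_hit_time;
}

// CPI = ideal CPI + fetch stalls (every instruction) + data stalls
// (only the fraction of instructions that access data memory).
double cpi_with_stalls(double ideal_cpi, double inst_stall,
                       double data_stall, double data_fraction) {
    assert(data_fraction >= 0.0 && data_fraction <= 1.0);
    return ideal_cpi + inst_stall + data_fraction * data_stall;
}

// Speedup over an unpipelined machine, assuming balanced stages:
// depth * ideal CPI / CPI with stalls.
double pipeline_speedup(int depth, double ideal_cpi, double cpi) {
    return depth * ideal_cpi / cpi;
}
```

With made-up stalls of 0.5 cycles per fetch and 2 cycles per data access, and 20% data accesses, `cpi_with_stalls(1.0, 0.5, 2.0, 0.2)` is 1.9 and `pipeline_speedup(20, 1.0, 1.9)` is about 10.5.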
Virtual Memory (35 points)
The page size on this processor is 2 MiB. The instruction TLB holds 8 entries, the data TLB holds 32 entries, and the shared level-2 TLB holds 512 entries.
Accessible data (15 points)
What is the total instruction memory that can be accessed using entries in the level-1 instruction TLB? What is the total data memory that can be accessed using entries in the level-1 data TLB? What is the total instruction or data memory that can be accessed without a TLB fault to the OS?
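Each TLB entry maps one whole page, so a TLB's reach is its entry count times the page size. A tiny sketch (hypothetical name `tlb_reach_bytes`), assuming every entry holds a distinct valid translation:

```cpp
#include <cassert>
#include <cstdint>

// TLB reach in bytes: entries * page size, assuming each entry maps a
// different page and all entries are valid.
uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_bytes) {
    assert(page_bytes != 0);
    return entries * page_bytes;
}
```

A hypothetical 64-entry TLB with 4 KiB pages reaches `tlb_reach_bytes(64, 4096)` = 262,144 bytes (256 KiB).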
Virtual Addressing (20 points)
Use the 2 MiB page size and the cache architecture above. Assume L1 and L2 caches use virtual addressing, while the L3 cache uses physical addressing.
- What are the virtual page number and page offset for instruction memory address 0x00000001060cff30?
- If this instruction address maps to physical page number 0x1586, what is the full physical address?
- What are the L3 tag, index, and offset for this physical address?
- What are the virtual page number and page offset for data memory address 0x00007ffee9b309d8?
- If this data address maps to physical page number 0x05ae, what is the full physical address?
- What are the L3 tag, index, and offset for this physical address?
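With a power-of-two page size, the page offset is the low log2(page size) bits of the virtual address and the virtual page number is the rest; translation swaps in the physical page number and keeps the offset. Because L3 is physically addressed, its tag, index, and offset are then taken from the physical address the same way as for any cache address. A sketch with hypothetical names `split_virtual` and `physical_address`:

```cpp
#include <cassert>
#include <cstdint>

struct VirtualFields {
    uint64_t vpn;     // virtual page number
    uint64_t offset;  // byte offset within the page
};

// Split a virtual address for a power-of-two page size.
VirtualFields split_virtual(uint64_t vaddr, uint64_t page_bytes) {
    assert(page_bytes != 0 && (page_bytes & (page_bytes - 1)) == 0);
    return {vaddr / page_bytes, vaddr & (page_bytes - 1)};
}

// Translation replaces the page number and keeps the page offset.
uint64_t physical_address(uint64_t ppn, uint64_t offset, uint64_t page_bytes) {
    return ppn * page_bytes + offset;
}
```

With hypothetical 4 KiB pages, `split_virtual(0x12345678, 4096)` gives VPN 0x12345 and offset 0x678; mapping that page to a made-up PPN 0x42 gives physical address 0x42678.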
Extra Credit (25 points)
Write a program to time cache and memory access speed for your CPU. You will need to look up the CPU model and clock speed on your system, find information online about its cache sizes, and time reads and writes of data that fits in one level but is too big to fit in the next smaller level (for example, data that fits in L2 but not in L1), forcing capacity misses in that smaller level on each access.
To keep the CPU from prefetching your data, randomize the order in which you visit the array elements. For each test size, use Sattolo's algorithm to fill an array with a sequence of integers such that you can chain from one element to the next and walk through the entire array in random order:
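```cpp
// Visit every element of a Sattolo-ordered array: array[i] holds the index
// of the next element, and the single cycle returns to index 0 at the end.
uint64_t r = array[0];
while (r != 0) {
    r = array[r];
}
```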
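The walk only works if the array holds a single cycle through every index. Sattolo's algorithm builds one by starting from the identity permutation and, for i = n−1 down to 1, swapping element i with a uniformly chosen element j < i. A C++ sketch using `std::mt19937_64` as the random source (an assumption; any uniform generator works), with hypothetical function names `sattolo_fill` and `walk_length`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Fill `array` with a single random cycle (Sattolo's algorithm).
void sattolo_fill(std::vector<uint64_t>& array, std::mt19937_64& rng) {
    const std::size_t n = array.size();
    for (std::size_t i = 0; i < n; ++i) array[i] = i;  // identity
    if (n < 2) return;
    for (std::size_t i = n - 1; i > 0; --i) {
        // j is drawn from [0, i-1], never i itself: this is what makes
        // the result one cycle instead of an arbitrary permutation.
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(array[i], array[pick(rng)]);
    }
}

// Count the reads needed to chain from index 0 back to 0
// (equals array.size() for a Sattolo-filled array).
std::size_t walk_length(const std::vector<uint64_t>& array) {
    assert(!array.empty());
    std::size_t steps = 1;  // the initial read of array[0]
    uint64_t r = array[0];
    while (r != 0) {
        r = array[r];
        ++steps;
    }
    return steps;
}
```

After `sattolo_fill` on an array of 1000 elements, `walk_length` returns 1000, and no element points to itself.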
This timing class provides a common interface for a high-resolution timer for C++11, Windows, Mac, and Linux. Use it like this:
```cpp
Timer timer;                    // starts timing
// ... expensive computation ...
double elapsed = timer.time();  // returns elapsed time in seconds
printf("time in seconds: %g\n", elapsed);
```
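If the provided timing class is not at hand, a hypothetical stand-in built on `std::chrono::steady_clock` supports the same two operations used above (construction starts the clock; `time()` returns elapsed seconds). It is a sketch, not the course's class:

```cpp
#include <chrono>

// Hypothetical stand-in for the course Timer: starts on construction,
// time() returns seconds elapsed since then.
class Timer {
public:
    Timer() : start_(std::chrono::steady_clock::now()) {}
    double time() const {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        return elapsed.count();
    }
private:
    std::chrono::steady_clock::time_point start_;
};
```

`steady_clock` is used because it never jumps backward, which suits interval timing.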
Initializing the sequence will touch every element of the array at the current test size, and reading it will access that data again. Dividing the time those read accesses take by the total number of accesses gives an estimate of the time per access (plus some loop overhead). Multiplying that time by the clock rate (equivalently, dividing it by the clock period) gives an estimate of the number of cycles per access. Report the median of five measurements at each size.
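Putting these steps together for one test size gives roughly the following minimal sketch. It fills the array with Sattolo's algorithm, times the read walk five times with `std::chrono::steady_clock` directly (the course `Timer` would work the same way), and converts the median time per access to cycles. It times reads only; the clock rate is a value you supply after looking it up, and `cycles_per_access` is a hypothetical name. The walk counts its steps and stores the count in a `volatile` so the compiler cannot delete the otherwise side-effect-free loop:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Median cycles per read access for one working-set size.
// bytes: array size to test; clock_hz: CPU clock rate you looked up;
// trials: number of timed walks (must be at least 1).
double cycles_per_access(std::size_t bytes, double clock_hz, int trials = 5) {
    const std::size_t n = std::max<std::size_t>(2, bytes / sizeof(uint64_t));
    std::vector<uint64_t> array(n);

    // Sattolo's algorithm: one random cycle through all n elements.
    for (std::size_t i = 0; i < n; ++i) array[i] = i;
    std::mt19937_64 rng(12345);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(array[i], array[pick(rng)]);
    }

    std::vector<double> per_access;
    volatile std::size_t sink = 0;  // observable use of the step count
    for (int t = 0; t < trials; ++t) {
        auto start = std::chrono::steady_clock::now();
        std::size_t steps = 1;      // counts the initial read of array[0]
        uint64_t r = array[0];
        while (r != 0) {            // each read depends on the previous one
            r = array[r];
            ++steps;
        }
        auto stop = std::chrono::steady_clock::now();
        sink = steps;
        std::chrono::duration<double> elapsed = stop - start;
        per_access.push_back(elapsed.count() / steps);
    }

    // Median seconds per access times cycles per second = cycles per access.
    std::sort(per_access.begin(), per_access.end());
    return per_access[per_access.size() / 2] * clock_hz;
}
```

For example, `cycles_per_access(16 * 1024, 3.0e9)` would time a 16 KiB working set on a hypothetical 3 GHz CPU; repeat with sizes that fall between each pair of cache capacities.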
Submit your code, as well as a document listing your CPU and its clock speed, cache sizes (and where you found them), what data sizes you tested for your results, and your numeric results for cycles per access at each level.
Submitting
Follow the class git instructions to submit. You can do your work electronically, on a tablet, or on paper that you scan or photograph. Submit your work in the hw5 directory, then commit, tag, and push your final submission before the deadline. Be sure to edit the hw5/readme.txt file to tell us which files contain the answers to which problems (especially if there is more than one file) and what tools you used.