Programming Speedup
You have a program that spends 54 minutes on one core computation using a brute-force algorithm. You are considering replacing that with a smarter algorithm that will reduce the core computation time to 6 minutes, but you'll add 34 minutes of data structure traversal time.
Speedup
What is the speedup, accounting for both computation and data structure time?
Amdahl's Law
If the core computation time of the original program is 95% of the total execution time, compute the overall speedup using Amdahl's law. Identify the fraction enhanced and fraction unenhanced.
Instruction Set Speedup
Assume load operations are 20% of the total instructions for your workload and stores are 10%. Your load and store instructions can use a register+offset to compute the address
LOAD R1, offset(R2) ; Register[R1] = Memory[ Register[R2] + offset ]
You are considering removing these displacement load and store instructions from the instruction set, instead requiring two instructions for the 60% of loads and stores when the offset is not 0.
ADD R1, R2, offset ; Register[R1] = Register[R2] + offset
LOAD R1, R1 ; Register[R1] = Memory[R1]
Average CPI
If the CPI of load and stores before the change is 5, while other instructions have a CPI of 4, what is the average CPI?
Clock speedup
What is the expected speedup from this change, if it will allow a 1.2x improvement in clock speed, with no change in CPI?
CPI speedup
What is the expected speedup from this change if it allows you to reduce the CPI for load and store instructions to 4 cycles per instruction, but you get no change to the clock cycle time?
Complex CPI
do { r = r + a[i] * b[i]; ++i; } while (--c != 0);
This code can be translated into this code for the Intel 8088 (yes the one from 1991), assuming r is in the BX register, c is in the CX register, i is in the SI register, and
loop: MOV AX, a[SI] ; AX = a[SI]
IMUL b[SI] ; AX = AX * b[SI]
ADD BX, AX ; r = r + AX
ADD SI, 2 ; SI = SI + 2 (for 2-byte data): this is the ++i
DEC CX ; CX = CX - 1: this is the --c
JNZ loop ; this is the while test
Using the timing data for this processor, compute the CPI for 100 iterations of this code. You will need to use the "Effective Address (EA)" timing table as well as the individual instruction tables. Given a 10 MHz clock speed, what is the expected execution time of this code?