## **CMSC 611: Advanced Computer Architecture**

## Branch Prediction / Cache

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / © 2003 Elsevier Science

#### Performance of 2-bit Branch Buffer



SPEC89 benchmarks

#### **Correlating Predictors**



- The behavior of branch b3 is correlated with the behavior of b1 and b2
- Clearly of both branches b1 and b2 are untaken, then b3 will be taken
- A predictor that uses only the behavior of a single branch to predict the outcome of that branch can never capture this behavior
- Branch predictors that use the behavior of other branches to make a prediction are called correlating or two-level predictors

# Hypothesis: recent branches are correlated; that is, behavior of recently executed branches affects prediction of current branch

# (2,2) Correlating Predictors

- Record m most recently executed branches as taken or not taken, and use that pattern to select the proper branch history table
- (m,n) predictor means record last m branches to select between 2m history tables each with n-bit counters
  - Old 2-bit branch history table is a (0,2) predictor
- In a (2,2) predictor, the behavior of recent branches selects between, four predictions of next branch, updating just that prediction



Total size =  $2^m \times n \times \#$  prediction entries selected by branch address

# Accuracy of Different Schemes



#### Example

- Assume that d has values 0, 1, or 2 (alternating between 0, 2 as we enter this segment)
- Assume that the sequence will be executed repeatedly
- Ignore all other branches including those causing the sequence to repeat
- All branches are initially predicted to untaken state





With a single bit predictor

NT = Not Taken (*if* condition is false) T = Taken (*if* condition is true)

| d=? | b1<br>prediction | b1<br>action | New b1<br>prediction | b2<br>prediction | b2<br>action | New b2<br>prediction |
|-----|------------------|--------------|----------------------|------------------|--------------|----------------------|
| 2   | NT               | Т            | Т                    | NT               | Т            | Т                    |
| 0   | Т                | NT           | NT                   | Т                | NT           | NT                   |
| 2   | NT               | Т            | Т                    | NT               | Т            | Т                    |
| 0   | Т                | NT           | NT                   | Т                | NT           | NT                   |

All branches are mispredicted

12.

if (d==0)d=1;

if (d==1)



; branch b1 (d!=0) ; d==0, sp d=1

; branch b2 (d!=1)



#### With one bit predictor with one bit of correlation

#### (previous/predicition)

| d=? | b1<br>prediction    | b1<br>action | New b1<br>prediction | b2<br>prediction    | b2<br>action | New b2<br>prediction |
|-----|---------------------|--------------|----------------------|---------------------|--------------|----------------------|
| 2   | NT/ <mark>NT</mark> | Т            | NT/T                 | T/NT                | Т            | T/T                  |
| 0   | T/NT                | NT           | T/NT                 | NT/ <mark>NT</mark> | NT           | NT/NT                |
| 2   | NT/T                | Т            | NT/T                 | T/T                 | Т            | T/T                  |
| 0   | T/NT                | NT           | T/NT                 | NT/ <mark>NT</mark> | NT           | NT/NT                |

• Except for first iteration, all branches are correctly predicted





BNEZ DADDI L1: DSUBUI BNEZ

12.

R1, L1 R1, R0, #1 R3, R1, #1 R3, L2

; branch b1 (d!=0) ; d==0, sp d=1

; branch b2 (d!=1)

### **Tournament Predictors**

- Multilevel branch predictors use several levels of branch prediction tables together with an algorithm to choose among them
- Tournament selectors are the most popular form of multilevel branch predictors (e.g. DEC Alpha 21264)
- Tournament predictors combines both local and global predictor
- Selection between the two predictors are based on a selector (2bit counter)
- Make a transition with two wrong prediction using the current table for which the correct prediction would have been possible using the other predictor



#### Performance of Tournament Predictors



### **Cache & Memory**

- Why do designers need to know about Memory technology?
  - Processor performance is usually limited by memory bandwidth
  - As IC densities increase, lots of memory will fit on chip
- What are the different types of memory?
- How to maximize memory performance with least cost?

| Computer      |                           |
|---------------|---------------------------|
| <u>Memory</u> | Devices                   |
|               | Input                     |
|               | Output                    |
|               | Computer<br><u>Memory</u> |



**Problem:** Memory can be a bottleneck for processor performance **Solution:** Rely on memory hierarchy of faster memory to bridge the gap

## **Memory Hierarchy**

• Temporal Locality (Locality in Time):

 $\Rightarrow$  Keep most recently accessed data items closer to the processor

• Spatial Locality (Locality in Space):

 $\Rightarrow$  Move blocks consists of contiguous words to the faster levels

