Optimization
  - Complex data structures
  
    - Tables accessed in order are cache friendly
- But eventually lose to O()
    
      - Trees - Jump around, but O(lg N)
      
- Hash - Random access, but O(1)
- Others...
 
 
- Lazy evaluation
  
    - Don't forget cost of dirty bit
- Memory cost: up to 1 word
- Branch cost
    
      - Misprediction 20ish cycles = 20-80 instructions
- 4x4 matrix multiply
      
        - 16 *, 12 + = 28 instructions by most naive method
- 4 *, 12 MAD = 16 instructions if slightly smarter
- 4 *, 3 + = 7 instructions in SSE
- 1 *, 3 MAD = 4 instructions in SSE5
 
- Don't use branch to avoid work that's cheaper than branching
 
 
- Virtual inheritance
  
    - Use:
    
      - Pointer to object of base class
- Member functions based on concrete class
- Sphere, Polygon, Cone derived from Object class
- obj->render() calls Sphere::render(), Polygon::render() or Cone::render()
 
- How it's done
    
      - Base class has a v-table with pointers to virtual functions
- Derived classes put different function pointers into the v-table
- obj->render() is likely misprediction on v-table lookup
- obj->render() is also likely instruction cache miss
- v-table locality isn't great: don't tend to use all of v-table cache block
 
- Alternatives
    
      - Job list per derived type
- Explicit arrays per derived type
- Components instead of inheritance