Optimization
- Complex data structures
- Tables accessed in order are cache friendly
- But eventually lose to O()
- Trees - Jump around, but O(lg N)
- Hash - Random access, but O(1)
- Others...
- Lazy evaluation
- Don't forget cost of dirty bit
- Memory cost: up to 1 word
- Branch cost
- Misprediction 20ish cycles = 20-80 instructions
- 4x4 matrix multiply
- 16 *, 12 + = 28 instructions by most naive method
- 4 *, 12 MAD = 16 instructions if slightly smarter
- 4 *, 3 + = 7 instructions in SSE
- 1 *, 3 MAD = 4 instructions in SSE5
- Don't use branch to avoid work that's cheaper than branching
- Virtual inheritance
- Use:
- Pointer to object of base class
- Member functions based on concrete class
- Sphere, Polygon, Cone derived from Object class
- obj->render() calls Sphere::render(), Polygon::render() or Cone::render()
- How it's done
- Base class has a v-table with pointers to virtual functions
- Derived classes put different function pointers into the v-table
- obj->render() is likely misprediction on v-table lookup
- obj->render() is also likely instruction cache miss
- v-table locality isn't great: don't tend to use all of v-table cache block
- Alternatives
- Job list per derived type
- Explicit arrays per derived type
- Components instead of inheritance