This description outlines the SFA cache algorithm found in the version current as of June 9, 1998! There are some major differences from this to the one found in previous version! (Most notably the use of multi-dimensional fullAccum, fullCounts and subCounts arrays).
The new version of SFA uses a very similar caching algorithm as the old version of SFA did. The mechanism used is a two-pass sorted cache, with a strong emphasis placed on speed and the ability to easily exclude elements without adversely affecting performance. The cache is currently sorted using PackedGlyph.xsize as the sort key. Three arrays global to the DataSet class are used to maintain key data.
These arrays are:
fullCounts[timestepcount][256]: fullCounts is used to
keep track of the maximum number of data items which could be stored in
the cache at each key size.
fullAccum[timestepcount][256]: fullAccum is used to keep
track of the starting index within the cache of each key size.
subCounts[timestepcount][256]: subCounts is used to keep
track of the number of glyphs actually present in the cache at each key
size.
Before the first pass, the fullAccum[][] and fullCounts[][] arrays have all 256 of their elements set to 0 for each time step.
The first pass of the cache run is done at set load time. When the FileIO class loads the appropriate key (currently x-size), it makes a call to DataSet::setDimension with SFA_XSIZE as one of the parameters. When setDimension is called with this parameter, it makes a single pass loop of the entire xsize data (All other attributes do not cause a first-pass cache build. Note that this means that Xsize must be present in the .set file. A mechanism should be built to handle if this attribute does not exist).
The first pass maps each xsize element to it's cached value (with xsize, this is an 8 bit linear mapping). As each cache value is calculated, the associated fullCounts[tstep][] array index is incremented by one. (eg: If the currently examined x-size data has a value of 126, fullCounts[tstep][126] is incremented). This loop is executed once per glyph, regardless of the number of attributes associated with each glyph, making this step O(n). Note that this first pass will be executed once per timestep, as each timestep may have different xsize values.
Immediately after this first pass, the fullAccum[tstep][] array is built up using the data stored in fullCounts[tstep][]. fullCounts[tstep][] is used to store how many glyphs of each size are in the dataset, while fullAccum[tstep][] is used to maintain a start index for each size within the cache. For example, if fullCounts[tstep][0] = 30, fullCounts[tstep][1] = 40, fullCounts[tstep][2] = 30, etc, fullAccum[tstep][0] = 0, fullAccum[tstep][1] = 40, fullAccum[tstep][2] = 70, fullAccum[tstep][3] = 100, etc. This is a fixed 256 iteration loop, therefore O(256).
During each build of the full cache, various glyphs may be excluded
from the set to be drawn (they may not be in a subset, may be too small
for the desired resolution, etc). Each time a glyph is included in the
cache, it's associated subCounts[tstep][] array index is incremented. The
index of each glyph in the cache is then calculated by:
index = fullAccum[tstep][xsize] + subCounts[tstep][xsize]++;
This mechanism allows us at any given time to know how many of each size glyph are present in the data set, and how many are present in the cache.
The draw mechanism of the cache is a fairly simple nested loop. The
outer index of the loop is the xsize, counted down from 255 to 0 (the loop
is done in decreasing order to facilitate the drawing of only glyphs that
exceed a certain size, or to draw as many glyphs as we can in a certain
time frame). The inner loop index is from 0 to subCounts[tstep][xsize].
That is, for each xsize to be drawn, draw all the glyphs currently in the
cache for that particular xsize, again using the formula:
index = fullAccum[tstep][outerLoopIndex] + innerLoopIndex;
to calculate the index within the cache to draw.
See the draw documentation page for more
information on how the draw looping works.
As a small example, here is a very limited set of data, along with the
values of the fullCounts, subCounts and fullAccum arrays. This is for a
single timestep only. The set consists of 15 glyphs, 10 of which we want
in the draw cache, 5 of which we don't. (The data values given here are
the x-size values, the only attribute relative to cache indexing).
So in our example, when we came to drawing xsize == 5, we would have our inner loop going from 0 to 0 (subCounts[5] - 1, hence we only draw one glyph), with index 13 (fullAccum[5]).
Using the same example, when we came to draw xsize == 2, we would have our inner loop going from 0 to 3, starting at index 3 in the cache (therefore we would draw from index 3 to index 6 in the cache).
Some of this may be a bit vague. If it is, please let me know where
I can make improvements and I will do so. If there is anything I have left
out, please feel free to ask and I will attempt to explain it (and add
it to the document).