EXplain SEmantic Caching and what has been done so far ....


What are the things required for semantics caching ?\\
idea of semantic caching is very simple. along with the query results the query itself is stored in the cache. when a new query enters the system an attempt is first made to see if the answer to the query can be obtained from the results already present in the cache. this check is made by evaluating the new query against the queries already present in the cache. there are essentially three aspects to semantic caching 
1. determining if the new query can be answered from a set of existing queries in the cache.
2. actually answering the query ( satisfying the request ) using local cache and / or remote database.
3. managing the cache so as to maximize the probability of satisfying a new request from the existing cache and this enhancing the query response time.\\

Since any query on a relational database yields a set of tuples having a set of attributes, we can visualize the query result to be a rectangle . As we keep caching the queries . the cache contains a set of rectangles possibly overlapping. size of the rectangles denote the size of the result. let each rectangle be assigned a metric . a positive number denoting the importance of that rectangle for future queries. the higher the number the greater is the likelihood of that rectangle being accessed . the metric could be based on a multiple factors or could be a composite index calculated by taking into account numerous factors like hit rate , Manhattan distance , LRU, probability of future access.\\
now our problem crystallizes to finding a set of rectangles with the largest possible sum such that their size does not exceed the capacity of the cache and the number of such rectangles are bounded by a predetermined size. 
This problem can be transforming into an equivalent problem in the graph domain by the following mapping. Let each query and hence each rectangle be represented by a node. the metric of the rectangle is transformed into the weight of the corresponding node. Let the edge between nodes represent the intersection between corresponding rectangles . the graph thus created is an intersection/interval graph. 
The new problem definition is as follows.

Lot of prior work has been done in this area.


due to the limited cache size there is an upper bound to the amount of data that can be stored in the cache. for efficient caching a replacement strategy has to be devised that will determine what data to be kept in the cache and what data to be replaced with new data , the aim of any replacement strategy should be to maximize the probability of cache hits and reduce the data communication 
1. determination 
in case of semantic regions the objectiove of the cache replacement strategy is to determine what regions to keep in cache.
2. cache maintainence
the queries have to be updated accordingly , so that they accurately describe the data present in the cache.

2 way process: using the query to determine the contents of the cache, using replacement startegy to manage data, and using this data manage the queries stored to reflect the data accurately.


Algorithm,

the problem at hand is a 0-1 knapsack problem with an additional exclusivity constraint that some objects may not be selected if some other object is selected. this problem can be frramed as :
an exact solution to this problem can be obtained by using dynamic programming problem formaulation. the problem fomulation is as follows. (..) 
the complexity of this problem is O(mW) i.e pseupopolynomial. if W is arbitrarily large, the solution becomes unwieldly. hence we come up with a new heuristic to solve this problem in polynomial time. ew simplify the problem by relaxing the constraints on the interger variables. Instead of X taking values from {0,1} , it can take any vaule betweeen 0 and 1 , both inclusive  

the problem therefore is transformend in the graph domain as follows. 
for each semanntic region do the following. 
create a new node.
assign it a value ( metric / profit ) and a weight ( cost function / size of the region ) draw an edge between thier corresponding nodes if the segments intersect each other. 
the segments are projected on the x-axis by reducing their dimension by 1. we now have a set of interval of at most 1 dimension and the problem is to find .... ?
each interval has a weight and a value associated with it. their objective is to find at max K-intervals such that their sum is maximized. 

Explain - subset sum problem + iterate till K is reached. Its complexity  ...
It is easy to may this problem in the graph domain.