Dynamic Data Collection IR System
Update:
• Modified the internal algorithm to reduce I/O operations (2004-4-28)
• Added an action to the Help item (2004-4-27)
• Centred the System Info dialog when it is shown (2004-4-27)
• Finished debugging the "Coalesce sub collection" function (2004-4-26)
Download:
The design details can be viewed HERE.
This Dynamic Data Collection Information Retrieval System uses the Reuters
Corpus Volume 1 (RCV1) as its data collection. Details of RCV1 are
available HERE.
Download the IRSystem.jar file from HERE
and the test data from HERE.
(The test data is the first 100M of RCV1 and is ONLY for testing
this program. The full corpus on 2 CDs can be obtained from HERE.)
FAQ:
• How to run the program?
Download the jar file from the link above, then run the program with
"javaw -jar IRSystem.jar". Alternatively, download IRSystem2W.exe and
use it to start the program (IRSystem.jar is still required).
• How to use the system?
1) Run the system using "javaw -jar IRSystem.jar"
or IRSystem2W.exe

2) Setting the data collection path and data info path
Tools > Setting..

Data collection path: the path where the RCV1 data collection is stored. (If
you do not have RCV1 yet, you can first download the 100M test file from
HERE.)
Saving path: the directory where the IR System info is stored.
3) Adding files to system
Tools > Adding Files...
Adding...

Note: if a file you add already exists in the system, the system
does not insert it again.
4) Now you can search the system

Option settings:
a) Number of top-ranked topics to show
b) IDF & TF (only one IDF & TF scheme is available for now)
c) Search in Group (tick the sub collections to search)
5) View the system information
Tools > System Info...

6) Remove files from the system
Tools > Remove Files...
7) Coalesce Sub Data Collection
The main idea of the system is to use overflow blocks, so the data collection
can be coalesced to improve search performance.
Tools > Coalesce Sub Col...
8) Initialize the system
File > Init System ...

9) Setting the view of query results
View >
10) View the document content
Double-click a search result.
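
Steps 3, 4c and 7 above can be pictured with a small sketch. The Java class
below is only an illustration under stated assumptions, not the code inside
IRSystem.jar: the names SubCollectionIndex, Block, addFile, search, coalesce
and blockCapacity are all hypothetical, and the plain term-to-file-name map
stands in for whatever index structure the system really uses. It assumes that
a full sub collection overflows into a new block, that a search has to visit
every block, and that coalescing merges the blocks into one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: a collection stored as several sub collections (blocks). */
class SubCollectionIndex {
    /** One sub collection: term -> names of the files that contain it. */
    static class Block {
        final Map<String, Set<String>> postings = new HashMap<>();
        int fileCount = 0;
    }

    private final int blockCapacity;                 // assumed limit of files per block
    private final List<Block> blocks = new ArrayList<>();
    private final Set<String> knownFiles = new HashSet<>();

    SubCollectionIndex(int blockCapacity) {
        this.blockCapacity = blockCapacity;
    }

    /** Adds a file; returns false and changes nothing if it is already indexed. */
    boolean addFile(String name, String text) {
        if (!knownFiles.add(name)) {
            return false;                            // step 3: existing files are not inserted again
        }
        Block last = blocks.isEmpty() ? null : blocks.get(blocks.size() - 1);
        if (last == null || last.fileCount >= blockCapacity) {
            last = new Block();                      // current block is full: open an overflow block
            blocks.add(last);
        }
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                last.postings.computeIfAbsent(term, t -> new TreeSet<>()).add(name);
            }
        }
        last.fileCount++;
        return true;
    }

    /** Looks a term up in every block, so the work grows with the number of blocks. */
    Set<String> search(String term) {
        Set<String> hits = new TreeSet<>();
        for (Block b : blocks) {
            hits.addAll(b.postings.getOrDefault(term.toLowerCase(), Collections.<String>emptySet()));
        }
        return hits;
    }

    /** Step 7: merges all blocks into one so a later search touches a single block. */
    void coalesce() {
        if (blocks.size() <= 1) {
            return;
        }
        Block merged = new Block();
        for (Block b : blocks) {
            for (Map.Entry<String, Set<String>> e : b.postings.entrySet()) {
                merged.postings.computeIfAbsent(e.getKey(), t -> new TreeSet<>()).addAll(e.getValue());
            }
            merged.fileCount += b.fileCount;
        }
        blocks.clear();
        blocks.add(merged);
    }

    int blockCount() {
        return blocks.size();
    }
}
```

In this sketch coalescing does not change what a search returns; it only
reduces the number of blocks a search has to visit, which is one plausible
reading of why step 7 improves search performance.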