Copyright © 2004
Guang Huang

Dynamic Data Collection IR System

Update:
• Modified the internal algorithm to reduce I/O operations (2004-4-28)
• Added an action to the Help item (2004-4-27)
• Centred the System Info dialog when it is shown (2004-4-27)
• Finished debugging the "Coalesce sub collection" function (2004-4-26)

Download:
The design details can be viewed HERE.

This Dynamic Data Collection Information Retrieval System uses the Reuters Corpus Volume 1 (RCV1) as its data collection. Details of RCV1 are available HERE.

Download the IRSystem.jar file from HERE and the test data from HERE. (The test data is the first 100M of RCV1 and is intended ONLY for testing this program. You can get the 2 CDs from HERE.)

FAQ:
How to run the program?
Download the jar file from the link above, then run it with "javaw -jar IRSystem.jar". Alternatively, download IRSystem2W.exe and run that instead (it also needs IRSystem.jar).
How to use the system?
1) Run the system using "javaw -jar IRSystem.jar" or IRSystem2W.exe

2) Set the data collection path and the data info path
Tools > Setting..

Data collection path: the path where the RCV1 data collection is stored. (If you do not have RCV1 yet, you can first download the 100M test file from HERE.)
Saving path: the directory where the IR System info is stored.

3) Add files to the system
Tools > Adding Files...

Adding...

Comment: if a file you want to add already exists in the system, the system does not insert it again.
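The skip-if-present behaviour described in the comment can be illustrated with a small sketch. This is not the IRSystem source: the class name FileRegistrySketch and its methods are hypothetical, and it assumes files are identified by name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the actual IRSystem code.
class FileRegistrySketch {
    // Names of the files already added, in insertion order (assumed identity: file name).
    private final Set<String> indexed = new LinkedHashSet<>();

    /** Adds each file name at most once and returns how many were newly inserted. */
    int addFiles(List<String> fileNames) {
        int inserted = 0;
        for (String name : fileNames) {
            // Set.add returns false when the name is already present,
            // so a file that exists in the system is skipped.
            if (indexed.add(name)) {
                inserted++;
            }
        }
        return inserted;
    }

    boolean contains(String name) {
        return indexed.contains(name);
    }

    int size() {
        return indexed.size();
    }
}
```

Adding the same file twice therefore leaves the registry unchanged the second time.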

4) Now you can search the system

Option settings:
a) Number of top topics to show
b) IDF & TF (currently only one IDF & TF scheme is available)
c) Search in Group (check the sub collections to search)
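To make the "top number" and "IDF & TF" options concrete, here is a minimal ranking sketch. The page does not say which TF and IDF formulas the system uses, so the weighting below (raw term count times log(N/df)) is an assumption, and TfIdfSketch and topK are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual IRSystem code.
class TfIdfSketch {
    /** Scores each document (a list of terms) against the query; returns the indexes of the top k. */
    static List<Integer> topK(List<List<String>> docs, List<String> query, int k) {
        int n = docs.size();
        // Document frequency: the number of documents that contain each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        // Assumed weighting: tf * log(N / df), summed over the query terms.
        final double[] score = new double[n];
        for (int d = 0; d < n; d++) {
            for (String term : query) {
                int tf = Collections.frequency(docs.get(d), term);
                if (tf > 0) {
                    score[d] += tf * Math.log((double) n / df.get(term));
                }
            }
        }
        // Keep only documents with a positive score, best first.
        List<Integer> order = new ArrayList<>();
        for (int d = 0; d < n; d++) {
            if (score[d] > 0) {
                order.add(d);
            }
        }
        order.sort((a, b) -> Double.compare(score[b], score[a]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }
}
```

With this weighting, a term that occurs in every document contributes nothing to the score, which is the usual effect of the IDF factor.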

5) View the system information
Tools > System Info...

6) Remove files from the system
Tools > Remove Files...

7) Coalesce sub data collections
Because the main idea of the system is the use of overflow blocks, the sub data collections can be coalesced to improve search performance.
Tools > Coalesce Sub Col...
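One way to picture coalescing is as a merge of several small indexes into one, so that a query consults a single structure instead of every sub collection. The sketch below is an assumption about the general technique, not the system's actual on-disk format: it supposes each sub collection keeps an in-memory inverted index (term to document IDs) with globally unique IDs, and CoalesceSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the actual IRSystem code.
class CoalesceSketch {
    /** Merges per-sub-collection inverted indexes (term -> document IDs) into a single index. */
    static Map<String, List<Integer>> coalesce(List<Map<String, List<Integer>>> subIndexes) {
        Map<String, List<Integer>> merged = new TreeMap<>();
        for (Map<String, List<Integer>> sub : subIndexes) {
            for (Map.Entry<String, List<Integer>> e : sub.entrySet()) {
                // Append this sub collection's postings to the merged list for the term.
                merged.computeIfAbsent(e.getKey(), t -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        // Sort each merged posting list so it can be scanned in document-ID order.
        for (List<Integer> postings : merged.values()) {
            Collections.sort(postings);
        }
        return merged;
    }
}
```

After the merge, each term has one posting list rather than one per sub collection, which is the sense in which coalescing can reduce the work done per search.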

8) Initialize the system
File > Init System ...

9) Set the view of the query results
View >

10) View the document content
Double-click a search result.