Document Handler
During the scanning process, every file is read from the Subversion repository. For each file, the scanner checks whether a special document handler exists for its document type, which is determined by the file extension.
The document handler receives the whole contents of the file and can process it however it likes. The basic idea is to scan files such as Word or Excel documents, using third-party libraries like POI to extract the text from them. The extracted information is stored in the index and can be searched later.
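The extension-based lookup described above could be sketched roughly as follows. This is only an illustration, not the actual implementation: the names `DocumentHandler`, `HandlerRegistry`, `extractText` and `scan` are hypothetical, and the only handler registered here is a trivial plain-text one. A handler for Word or Excel files would presumably delegate to a library such as POI inside `extractText`.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class DocumentHandlerSketch {

    /** Hypothetical contract: gets the whole file content, returns the text to index. */
    interface DocumentHandler {
        String extractText(byte[] content);
    }

    /** Hypothetical registry that maps a lower-case file extension to a handler. */
    static final class HandlerRegistry {
        private final Map<String, DocumentHandler> handlers = new HashMap<>();

        void register(String extension, DocumentHandler handler) {
            handlers.put(extension.toLowerCase(Locale.ROOT), handler);
        }

        /** Returns the handler for the path's extension, or empty if none is registered. */
        Optional<DocumentHandler> lookup(String path) {
            String name = path.substring(path.lastIndexOf('/') + 1);
            int dot = name.lastIndexOf('.');
            if (dot < 0 || dot == name.length() - 1) {
                return Optional.empty(); // no extension at all
            }
            String extension = name.substring(dot + 1).toLowerCase(Locale.ROOT);
            return Optional.ofNullable(handlers.get(extension));
        }
    }

    /** Returns the text to store in the index, or empty if no handler matches the file. */
    static Optional<String> scan(HandlerRegistry registry, String path, byte[] content) {
        return registry.lookup(path).map(handler -> handler.extractText(content));
    }

    public static void main(String[] args) {
        HandlerRegistry registry = new HandlerRegistry();
        // Assumption: plain-text files are UTF-8 encoded and indexed as-is.
        registry.register("txt", content -> new String(content, StandardCharsets.UTF_8));

        byte[] data = "hello index".getBytes(StandardCharsets.UTF_8);
        System.out.println(scan(registry, "trunk/docs/readme.TXT", data).orElse("(no handler)"));
        System.out.println(scan(registry, "trunk/image.png", data).orElse("(no handler)"));
    }
}
```

In this sketch a file without a matching handler simply yields no extracted text; how such files are treated by the real scanner is not covered here.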
If you have an idea for a particular document type, just give me a hint about it and about what kind of data would be of interest.