Rough membership; User preference-based document categorization; Discernibility of words
One of the problems that plague document ranking is the inherent ambiguity, which arises due to the nuances of natural language. Though two documents may contain the same set of words, their relevance may be very different to a single user, since the context of the words usually determines the relevance of a document. Context of a document is very difficult to model mathematically other than through user preferences. Since it is difficult to perceive all possible user interests a priori and install filters for the same at the server side, we propose a rough-set-based document filtering scheme which can be used to build customized filters at the user end. The documents retrieved by a traditional search engine can then be filtered automatically by this agent and the user is not flooded with a lot of irrelevant material. A rough-set-based classificatory analysis is used to learn the user's bias for a category of documents. This is then used to filter out irrelevant documents for the user. To do this we have proposed the use of novel rough membership functions for computing the membership of a document to various categories.