Volume 5, Issue 12



Abstract : Fuzzy document representation transforms unstructured text into numerical vectors, a form better suited to text classification and document clustering. The proposed Fuzzy Conceptualization Model (FCM) performs conceptualization and provides an improved data representation based on the semantic relatedness and similarity between terms in a word corpus. Word embeddings are used to group semantically related words into concept clusters. These clusters are inferred and vectorized for the given corpus so that the data are held in a multidimensional space. FCM determines the fuzzy membership value of a base term by calculating the affinity score between its word embedding and the other word embeddings. A weighting scheme distinguishes exact matches from approximate ones by considering the normalized term frequency of a term in the specified concept cluster together with the term's actual presence. The greatest bound of the base set's distribution over the documents identifies the best-matched documents for a search query. The resultant matrix is a lower-dimensional, discriminative representation of the data. Moving the data points, both discriminated and non-discriminated, along an affine vector clusters them well while anchoring each data point to its previous milestones. The proposed model is useful for information retrieval with short and vague keywords. Experimental analysis of FCM on synthetic and real data sets shows high accuracy.
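The membership and ranking steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Cosine similarity stands in for the affinity score. The toy embeddings and the `THRESHOLD` and `ALPHA` values are hypothetical. "Greatest bound" is read here as the maximum weighted membership of any concept-cluster term actually present in a document.

```python
import math
from collections import Counter

# Hypothetical toy 3-d embeddings; a real system would use trained word vectors.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.95],
}

THRESHOLD = 0.6  # assumed minimum affinity for a term to join a concept cluster
ALPHA = 0.5      # assumed down-weighting of approximate (non-exact) matches


def cosine(u, v):
    """Affinity score, assumed here to be cosine similarity of two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def membership(base, term):
    """Fuzzy membership of `term` in the concept of `base`, clipped to [0, 1]."""
    if term == base:
        return 1.0
    if base not in EMBEDDINGS or term not in EMBEDDINGS:
        return 0.0
    return max(0.0, cosine(EMBEDDINGS[base], EMBEDDINGS[term]))


def concept_cluster(base):
    """Map each term whose membership meets THRESHOLD to its membership value."""
    cluster = {}
    for term in set(EMBEDDINGS) | {base}:
        mu = membership(base, term)
        if mu >= THRESHOLD:
            cluster[term] = mu
    return cluster


def doc_score(base, doc_tokens):
    """Greatest weighted membership of the base term's concept in one document."""
    counts = Counter(doc_tokens)
    best = 0.0
    for term, mu in concept_cluster(base).items():
        if counts[term] == 0:
            continue  # the term must actually be present in the document
        ntf = counts[term] / len(doc_tokens)       # normalized term frequency
        weight = 1.0 if term == base else ALPHA    # exact vs approximate match
        best = max(best, weight * mu * ntf)
    return best


def rank(query_term, docs):
    """Return document names ordered from best to worst match."""
    scores = {name: doc_score(query_term, tokens) for name, tokens in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": ["car", "engine", "car"],
    "d2": ["automobile", "dealer"],
    "d3": ["banana", "fruit"],
}

if __name__ == "__main__":
    print(rank("car", docs))  # ['d1', 'd2', 'd3']
```

In this sketch an exact occurrence of the query term ("car" in `d1`) outranks a close synonym ("automobile" in `d2`), and an unrelated document (`d3`) scores zero because "banana" falls below the membership threshold. Taking the maximum rather than a sum keeps each score a single fuzzy degree in [0, 1].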

Pages : 12-20



Modified Date : 2018-12-31

Citation :

SIJIN P, DR. CHAMPA H. N., "FUZZY DOCUMENT REPRESENTATION FOR SEARCH DIVERSIFICATION", IJIERT - International Journal of Innovations in Engineering Research and Technology, Volume 5, Issue 12, ISSN: 2394-3696, Page No. 12-20