Automatic discovery of similar words by substitute vectors
Yildiz Technical University, Computer Engineering Department, Esenler-ISTANBUL
Sigma J Eng Nat Sci 2016; 34(1): 125-133
Abstract
Patterns between words are commonly used for automatic information extraction. However, patterns can only find related words that occur close to each other. In this study, a method based on substitute vectors is proposed to overcome this limitation. First, sets of words that share the same substitute vector are constructed. Then, similar word sets are obtained according to the number of sets in which words co-occur. In these sets, the ratio of semantically related words is above 70%. The proposed method is unsupervised because it does not require any manually labeled seed words.
Keywords: Automatic information extraction, discovery of similar words, synonyms, near-synonyms, substitute vectors, natural language processing, artificial intelligence.
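The two steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: a word's context is taken to be its immediate left and right neighbors, words that fill the same context form one set, and two words count as similar when they appear together in at least `min_shared` sets. The function names, the toy corpus, and the `min_shared` threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_substitute_sets(sentences):
    """Group words by the (left neighbor, right neighbor) slot they fill.

    Assumption: this slot stands in for the substitute vector described
    in the abstract; the paper may define the context differently.
    """
    slots = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens) - 1):
            slots[(tokens[i - 1], tokens[i + 1])].add(tokens[i])
    # Only slots filled by at least two different words are informative.
    return [words for words in slots.values() if len(words) > 1]


def count_shared_sets(word_sets):
    """Count, for every word pair, the number of sets containing both words."""
    counts = defaultdict(int)
    for words in word_sets:
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    return counts


def similar_pairs(sentences, min_shared=2):
    """Return word pairs that co-occur in at least `min_shared` sets."""
    counts = count_shared_sets(build_substitute_sets(sentences))
    return sorted(pair for pair, n in counts.items() if n >= min_shared)


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the big dog barked",
    "the large dog barked",
    "a big house stood",
    "a large house stood",
    "the small dog barked",
]

print(similar_pairs(corpus))
```

On this toy corpus, "big" and "large" fill two common slots while "small" shares only one with each of them, so raising `min_shared` trades coverage for precision. No seed words are supplied at any point, which is what makes the procedure unsupervised.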