Accelerating Text Mining Using Domain-Specific Stop Word Lists

« (…) In this paper, we present a novel mathematical approach for the automatic extraction of domain-specific words called the hyperplane-based approach. This new approach depends on the notion of low dimensional representation of the word in vector space and its distance from hyperplane. The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features. We compare the hyperplane-based approach with other feature selection methods, namely \c{hi}2 and mutual information. (…) »

source > arxiv.org, Farah Alshanik, Amy Apon, Alexander Herzog, Ilya Safro, Justin Sybrandt, 18 novembre 2020

Accueil