Effective Text Classification Through Supervised Rough Set-Based Term Weighting

Publisher

MDPI

Access Rights

info:eu-repo/semantics/openAccess

Abstract

This research presents an innovative approach to text mining based on rough set theory. The study uses the concept of symmetry from rough set theory to construct indiscernibility matrices and to model uncertainty in data analysis, so that both the methodological structure and the solution process remain symmetric. Effective management and analysis of large-scale textual data relies heavily on automated text classification, and term weighting plays a crucial role in determining classification performance. Supervised term weighting methods, which use class information, have emerged as the most effective approaches. However, the optimal representation of class-term relationships remains an area requiring further research. This study proposes the Rough Multivariate Weighting Scheme (RMWS) and its mathematical derivative, the Square Root Rough Multivariate Weighting Scheme (SRMWS). The RMWS model employs rough sets to identify information-carrying documents within the document-term-class space and adopts a computational methodology that incorporates alpha, beta, and gamma coefficients. It also captures how each term is distributed among the classes. Comprehensive experiments were conducted on three datasets with imbalanced-multiclass, balanced-multiclass, and imbalanced-binary class structures to evaluate the model's effectiveness. The results show that RMWS and SRMWS outperform existing approaches on both balanced and imbalanced datasets, and that their performance is not affected by class imbalance or by the number of classes. Furthermore, SRMWS is the most effective method for the SVM and KNN classifiers, while RMWS achieves the best results for the NB classifier. These results show that the proposed methods significantly improve text classification performance.
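
To make the rough set vocabulary in the abstract concrete, the sketch below computes the standard lower and upper approximations of a class, using the indiscernibility relation induced by whether a document contains a given term. It is a generic textbook construction, not the authors' RMWS or SRMWS formula: the abstract does not state how the alpha, beta, and gamma coefficients are defined, so they are not modeled here. The function name `approximations` and the toy corpus are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def approximations(docs, labels, term, target_class):
    """Rough set lower/upper approximation of a class with respect to one term.

    Two documents are treated as indiscernible when they agree on whether
    they contain `term` (an illustrative assumption, not the paper's setup).
    """
    # Partition document indices into indiscernibility blocks.
    blocks = defaultdict(set)
    for i, tokens in enumerate(docs):
        blocks[term in tokens].add(i)

    # Documents that actually belong to the target class.
    target = {i for i, y in enumerate(labels) if y == target_class}

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # block lies entirely inside the class
            lower |= block
        if block & target:    # block overlaps the class
            upper |= block
    return lower, upper

# Toy corpus (hypothetical data): each document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "match"}, {"vote", "law"}]
labels = ["sport", "sport", "politics", "politics"]

print(approximations(docs, labels, "goal", "sport"))
print(approximations(docs, labels, "match", "sport"))
```

In this toy corpus, "goal" yields identical lower and upper approximations for the "sport" class, so the term discerns the class exactly, whereas "match" yields an empty lower approximation and an upper approximation covering every document, so it carries no class information on its own.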

Keywords

text classification, term weighting, rough set, supervised learning, natural language processing

Source

Symmetry-Basel

Volume

17

Issue

1
