A new metric for feature selection on short text datasets

dc.contributor.author: Çekik, Rasim
dc.contributor.author: Uysal, Alper Kurşat
dc.date.accessioned: 2022-11-24T12:12:00Z
dc.date.available: 2022-11-24T12:12:00Z
dc.date.issued: 2022 [en_US]
dc.department: Faculties, Faculty of Engineering, Department of Computer Engineering [en_US]
dc.description.abstract: In recent years, short texts have become ubiquitous, especially on social media networks. Short text classification is an essential task for many applications that operate on short text documents. In many cases, using the entire feature set causes a high-dimensionality problem in short text data. This problem makes processing time-consuming and negatively impacts classifier performance. This study presents an effective feature selection algorithm called the XY method, which represents the features on the XY line and calculates the distance of each feature to the XY line. A value named λ is also calculated. According to this value, the terms are divided into different regions, such as negative, positive, and third, to determine their discrimination capability. The novel XY method aims to select as few terms as possible in the negative region. The proposed method is evaluated on four different short text datasets using the Macro-F1 success measure. Comparisons with other existing feature selection algorithms, such as chi-square, information gain, deviation from Poisson distribution, the recently proposed max-min ratio, and distinguishing feature selector, demonstrate that the XY method achieves either better or competitive performance at various significantly reduced feature sizes. [en_US]
dc.identifier.citation: Cekik, R., & Uysal, A. K. (2022). A new metric for feature selection on short text datasets. Concurrency and Computation: Practice and Experience, e6909. [en_US]
dc.identifier.doi: 10.1002/cpe.6909 [en_US]
dc.identifier.issue: 13 [en_US]
dc.identifier.orcid: 0000-0002-7820-413X [en_US]
dc.identifier.scopus: 2-s2.0-85126295552
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1002/cpe.6909
dc.identifier.uri: https://hdl.handle.net/11503/2058
dc.identifier.volume: 34 [en_US]
dc.identifier.wos: WOS:000769577400001
dc.identifier.wosquality: Q3
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.institutionauthor: Çekik, Rasim
dc.language.iso: en
dc.publisher: WILEY [en_US]
dc.relation.ispartof: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: feature selection [en_US]
dc.subject: short text classification [en_US]
dc.subject: text mining [en_US]
dc.title: A new metric for feature selection on short text datasets [en_US]
dc.type: Article

Files

Original bundle

Name: Concurrency and Computation - 2022 - Cekik - A new metric for feature selection on short text datasets.pdf
Size: 2.85 MB
Format: Adobe Portable Document Format
Description: Full Text / Article
