A new metric for feature selection on short text datasets

Çekik, Rasim; Uysal, Alper Kurşat

doi:10.1002/cpe.6909

Files

Concurrency and Computation - 2022 - Cekik - A new metric for feature selection on short text datasets.pdf (2.85 MB)

Date

2022

Authors

Çekik, Rasim

Uysal, Alper Kurşat

Publisher

WILEY

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In recent years, short texts have become ubiquitous, especially on social media networks. Short text classification is an essential task for many applications that operate on short text documents. In many cases, using the entire feature set leads to a high-dimensionality problem in short text data, which makes classification time-consuming and degrades classifier performance. This study presents an effective feature selection algorithm, called the XY method, which represents the features on an XY line and calculates the distance of each feature to that line. A value named λ is also calculated; according to this value, the terms are divided into different regions (negative, positive, and a third region) to determine their discrimination capability. The novel XY method aims to select as few terms as possible from the negative region. The proposed method is evaluated on four different short text datasets using the Macro-F1 measure. Comparisons with existing feature selection algorithms such as chi-square, information gain, deviation from Poisson distribution, the recently proposed max-min ratio, and the distinguishing feature selector demonstrate that the XY method achieves either better or competitive performance across various, significantly reduced feature set sizes.
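The abstract reports results using Macro-F1, which is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many documents it contains. The snippet below is a minimal sketch of that computation, not the authors' evaluation code; the function name `macro_f1` is hypothetical, and the convention of scoring a class with zero precision and recall as F1 = 0 is an assumption.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores (Macro-F1).

    Hypothetical helper for illustration; classes appearing in either
    y_true or y_pred are included, and undefined F1 is treated as 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # class t was missed
    f1_scores = []
    for c in labels:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        precision = tp[c] / p_den if p_den else 0.0
        recall = tp[c] / r_den if r_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Macro averaging: each class contributes equally.
    return sum(f1_scores) / len(f1_scores)


y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]
print(macro_f1(y_true, y_pred))  # per-class F1: pos = 2/3, neg = 4/5; mean = 11/15
```

Because Macro-F1 averages over classes rather than documents, a feature subset that helps only the majority class will not raise the score much, which is why it is a common choice for comparing feature selectors on imbalanced text data.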

Keywords

feature selection, short text classification, text mining

Source

Concurrency and Computation: Practice and Experience

WoS Quartile

Q3

Scopus Quartile

Q1

Volume

34

Issue

13

Citation

Cekik, R., & Uysal, A. K. (2022). A new metric for feature selection on short text datasets. Concurrency and Computation: Practice and Experience, 34(13), e6909.

Links

https://doi.org/10.1002/cpe.6909
https://hdl.handle.net/11503/2058

Collections

WoS-Indexed Publications Collection
Department of Computer Engineering
Scopus-Indexed Publications Collection
