Using generative adversarial network to improve the accuracy of detecting AI-generated tweets

Cited by: 1

Authors:

Hui, Yang [1]

Affiliations:

[1] Zhengzhou Shengda Univ, Sch Humanities & Law, Zhengzhou 451191, Henan, Peoples R China

Source:

SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01

Keywords:

Artificial Intelligence; AI-generated tweets; Generative adversarial network; Random forest; Text analysis;

DOI:

10.1038/s41598-024-78601-1

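Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];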

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

This paper presents a novel approach that uses state-of-the-art generative Artificial Intelligence (AI) models to improve the accuracy of machine learning methods in detecting AI-generated texts. Generative capabilities are combined with ensemble-based learning methods to characterize the attributes of generated text precisely. The proposed methodology has four steps. The first step is preprocessing, which removes irrelevant data through noise removal, text tokenization, stop-word removal, word normalization, and handling of uncommon words. The second step is feature engineering and text representation, in which every preprocessed text is represented by a square matrix that encodes word correlations, co-occurrence, and word weights. The third step is Generative Adversarial Network (GAN)-based feature extraction: a GAN model is used to extract features that are effective for classifying texts by creator type, and its discriminator is then turned into a strong feature extraction model. The fourth step is weighted Random Forest (RF)-based detection, in which the features extracted by the GAN discriminator serve as input to the RF-based detection model. The approach captures the differences between human-written and AI-generated texts and achieves an average accuracy of 99.60%, a 1.5% improvement over the comparative methods.
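The first two steps described in the abstract (preprocessing and the square-matrix text representation) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the stop-word list, the `<unk>` placeholder for uncommon words, the fixed vocabulary, the window size, and the choice to store term frequency on the diagonal as a simple word weight are all assumptions made for illustration. The GAN and Random Forest stages are not covered here.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "it"}
UNK = "<unk>"  # assumed placeholder for uncommon (out-of-vocabulary) words


def preprocess(text, vocab):
    """Noise removal, tokenization, stop-word removal, normalization, rare-word handling."""
    text = text.lower()                                   # simple normalization
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)     # drop URLs, mentions, hashtags
    tokens = re.findall(r"[a-z]+", text)                  # keep alphabetic tokens only
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop-words
    return [t if t in vocab else UNK for t in tokens]     # collapse uncommon words


def square_matrix(tokens, vocab, window=2):
    """Build a square matrix over the vocabulary plus UNK.

    Off-diagonal cells hold symmetric co-occurrence counts within `window`
    following tokens; diagonal cells hold relative term frequency (an
    assumed stand-in for the paper's word weights).
    """
    index = {w: i for i, w in enumerate(sorted(vocab) + [UNK])}
    n = len(index)
    matrix = [[0.0] * n for _ in range(n)]
    for word, count in Counter(tokens).items():
        matrix[index[word]][index[word]] = count / len(tokens)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            a, b = index[word], index[tokens[j]]
            if a != b:
                matrix[a][b] += 1.0
                matrix[b][a] += 1.0
    return matrix


if __name__ == "__main__":
    vocab = {"model", "great", "writes"}
    tweet = "The model is GREAT, the model writes tweets! https://t.co/x @bot"
    tokens = preprocess(tweet, vocab)
    for row in square_matrix(tokens, vocab):
        print(row)
```

Because the matrix dimensions depend only on the vocabulary, every text maps to an input of the same shape, which is what a downstream feature extractor such as a GAN discriminator would typically require.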


Pages: 16
