Predicting Author's Native Language Using Abstracts of Scholarly Papers

Cited by: 1

Authors:

Baba, Takahiro [1]

Baba, Kensuke [2]

Ikeda, Daisuke [1]

Affiliations:

[1] Kyushu Univ, Fukuoka 8190395, Japan

[2] Fujitsu Labs, Kawasaki, Kanagawa 2118588, Japan

Source:

FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2018) | 2018 / Vol. 11177

Keywords:

Native language identification; Document classification; Text analysis; Machine learning;

DOI:

10.1007/978-3-030-01851-1_43

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Predicting an author's attributes is useful for understanding the implicit meaning of documents. This paper addresses the problem of predicting the author's native language for each document. The authors used surface-level features of documents and sought to clarify practical tendencies in writing style in terms of word occurrences. They classified the English-language abstracts of approximately 85,000 scholarly papers written in English or in Japanese. In the experiment, the binary classification achieved an accuracy of 0.97, and a number of the distinctive phrases used in the classification were found to be related to typical Japanese writing styles.
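As a rough illustration of what a binary classification over word occurrences can look like, the sketch below trains a multinomial Naive Bayes model on word unigrams and bigrams and ranks the phrases that most favor one class. This is an assumption made for illustration only: the abstract does not name the classifier or the exact features used, and the labels "A" and "B", the toy documents, and all function names are hypothetical placeholders rather than data or code from the paper.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase word n-grams (n = 1..n_max) used as surface-level features."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def train(docs):
    """docs: list of (text, label). Returns (feature counts per label, doc counts, vocab)."""
    feat_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        feat_counts[label].update(ngrams(text))
    vocab = set()
    for counts in feat_counts.values():
        vocab.update(counts)
    return feat_counts, doc_counts, vocab


def log_prob(feature, label, feat_counts, vocab):
    """Add-one smoothed log P(feature | label)."""
    total = sum(feat_counts[label].values())
    return math.log((feat_counts[label][feature] + 1) / (total + len(vocab)))


def predict(text, model):
    """Return the label with the highest Naive Bayes score for the text."""
    feat_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        score = math.log(doc_counts[label] / n_docs)  # class prior
        for f in ngrams(text):
            if f in vocab:  # ignore features never seen in training
                score += log_prob(f, label, feat_counts, vocab)
        scores[label] = score
    return max(sorted(scores), key=lambda lab: scores[lab])


def distinctive(model, label, other, k=3):
    """Top-k features ranked by smoothed log-likelihood ratio of label vs. other."""
    feat_counts, _, vocab = model

    def ratio(f):
        return (log_prob(f, label, feat_counts, vocab)
                - log_prob(f, other, feat_counts, vocab))

    return sorted(vocab, key=lambda f: (-ratio(f), f))[:k]


# Hypothetical toy corpus with artificial tokens; not data from the paper.
toy_docs = [
    ("alpha beta gamma alpha beta", "A"),
    ("alpha beta delta", "A"),
    ("gamma delta epsilon delta", "B"),
    ("epsilon delta gamma", "B"),
]
model = train(toy_docs)
print(predict("alpha beta", model))
print(distinctive(model, "A", "B"))
```

On real data, the two labels would correspond to the two author groups, and the ranked features would play the role of the distinctive phrases the abstract mentions; a sketch of this kind says nothing about how the reported accuracy was obtained.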


Pages: 448-453

Page count: 6

