Predicting Author's Native Language Using Abstracts of Scholarly Papers

Cited by: 1
Authors
Baba, Takahiro [1]
Baba, Kensuke [2]
Ikeda, Daisuke [1]
Affiliations
[1] Kyushu Univ, Fukuoka 8190395, Japan
[2] Fujitsu Labs, Kawasaki, Kanagawa 2118588, Japan
Source
FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2018) | 2018 / Vol. 11177
Keywords
Native language identification; Document classification; Text analysis; Machine learning;
DOI
10.1007/978-3-030-01851-1_43
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predicting an author's attributes is useful for understanding the implicit meanings of documents. The target problem of this paper is predicting the author's native language for each document. The authors of this paper used surface-level features of documents for the problem and tried to clarify the practical tendencies of writing style in terms of word occurrences. They classified the English-language abstracts of approximately 85,000 scholarly papers written in English or in Japanese. In the experiment, the accuracy of the binary classification was 0.97, and they found that a number of distinctive phrases used in the classification were related to typical writing styles of Japanese.
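The abstract describes a binary classification over word-occurrence features but does not name the classifier or the exact features. As an illustration only, the following is a minimal sketch of that kind of pipeline, assuming word unigrams and bigrams as the surface-level features and a multinomial naive Bayes model with add-one smoothing. The labels "ja" and "en", the function names, and the four training sentences are hypothetical and are not taken from the paper or its data.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return lowercased word n-grams (n = 1..n_max) as surface-level features."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def train(docs):
    """Count features per label. docs is a list of (text, label) pairs."""
    feature_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        feature_counts.setdefault(label, Counter()).update(ngrams(text))
    return feature_counts, doc_counts


def predict(text, feature_counts, doc_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set().union(*feature_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in feature_counts.items():
        total = sum(counts.values())
        # Log prior from the share of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        for feat in ngrams(text):
            # Add-one smoothing so unseen n-grams do not zero out the score.
            score += math.log((counts[feat] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: "ja" = abstract of a paper written in Japanese,
# "en" = abstract of a paper written in English.
toy_docs = [
    ("in this paper we propose a method", "ja"),
    ("as a result the accuracy was improved", "ja"),
    ("we show that the model generalizes", "en"),
    ("here we report a novel finding", "en"),
]
model = train(toy_docs)
print(predict("as a result we propose a method", *model))
```

With a model of this kind, the n-grams whose smoothed probabilities differ most between the two labels are natural candidates for the "distinctive phrases" the abstract mentions, although the paper's own procedure for finding them may differ.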
Pages: 448-453
Page count: 6