Entropy in different text types

Cited by: 21
Authors
Chen, Ruina [1, 2]
Liu, Haitao [2, 3]
Altmann, Gabriel [4]
Affiliations
[1] Guizhou Univ, Dept Foreign Languages, Guiyang, Guizhou, Peoples R China
[2] Zhejiang Univ, Dept Linguist, Hangzhou 310058, Zhejiang, Peoples R China
[3] Zhejiang Univ, Ningbo Inst Technol, Hangzhou, Zhejiang, Peoples R China
[4] Ruhr Univ Bochum, Bochum, Germany
DOI
10.1093/llc/fqw008
Chinese Library Classification number
C [General Social Sciences]
Discipline classification code
03; 0303
Abstract
The present investigation is an attempt to investigate how the unique linguistic profile of different text types can be reflected in their respective entropy characteristics. With samples from the Lancaster Corpus of Mandarin Chinese and the Freiburg-Brown corpus of American English, the research investigates entropy performances in two dimensions: the relative entropy of words and their part-of-speech (POS) on different sentential positions, and entropy of aspect markers. Our research yields the following results: First, it shows a strikingly similar distribution pattern in Chinese and English concerning the relative entropy of word-forms and POS-forms on different sentential positions. The relative entropy of word-forms in descending order yields: news > essays > official > academic > fiction, and the POS-forms yields: fiction > essays > news > academic > official. The relative entropy of POS-forms may be a more reliable indicator of syntactical differences, which helps to distinguish dichotomous 'narrative vs. expository' text types in both Chinese and English. Second, there exists a cross-linguistic difference concerning entropy of aspect markers, namely, Chinese displays higher relative entropy than English. This indicates that aspect-marking in terms of variation is more prominent in Chinese grammar than in English. The 'narrative vs. expository distinction' is also identified by entropy of aspect markers in both Chinese and English, though more obviously in Chinese.
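The abstract does not spell out how relative entropy is computed. A minimal sketch follows, assuming the common quantitative-linguistics definition: Shannon entropy divided by its maximum, log2 of the number of distinct forms. The function name `relative_entropy` and the toy POS tags are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def relative_entropy(items):
    """Shannon entropy of the item distribution divided by its maximum.

    Assumed definition (not confirmed by the abstract):
    H_rel = H / log2(K), where K is the number of distinct items.
    Returns 0.0 when fewer than two distinct items occur.
    """
    counts = Counter(items)
    total = sum(counts.values())
    k = len(counts)
    if k < 2:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(k)


# Toy data: POS tags observed at the first position of four sentences.
first_position_tags = ["N", "N", "N", "V"]
print(round(relative_entropy(first_position_tags), 4))
```

Under this assumed definition the value lies between 0 (one form dominates completely) and 1 (all forms equally frequent), so samples with different inventory sizes can be compared on the same scale.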
Pages: 528-542
Page count: 15