Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits

Cited by: 11
Authors
Berube, Nicolas [1]
Sainte-Marie, Maxime [1]
Mongeon, Philippe [2]
Lariviere, Vincent [1, 3]
Affiliations
[1] Univ Montreal, Ecole Bibliothecon & Sci Informat, CP 6128,Succ Ctr Ville, Montreal, PQ H3C 3J7, Canada
[2] Leiden Univ, Ctr Sci & Technol Studies, POB 905, NL-2300 AX Leiden, Netherlands
[3] Univ Quebec Montreal, CIRST, OST, CP 8888,Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada
Keywords
SCIENCE; GROWTH; LAW; LINGUISTICS; STATISTICS; DYNAMICS; ARTICLES; LANGUAGE; LIBRARY
DOI
10.1371/journal.pone.0197775
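Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract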
This research assesses the evolution of lexical diversity in scholarly titles using a new indicator based on fits to the tail of Zipfian frequency-rank distributions. At the operational level, both head and tail fits of Zipfian word distributions are less dependent on corpus size than other lexical diversity indicators, but tail fits clearly outperform head fits in that regard. This benchmark-setting performance of Zipfian distribution tails proves highly useful for distinguishing actual patterns in lexical diversity from the statistical noise that other indicators generate as corpus size fluctuates. From an empirical perspective, analysis of Web of Science (WoS) article titles from 1975 to 2014 shows that the lexical concentration of titles in Natural Sciences & Engineering (NSE) and Social Sciences & Humanities (SSH) articles increases by a little less than 8% over the whole period. With the exception of the lexically concentrated Mathematics, Earth & Space, and Physics, all NSE disciplines saw their article titles increase in lexical concentration, suggesting a probable convergence of concentration levels in the near future. As regards SSH disciplines, aggregation effects observed at the disciplinary-group level suggest that, behind the stable concentration levels of SSH disciplines, a cross-disciplinary homogenization of the highest word-frequency ranks may be at work. Overall, these trends suggest a progressive standardization of wording in scientific article titles, as titles are written using an increasingly restricted and cross-disciplinary set of words.
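The abstract does not spell out how the head and tail fits are computed, so the following is only a minimal sketch of what a frequency-rank fit can look like. It assumes an ordinary least-squares regression of log frequency on log rank, with the "tail" taken as every rank from a chosen cutoff onward and the "head" as ranks 1 up to a cutoff. The function names, the tokenizer, and the `tail_start` / `head_end` cutoffs are all hypothetical; the authors' actual tail definition, fitting method, and normalization may differ. Under this assumption, a larger fitted exponent means frequencies fall off faster with rank, which is commonly read as a more concentrated (less diverse) vocabulary.

```python
import math
import re
from collections import Counter


def rank_frequency(titles):
    """Word frequencies sorted in descending order, so index 0 holds rank 1."""
    # Hypothetical tokenizer: lowercase alphabetic runs only; the paper's
    # actual tokenization (stemming, stop words, etc.) is not specified here.
    counts = Counter(
        word for title in titles for word in re.findall(r"[a-z]+", title.lower())
    )
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs, start_rank, end_rank=None):
    """OLS slope of log(frequency) on log(rank) for ranks start_rank..end_rank (1-based, inclusive)."""
    if end_rank is None:
        end_rank = len(freqs)
    if start_rank < 1 or end_rank > len(freqs) or end_rank - start_rank < 1:
        raise ValueError("need at least two valid ranks to fit a slope")
    ranks = range(start_rank, end_rank + 1)
    xs = [math.log(r) for r in ranks]
    ys = [math.log(freqs[r - 1]) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def tail_exponent(freqs, tail_start):
    """Zipf-style exponent (as a positive number) fitted from tail_start to the last rank."""
    return -loglog_slope(freqs, tail_start)


def head_exponent(freqs, head_end):
    """Same fit restricted to ranks 1..head_end, for comparison with the tail fit."""
    return -loglog_slope(freqs, 1, head_end)


if __name__ == "__main__":
    # Idealized Zipfian frequencies f(r) = 1000 * r**-1.5 for ranks 1..200.
    freqs = [1000 * r ** -1.5 for r in range(1, 201)]
    print(round(head_exponent(freqs, 20), 6), round(tail_exponent(freqs, 50), 6))
```

On this idealized input both fits recover the exponent 1.5 up to floating-point rounding. On real title corpora the tail is dominated by tied low frequencies (many words occurring only once), so the choice of cutoff, and whether to fit ranks at all rather than some other representation of the distribution, would matter a great deal; the sketch leaves those choices as assumptions.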
Pages: 31