Global-scale phylogenetic linguistic inference from lexical resources

Cited by: 35
Authors
Jaeger, Gerhard [1]
Affiliations
[1] Tubingen Univ, Inst Linguist, Wilhelmstr 19, D-72074 Tubingen, Germany
Keywords
SEQUENCE; EVOLUTION; EXPANSION; LANGUAGES; BIOLOGY; RECONSTRUCTION; BIOINFORMATICS; LIKELIHOOD; ALGORITHM; ALIGNMENT;
DOI
10.1038/sdata.2018.189
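Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract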
Automatic phylogenetic inference plays an increasingly important role in computational historical linguistics. Most pertinent work is currently based on expert cognate judgments. This limits the scope of this approach to a small number of well-studied language families. We used machine learning techniques to compile data suitable for phylogenetic inference from the ASJP database, a collection of almost 7,000 phonetically transcribed word lists over 40 concepts, covering two thirds of the extant world-wide linguistic diversity. First, we estimated Pointwise Mutual Information scores between sound classes using weighted sequence alignment and general-purpose optimization. From this we computed a dissimilarity matrix over all ASJP word lists. This matrix is suitable for distance-based phylogenetic inference. Second, we applied cognate clustering to the ASJP data, using supervised training of an SVM classifier on expert cognacy judgments. Third, we defined two types of binary characters, based on automatically inferred cognate classes and on sound-class occurrences. Several tests are reported demonstrating the suitability of these characters for character-based phylogenetic inference.
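The first step in the abstract, estimating Pointwise Mutual Information (PMI) between sound classes, can be illustrated with a minimal sketch. It assumes the aligned segment pairs are already given, so it leaves out the weighted sequence alignment and the general-purpose optimization the abstract mentions. The function name `pmi_scores` and the toy input are hypothetical and are not taken from the paper or from the ASJP database.

```python
import math
from collections import Counter


def pmi_scores(aligned_pairs):
    """Return PMI for every ordered pair of sound classes seen aligned.

    aligned_pairs: iterable of (a, b) tuples, one per aligned segment pair.
    Assumption: alignments are treated as symmetric, so (a, b) also
    counts as (b, a).
    """
    joint = Counter()     # symmetric co-occurrence counts
    marginal = Counter()  # how often each sound class occurs at all
    for a, b in aligned_pairs:
        joint[(a, b)] += 1
        joint[(b, a)] += 1
        marginal[a] += 1
        marginal[b] += 1

    total = sum(marginal.values())  # equals twice the number of pairs
    scores = {}
    for (a, b), count in joint.items():
        p_ab = count / total
        p_a = marginal[a] / total
        p_b = marginal[b] / total
        # PMI > 0: the two classes align more often than chance predicts.
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


# Toy alignment data (hypothetical, for illustration only).
toy_pairs = [("p", "p"), ("p", "b"), ("t", "t"), ("t", "t")]
toy_scores = pmi_scores(toy_pairs)
```

Pairs that never occur in the input get no entry here. A fuller treatment would need smoothing so that unseen pairs receive a finite negative score.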
Pages: 16