A Markov model of the Indus script

Cited by: 22
Authors
Rao, Rajesh P. N. [1]
Yadav, Nisha [2, 3]
Vahia, Mayank N. [2, 3]
Joglekar, Hrishikesh
Adhikari, R. [4]
Mahadevan, Iravatham [5]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[2] Tata Inst Fundamental Res, Dept Astron & Astrophys, Bombay 400005, Maharashtra, India
[3] Ctr Excellence Basic Sci, Bombay 400098, Maharashtra, India
[4] Inst Math Sci, Madras 600113, Tamil Nadu, India
[5] Indus Res Ctr, Roja Muthiah Res Lib, Madras 600113, Tamil Nadu, India
Keywords
ancient scripts; archaeology; linguistics; machine learning; statistical analysis;
DOI
10.1073/pnas.0906237106
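Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification code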
07 ; 0710 ; 09 ;
Abstract
Although no historical information exists about the Indus civilization (flourished ca. 2600-1900 B.C.), archaeologists have uncovered about 3,800 short samples of a script that was used throughout the civilization. The script remains undeciphered, despite a large number of attempts and claimed decipherments over the past 80 years. Here, we propose the use of probabilistic models to analyze the structure of the Indus script. The goal is to reveal, through probabilistic analysis, syntactic patterns that could point the way to eventual decipherment. We illustrate the approach using a simple Markov chain model to capture sequential dependencies between signs in the Indus script. The trained model allows new sample texts to be generated, revealing recurring patterns of signs that could potentially form functional subunits of a possible underlying language. The model also provides a quantitative way of testing whether a particular string belongs to the putative language as captured by the Markov model. Application of this test to Indus seals found in Mesopotamia and other sites in West Asia reveals that the script may have been used to express different content in these regions. Finally, we show how missing, ambiguous, or unreadable signs on damaged objects can be filled in with most likely predictions from the model. Taken together, our results indicate that the Indus script exhibits rich syntactic structure and the ability to represent diverse content, both of which are suggestive of a linguistic writing system rather than a nonlinguistic symbol system.
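The three uses of the Markov chain named in the abstract (generating sample texts, scoring whether a string fits the model, and filling in an unreadable sign) can be illustrated with a minimal first-order sketch. Everything here is an assumption made for illustration, not the authors' implementation: the sign labels `A` to `E` and the toy corpus are hypothetical placeholders, add-one smoothing stands in for whatever smoothing the paper uses, and the function names are invented.

```python
import math
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary markers

def train_bigram(texts):
    """Count sign-to-sign transitions, including text boundaries."""
    counts = defaultdict(Counter)
    vocab = {END}
    for text in texts:
        seq = [START] + list(text) + [END]
        vocab.update(text)
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts, sorted(vocab)

def prob(counts, vocab, prev, cur, alpha=1.0):
    """P(cur | prev) with add-alpha smoothing (an assumed choice)."""
    total = sum(counts[prev].values())
    return (counts[prev][cur] + alpha) / (total + alpha * len(vocab))

def log_likelihood(counts, vocab, text):
    """Log-probability of a whole text under the chain."""
    seq = [START] + list(text) + [END]
    return sum(math.log(prob(counts, vocab, p, c))
               for p, c in zip(seq, seq[1:]))

def generate(counts, vocab, rng, max_len=10):
    """Sample a new text sign by sign until END or max_len."""
    out, prev = [], START
    for _ in range(max_len):
        weights = [prob(counts, vocab, prev, s) for s in vocab]
        cur = rng.choices(vocab, weights=weights)[0]
        if cur == END:
            break
        out.append(cur)
        prev = cur
    return out

def fill_missing(counts, vocab, text, missing="?"):
    """Replace one unreadable sign with the most likely candidate."""
    i = text.index(missing)
    candidates = [s for s in vocab if s != END]
    return max(candidates, key=lambda s: log_likelihood(
        counts, vocab, text[:i] + [s] + text[i + 1:]))

# Toy corpus of hypothetical sign sequences (not real Indus data).
corpus = [["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"], ["A", "C"]]
counts, vocab = train_bigram(corpus)

sample = generate(counts, vocab, random.Random(0))
seen_order = log_likelihood(counts, vocab, ["A", "B", "C"])
reversed_order = log_likelihood(counts, vocab, ["C", "B", "A"])
restored = fill_missing(counts, vocab, ["A", "?", "C"])
```

In this toy setting, a sequence that follows the corpus ordering scores a higher log-likelihood than its reversal, which is the kind of comparison the abstract describes for seals found in West Asia. A real analysis would need the actual sign corpus and a more careful smoothing scheme.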
Pages: 13685-13690
Number of pages: 6