Local similarity and global variability characterize the semantic space of human languages

Times cited: 5
Authors
Lewis, Molly [1]
Cahill, Aoife [2]
Madnani, Nitin [3]
Evans, James [4, 5]
Affiliations
[1] Carnegie Mellon Univ, Psychol & Social & Decis Sci, Pittsburgh, PA 15213 USA
[2] Dataminr Inc, New York, NY 10016 USA
[3] Educ Testing Serv, Princeton, NJ 08541 USA
[4] Univ Chicago, Sociol & Data Sci, Chicago, IL 60637 USA
[5] Santa Fe Inst, Santa Fe, NM 87501 USA
Keywords
human cognition; language; semantics; culture; communication; BODY; CATEGORIES; ENGLISH; COLOR; SPECIFICITY; SENSITIVITY; EVOLUTION; PATTERNS; MEANINGS; REFLECTS
DOI
10.1073/pnas.2300986120
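Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]
Discipline classification code
07; 0710; 09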
Abstract
How does meaning vary across the world's languages? Scholars recognize the existence of substantial variability within specific domains, ranging from nature and color to kinship. The emergence of large language models enables a systems-level approach that directly characterizes this variability through comparison of word organization across semantic domains. Here, we show that meanings across languages manifest lower variability within semantic domains and greater variability between them, using models trained on both 1) large corpora of native language text comprising Wikipedia articles in 35 languages and also 2) Test of English as a Foreign Language (TOEFL) essays written by 38,500 speakers from the same native languages, which cluster into semantic domains. Concrete meanings vary less across languages than abstract meanings, but all vary with geographical, environmental, and cultural distance. By simultaneously examining local similarity and global difference, we harmonize these findings and provide a description of general principles that govern variability in semantic space across languages. In this way, the structure of a speaker's semantic space influences the comparisons cognitively salient to them, as shaped by their native language, and suggests that even successful bilingual communicators likely think with "semantic accents" driven by associations from their native language while writing English. These findings have dramatic implications for language education, cross-cultural communication, and literal translations, which are impossible not because the objects of reference are uncertain, but because associations, metaphors, and narratives interlink meanings in different, predictable ways from one language to another.
Pages: 10