Dictionaries and distributions: Combining expert knowledge and large scale textual data content analysis

Cited by: 94
Authors
Garten, Justin [1]
Hoover, Joe [1]
Johnson, Kate M. [1]
Boghrati, Reihane [1]
Iskiwitch, Carol [1]
Dehghani, Morteza [1]
Affiliations
[1] University of Southern California, Computational Social Science Laboratory, Los Angeles, CA 90089, USA
Funding
National Science Foundation (USA)
Keywords
Methodological innovation; Text analysis; Semantic representation; Dictionary-based text analysis; INFORMATION; SIMILARITY;
DOI
10.3758/s13428-017-0875-9
Chinese Library Classification (CLC) number
B841 [Research methods in psychology]
Discipline classification code
040201
Abstract
Theory-driven text analysis has made extensive use of psychological concept dictionaries, leading to a wide range of important results. These dictionaries have generally been applied through word count methods which have proven to be both simple and effective. In this paper, we introduce Distributed Dictionary Representations (DDR), a method that applies psychological dictionaries using semantic similarity rather than word counts. This allows for the measurement of the similarity between dictionaries and spans of text ranging from complete documents to individual words. We show how DDR enables dictionary authors to place greater emphasis on construct validity without sacrificing linguistic coverage. We further demonstrate the benefits of DDR on two real-world tasks and finally conduct an extensive study of the interaction between dictionary size and task performance. These studies allow us to examine how DDR and word count methods complement one another as tools for applying concept dictionaries and where each is best applied. Finally, we provide references to tools and resources to make this method both available and accessible to a broad psychological audience.
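The contrast the abstract draws between word counts and semantic similarity can be illustrated with a minimal sketch. It assumes, as a simplification not spelled out in the abstract, that a dictionary and a span of text are each represented by the mean of their word vectors and compared by cosine similarity. The three-dimensional vectors in `TOY_VECTORS` and the function names are hypothetical, chosen only for illustration; a real application would load pretrained word embeddings instead.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
# Real use would substitute pretrained embeddings.
TOY_VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "joy": [0.8, 0.2, 0.1],
    "glad": [0.85, 0.15, 0.05],
    "sad": [-0.8, 0.1, 0.2],
    "table": [0.0, 0.9, 0.3],
}


def mean_vector(words, vectors):
    """Average the vectors of the words that have one; None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_count_score(tokens, dictionary):
    """Proportion of tokens that appear verbatim in the dictionary."""
    if not tokens:
        return 0.0
    return sum(t in dictionary for t in tokens) / len(tokens)


def similarity_score(tokens, dictionary, vectors):
    """Cosine similarity between the mean dictionary vector and the mean text vector."""
    dict_vec = mean_vector(dictionary, vectors)
    text_vec = mean_vector(tokens, vectors)
    if dict_vec is None or text_vec is None:
        return None
    return cosine(dict_vec, text_vec)


positive_dictionary = {"happy", "joy"}  # a deliberately small dictionary
count = word_count_score(["glad"], positive_dictionary)
similar = similarity_score(["glad"], positive_dictionary, TOY_VECTORS)
opposite = similarity_score(["sad"], positive_dictionary, TOY_VECTORS)
```

In this toy setup the word "glad" is absent from the dictionary, so the word count score is zero, while the similarity-based score is high because its vector lies close to the dictionary's mean vector. That is the sense in which a small, carefully chosen dictionary need not sacrifice linguistic coverage.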
Pages: 344-361
Number of pages: 18