Automatic translation, context, and supervised learning in comparative politics

Cited by: 10
Authors
Courtney, Michael [1]
Breen, Michael [2]
McMenamin, Iain [2]
McNulty, Gemma [3]
Affiliations
[1] Cent Stat Off, Cork, Ireland
[2] Dublin City Univ, Sch Law & Govt, Polit, Dublin, Ireland
[3] Univ Coll Dublin, Clinton Inst, Dublin, Ireland
Keywords
automatic translation; supervised learning; machine learning; text analysis; political communications; TEXT; POSITIONS; WORDS; FRAME
DOI
10.1080/19331681.2020.1731245
Chinese Library Classification number
G2 [Information and Knowledge Dissemination]
Discipline classification code
05; 0503
Abstract
This paper proves that automatic translation of multilingual newspaper documents deters neither human nor computer classification of political concepts. We show how theory-driven coding of newspaper text can be automated in several languages by monolingual researchers. Supervised machine learning is successfully applied to text in English from British, Spanish, and German sources. The paper has three main findings. First, results from human coding directly in a foreign language do not differ from coding computer-translated text. Second, humans can code translated text as well as they can code untranslated prose in their mother tongue. Third, machine learning based on translated Spanish and German training sets can reproduce human coding as accurately as a system learning from English training sets.
Pages: 208-217
Number of pages: 10