Data extraction from machine-translated versus original language randomized trial reports: a comparative study

Cited by: 44
Authors
Balk, Ethan M. [1 ]
Chung, Mei [1 ]
Chen, Minghua L. [1 ]
Chang, Lina Kong Win [1 ]
Trikalinos, Thomas A. [2 ]
Affiliations
[1] Inst Clin Res & Hlth Policy Studies, Tufts Evidence Based Practice Ctr, 800 Washington St,Box 63, Boston, MA 02111 USA
[2] Brown Univ, Ctr Evidence Based Med, Providence, RI 02912 USA
Funding
Agency for Healthcare Research and Quality (AHRQ), USA;
Keywords
Data extraction; Machine translation; Randomized controlled trials; Systematic review;
DOI
10.1186/2046-4053-2-97
Chinese Library Classification (CLC) number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Background: Google Translate offers free Web-based translation, but it is unknown whether its translation accuracy is sufficient to use in systematic reviews to mitigate concerns about language bias.
Methods: We compared data extraction from non-English language studies with extraction from translations by Google Translate of 10 studies in each of five languages (Chinese, French, German, Japanese and Spanish). Fluent speakers double-extracted original-language articles. Researchers who did not speak the given language double-extracted translated articles along with 10 additional English language trials. Using the original language extractions as a gold standard, we estimated the probability and odds ratio of correctly extracting items from translated articles compared with English, adjusting for reviewer and language.
Results: Translation required about 30 minutes per article and extraction of translated articles required additional extraction time. The likelihood of correct extractions was greater for study design and intervention domain items than for outcome descriptions and, particularly, study results. Translated Spanish articles yielded the highest percentage of items (93%) that were correctly extracted more than half the time (followed by German and Japanese 89%, French 85%, and Chinese 78%) but Chinese articles yielded the highest percentage of items (41%) that were correctly extracted >98% of the time (followed by Spanish 30%, French 26%, German 22%, and Japanese 19%). In general, extractors' confidence in translations was not associated with their accuracy.
Conclusions: Translation by Google Translate generally required few resources. Based on our analysis of translations from five languages, using machine translation has the potential to reduce language bias in systematic reviews; however, pending additional empirical data, reviewers should be cautious about using translated data. There remains a trade-off between completeness of systematic reviews (including all available studies) and risk of error (due to poor translation).
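The Methods in the abstract compare the odds of correct extraction from translated articles with those from English articles. As a rough illustration only, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. The counts and function names are hypothetical and are not data from the study; unlike the analysis described in the abstract, this sketch does not adjust for reviewer or language.

```python
import math


def odds_ratio(correct_a, wrong_a, correct_b, wrong_b):
    """Unadjusted odds ratio of a correct extraction, group A versus group B."""
    return (correct_a / wrong_a) / (correct_b / wrong_b)


def wald_interval(correct_a, wrong_a, correct_b, wrong_b, z=1.96):
    """Wald confidence interval for the odds ratio, computed on the log scale."""
    log_or = math.log(odds_ratio(correct_a, wrong_a, correct_b, wrong_b))
    # Standard error of log(OR) for a 2x2 table with no empty cells.
    se = math.sqrt(1 / correct_a + 1 / wrong_a + 1 / correct_b + 1 / wrong_b)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts (NOT from the study): items extracted correctly or
# incorrectly from translated articles (A) and English articles (B).
translated_correct, translated_wrong = 80, 20
english_correct, english_wrong = 90, 10

or_value = odds_ratio(translated_correct, translated_wrong,
                      english_correct, english_wrong)
ci_low, ci_high = wald_interval(translated_correct, translated_wrong,
                                english_correct, english_wrong)

print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An odds ratio below 1 in this layout would mean lower odds of a correct extraction from translated articles than from English ones. An adjusted estimate like the one the abstract describes would need a regression model with reviewer and language as covariates, which this sketch leaves out.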
Pages: 6