Corpus Tools and Methods, Today and Tomorrow: Incorporating Linguists' Manual Annotations

Cited by: 8
Authors
Smith, Nicholas [1]
Hoffmann, Sebastian [2]
Rayson, Paul [3]
Affiliations
[1] Univ Salford, Sch English Sociol Polit & Contemporary Hist, Manchester M5 4WT, Lancs, England
[2] Univ Lancaster, Dept Linguist & English Language, Lancaster, England
[3] Univ Lancaster, Dept Comp, Lancaster, England
Source
LITERARY AND LINGUISTIC COMPUTING | 2008 / Vol. 23 / No. 2
DOI
10.1093/llc/fqn004
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Today's corpus tools offer the user a wide range of features that greatly facilitate the linguistic analysis of large amounts of authentic language data (e.g. frequency distributions, collocations, and keywords). However, these tools typically fail to address the fundamental need of the linguist to add interpretive information to a concordance or query result, by coding individual concordance lines for structural, functional, discoursal, and other features in a flexible way. The ability to add such qualitative data is indispensable to a fuller understanding of the phenomenon under investigation, as it allows the linguist to produce more rigorous descriptions of, and theories about, language in use. Our article has two aims: first, to assess the merits and drawbacks of existing solutions, by surveying what can be achieved using state-of-the-art corpus tools and generic database software; second, to draw up a set of desiderata and recommendations for the incorporation of flexible encoding features into future corpus tools. We describe an initial step in this direction, with a recent enhancement to the BNCweb corpus analysis software. More generally, we hope our suggestions will lead to linguists and software developers working together more closely to ensure that the needs of the former are provided for by the available technology.
Pages: 163-180
Page count: 18