cqp4rdf: Towards a Suite for RDF-Based Corpus Linguistics
Times cited: 2
Authors:
Ionov, Maxim [1]
Stein, Florian [1]
Sehgal, Sagar [2]
Chiarcos, Christian [1]
Affiliations:
[1] Goethe Univ Frankfurt, Appl Computat Linguist Lab, Frankfurt, Germany
[2] Indian Inst Informat Technol, Sri City, India
Source:
SEMANTIC WEB: ESWC 2020 SATELLITE EVENTS | 2020 / Vol. 12124
Keywords:
Linguistic linked data; Corpus linguistics; SPARQL; CQP
DOI:
10.1007/978-3-030-62327-2_20
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory]
Subject classification codes:
081104; 0812; 0835; 1405
Abstract:
In this paper, we present cqp4rdf, a set of tools for creating and querying corpora with linguistic annotations. cqp4rdf builds on CQP, an established corpus query language widely used in computational lexicography and empirical linguistics, and allows it to be applied to corpora represented in RDF. This is in line with the emerging trend toward RDF-based corpus formats, which offer several benefits over more traditional formats, such as support for virtually unlimited types of annotation, linking of corpus elements across multiple datasets, and simultaneous querying of distributed language resources and corpora with different annotations. On the other hand, application support tailored to such corpora is virtually nonexistent, leaving corpus linguists with SPARQL as their only query language. While extremely powerful, SPARQL has a relatively steep learning curve, especially for people without a computer science background. At the same time, using query languages designed for classic corpus management software would forgo much of what RDF-based corpora make possible. We present a middle ground that aims to bridge this gap: an interface that lets users query RDF corpora and explore the results in a linguist-friendly way.
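To make concrete what applying CQP to an RDF corpus involves, here is a minimal Python sketch that rewrites a simple CQP token-sequence query into SPARQL. It is not cqp4rdf's implementation. The vocabulary (CoNLL-RDF-style `conll:WORD` and `conll:POS` properties on token nodes, with adjacent tokens linked by `nif:nextWord`), the helper name `cqp_to_sparql`, and the restriction to `word`/`pos` constraints are all assumptions for illustration. The sketch also treats attribute values as exact literals, whereas CQP itself interprets them as regular expressions.

```python
import re

# Assumed prefixes, modeled on CoNLL-RDF and NIF (hypothetical for this sketch).
PREFIXES = (
    "PREFIX conll: <http://ufal.mff.cuni.cz/conll2009-st/task-description.html#>\n"
    "PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>\n"
)

# One CQP token constraint of the form [attr="value"].
TOKEN = re.compile(r'\[\s*(\w+)\s*=\s*"([^"]*)"\s*\]')

# Hypothetical mapping from CQP attribute names to RDF properties.
ATTRS = {"word": "conll:WORD", "pos": "conll:POS"}


def cqp_to_sparql(query: str) -> str:
    """Translate a sequence of [attr="value"] CQP constraints into a SPARQL SELECT."""
    tokens = TOKEN.findall(query)
    # Reject anything beyond plain token constraints (regex, Boolean ops, repetition).
    if not tokens or TOKEN.sub("", query).strip():
        raise ValueError(f"unsupported CQP query: {query!r}")
    patterns = []
    for i, (attr, value) in enumerate(tokens):
        if attr not in ATTRS:
            raise ValueError(f"unknown attribute: {attr}")
        # Escape backslashes so the value is a valid SPARQL string literal.
        literal = value.replace("\\", "\\\\")
        patterns.append(f'?t{i} {ATTRS[attr]} "{literal}" .')
        if i > 0:
            # Consecutive CQP tokens become adjacent tokens in the RDF graph.
            patterns.append(f"?t{i - 1} nif:nextWord ?t{i} .")
    variables = " ".join(f"?t{i}" for i in range(len(tokens)))
    body = "\n  ".join(patterns)
    return f"{PREFIXES}SELECT {variables} WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    print(cqp_to_sparql('[pos="ADJ"] [word="house"]'))
```

For example, `cqp_to_sparql('[pos="ADJ"] [word="house"]')` yields a SELECT over two adjacent tokens, the first tagged `ADJ` and the second with the form "house". The compact CQP notation hides the graph patterns a linguist would otherwise have to write by hand in SPARQL; full CQP features such as regular expressions, Boolean operators, and repetition are deliberately rejected here.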
Pages: 115-121
Number of pages: 7