Using Syllables As Indexing Terms in Full-Text Information Retrieval

Cited by: 5
Authors
Kettunen, Kimmo [1]
McNamee, Paul [2]
Baskaya, Feza [3]
Affiliations
[1] Kymenlaakso Univ Appl Sci, FIN-45100 Kouvola, Finland
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
[3] Univ Tampere, Tampere, Finland
Source
HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE | 2010 / Vol. 219
Keywords
full-text information retrieval; syllables; management of word form variation; syllables as index terms;
DOI
10.3233/978-1-60750-641-6-225
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes empirical information retrieval results for 13 languages of the Cross Language Evaluation Forum (CLEF) collection, augmented with results for Turkish, using syllables as a means to manage morphological variation in the languages. This kind of approach has been used in speech retrieval [1], but it has seldom been tried in text-based IR, although it has several clear advantages. First, a version that works quite well can be implemented with a very simple syllabification algorithm consisting only of variants of one syllable structure rule, CV (consonant-vowel). Second, although syllable-based word form variation management resembles n-gramming [2], it has the advantage that the number of grams with syllables is more restricted, which keeps the text index smaller and retrieval faster. Third, a syllable-based approach makes it possible to use different types of syllabification procedures, which can be either very fine-grained, i.e. language-specific, or very coarse, i.e. more language-independent. Fourth, syllable-based methods work for both speech and text retrieval. Our results show that the two different CV syllabification procedures produced good results with four morphologically complex languages of the CLEF collection. They also produced good results for Turkish. For three of the languages that got good results with the CV syllabification (De, Fi and Tu), we also tried language-specific, accurate syllabification procedures. Accurate syllabification did not produce IR results as good as those of the CV procedures, but it was not far behind in performance.
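The abstract does not spell out the two CV procedures, so the following is only a minimal sketch of what a coarse, language-independent CV-style splitter might look like, not the authors' algorithm. The vowel inventory, the rule for attaching trailing consonants, and the names `cv_syllabify` and `char_ngrams` are all assumptions made for illustration. The n-gram helper is included only to illustrate the abstract's point that syllables yield fewer index terms per word than overlapping character n-grams.

```python
import re

# Assumption: a small, hand-picked vowel inventory; the paper's actual
# inventory per language is not given in the abstract.
VOWELS = "aeiouyäöüå"


def cv_syllabify(word, vowels=VOWELS):
    """Split a word into coarse CV-style chunks.

    Each chunk is zero or more non-vowels followed by one or more vowels.
    Any trailing non-vowels are attached to the last chunk (an assumption).
    """
    word = word.lower()
    v = re.escape(vowels)
    chunks = re.findall(rf"[^{v}]*[{v}]+", word)
    if not chunks:
        # No vowel found: keep the whole word as a single index term.
        return [word] if word else []
    consumed = sum(len(c) for c in chunks)
    chunks[-1] += word[consumed:]
    return chunks


def char_ngrams(word, n):
    """Overlapping character n-grams, for comparison with syllable terms."""
    word = word.lower()
    return [word[i:i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    print(cv_syllabify("talossa"))       # Finnish: "in the house"
    print(cv_syllabify("information"))
    print(len(char_ngrams("information", 4)))
```

In this sketch "information" yields four syllable terms but eight overlapping 4-grams, which is the kind of index-size difference the abstract refers to.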
Pages: 225-232
Number of pages: 8