Text Information Retrieval in Tetun

Cited by: 1
Authors
de Jesus, Gabriel [1, 2]
Affiliations
[1] Univ Porto FEUP, INESC TEC, Rua Dr Roberto Frias, P-4200465 Porto, Portugal
[2] Univ Porto FEUP, Fac Engn, Rua Dr Roberto Frias, P-4200465 Porto, Portugal
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III | 2023 / Vol. 13982
Keywords
Information retrieval; Tetun; Search; Ad-hoc retrieval; Low-resource language
DOI
10.1007/978-3-031-28241-6_48
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Tetun is one of Timor-Leste's two official languages, alongside Portuguese. It is a low-resource language with over 932,000 speakers, and it started developing when Timor-Leste restored its independence in 2002. Newspapers mainly use Tetun, and more than ten national online news websites publish news in Tetun every day. However, because no information retrieval-based solutions exist for Tetun, finding Tetun information on the internet and on digital platforms is challenging. This work aims to investigate and develop solutions that enable information retrieval techniques to be applied to search in Tetun, using Tetun INL and focusing on the ad-hoc text retrieval task. As a result, we expect to deliver effective search solutions for Tetun and to contribute to innovation in information retrieval for low-resource languages, including by making Tetun datasets available to future researchers.
Pages: 429-435
Number of pages: 7