AI-Based Assistance for Management of Oral Community Knowledge in Low-Resource and Colloquial Kannada Language

Cited by: 0
Authors
Aparna, M. [1]
Srivatsa, Sharath [1]
Madhavan, G. Sai [1]
Dinesh, T. B. [2]
Srinivasa, Srinath [1]
Affiliations
[1] Int Inst Informat Technol, 26-C,Elect City Phase 1, Bangalore, Karnataka, India
[2] IruWay Rural Res Lab, Janastu, Durgadahalli, India
Source
BIG DATA ANALYTICS IN ASTRONOMY, SCIENCE, AND ENGINEERING, BDA 2023 | 2024 / Vol. 14516
Keywords
Community Knowledge Management; Low-resource Languages; Automatic Speech Recognition; Keyword Search; Named Entity Recognition; Large Language Model; Big Data;
DOI
10.1007/978-3-031-58502-9_1
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge in rural communities is largely created, preserved, and transferred verbally, and it is limited. This information is valuable to these communities, and managing it and making it available digitally with state-of-the-art approaches enriches the awareness and collective knowledge of their members. The large amounts of data and information produced on the Internet are inaccessible to the population of these rural communities due to factors such as a lack of infrastructure and connectivity and limited literacy. Knowledge internal to rural communities is also not conserved or made available in any global Big Data information system. Artificial Intelligence (AI) technologies such as Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) provide substantial assistance when vast quantities of data, such as Big Data, are available to build solutions. For low-resource languages like Kannada and for rural colloquial dialects, far fewer publicly available corpora exist. Building state-of-the-art AI solutions is challenging in this context, and we address this problem in this work. Knowledge management in rural communities requires a low-cost and efficient approach that social workers can use. This paper proposes an architecture for oral knowledge management for rural communities speaking colloquial Kannada. The proposed architecture has an interface for oral knowledge retrieval using text processing on transcripts generated from the smallest state-of-the-art ASR model. We propose three interfaces to search for content: an n-gram-based fuzzy search for text in audio recordings, a most-frequent-entities search based on a Kannada Named Entity Recognition (NER) model, and question answering with a Large Language Model (LLM) using a community knowledge vector store.
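The abstract names an n-gram-based fuzzy search over ASR transcripts but does not describe how it is implemented. The following is only a minimal sketch of one common way to do this, assuming character trigrams compared with Jaccard overlap against sliding word windows of each transcript. The function names, the threshold, and the English placeholder transcripts are all hypothetical and are not taken from the paper.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of `text` after whitespace normalisation."""
    text = " ".join(text.split())
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(query, candidate, n=3):
    """Jaccard overlap between the character n-gram sets of two strings."""
    a, b = char_ngrams(query, n), char_ngrams(candidate, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuzzy_search(query, transcripts, n=3, threshold=0.4):
    """Rank transcripts by their best-matching word window for `query`.

    `transcripts` maps a (hypothetical) audio id to its transcript text.
    Returns (audio_id, score, matched_span) tuples, best first.
    """
    q_len = max(1, len(query.split()))
    hits = []
    for audio_id, text in transcripts.items():
        words = text.split()
        best_score, best_span = 0.0, ""
        # Slide a window with as many words as the query over the transcript.
        for i in range(max(1, len(words) - q_len + 1)):
            span = " ".join(words[i:i + q_len])
            score = ngram_similarity(query, span, n)
            if score > best_score:
                best_score, best_span = score, span
        if best_score >= threshold:
            hits.append((audio_id, round(best_score, 3), best_span))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Placeholder transcripts (assumption: English stand-ins for Kannada ASR output).
transcripts = {
    "a1": "the village well was repaired last monsoon",
    "a2": "millet harvest festival songs",
}
results = fuzzy_search("vilage wel", transcripts)
print(results)
```

Because the comparison works on overlapping character sequences rather than whole words, a misspelled or misrecognised query can still match a transcript span. This sketch splits strings by Unicode code point; for Kannada script, splitting by grapheme cluster may be a better unit, which is left as an assumption here.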
Pages: 3-16
Page count: 14
Related papers
3 items in total
  • [1] Knowledge Management Framework Over Low Resource Indian Colloquial Language Audio Contents
    Srivatsa, Sharath
    Aparna, M.
    Madhavan, G. Sai
    Srinivasa, Srinath
    PROCEEDINGS OF 7TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA, CODS-COMAD 2024, 2024, : 553 - 557
  • [2] Generative AI-based knowledge graphs for the illustration and development of mHealth self-management content
    Blanchard, Marc
    Venerito, Vincenzo
    Azevedo, Pedro Ming
    Hugle, Thomas
    FRONTIERS IN DIGITAL HEALTH, 2024, 6
  • [3] No worry, dat sick go finish small time: Encouraging local community participation in global healthcare using de-terminologization as a low-resource language translation strategy
    Tekwa, Kizito
    Tazoacha, Francis
    LINGUISTICA ANTVERPIENSIA NEW SERIES-THEMES IN TRANSLATION STUDIES, 2022, 21 : 222 - 252