The Role of Transliterated Words in Linking Bilingual News Articles in an Archive

Cited by: 1
Authors
Khan, Muzammil [1 ]
Khan, Sarwar Shah [1 ]
Alharbi, Yasser [2 ]
Alferaidi, Ali [2 ]
Alharbi, Talal Saad [2 ]
Yadav, Kusum [2 ]
Affiliations
[1] Univ Swat, Dept Comp & Software Technol, Mingora 19130, Pakistan
[2] Univ Hail, Coll Comp Sci & Engn, Hail 55473, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 07
Keywords
transliterated words; news archiving; news linking; dual lingual archive; digital libraries; similarity measure; RECOMMENDATION;
DOI
10.3390/app13074435
Chinese Library Classification
O6 [Chemistry];
Subject classification code
0703;
Abstract
Retrieving a specific digital information object from a huge, evolving multilingual news archive in response to a user query is challenging. Processing becomes harder still when the archive includes low-resource, morphologically complex languages, such as those written in Urdu and Arabic scripts. Computing similarity between a query and news articles, and among the articles themselves, at run time in such a collection may be inaccurate and time-consuming. This paper introduces a Similarity Measure based on Transliteration Words (SMTW), that is, words transliterated from English into Urdu script, for linking news articles extracted from multiple online sources during the preservation process. SMTW links Urdu-to-English news articles using an upgraded Urdu-to-English lexicon that includes transliterated words. SMTW was evaluated exhaustively on datasets of different sizes, and the results were compared with the Common Ratio Measure for Dual Language (CRMDL). The experimental results show that SMTW was more effective than CRMDL for linking Urdu-to-English news articles: precision improved from 50% to 60%, recall improved from 67% to 82%, and the impact of common terms also improved.
Pages: 17
Related papers
3 in total
  • [1] Understanding the Research Challenges in Low-Resource Language and Linking Bilingual News Articles in Multilingual News Archive
    Khan, Muzammil
    Ullah, Kifayat
    Alharbi, Yasser
    Alferaidi, Ali
    Alharbi, Talal Saad
    Yadav, Kusum
    Alsharabi, Naif
    Ahmad, Aakash
    APPLIED SCIENCES-BASEL, 2023, 13 (15):
  • [2] A content-based technique for linking dual language news articles in an archive
    Khan, Muzammil
    Rahman, Arif Ur
    Ahmad, Arshad
    Khan, Sarwar Shah
    JOURNAL OF INFORMATION SCIENCE, 2022, 48 (01) : 57 - 70
  • [3] The role of news title for linking during preservation process in digital archives
    Khan, Muzammil
    Khan, Sarwar Shah
    Ahmad, Arshad
    Rahman, Arif Ur
    LIBRARY HI TECH, 2022, 40 (05) : 1359 - 1383