On Building an Interpretable Topic Modeling Approach for the Urdu Language

Authors
Nasim, Zarmeen [1]
Affiliations
[1] Inst Business Adm IBA, Artificial Intelligence Lab, Karachi, Pakistan
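Source
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes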
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This research is an endeavor to combine deep-learning-based language modeling with classical topic modeling techniques to produce interpretable topics for a given set of documents in Urdu, a low-resource language. Existing topic modeling techniques produce a collection of words, often uninterpretable, as suggested topics without integrating them into a semantically correct phrase or sentence. The proposed approach would first build an accurate Part-of-Speech (POS) tagger for the Urdu language using a publicly available corpus of many million sentences. In the next step, using semantically rich feature extraction approaches including Word2Vec and BERT, the proposed approach would experiment with different clustering and topic modeling techniques to produce a list of potential topics for a given set of documents. Finally, this list of topics would be sent to a labeler module to produce syntactically correct phrases that will represent interpretable topics.
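The middle stage of the pipeline described in the abstract (feature extraction, then clustering, then a word list per topic) can be illustrated with a minimal sketch. This is not the author's implementation: normalized bag-of-words vectors stand in for Word2Vec or BERT features, plain k-means stands in for whichever clustering or topic modeling technique the work settles on, and the toy English documents stand in for Urdu text. The function names `bow_vectors`, `kmeans`, and `top_words` are hypothetical, and the POS tagger and labeler module are omitted.

```python
from collections import Counter
import math


def bow_vectors(docs):
    """Turn whitespace-tokenized documents into L2-normalized count vectors.

    Assumption: a stand-in for the Word2Vec/BERT features named in the abstract.
    """
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok in doc.split():
            vec[index[tok]] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors seed the centroids, so runs are deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_words(docs, labels, k, n=3):
    """Most frequent tokens per cluster: the raw word-list form of a topic."""
    topics = []
    for c in range(k):
        counts = Counter(
            tok for doc, lab in zip(docs, labels) if lab == c for tok in doc.split()
        )
        topics.append([word for word, _ in counts.most_common(n)])
    return topics


# Toy corpus (illustrative only): two sports documents and two politics documents.
docs = [
    "cricket team match",
    "election vote party",
    "cricket match score",
    "party election result",
]
vocab, vectors = bow_vectors(docs)
labels = kmeans(vectors, k=2)
topics = top_words(docs, labels, k=2)
print(labels)
print(topics)
```

The output of `top_words` is exactly the kind of unordered word collection the abstract calls hard to interpret; in the proposed approach, a list like this would be handed to the labeler module to be turned into a syntactically correct phrase.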
Pages: 5200-5201
Number of pages: 2
Related papers
50 in total
  • [1] Statistical Topic Modeling for Urdu Text Articles
    Rehman, Anwar Ur
    Rehman, Zobia
    Akram, Junaid
    Ali, Waqar
    Shah, Munam Ali
    Salman, Muhammad
    2018 24TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND COMPUTING (ICAC' 18), 2018, : 62 - 67
  • [2] Hierarchical Topic Modeling for Urdu Text Articles
    Rehman, Anwar Ur
    Khan, Ali Haider
    Aftab, Mustansar
    Rehman, Zobia
    Shah, Munam Ali
    2019 25TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND COMPUTING (ICAC), 2019, : 464 - 469
  • [3] Building interpretable fuzzy systems: a new approach to fuzzy modeling
    Contreras Montes, Juan
    Misa Llorca, Roger
    Paz Grau, Juan
    CERMA2006: ELECTRONICS, ROBOTICS AND AUTOMOTIVE MECHANICS CONFERENCE, VOL 1, PROCEEDINGS, 2006, : 117 - 122
  • [4] PLSA BASED TOPIC MIXTURE LANGUAGE MODELING APPROACH
    Bai, Shuanhu
    Li, Haizhou
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 185 - 188
  • [5] Topic Modeling for Interpretable Text Classification From EHRs
    Rijcken, Emil
    Kaymak, Uzay
    Scheepers, Floortje
    Mosteiro, Pablo
    Zervanou, Kalliopi
    Spruit, Marco
    FRONTIERS IN BIG DATA, 2022, 5
  • [6] A FRAMEWORK OF URDU TOPIC MODELING USING LATENT DIRICHLET ALLOCATION (LDA)
    Shakeel, Khadija
    Tahir, Ghulam Rasool
    Tehseen, Irsha
    Ali, Mubashir
    2018 IEEE 8TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2018, : 117 - 123
  • [7] A Language Independent Approach to Develop Urdu Stemmer
    Husain, Mohd Shahid
    Ahamad, Faiyaz
    Khalid, Saba
    ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, VOL 3, 2013, 178 : 45 - +
  • [8] Transformer-Based Topic Modeling for Urdu Translations of the Holy Quran
    Zafar, Amna
    Wasim, Muhammad
    Zulfiqar, Shaista
    Waheed, Talha
    Siddique, Abubakar
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (10)
  • [9] Towards building a Urdu Language Corpus using Common Crawl
    Shafiq, Hafiz Muhammad
    Tahir, Bilal
    Mehmood, Muhammad Amir
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 2445 - 2455
  • [10] Urdu Documents Clustering with Unsupervised and Semi-Supervised Probabilistic Topic Modeling
    Mustafa, Mubashar
    Zeng, Feng
    Ghulam, Hussain
    Muhammad Arslan, Hafiz
    INFORMATION, 2020, 11 (11) : 1 - 16