Context-aware Urdu Information Retrieval System

Times cited: 1

Authors:

Shoaib, Umar [1]

Fiaz, Laiba [1]

Chakraborty, Chinmay [2]

Rauf, Hafiz Tayyab [3]

Affiliations:

[1] Univ Gujrat, Dept Comp Sci, Gujrat 50700, Pakistan

[2] Birla Inst Technol, Elect & Commun Engn, Mesra, Jharkhand, India

[3] Univ Bradford, Fac Engn & Informat, Dept Comp Sci, Bradford BD7 1DP, W Yorkshire, England

Source:

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING | 2023 / Vol. 22 / No. 03

Keywords:

Urdu language; information retrieval; semantic web; ontology; triplets; quad extraction; context-based; Web Semantic Search Engine; WSA; searching and indexing; keywords; corpus; Uniform Resource Identifier;

DOI:

10.1145/3502854

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The World Wide Web (WWW) plays a vital role in sharing dynamic knowledge in every field of life. Information on the web comprises a huge amount of data in different forms: structured, semi-structured, or, in some cases, entirely unstructured. Because of the sheer volume of information, searching large textual data for a specific topic or retrieving precise information is a challenging task, and this leads to the problem of word sense ambiguity (WSA). An Urdu-language information retrieval system, built on techniques related to the Web Semantic Search Engine architecture, is proposed to retrieve relevant information efficiently and solve the problem of WSA. For single-word queries, the proposed system achieves an average precision of 96%, compared with 74% and 75% for Bing and Google, respectively. For long text queries, our system outperforms well-known existing search engines, achieving 92% accuracy against 16.50% for Bing and 16% for Google. Similarly, for single-word queries, the proposed system's recall ratio is 32.25%, compared with 25% for both Bing and Google. Recall for long text queries also improves, reaching 6.38% compared with 6.20% and 4.8% for Bing and Google, respectively. The results show that the proposed system gives better and more efficient results than existing systems for the Urdu language.

Pages: 19

References: 60 in total
