Discovering Latent Semantics in Web Documents Using Fuzzy Clustering

Cited by: 26
Authors
Chiang, I-Jen [1, 2]
Liu, Charles Chih-Ho [2]
Tsai, Yi-Hsin [2]
Kumar, Ajit [3]
Affiliations
[1] Taipei Med Univ, Grad Inst Biomed Informat, Taipei 10617, Taiwan
[2] Natl Taiwan Univ, Inst Biomed Engn, Taipei 10617, Taiwan
[3] Goa Inst Management, Ribandar 403006, India
Funding
U.S. National Science Foundation
Keywords
Fuzzy aggregation algorithm; fuzzy linguistic topological space; fuzzy semantic topology; fuzzy web hierarchical clustering; named entity recognition (NER);
DOI
10.1109/TFUZZ.2015.2403878
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web documents are heterogeneous and complex. Complicated associations exist both within a single web document and in its links to other documents. Because terms in documents interact heavily, their meanings are often vague and ambiguous. Efficient and effective clustering methods are therefore needed to discover latent, coherent meanings in context. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in web documents. The proposed algorithm extracts features from web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations among features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called "CONCEPTS," wherein a fuzzy linguistic measure is applied to each complex to evaluate 1) the relevance of a document to a topic, and 2) its difference from the other topics. Web contents can be clustered into topics in the hierarchy according to their fuzzy linguistic measures, and web users can then explore the CONCEPTS of web contents accordingly. Beyond its applicability to web text domains, the algorithm can be extended to other applications, such as data mining, bioinformatics, and content-based or collaborative information filtering.
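As a rough illustration of the idea in the abstract (grouping co-occurring features and scoring documents against them with a degree of membership), a minimal Python sketch might look like the following. It is not the paper's algorithm: the function names, the pair-based "concept", the min_support threshold, and the overlap-ratio membership are all assumptions made for illustration, and the feature lists stand in for whatever a conditional random field extractor would produce.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_concepts(docs, min_support=2):
    """Return feature pairs that co-occur in at least `min_support` documents.

    Hypothetical stand-in for a "concept": the paper builds a hierarchy of
    connected semantic complexes, which this sketch does not attempt.
    """
    counts = Counter()
    for features in docs.values():
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(features)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def membership(features, concept):
    """Fuzzy degree in [0, 1]: share of the concept's features found in the document.

    An assumed, simplistic measure, not the paper's fuzzy linguistic measure.
    """
    concept = set(concept)
    return len(concept & set(features)) / len(concept)


# Toy feature lists (invented for this example).
docs = {
    "d1": ["wall", "street", "stock"],
    "d2": ["wall", "street", "bank"],
    "d3": ["wall", "paint"],
}

concepts = cooccurrence_concepts(docs)
for name, features in docs.items():
    for concept in sorted(concepts):
        print(name, concept, membership(features, concept))
```

In this toy data only the pair ("street", "wall") reaches the support threshold, and a document containing just one of the two features receives a partial membership instead of a hard yes/no assignment, which is the sense in which the grouping is fuzzy.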
Pages: 2122-2134
Number of pages: 13
Related papers
50 in total
  • [1] Using Fuzzy Clustering Powered by Weighted Feature Matrix to Establish Hidden Semantics in Web Documents
    Patil, Pramod D.
    Kulkarni, Parag
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2018, 9 (08) : 503 - 514
  • [2] Fast fuzzy clustering of Web documents
    Wang, Jian-Hui
    Jiang, Long-Bin
    Yang, Shu
    Chang'an Daxue Xuebao (Ziran Kexue Ban)/Journal of Chang'an University (Natural Science Edition), 2007, 27 (02) : 107 - 110
  • [3] A new approach for fuzzy clustering of web documents
    Friedman, M
    Last, M
    Zaafrany, O
    Schneider, M
    Kandel, A
    2004 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, PROCEEDINGS, 2004, : 377 - 381
  • [4] Fuzzy co-clustering of web documents
    William-Chandra, T
    Chen, L
    2005 INTERNATIONAL CONFERENCE ON CYBERWORLDS, PROCEEDINGS, 2005, : 545 - 551
  • [5] Classification of web documents using fuzzy logic categorical data clustering
    Tsekouras, George E.
    Anagnostopoulos, Christos
    Gavalas, Damianos
    Dafri, Economou
    ARTIFICIAL INTELLIGENCE AND INNOVATIONS 2007: FROM THEORY TO APPLICATIONS, 2007, : 93 - +
  • [6] Anomaly detection in web documents using crisp and fuzzy-based cosine clustering methodology
    Friedman, Menahem
    Last, Mark
    Makover, Yaniv
    Kandel, Abraham
    INFORMATION SCIENCES, 2007, 177 (02) : 467 - 475
  • [7] Semantic Similarity-Based Clustering of Web Documents Using Fuzzy C-Means
    Avanija, J.
    Ramar, K.
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2015, 14 (03)
  • [8] Discovering conceptual web-knowledge in web documents
    Yoo, SY
    Hoffmann, A
    ENGINEERING KNOWLEDGE IN THE AGE OF THE SEMANTIC WEB, PROCEEDINGS, 2004, 3257 : 504 - 505
  • [9] Latent semantics for hotspot information clustering
    He, Ping
    Wang, Xi
    Xu, Xiaofei
    Li, Li
    Journal of Computational Information Systems, 2014, 10 (15) : 6517 - 6525
  • [10] Fuzzy multisets and fuzzy clustering of documents
    Miyamoto, S
    10TH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3: MEETING THE GRAND CHALLENGE: MACHINES THAT SERVE PEOPLE, 2001, : 1539 - 1542