Information retrieval in hydrochemical data using the latent semantic indexing approach

Times cited: 4
Authors
Praus, Petr
Praks, Pavel
Affiliations
[1] Tech Univ Ostrava, Dept Analyt Chem & Mat Testing, Ostrava 70833, Czech Republic
[2] Tech Univ Ostrava, Dept Appl Math, Dept Math & Descript Geometry, Ostrava 70833, Czech Republic
Keywords
hydrochemistry; information retrieval; latent semantic indexing; principal component analysis; similarity
DOI
10.2166/hydro.2007.003b
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The latent semantic indexing (LSI) method was applied to retrieve similar samples (samples with a similar composition) from a dataset of groundwater samples. The LSI procedure consisted of two steps: (i) reduction of the data dimensionality by principal component analysis (PCA) and (ii) calculation of the similarity between selected samples (queries) and the other samples. Similarity was measured by the cosine similarity and by the Euclidean and Manhattan distances. Five queries were chosen to represent different sampling localities. The original data space of 14 variables measured in 95 groundwater samples was reduced to the three-dimensional space of the three largest principal components, which explained nearly 80% of the total variance. For each query, the five most similar samples were evaluated. The LSI results were compared with retrievals performed in the orthogonal system of all variables transformed by PCA and in the system of standardized original variables. Most of these retrievals did not agree with the LSI ones, most likely because both systems contained interfering data noise that had not been removed beforehand by dimensionality reduction. Therefore, the LSI approach, which is based on noise filtration, was considered a promising strategy for information retrieval in real hydrochemical data.
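The two-step procedure summarized in the abstract (PCA reduction, then similarity ranking against a query sample) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the variables are assumed to be autoscaled (z-scores) before PCA, PCA is computed via the singular value decomposition, and all function names (`standardize`, `pca_scores`, `retrieve`, etc.) are hypothetical. The usage example runs on random numbers shaped like the study's dataset (95 samples × 14 variables), not on the actual groundwater measurements.

```python
import numpy as np

def standardize(X):
    """Autoscale each column to zero mean and unit variance (assumed preprocessing)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_scores(Z, k):
    """Project column-centred data Z onto its k leading principal components.

    Returns the score matrix (n_samples x k) and the fraction of the total
    variance explained by each of the k components.
    """
    # For centred data, the right singular vectors are the principal axes.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :k] * s[:k]              # same as Z @ Vt[:k].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:k]

def cosine_similarity(q, Y):
    """Cosine of the angle between query q and each row of Y."""
    return (Y @ q) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(q))

def euclidean_distance(q, Y):
    return np.linalg.norm(Y - q, axis=1)

def manhattan_distance(q, Y):
    return np.abs(Y - q).sum(axis=1)

def retrieve(Y, query_idx, measure="cosine", top=5):
    """Return the indices of the `top` samples closest to sample `query_idx`."""
    q = Y[query_idx]
    others = np.array([i for i in range(len(Y)) if i != query_idx])
    if measure == "cosine":
        # Larger cosine means more similar, so sort in descending order.
        order = np.argsort(-cosine_similarity(q, Y[others]))
    elif measure == "euclidean":
        order = np.argsort(euclidean_distance(q, Y[others]))
    elif measure == "manhattan":
        order = np.argsort(manhattan_distance(q, Y[others]))
    else:
        raise ValueError(f"unknown measure: {measure}")
    return others[order[:top]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(95, 14))          # synthetic stand-in, NOT the real data
    Z = standardize(X)
    Y, expl = pca_scores(Z, k=3)           # LSI space: 3 largest components
    print("variance explained by 3 PCs:", round(float(expl.sum()), 3))
    for m in ("cosine", "euclidean", "manhattan"):
        print(m, retrieve(Y, query_idx=0, measure=m))
```

Truncating to `k=3` is the step that performs the noise filtration described in the abstract; calling `pca_scores(Z, k=14)` instead keeps every component and is only an orthogonal rotation of the standardized variables. With random input, three components explain far less than the nearly 80% reported for the real dataset, so the printout only checks the mechanics.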
Pages: 135-143
Page count: 9
Related papers
50 items in total
  • [21] Latent semantic indexing for web service retrieval
    Czyszczoń, Adam (adam.czyszczon@pwr.edu.pl), 1600, Springer Verlag (8733):
  • [22] Latent Semantic Indexing for Web Service Retrieval
    Czyszczon, Adam
    Zgrzywa, Aleksander
    COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, ICCCI 2014, 2014, 8733 : 694 - 702
  • [23] A latent semantic indexing and WordNet based information retrieval model for digital forensics
    Du, Lan
    Jin, Huidong
    de Vel, Olivier
    Liu, Nianjun
    ISI 2008: 2008 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS, 2008, : 70 - +
  • [24] Cross-language information retrieval using latent semantic indexing and self-organizing maps
    Ampazis, N
    Iakovaki, H
    2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, : 751 - 755
  • [25] Latent semantic indexing-based intelligent information retrieval system for digital libraries
    School of Computer Sciences, Vellore Institute of Technology, Deemed University, Vellore 632014, India
    J. Compt. Inf. Technol., 2006, 3 (191-196):
  • [26] Application of latent semantic indexing on Malay-English cross language information retrieval
    Abdullah, MT
    Ahmad, F
    Mahmod, R
    Sembok, TMT
    DIGITAL LIBRARIES: TECHNOLOGY AND MANAGEMENT OF INDIGENOUS KNOWLEDGE FOR GLOBAL ACCESS, 2003, 2911 : 663 - 665
  • [27] Enhanced approach for latent semantic indexing using wavelet transform
    Jaber, T.
    Amira, A.
    Milligan, P.
    IET IMAGE PROCESSING, 2012, 6 (09) : 1236 - 1245
  • [28] Image Semantic Extraction Using Latent Semantic Indexing On Image Retrieval Automatic-Annotation
    Herdiyeni, Yeni
    Nurdiati, Sri
    Abu Daud, Imam
    2009 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION, 2009, : 283 - 288
  • [29] Using Latent Semantic Indexing for Morph-based Spoken Document Retrieval
    Turunen, Ville T.
    Kurimo, Mikko
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 341 - 344
  • [30] Information retrieval and text categorization with semantic indexing
    Rosso, P
    Molina, A
    Pla, F
    Jiménez, D
    Vidal, V
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2004, 2945 : 596 - 600