Information retrieval in hydrochemical data using the latent semantic indexing approach

Cited by: 4
Authors
Praus, Petr
Praks, Pavel
Affiliations
[1] Tech Univ Ostrava, Dept Analyt Chem & Mat Testing, Ostrava 70833, Czech Republic
[2] Tech Univ Ostrava, Dept Appl Math, Dept Math & Descript Geometry, Ostrava 70833, Czech Republic
Keywords
hydrochemistry; information retrieval; latent semantic indexing; principal component analysis; similarity;
DOI
10.2166/hydro.2007.003b
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
The latent semantic indexing (LSI) method was applied to retrieve similar samples (samples with a similar composition) from a dataset of groundwater samples. The LSI procedure was based on two steps: (i) reduction of the data dimensionality by principal component analysis (PCA) and (ii) calculation of the similarity between selected samples (queries) and the other samples. Similarity was measured by the cosine similarity and by the Euclidean and Manhattan distances. Five queries were chosen to represent different sampling localities. The original data space of 14 variables measured in 95 groundwater samples was reduced to the three-dimensional space of the three largest principal components, which explained nearly 80% of the total variance. For each query, the five nearest samples were evaluated. The LSI outputs were compared with retrievals in the orthogonal system of all variables transformed by PCA and in the system of standardized original variables. Most of these retrievals did not agree with the LSI ones, most likely because both systems contained interfering data noise that had not been removed beforehand by the dimensionality reduction. The LSI approach, which is based on noise filtration, was therefore considered a promising strategy for information retrieval in real hydrochemical data.
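The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lsi_retrieve`, its parameters, and the choice to run PCA through an SVD of the standardized variables are assumptions made for illustration. The groundwater dataset is not reproduced in this record, so random numbers of the same shape (95 samples, 14 variables) stand in for it.

```python
import numpy as np


def lsi_retrieve(X, query_idx, k=3, top=5, measure="cosine"):
    """Hypothetical helper: indices of the `top` samples closest to a query.

    X is a (samples x variables) array; query_idx selects the query row.
    Returns the retrieved indices and the variance share kept by k components.
    """
    # Standardize every variable to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Step (i): PCA via SVD; the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:k].T  # sample coordinates on the k largest components
    explained = float((s[:k] ** 2).sum() / (s ** 2).sum())

    # Step (ii): compare the query with every sample in the reduced space.
    q = scores[query_idx]
    if measure == "cosine":
        sim = scores @ q / (np.linalg.norm(scores, axis=1) * np.linalg.norm(q))
        order = np.argsort(-sim)  # larger cosine means more similar
    elif measure == "euclidean":
        order = np.argsort(np.linalg.norm(scores - q, axis=1))
    elif measure == "manhattan":
        order = np.argsort(np.abs(scores - q).sum(axis=1))
    else:
        raise ValueError(f"unknown measure: {measure}")

    # Drop the query itself and keep the closest `top` samples.
    order = order[order != query_idx][:top]
    return order, explained


# Synthetic stand-in data (assumption): 95 samples, 14 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(95, 14))

for measure in ("cosine", "euclidean", "manhattan"):
    nearest, explained = lsi_retrieve(X, query_idx=0, measure=measure)
    print(measure, nearest, round(explained, 3))
```

With random data the three retained components explain far less variance than the nearly 80% reported for the real dataset, so the printed neighbours carry no hydrochemical meaning. Setting `k=14` in the same sketch gives a retrieval over all PCA-transformed variables, which is the kind of comparison the abstract mentions.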
Pages: 135-143
Page count: 9
Related papers
50 in total
  • [1] Using latent semantic indexing for multilanguage information retrieval
    Berry, MW
    Young, PG
    COMPUTERS AND THE HUMANITIES, 1995, 29 (06): : 413 - 429
  • [2] An approach to semantic indexing and information retrieval
    Suarez Baron, Marco
    Salinas Valencia, Kathleen
    REVISTA FACULTAD DE INGENIERIA-UNIVERSIDAD DE ANTIOQUIA, 2009, (48): : 174 - 187
  • [3] A neural network model for information retrieval using latent semantic indexing
    Syu, I
    Lang, SD
    Deo, N
    ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 1318 - 1323
  • [4] Latent Semantic Indexing using eigenvalue analysis for efficient information retrieval
    School of Computing Sciences, Vellore Institute of Technology, Deemed University, Vellore - 632014, India
    Unknown
    Int. J. Appl. Math. Comput. Sci., 2006, 4 (551-558):
  • [5] IMPROVING INFORMATION-RETRIEVAL WITH LATENT SEMANTIC INDEXING
    DEERWESTER, S
    DUMAIS, S
    LANDAUER, T
    FURNAS, G
    BECK, L
    PROCEEDINGS OF THE ASIS ANNUAL MEETING, 1988, 25 : 36 - 40
  • [6] Personal information retrieval based on latent semantic indexing
    Yang, Z
    Deng, GS
    PROCEEDINGS OF 2002 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE & ENGINEERING, VOLS I AND II, 2002, : 287 - 291
  • [7] A simplified Latent Semantic Indexing approach for multi-linguistic information retrieval
    Liu, Y
    Lu, HM
    Lu, ZX
    Wang, P
    PACLIC 17: LANGUAGE, INFORMATION AND COMPUTATION, PROCEEDINGS, 2003, : 69 - 79
  • [8] A semidiscrete matrix decomposition for latent semantic indexing in information retrieval
    Kolda, TG
    O'Leary, DP
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1998, 16 (04) : 322 - 346
  • [9] Downdating the latent semantic indexing model for conceptual information retrieval
    Witter, Dian I.
    Berry, Michael W.
    Computer Journal, 41 (08): : 589 - 601
  • [10] Downdating the latent semantic indexing model for conceptual information retrieval
    Witter, DI
    Berry, MW
    COMPUTER JOURNAL, 1998, 41 (08): : 589 - 601