Introducing suffix forest for mining tri-clusters from time-series data

Cited by: 2
Authors
Mondal, Kartick Chandra [1]
Ghosh, Moumita [1]
Fajriyah, Rohmatul [2]
Roy, Anirban [3]
Affiliations
[1] Jadavpur Univ, Kolkata, India
[2] Islamic Univ Indonesia, Yogyakarta, Indonesia
[3] Govt West Bengal, Dept Environm, West Bengal Biodivers Board, Kolkata, India
Keywords
Tri-clustering; Suffix forest; Time-series data; Biodiversity; Forest cover data; ALGORITHM;
DOI
10.1007/s11334-022-00489-9
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Three-dimensional data is becoming increasingly prevalent. Unsupervised data analysis can be used to find hypothesized patterns of interest in such data. In this context, clustering can group observations along a single dimension, but its usefulness is limited in three-dimensional data domains because the observations are significantly connected within subspaces of the overall space. Bi-clustering addresses subspace clustering but ignores the third dimension. To deal with these issues, tri-clustering, the identification of coherent subspaces within three-dimensional data, was introduced and has been extensively studied. Despite the wide range of contributions to this topic, there is still room for improvement in terms of a more structured view of tri-clustering, the extraction of multiple cluster forms (e.g., row-major clusters, regular and irregular clusters), and improved algorithmic techniques. This paper introduces a novel data structure, the suffix forest, and uses it to design a tri-clustering algorithm. The algorithm is applied to the Indian Forest Dataset published by the Forest Survey of India. Here, we successfully implemented the tri-clustering concept with an informative structure in which changes in forest cover and mangrove cover over time are monitored across different states and union territories. This kind of study may pioneer research on biodiversity data analysis; exploring the relationships of different biodiversity traits with respect to both time and geographical region is one of our planned future research directions.
Pages: 765-787
Number of pages: 23