Data Perturbation and Recovery of Time Series Gene Expression Data

Cited by: 0
Authors
Sarkar, Aisharjya [1]
Mishra, Prabhat [1]
Kahveci, Tamer [1]
Affiliations
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
Funding
National Science Foundation (USA);
Keywords
Time series analysis; Time measurement; Gene expression; Perturbation methods; RNA; Data models; Time factors; Time-series gene expression; data perturbation; missing time point; multiple time-series alignment; MULTIPLE SEQUENCE ALIGNMENT;
DOI
10.1109/TCBB.2021.3058342
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
To regulate their activities, cells process transcripts by controlling which genes to transcribe and by what amount. The transcription levels of genes often change over time, and the rate of change varies between genes. It can even differ for the same gene across members of a population. Thus, for a given gene, it is important to study the transcription level not at a single time point but across multiple time points, in order to capture changes in gene expression patterns that underlie several phenotypic or external factors. Such a dataset can be perturbed, leaving it with missing transcription values for different samples at different time points. In this paper, we define three data perturbation models that are significant with respect to random deletion. We also define a recovery method that recovers the data lost in the perturbed dataset such that the error is minimized. Our experimental results show that the recovery method compensates for the loss caused by the perturbation models. Using two measures, namely normalized distance and Pearson's correlation coefficient, we show that the distance between the original and perturbed datasets is greater than the distance between the original and recovered datasets.
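As a rough illustration of the perturb, recover, and evaluate workflow the abstract describes, the following Python sketch may help. It does not reproduce the paper's three perturbation models or its recovery method, which are not specified in this record. It assumes the random-deletion baseline as the perturbation, linear interpolation between observed neighbours as a stand-in recovery step, and a Euclidean distance divided by the norm of the original series as the "normalized distance". All function names are hypothetical.

```python
import math
import random


def random_deletion(series, fraction, rng):
    """Baseline perturbation (assumed): mark a random fraction of the
    interior time points as missing (None). End points are kept so that
    every gap has an observed neighbour on both sides."""
    n = len(series)
    interior = list(range(1, n - 1))
    k = min(len(interior), round(fraction * n))
    missing = set(rng.sample(interior, k))
    return [None if i in missing else v for i, v in enumerate(series)]


def recover_linear(perturbed):
    """Stand-in recovery (assumed, not the paper's method): fill each
    missing value by linear interpolation between the nearest observed
    time points on either side."""
    known = [i for i, v in enumerate(perturbed) if v is not None]
    recovered = list(perturbed)
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            recovered[i] = (1 - w) * perturbed[left] + w * perturbed[right]
    return recovered


def normalized_distance(original, other):
    """Euclidean distance scaled by the norm of the original series
    (one plausible normalization; the paper's definition may differ)."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, other)))
    norm = math.sqrt(sum(a ** 2 for a in original))
    return diff / norm


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic expression profile of one gene over 12 time points (made-up data).
expression = [math.sin(0.5 * t) + 2.0 for t in range(12)]
perturbed = random_deletion(expression, 0.25, random.Random(42))
recovered = recover_linear(perturbed)
print("normalized distance:", normalized_distance(expression, recovered))
print("Pearson correlation:", pearson(expression, recovered))
```

A smaller normalized distance and a correlation closer to 1 indicate that the recovered series tracks the original more closely. How the perturbed dataset, which contains missing entries, is scored against the original is not stated in this record, so the sketch only scores the recovered series.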
Pages: 830 - 842
Page count: 13
Related papers
50 in total
  • [1] Analyzing time series gene expression data
    Bar-Joseph, Z
    BIOINFORMATICS, 2004, 20 (16) : 2493 - 2503
  • [2] Gene Selection in Time-Series Gene Expression Data
    Adhikari, Prem Raj
    Upadhyaya, Bimal Babu
    Meng, Chen
    Hollmen, Jaakko
    PATTERN RECOGNITION IN BIOINFORMATICS, 2011, 7036 : 145 - +
  • [3] Clustering short time series gene expression data
    Ernst, J
    Nau, GJ
    Bar-Joseph, Z
    BIOINFORMATICS, 2005, 21 : I159 - I168
  • [4] Time series analysis of gene expression and location data
    Yeang, CH
    Jaakkola, T
    THIRD IEEE SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING - BIBE 2003, PROCEEDINGS, 2003, : 305 - 312
  • [5] Time series analysis of gene expression and location data
    Yeang, CH
    Jaakkola, T
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2005, 14 (05) : 755 - 769
  • [6] A linear time biclustering algorithm for time series gene expression data
    Madeira, SC
    Oliveira, AL
    ALGORITHMS IN BIOINFORMATICS, PROCEEDINGS, 2005, 3692 : 39 - 52
  • [7] Clustered alignments of gene-expression time series data
    Smith, Adam A.
    Vollrath, Aaron
    Bradfield, Christopher A.
    Craven, Mark
    BIOINFORMATICS, 2009, 25 (12) : I119 - I127
  • [8] Hybrid method for the analysis of time series gene expression data
    Han, Lixin
    Yan, Hong
    KNOWLEDGE-BASED SYSTEMS, 2012, 35 : 14 - 20
  • [9] Continuous representations of time-series gene expression data
    Bar-Joseph, Z
    Gerber, GK
    Gifford, DK
    Jaakkola, TS
    Simon, I
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (3-4) : 341 - 356
  • [10] Constrained Subspace Clustering for Time Series Gene Expression Data
    Qu, Jibin
    Ng, Michael
    Chen, Luonan
    COMPUTATIONAL SYSTEMS BIOLOGY, 2010, 13 : 323 - +