A two-step procedure for detecting change points in genomic sequences

Times cited: 0
Authors
Anjum, Arfa [1]
Jaggi, Seema [2]
Lall, Shwetank [3]
Varghese, Eldho [4]
Rai, Anil [1]
Bhowmik, Arpan [3]
Mishra, Dwijesh Chandra [1]
Affiliations
[1] ICAR-Indian Agricultural Statistics Research Institute, Centre for Agricultural Bioinformatics, New Delhi 110012, India
[2] ICAR-Indian Agricultural Statistics Research Institute, Agricultural Education Division, New Delhi 110012, India
[3] ICAR-Indian Agricultural Statistics Research Institute, Division of Design of Experiments, New Delhi 110012, India
[4] ICAR-Central Marine Fisheries Research Institute, Fishery Resources Assessment Division, Kochi 682018, India
Source
CURRENT SCIENCE | 2024 / Vol. 126 / No. 01
Keywords
Anomalies; change points; genomic sequences; segmentation; two-step procedure
DOI
10.18520/cs/v126/i1/54-58
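Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09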
Abstract
Change-point detection is currently a major focus of whole-genome studies, and various segmentation techniques have been proposed over time to identify change points. To locate segments within a genome, it is helpful to pinpoint the boundaries between them; these boundaries are known as change points. Change points can be identified by treating them as outliers. Anomalies, or outliers, are observations that differ markedly from the rest of a dataset. They may arise from measurement errors or from properties of the data themselves. Studying the fluctuations across different segments also revealed heterogeneity between consecutive segments. In this paper, an anomaly-identification (influential-point detection) approach is discussed and applied to cow genome data from chromosome 25. The detected anomalies are then examined to determine whether or not they are true change points. This two-step technique identifies change points from the observed anomalies and is efficient in terms of computation time and cost. The study aims to detect anomalies in genomic data and to determine the exact points at which a segment differs significantly from the remaining segments. We have developed R code for data processing and for the methods applied.
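The abstract does not specify the statistics used in either step. What follows is a minimal Python sketch, written under explicit assumptions, of how a two-step "flag anomalies, then confirm them" procedure can be organised. It is not the authors' R implementation.

The sketch makes four assumptions:
- The sequence is summarised as a numeric profile. Here this is a hypothetical GC fraction in non-overlapping windows (`gc_profile`).
- Step 1 flags anomalous jumps between consecutive windows using a robust median/MAD modified z-score. This is a swapped-in stand-in for the influential-point diagnostics the paper discusses.
- Step 2 keeps a candidate only if the windows on either side of it differ under a permutation test for a difference in means.
- All function names and parameter values (`threshold=3.5`, `window=10`, `alpha=0.01`) are illustrative choices.

```python
import random
import statistics


def gc_profile(seq, window=100):
    """Hypothetical preprocessing: GC fraction in non-overlapping windows."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        profile.append((chunk.count("G") + chunk.count("C")) / window)
    return profile


def candidate_change_points(values, threshold=3.5):
    """Step 1 (assumed): flag index i when the jump values[i] - values[i-1]
    is an outlier among all consecutive jumps (modified z-score)."""
    diffs = [values[i] - values[i - 1] for i in range(1, len(values))]
    med = statistics.median(diffs)
    mad = statistics.median([abs(d - med) for d in diffs])
    if mad == 0:
        # Most jumps are identical (e.g. noiseless data): any other jump is unusual.
        return [i + 1 for i, d in enumerate(diffs) if d != med]
    return [i + 1 for i, d in enumerate(diffs)
            if abs(0.6745 * (d - med) / mad) > threshold]


def permutation_pvalue(left, right, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(left) - statistics.fmean(right))
    pooled = list(left) + list(right)
    k = len(left)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def confirm_change_points(values, candidates, window=10, alpha=0.01):
    """Step 2 (assumed): keep a candidate only if the windows on either side differ."""
    confirmed = []
    for i in candidates:
        left, right = values[max(0, i - window):i], values[i:i + window]
        if len(left) >= 2 and len(right) >= 2 and permutation_pvalue(left, right) < alpha:
            confirmed.append(i)
    return confirmed


# Usage: a step in GC content is flagged and confirmed;
# an isolated spike is flagged but rejected.
profile = gc_profile("AT" * 1000 + "GC" * 1000, window=100)
step_cands = candidate_change_points(profile)
print(step_cands, confirm_change_points(profile, step_cands))  # [20] [20]

spike = [0.5] * 30
spike[15] = 0.9
spike_cands = candidate_change_points(spike)
print(spike_cands, confirm_change_points(spike, spike_cands))  # [15, 16] []
```

The spike example shows why a second step is useful in this kind of scheme. A single aberrant window produces large jumps, so step 1 flags it as an anomaly. The windows on either side share the same level, however, so step 2 does not treat it as a change point.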
Pages: 54-58
Number of pages: 5