KoNA: Korean Nucleotide Archive as A New Data Repository for Nucleotide Sequence Data

Cited by: 3
Authors
Ko, Gunhwan [1]
Lee, Jae Ho [1]
Sim, Young Mi [1]
Song, Wangho [1]
Yoon, Byung-Ha [1]
Byeon, Iksu [1]
Lee, Bang Hyuck [1]
Kim, Sang-Ok [1]
Choi, Jinhyuk [1]
Jang, Insoo [1]
Kim, Hyerin [1]
Yang, Jin Ok [1]
Jang, Kiwon [1]
Kim, Sora [1]
Kim, Jong-Hwan [1]
Jeon, Jongbum [1]
Jung, Jaeeun [1]
Hwang, Seungwoo [1]
Park, Ji-Hwan [1]
Kim, Pan-Gyu [1]
Kim, Seon-Young [1]
Lee, Byungwook [1]
Affiliations
[1] Korea Res Inst Biosci & Biotechnol, Korea Bioinformat Ctr, Daejeon 34141, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Korea BioData Station; Nucleotide sequence; Next-generation sequencing repository; Genomics; Deposition and access of big data;
DOI
10.1093/gpbjnl/qzae017
Chinese Library Classification number
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet needs for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. KoNA is available at https://www.kobic.re.kr/kona/.
Pages: 8