The USDA-ARS Ag100Pest Initiative: High-Quality Genome Assemblies for Agricultural Pest Arthropod Research

Cited by: 40
Authors
Childers, Anna K. [1]
Geib, Scott M. [2]
Sim, Sheina B. [2]
Poelchau, Monica F. [3]
Coates, Brad S. [4]
Simmonds, Tyler J. [2, 5]
Scully, Erin D. [6]
Smith, Timothy P. L. [7]
Childers, Christopher P. [3]
Corpuz, Renee L. [2]
Hackett, Kevin [8]
Scheffler, Brian [9]
Affiliations
[1] USDA, Bee Res Lab, Beltsville Agr Res Ctr, Agr Res Serv, 10300 Baltimore Ave, Beltsville, MD 20705 USA
[2] USDA, Trop Crop & Commod Protect Res Unit, Daniel K Inouye US Pacific Basin Agr Res Ctr, Agr Res Serv, 64 Nowelo St, Hilo, HI 96720 USA
[3] USDA, Natl Agr Lib, Agr Res Serv, 10301 Baltimore Ave, Beltsville, MD 20705 USA
[4] USDA, Corn Insects & Crop Genet Res Unit, Agr Res Serv, 2310 Pammel Dr, Ames, IA 50011 USA
[5] Oak Ridge Inst Sci & Educ, POB 117, Oak Ridge, TN 37831 USA
[6] USDA, Stored Prod Insect & Engn Res Unit, Ctr Grain & Anim Hlth Res, Agr Res Serv, 1515 Coll Ave, Manhattan, KS 66502 USA
[7] USDA, Genet & Breeding Res Unit, US Meat Anim Res Ctr, Agr Res Serv, State Spur 18D, Clay Ctr, NE 68933 USA
[8] USDA, Off Natl Programs Crop Prod & Protect, Agr Res Serv, 5601 Sunnyside Ave, Beltsville, MD 20705 USA
[9] USDA, Genom & Bioinformat Res Unit, Jamie Whitten Delta States Res Ctr, Agr Res Serv, 141 Expt Stn Rd, Stoneville, MS 38776 USA
Funding
United States Department of Agriculture
Keywords
Arthropoda; pests; invasive pests; genome sequencing; long-read sequencing; low-input DNA; HiC scaffolding; genome assembly; genomics; GENE PREDICTION; UNITED-STATES; ANNOTATION; VISUALIZATION; MECHANISMS; RESISTANCE;
DOI
10.3390/insects12070626
Chinese Library Classification
Q96 [Entomology]
Abstract
Simple Summary: High-quality genome assemblies are essential tools for modern biological research. In the past, creating genome assemblies was prohibitively expensive and time-consuming for most non-model insect species, due in part to the technical challenge of isolating the necessary quantity and quality of DNA from many species. Sequencing methods have now improved such that many insect genomes can be sequenced and assembled at scale. We created the Ag100Pest Initiative to propel agricultural research forward by assembling reference-quality genomes of important arthropod pest species. Here, we describe the Ag100Pest Initiative's processes and experimental procedures. We show that the Ag100Pest Initiative will greatly expand the diversity of publicly available arthropod genome assemblies. We also demonstrate the high quality of preliminary contig assemblies. We share arthropod-specific technical details and insights that we have gained during the project. The methods and preliminary results presented herein should help other researchers attain similarly high-quality assemblies, effectively changing the landscape of insect genomics.

The phylum Arthropoda includes species crucial for ecosystem stability, soil health, and crop production, as well as others that present obstacles to crop and animal agriculture. The United States Department of Agriculture's Agricultural Research Service initiated the Ag100Pest Initiative to generate reference genome assemblies of arthropods that are (or may become) pests to agricultural production and global food security. We describe the project goals, process, status, and future. The first three years of the project were focused on species selection, specimen collection, and the construction of lab and bioinformatics pipelines for the efficient production of assemblies at scale. Contig-level assemblies of 47 species are presented, all of which were generated from single specimens. Lessons learned and optimizations leading to the current pipeline are discussed. The project name implies a target of 100 species, but the efficiencies gained during the project have supported an expansion of the original goal, and a total of 158 species are currently in the pipeline. We anticipate that the processes described in the paper will help other arthropod research groups or other consortia considering genome assembly at scale.
Pages: 14