NIH makes its coronavirus genomic data publicly accessible in the cloud

Researchers can now quickly access the data for free, so long as they have an NIH award.

August 18, 2020

The National Institutes of Health is making genomic data about the coronavirus publicly accessible to researchers in the cloud for the first time.

Created by the National Center for Biotechnology Information (NCBI), the Coronavirus Genome Sequence Dataset consists of researcher-submitted data, including files in normalized Sequence Read Archive (SRA) format. The SRA is a bioinformatics repository of raw DNA and RNA sequencing data.

Researchers with active NIH awards can now quickly access the dataset at no cost via the Registry of Open Data on Amazon Web Services, and the agency plans to make it available on more public data cloud platforms.

“Containing COVID-19 outbreaks and preparing for future pandemics will require a deep understanding of the SARS-CoV-2 genome in the context of other COVID-19 patients and the broader Coronaviridae family,” said Ryan Layer, assistant professor at the University of Colorado Boulder’s BioFrontiers Institute, in a statement. “The NCBI Coronavirus Genome Sequence Dataset makes over a decade of viral genome data publicly accessible for researchers, empowering anyone in the research community to participate in the pandemic response.”

The dataset contains more than 13,000 SRA runs, NIH says. The project is part of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative. STRIDES is a collaboration between NIH and AWS to use the cloud to assist researchers with active NIH awards.

The newly available data will help researchers understand not only COVID-19 but also other pandemic diseases. Differences in genetic sequences among infected patients help researchers determine how quickly the virus is evolving, and genetics are thought to play a role in how patients react to infection. Diagnostic testing can also be fine-tuned.

The dataset itself consists of two buckets: one containing raw and normalized files categorized by SRA accession code and another containing accession metadata that will soon be queryable within the Amazon Athena interactive query service.
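To make that layout concrete, here is a minimal Python sketch of how files organized by SRA run accession might be addressed and grouped. It is an illustration under stated assumptions, not NIH's or AWS's actual structure: the bucket name is a hypothetical placeholder (the announcement does not name the buckets), and the one-folder-per-accession key layout is assumed. The only fixed detail is that SRA run accessions start with SRR, ERR or DRR followed by digits.

```python
import re
from collections import defaultdict

# Hypothetical placeholder: the article does not give the real bucket names.
DATA_BUCKET = "example-coronavirus-sra-data"

# SRA run accessions begin with SRR, ERR or DRR, followed by digits.
ACCESSION_RE = re.compile(r"^[SED]RR\d+$")


def accession_prefix(accession: str, bucket: str = DATA_BUCKET) -> str:
    """Build the S3 prefix for one run, assuming one folder per accession."""
    if not ACCESSION_RE.match(accession):
        raise ValueError(f"not an SRA run accession: {accession!r}")
    return f"s3://{bucket}/{accession}/"


def group_by_accession(keys):
    """Group object keys shaped like '<accession>/<filename>' by accession.

    Keys that do not start with a valid run accession are skipped.
    """
    grouped = defaultdict(list)
    for key in keys:
        accession, _, filename = key.partition("/")
        if ACCESSION_RE.match(accession) and filename:
            grouped[accession].append(filename)
    return dict(grouped)
```

A researcher could pass the keys returned by a bucket listing to `group_by_accession` to see which raw and normalized files exist for each run. The accession metadata in the second bucket would be queried separately once it is available in Athena.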
