DOD needs some help digitizing a massive collection of respiratory disease samples

The DOD wants to turn more than 86 million specimens into digital data to be run through machine learning algorithms.

The Department of Defense has the world’s largest collection of pathology specimens, including “invaluable” data from the 1918 influenza pandemic. Now it wants help to digitize it.

Digitizing the collection, which holds more than a hundred years of data in the form of 55 million glass slides, 31 million paraffin-embedded tissue blocks and 500,000 wet tissue samples, would create a potentially exquisite machine learning database that computers could use to gain a broader understanding of global health issues.

The “complete digitization” of all those objects would be a major lift, with digital images and barcodes containing information for every sample. A new sources sought notice is looking for companies up to the task. The Defense Digital Service (DDS) and the Defense Health Agency’s Joint Pathology Center (JPC), the entity that oversees the repository database, are spearheading the project.

“An example of the repository’s invaluable influence on medicine is the use of tissue specimens in the repository to sequence the 1918 influenza virus,” the announcement states. “The resulting research ultimately provided guidance for avoiding future influenza outbreaks that could affect military readiness, fighting strength, and global health.” (The announcement makes no mention of the coronavirus pandemic.)


Beyond physically scanning and labeling the millions of samples, the digital modernization project would need to allow the JPC to continue supporting consulting services and open research conducted by other government agencies, such as the Department of Veterans Affairs and its massive health care system.

The type of access the digital project would need to grant is called “hub-and-spoke,” meaning medical facilities will be able to access a single “hub” cloud storing the data. New data obtained by consulting medical facilities will also need to be sent to the hub electronically, and the system must still support physical slides as backups.

It is critical to the program that the final database of the millions of specimens be readable and queryable to support AI and ML analysis, according to the announcement. The data should also be interoperable with patients’ electronic health records.

“The goal of this is to utilize final pathology reports in AI/ML generation,” according to the sources sought document. “To achieve this, the final reports should be amenable to database queries and analysis.”
