DOD must prioritize quality data collection to train AI, officials say

The DOD should contractually require programs to generate training data for AI development, the Air Force's top AI officer said.

Collecting and generating quality data sets to train artificial intelligence models needs to be a priority for the department, with some officials arguing it should be a requirement in contracts moving forward.

By being proactive about collecting and generating data, the future of AI can be built on quality inputs, Michael Kanaan, director of operations at the Air Force’s AI Accelerator at MIT, said Tuesday during the AFCEA DCAI and ML Technology Summit. Other technology officials endorsed the idea of being more aggressive about data collection rather than being “opportunistic” or working on old, lower-quality data sets.

For instance, the Air Force used quality data to train a machine learning model that turned the boards that officials use to manually track flight times into an automated, intelligent system. The ML system that replaced the “puck boards” ensured pilots got enough hours to maintain mission readiness. While the program is relatively small, it shows the promise of what good data management practices could yield across the military, said Kanaan, who recently published a book on the global competition for AI dominance.

“You should try to set policies for those words ‘training’ and ‘quality data’ in your contracts,” he added.


Kanaan said now that the Joint AI Center (JAIC) has launched its Joint Common Foundation development platform, there are even more tools available to finally operationalize quality data sources.

Much of the military has talked about migrating legacy data sets into the cloud and “cleaning” old data to power machine learning. Kanaan said that while cloud computing is critical, using old data sets is often just not worth it. The leader of the JAIC similarly has said he wants to see smaller, more “boutique” data sets for AI development.

DOD Chief Data Officer Dave Spirk also endorsed being proactive about data collection and generation for training. Spirk said that he aims to finish a departmentwide data strategy by the end of the month.
