Tech

GSA challenge found industry machine-learning models can make do with limited training data

Techniques like transfer learning have come a long way and were used to fine-tune models so they could read end-user license agreements.

By Dave Nyczepir

November 18, 2020

Several companies recently impressed the General Services Administration with their ability to use limited training data in supervised machine-learning (ML) models, says Ryan Day, director of the agency’s Digital Services Division.

As part of a recent contest on Challenge.gov, GSA tasked entrants with using ML or artificial intelligence to speed up reviews of software end-user license agreements (EULAs), but the agency provided only several thousand rows of text with the use case.

“The first thing we learned was that industry could actually do this,” Day said Tuesday during the first day of FedTalks presented by FedScoop. “Our use case, going in we didn’t have any assumptions about whether or not it could be done with machine learning, but we found that it was a good fit.”

Normally, supervised ML requires large amounts of data, but many of the 20 entries GSA received were “high quality” and used workaround techniques like transfer learning.

Transfer learning is used in natural language processing when open-source models are pre-trained with vast amounts of other text and then fine-tuned with data specific to an individual use case — in this case the EULAs.
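As a rough illustration of that idea, the sketch below keeps a set of "pre-trained" word vectors frozen and trains only a small classifier on a handful of labeled clauses. It is a toy under stated assumptions, not what any challenge team built: the word vectors, the example clauses and their labels are all invented for illustration, and a real entry would start from a model such as BERT rather than a six-word lookup table.

```python
import math

# Hypothetical stand-in for a pre-trained language model: fixed word vectors
# that are never updated during fine-tuning. The values are invented.
PRETRAINED = {
    "indemnify": [1.0, 0.0],
    "arbitration": [1.0, 0.2],
    "auto-renew": [0.9, 0.1],
    "support": [0.0, 1.0],
    "updates": [0.1, 1.0],
    "license": [0.2, 0.8],
}

# Invented training data: 1 = clause needs review, 0 = clause looks acceptable.
EXAMPLES = [
    ("customer shall indemnify vendor", 1),
    ("binding arbitration required", 1),
    ("vendor provides support", 0),
    ("license includes updates", 0),
]


def embed(text):
    """Average the frozen vectors of the known words in a clause."""
    vecs = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def score(weights, bias, features):
    """Logistic-regression probability that a clause needs review."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(examples, epochs=200, lr=0.5):
    """Train only the small classifier head; the word vectors stay frozen."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = embed(text)
            error = score(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict(model, text):
    """Score a clause with the fine-tuned head on top of frozen features."""
    weights, bias = model
    return score(weights, bias, embed(text))


model = fine_tune(EXAMPLES)
```

Because the general-purpose representation is reused, the task-specific part has only three numbers to learn, which is why a few labeled examples can be enough in this toy setting.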

Contracting officers (COs) generally take one to two weeks reviewing EULAs to ensure their terms and conditions align with federal law as part of the software acquisition process. COs may coordinate a legal review with the Office of General Counsel to negotiate the removal of problematic language.

The AI and Machine Learning Challenge allowed GSA to test current commercial practices, with multiple teams using the Bidirectional Encoder Representations from Transformers (BERT) language model for transfer learning.

Other teams found creative ways to augment and generate new training data, with one using a cloud tool to translate clauses into hundreds of other languages and then back into English, Day said. The new clauses had the same meaning but different diction and syntax, serving as new training data.
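The round-trip approach Day described can be sketched roughly as follows. The article does not name the cloud tool, so the `translate` argument here is a hypothetical callable, and `toy_translate` with its hand-written lookup table only stands in for a real translation service so the example runs on its own.

```python
def back_translate(clause, pivot_languages, translate):
    """Round-trip a clause through each pivot language and keep new wordings."""
    variants = []
    seen = {clause}
    for lang in pivot_languages:
        pivoted = translate(clause, "en", lang)
        restored = translate(pivoted, lang, "en")
        if restored not in seen:  # skip exact copies of text we already have
            seen.add(restored)
            variants.append(restored)
    return variants


def augment(dataset, pivot_languages, translate):
    """Return the original (clause, label) pairs plus labeled paraphrases."""
    augmented = list(dataset)
    for clause, label in dataset:
        for variant in back_translate(clause, pivot_languages, translate):
            augmented.append((variant, label))  # paraphrase inherits the label
    return augmented


# Hand-written lookup table, NOT output from a real translation API.
TOY_TABLE = {
    ("Licensee shall indemnify the vendor.", "en", "fr"):
        "Le licencié indemnisera le fournisseur.",
    ("Le licencié indemnisera le fournisseur.", "fr", "en"):
        "The licensee will compensate the supplier.",
    ("Licensee shall indemnify the vendor.", "en", "de"):
        "Der Lizenznehmer stellt den Anbieter frei.",
    ("Der Lizenznehmer stellt den Anbieter frei.", "de", "en"):
        "The licensee holds the provider harmless.",
}


def toy_translate(text, source, target):
    """Hypothetical translator: returns the text unchanged if it has no entry."""
    return TOY_TABLE.get((text, source, target), text)


training_data = augment(
    [("Licensee shall indemnify the vendor.", 1)],
    ["fr", "de", "es"],
    toy_translate,
)
```

Each paraphrase keeps the label of the clause it came from, which assumes the translation preserved the legal meaning; in practice a sample of generated clauses would likely need a human check.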

Yet another team proposed an application programming interface (API)-based approach that breaks Microsoft Word and PDF documents into clauses, then runs predictions on each clause to determine its viability.
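A minimal sketch of that clause-by-clause flow might look like the following. It assumes the text has already been extracted from the Word or PDF file and that clauses begin with numbers such as "2." or "2.1)"; the function names, the numbering pattern and the `predict` callable are illustrative assumptions, not details of the team's proposal.

```python
import re

# Assumed clause format: a line starting with "1.", "2.1." or "2.1)".
CLAUSE_START = re.compile(r"^\s*(\d+(?:\.\d+)*)[.)]\s+", re.MULTILINE)


def split_clauses(text):
    """Split extracted plain text into (clause_number, clause_body) pairs."""
    matches = list(CLAUSE_START.finditer(text))
    clauses = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = " ".join(text[match.end():end].split())  # collapse line breaks
        clauses.append((match.group(1), body))
    return clauses


def review(text, predict, threshold=0.5):
    """Return the clauses whose predicted risk score meets the threshold."""
    return [
        (number, body)
        for number, body in split_clauses(text)
        if predict(body) >= threshold
    ]


SAMPLE_EULA = (
    "1. Grant of license.\n"
    "2. Licensee shall indemnify the vendor\n"
    "   against all claims.\n"
    "3. Support is provided by email.\n"
)


def keyword_model(body):
    """Hypothetical placeholder for a trained classifier's score."""
    return 1.0 if "indemnify" in body.lower() else 0.0


flagged = review(SAMPLE_EULA, keyword_model)
```

Passing the model in as a plain callable keeps the document-splitting step independent of whichever classifier sits behind the API.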

Dev Technology placed first in October, winning $15,000, while second-place Gaussian Solutions and third-place Team SoKat each won $2,500.

Meanwhile, GSA’s challenge allowed it to test commercial capabilities before developing proofs of concept and pilots and scaling into production.

“We can move some of the things that we learned into actual requirements from a business perspective, as well as a technology perspective,” said Keith Nakasone, deputy assistant commissioner for acquisition in GSA’s Office of IT Category, at FedTalks. “So I think this is a good way to start; the challenge gave us some really good insight into the tools available.”

Ethical AI

As the Department of Defense, intelligence community and Department of Homeland Security begin exploring ML and AI technologies, they’ve opted to establish ethical AI principles for their agencies to follow.

GSA is taking a slightly different approach by gathering ethical AI concepts from agencies participating in its AI Community of Practice, Nakasone said.

“It brings the agencies together so we can learn best practices, we can share information and also glean what we can do from creating templates and playbooks,” he said.

Industry has a role to play in informing GSA’s understanding of ethical AI as well, Nakasone said.

“Companies that are putting ethical principles out there for us to leverage is also another thing that we can consider from a contract and acquisition perspective,” he said.
