The Chief Digital and Artificial Intelligence Office (CDAO) has successfully completed a Crowdsourced AI Red-Teaming (CAIRT) Assurance Program pilot project focused on the use of large language model (LLM) chatbots in the context of military medicine. The CAIRT program supports the Department of Defense (DoD) in creating bottom-up, crowdsourced approaches to AI assurance and AI risk mitigation. Through crowdsourcing, projects are able to capture large volumes of data and engage a wide range of stakeholders.
This CAIRT LLM pilot project was conducted by Humane Intelligence, a technology company that is building a community of practice around algorithmic assessments, in partnership with the Defense Health Agency (DHA) and the Program Executive Office, Defense Health Management Systems (PEO DHMS). Using a red-teaming methodology, in which adversarial techniques are applied internally to test a system's resilience, Humane Intelligence uncovered specific system vulnerabilities. In addition, red-teaming attracts participants who want to engage with new technologies and who, as potential future beneficiaries, gain the opportunity to contribute to system improvements. Previously, in the spring of 2024, CDAO hosted a CAIRT exercise that offered financial rewards for red-teaming.
In this pilot, Humane Intelligence applied crowdsourced red-teaming to two promising use cases in military medicine: clinical note summarization and a chatbot for medical advice. More than 200 participants, including clinical providers and health analysts from DHA, the Uniformed Services University of the Health Sciences, and the services, took part in the exercise, which compared three popular LLMs. The exercise uncovered more than 800 findings of potential vulnerabilities and biases related to applying these capabilities in the prospective use cases. It will yield repeatable and scalable output through the development of benchmark datasets that can be used to evaluate future vendors and tools against expected performance. In addition, these findings will play a key role in shaping DoD policies and best practices for the responsible use of Generative Artificial Intelligence (GenAI), ultimately leading to improved military medical care. If these prospective post-mission use cases include covered AI as defined in OMB M-24-10, they will follow all required risk management practices.
"As the use of GenAI for these purposes within the DoD is in the early stages of piloting and experimentation, this program is acting as an essential pioneer to generate a wealth of test data, uncover areas for consideration, and validate mitigation options that will shape future research, development, and assurance of GenAI systems that may be deployed in the future," noted the CDAO's lead for the initiative, Dr. Matthew Johnson.
As the recent pilot and others have shown, continued testing of LLM and AI systems through the CAIRT Assurance Program will be critical to accelerating the CDAO's rapid AI capability cell, improving GenAI mission effectiveness, and contributing to justifiable confidence in all DoD use cases.
ABOUT CDAO
Launched in June 2022, the CDAO is dedicated to integrating and optimizing AI capabilities across DoD. The office is responsible for accelerating the adoption of data, analytics, and AI in DoD, enabling the Department's digital infrastructure and policy adoption to deliver scalable AI-based solutions for enterprise and joint use cases and protect the nation from current and emerging threats.
Pentagon/ gnews- RoZ