A newly assembled FDA advisory committee recommended several approaches for how the agency should regulate generative artificial intelligence (AI)-enabled medical devices during a 2-day meeting that wrapped up Thursday.
The Digital Health Advisory Committee (DHAC) held its first meeting to offer guidance to the FDA on a slew of questions related to the development, evaluation, implementation, and continued monitoring of AI-enabled medical devices.
During the opening remarks, FDA Commissioner Robert Califf, MD, said the DHAC would provide important advice and recommendations on the benefits and risks associated with all digital health technologies, including generative AI-enabled medical devices.
"We've established this committee because we see great potential for digital health technologies to help address critical healthcare issues that we face today, and we need these technologies to be developed, deployed, and used responsibly in the best interest of patients and consumers," Califf said, adding that "artificial intelligence is changing how we think about health and healthcare, and it's one of the most exciting and promising areas of science because it's built to transcend boundaries."
In materials shared before the meeting, FDA staff asked the committee to consider planning and design, data collection and management, model building and tuning, verification and validation, model deployment, operation and monitoring, and real-world performance evaluation.
The discussion produced a substantial number of considerations and ideas for how the FDA should approach all of the phases of the development process, said committee chair Ami Bhatt, MD, the chief innovation officer at the American College of Cardiology.
"There are a lot of eyes, a lot of opinions, many companies, research labs, friends of ours, and -- most importantly -- patients who are hopeful for the promise of generative AI, and that's why we consider our job here the development of an infrastructure for growth with guardrails," she said. "This is not the end, but only the beginning of a process of continuous change."
While the committee did not vote on specific recommendations for the agency, Bhatt noted that they were able to create "an actionable framework" for how generative AI-enabled devices should be handled by the FDA moving forward. Following the FDA's list of discussion questions for the meeting, the committee members offered a framework based on three distinct areas: premarket performance evaluation, risk management, and postmarket performance monitoring.
Among the premarket performance considerations, the committee members said the agency should develop custom, multi-dimensional frameworks to evaluate the overall correctness of AI models, the effectiveness of the intended generative AI outputs based on user inputs or prompts, and the ability to identify potential harms and risks introduced by generative AI-enabled devices.
Along those lines, the committee noted that the FDA should consider developing a small set of widely accepted metrics and methods to be used in the evaluation of such devices.
They also said the agency should develop standard definitions of terms and concepts used to discuss generative AI, especially for key limitations such as data drift and hallucinations. Notably, the lack of consistent definitions for several terms related to generative AI presented challenges at several points during the meeting.
The committee also pointed to a lack of sufficient study designs to test these devices for clinical use, and urged the FDA to explore potential use of alternative study approaches, such as synthetic control trials, to improve evaluation of the comparative efficacy of these devices.
For postmarket performance evaluation, the committee said the agency should consider approaches for scaling evaluations after a device is widely adopted by clinicians or consumers. They also emphasized the need to automate those monitoring and evaluation processes to avoid time-consuming and costly human review of these devices once they are used at larger scales.
In addition, they said the FDA should consider establishing new frameworks for understanding the impact of generative AI-enabled devices on society once they are on the market.
For this, the committee recommended that the agency consider establishing a centralized data repository and reporting mechanism that can be used to track errors and harms caused by these devices, and noted that these tools could also be used to continuously monitor device performance across various populations and settings.
The committee also emphasized that the agency must keep in mind the impact of these devices on health equity. They recommended that the FDA develop requirements for companies to implement and demonstrate how safeguards are protecting against built-in or learned biases over time.
Finally, the committee said that the agency should develop certification programs or other standards to ensure that companies that develop these devices understand the risk of bias in their generative AI-enabled devices.
After 2 days of in-depth discussion about these issues, the committee members noted that developing this regulatory infrastructure would be an ongoing process. Bhatt acknowledged that this process would be incremental, but that establishing clear guidelines for the implementation of generative AI in the healthcare setting could help improve healthcare delivery across the country in the near future.
"One challenge we face, because generative AI is oftentimes related to clinical guidelines or clinical decision support, is what the gold standard is," Bhatt said. "When we think about whether or not we're delivering gold-standard clinical guideline-derived treatment throughout the United States throughout all of the different specialties, the answer is generally no."
This technology could help improve the overall quality of care offered to patients, she added, so the question to ask is "how close does generative AI bring us to the gold standard?"