Transforming Brain Signals Into Speech Sounds

— Novel brain-computer interface produces natural-sounding synthetic speech

A graphic depicting an interface that uses brain signals to control a virtual vocal tract.

A novel brain-computer interface, with electrodes implanted temporarily in individuals whose brains were already being monitored for clinical reasons, translated neural activity into intelligible speech sounds, researchers reported.

The system, which decoded the neural signals governing the jaw, larynx, lip, and tongue movements involved in speaking, showed it is possible to create a synthesized version of a person's voice that is controlled by brain activity, reported Edward Chang, MD, of the University of California San Francisco (UCSF), and co-authors in Nature.

The advance could one day have a profound effect on people who cannot speak. "It's been a long-standing goal of our lab to create technologies to restore communication for patients with severe speech disability either from neurological conditions, such as stroke or other forms of paralysis, or conditions that result in the inability to speak," Chang said in a press conference.

"We want to create technologies that can reproduce speech directly from human brain activity," he continued. "This study provides a proof of principle that this is possible."

Existing brain-computer interfaces are typically letter-by-letter devices that rely on small eye or facial muscle movements, allowing people with paralysis to type about eight to ten words a minute. Natural speech, in contrast, averages about 150 words a minute.

The new system builds on a recent study that demonstrated how the sensorimotor cortex choreographs vocal tract movements to produce fluent speech. To develop the interface, Chang and colleagues worked with five volunteers with intact speech who had electrodes temporarily implanted for intracranial monitoring during epilepsy surgery. The research team recorded high-density electrocorticography (ECoG) signals while the volunteers spoke several hundred sentences aloud, including passages from Sleeping Beauty and Alice in Wonderland.

Example of the array of intracranial electrodes used to record brain activity. Credit: UCSF

The researchers then used a two-stage decoding approach: they first transformed the neural signals into representations of the vocal tract movements needed to produce speech, then converted those decoded movements into spoken words and sentences. This mapping of sound to anatomy allowed the researchers to create a virtual vocal tract that each volunteer could control with brain activity.
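
A minimal sketch of how such a two-stage decoder might be wired together is shown below in Python (using PyTorch): one recurrent network maps ECoG features to vocal tract movement traces, and a second maps those traces to acoustic features that a vocoder could render as audio. The class names, layer sizes, and feature dimensions here are illustrative assumptions, not the published architecture.

# Illustrative two-stage decoder sketch; all dimensions and names are assumptions.
import torch
import torch.nn as nn

class ArticulatoryDecoder(nn.Module):
    """Stage 1: ECoG features -> vocal tract (articulatory) movement traces."""
    def __init__(self, n_electrodes=256, n_kinematic=33, hidden=100):
        super().__init__()
        self.rnn = nn.LSTM(n_electrodes, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_kinematic)

    def forward(self, ecog):                 # ecog: (batch, time, n_electrodes)
        h, _ = self.rnn(ecog)
        return self.out(h)                   # (batch, time, n_kinematic)

class AcousticDecoder(nn.Module):
    """Stage 2: articulatory movements -> acoustic features for a vocoder."""
    def __init__(self, n_kinematic=33, n_acoustic=32, hidden=100):
        super().__init__()
        self.rnn = nn.LSTM(n_kinematic, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, kinematics):
        h, _ = self.rnn(kinematics)
        return self.out(h)

# Example: run one 5-second window of placeholder neural data through both stages.
ecog = torch.randn(1, 1000, 256)             # illustrative recording, 200 Hz x 5 s
movements = ArticulatoryDecoder()(ecog)      # decoded virtual vocal tract movements
acoustics = AcousticDecoder()(movements)     # features a vocoder would render as audio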

"The relationship between the movements of the vocal tract and the speech sounds that are produced is a complicated one," said co-author Gopala Anumanchipalli, PhD, also of UCSF. "We reasoned that if these speech centers in the brain are encoding movements rather than sounds, we should try to do the same in decoding those signals."

The resulting synthetic speech was easier to understand than synthetic speech decoded directly from brain activity without the vocal tract simulation. To test its intelligibility, the researchers conducted two listening tasks on Amazon Mechanical Turk, one involving single-word identification and the other sentence-level transcription, and found that naïve listeners could understand the reconstructed speech, with up to 43% of the synthesized trials transcribed perfectly.
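
For a concrete sense of how sentence-level transcriptions can be scored, the short Python sketch below computes a word error rate for each trial and the share of trials transcribed perfectly. The scoring recipe is a standard one rather than the study's exact protocol, and the example sentences are placeholders, not transcripts from the study.

# Illustrative scoring of listener transcriptions: per-trial word error rate
# (word-level edit distance) and the share of trials transcribed perfectly.
# The trial data below are placeholders, not data from the study.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Dynamic-programming edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

trials = [  # (sentence the volunteer spoke, what a listener typed); placeholder examples
    ("the quick brown fox jumps over the lazy dog", "the quick brown fox jumps over the lazy dog"),
    ("she sells sea shells by the sea shore", "she sells see shells by the shore"),
]
rates = [word_error_rate(spoken, heard) for spoken, heard in trials]
perfect_share = sum(r == 0.0 for r in rates) / len(rates)
print(f"mean WER: {sum(rates) / len(rates):.2f}; perfectly transcribed: {perfect_share:.0%}")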

The decoder could also synthesize speech when a volunteer silently mouthed sentences, though performance for mimed speech was inferior to that for audible speech.

These results provide "a compelling proof of concept for a speech-synthesis brain-computer interface, both in terms of the accuracy of audio reconstruction and in the ability of listeners to classify the words and sentences produced," observed Chethan Pandarinath, PhD, and research assistant Yahia Ali, both of Emory University and the Georgia Institute of Technology in Atlanta, in an accompanying commentary.

But challenges remain before a clinically viable interface is available, they noted. "The intelligibility of the reconstructed speech was still much lower than that of natural speech," they pointed out, although additional improvements might be obtained with neural interfaces that record more localized brain activity than ECoG, such as intracortical microelectrode arrays.

Whether people who cannot produce speech-related movements can use a system like this is not known, but similar questions were raised after the first proof-of-concept studies of interfaces to control arm and hand movements, Pandarinath and Ali added. "Subsequent clinical trials have compellingly demonstrated rapid communication, control of robotic arms, and restoration of sensation and movement of paralyzed limbs in humans using these brain-computer interfaces," they wrote.

The researchers currently are experimenting with higher-density electrode arrays and more advanced machine learning algorithms to improve synthesized speech. "We've got to make it more natural, more intelligible," Chang said. "There's a lot of engineering going on in the group here to figure out how to improve it."

And while the experiment was conducted in people who spoke normally, the ultimate goal of the research is to help people who have a communication disability, Chang emphasized. It's not clear whether the same algorithms will work for someone who is not speaking, he noted; that may need to be figured out through a clinical trial. "But we're getting there," he said. "We're getting close."

Disclosures

This research was funded by the NIH BRAIN Initiative and supported by the New York Stem Cell Foundation, the Howard Hughes Medical Institute, the McKnight Foundation, the Shurl and Kay Curci Foundation, and the William K. Bowes Foundation.

The authors declared no competing interests.

Primary Source

Nature

Anumanchipalli G, et al. "Speech synthesis from neural decoding of spoken sentences" Nature 2019; DOI: 10.1038/s41586-019-1119-1.

Secondary Source

Nature

Pandarinath C, Ali Y "Brain implants that let you speak your mind" Nature 2019; 568: 466-467.