According to new research led by CU Boulder computer scientist Theodora Chaspari, some artificial intelligence tools for healthcare may be confused by the way people of different genders and races talk.
The study hinges on a perhaps unspoken fact of human society: not everyone talks the same. Women, for example, tend to speak at a higher pitch than men, while similar differences can appear between white and Black speakers.
Now, researchers have found that those natural variations can confound algorithms that screen humans for mental health concerns like anxiety or depression. The findings add to a growing body of research showing that AI, like people, can make assumptions based on race or gender.
“If the AI is not well trained, or doesn't include enough representative data, it can propagate these human or social biases,” said Chaspari, associate professor in the Department of Computer Science.
She and her colleagues published their findings July 24 in the journal Frontiers in Digital Health.
Chaspari noted that AI could be a promising technology in the healthcare world. Fine-tuned algorithms can sift through recordings of people speaking, listening for subtle changes in their voices that could indicate underlying mental health concerns.
But these tools have to perform consistently for patients from many demographic groups, the computer scientist said. To find out whether the AI was up to the task, the researchers fed audio samples from real humans into a set of machine learning algorithms. The findings raised some red flags: the AI tools, for example, underdiagnosed women at risk of depression more often than men, a result that, in the real world, could keep people from getting the care they need.
“With artificial intelligence, we can identify fine patterns that humans can't always perceive,” said Chaspari, who previously worked as a faculty member at Texas A&M University. “However, while this opportunity exists, there is also a great deal of risk.”
Speech and emotions
She added that the way people talk can be a powerful window into their underlying emotions and well-being — something poets and playwrights have long known.
Research shows that people diagnosed with clinical depression often speak more softly and in a flatter, more monotone voice than others. People with anxiety disorders, meanwhile, tend to speak louder and with more “jitter,” an acoustic measure of breathiness in speech.
“We know that speech is greatly influenced by one's anatomy,” Chaspari said. “For depression, there have been some studies that have shown changes in the way the vocal folds vibrate, or even how the voice is modulated by the vocal tract.”
Over the years, scientists have developed AI tools to look for just these kinds of changes.
Chaspari and her colleagues decided to put those algorithms under the microscope. To do so, the team drew on recordings of humans talking in different scenarios: in one, people had to speak to a group of strangers for 10 to 15 minutes; in another, men and women talked for longer in a setting similar to a doctor's visit. In both cases, the speakers separately filled out questionnaires about their mental health. The study also involved Texas A&M undergraduate students Michael Yang and Abdullah Al-Attar.
Correcting biases
The biases showed up across the board.
In recordings of public speaking, for example, Latino participants reported feeling significantly more nervous on average than white or Black speakers, yet the AI failed to detect that heightened anxiety. In another experiment, the algorithms flagged men and women as being at risk of depression at equal rates. In reality, female speakers experienced depressive symptoms at much higher rates.
Chaspari noted that the team's findings are only a first step. Researchers will need to analyze recordings of many more people from a wide range of demographic groups before they can understand why the AI performed poorly in some cases — and how to correct those biases.
But, she said, the study is a sign that AI developers should proceed with caution before bringing these tools into the medical world:
“If we think an algorithm actually underestimates depression for a certain group, that's something we need to inform clinicians about.”