
As part of its fantastic body of work on speech and voice models, Apple has just published a new study that takes a very human-centric approach to a tricky machine learning problem: not just recognizing what was said, but how it was said. And the accessibility implications are monumental.
In the paper, the researchers introduce a framework for analyzing speech using what they call Voice Quality Dimensions (VQDs): interpretable traits such as intelligibility, harshness, breathiness, …
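To make the idea concrete, here is a minimal sketch of what scoring speech along interpretable dimensions might look like. This is not the paper's code: the `score_vqds` function and its internals are hypothetical stand-ins, and only the three dimension names quoted above come from the article; a real system would run trained classifiers over learned speech features rather than the toy statistic used here.

```python
from dataclasses import dataclass

# The three dimensions named in the article; the paper defines more.
VQD_NAMES = ["intelligibility", "harshness", "breathiness"]


@dataclass
class VQDScores:
    """One interpretable score per voice-quality dimension, in [0, 1]."""
    scores: dict[str, float]

    def dominant_trait(self) -> str:
        """Return the dimension this clip expresses most strongly."""
        return max(self.scores, key=self.scores.get)


def score_vqds(audio: list[float]) -> VQDScores:
    """Hypothetical per-dimension scorer (placeholder logic).

    A real implementation would feed speech features to a trained
    model per dimension; here a trivial energy statistic keeps the
    sketch self-contained and runnable.
    """
    energy = sum(abs(x) for x in audio) / max(len(audio), 1)
    fake = {name: min(1.0, energy * (i + 1)) for i, name in enumerate(VQD_NAMES)}
    return VQDScores(scores=fake)


if __name__ == "__main__":
    clip = [0.1, -0.2, 0.05, 0.3]  # stand-in for waveform samples
    result = score_vqds(clip)
    print(result.scores)
    print("Dominant trait:", result.dominant_trait())
```

The appeal of this shape of output is that each number maps to a named, human-meaningful trait rather than an opaque embedding, which is what makes the approach interpretable.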