IBM's new speech AI creates a natural voice from 5 minutes of audio

Matti Robinson
1 Oct 2019 12:15

Artificial intelligence has been advancing in leaps and bounds in recent years. This is probably most evident to consumers in products like Google Assistant, but a lot of influential work is also being done behind the scenes to advance these capabilities.
One of the companies working on improving AI is IBM, which is developing algorithms and models that recognize speech patterns and styles. Its text-to-speech (TTS) modeling has advanced so far that it can create a highly adaptable voice, nearly indistinguishable from the real speaker, from just 5 minutes of recorded speech.

The speech synthesizer keeps improving with more data: after 10, and especially after 20, minutes of "listening" to sample speech, it can read any text in that voice very naturally.
IBM says that the key to this performance is the modular architecture of its neural speech synthesis. In other words, the system models and trains each aspect of the voice separately.

This helps the synthesized speech retain the original character of the voice.

But you don't have to take our word for it: here you can listen to the samples (5 min, 10 min, 20 min) for different voices. You can also have them speak text of your choice here.
