IBM's new speech AI creates a natural voice from 5 minutes of audio

Matti Robinson
1 Oct 2019 12:15

Artificial intelligence has been developing in leaps and bounds in recent years. This is probably most evident to consumers in products like Google Assistant, but there's a lot of influential work being done behind the scenes that advances the capabilities.

One of the companies working on improvements in AI is IBM. It is developing algorithms and models that recognize speech patterns and styles. Its text-to-speech (TTS) modeling has advanced so much that it can create a highly adaptable synthetic voice that is nearly indistinguishable from a human one from just 5 minutes of recorded speech.

The speech synthesizer can keep improving beyond that: after 10 or especially 20 minutes of "listening" to sample speech, it can read any text in that voice very naturally.

IBM says that the trick to the impressive performance is the modular architecture of its neural speech synthesis. This means that the system models and trains each aspect of the voice independently.

This lets the result retain the original character of the voice.

But you don't have to take our word for it: you can listen to the samples (5 min, 10 min, 20 min) for different voices, and you can also make them speak text of your choice.
