adamwilcox.org

Digital Narration

Apple today announced the launch of Digital Narration, or artificial recordings of books.

Apple Books digital narration brings together advanced speech synthesis technology with important work by teams of linguists, quality control specialists, and audio engineers to produce high-quality audiobooks from an ebook file.

There are a couple of example clips on Apple’s website, which are absolutely worth a listen.

The previews are certainly impressive. They sound a little off to my ears, but then again, if I didn’t know it wasn’t a real human voice, would I notice? Difficult to say.

To start with, Apple’s efforts are targeting “independent authors”: those without the backing of large publishing houses or the money to invest in a professional reader. That said, it’s probably only a matter of time before this is more widely used. It’s clear that AI will slowly devour the world, or at least pick off the obvious, low-hanging fruit.

However, the examples Apple provides are, unsurprisingly, all scene-setting sections of narrative. They do not include dialogue. This is important.

Here’s a clip from the Corgi Audio abridged recording of Small Gods, written by Terry Pratchett, read by Tony Robinson:

The choices Tony Robinson made (the pace, the characterisation, the performance of a very specific breed of evil) are not something easily automated.

A professional voice-over artist can portray conflicted emotions and complex human feelings with just their voice.

You’re listening to a fiction piece and a character is supposed to be Scottish. How Scottish? What specific Scottish dialect? Or you have a variety of characters that appear throughout the book. A professional actor will have picked a different voice for each, based on the character traits in the text.

The underworld gangster, the cynical detective, the sozzled vicar, and the plucky young adventurer. You can hear each of their voices in your head, but can an AI reader accurately perform each voice or make logical choices as to the sound of each voice?

According to Apple, it currently takes one to two months to process the book and conduct quality checks. So presumably there is some human direction in the process.

I wonder if audiobooks, though clearly an increasingly profitable segment of the publishing industry, are the right place for AI narration. Documentaries and other television programmes seem like a more obvious starting point; their narration is often short bursts of description or scene setting, with minimal emotion. A service like Audm, which currently uses professional actors to voice longform journalism, seems like a good candidate. Or what about the few lines needed in a 40-second television advert? I imagine soon there will be specific AI voices licensed to a brand, like a corporate font or logo.

When ebooks were first taking off, I expected they would finish off the publishing industry in short order. Printing presses would close, and that would be the end of it. But it didn’t happen. Over the last 12 months, I’ve bought a surprising number of real, paper books. The delight of holding them in my hands and the physicality of each book, its cover and weight and texture, are important. I haven’t bought any ebooks in years.