Creating audiobooks


How to create audiobooks from epub and mobi using text-to-speech

Toby Kurien articles audiobooks epub mobi pandoc text-to-speech

I've recently started listening to audiobooks more than reading books (physical or on an e-reader), as it lets me get through more books and long-form articles. I can listen to a book while outside or while doing a chore, whereas reading requires setting aside a lot of uninterrupted time. I also feel that reading outdoors in a beautiful setting is a waste of the setting, since I'm sucked into the book and not actually looking around and enjoying the environment.

The downside of audiobooks is that there may not always be an audiobook version of what you want to read. Then there are the DRM-riddled services that offer audiobooks, but you must listen on their terms, and if they delete your account, you lose access to the books. For these reasons, I never really bothered with audiobooks before.

I've found three options for DRM-free audiobooks for any book or text (e.g. long-form articles): find the audiobook on YouTube and download the audio track (using, for example, Invidious or youtube-dl); use an ebook reader app such as Librera or FBReader to read the book aloud using text-to-speech; or create an audiobook using a text-to-speech service. The real-time text-to-speech option (using an app) has downsides: it either needs a constant internet connection so that a web service can perform the TTS, or it works offline but the TTS quality is usually pretty bad. I decided that I'd much prefer good-quality offline audiobooks that I can listen to anytime, anywhere.

To create my audiobook, I follow this process: convert the article/book to text, clean up the text if necessary, generate an audio file using text-to-speech, and post-process the audio file if necessary.

Convert to text

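For long-form articles from the web, I simply copy and paste the article content into a text file. For mobi ebooks, pandoc has no mobi reader as far as I know, so they likely need converting to epub first (for example with Calibre's ebook-convert). For epub ebooks, I use the pandoc utility to convert to plain text as follows: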

pandoc "thebook.epub" -t plain -o "thebook.txt"

Depending on the book, you may need to clean up the text. For example, it may contain rows of dashes separating chapters, which would come out in the audio as "dash dash dash dash ...". You'll probably also want to remove the preface, table of contents, index, etc. I find vi to be a great editor for this, since the text files are usually very large (which many editors struggle with) and vi makes it easy to "delete everything up to here" or "delete everything from here".
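The separator clean-up described above can also be scripted. The following is a minimal sketch, not part of my actual workflow: the function name clean_text and the rule "a line made up of three or more dashes, asterisks, underscores or equals signs is a separator" are assumptions, so adjust the pattern to whatever your book actually contains.

```python
import re

# Assumption: a separator is a line consisting only of three or more
# dashes, asterisks, underscores or equals signs, optionally spaced out,
# e.g. "-----" or "* * *".
SEPARATOR_LINE = re.compile(r"^\s*(?:[-*_=]\s*){3,}$")


def clean_text(text: str) -> str:
    """Drop separator lines and collapse runs of blank lines."""
    kept = [line for line in text.splitlines() if not SEPARATOR_LINE.match(line)]
    cleaned = "\n".join(kept)
    # Three or more consecutive newlines become a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip() + "\n"


# Hypothetical usage on the file produced by pandoc:
#   from pathlib import Path
#   book = Path("thebook.txt")
#   book.write_text(clean_text(book.read_text(encoding="utf-8")), encoding="utf-8")
```

Hyphenated words such as "well-known" are left alone because the pattern only matches lines that contain nothing but separator characters. Removing the preface, table of contents and index still needs a human eye, so an editor remains the better tool for that part.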

Use text-to-speech to generate audio

I tried several solutions for this (flite, espeak, svox and festival, among others), but the best one I could find uses Google Translate to do the text-to-speech synthesis. The default voice sounds quite natural, and it pronounces names, technical terms, and common abbreviations correctly, which the other TTS systems I tried get wrong.

An easy way to make use of Google Translate for text-to-speech is to use the Python package gtts:

pip3 install gtts
gtts-cli -f "thebook.txt" --output "thebook.mp3"

The speech synthesis takes a long time: it's only a little faster than real-time, so a 10-hour book could take 6 or more hours to generate. It's best to leave it running overnight. Sometimes it may time out, leaving you with only part of the book. In this case, instead of regenerating the entire thing, listen to the last portion of the audio, find the corresponding text in the text file, delete everything above it, and then generate a new file from what remains. It's easy to join the pieces up afterwards. Alternatively, generate each chapter as a separate file, which lets you skip chapters and makes it easier to recover from failed TTS runs.
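The chapter-per-file approach can be sketched in Python using the gtts package's gTTS class and its save method. Everything else here is an assumption made for illustration: the function names, the part001.mp3 naming scheme, and especially the heading pattern, which assumes chapters begin with a line such as "Chapter 1". Files that already exist are skipped, so rerunning after a timeout continues where the last run stopped.

```python
import re
from pathlib import Path
from typing import Callable, List

# Assumption: each chapter starts with a line like "Chapter 1" or "CHAPTER ONE".
CHAPTER_HEADING = re.compile(r"^chapter[ \t]+\S+", re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> List[str]:
    """Split at chapter headings; text before the first heading is its own part."""
    starts = [match.start() for match in CHAPTER_HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    parts = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [part for part in parts if part]


def gtts_synthesize(text: str, out_path: Path) -> None:
    """Generate speech with gtts (needs the package and a network connection)."""
    from gtts import gTTS  # imported here so the helpers work without gtts

    gTTS(text, lang="en").save(str(out_path))


def generate_audiobook(
    text: str,
    out_dir: Path,
    synthesize: Callable[[str, Path], None] = gtts_synthesize,
) -> List[Path]:
    """Write one MP3 per chapter, skipping chapters that are already done."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for number, chapter in enumerate(split_chapters(text), start=1):
        target = out_dir / f"part{number:03d}.mp3"
        if not target.exists():
            # Write under a temporary name first, so a run that times out
            # never leaves a half-written file that looks finished.
            partial = target.with_suffix(".partial")
            synthesize(chapter, partial)
            partial.replace(target)
        outputs.append(target)
    return outputs
```

Calling generate_audiobook(Path("thebook.txt").read_text(encoding="utf-8"), Path("thebook_audio")) would then produce one numbered file per chapter. The heading pattern is deliberately simple and will also match an ordinary line that happens to begin with "Chapter", so check the split before starting a long run.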

Audio post-processing

One thing you may notice with the Google TTS method used above is that the text is read back very slowly. This is usually not a problem, as you can simply increase the playback speed to compensate. Alternatively, the file itself can be sped up: I use Audacity's "Change Tempo" effect to increase the tempo by 50%. Audacity also provides an easy way to stitch together multiple files if needed.
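To get a feel for what a tempo change does to the running time: a 50% tempo increase plays the audio at 1.5 times the original speed, so a 10-hour recording shrinks to 10 / 1.5, or about 6 hours 40 minutes. The sketch below shows that arithmetic, plus a rough way to stitch parts together without Audacity by appending the files byte for byte. Both function names are hypothetical. Plain concatenation usually plays fine for simple MP3 files, but that is an assumption: some players may report the wrong length, so Audacity remains the safer route.

```python
from pathlib import Path
from typing import Iterable


def duration_after_tempo_change(hours: float, percent_change: float) -> float:
    """Running time in hours after changing the tempo by the given percentage."""
    return hours / (1 + percent_change / 100)


def join_mp3_files(parts: Iterable[Path], output: Path) -> None:
    """Append the given MP3 files to one output file, in the order given."""
    with open(output, "wb") as out:
        for part in parts:
            out.write(Path(part).read_bytes())


# Hypothetical usage with per-chapter files named part001.mp3, part002.mp3, ...:
#   parts = sorted(Path("thebook_audio").glob("part*.mp3"))
#   join_mp3_files(parts, Path("thebook.mp3"))
```

Sorting the file names before joining matters, which is why zero-padded numbers such as part001.mp3 are convenient.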

Final thoughts

This might all seem like a lot of effort just to listen to an ebook, but if you care about having a DRM-free copy that you can listen to on your own terms, it might just be worth it. I get some of my books from DRM-free sites that support independent authors (such as Smashwords), and this gives me a great way to make audiobook versions that I can listen to at my leisure. It is also one of those rare instances when proprietary Big Tech services can be used to my advantage, so I'm keeping my fingers crossed that gtts will continue to work for a long time.