Michal Fapšo: Using Google Text-to-Speech

Sunday, January 29, 2012

A few examples first:

...reading a short excerpt of the GNU GPL licence in various languages:

English (en):
German (de):
French (fr):
Spanish (es):
Hungarian (hu):
Czech (cs):
Slovak (sk):

What is it good for?

Sometimes text-to-speech (TTS) comes in handy. When you are on a bike or on a walk, or your eyes are tired of reading text on your computer screen, just convert the text to MP3 and listen to it anywhere.

Why Google TTS?

They support lots of languages. In Google Translate there is a speaker icon under the translated text, so you can listen to the translation. However, it only works for short texts (under 100 characters).

A simple way of using the Google TTS in Perl is here: http://tonyvirelli.com/slider/sweet-google-tts/

How to use it for longer texts?

I didn't test it on Windows or Mac, but if you are able to install Perl and sox, it should run fine.
For Linux:

Download this script: speak.pl
Install these packages: libwww-perl libwww-mechanize-perl sox libsox-fmt-mp3
Usage:
```
echo "Hello world" | ./speak.pl en speech.mp3
cat file.txt       | ./speak.pl en speech.mp3
```
It reads text from standard input and writes the speech to speech.mp3. For Slovak, use "sk" instead of "en". For the codes of other languages, see the table below.

Note: if sox complains about the mp3 format, download the source code from http://sox.sourceforge.net/, install the packages libmp3lame-dev and libmad0-dev, and compile sox yourself.

How does it work?

The script splits the input text into chunks of at most 100 characters. Each chunk is then sent to Google TTS and the received mp3 output is stored. Silence at the beginning and end of each mp3 is cut off, because it makes the chunks sound disconnected. Then a shorter silence is appended to each chunk depending on its last character. After a dot "." the silence is longer than between ordinary words.
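
As a rough illustration of the per-chunk steps described above, here is a Python sketch (not the Perl code of speak.pl). It builds a request URL in the format that the script prints in its debug output and picks a pause length from the last character of the chunk. The function names and the 0.4 s and 0.2 s pause values are assumptions made for illustration; only the 0.05 s pause after an ordinary word comes from a sox command in the script's output. The sketch does not contact the service.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the script's debug output.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"

# Hypothetical pause lengths in seconds, chosen only for this example.
LONG_PAUSE = 0.4    # after sentence-ending punctuation (assumed value)
MEDIUM_PAUSE = 0.2  # after a comma (assumed value)
SHORT_PAUSE = 0.05  # between ordinary words


def tts_url(chunk: str, lang: str) -> str:
    """Build the request URL for one chunk; spaces are encoded as '+'."""
    return TTS_ENDPOINT + "?" + urlencode({"tl": lang, "q": chunk})


def pause_seconds(chunk: str) -> float:
    """Choose the silence to append, based on the chunk's last character."""
    last = chunk.rstrip()[-1:]  # empty string for an empty chunk
    if last in {".", "!", "?"}:
        return LONG_PAUSE
    if last == ",":
        return MEDIUM_PAUSE
    return SHORT_PAUSE


if __name__ == "__main__":
    for chunk in ["Hello world.", "Hi there"]:
        print(tts_url(chunk, "en"), pause_seconds(chunk))
```

The pause lengths are the part most worth tuning by ear, since they decide how naturally the chunks join.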

The punctuation marks ".!?," mark the end of a chunk. Sometimes, however, a sentence runs past the limit without any punctuation mark; it then has to be split somewhere else, and the split sounds more artificial.
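
The splitting rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the logic of speak.pl itself: the function name `split_into_chunks` is hypothetical, and falling back to the last space (or to a hard cut for a single over-long word) is an assumed strategy for text that has no punctuation within the limit.

```python
import re

MAX_CHARS = 100  # chunk limit stated in the post


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    A chunk ends after any of the marks . ! ? , and over-long pieces
    are broken at the last space that fits (assumed fallback).
    """
    # Each piece is a run of text followed by its trailing punctuation.
    pieces = re.findall(r"[^.!?,]+[.!?,]*|[.!?,]+", text)
    chunks = []
    for piece in pieces:
        piece = piece.strip()
        while len(piece) > max_chars:
            # No punctuation within the limit: break at the last space.
            cut = piece.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # one very long word: hard cut
            chunks.append(piece[:cut].strip())
            piece = piece[cut:].strip()
        if piece:
            chunks.append(piece)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("Hello world. How are you? Fine, thanks"):
        print(repr(chunk))
```

A splitter this simple also breaks at the dot inside numbers such as "1.5", so real input may need a more careful rule.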

Of course, feel free to modify the source code to suit your needs.

Other implementations

SpeakIt (Google Chrome Plug-in)

Supported languages

Code name	Language
af	Afrikaans
sq	Albanian
am	Amharic
ar	Arabic
hy	Armenian
az	Azerbaijani
eu	Basque
be	Belarusian
bn	Bengali
bh	Bihari
bs	Bosnian
br	Breton
bg	Bulgarian
km	Cambodian
ca	Catalan
zh-CN	Chinese (Simplified)
zh-TW	Chinese (Traditional)
co	Corsican
hr	Croatian
cs	Czech
da	Danish
nl	Dutch
en	English
eo	Esperanto
et	Estonian
fo	Faroese
tl	Filipino
fi	Finnish
fr	French
fy	Frisian
gl	Galician
ka	Georgian
de	German
el	Greek
gn	Guarani
gu	Gujarati
ha	Hausa
iw	Hebrew
hi	Hindi
hu	Hungarian
is	Icelandic
id	Indonesian
ia	Interlingua
ga	Irish
it	Italian
ja	Japanese
jw	Javanese
kn	Kannada
kk	Kazakh
rw	Kinyarwanda
rn	Kirundi
ko	Korean
ku	Kurdish
ky	Kyrgyz
lo	Laothian
la	Latin
lv	Latvian
ln	Lingala
lt	Lithuanian
mk	Macedonian
mg	Malagasy
ms	Malay
ml	Malayalam
mt	Maltese
mi	Maori
mr	Marathi
mo	Moldavian
mn	Mongolian
sr-ME	Montenegrin
ne	Nepali
no	Norwegian
nn	Norwegian (Nynorsk)
oc	Occitan
or	Oriya
om	Oromo
ps	Pashto
fa	Persian
pl	Polish
pt-BR	Portuguese (Brazil)
pt-PT	Portuguese (Portugal)
pa	Punjabi
qu	Quechua
ro	Romanian
rm	Romansh
ru	Russian
gd	Scots Gaelic
sr	Serbian
sh	Serbo-Croatian
st	Sesotho
sn	Shona
sd	Sindhi
si	Sinhalese
sk	Slovak
sl	Slovenian
so	Somali
es	Spanish
su	Sundanese
sw	Swahili
sv	Swedish
tg	Tajik
ta	Tamil
tt	Tatar
te	Telugu
th	Thai
ti	Tigrinya
to	Tonga
tr	Turkish
tk	Turkmen
tw	Twi
ug	Uighur
uk	Ukrainian
ur	Urdu
uz	Uzbek
vi	Vietnamese
cy	Welsh
xh	Xhosa
yi	Yiddish
yo	Yoruba
zu	Zulu

23 comments:

Anonymous said...: Hi,
nice job of wrapping the google translate synthesis to produce speech.
Have you ever thought of wrapping android svox (pico2wav) the same way?
I have svox running on my Ubuntu, but it accepts only very short strings, so splitting them in a similar manner could also be useful.
best regards,
newsgrabber@poczta.onet.pl; February 5, 2012 at 9:18 PM
Michal Fapšo said...: Thanks, I didn't try the Svox Text-to-Speech, but if it has a command line interface, you can easily modify my script speak.pl, namely the function SentenceToMp3() which takes a short text and an index of the sentence and calls the Google TTS over HTTP, stores the mp3 file and returns the filename. Just replace the expression

my $resp = $browser->get(...)

with

my $resp = system("pico2wav -language $language -output-file $mp3_out -text '$sentence'");

I don't know the correct names of arguments of pico2wav, so replace them with the real ones.

Then also remove this line which replaces space with + sign. It makes sense only for URLs:

$sentence =~ s/ /+/g;

and look for the number 100 and replace it with the maximum number of characters which can be processed by a single pico2wav call.

Hope that helped :o); February 6, 2012 at 1:09 PM
Unknown said...: Hi,
sorry for the off topic, but maybe you can help me.

I have been working for a long time with Google Translate, I mean with the TTS voice, an English female voice.

Google recently (in October 2012, I think) changed this voice, and now it is a male voice.

I use the voice as a singer of my songs. You can take a look at http://loudsouldisease.wordpress.com/2012/05/07/the-wait/ and listen to her.
I need her voice to finish my musical work.

Can you tell me, do you know where I can find this female voice?

I have listened to the English voice of espeak, but it is not the same.

I have emailed Google but got no answer.

Thank you; November 2, 2012 at 9:52 PM
Michal Fapšo said...: Hi "Loud Sin Desire",

Google uses their own voices and as far as I know, they do not provide those voices to public.

Maybe you could check it on some Android smartphone. There should be Google's TTS and maybe you could even switch between male and female voices.

Good luck with your music!
Michal; November 4, 2012 at 9:47 AM
Anonümus said...: Great script. I've been using it to have my Raspberry Pi read the weather report. Recently Google started to provide the MP3s in varying sample rates which makes SOX fail. I guess it needs a step to resample the MP3 chunks. I am currently trying to integrate a one-liner to fix this but you would probably find a nicer perl way for your script.

The batch resampler one-liner:
find . -maxdepth 1 -name '*.mp3' -type f -print0 | xargs -0 -t -r -I {} sox {} -r 16000 16000/{}
(it needs the subdirectory 16000 to exist); November 16, 2012 at 9:08 AM
Emad William said...: Google Translate changed the sampling rate.

To make the script work again, all you need to do is to search for "22050" and replace it with "16000" (without quotes); February 12, 2013 at 9:42 PM
KirknesS said...: Thanks nice job back there using google translate to produce speech.

Well I heard about some text to voice websites like www.tingwo.co/ are one of the best appearing comments, in my view, especially considering that they are all currently 100 % free. They seem to have more modifications on how it says certain words, which creates studying more time content more digestible and less automatic.; February 19, 2013 at 7:02 AM
Nelle said...: It was my first time using the Google text-to-speech tool and I really didn't know how to do it. I am lucky to have found this post of yours, because now I understand why it is very useful and how to use it.

Thanks!; February 23, 2013 at 5:44 PM
patternpusher said...: FYI, you can pass a more specific locale in the tl parameter, e.g.

http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_us

http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_gb

http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_au; February 26, 2013 at 12:51 AM
Sergio said...: I had to install :: perl -MCPAN -e 'install WWW::Mechanize'

thanks :); June 16, 2013 at 2:17 PM
Anonymous said...: Some of the languages you listed are not supported, for example Oromo and Amharic; September 13, 2013 at 3:46 PM
Anonymous said...: Hi,

I tried speak.pl on a windows machine (has sox and perl installed on it), but it does not generate the .mp3 file. Do you have any instruction on how to run this on a windows machine?; October 1, 2013 at 2:48 PM
Anonymous said...: Adding to my previous post, this is the error I get when running in Windows cmd (same LAME library is missing!)

C:\googletts>speak.pl en text.txt text.mp3
line: Hi there
sentence[0]: Hi there
URL: http://translate.google.com/translate_tts?tl=en&q=+Hi+there
sox.exe FAIL util: Unable to load LAME encoder library (libmp3lame).
sox.exe FAIL formats: can't open output file `text.mp3.tmp/0002_sil.mp3':
Concatenate: text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3
Writing output to text.mp3...sox.exe FAIL util: Unable to load MAD decoder library (libmad).
sox.exe FAIL formats: can't open input file `text.mp3.tmp/0002_sil.mp3':
done; October 1, 2013 at 3:16 PM
Anonymous said...: Again it is me. I managed to install the lame encoders. Now speak.pl seems to execute fine. But text.mp3 is empty (i.e. silence):

C:\googletts>speak.pl en text.txt text.mp3
line: Hi there
sentence[0]: Hi there
mp3_out: text.mp3.tmp/0000.mp3
http://translate.google.com/translate_tts?q=+Hi+there
URL: http://translate.google.com/translate_tts?tl=en&q=+Hi+there
exec sox.exe text.mp3.tmp/0000.mp3 -p silence 1 0.1 -60d | sox.exe -p -p reverse | sox.exe -p -p silence 1 0.1 -60d | sox.exe -p text.mp3.tmp/0000_trim.mp3 reverse
sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: recoverable MAD error
exec sox.exe -n -r 16000 text.mp3.tmp/0002_sil.mp3 trim 0.0 0.05
Concatenate: text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3
Writing output to text.mp3...
exec sox.exe text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3 text.mp3
done; October 1, 2013 at 3:56 PM
Michal Fapšo said...: Hi everyone, I just fixed the sample rate issue. Thanks for pointing it out, Anonümus :o)

Now, for Anonymous :o): the problem might be in your sox binary. Did you check the mp3 files inside the text.mp3.tmp folder? Are they also empty?

Here is my testing output:

$ echo "Hello world" > test.txt
$ ./speak.pl en test.txt test.mp3
line: Hello world
sentence[0]: Hello world
URL: http://translate.google.com/translate_tts?tl=en&q=+Hello+world
Concatenate: test.mp3.tmp/0000_trim.mp3 test.mp3.tmp/0002_sil.mp3
Writing output to test.mp3...done; October 2, 2013 at 11:35 AM
Anonymous said...: Inside text.mp3.tmp folder there is:

0000.mp3
0000_trim.mp3
0002_sil.mp3

They all seem to be 0 seconds long.

So you think it is a problem with the sox binary!; October 2, 2013 at 1:09 PM
Michal Fapšo said...: 0000.mp3 is the file you got from google. If it is empty, then it has nothing to do with sox. Does it really have 0 bytes?

If you open this link: http://translate.google.com/translate_tts?tl=en&q=+Hi+there ...you should hear the mp3 file. Is that mp3 correct?; October 2, 2013 at 1:18 PM
Anonymous said...: 1) The mp3 file is OK when I open link http://translate.google.com/translate_tts?tl=en&q=+Hi+there

1) 0000.mp3 is 4KB and is playing (1 second long)
2) 0000_trim.mp3 is 1KB and 0 seconds
3) 0002_sil.mp3 is 1KB and 0 seconds
4) text.mp3 (output mp3) is 1KB and 0 seconds long

I have the impression there is a problem with generating trim and sil. Are the following piped commands correct in MS-DOS:

exec sox.exe text.mp3.tmp/0000.mp3 -p silence 1 0.1 -60d | sox.exe -p -p reverse | sox.exe -p -p silence 1 0.1 -60d | sox.exe -p text.mp3.tmp/0000_trim.mp3 reverse

because this is where I get this warning: sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error; October 3, 2013 at 10:23 AM
eshe' said...: Hey there. I'm trying to use the google tts to play foreign translations of individual english words.

The audio works the first time but, after a day, disappears and won't reappear until i refresh each link in my own browser. It seems the issue is permissions? But there's no API key for tts. Is there any way to use rel=noreferrer or some other coding to work around the permissions issue?; May 5, 2014 at 8:00 PM
Anonymous said...: More required packages on Ubuntu:

libwww-mechanize-perl; November 29, 2014 at 7:07 PM
Anonymous said...: How would you #!/bin/bash this into an .sh file?; May 4, 2015 at 1:37 PM
Anna Schafer said...: Have your student/child think of more words that rhyme with the example word. best virtual assistant program; March 10, 2016 at 11:29 AM
Unknown said...: Just explore and build text to speech system with a strong significance on rhythm and prosody of speech that is closer to the natural enunciation.; November 13, 2017 at 7:23 AM