Sunday, January 29, 2012

Using Google Text-to-Speech

A few examples first:

...reading a short excerpt of the GNU GPL licence in various languages:
English (en):
German (de):
French (fr):
Spanish (es):
Hungarian (hu):
Czech (cs):
Slovak (sk):

What is it good for?

Sometimes text-to-speech (TTS) comes in handy. When you are on a bike or on a walk, or your eyes are tired of reading text from a computer screen, just convert the text to MP3 and listen to it anywhere.

Why Google TTS?

Google supports lots of languages. In Google Translate there is a speaker icon under the translated text, so you can listen to the translation. However, this only works for short texts (under 100 characters).

A simple way of using the Google TTS in Perl is here: http://tonyvirelli.com/slider/sweet-google-tts/

How to use it for longer texts?

I haven't tested it on Windows or Mac, but if you can install perl and sox, it should run fine.
For Linux:
  1. Download this script: speak.pl
  2. Install these packages: libwww-perl sox libsox-fmt-mp3
  3. Usage:
     echo "Hello world" | ./speak.pl en speech.mp3
     cat file.txt       | ./speak.pl en speech.mp3
     
    It reads text from the standard input and generates speech.mp3 as output. For Slovak, use "sk" instead of "en". For the codes of other languages, see the table below.
Note: if sox complains about the mp3 format, download the sox source code from http://sox.sourceforge.net/, install the packages libmp3lame-dev and libmad0-dev, and compile sox yourself.

How does it work?

The script splits the input text into chunks of at most 100 characters. Each chunk is then sent to Google TTS and the received mp3 output is stored. Silence at the beginning and end of each chunk is cut off, because it makes the chunks sound disconnected. Then a short silence is appended to each chunk, its length depending on the chunk's last character: after a period "." the pause is longer than between ordinary words.
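The trimming and padding steps can be reproduced with plain sox commands. The commands below are copied from the debug logs posted in the comments (silence 1 0.1 -60d, reverse, and sox -n ... trim 0.0 0.05); this is a minimal Python sketch that builds them and chains the trim stages without a shell, not the Perl code of speak.pl. The function names are hypothetical, and only the 0.05 s pause appears in the logs; the longer pauses after "," and "." are assumed values.

```python
import subprocess

# sox effect used in the logged commands: drop audio from the start
# until at least 0.1 s louder than -60 dB has been found.
SILENCE = ["silence", "1", "0.1", "-60d"]


def trim_commands(src, dst, sox="sox"):
    """Trim leading and trailing silence: trim the start, reverse, trim the
    start again (the original end), then reverse back. Stages exchange audio
    through sox's own pipe format (-p)."""
    return [
        [sox, src, "-p", *SILENCE],
        [sox, "-p", "-p", "reverse"],
        [sox, "-p", "-p", *SILENCE],
        [sox, "-p", dst, "reverse"],
    ]


def pause_seconds(chunk):
    """HYPOTHETICAL pause lengths keyed on the chunk's last character.
    Only 0.05 s appears in the logs; 0.2 and 0.4 are guesses."""
    last = chunk.rstrip()[-1:]
    if last in {".", "!", "?"}:
        return 0.4
    if last == ",":
        return 0.2
    return 0.05


def silence_command(dst, seconds, rate=16000, sox="sox"):
    """Create `seconds` of silence from sox's null input (-n)."""
    return [sox, "-n", "-r", str(rate), dst, "trim", "0.0", str(seconds)]


def run_pipeline(stages):
    """Connect the stages stdout-to-stdin like a shell pipe, without a shell."""
    procs, upstream = [], None
    for i, argv in enumerate(stages):
        is_last = i == len(stages) - 1
        proc = subprocess.Popen(
            argv, stdin=upstream, stdout=None if is_last else subprocess.PIPE
        )
        if upstream is not None:
            upstream.close()  # the child process holds its own copy of the pipe
        upstream = proc.stdout
        procs.append(proc)
    return [proc.wait() for proc in procs]  # exit codes, one per stage
```

With sox (built with MP3 support) on the PATH, run_pipeline(trim_commands("0000.mp3", "0000_trim.mp3")) followed by subprocess.run(silence_command("0002_sil.mp3", pause_seconds("Hi there")), check=True) mirrors the first two steps in the Windows log from the comments.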

The punctuation marks ".!?," indicate the end of a chunk, but sometimes a sentence is longer than 100 characters without any punctuation mark, so it has to be split elsewhere and the split sounds more artificial.
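The splitting rule can be sketched in a few lines of Python. This is not the Perl code of speak.pl: the name split_chunks is hypothetical, and breaking an over-long sentence at its last space (or cutting a single very long word) is an assumption about how the fallback behaves.

```python
import re

MAX_CHARS = 100  # the Google Translate TTS length limit mentioned in the post


def split_chunks(text, limit=MAX_CHARS):
    """Split text into TTS-sized chunks.

    Every . ! ? , ends a chunk; a piece still longer than `limit` is
    broken at the last space that fits, or hard-cut if it has no space.
    """
    chunks = []
    # Zero-width split right after each mark keeps the mark with its chunk.
    for piece in re.split(r"(?<=[.!?,])", text):
        piece = " ".join(piece.split())  # collapse newlines and runs of spaces
        while len(piece) > limit:
            cut = piece.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit  # one very long word: cut it
            chunks.append(piece[:cut].strip())
            piece = piece[cut:].strip()
        if piece:
            chunks.append(piece)
    return chunks
```

For example, split_chunks("Hi, there. Ok") gives three chunks, each ending where the punctuation was, while 150 characters of unpunctuated words are broken into two chunks at a space.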

Of course, feel free to modify the source code to suit your needs.

Other implementations

Supported languages

Code name   Language
af          Afrikaans
sq          Albanian
am          Amharic
ar          Arabic
hy          Armenian
az          Azerbaijani
eu          Basque
be          Belarusian
bn          Bengali
bh          Bihari
bs          Bosnian
br          Breton
bg          Bulgarian
km          Cambodian
ca          Catalan
zh-CN       Chinese (Simplified)
zh-TW       Chinese (Traditional)
co          Corsican
hr          Croatian
cs          Czech
da          Danish
nl          Dutch
en          English
eo          Esperanto
et          Estonian
fo          Faroese
tl          Filipino
fi          Finnish
fr          French
fy          Frisian
gl          Galician
ka          Georgian
de          German
el          Greek
gn          Guarani
gu          Gujarati
ha          Hausa
iw          Hebrew
hi          Hindi
hu          Hungarian
is          Icelandic
id          Indonesian
ia          Interlingua
ga          Irish
it          Italian
ja          Japanese
jw          Javanese
kn          Kannada
kk          Kazakh
rw          Kinyarwanda
rn          Kirundi
ko          Korean
ku          Kurdish
ky          Kyrgyz
lo          Laothian
la          Latin
lv          Latvian
ln          Lingala
lt          Lithuanian
mk          Macedonian
mg          Malagasy
ms          Malay
ml          Malayalam
mt          Maltese
mi          Maori
mr          Marathi
mo          Moldavian
mn          Mongolian
sr-ME       Montenegrin
ne          Nepali
no          Norwegian
nn          Norwegian (Nynorsk)
oc          Occitan
or          Oriya
om          Oromo
ps          Pashto
fa          Persian
pl          Polish
pt-BR       Portuguese (Brazil)
pt-PT       Portuguese (Portugal)
pa          Punjabi
qu          Quechua
ro          Romanian
rm          Romansh
ru          Russian
gd          Scots Gaelic
sr          Serbian
sh          Serbo-Croatian
st          Sesotho
sn          Shona
sd          Sindhi
si          Sinhalese
sk          Slovak
sl          Slovenian
so          Somali
es          Spanish
su          Sundanese
sw          Swahili
sv          Swedish
tg          Tajik
ta          Tamil
tt          Tatar
te          Telugu
th          Thai
ti          Tigrinya
to          Tonga
tr          Turkish
tk          Turkmen
tw          Twi
ug          Uighur
uk          Ukrainian
ur          Urdu
uz          Uzbek
vi          Vietnamese
cy          Welsh
xh          Xhosa
yi          Yiddish
yo          Yoruba
zu          Zulu

23 comments:

Anonymous said...

Hi,
Nice job of wrapping the Google Translate synthesis to produce speech.
Have you ever thought of wrapping Android's SVOX (pico2wav) the same way?
I have SVOX running on my Ubuntu, but it accepts only very short strings, so splitting them in a similar manner could also be useful.
best regards,
newsgrabber@poczta.onet.pl

Michal Fapšo said...

Thanks. I haven't tried the SVOX text-to-speech, but if it has a command-line interface, you can easily modify my script speak.pl, namely the function SentenceToMp3(). It takes a short text and the index of the sentence, calls Google TTS over HTTP, stores the mp3 file, and returns the filename. Just replace the expression

my $resp = $browser->get(...)

with

my $resp = system("pico2wav", "-language", $language, "-output-file", $mp3_out, "-text", $sentence);  # list form: no shell, so quotes in $sentence are safe

I don't know the correct argument names for pico2wav, so replace them with the real ones.

Then also remove this line, which replaces spaces with + signs; that only makes sense for URLs:

$sentence =~ s/ /+/g;

Finally, look for the number 100 and replace it with the maximum number of characters that a single pico2wav call can process.
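For comparison, here is a minimal Python sketch of the same per-sentence step for pico2wave. Passing the arguments as a list, without a shell, means a sentence containing an apostrophe needs no escaping. It assumes the binary is named pico2wave (on Debian/Ubuntu it ships in libttspico-utils) and takes -l for the language and -w for an output WAV file; check pico2wave --help on your system. It writes WAV rather than MP3, and the function names and 0000.wav naming are hypothetical.

```python
import subprocess
from pathlib import Path


def pico_command(sentence, wav_out, lang="en-US"):
    """Argument list for one pico2wave call (ASSUMED flags: -l lang, -w file).
    No shell is involved, so quotes in `sentence` need no escaping."""
    return ["pico2wave", "-l", lang, "-w", str(wav_out), sentence]


def sentence_to_wav(sentence, index, lang="en-US", tmpdir="."):
    """HYPOTHETICAL counterpart of SentenceToMp3(): synthesize one short
    sentence and return the path of the WAV file it wrote."""
    wav_out = Path(tmpdir) / f"{index:04d}.wav"
    subprocess.run(pico_command(sentence, wav_out, lang), check=True)
    return wav_out
```

With pico2wave installed, sentence_to_wav("Hello world.", 0) would write ./0000.wav; the chunk limit still has to be set to whatever pico2wave accepts.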

Hope that helped :o)

Loud Sin Desire said...

Hi,
Sorry for going off topic, but maybe you can help me.

I have been working with Google Translate for a long time, specifically with its TTS voice, an English female voice.

Google recently (in October 2012, I think) changed this voice, and now it is a male voice.

I use the voice as the singer of my songs. You can take a look at http://loudsouldisease.wordpress.com/2012/05/07/the-wait/ and listen to her.
I need her voice to finish my musical work.

Do you know where I can find this female voice?

I have listened to the English voice of espeak, but it is not the same.

I have emailed Google but got no answer.

Thank you

Michal Fapšo said...

Hi "Loud Sin Desire",

Google uses its own voices and, as far as I know, does not provide them to the public.

Maybe you could check an Android smartphone. It should have Google's TTS, and maybe you can even switch between male and female voices there.

Good luck with your music!
Michal

Anonümus said...

Great script. I've been using it to have my Raspberry Pi read the weather report. Recently Google started serving the MP3s at varying sample rates, which makes SOX fail. I guess it needs a step to resample the MP3 chunks. I am currently trying to integrate a one-liner to fix this, but you would probably find a nicer Perl way for your script.

The batch resampler one-liner:
find . -maxdepth 1 -name '*.mp3' -type f -print0 | xargs -0 -t -r -I {} sox {} -r 16000 16000/{}
(it needs the subdirectory 16000 to exist)
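The same batch resampling can be sketched in Python. This mirrors the one-liner (sox IN -r 16000 16000/IN for each *.mp3 in one directory) but creates the 16000 subdirectory itself. The function names are hypothetical, and it is an alternative to the one-liner, not part of speak.pl.

```python
import subprocess
from pathlib import Path


def resample_commands(mp3_files, rate=16000, sox="sox"):
    """One sox command per file, writing a copy resampled to `rate` Hz into
    a subdirectory named after the rate, next to the original file."""
    commands = []
    for f in mp3_files:
        src = Path(f)
        dst = src.parent / str(rate) / src.name
        commands.append([sox, str(src), "-r", str(rate), str(dst)])
    return commands


def resample_all(src_dir, rate=16000):
    """Resample every *.mp3 directly inside src_dir (like -maxdepth 1)."""
    src = Path(src_dir)
    (src / str(rate)).mkdir(exist_ok=True)  # the one-liner needs this to exist
    files = sorted(p for p in src.glob("*.mp3") if p.is_file())
    for cmd in resample_commands(files, rate):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution makes it easy to print the commands first and check them before letting sox overwrite anything.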

Emad William said...

Google Translate changed the sampling rate.

To make the script work again, all you need to do is to search for "22050" and replace it with "16000" (without quotes)

patternpusher said...

FYI, you can pass a more specific locale in the tl parameter, e.g.

http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_us

http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_gb

http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_au
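Such URLs can also be built programmatically. A minimal Python sketch, assuming the endpoint and the tl/q parameters exactly as they appear in this post and its logs; Google may have changed or restricted this unofficial endpoint since, so treat it as illustrative. urlencode turns spaces into "+" (what speak.pl does with s/ /+/g) and also escapes characters such as "&" or accented letters.

```python
from urllib.parse import urlencode

# Endpoint as used in the post and the comments (2012); may no longer work.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def tts_url(text, lang="en"):
    """Build a translate_tts URL for one short chunk (under 100 characters).
    `lang` may be a plain code ("sk") or a locale such as "en_gb"."""
    return TTS_ENDPOINT + "?" + urlencode({"tl": lang, "q": text})
```

For example, tts_url("testing 1 2 3", "en_gb") reproduces the second link above, with the two parameters in the opposite order.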

Sergio Luiz Araújo Silva said...

I had to install WWW::Mechanize: perl -MCPAN -e 'install WWW::Mechanize'

thanks :)

Anonymous said...

Some of the languages you listed are not supported, for example Oromo and Amharic.

Anonymous said...

Hi,

I tried speak.pl on a Windows machine (it has sox and perl installed), but it does not generate the .mp3 file. Do you have any instructions on how to run this on a Windows machine?

Anonymous said...

Adding to my previous post, this is the error I get when running it from the Windows cmd (it seems the LAME library is missing!)

C:\googletts>speak.pl en text.txt text.mp3
line: Hi there
sentence[0]: Hi there
URL: http://translate.google.com/translate_tts?tl=en&q=+Hi+there
sox.exe FAIL util: Unable to load LAME encoder library (libmp3lame).
sox.exe FAIL formats: can't open output file `text.mp3.tmp/0002_sil.mp3':
Concatenate: text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3
Writing output to text.mp3...sox.exe FAIL util: Unable to load MAD decoder library (libmad).
sox.exe FAIL formats: can't open input file `text.mp3.tmp/0002_sil.mp3':
done

Anonymous said...

It's me again. I managed to install the LAME encoders. Now speak.pl seems to execute fine, but text.mp3 is empty (i.e. silence):

C:\googletts>speak.pl en text.txt text.mp3
line: Hi there
sentence[0]: Hi there
mp3_out: text.mp3.tmp/0000.mp3
http://translate.google.com/translate_tts?q=+Hi+there
URL: http://translate.google.com/translate_tts?tl=en&q=+Hi+there
exec sox.exe text.mp3.tmp/0000.mp3 -p silence 1 0.1 -60d | sox.exe -p -p reverse | sox.exe -p -p silence 1 0.1 -60d | sox.exe -p text.mp3.tmp/0000_trim.mp3 reverse
sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: recoverable MAD error
exec sox.exe -n -r 16000 text.mp3.tmp/0002_sil.mp3 trim 0.0 0.05
Concatenate: text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3
Writing output to text.mp3...
exec sox.exe text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3 text.mp3
done

Michal Fapšo said...

Hi everyone, I just fixed the sample rate issue. Thanks for pointing it out, Anonümus :o)

Now, Anonymous :o), the problem might be in your sox binary. Did you check the mp3 files inside the text.mp3.tmp folder? Are they also empty?

Here is my testing output:

$ echo "Hello world" > test.txt
$ ./speak.pl en test.txt test.mp3
line: Hello world
sentence[0]: Hello world
URL: http://translate.google.com/translate_tts?tl=en&q=+Hello+world
Concatenate: test.mp3.tmp/0000_trim.mp3 test.mp3.tmp/0002_sil.mp3
Writing output to test.mp3...done

Anonymous said...

Inside the text.mp3.tmp folder there are:

0000.mp3
0000_trim.mp3
0002_sil.mp3

They all seem to be 0 seconds long.

So you think it is a problem with the sox binary?

Michal Fapšo said...

0000.mp3 is the file you got from google. If it is empty, then it has nothing to do with sox. Does it really have 0 bytes?

If you open this link: http://translate.google.com/translate_tts?tl=en&q=+Hi+there ...you should hear the mp3 file. Is that mp3 correct?

Anonymous said...

1) The mp3 file is OK when I open link http://translate.google.com/translate_tts?tl=en&q=+Hi+there

2) 0000.mp3 is 4KB and is playing (1 second long)
3) 0000_trim.mp3 is 1KB and 0 seconds
4) 0002_sil.mp3 is 1KB and 0 seconds
5) text.mp3 (output mp3) is 1KB and 0 seconds long

I have the impression there is a problem with generating the trim and sil files. Is the following piped command correct in MS-DOS:

exec sox.exe text.mp3.tmp/0000.mp3 -p silence 1 0.1 -60d | sox.exe -p -p reverse | sox.exe -p -p silence 1 0.1 -60d | sox.exe -p text.mp3.tmp/0000_trim.mp3 reverse

because this is where I get these warnings:
sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error

eshe' said...

Hey there. I'm trying to use Google TTS to play foreign translations of individual English words.

The audio works the first time, but after a day it disappears and won't reappear until I refresh each link in my own browser. Is the issue permissions? But there's no API key for TTS. Is there any way to use rel=noreferrer or some other code to work around the permissions issue?

Anonymous said...

More required packages on Ubuntu:

libwww-mechanize-perl

Manuel Cevallos said...

How would you put this into a #!/bin/bash .sh file?
