A few examples first:
...reading a short excerpt of the GNU GPL licence in various languages:
- English (en)
- German (de)
- French (fr)
- Spanish (es)
- Hungarian (hu)
- Czech (cs)
- Slovak (sk)
What is it good for?
Sometimes text-to-speech (TTS) comes in handy. When you are on a bike, out for a walk, or your eyes are tired of reading text from your computer screen, just convert the text to MP3 and listen to it anywhere.
Why Google TTS?
They support lots of languages. In Google Translate there is a speaker icon under the translated text, so you can listen to the translation. However, it only works for short texts (under 100 characters).
A simple way of using the Google TTS in Perl is here: http://tonyvirelli.com/slider/sweet-google-tts/
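To make the request format concrete, here is a minimal Python sketch (not the Perl code from the link above) that builds the URL for one short text. The endpoint and the `tl` and `q` parameters are the ones visible in the script's debug output quoted in the comments; the function name `tts_url` and the constant names are made up for illustration, and whether the endpoint still answers plain requests is not guaranteed.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the script's debug output; it may have changed since.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"
# Assumed per-request limit, based on the "short texts only" restriction.
MAX_CHARS = 100


def tts_url(text: str, lang: str = "en") -> str:
    """Build the request URL for one short piece of text (hypothetical helper)."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters, limit is {MAX_CHARS}")
    # urlencode() turns spaces into '+' and percent-encodes non-ASCII text as UTF-8.
    return f"{TTS_ENDPOINT}?{urlencode({'tl': lang, 'q': text})}"


print(tts_url("Hi there", "en"))
# http://translate.google.com/translate_tts?tl=en&q=Hi+there
```

The returned URL can then be fetched with any HTTP client and the response body saved as an MP3 file.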
How to use it for longer texts?
I didn't test it on Windows or Mac, but if you are able to install perl and sox, it should run fine.
For Linux:
- Download this script: speak.pl
- Install these packages: libwww-perl sox libsox-fmt-mp3
Usage:
    echo "Hello world" | ./speak.pl en speech.mp3
    cat file.txt | ./speak.pl en speech.mp3
It reads text from the standard input and generates speech.mp3 as output. For Slovak, use "sk" instead of "en". For the codes of other languages, see the table below.
How does it work?
The script splits the input text into chunks of at most 100 characters. Each chunk is then sent to the Google TTS and the received mp3 output is stored. Silence at the beginning and end of each chunk is cut off, because it kind of disconnects the chunks. Then a shorter silence is appended to each chunk depending on its last character. After a dot "." the silence is longer than between ordinary words.
Punctuation marks ".!?," indicate the end of a chunk, but sometimes a sentence runs longer than 100 characters without any punctuation mark and has to be split between words, so the split sounds more artificial.
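The splitting and pause rules described above can be sketched roughly as follows. This is a Python illustration, not the code of speak.pl: the function names and the pause durations are assumptions (only the 0.05 s value appears in the script's debug output quoted in the comments, for a chunk without punctuation), and the real script may handle edge cases differently.

```python
import re
import textwrap

MAX_CHARS = 100  # longest text sent in one request
# Hypothetical pause lengths in seconds, keyed by the chunk's last character.
PAUSES = {".": 0.4, "!": 0.4, "?": 0.4, ",": 0.2}
DEFAULT_PAUSE = 0.05  # between ordinary words


def split_into_chunks(text, max_chars=MAX_CHARS):
    """Split at the punctuation marks . ! ? , then wrap over-long pieces at word boundaries."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    # Each piece is a run of text plus the punctuation that ends it,
    # or a trailing run with no punctuation at all.
    pieces = re.findall(r"[^.!?,]*[.!?,]+|[^.!?,]+", text)
    chunks = []
    for piece in pieces:
        # wrap() returns the piece unchanged if it fits, several lines if it
        # does not, and nothing for an empty piece.
        chunks.extend(textwrap.wrap(piece.strip(), width=max_chars,
                                    break_on_hyphens=False))
    return chunks


def pause_after(chunk):
    """Length of the silence to append after a chunk."""
    return PAUSES.get(chunk[-1], DEFAULT_PAUSE) if chunk else DEFAULT_PAUSE


print(split_into_chunks("Hello world. This is a test, isn't it?"))
# ['Hello world.', 'This is a test,', "isn't it?"]
```

A known weakness of this simple rule is that it also splits inside numbers such as "3.5", so real input may need extra handling.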
Of course, feel free to modify the source code to suit your needs.
Other implementations
Supported languages
Code name | Language |
---|---|
af | Afrikaans |
sq | Albanian |
am | Amharic |
ar | Arabic |
hy | Armenian |
az | Azerbaijani |
eu | Basque |
be | Belarusian |
bn | Bengali |
bh | Bihari |
bs | Bosnian |
br | Breton |
bg | Bulgarian |
km | Cambodian |
ca | Catalan |
zh-CN | Chinese (Simplified) |
zh-TW | Chinese (Traditional) |
co | Corsican |
hr | Croatian |
cs | Czech |
da | Danish |
nl | Dutch |
en | English |
eo | Esperanto |
et | Estonian |
fo | Faroese |
tl | Filipino |
fi | Finnish |
fr | French |
fy | Frisian |
gl | Galician |
ka | Georgian |
de | German |
el | Greek |
gn | Guarani |
gu | Gujarati |
ha | Hausa |
iw | Hebrew |
hi | Hindi |
hu | Hungarian |
is | Icelandic |
id | Indonesian |
ia | Interlingua |
ga | Irish |
it | Italian |
ja | Japanese |
jw | Javanese |
kn | Kannada |
kk | Kazakh |
rw | Kinyarwanda |
rn | Kirundi |
ko | Korean |
ku | Kurdish |
ky | Kyrgyz |
lo | Laothian |
la | Latin |
lv | Latvian |
ln | Lingala |
lt | Lithuanian |
mk | Macedonian |
mg | Malagasy |
ms | Malay |
ml | Malayalam |
mt | Maltese |
mi | Maori |
mr | Marathi |
mo | Moldavian |
mn | Mongolian |
sr-ME | Montenegrin |
ne | Nepali |
no | Norwegian |
nn | Norwegian (Nynorsk) |
oc | Occitan |
or | Oriya |
om | Oromo |
ps | Pashto |
fa | Persian |
pl | Polish |
pt-BR | Portuguese (Brazil) |
pt-PT | Portuguese (Portugal) |
pa | Punjabi |
qu | Quechua |
ro | Romanian |
rm | Romansh |
ru | Russian |
gd | Scots Gaelic |
sr | Serbian |
sh | Serbo-Croatian |
st | Sesotho |
sn | Shona |
sd | Sindhi |
si | Sinhalese |
sk | Slovak |
sl | Slovenian |
so | Somali |
es | Spanish |
su | Sundanese |
sw | Swahili |
sv | Swedish |
tg | Tajik |
ta | Tamil |
tt | Tatar |
te | Telugu |
th | Thai |
ti | Tigrinya |
to | Tonga |
tr | Turkish |
tk | Turkmen |
tw | Twi |
ug | Uighur |
uk | Ukrainian |
ur | Urdu |
uz | Uzbek |
vi | Vietnamese |
cy | Welsh |
xh | Xhosa |
yi | Yiddish |
yo | Yoruba |
zu | Zulu |
23 comments:
Hi,
nice job of wrapping the google translate synthesis to produce speech.
Have you ever thought of wrapping android svox (pico2wav) the same way?
I have svox running on my Ubuntu, but it accepts only very short strings, so splitting them in a similar manner could also be useful.
best regards,
newsgrabber@poczta.onet.pl
Thanks, I didn't try the Svox Text-to-Speech, but if it has a command line interface, you can easily modify my script speak.pl, namely the function SentenceToMp3() which takes a short text and an index of the sentence and calls the Google TTS over HTTP, stores the mp3 file and returns the filename. Just replace the expression
my $resp = $browser->get(...)
with
my $resp = system("pico2wav -language $language -output-file $mp3_out -text '$sentence'");
I don't know the correct names of arguments of pico2wav, so replace them with the real ones.
Then also remove this line which replaces space with + sign. It makes sense only for URLs:
$sentence =~ s/ /+/g;
and look for the number 100 and replace it with the maximum number of characters which can be processed by single pico2wav call.
Hope that helped :o)
Hi,
sorry for the off topic, but maybe you can help me.
I have been working for a long time with Google Translate, I mean with the TTS voice, an English female voice.
Google recently (in October 2012, I think) changed this voice and now it is a male voice.
I use the voice as the singer of my songs. You can take a look at http://loudsouldisease.wordpress.com/2012/05/07/the-wait/ and listen to her.
I need her voice to finish my musical work.
Can you tell me, do you know where I can find this female voice?
I have listened to the English voice of espeak, but it is not the same.
I have emailed Google but got no answer.
Thank you
Hi "Loud Sin Desire",
Google uses their own voices and, as far as I know, they do not provide those voices to the public.
Maybe you could check it on some Android smartphone. There should be Google's TTS and maybe you could even switch between male and female voices.
Good luck with your music!
Michal
Great script. I've been using it to have my Raspberry Pi read the weather report. Recently Google started to provide the MP3s in varying sample rates which makes SOX fail. I guess it needs a step to resample the MP3 chunks. I am currently trying to integrate a one-liner to fix this but you would probably find a nicer perl way for your script.
The batch resampler one-liner:
find . -maxdepth 1 -name '*.mp3' -type f -print0 | xargs -0 -t -r -I {} sox {} -r 16000 16000/{}
(it needs the subdirectory 16000 to exist)
Google Translate changed the sampling rate.
To make the script work again, all you need to do is to search for "22050" and replace it with "16000" (without quotes)
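For reference, the resampling step suggested above can be sketched like this in Python. The function only builds the sox argument list in the same form as the one-liner (`sox INPUT -r RATE OUTPUT`); the name `resample_command` is hypothetical and this is not part of speak.pl.

```python
def resample_command(src, dst, rate=16000):
    """Argument list for: sox SRC -r RATE DST (hypothetical helper)."""
    # "-r RATE" placed before the output file sets the output sample rate.
    return ["sox", src, "-r", str(rate), dst]


print(resample_command("0000.mp3", "16000/0000.mp3"))
# ['sox', '0000.mp3', '-r', '16000', '16000/0000.mp3']
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where sox with MP3 support is installed; as with the one-liner, the output directory has to exist first.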
It was my first time using the Google text-to-speech tool and I really didn't know how to do it. I am lucky to have found this post of yours, because now I understand why it is very useful and how to use it.
Thanks!
FYI, you can pass a more specific locale in the tl parameter, e.g.
http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_us
http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_gb
http://translate.google.com/translate_tts?q=testing+1+2+3&tl=en_au
I had to install WWW::Mechanize: perl -MCPAN -e 'install WWW::Mechanize'
thanks :)
Some of the languages you listed are not supported, for example Oromo and Amharic.
Hi,
I tried speak.pl on a windows machine (has sox and perl installed on it), but it does not generate the .mp3 file. Do you have any instruction on how to run this on a windows machine?
Adding to my previous post, this is the error I get when running in Windows cmd (some LAME library is missing!)
C:\googletts>speak.pl en text.txt text.mp3
line: Hi there
sentence[0]: Hi there
URL: http://translate.google.com/translate_tts?tl=en&q=+Hi+there
sox.exe FAIL util: Unable to load LAME encoder library (libmp3lame).
sox.exe FAIL formats: can't open output file `text.mp3.tmp/0002_sil.mp3':
Concatenate: text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3
Writing output to text.mp3...sox.exe FAIL util: Unable to load MAD decoder library (libmad).
sox.exe FAIL formats: can't open input file `text.mp3.tmp/0002_sil.mp3':
done
Again it is me. I managed to install the lame encoders. Now speak.pl seems to execute fine. But text.mp3 is empty (i.e. silence):
C:\googletts>speak.pl en text.txt text.mp3
line: Hi there
sentence[0]: Hi there
mp3_out: text.mp3.tmp/0000.mp3
http://translate.google.com/translate_tts?q=+Hi+there
URL: http://translate.google.com/translate_tts?tl=en&q=+Hi+there
exec sox.exe text.mp3.tmp/0000.mp3 -p silence 1 0.1 -60d | sox.exe -p -p reverse | sox.exe -p -p silence 1 0.1 -60d | sox.exe -p text.mp3.tmp/0000_trim.mp3 reverse
sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error
sox.exe WARN mp3-util: recoverable MAD error
exec sox.exe -n -r 16000 text.mp3.tmp/0002_sil.mp3 trim 0.0 0.05
Concatenate: text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3
Writing output to text.mp3...
exec sox.exe text.mp3.tmp/0000_trim.mp3 text.mp3.tmp/0002_sil.mp3 text.mp3
done
Hi everyone, I just fixed the sample rate issue. Thanks for pointing it out, Anonümus :o)
Now, Anonymous :o), the problem might be in your sox binary. Did you check the mp3 files inside the text.mp3.tmp folder? Are they also empty?
Here is my testing output:
$ echo "Hello world" > test.txt
$ ./speak.pl en test.txt test.mp3
line: Hello world
sentence[0]: Hello world
URL: http://translate.google.com/translate_tts?tl=en&q=+Hello+world
Concatenate: test.mp3.tmp/0000_trim.mp3 test.mp3.tmp/0002_sil.mp3
Writing output to test.mp3...done
Inside text.mp3.tmp folder there is:
0000.mp3
0000_trim.mp3
0002_sil.mp3
They all seem to be 0 seconds long.
So you think it is a problem with the sox binary?
0000.mp3 is the file you got from google. If it is empty, then it has nothing to do with sox. Does it really have 0 bytes?
If you open this link: http://translate.google.com/translate_tts?tl=en&q=+Hi+there ...you should hear the mp3 file. Is that mp3 correct?
The mp3 file is OK when I open the link http://translate.google.com/translate_tts?tl=en&q=+Hi+there
1) 0000.mp3 is 4KB and is playing (1 second long)
2) 0000_trim.mp3 is 1KB and 0 seconds
3) 0002_sil.mp3 is 1KB and 0 seconds
4) text.mp3 (output mp3) is 1KB and 0 seconds long
I have the impression there is a problem with generating trim and sil. Are the following piped commands correct in MS-DOS:
exec sox.exe text.mp3.tmp/0000.mp3 -p silence 1 0.1 -60d | sox.exe -p -p reverse | sox.exe -p -p silence 1 0.1 -60d | sox.exe -p text.mp3.tmp/0000_trim.mp3 reverse
because this is where I get this warning: sox.exe WARN mp3-util: MAD lost sync
sox.exe WARN mp3-util: recoverable MAD error
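One way to take the pipes out of the picture is to run the same four effects in a single sox invocation, since sox accepts a chain of effects after the output file. The Python sketch below only builds that argument list; the function name `trim_silence_command` is made up, the effect parameters are copied from the piped command above, and whether the pipes are actually what fails on Windows is not confirmed.

```python
def trim_silence_command(src, dst, min_duration=0.1, threshold="-60d"):
    """Single sox call that trims silence from both ends (hypothetical helper)."""
    # "silence 1 DURATION THRESHOLD" removes silence from the start of the audio.
    edge = ["silence", "1", str(min_duration), threshold]
    # Trim the start, reverse, trim the (former) end, reverse back.
    return ["sox", src, dst, *edge, "reverse", *edge, "reverse"]


print(" ".join(trim_silence_command("text.mp3.tmp/0000.mp3",
                                    "text.mp3.tmp/0000_trim.mp3")))
# sox text.mp3.tmp/0000.mp3 text.mp3.tmp/0000_trim.mp3 silence 1 0.1 -60d reverse silence 1 0.1 -60d reverse
```

If the single command produces a non-empty 0000_trim.mp3 where the piped version does not, the piping is the likely culprit; if the result is still empty, the problem lies elsewhere, for example in the MP3 decoder.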
Hey there. I'm trying to use the Google TTS to play foreign translations of individual English words.
The audio works the first time but, after a day, disappears and won't reappear until I refresh each link in my own browser. It seems the issue is permissions? But there's no API key for TTS. Is there any way to use rel=noreferrer or some other coding to work around the permissions issue?
More required packages on Ubuntu:
libwww-mechanize-perl
How would you #!/bin/bash this into an .sh file?