Text-to-Speech on a Low-Power Chip
bluephone writes: "The EE Times has a story on a new chip from Winbond that can take ASCII or UNICODE text and convert it to either spoken English or Mandarin (the Chinese language, not the orange). The low-power chip scans the text and translates it into spoken phenomes and outputs it to a filter for smooth analog sound, or can directly output the digital signal. Imagine a cell phone with this, you can have your email read to you, rather than seeing a line at a time on a dinky screen, street directions from a website, or even Slashdot's headlines. :)"
Done that (Score:4, Interesting)
Re:Done that (Score:1)
Re:Done that (Score:1)
Dr. Sbaitso (Score:1)
I believe Radiohead used it as the voice for their track "Fitter Happier" on OK Computer.
Re:Dr. Sbaitso (Score:1)
(The bundled voices for text to speech on MacOS 8+)
Re:Dr. Sbaitso (Score:4, Funny)
In other news, "Man or Astroman wants all the party people.. to say.... yeeeaaaaahhhhhhhhhhh"
And by the way, the voice on "Fitter, Happier" (Radiohead) was actually Thom during an especially intense episode of inebriation >:P
Re:Dr. Sbaitso (Score:1)
How many people know where the name Dr. Sbaitso actually came from?
Re:Dr. Sbaitso (Score:2)
ai = "artificial intelligence"
tso = ???
text-to-speech output?
Re:Dr. Sbaitso (Score:3, Informative)
Re:Dr. Sbaitso - no it's bruce (Score:1)
The lyrics are here [followmearound.com]
Mac OS has that (Score:1)
Re:Mac OS has that (Score:3, Funny)
My Amiga was talking to me 15 years ago.
Actually, my Timex Sinclair 1000 was talking to me 20 years ago, but I think that was the acid...
Re:Mac OS has that (Score:2, Insightful)
Actually, on a more serious note, is there anyone working on an open source speech synthesis project?
Re:Mac OS has that (Score:3, Informative)
Yup; it's called Festival [ed.ac.uk].
Re:Mac OS has that (Score:2, Informative)
Commodore 64 has that! (Score:2)
Does anyone remember the name of this program? I think it was something like "Simon Says".
Re:Commodore 64 has that! (Score:2)
Re:Mac OS has that (Score:2)
--Blair
Oh no...... (Score:2)
Re:Oh no...... (Score:2, Funny)
Re:Oh no...... (Score:1)
Not in this case, since the chip translates text into written phonemes. This just means that the Chinese language text would have to be written phonetically, in simple text. How it handles the tonal aspect of Mandarin is another story, but I suspect that there is some phonetic writing scheme that accounts for this.
Re:Oh no...... (Score:1)
Great for all sorts of devices. (Score:3, Interesting)
I wonder how lifelike the voice is though. I don't think any text-to-speech tools are going to become very mainstream until they sound better.
Re:Great for all sorts of devices. (Score:2)
All computers should sound like the Voice of World Control. Audio clips from Colossus: The Forbin Project are here [uiuc.edu] for your enjoyment.
If Stephen Hawking sounded like this, he would have taken over the world long ago.
wouldn't it be easier.... (Score:5, Informative)
Re:wouldn't it be easier.... (Score:1)
Re:wouldn't it be easier.... (Score:1)
Converting text to an orange ... (Score:2, Funny)
Re:Converting text to an orange ... (Score:1)
Great... (Score:2, Funny)
That's all I need, Stephen Hawking's voice coming at me from my cell phone:
Anonymous cowards love the rich meaty taste of spam.
Re:Great... (Score:2)
From: root
Subject: cron kill `ps -ef | grep username | awk '{print $2}'`
Just imagine how that sounds read back to you over your cell phone. It really beats having to lug a laptop with me just to check my email, but the kinds of email a sysadmin receives often don't translate well into spoken English. However, it's fun to hear this female voice try to get it right. One of these days, I've gotta get together with the programmers here and make sure these things get read right, like "kill back-tick pee-ess dash ee-eff pipe grep user-name"...
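For what it's worth, the "read it right" step could be sketched roughly like this in Python. This is only a guess at the approach: the `SPOKEN_SYMBOLS` table and the `speakable` function are hypothetical names, and the spoken form chosen for each symbol is just one possible choice, not what any real voice-mail system does.

```python
# Hypothetical symbol-to-word table; the spoken names are illustrative choices.
SPOKEN_SYMBOLS = {
    "`": "back-tick",
    "|": "pipe",
    "-": "dash",
    "'": "quote",
    "{": "open-brace",
    "}": "close-brace",
    "$": "dollar",
}


def speakable(command):
    """Replace shell punctuation with pronounceable words; leave other text alone."""
    words = []
    token = ""
    for ch in command:
        if ch in SPOKEN_SYMBOLS:
            # Flush any word collected so far, then emit the symbol's spoken name.
            if token:
                words.append(token)
                token = ""
            words.append(SPOKEN_SYMBOLS[ch])
        elif ch.isspace():
            if token:
                words.append(token)
                token = ""
        else:
            token += ch
    if token:
        words.append(token)
    return " ".join(words)


print(speakable("kill `ps -ef | grep username`"))
```

The output would then be handed to the text-to-speech engine instead of the raw subject line. Spelling out short commands like "ps" letter by letter would need a second table, which is left out here.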
pr0n related use? (Score:2, Funny)
A Possibility... (Score:2, Funny)
Nothing new (Score:1)
There's really nothing new about this product, except for its ability to speak Mandarin. And given the state of the Chinese economy, it's not very likely that many citizens over there will be in the market for talking electronic devices anytime soon. Most of them are still trying to get phone service and running water.
-CT
Re:Nothing new (Score:5, Informative)
Check out quadravox [quadravox.com] for boards that emulate the SP0256, using ISD's analog flash memory and a microcontroller.
(My misadventure with the old GI chip: -12 instead of +5, just for a split second. After that, it developed a stutter!)
Re:Nothing new (Score:1)
The only problem with this.... (Score:2)
applications... (Score:1)
Think of the applications for blow up dolls and pr0n!
Text 2 Speech on Linux (Score:1)
Why dump more tech than necessary into the phone? (Score:4, Insightful)
The cellphone may have all the power of an original Palm Pilot these days, but we don't need to make it into an Onyx server.
Re:Why dump more tech than necessary into the phon (Score:1, Interesting)
p.s. and don't even get me started on digital phones... converting analog to digital to analog baseband to RF, and then back again!
Re:Why dump more tech than necessary into the phon (Score:1)
Re:Why dump more tech than necessary into the phon (Score:5, Insightful)
I've got a co-worker, our Oracle admin, who's blind. As things stand, with most cell phones he can't do anything except dial out and answer calls. He can't use the built-in address book to place calls for example, because all of the info is in text on a tiny screen. With text-to-speech software on the phone, he'd be able to use the address book just like sighted folks, read text messages he received earlier even when he's in an area with no coverage just like sighted folks, and so on. This is a good idea.
Re:Why dump more tech than necessary into the phon (Score:3, Informative)
I have never worked with blind people, but after reading an article last year about how websites are getting more and more difficult for braille browsers (flash, imagelinks without alt tags etc.), I decided to make a lynx-friendly version of my site - and so should YOU!
Anyways, how does he do it?? Is it worth it to the company you work for, or does it cause everyone else problems? Is he good? Tell! Hopefully this could encourage others to take on "disabled" in their company....
Re:Why dump more tech than necessary into the phon (Score:3, Interesting)
He's got a variety of tools at his disposal. Just the other day, he gave a demo of some of them to a bunch of us.
He's got an 8-dot braille terminal that gives him enough characters to do C and Perl programming. He's got a hardware speech synthesizer he cranks up to something like 200+ words per minute. I tried, and could only understand a few phrases when it was cranked up to 95 words per minute.
And when a web site he needs or wants to access is inaccessible, he complains to them, and sometimes things get fixed. He can navigate web sites that use alt tags remarkably well. A good rule of thumb is that if a site makes sense with images turned off (or in lynx), then it'll work for him.
seeking more info (Score:2)
Are there any websites where you can get a review by a blind person? or anything similar?
We can talk about web standards until we are blue in the face, but when we stop certain people from being able to use the web, that's more than a failure of standards.
thanks Kynn (Score:2)
Re:Why dump more tech than necessary into the phon (Score:2)
I hope he practices safe cell phone use and doesn't call out while he's driving.../humour
Re:Why dump more tech than necessary into the phon (Score:2, Insightful)
Re:Why dump more tech than necessary into the phon (Score:2)
Re:Why dump more tech than necessary into the phon (Score:2)
The problem is, the idea of using this tech in phones is fighting against hundreds and hundreds of millions of deployed telephones without any tech newer than perhaps a microchip for caller ID. Over the long-term, text-to-speech embedded in the device is the more efficient and user-controllable format. Over the short haul, though, we're going to see many years still of central-office-controlled voice apps on your phone.
Niche applications, like on a Pocket PC, now there something like this would absolutely rock. Get a toehold, and eventually low-power text-to-speech and speech-to-text devices will be all the rage.
Now if only someone would perfect a speech-to-text engine that didn't require hours of training to recognize my accent...
Old news man ... (Score:1)
Oh the possibilities! (Score:1)
Seriously, that could have tremendous business implications for those who are doing business in other countries.
Their usage of EEPROM is nothing but ingenious, why hasn't anyone done this before? Or have they? It makes a lot more sense than a flash card, and it's cheaper too.
I bet it will choke... (Score:5, Insightful)
The lead story read: "Unionized environmental health workers object to new chip that can read un-ionized lead levels."
Reading English is a lot tougher than most English-speaking people think.
-- MarkusQ
Re:I bet it will choke... (Score:2, Insightful)
You're right though: rough, bough, cough, laughter, slaughter etc. might give it trouble (they certainly gave Ricky Ricardo a headache).
It would have to do a lot more than simply translate the text to phonemes to be effective with English.
Re:I bet it will choke... (Score:2, Informative)
LEXX
Re:I bet it will choke... (Score:2)
Taiwan = Mandarin
Mainland China = Mandarin
HongKong = Cantonese
Toronto = Cantonese
+1 Funny, +1 Informative on the MQR standard (Score:2)
G. Nolst Trenité
Dearest creature in creation,
Study English pronunciation.
Spot on! Not only would it have to disambiguate homonyms by semantic context, it would even need to use poetic context. Great poem!
-- MarkusQ
Good for the Blind. (Score:2, Insightful)
Don't count on Hearing a Slashdot headline (Score:1)
M$'s Front Page is one of the worst offenders. It's full of useless font adjustments and other needless code. Worse, it labels images cryptically and encourages all of the worst practices.
As Bill Gates once said, software is what is lacking in a world full of technology. He aims to keep it that way for those who trust him.
Kavita Maharaj (Score:2)
Because I swear, sexy though it is, her voice is synthesized.
--Blair
Phenomes? (Score:4, Insightful)
But is it smart enough to pronounce the boldfaced word above as "phonemes"?
Hrmm... (Score:2)
Incidentally, a guy I work with has a father who designs for Chrysler. He said that the big D-C was "really interested" in applications of text to speech. Think about it: ebooks that read themselves to you while you drive, driving directions and traffic info read to you rather than displayed on a screen (most nav screens require you to take your eyes entirely off the road and down the dash as much as 18 inches...eep!). You've got a much more useful interface, and with a low-cost (though they'll charge you a grand, I'm sure), easy-to-interface chip, they'll have no excuse not to bring this much safer system for data interaction to my dash today, and not six years from now.
Re:Hrmm... (Score:2)
Of course, a text to speech reader isn't going to sound anywhere near as nice as an audiobook...but until you can find an audiobook of my email, or my students' papers, or the latest press release from Sun Microsystems, i'll take the coder.
read your e-mail outloud? (Score:1)
Re:read your e-mail outloud? (Score:2)
"As your accountant I need to inform you that..",
"Here is your divorce settlement proposal..",
"This is your doctor. Test results came in. You have..",
etc..
In comparison some x-rated junk mail might actually make some poor fellow's day..
This'll be great for phone spam (Score:2, Funny)
Phone: "Are you looking for hot [chicks|sex|pussy|love]?"
Wife: "um... what was that, honey?"
Phone: "Get your University diploma!"
Wife: "What, I'm not good enough the way I am?"
Phone: "Get out of debt now!"
Wife: "Okay, you know what? That's your birthday present on the Credit card, bucko. That's it. I'm leaving..."
Companies in this business (Score:1)
But there are some implementation issues here. For example, if you have GNU, how do you say it? What about if you have Jekka Pukka Sarasate? If you were to take the literal English pronunciation you might never even be able to understand what it's trying to say. Figuring out how to solve that is an interesting CS problem.
But this is a cool invention. Low power wireless research is just taking off. Before we were trying to figure out how to just transmit wireless well. Now we can have fun with it. I truly look forward to a wireless life
Me..
Better in the Next Generation (Score:2, Funny)
"You've got mail, baby."
Great for ordering Chinese food. (Score:1)
Catching up with an Amiga? (Score:1)
With all the horsepower available in any modern handheld device -- surely much more than an 8 MHz 68000 with 512K of memory (of which only a fraction was used I'm sure) -- I don't understand why a dedicated chip would be needed to pull this off.
Oops, 1986 (Score:1)
Re:Catching up with an Amiga? (Score:2)
> forgot what intonation was but could alter its voice half an octave to simulate slight masculine
> or slight feminine undertones" then I'll agree with you.
Well, that was if you fed text to the translation device, which did its best to generate the required phonetic output--also in ASCII--that was fed to the speak device. This translation could be pretty rough, and could be much improved upon if you generated your own raw phonetic output. You could smooth out, lengthen, shorten, or intonate individual phonemes that way, making the output sound much better. Basically, the translation device needed a good rewrite.
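A rough Python sketch of that two-stage split, for anyone who hasn't seen it. Nothing here is Amiga-specific: the `translate` and `speak` functions, the tiny `LEXICON`, and the phoneme spellings are all made up for illustration, and the second stage just returns the phoneme list where a real synthesizer would render audio.

```python
# Toy lexicon: these phoneme spellings are invented for illustration and are
# not the phonetic alphabet of any real translator or speech device.
LEXICON = {
    "hello": "HH AH L OW",
    "world": "W ER L D",
}


def translate(text):
    """Stage 1: turn text into an ASCII phoneme string.

    Words missing from the lexicon are simply spelled out letter by letter,
    which is the kind of rough guess a weak translator might make.
    """
    out = []
    for word in text.lower().split():
        out.append(LEXICON.get(word, " ".join(word.upper())))
    return " / ".join(out)


def speak(phonemes):
    """Stage 2 stand-in: return the phoneme sequence a synthesizer would render."""
    return [p for p in phonemes.split() if p != "/"]


# Automatic path: text goes through the translator first.
print(speak(translate("Hello world")))

# Hand-tuned path: skip the translator and feed your own phonemes directly.
print(speak("HH EH L OW / W ER L D"))
```

The point of keeping the stages separate is the second call: anything that can produce the intermediate phoneme text can bypass a weak translator without touching the synthesizer.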
Yeah, but... what about everyone else? (Score:1)
So what? (Score:1)
Post your Ideas here! (please) (Score:2)
OK, so just imagine that in the near future anything and everything will have one of these small, low cost chips. Now, imagine the possibilities! Everyone I'm sure has their own ideas on how cool this could be, so go ahead and reply with yours!
Re:Post your Ideas here! (please) (Score:3, Interesting)
2) Text messengers for blind people. You know those little IM devices all the kiddies have? Well just put braille on the keys and have one of these chips installed... there you go.
3) Watches. The next time somebody says "what time is it?" you just press a button and the voice chip in your watch simulating someone who sounds extremely pissed off shouts the time.
Well, that's it for now...
Remember when Speak'nSpell was a new thing (Score:1)
They've had stuff like this for a while ... (Score:1)
It was hilarious sending him obscene and or ridiculous emails and listening to the recorded voice play them back
Already done. (Score:2)
Fundamentally it's a different approach than today's "voice portal" technology. Voice Portals retrieve data for you, and read it over standard cell or PSTN network. There are many benefits to this approach, principal among them being improved processing power for additional functionality such as voice-processing (speech to text, or compressing speech for reply email voice attachment). By putting the power into the phone, instead of at an expensive central office, this chip could either be a great advancement for text-to-speech technology, or a "killer app" that puts my company out of business
Regardless, I'm excited to see this happening. I've long envisioned a PDA with the only interface being spoken, rather than requiring any video component. This would bring the power consumption and delicacy of these devices down within reason for extended usage. The downside is that speech is necessarily a rather slow interface to a machine; it will be interesting to see how we adapt speech for greater speed with speech-based devices, and how English as a whole will fare.
Now that I've used voice-enabled email, it would be really hard to go back to the "old" way. I still do an enormous amount of correspondence every day by typing, but when I'm on the road I don't need to bother with a laptop since I can have my email read to me over the phone *and reply* with a voice message via email. Until you've used it, it's tough to realize how convenient it is.
I want one of these for my Agenda VR3! Or something...
It's about time... (Score:2)
That was... 21 years ago. It's sad that this aspect of human-computer interaction has been overlooked for so long. It's nice to finally see some development.
Mandarin (not the orange) (Score:4, Insightful)
Just what I need (Score:2, Funny)
Winbond's Whitepaper (Score:2, Informative)
Winbond [winbond.com]
Bandwidth problem -- audio is dead (Score:3, Interesting)
The most important thing about the Internet is "bandwidth". I'm not talking bits on the wire, I'm talking how fast information flows into my brain. Speech is vastly slower than text as a medium for transferring information into my brain. I'm so accustomed to Internet speeds for information, I can no longer watch TV news -- the bandwidth is too slow. I'm glad I don't go to school anymore -- I could barely stand lectures when I was a kid, I would never be able to sit through them as an adult.
Five years ago everyone in Japan walked around with their phone to their ears. These days, everyone in Japan walks around looking at their phone (instant messaging, etc.). I'm not sure if people "get" the bandwidth problem. Sound must be multiplexed into half-bandwidth, serialized communication. By this I mean you can either input or output at any given time, but not both. Also, incoming messages must arrive separately, not in parallel. With audio, I can only talk to one person at a time; with messaging, I can carry on multiple text-based conversations simultaneously. I mean, text-to-voice has long been available on PCs, but nobody uses it for ICQ/AIM/YahooIM/MSIM.
As far as I can tell, audio is dead. Maybe somebody will invent some sort of hyperfast language (didn't Heinlein describe something like that in a book?), but I think the next wave is going to be something new that replaces reading text, not something that goes backwards to audio.
Re:Bandwidth problem -- audio is dead (Score:2)
Consider this: have you ever been IMing w/ several people, but then called one of them because it was _really_ important to get that conversation done _fast_? I have, and it makes it much harder to try to keep all the _other_ conversations going.
Fast speech-to-text could give you the best of both worlds, but that's, of course, still a long way off.
Commodore C64: SAM (Score:3, Funny)
Has anything new happened lately?
Re:Commodore C64: SAM (Score:2)
Great achievement, my Commodore C64 could do that so many years ago that I don't even remember when it was. SAM, the speech synthesizer which could even "sing".
::nods:: I remember programming my TI-99/4A to read me a menu of games whenever I started it up. Then there was that text-to-speech program for my 386 that came with my soundblaster that enabled me to make my computer announce that it was booted up and ready for his l33tness Velex himself to use dos. Just a few months ago, my roommate downloaded this monkey called Bonzi that talked to him, but Bonzi got annoying so my roommate shot him.
I'm sure that there's been tons of text-to-speech programs that I've never heard of, nor will I ever, because it's been done so many times before, and the AI required to get the computer to talk in an un-Vice Fearless Leader #42 fashion is beyond the grasp of even the most 31337 at the moment. What I would really like for my mobile phone is rudimentary speech recognition.
As it's been pointed out before, text is just simply faster than speech, and who knows, maybe twenty years down the road we'll all carry around little AIM or ICQ devices. What I'd really like to do, though, is skip the miniature keyboards or fumbling on a keypad. Speech recognition is the way to go. I'd much rather tell my phone, "call pink" and have it call pink back at the hotel, than have it announcing all my spam to the world.
No, unfortunately nothing new's happened lately
Technology advances... (Score:3, Funny)
No way in hell do I want to read email on a cell phone (it's a PHONE. You _talk_ to people in it. If it was a generic mail reader it would have at least a 17 inch monitor and a keyboard that lets you type faster than
At least until the phone can give me an (intelligent) summary when I say 'Get to the point'.
What Happened to TI (Score:2)
ttyl
Farrell
Re:What Happened to TI (Score:2)
Re:What Happened to TI (Score:2, Informative)
Still, I was part of the team that made the first Apple II (at least in the State I lived in at the time) that could read from the screen back in 1981 -- to an "Echo II Speech Synthesizer" which IIRC came from Radio Shack.
We took some of our stuff to the linguistics department at the University across town, and of all things, had the darn machine speaking understandable Japanese (from Romaji, or romanized letters) within a few days because the Japanese language is consistent not only in phonetic translation but also in inflection. It still sounded like a machine, but that was a limitation of the sound chip's internal phoneme library in the Echo II. The same program with one of today's chips would have sounded very near normal.
Goes to show you how much more difficult spoken English is than most of us native speakers tend to realize, because I have yet to see a low cost implementation of a text to speech translator that was all that much better than what we were doing back in '81. (not that I have seen everything out there by the way -- I do have a life outside the PC world....occasionally :-)
language translation? (Score:2)
Now all we need are really good speech to text converters....
Memory density (Score:2, Funny)
Unlike everybody who posted "big deal, my Commodore 64 used to hold long, sexy conversations with my Speak & Spell about the meaning of Wargames," I actually read the article. Near the end it says "The multilevel storage memory system allows the chip to store up to 256 different voltage levels, or the equivalent of 8 bits, into one EEPROM cell, which is up to 8x the capacity of conventional memories..."
Being a software geek with my last classes in EE/CE several years safely in my sordid past, I'm out of touch. Is this a big deal?
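The arithmetic behind the quoted "8 bits" figure is just a base-2 logarithm, sketched below in Python. The helper names are hypothetical, and the comparison assumes a "conventional" cell stores two levels (one bit), which is my reading of the 8x claim rather than something the article spells out. It also ignores noise margins and error correction.

```python
import math


def bits_per_cell(levels):
    """Bits one cell can hold if it reliably distinguishes `levels` voltage levels."""
    return math.log2(levels)


def cells_needed(total_bits, levels):
    """Cells required to store `total_bits` at the given number of levels per cell."""
    return math.ceil(total_bits / bits_per_cell(levels))


print(bits_per_cell(256))              # 256 levels per cell
print(bits_per_cell(2))                # assumed conventional two-level cell
print(cells_needed(8_000_000, 256))    # multilevel cells for 8 Mbit of data
print(cells_needed(8_000_000, 2))      # two-level cells for the same data
```

So under that assumption the same data needs one eighth as many cells, which matches the "up to 8x" wording.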
I'll show you low power... (Score:2)
Re:I'll show you low power... (Score:2)
For obsolete machines, they still pack a punch as a server for text->speech conversion.
Any one remember... (Score:2)
The '256 took coded phonemes and output audio,
while the other chip in the set (don't remember the name) took ASCII serial data and
converted it to phoneme codes the '256 could understand.
This set has been around for prolly close to 20 years now. (I remember finding a variant of it in
the Intellivision voice module ["Bee Sevunteen Bahlllllmer"!] that I believe was circa 1984.)
The '256 has been discontinued for a long time now, and I'm kinda excited to see
something similar to it show up, it was a cool gadget.
C-X C-S
Hey (Score:2, Interesting)
I'd pay for it, and I bet a bunch of other people would too.
Good for accessibility (Score:2)
Such a device will be very handy for people that have visual impairments. Instead of the current bulky and expensive kits, this will be an improvement, especially for VI users out-and-about.
What can you do? Make your web pages accessible [w3.org] for a start.
The State of the Art... (Score:2, Informative)
been around for a while... The problem with
their acceptance is that they have poor voice
quality. Actually, there are two quite different
technologies available to produce speech nowadays:
1. Diphone synthesis and its variations. The idea
is to have one sample of each sound combination
(diphone) in a speechbase and produce the actual
speech by manipulating those sounds. This is what
gives the computer-synthesized, somewhat metallic speech
that most people have already heard somewhere, and
this is what is actually used in low-powered devices,
handhelds and speaking dictionaries.
2. Corpus-based synthesis. The idea is to store
a few hours of the speech of a highly trained
speaker in the speechbase and select the fragments
of this speech that suit best for the generation.
The second approach gives astonishing results, with
the quality of the speech being sometimes
indistinguishable from a human's. However, the size
of the speechbase is an issue. You cannot fit a
300Mb speechbase onto a handheld device yet,
and hardware optimizations don't help much when
it concerns fetching data from the speechbase
and performing text-to-phonemes conversion.
Several companies have corpus-based synthesis
demos on-line. Check out SpeechWorks' and
Lernout & Hauspie's sites
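To make the diphone idea in point 1 a bit more concrete, here is a minimal Python sketch of the lookup step only. The `diphones` function, the "_" silence marker, and the phoneme spellings are assumptions for illustration; a real synthesizer would also fetch the recorded sample for each pair and smooth the joins, which is not shown.

```python
def diphones(phonemes):
    """List the diphone units needed to speak a phoneme sequence.

    Each unit spans the transition from one sound to the next, so the
    sequence is padded with a silence marker ('_') at both ends.
    """
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical phoneme spelling of "cat".
print(diphones(["k", "ae", "t"]))
```

Because the units are pairs, the speechbase needs on the order of N squared samples for N phonemes, which is why it still fits in a small device while a corpus of several hours of recorded speech does not.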