SunPin asks:
"I'm a writer that is 99% dependent, due to fine-motor disabilities, on voice dictation. I've been a dictation user since 1990. My preference is 'discrete' speech because of very low resource consumption and its effectively infinite flexibility. Over the years, my computer use has de-evolved to programming, FTP, email (Mozilla), word processing (OpenOffice) and Ricochet. Drop the game and there's nothing that I shouldn't be allowed to do on the go. The problem is that I can't. Back in 1990, the requirements for IBM VoiceType were: DOS, 8MB RAM, 10MB of drive space with one of those new-fangled scorching 386-16MHz processors... not exactly demanding by today's standards and, unless I'm outright wrong, not demanding by today's PDA standards. Why hasn't it occurred yet?"
"In the disability offices of the hundreds of universities across the US, such software would be a major money saver because not all students need a high-powered laptop. While natural speech is great from a marketing perspective, it is simply impractical for general use and cannot adapt to mildly noisy environments. IBM, L & H and Microsoft have all given me the run-around. IBM refused to entertain the possibility. L & H is on life support, in a deep coma. Only Microsoft had a remotely positive response saying that they were testing natural recognition in Mandarin Chinese in their Beijing research office. Does anyone believe in keeping it simple, anymore?"
Well... (Score:3, Interesting)
First off, buying a dictaphone is still much cheaper than a PDA with software.
And secondly, the whole voice/word recognition software market hasn't really made any great leaps or bounds over the past five years, not to mention it's not popular in the mainstream yet.
Re:Well... (Score:5, Funny)
DICTAPHONE? DICTAPHONE?
re-vulcanize my tires, post-haste. And make sure this post is on the next auto-gyro to Prussia.
ack! (Score:3, Informative)
http://www.dictaphone.com
Re:Well... (Score:2)
Re:Well... (Score:4, Funny)
"IBM: Where software goes to die."
Re:Well... (Score:2)
It makes them feel like they are a super tech think tank a la PARC...
They do come up with some great stuff, and I would bet that if IBM were a Japanese company, the entire tech industry would look totally different.
dragon solution (Score:2)
Re:dragon solution (Score:2)
Simputer (Score:3, Informative)
Re:Simputer (Score:3, Insightful)
hello world (Score:4, Informative)
My god (Score:5, Funny)
Hey! (Score:4, Funny)
Re:Hey! (Score:4, Funny)
Get a wife.
(I hope you're all happy, that comment cost me an expensive dinner.)
More to do with perception (Score:5, Interesting)
I have not had much experience, but I think the other thing is that people are averse to any sort of training or teaching required, no matter the long-term dividends.
Like most things, it comes down not to fact but to perception and prejudice. Most people base their buying decisions on 30-second spots, not informed research, so the cost of educating people is too high for producers to incur.
Re:More to do with perception (Score:5, Interesting)
The ability of the Zaurus to take a mic input makes a big difference, since a good mic is important due to the noise-cancelling features they have. All the PDAs with no external mic option are pretty much useless for VR/dictation.
LoB
Re:More to do with perception (Score:4, Informative)
Since it doesn't look too promising, I think you may want to expand your search beyond PDAs. I saw several references to the Linux-based Simputer; maybe one of those with Linux-based speech-to-text software is the way to go?
It's not just the processor... (Score:5, Interesting)
I'm also going to assume that the current line of speech recognition products are MUCH better than what ran on your old 386.
Re:It's not just the processor... (Score:2, Informative)
The problem is in recognizing what you said; the best software out there still sucks, and you have to train it forever. No matter what, you will have to train it to recognize your voice. My saying car and someone from Boston saying car are drastically different, but they are the same word. Given a lot of training you can get something halfway decent, but it still requires corrections. This is especially true if you have a cold, you just woke up, or are sleepy.
It's a very complex thing, and I don't see any significant breakthroughs anytime soon. I've used quite a lot of programs (with a good microphone) and you can get OK results, especially for simple things like "Open" and "Close", but I think we're a long way from really good dictation software.
-Chris
Re:It's not just the processor... (Score:5, Funny)
My saying car and someone from Boston saying car are drastically different but they are the same word.
Hey! I resent that remahk! You ah stereotyping heah, and it's not fa-uh. Some of us from Bahston can say cah just like the rest of you. Just jealous, that's what you ah. Come up heah, and you'll be wicked sorry that you did. :-)
Re:It's not just the processor... (Score:2, Funny)
Re:It's not just the processor... (Score:5, Funny)
Plantronics makes Sound-Cardless Headphones (Score:3, Informative)
http://www.plantronics.com
and search for their DSP-*00 series. I picked up their DSP-500 (normally $110) for $40 on a deal.
Re:It's not just the processor... (Score:2, Interesting)
It's the microphone circuit on the soundcard.
My brand new AWE-64 had a crap mic circuit.
The el-cheapo replacement was excellent.
Re:It's not just the processor... (Score:5, Interesting)
NC microphones (Score:4, Informative)
Re:It's not just the processor... (Score:3, Informative)
There are not two microphones in that headset; that would just make it worse, since no PC it would run on is real-time enough to match the sound samples together, etc, etc, etc.
Instead they use a dual-port microphone. The element lies between the front of the mic (towards the speaker) and the back (towards ambient noise). Sound pressure from ambient noise tends to hit both the front and back simultaneously, while sound pressure from the speaker hits only the front. The difference gives mainly the speaker, with muted external sound.
Even cheap mics have that now. The main difference between a good mic and a bad one is its construction and materials, which affect its response characteristics.
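The pressure-difference idea can be shown with a toy model. This is idealized (real cancellation is acoustic and imperfect, and all signal names here are synthetic stand-ins), but it makes the principle concrete: ambient noise reaches both ports and cancels in the difference, while the speaker's voice reaches only the front port and survives.

```python
import math
import random

random.seed(0)
n = 8000  # one second of samples at 8 kHz
speech = [math.sin(2 * math.pi * 220 * i / n) for i in range(n)]  # stand-in voice
noise = [random.gauss(0, 0.5) for _ in range(n)]                  # ambient room noise

front = [s + x for s, x in zip(speech, noise)]  # front port: speech + noise
back = noise                                     # back port: (ideally) noise only

# The element's output is the pressure difference between the two ports.
recovered = [f - b for f, b in zip(front, back)]

assert all(abs(r - s) < 1e-9 for r, s in zip(recovered, speech))
```

In a real mic the two paths are never perfectly matched, so cancellation attenuates rather than eliminates the noise, but the differential geometry is the same.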
-Adam
Because (Score:5, Informative)
IBM can't even manage to do this on, for example, a P3 733EB. How they're going to do it on a 300MHz XScale or SH chip or similar (let alone a Motorola Dragonball) is beyond me. I think your head is in the clouds.
With that said, voice recognition is very much on everyone's minds and it is coming. The limiting factor in handhelds right now is battery technology, which seems to be advancing more rapidly now than it has been in the last decade or so. With more power density comes faster processors and more ram, and the ability to perform these kinds of operations on smaller computers.
Re:Because (Score:2, Insightful)
Re:Because (Score:5, Insightful)
Yeah, but the author claims he was happy with discrete speech processing on a 386-16 back in the day. He doesn't want continuous speech that doesn't have to be trained and all that jazz - just simple old-school voice recognition. Is it so much to ask that someone port the old algorithms to the Palm?
Re:Because (Score:5, Insightful)
The author might be happy with what he had those days. The rest of the market would not be happy with that. In fact, the market is not happy with what we have now, as witnessed by the very low penetration of voice-recognition software. So why would we expect companies to spend the resources porting the old stuff when the new stuff won't even sell ?
Re:Because (Score:2, Insightful)
Re:Because (Score:3, Interesting)
Sometimes niche markets turn out not to be. Just look at a lot of "desktop publishing" software. Back in 1986 that was still largely a niche market. Now it is indispensable for many, many people.
Re:Because (Score:3, Insightful)
Almost as frightening as an office full of people all using telephones.
You don't remember typewriters and adding machines, or for that matter, the dictaphone, do you?
Re:Because (Score:2)
Re:Because (Score:2, Interesting)
Frankly I don't want the din of dozens of coworkers talking at their computers around me. I'll stick with my qwerty keyboard. And this means those with physical disabilities will be condemned to a corner of the market, getting less attention and as a result more expensive and less quality products.
-FF
Re:Because (Score:2)
Er, sorry Wil.
Anyway, which is more "natural"... opening Word and typing, or saying "Computer, please dictate a letter to such and such"? I think the answer to this is clear. It won't be replacing the secretary any time soon, but this is how many people (I think most) do/would prefer to control their computers. Some things will likely always be best done with a keyboard; don't expect the keyboard to vanish any time soon. But especially in the case of portable computers, which either have no keyboard or a substandard one, I would expect voice control to be the norm within five years or so. Text input on portable computers is simply too tedious.
With that said, I think there's also room for dictation on your PDA and then non-realtime conversion to text while you're not doing anything with it, or conversion done on your PC (of course that's also non-realtime) when you dock. Also what with mobile wireless internet getting cheaper you may actually find yourself speaking to your mobile device, which then sends an audio stream somewhere else for processing. If communications technology continues to outpace battery technology, this seems likely.
Re:Because (Score:5, Funny)
I guess you haven't seen 2001: A Space Odyssey....
"Open the pod bay doors HAL."
"I'm sorry Dave, I'm afraid I can't do that."
Maybe it wasn't that Hal was insane, just his speech recognition software failed....
Re:Because (Score:5, Insightful)
And have you ever actually tried speaking for eight to ten hours at a stretch? I'm not talking about random, occasional speech acts, but sustained, focused speech. You'd have about three weeks until laryngitis became an occupational hazard among white-collar workers.
Speech is nice, but it is very much a niche application. Not only now, but ever. A keyboard is faster than speech, and does not contribute to noise level or occupational damage nearly as much as sustained speech would. It's a nice, even essential, mode of operation for those apps where a keyboard just won't do; the disabled, firemen, surgeons and so on will rightly love the interface. For mainstream use, however, it's just not good enough even when it's perfect.
It could become an accessory input, on the lines of replacing menu commands for an app: mark text, say "cut", mark a place, say "paste" and so on, but it just would never replace keyboard input in any mainstream application.
Dragon Dictate is portable (Score:5, Informative)
Storage space? (Score:5, Insightful)
dictaphone's EXSpeech (Score:5, Informative)
Re:dictaphone's EXSpeech (Score:3)
"EXSpeech(TM) offers a highly accurate continuous speech recognition solution that's fully integrated with Dictaphone's industry-standard Enterprise Express® voice and text management system. This state-of-the-art speech recognition technology, incorporated into a complete patient information workflow management system, can reduce transcription costs by more than 20% while speeding report turnaround."
i hope that voice recognition never really flies (Score:2, Interesting)
I dont get it (Score:2, Informative)
Possible Reasons (Score:2, Insightful)
Second, people with PDAs usually are using them somewhere in public, so you have a lot more background noise (as well as different acoustics), which varies as the person moves about and would probably make it difficult to filter out.
Third, you'd want a mic on a cable, separate from the PDA. This way you can keep the PDA where you can read what's on it, and not have to shout at it from arm's length, or shove it in your face and talk into it.
Finally, (and this is the reason I think voice control never caught on in a business setting), imagine a roomful of people talking to their machines. Each machine would have to identify its user's voice uniquely out of the babble, otherwise it would just take one ticked off guy with a bullhorn to issue the command to delete everything.
Re:Possible Reasons (Score:3, Funny)
One Reason (Score:2, Insightful)
ViaVoice (Score:5, Interesting)
http://www-916.ibm.com/press/prnews.nsf/jan/9E2
This shipped with my iPAQ 3835 and seemed to work pretty well for the 5 minutes that I used it before installing Linux on my iPAQ.
Embedded ViaVoice (Score:2, Informative)
Re:ViaVoice (Score:3, Informative)
Re:ViaVoice (Score:2)
Re:ViaVoice (Score:2)
Voice Recog for WinCE (Score:4, Interesting)
Because it is a relatively processor-intensive task, I would imagine that time plus improvements in sound card DSPs would make these better. (But I've been known to be wrong.)
Interesting that a Google search brought up this company:
CyberTron [cyberon.com.tw]
as well as IBM ViaVoice Mobility
IBM ViaVoice Mobility [ibm.com]
Simple answer... (Score:3, Insightful)
It hasn't been proven to be market-viable or cost-effective for sale. You need to be knocking on the doors of the PocketPC and Palm folks and give them conclusive evidence that making such a device will bring in the cash. Most capitalist companies are in business to make money, not give out charity, unless it can fatten their profits through marketing. Find more people in your situation and gang up! The squeaky wheel gets the grease.
However, I use an Ericsson T68m [sonyericsson.com] for voice dictation, which gives me roughly 25 minutes of recording time. I just transcribe when I get home.
Dependable dictation (Score:3, Interesting)
There are dictation services available on the net: basically, you e-mail them an MP3 and they e-mail back a fully typed document.
As far as the reason for voice recognition not being on a PDA, I think it's space requirements. Of the two packages I've tried (Dragon Dictate and IBM's), both require a lot of disk space to contain the recognition engine and your personal voice pattern files, much more than your average PDA can hold. We're probably only a few years off from PDAs having that type of storage.
Text to Speech, why so crappy? (Score:2)
I recently downloaded Microsoft Reader along with a text-to-speech add-in, and it sounded horrible. Same thing with Adobe's eBook Reader (well, theirs was a little better).
But why is this so? Why is text to speech even difficult? If you just have a person speak all the different phonetic sounds, shouldn't it be a simple matter of stringing together those sounds in a relatively seamless way?
Re:Text to Speech, why so crappy? (Score:5, Insightful)
Raising the voice at the end of a question may be easy enough. But how much? When? This is a question too, is it not?
A good orator would read a more 'exciting' passage more quickly, and with more enthusiasm, punctuating key verbs and nouns. How is software to know which passages are more exciting, and which aren't?
It's not just a hard task for computers, but for people too.
Computers read aloud at about the same level as a poor orator: pho-net-i-call-y, in a dull, drab monotone. Drop by the local high school and listen to them reading Shakespeare.
Reading aloud may be simple; reading well and naturally is a skill.
Re:Text to Speech, why so crappy? (Score:2)
Heuristics? You should be able to come up with a rudimentary rule set for certain things. And really, the only limit to how accurate you can get is how much time you are willing to put into refining and expanding the rule set.
A good orator would read a more 'exciting' passage more quickly, and with more enthusiasm, punctuating key verbs and nouns. How is software to know which passages are more exciting, and which arent?
How do we know? By matching key words and phrases. Is there even an attempt at this?
It's not just a hard task for computers, but people too. Computers read aloud at about the same level as poor orator. Pho-net-i-call-y, in a dull drab monotone. Drop by the local high school, and listen to them reading shakespeare.
Even if it is too hard a task for a computer to leap beyond a dull, drab monotone for straight text to speech, do you know of any attempts at emphasis tags?
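A rudimentary rule set of the kind suggested above could be sketched in a few lines. The keyword list and tag names here are invented purely for illustration, not taken from any real TTS product:

```python
# Toy heuristic prosody tagger: wrap "exciting" keywords in emphasis tags
# and flag a pitch rise on questions. EXCITING and the tag names are made up.
EXCITING = {"amazing", "suddenly", "danger", "never", "must"}

def tag_prosody(sentence: str) -> str:
    words = sentence.rstrip("?.!").split()
    tagged = [f"<emph>{w}</emph>" if w.lower() in EXCITING else w
              for w in words]
    out = " ".join(tagged)
    if sentence.rstrip().endswith("?"):
        out += " <pitch rise>"   # crude cue for rising intonation
    return out

print(tag_prosody("Suddenly there was danger ahead."))
# <emph>Suddenly</emph> there was <emph>danger</emph> ahead
```

As the parent notes, the hard part is not emitting tags but deciding how much emphasis, and when; a keyword match can't tell irony from excitement.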
I find it really hard to believe that this hasn't advanced at all since the 80s.
Re:Text to Speech, why so crappy? (Score:2)
Re:Text to Speech, why so crappy? (Score:2)
You actually hear the voices all the time over the phone (recordings and such), but you just think it's prerecorded, and then spliced. I think part of GM's OnStar service may use TTS.
Re:Text to Speech, why so crappy? (Score:2)
No. For the complete answer take an introductory linguistics course and pester the professor.
Short answer: speech doesn't work that way. When you cut phonemes away from the surrounding ones, they no longer sound like speech and you can't string them back together - the result isn't heard as speech at all, but a bunch of random chirps and vowel sounds.
This is also part of why speech to text is so hard; the sound graph of, for example, /k/ looks completely different depending on what other phonemes are in the same syllable. (and so speech to text can't really match at the level of phoneme very well, and has to back off matches to the syllable level or longer) Sounds which we interpret as "identical" when used in speech look completely different when you plot out the frequencies involved (or take a look at the data). About the only phonemes which can be cut-and-pasted in isolation are vowels, and only the middle parts of long vowel sounds do that particularly well.
It frustrates your intuition, but the initial and final sounds of "cook" are not the same to some sound-sensing device that isn't connected to the human brain's special speech processors. That's because the human brain processes speech-like sound so that you hear as similar those sounds which require similar positions of the tongue, mouth, and other organs humans speak with. There's also noise correction in there like you wouldn't believe, which is how you can still understand stilted Hawking-like text to speech.
I suppose that the ultimate text to speech machine would run an intense physical simulation of air being forced over human vocal cords and through a human mouth with a tongue moving just right for each word, but:
Re:Text to Speech, why so crappy? (Score:2)
To sound natural, speech has to incorporate prosody and intonation as well as being able to support coarticulation.
Coarticulation refers to the fact that the sound of a phoneme (the smallest unit of linguistic sound) is affected by those that come before and after it.
It is not an easy problem, but there have been some nice advances in concatenative text-to-speech systems. For example here is a pdf [infofax.com] about IBM's approach to the problem.
We're not there yet, but things are improving.
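The coarticulation constraint is exactly why concatenative systems store transition units (diphones) rather than isolated phonemes: each cut then falls mid-phoneme, where the signal is relatively stable. A toy sketch of the bookkeeping (the inventory entries are placeholder strings, not real audio):

```python
# Diphone inventory keyed on phoneme *pairs*, so every concatenation
# boundary lands inside a phoneme rather than at a transition.
# Entries are illustrative placeholders.
diphone_inventory = {
    ("sil", "k"): "audio:silence-into-k",
    ("k", "u"):   "audio:k-into-u",
    ("u", "k"):   "audio:u-into-k",
    ("k", "sil"): "audio:k-into-silence",
}

def synthesize(phonemes):
    """Look up one stored unit per phoneme transition, padded with silence."""
    padded = ["sil"] + list(phonemes) + ["sil"]
    return [diphone_inventory[(a, b)] for a, b in zip(padded, padded[1:])]

print(synthesize(["k", "u", "k"]))   # "cook": four transition units, no raw phonemes
```

The price is inventory size: one unit per legal phoneme pair instead of one per phoneme, which is part of why good concatenative voices are large.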
voice server (Score:3, Insightful)
Some might say that this would make VR too slow. I don't see why this would be noticeably slower than doing VR in person. After all, when we talk on the phone, the person on the other end hears us almost instantaneously.
On a side note: my brother is a doctor who uses VR to do his dictations. It is much cheaper than paying a transcription service. He also does not need to review the transcriptions afterwards for accuracy, because he essentially reviews them as he speaks.
Re:voice server (Score:2)
capable of instant messaging (can you receive those at the same time you're on a voice call)?
Distributed Speech Recognition (Score:3, Interesting)
It is interesting that I JUST did a project on this subject for a Ubiquitous Computing class... My project was called "Distributed Speech Recognition." Here is a link:
Distributed Speech Recognition Project [duke.edu]
I also have heard it through the grapevine that the big voice recognition companies are working on exactly this technology... I wouldn't be surprised if Speech .NET includes support for something like this in the near future. I believe I read on some website that support for the Speech API on PocketPC was coming soon...
Tablet PC? (Score:2)
Sharp Zaurus + ViaVoice (or Dragon for Linux?) (Score:2, Interesting)
yes yes, a scripting nightmare... perhaps some enterprising programmers could start something on SourceForge or something...
It's not like the technology isn't out there. It's certainly not perfect: the Zaurus isn't big on storage space, and it's hardly cheap. And of course there are countless threads on the imperfection of voice recog... blah blah... but good enough is a fine answer on the path to [unattainable] perfection.
Anyway; Keep It Simple, Stupid:
Zaurus + Microdrive + ViaVoice/Dragon libs [+ festival?] + glueware = handheld voice recognition..
what's the big deal?
It's always been my dream (Score:2, Funny)
MARCH ON MY PALM MINIONS! Go forth! And ravage the world!
*cackles deviously*
ViaVoice for PocketPC exists! (Score:5, Informative)
As usual, there are some results [google.com] that come up with a simple Google search.
There was a Dragon Naturally Speaking beta for the Newton OS 2.1, and it works OK. But it's still a beta and is far from perfect.
If you're looking for voice recognition for other PDAs, including PalmOS or Linux devices, you'll probably have much less luck.
Re:ViaVoice for PocketPC exists! (Score:2)
Sphinx might be portable, but yeah, it would probably require rewriting lots of it to force it to live in the C++ environment the Newton OS imposes.
Research is underway... (Score:5, Interesting)
Basically, they are working to analyze speech in slices (phonemes) instead of the more computationally intensive task of the whole word. This would lead to a higher success rate and could be easily used across multiple accents of the same language (English, engrish, etc).
I'm excited about what they could accomplish there.
-Cyc
Lack of a ADC/DAC is a big problem (Score:4, Informative)
Talking back? (Score:2)
Patience.... It's coming.... (Score:5, Informative)
Since the asker wanted to know WHY nobody has done this yet, I'll spell it out:
Basically, the major pitfalls to developing this are:
1) Crappy algorithms that mangle what you really said into something unrelated
2) Power Consumption
3) Interfacing to the PDA (not hard to do, but non-trivial)
4) Limited PDA capabilities (Remember that Palm's DragonBall is a RISC architecture, and things like speech recognition NEED floating point math which must be emulated)
The solutions:
1) Somebody (not unlike me...) has to code the already existing better algorithms (check the literature - speech recognition is a mature technology, and publications abound) into a usable chunk of code, instead of simply recycling ViaVoice or NaturallySpeaking's libraries.
2) Add more battery storage.
3) Use another processor to do the conversion, then simply write it to the Palm in a serial stream.
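For a sense of how small "old school" discrete recognition can be, here is a minimal sketch of the classic template-matching approach: dynamic time warping against one stored template per vocabulary word. The feature sequences and vocabulary below are invented placeholders; a real system would compare frames of acoustic features (e.g. cepstral coefficients), not single numbers, but the algorithm is the same.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allow stretching either sequence: insert, delete, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# One stored template per vocabulary word (placeholder numbers).
templates = {"open": [1, 3, 5, 3, 1], "close": [5, 3, 1, 3, 5]}

def recognize(utterance):
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

print(recognize([1, 2, 3, 5, 4, 3, 1]))   # closest template wins: open
```

With a small vocabulary this is O(vocabulary × utterance² ) integer-ish work, which is roughly why isolated-word recognition fit on a 386.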
I would just wait about a year, then ask that question again to your physician friends, and see what they whip out of their pockets... :)
Re:Patience.... It's coming.... (Score:2)
Is that a dictaphone in your packet or are you just happy to see me?
misconceptions of misrecognition (Score:2, Informative)
Have you ever wondered how well people recognize speech? If something is blurted out at random, we rarely catch the meaning the first time. "What?". If humans have a lot of trouble understanding each other (about a 20% error rate), then computers have no chance when it comes to out-of-the-box, out-of-the-blue dictation. And computers don't have the benefit of a decade of childhood, not to mention millions of years of evolution.
What I'm getting at is that computers need a great deal of context to succeed (to reduce the number of possible interpretations, and therefore the number of ways of getting it wrong).
(I'm a speech recognition engineer; our company went bust last year, another dot bomb.)
1) The algorithms are good (trust me, I've seen them).
2) The training takes bloody ages; it takes weeks (and terabytes of data) to get good results across most of the speaking population.
3) Dialogue is very hard.
4) Actual recognition is fast (we had dozens of simultaneous recognitions on 600MHz machines).
The take home message: Train the users. Manage expectations. Say bye bye to HAL.
Re:Patience.... It's coming.... (Score:3, Informative)
The DragonBall is Motorola's, not Palm's. And it is CISC, not RISC; more specifically, an M68K. RISC is usually better than CISC at floating point, but both architectures can ship without a floating-point unit, and that's what the DragonBall does.
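For what it's worth, the usual workaround on FPU-less chips like the DragonBall is fixed-point arithmetic rather than slow float emulation: reals are scaled by a power of two and all the math stays in integer registers. A minimal sketch (the Q16.16 format is chosen here for illustration):

```python
# Fixed-point (Q16.16) sketch: a real x is stored as round(x * 2**16).
Q = 16

def to_fix(x: float) -> int:
    return int(round(x * (1 << Q)))

def fix_mul(a: int, b: int) -> int:
    # The integer product carries a 2**32 scale, so shift back down by Q.
    return (a * b) >> Q

def to_float(a: int) -> float:
    return a / (1 << Q)

a, b = to_fix(1.5), to_fix(2.25)
print(to_float(fix_mul(a, b)))   # 3.375
```

On an M68K this compiles to a multiply and a shift, which is why signal-processing code of that era was almost always written this way.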
Major reason: Learnout & Hauspie's corporate d (Score:2, Insightful)
Microsoft poured a bunch of cash into L&H. L&H eliminated some competition by purchasing Dragon.
L&H did some highly irregular accounting tricks, got themselves thrown in jail, and took their company down with them.
End result: There is only really one speech recognition vendor at this time, IBM, and they are just useless at marketing consumer products.
Keep an eye on Philips. They are currently spending big bucks developing their SpeechMagic engine.
Your other option is to find a copy of Dragon Mobile. Record an audio file on your mobile, then have it recognized on your PC.
Not enough profit .... (Score:2)
2) ????
3) ???? (not profit!!!)
Seriously, TRUE voice recognition is only 99% accurate. It is bad enough trying to make corrections on a regular keyboard.
Why not stick to using your laptop (which has MUCH more processing power) for voice recognition for now? You'll be able to run better software (software that does TRUE voice recognition, not phrase recognition) and have enough memory to run a text editor w/ spell check after you have completed your document.
This might be a great idea, but I think it might be a little ahead of its time
Just my two cents
Some thoughts... (Score:2)
As an open-source zealot, I have to point out that Free software would be a solution here, as it is less concerned with profits. IBM seems to have open-sourced some code related to speech recognition, and there are a number of other projects out there, but even for open-source, there has to be sufficient interest in a project, and sufficient could mean _a lot_ in this case.
I think speech recognition is great, and I would use it if I used Windows. I just haven't found a good solution for XFree86 yet - not that I've looked very hard.
It's the battery (Score:4, Insightful)
EARS (Score:5, Informative)
A different solution (Score:3, Insightful)
Install voice recognition servers, network connected boxes with powerful CPUs and the best voice recognition software you can get your hands on. A voice recognition client then just needs to send the voice data up to the server and get the translation back, say 100kbps up and some tiny amount back.
The payback comes because most devices will only use voice recognition for brief periods, so will present a negligible load on the servers. The dictation users will place a higher load on the servers, but even there, I'm guessing there is a lot of pausing involved. I'm also going to guess that some lag is acceptable for dictation. Presumably the person is thinking about what they are saying and proof reading later. This load can be prioritized lower to allow better immediate response for people issuing voice commands on their mobile devices.
Power consumption on the portable device will probably improve. They will have to operate their transmitter (think "talk time" vs. "on time"), but they won't need 5 watts of CPU doing recognition. (Guessing from a mobile G3 PPC, further validated, considering that the CPU spot of my iBook gets far hotter under solid use than a cellphone.)
So, just to pick numbers out of the air, a dual processor, high end commodity hardware voice server might serve 500 pda users giving intermittent commands and 6 simultaneous dictation users.
A company or school could easily justify the hardware cost of this service.
Now, someone go out and build one.
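A minimal sketch of that client/server split, with a stub standing in for the real recognition engine (the four-byte length-prefix protocol here is invented for illustration; everything else is plain socket code):

```python
import socket
import threading

def recognize(audio: bytes) -> str:
    # Stub engine: a real server would run the speech recognizer here.
    return f"[transcript of {len(audio)} bytes of audio]"

def serve(sock):
    conn, _ = sock.accept()
    with conn:
        size = int.from_bytes(conn.recv(4), "big")   # 4-byte length prefix
        audio = b""
        while len(audio) < size:                     # read the full payload
            audio += conn.recv(4096)
        conn.sendall(recognize(audio).encode())

server = socket.socket()
server.bind(("127.0.0.1", 0))                        # any free local port
server.listen(1)
threading.Thread(target=serve, args=(server,), daemon=True).start()

# The "handheld": ship raw audio up, get text back.
client = socket.create_connection(server.getsockname())
audio = b"\x00" * 16000                              # one second of fake audio
client.sendall(len(audio).to_bytes(4, "big") + audio)
print(client.recv(1024).decode())                    # [transcript of 16000 bytes of audio]
client.close()
```

The handheld side is just "record, send, display," which is exactly the kind of load a DragonBall-class device can handle.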
Re:A different solution (Score:2)
Modern PDAs do not have the equivalent of a PPro 200. StrongARM 200s are about the same as a Pentium 90 for general-purpose work, and next to useless for floating point work.
According to my compilation benchmarks (integer and data pushing, no floating point), a PPro 200 is right about 4 times faster than a StrongARM 200. On the other hand, the SA200s are little more than twice as fast as a 486-66DX2.
The very latest XScale PDAs are probably about twice the speed of the SA-110 200MHz ones, but I don't have any similar hardware to benchmark.
I just unlinked my benchmark page from the web because I hadn't updated it in years, but here is a link. Fun to reminisce about all those machines I thought were so fast at the time...
metastones [federated.com]
Whaddaya mean, no market? (Score:3, Insightful)
You don't have to be disabled in some way to think this'd be handy, do you? That's the story for this one person, okay. But if you hadn't heard of a PDA ever before, wouldn't this be one of the most likely functions you'd think of for them? It's a totally natural application for a handheld gadget like that, and one that really would have a natural market among all the middle manager types who made Palms so popular to start with. Right?
(Are there PDAs that can even read text in the other direction, though -- text to speech?)
Discrete is passe (Score:2, Informative)
One byproduct of this was a decrease in voice error correction performance -- Most verbal corrections are single words (e.g., the user selects the misrecognized word, "foo" and repeats the intended word "bar" without any of the coarticulation cues that the continuous recognition engine relies on). The recognition of isolated words by a continuous speech recognizer is inferior to the performance of a discrete system, yet the major software companies removed the discrete recognition engines from their products. (for more on speech errors, see this [umich.edu] or this [umich.edu] pdf).
Anyway, the use of discrete recognition engines has been essentially abandoned by the major players, and seems to have been relegated to the specialty shops that cater to disabled users. One outcome of this is that there is very little innovation related to discrete speech, because it was one of (many) historical barriers to the use of desktop speech reco. I can certainly understand the resistance by the big companies to go back to an "inferior" recognition engine for handheld devices. Most likely, speech reco on the handheld will emerge in a client-server environment, with the speech signal (maybe somewhat processed) being sent from the handheld to a server for recognition and the text being returned to the handheld. We probably won't see a general-purpose speech recognition application (as opposed to a limited-vocab application) that runs solely on a handheld until continuous processing can be done entirely on the device.
2001 A space odyssey - HAL 9000 (Score:2, Insightful)
Bowman: "Hello, HAL? Do you read me, HAL?"
HAL: "Affirmative, Dave, I read you."
Bowman: "Open the pod bay doors, HAL."
HAL: "I'm sorry Dave, I'm afraid I can't do that."
Bowman: "What's the problem?"
HAL: "I think you know what the problem is just as well as I do."
Bowman: "What are you talking about, HAL?"
HAL: "This mission is too important for me to allow you to jeopardize it."
Bowman: "I don't know what you're talking about HAL..."
HAL: "I know you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen."
Bowman: "Where the hell'd you get that idea, HAL?"
HAL: "Dave, although you took thorough precautions in the pod against my hearing you, I could see your lips move."
Another whacky idea... (Score:2)
Speech-to-text? (Score:3, Insightful)
The poster's question brings to mind a thought I've had lately, though, on PDAs and smart mobile phones. I've recently 'switched' from a Visor to just using my Sony Ericsson T68 as an organizer. Works great with iSync, etc.
The Palm-with-phone always made more sense to me than the phone-with-organizer. It seemed that the phone part could change shape: I could stick it in my ear in the form of a headset, with a connector to the Palm. A plain phone I need to hold up to my head, and I can't surf with something held against my head that way.
However, I've realized that I need a phone more; and more importantly, I only enter very small bits of text into the Palm. Furthermore, I spend much more time looking things up than entering them (as I use the Mac to enter data wherever possible).
This led me to the conclusion -- the one thing we are missing from the organizer/phone landscape, as the poster asked, is some kind of speech-to-text.
If I could literally hit a button and say "lunch with Dave next Tuesday" and have it enter that as live text... blammo. No more Palm, no more stylus. The phone already listens to voice commands. If it took short notes/appointments, I could literally walk around, call people, make appointments and notes, and not take the thing out of my pocket. Nice dream.
*sigh*
We're working on that. (Score:3, Interesting)
Until recently, PDA processors were not good enough, but that is changing rapidly (even though there is, in my view, little use for so much power except language technology).
The resulting dictation systems will not replace conventional keyboard input for a while, however, as recognition rates are only 97-98% (word accuracy), and that means a wrong word in at least every second sentence. Compared to low-bandwidth input, however (stylus entry on a PDA, or the author's case of a fine-motor disability), voice recognition is very competitive.
Cheers from Dublin.
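To see why 97-98% word accuracy still means frequent sentence-level errors, here's a back-of-the-envelope check (assuming, simplistically, that word errors are independent):

```python
def error_free_prob(per_word_accuracy, words):
    """Chance a sentence comes out with no misrecognized word,
    assuming word errors are independent (a simplification)."""
    return per_word_accuracy ** words

for acc in (0.97, 0.98):
    p = error_free_prob(acc, 20)
    print(f"{acc:.0%} accuracy, 20-word sentence: "
          f"{1 - p:.0%} chance of at least one wrong word")
```

For a 20-word sentence this works out to roughly 46% (at 97%) and 33% (at 98%) of sentences containing at least one error, which matches the "wrong word in at least every second sentence" estimate.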
Move over Harry Potter... (Score:3, Funny)
I'd say this guy found the magic combination of words to get his article posted on Slashdot. Heh.
I worked on this at MS (Score:5, Interesting)
There are a couple of reasons why this hasn't hit the market yet:
1) the PDAs really are not powerful enough to do decent recognition. Mainly, they don't have good enough audio input systems for reasonable speech quality. They also lack the storage for dictionaries, the CPUs are slow, and the RAM is too small.
2) at least at MS it is not a top priority to make speech work for disabled users. Outrageous, you say? Not so! It turns out that when the speech guys approached the accessibility guys on the subject, they learned that speech recognition is not workable in most cases where accessibility is needed; that is to say, the market of disabled people who cannot use the keyboard but who CAN use speech input is actually quite small. Most people who lack the motor function to type (or to use some sort of keyed input, as Stephen Hawking does) don't have the motor function to speak clearly enough for speech recognition to work. Bottom line: other solutions work better.
Not enough CPU (Score:3, Interesting)
Sure, a 386 could do voice recognition, but it required a special card that not only had higher-quality sound inputs but also some DSPs to do the hard work. When IBM put voice recognition in OS/2, they warned you that a 486 was not enough. (Several people tried it anyway, and it worked only within narrow limits.)
Emulating a DSP requires a lot of floating-point math. Most PDAs do not have floating point in the CPU because nothing would use it. The few times it is needed, emulation is easy enough, just very slow. That's normally no problem because, as I said, floating-point math isn't much used.
Don't forget that PDA CPUs are not designed for speed above all else. They are designed for low power, which means they have to compromise somewhere and often need extra CPU cycles to get something done.
Finally, don't forget power requirements. In normal use the CPU is shut down most of the time, drawing essentially no power. Voice recognition would change that, and your battery life would suffer drastically.
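For what it's worth, the usual workaround on FPU-less CPUs is not to emulate floating point at all but to rewrite the DSP math in fixed-point integer arithmetic. A minimal sketch of the idea (the Q16.16 format here is just one common choice, not anything a particular PDA mandated):

```python
FRAC_BITS = 16        # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS  # the value 1.0 in this representation

def to_fixed(x):
    """Convert a real number to Q16.16."""
    return int(round(x * ONE))

def fixed_mul(a, b):
    """Multiply two Q16.16 values; the shift restores the scale."""
    return (a * b) >> FRAC_BITS

a = to_fixed(1.5)
b = to_fixed(0.25)
print(fixed_mul(a, b) / ONE)  # prints 0.375
```

On an integer-only CPU this costs one multiply and one shift per operation, orders of magnitude cheaper than software floating-point emulation.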
Re:stephan hawking (Score:3, Informative)
Re:stephan hawking (Score:3, Funny)
Re:Because it does not work. (Score:3, Funny)
According to my boss, it's actually something called an office administrator.
Re:Because it does not work. (Score:3, Insightful)
Re:MOD PARENT DOWN! (Score:2, Insightful)
He is correct: current markets go for the majority and don't bother with the minority (excepting small specialty groups).
Unless you show one of the big players how to turn it into a cash cow, they won't put too much time or money into it.
Re:Dosent Voice work on PDAs. (Score:3, Funny)