Mac Version of NaturallySpeaking Launched
WirePosted writes "MacSpeech, the leading supplier of speech recognition software for the Mac, has canned its long-running iListen product and has launched a Mac version of Dragon NaturallySpeaking, the top-selling Windows speech recognition product. MacSpeech had made a licensing agreement with Dragon's developer, Nuance Communications. The new product is said to reach 99% accuracy after 5 minutes of training."
Minion, do my bidding! (Score:5, Interesting)
I'll have to play with Dragon at some point; I just haven't gotten around to it yet. Aside from accuracy errors, the primary issue that bothers me about the speech recognition solutions I've tried is their general inability to recognize speech that seems natural to humans but isn't what the system is expecting as input.
This is especially true with over-the-telephone solutions. For example, I am with Rogers Wireless carrier here in Canada, and their automated customer service system prompts you for your phone number. My last 4 digits are 2125, and it is very natural to say "twenty-one, twenty-five" when giving the number to a human being. The speech system, unfortunately, is only sophisticated enough to understand one-digit-at-a-time mode, so you have to suffer through saying "two one two five". Which isn't truly a big deal, but it's frustrating having to learn each system's unique quirks and limits. I suppose the same can be said of any technology.
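To make the "twenty-one, twenty-five" point concrete, here is a minimal sketch of how a system might accept both styles once it already has the recognized words as text. The function name and word tables are hypothetical and have nothing to do with how Rogers or any real IVR product is built; it also assumes the recognizer hands over clean English number words.

```python
# Hypothetical word tables; a real recognizer's grammar would be far richer.
ONES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9}
ZEROS = {"zero": 0, "oh": 0}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16,
         "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def spoken_to_digits(utterance: str) -> str:
    """Convert words like 'twenty-one, twenty-five' into a digit string."""
    words = utterance.lower().replace(",", " ").replace("-", " ").split()
    digits = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # Greedy pairing: "twenty one" is read as 21, not 20 then 1.
            if i + 1 < len(words) and words[i + 1] in ONES:
                value += ONES[words[i + 1]]
                i += 1
            digits.append(str(value))
        elif word in TEENS:
            digits.append(str(TEENS[word]))
        elif word in ONES:
            digits.append(str(ONES[word]))
        elif word in ZEROS:
            digits.append("0")
        else:
            raise ValueError(f"not a number word: {word!r}")
        i += 1
    return "".join(digits)


print(spoken_to_digits("twenty-one, twenty-five"))  # 2125
print(spoken_to_digits("two one two five"))         # 2125
```

The greedy pairing is a design choice, not a fact about speech: "twenty one" could also mean the three digits 2-0-1, which may be part of why some systems insist on one digit at a time.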
Oral dictation (as opposed to fixation) is frustrating at best. Punctuation is a critical item that I can't stand dealing with. Trying to get the goddamn software to insert commas and semi-colons can be difficult enough, let alone wanting to actually insert the word "comma" into a paragraph. Then there's trying to spell out acronyms (aka "aka"), or inserting the contents between and including those parentheses. Until dictation of a document can be done with truly minimal correction and post-editing, and can be spoken at a very comfortable pace, I will stick to a keyboard.
Of course, the most entertaining aspect of watching someone else play with speech recognition is the inevitable habit of sounding completely unnatural while speaking. The monotone voice and sounding like a robot are bad enough, let alone those who think that shouting or talking ree... aaa... llll... lllyy... sloowwwww.... llly is going to help. The funniest I've seen was a woman who seemed to think that talking in cutesy baby-talk would win the system over to her side. :)
I just want a system that responds to commands via a programmable keyword. Only when speech recognition is Star Treky enough to respond to its name will I be happy. My computer will be named Minion.
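For what it's worth, the "programmable keyword" part is the easy half. Below is a rough sketch that assumes some recognizer has already turned the audio into a text transcript; the class name, the command phrases and the handlers are all made up for illustration and are not any product's API.

```python
from typing import Callable, Dict, Optional


class KeywordDispatcher:
    """Run a registered action only when speech starts with the wake word."""

    def __init__(self, wake_word: str) -> None:
        self.wake_word = wake_word.lower()
        self.commands: Dict[str, Callable[[], str]] = {}

    def register(self, phrase: str, action: Callable[[], str]) -> None:
        # Store phrases lowercased so matching is case-insensitive.
        self.commands[phrase.lower()] = action

    def handle(self, transcript: str) -> Optional[str]:
        words = transcript.lower().replace(",", " ").split()
        if not words or words[0] != self.wake_word:
            return None  # not addressed to the computer; ignore it
        phrase = " ".join(words[1:])
        action = self.commands.get(phrase)
        if action is None:
            return f"unknown command: {phrase}"
        return action()


minion = KeywordDispatcher("Minion")
minion.register("lights on", lambda: "lights are on")

print(minion.handle("Minion, lights on"))   # lights are on
print(minion.handle("turn the lights on"))  # None
```

The hard half, getting a reliable transcript in the first place, is exactly what this sketch assumes away.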
Re:Isn't that... (Score:4, Interesting)
David talking about it back in 2002.
"On a more technical from I began using voice-activated software when I broke my wrist very badly about two years ago. I've found that it tends to increase the rate at which I can write while I'm actually working, but that it's more fatigue-sensitive than a keyboard. You can push your fingers further than you can push your voice when fatigue begins to blur your pronunciation and confuse the voice recognition feature of your software.
I don't think it's had a major impact on my writing style, but it does affect how I compose sentences. What I mean by that is that because the software prefers complete phrases, in order to let it extrapolate from context when it's trying to decide what word to use for an ambiguous pronunciation, I have to decide how I want a sentence to be shaped before I begin talking to a much greater extent than I had to do before I began typing."
http://sfcrowsnest.co.uk/features/arc/2002/nz5718.php [sfcrowsnest.co.uk]
At Last! (Score:5, Interesting)
Almost 10 years later - and it's finally here!
Or at least a follow up announcement is here.
I just saw these guys at macworld (Score:3, Interesting)
Re:Practical speech recognition, "House, lights on (Score:3, Interesting)
Re:Whatever became of this technology? (Score:4, Interesting)
Yes, and to follow along the same line of thought, nobody has ever come out with anything like a speech recogniser designed for programming. Personally, I always figured that a good speech recognition system for both text and commands would need to make use of sounds that don't occur as text. So, you could do something like a special double-whistle to enter command mode, or honk like a goose for undo. Likewise, you could use gibberish words as commands instead of "delete that."
Obviously, it violates the principle that all computers you can talk to should work like Star Trek. But, it seems that just like a command line interface, a spoken interface could be fantastically useful if only somebody would decide that the operator will need some instruction in a few special arcane incantations.
Then, all we'll need is an extension to C so that function prototypes include a way to express the pronunciation of a function name, so a spoken interface IDE could use something like intellisense to parse the API I am using and away we go.
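The pronunciation idea doesn't strictly need a language extension. As a rough illustration only, here it is in Python rather than C, using a decorator to attach a spoken form to a function so a voice-driven editor could look up the identifier. The decorator, the index and the example function are all hypothetical; no existing IDE is known to work this way.

```python
from typing import Callable, Dict, TypeVar

F = TypeVar("F", bound=Callable[..., object])

# Hypothetical index from normalized spoken phrase to identifier name.
_SPOKEN_INDEX: Dict[str, str] = {}


def _normalize(phrase: str) -> str:
    # Lowercase and collapse whitespace so lookups are forgiving.
    return " ".join(phrase.lower().split())


def spoken(phrase: str) -> Callable[[F], F]:
    """Record how a function's name should be pronounced."""
    def decorate(func: F) -> F:
        _SPOKEN_INDEX[_normalize(phrase)] = func.__name__
        return func
    return decorate


def resolve(utterance: str) -> str:
    """Return the identifier for a spoken phrase; raises KeyError if unknown."""
    return _SPOKEN_INDEX[_normalize(utterance)]


@spoken("string copy")
def strcpy_like(src: str) -> str:
    return str(src)


print(resolve("String Copy"))  # strcpy_like
```

A spoken-interface IDE would still need the recognizer to bias toward the phrases in that index, which is the part nobody has built.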
Re:Talking to oneself (Score:3, Interesting)
I won't recommend "Don't use it" because it's really a personal choice - some people love it and some hate it. But I have tried 3 versions so far (including the latest) and it wasn't so much a conscious decision to stop using it as much as I just eventually stopped bothering.
I could see using it to write up letters, a chore at which Dragon is very competent once trained (though not necessarily faster than, or even as fast as, typing), but that's a task I seldom engage in for extended durations.
But part of the dream of Speech Recognition is telling the computer to do this and that -- even just a simplistic version of what is in some Sci-Fi like in Star Trek -- and the computer just knows what it needs to do and does it. I'm not even talking anything as complicated as AI, just something like "look up slashdot" and it fires up the browser and goes to the site. Or while using Dragon the command won't be "Set my dentist appointment for 4:00pm Wednesday" but more like (open calendar app with mouse, put mouse on correct textbox and click) "Dentist Appointment.... Tab..... tab.... numeral 7...." (bring mouse over AM/PM selector and select PM).
This isn't something that is Dragon's fault -- I think in many years programs and OSes as well will have a number of keywords that will control them built in (if I'm not mistaken Apple has a primitive version of this but the speech recognition is crap). Dragon has great accuracy but the program is hopeless in commands and context (yes, I know it can be trained -- like a dog; a lot of effort for a few piddly tasks) and I think that's a major aspect of what many people would secretly like when they try out the program.
When the software's history involves jail terms... (Score:5, Interesting)
Even a 0.5 percent recognition failure rate is enough to make speech recognition software worse than worthless. The reason is that speech recognition software never makes a spelling mistake. Instead, the mistakes are often extremely difficult to recognize, and sometimes change the meaning in subtle ways. That's partly because when the software is confused it tries to select something that is grammatically plausible.
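To put a number on that, here is a back-of-the-envelope sketch. It assumes the failure rate is per word and that errors are independent of each other, neither of which the 0.5 percent figure actually guarantees; the function names are just for illustration.

```python
def expected_errors(word_count: int, error_rate: float) -> float:
    """Average number of misrecognized words in a document."""
    return word_count * error_rate


def prob_at_least_one_error(word_count: int, error_rate: float) -> float:
    """Chance the document contains one or more misrecognized words."""
    return 1.0 - (1.0 - error_rate) ** word_count


words = 1000   # assumed document length
rate = 0.005   # the 0.5 percent failure rate, as a fraction

print(expected_errors(words, rate))
print(round(prob_at_least_one_error(words, rate), 3))
```

Under those assumptions a 1,000-word document averages about five wrong words and is almost certain to contain at least one, and each of them is a correctly spelled word that a spell checker will never flag.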
The result is that it has become difficult to sell speech recognition software. A high enough percentage of people in the U.S. know that it isn't actually useful. The original owners of Dragon NaturallySpeaking sold the product to a company that sold it to the company that became Nuance, maybe because they felt the product was damaging the credibility of their trademarks.
Here is a quote from the ComputerWorld story [computerworld.com] linked in the earlier Slashdot story, Is Speech Recognition Finally 'Good Enough'? [slashdot.org]:
"In 1993 two executives from Kurzweill Applied Intelligence (which pioneered SR for the medical market) went to prison for faking sales. That firm was sold in 1997 to a Belgium SR firm, Lernout and Hauspie (L&H), which was reporting phenomenal sales growth at the time. Dragon Systems, which originated DNS that year, was reporting only anemic growth, and L&H had no trouble acquiring Dragon Systems in early 2000 in a stock deal. Within a year a series of accounting frauds came to light and L&H collapsed into bankruptcy. Its SR technology was sold in late 2001 to ScanSoft Inc., which kept the DNS line going. (It was then at Version 6.0.) ScanSoft later acquired Nuance and adopted its name.
"Thereafter, "It was with the launch of Version 8.0 (in November 2004) that the market became reinvigorated and took off," said Chris Strammiello, director of product management at Nuance. "We crossed an invisible line with Version 8.0, where the software actually delivered on its promises and offered real utility for the users. Sales have been growing at a rate of 30% yearly since then, except that we expect it to do better than 30% this year."
Read that again: "... the software actually delivered on its promises and offered real utility..." I called Nuance and was told that version 8 did not have a new recognition engine, but only had improvements in the user interface. A friend who owns and tested version 8 told me he could see no difference in accuracy between that and version 7.
So, in my opinion, Nuance has done common deceitful things that are called "Marketing":
1) Bring out new versions. Previously, whenever there was a "new version" of Dragon NaturallySpeaking, I called Nuance technical support and asked if there was a new recognition engine. I didn't call for version 9, but for the last two versions they said no. So nothing has changed; the software is still worse than useless to me, in spite of the fact that they advertise that the software is now more accurate.
How is it possible that the software is more accurate, if the recognition engine did not change? Maybe it isn't true. Or maybe the company improved the guesses the software makes when the software really has no clue what the user said. As I mentioned, those guesses have become so sophisticated that you can become confused about what you actually said, and you have to spend time re-creating your ideas. If you are saying simple things about a simple subject, this is not as much of a problem as when you are writing about contract negotiations, for example.
In the words of a Slashdot reader: "The opinions expressed here may be those of my speech recognition so
Re:Talking to oneself (Score:5, Interesting)
I am not a native English speaker, but I am usually able to say just about anything I want. In this comments, I have not altered any of the mistakes (if any) that Dragon NaturallySpeaking made while I was dictating. As you can see, the error rate is probably a bit higher than 99 per cent correctness. Nevertheless, I used this extensively, because it increases the speed at which I can work.I often have to type reports, and it goes a lot faster while using this tool. The only problem is that these reports contain lots of enterprise specific (and IT specific) terms. Naturally, it takes a while before Dragon NaturallySpeaking knows all of these terms.
Other than that, I am very happy with it.
Urgh!! Wrong PLATFORM!!!! (Score:4, Interesting)
But they are so DROPPING THE BALL. They have the best voice-rec platform. (You can think it's not good enough, but it's still the best.) What they need is to port it to Linux. Duh! Wake UP!
No, I'm not just saying the usual "Does it run on Linux?" bit. Linux is now (and increasingly will be) the obvious OS for small devices. When you want to talk to ANY device in your home or car, or your cell phone or PDA, you'll be talking to LINUX. THAT'S where we need a great voice-rec system. We need it ported to Linux and opened up with an API. That would catapult this annoying desktop app into software that is present on almost every type of device within a couple of years -- as low-power devices provide enough oomph to do what the heavy machines of a few years ago did.
Re:Whatever became of this technology? (Score:3, Interesting)
I was a software developer and am now an IP lawyer doing patent law stuff. I quickly discovered that dictating vastly increased my productivity. Most people in software have no idea what a boon to productivity this could be, or they'd be dictating specs and pseudocode and notes all the time. I actually think that software developers should seriously think about dictating pseudocode and handing it off to newbies for implementation details. Obviously, it's more directly applicable to the types of work a lawyer does though.
In any case, because the turn-around for transcription in our firm can be a half-day to a day, I got this software to try out. It is actually amazingly good. You can tweak the settings for special spellings or acronyms, and can train special words for odd names, etc. When I don't have time to have our word processing department transcribe something, I use this, and the accuracy is very very good.
One thing most people who don't usually do dictation may not realize is that you don't get the efficiency boost unless you really just look away from the screen and dictate a good chunk, then go back for editing when done. The best is to dictate an entire document without worrying about any corrections, then come back and review it the next day for errors. With Dragon though, it's probably better to do a few paragraphs, then go back and check. If you constantly let minor corrections interrupt you, you don't get the benefit of the increased speed.
Re:Whatever became of this technology? (Score:3, Interesting)
I make more mistakes than that just from typing. Of course, I catch and correct them faster when using the keyboard than I do when dictating. How many times do you have to use the backspace key every seven lines or so?
Part of reducing mistakes is learning that dictating clearly is a different skill than typing. Just because you can type well doesn't mean that you can speak and articulate your words clearly. Dictating to a computer has more in common with giving a presentation. If you litter your speech with "um," "ahh," and "ya know," then the program will dutifully represent that. Garbage in, garbage out. What's helped me is that I have a lot of experience with public speaking and narration. I've also produced a lot of training videos for companies that I've worked for, which involved recording voice-overs or presenting to the camera. So I'm comfortable "talking to myself" and preparing what I want to say before I begin my delivery. These are useful skills that anyone can learn.
One of the first things I did when I got the program was try to read some of the documents that I had previously produced. There were some words that it wasn't recognizing correctly, and I later realized that these words were also in my custom dictionary in Word. You can train the software on individual words so I opened up my custom dictionary and taught it all of the words in there.
When dictating I don't worry too much about the mistakes because the dictation is just to get a first draft into the computer. Once I'm done, I proofread the document and use the keyboard to make corrections. Every now and then it'll hose some word, but if it's a word that I know that it knows, I'll just say the word "correction" and repeat the word clearly so that I know to fix that when editing the document. If it's a word that it just keeps getting stuck on I can select it and train it on the spot, or just type the correct word and then keep dictating. I usually take the latter approach so I don't get too distracted from dictating. But, this is a rare occurrence. As you keep tweaking its recognition, it gets better.
Just such a microphone headset comes with the program when you buy it. It works well since it's a unidirectional mic [wikipedia.org] and needs to be close to the sound source to pick up sound. I've used it in an environment with noise, including at work with other people around and at home with the TV on, and I haven't had any problem with it recognizing what I was saying.
Or you just learn to say "microphone off" and it turns off the recognition engine. It can tell if you are saying it as a command or if it's part of a sentence that you are dictating and do the right thing. The program can recognize a bunch of different commands and apply them depending upon which program you are using. I must admit that I don't really use this feature. Browsing the
Re:When the software's history involves jail terms (Score:3, Interesting)