Input Devices Software Apple

Mac Version of NaturallySpeaking Launched

Posted by kdawson
from the listen-what-i-say dept.
WirePosted writes "MacSpeech, the leading supplier of speech recognition software for the Mac, has canned its long-running iListen product and has launched a Mac version of Dragon NaturallySpeaking, the top-selling Windows speech recognition product. MacSpeech had made a licensing agreement with Dragon's developer, Nuance Communications. The new product is said to reach 99% accuracy after 5 minutes of training."

  • ai (Score:2, Funny)

    by User 956 (568564)
    MacSpeech, the leading supplier of speech recognition software for the Mac, has canned its long-running iListen product and has launched a Mac version of Dragon NaturallySpeaking

    Tell me more about has launched a Mac version of Dragon NaturallySpeaking.
    • by arazor (55656)
      Teach me of fire mancub.
    • At Last! (Score:5, Interesting)

      by Slurpee (4012) on Wednesday January 16, 2008 @04:18AM (#22064188) Homepage Journal
      I was at the Apple Dev conference in 1999 (or so) when the CEO of Dragon got up during Steve's keynote and announced that they were going to develop a Mac version of Dragon.

      Almost 10 years later - and it's finally here!

      Or at least a follow up announcement is here.

      • Re: (Score:2, Informative)

        by Chaset (552418)
        Actually, almost 10 years ago, there WAS Dragon Naturally Speaking for Mac. I bought it, and its upgrade when it came out. (Unless my brain is totally whacked and it was some other voice recognition package for Mac) It came with a headset in the box, too. I'm sure that version is what that rep was talking about. It's funny... all these comments, and I didn't notice any high-scoring comments pointing out that there already WAS a voice recognition package for Mac years ago.
        • Re: (Score:3, Informative)

          by Slurpee (4012)
          What you are thinking of is "Dragon Power Secretary" which was available for early Macs in the early 90s - but dropped (way before OS X). The WWDC announcement came when OS X was also being announced in 1999. The announced product at WWDC never came out.

          I was able to find this press release:

          WWDC--SAN JOSE, Calif. and NEWTON, Mass.--(BUSINESS WIRE)--May 10, 1999--

          Photo will be available at 2:30 pm EST on Associated Press via Business Wire

          Dragon Systems, Inc. and Apple(R) Computer, I
  • Talking to oneself (Score:5, Informative)

    by flyingfsck (986395) on Wednesday January 16, 2008 @03:14AM (#22063844)
    I tried Dragon a number of times, but it feels too much like talking to oneself. Training it is a chore too. 99% accuracy after 5 minutes is probably true, but I type much better than that. I suppose it will be great for people who either can't type properly or are lysdexic.
    • Re: (Score:2, Insightful)

      by Seumas (6865)
      I tried Dragon years ago and after a couple hours or so of training, it still completely sucked. Same with IBM ViaVoice. Perhaps Google will help improve things with their GOOG411 service that they're using to build up a massive bank of phonetics. Otherwise, it seems like real speech recognition is never seriously going to get off the ground.
      • by rucs_hack (784150) on Wednesday January 16, 2008 @06:52AM (#22064896)
        I tried it a few years back. I stopped when my youngest, who was still learning to talk, started going round the house saying 'mousegrid' all the time.

        Good job he didn't get the whole thing though, which was typically:

        "Mousegrid...."

        "Mousegrid...."

        "MOUSEGRID!!...."

        "FUCKING MOUSEGRID YOU PIECE OF SHIT PROGRAM!!!"

      • Re: (Score:3, Insightful)

        by LMacG (118321)
        > I tried Dragon years ago

        Yeah, software never gets better or anything. And faster processors and more memory surely couldn't help.
        • by mbourgon (186257)
          Actually, I seem to remember that Dragon _won't_ get better over the years - the core software and algorithms haven't changed any, they just got bought and changed hands, and (again, IIRC) the people who bought it didn't know how it worked, just that it did, and that they owned the code to do so.

          All I can find offhand is the wiki page, which says "Lernout & Hauspie bought Dragon Systems in 2000. The dictation system bubble burst in 2001, and Lernout & Hauspie had a spectacular bankruptcy. ScanSoft I
    • Re: (Score:2, Funny)

      by alex4u2nv (869827)
      Training is tough because they replaced the iListen package with iStoppedListening.

      Also, its use may be weak in dictating a paper, but it's great for dictating a command.

      Think about it, you could walk up to your iComputer and say "Main Screen Turn on!!"
      instead of pressing the power button.
    • Re: (Score:3, Interesting)

      by rolfwind (528248)
      I used it too a number of times - I probably have an accuracy rate not much better than 99% typing - I'm a klutz. But whereas fixing in the middle of typing is pretty smooth and not too time consuming - Dragon makes it a chore over every little mistake.

      I won't recommend "Don't use it" because it's really a personal choice - some people love it and some hate it. But I have tried 3 versions so far (including the latest) and it wasn't so much a conscious decision to stop using it as much as I just eventually stop
      • by CastrTroy (595695)
        I would recommend "don't use it" in an office environment, or any other environment where people can hear you speaking. Nothing more annoying than listening to somebody else say "Dentist appointment.... Tab.... Tab.... numeral 7..." all day long.
      • by ubrgeek (679399)
        If I'm not mistaken Apple has a primitive version of this but the speech recognition is crap.

        Actually, from my experience it's pretty good, at least for short expressions. I've got mine set-up to do things exactly like your Slashdot example. (I tell it "Browser slashdot" and it works great (I'm guessing because it knows "browser" means that I want the word right afterward to mean the phonetic term "Slashdot" that I've previously told it meant the Website, not "/ .") It's also useful for things like lau
        • Re: (Score:3, Informative)

          by samkass (174571)
          Yeah, Apple's speech recognizer has very dissimilar goals to Dragon's (although both, if I recall correctly, got their start at Carnegie Mellon's speech labs). Apple is trying to build a speaker-independent, no-training-required recognizer that can handle short commands. Dragon doesn't care as much about speaker-independent, but requires accuracy over sentences and paragraphs. Very different algorithmic, HCI and optimization problems.

      • by gobbo (567674)

        [...]But part of the dream of Speech Recognition is telling the computer to do this and that -- even just a simplistic version of what is in some Sci-Fi like in Star Trek -- and the computer just knows what it needs to do and does it. I'm not even talking anything as complicated as AI, just something like "look up slashdot" and it fires up the browser and goes to the site. [...]
        This isn't something that is Dragon's fault -- I think in many years programs and OSes as well will have a number of keywords that will control them built in (if I'm not mistaken Apple has a primitive version of this but the speech recognition is crap). [...]

        I used to do this on a mac running OS8 in the '90s, using the built in commands and a system-wide macro utility called KeyQuencer (hey, that was a really great app). "Computer: check mail" etc. Not due to accessibility problems on my own, just geeking out. Once you extended it with scripting, it was pretty amazing for repetitive tasks, and I never had problems with recognition, once I got the hang of it.

        My impression is that Apple hasn't developed it at all, and their SR technology is stuck in the '90s.

    • by xtracto (837672)

      I tried Dragon a number of times, but it feels too much like talking to oneself. Training it is a chore too. 99% accuracy after 5 minutes is probably true, but I type much better than that. I suppose it will be great for people who either can't type properly or are lysdexic.

      99% accuracy means that for every 100 words (a paragraph) you will have a wrong word. Now, that accuracy is in the "optimal conditions" and talking at a specific pace. The problem with the other 1% is that the wrong word might not even be related to the text (whereas when you are writing, the error is mostly in spelling).

      Personally, last t

    • noocular (Score:3, Funny)

      by Hognoxious (631665)
      I don't think dictation's the solution. If you're discelyc what you really need is a spielchucker.

      And what about people who speak dyslexically? Yes, Dubya, as it happens I am looking at you.
    • Re: (Score:2, Funny)

      by Sox2 (785958)
      hey, i'm using it now. It wonks fine.
    • by duvel (173522) on Wednesday January 16, 2008 @08:06AM (#22065350) Homepage
      I am entering this comment while using Dragon NaturallySpeaking version 8.

      I am not a native English speaker, but I am usually able to say just about anything I want. In this comments, I have not altered any of the mistakes (if any) that Dragon NaturallySpeaking made while I was dictating. As you can see, the error rate is probably a bit higher than 99 per cent correctness. Nevertheless, I used this extensively, because it increases the speed at which I can work.I often have to type reports, and it goes a lot faster while using this tool. The only problem is that these reports contain lots of enterprise specific (and IT specific) terms. Naturally, it takes a while before Dragon NaturallySpeaking knows all of these terms.

      Other than that, I am very happy with it.

      • by autophile (640621)

        Out of curiosity, how long did it take you to dictate the comment?

        --Rob

      • Interestingly, it sounds like you dictated it when I read your comment. And you "chopped" your diction when you did it. It's perfectly understandable (unlike half the comments here) but it is different.

        I've found the same thing when trying various dictation programs over the years (mostly one version or another of Dragon). After a while, it works, but it just doesn't flow the same and it interrupts my train of thought (such as it is). It feels like what I have to do when talking to our outsourced-to-

  • Isn't that... (Score:2, Informative)

    by Sylos (1073710)
    the whole intention of Dragon? For those people who *are* impaired in some way or another? I mean...I could never "speak" out a paper or something. I'd end up tearing my vocal cords out.
    • Yes, it's useful for those people.

      It's also incredibly useful for people who can't shut up.
      I know quite a few people like that. ;)
      • I'm one of those people, but I wouldn't use it to enter text into a computer. It seems better suited as a transcription device.
    • Re:Isn't that... (Score:4, Interesting)

      by Propaganda13 (312548) on Wednesday January 16, 2008 @04:07AM (#22064138)
      David Weber http://www.baen.com/author_catalog.asp?author=DWeber [baen.com] uses voice recognition software for writing novels.

      David talking about it back in 2002.
      "On a more technical front, I began using voice-activated software when I broke my wrist very badly about two years ago. I've found that it tends to increase the rate at which I can write while I'm actually working, but that it's more fatigue-sensitive than a keyboard. You can push your fingers further than you can push your voice when fatigue begins to blur your pronunciation and confuse the voice recognition feature of your software.

      I don't think it's had a major impact on my writing style, but it does affect how I compose sentences. What I mean by that is that because the software prefers complete phrases, in order to let it extrapolate from context when it's trying to decide what word to use for an ambiguous pronunciation, I have to decide how I want a sentence to be shaped before I begin talking to a much greater extent than I had to do before I began typing."
      http://sfcrowsnest.co.uk/features/arc/2002/nz5718.php [sfcrowsnest.co.uk]

    • by ari_j (90255)
      I see it as being targeted more for, and more useful to, people who normally dictate instead of typing: lawyers, doctors, etc. The problem is that part of the joy of dictation is that you don't have to do the formatting yourself. For instance, a lawyer can dictate a letter and let his secretary (whose time isn't billed out at hundreds of dollars an hour) type it, format it, and print it on the lawyer's letterhead. Speech recognition software can't do that, so its only advantage is the extent to which it
  • by tieTYT (989034) on Wednesday January 16, 2008 @03:21AM (#22063890)
  • ...Mac users will have no trouble chatting with their computer for 5 minutes. Think of how accurate the system will be if the users got into a heated debate!
  • Will it recognize metrosexual accents?
    • by bobdotorg (598873) on Wednesday January 16, 2008 @03:53AM (#22064070)
      Will it recognize metrosexual accents?

      Yes, select the check box: preferences/language settings/accent/Fanboi/Apple

      This is the Mac equivalent to your current setting:

      options/language settings/accent/Troll/WindowsME
    • Re: (Score:2, Funny)

      by Wiseman1024 (993899)
      Lol, Apple iFanboys are wasting their mod points on this. Better keep them busy here rather than have them influence meaningful discussion.
  • by lhaeh (463179) on Wednesday January 16, 2008 @03:37AM (#22063972)
    The last time I tried using voice dictation was when I was running OS/2 Warp 4. Training took forever, and the experience of using it was nothing but an exercise in frustration, ending with me screaming at the bloody thing then seeing neat, yet random expletives on my screen. I later came across some budget software that required no training, yet worked surprisingly well compared to the $400 packages made by the big boys. That software really showed what voice dictation should be like, if only it was developed further.

    The training and accuracy seem like things that can be overcome, but I would really like to see a solution for things like punctuation and function keys, things that don't naturally come with speaking. Instead of having to say "delete that" or " delete" it would be nice to just have a button that I can hold down when saying things I want interpreted as commands.
    • by jimicus (737525) on Wednesday January 16, 2008 @04:14AM (#22064160)
      A few things became of the technology:

      1. 99% accuracy rate is actually pretty bad in the real world. In a typical document, you might expect 12-15 words per line - so you have one error every 7 lines or so.
      2. 99% accuracy rate is only achievable under ideal circumstances - ie. using a top quality microphone hooked up to a good soundcard in an environment with very little background noise and no echo. Basically, circumstances you only get in a half-decent recording studio. In the real world, you seldom get this.
      3. You need amazing self-discipline (and/or a guarantee that nobody is going to approach you while you're working). Otherwise you get back to work after a distraction and find yourself having to delete a conversation you just had with a co-worker.
      4. If you're in an open-plan office (that's probably about 99% of UK offices these days) your colleagues will not thank you for spending all day talking.
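
As a rough check on the arithmetic in point 1, here is a minimal Python sketch. It assumes errors are spread evenly through the text and that a line holds 12 to 15 words, as the comment states; the function name `lines_per_error` is made up for illustration.

```python
def lines_per_error(accuracy: float, words_per_line: float) -> float:
    """Average number of lines of text between misrecognized words."""
    error_rate = 1.0 - accuracy            # e.g. 0.01 for 99% accuracy
    words_per_error = 1.0 / error_rate     # e.g. about 100 words per error
    return words_per_error / words_per_line


if __name__ == "__main__":
    for words_per_line in (12, 15):
        interval = lines_per_error(0.99, words_per_line)
        print(f"{words_per_line} words/line: one error every {interval:.1f} lines")
```

With these assumptions the interval comes out between roughly 6.7 and 8.3 lines, which matches "every 7 lines or so". Dropping to 98% accuracy halves those intervals.
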
      • I had about a 98% accuracy rating with the included microphone and no sound card.
        • by jimicus (737525)
          That extra 1% is the part that's difficult to get.

          If you'd said "I got 98.995% accuracy with the included microphone", I'd be more interested.
      • This software's history includes jail terms. Speech recognition has gotten an extremely bad reputation for being worthless garbage, maybe because it is worthless garbage.

        Even a 0.5 percent recognition failure rate is enough to make speech recognition software worse than worthless. The reason is that speech recognition software never makes a spelling mistake. Instead, the mistakes are often extremely difficult to recognize, and sometimes change the meaning in subtle ways. That's partly because when the software is confused it tries to select something that is grammatically plausible.

        The result is that it has become difficult to sell speech recognition software. A high enough percentage of people in the U.S. culture know that it isn't actually useful. The original owners of Dragon NaturallySpeaking sold the product to a company that sold it to the company that became Nuance, maybe because they felt the product was damaging the credibility of their trademarks.

        Here is a quote from the ComputerWorld story [computerworld.com] linked in the earlier Slashdot story, Is Speech Recognition Finally 'Good Enough'? [slashdot.org]:

        "In 1993 two executives from Kurzweil Applied Intelligence (which pioneered SR for the medical market) went to prison for faking sales. That firm was sold in 1997 to a Belgian SR firm, Lernout and Hauspie (L&H), which was reporting phenomenal sales growth at the time. Dragon Systems, which originated DNS that year, was reporting only anemic growth, and L&H had no trouble acquiring Dragon Systems in early 2000 in a stock deal. Within a year a series of accounting frauds came to light and L&H collapsed into bankruptcy. Its SR technology was sold in late 2001 to ScanSoft Inc., which kept the DNS line going. (It was then at Version 6.0.) ScanSoft later acquired Nuance and adopted its name.

        "Thereafter, "It was with the launch of Version 8.0 (in November 2004) that the market became reinvigorated and took off," said Chris Strammiello, director of product management at Nuance. "We crossed an invisible line with Version 8.0, where the software actually delivered on its promises and offered real utility for the users. Sales have been growing at a rate of 30% yearly since then, except that we expect it to do better than 30% this year."

        Read that again: "... the software actually delivered on its promises and offered real utility..." I called Nuance and was told that version 8 did not have a new recognition engine, but only had improvements in the user interface. A friend who owns and tested version 8 told me he could see no difference in accuracy between that and version 7.

        So, in my opinion, Nuance has done common deceitful things that are called "Marketing":

        1) Bring out new versions. Previously, when there has been a "new version" of Dragon NaturallySpeaking, I call Nuance technical support and ask if there is a new recognition engine. I didn't call for version 9, but for the last two versions they have said no. So, nothing is changed; the software is still worse than useless to me, in spite of the fact that they advertise that the software is now more accurate.

        How is it possible that the software is more accurate, if the recognition engine did not change? Maybe it isn't true. Or maybe the company improved the guesses the software makes when the software really has no clue what the user said. As I mentioned, those guesses have become so sophisticated that you can become confused about what you actually said, and you have to spend time re-creating your ideas. If you are saying simple things about a simple subject, this is not as much of a problem as when you are writing about contract negotiations, for example.

        In the words of a Slashdot reader: "The opinions expressed here may be those of my speech recognition so
        • This software's history includes jail terms.


          Citation and relevance, please.
        • by not_anne (203907)
          Sometimes software is just a useful solution to a need.

          When I was in college, I had to have unscheduled surgery on my right elbow the week before the final paper was due for my Anthropology class. The outline for the paper was complete; I just needed to type it out. Being in a cast after surgery made typing impossible except with one hand. I purchased and used DNS to finish my paper. The software was easy to use and errors were minimal, even for an Anthro paper with lots of jargon.

          I got an A on the paper.
        • I'm a radiologist who uses a Nuance product for several hours a day, every day, and my experience has been overwhelmingly positive. Whereas I used to waste a great deal of time editing and correcting mistakes by human transcriptionists, I only occasionally have to manually correct the Nuance transcriptions. Our throughput and efficiency have increased considerably since we started with the product, and there is absolutely no way that I'd ever return to the previous system. The adoption of speech recognition
          • "... uses a Nuance product..."

            Which Nuance product are you using? Is it a special medical version of Dragon NaturallySpeaking? Those are much more expensive, and I've never tried them. They also use special dictionaries provided by Nuance.

            I've heard about success with that use. Partly the success seems to be due to the fact that there is never confusion about what you said, so that mistakes are easily corrected. Another reason is that technical words are much more easily recognized.

            Speech recogniti
      • by dpbsmith (263124)
        99% accuracy rate is actually pretty bad in the real world. In a typical document, you might expect 12-15 words per line - so you have one error every 7 lines or so.

        Right, and my brief experience trying out ViaVoice convinced me that even that observation underestimates the seriousness of the problem.

        1) You don't necessarily notice the errors as you make them.

        2) Correcting errors takes a surprisingly large amount of attention and labor, as well as being a distraction from the real task.

        3) Correcting errors
      • Re: (Score:3, Interesting)

        by torokun (148213)
        Although your comments about open offices may be true (it may be a problem with colleagues), I offer my thoughts:

        I was a software developer and now an IP lawyer doing patent law stuff. I quickly discovered that dictating vastly increased my productivity. Most people in software have no idea what a boon to productivity this could be, or they'd be dictating specs and pseudocode and notes all the time. I actually think that software developers should seriously think about dictating pseudocode and handing it
      • Re: (Score:3, Interesting)

        by Matt Perry (793115)
        I've been using the Naturally Speaking 9 Medical for the last eight months. I bought it to reduce the amount of typing I have to do for lengthy papers and documentation because of RSI injuries. I have a few responses based on my own experience.

        1. 99% accuracy rate is actually pretty bad in the real world. In a typical document, you might expect 12-15 words per line - so you have one error every 7 lines or so.

        I make more mistakes than that just from typing. Of course, I catch and correct them faster when

    • by forkazoo (138186) <wrosecrans AT gmail DOT com> on Wednesday January 16, 2008 @05:29AM (#22064512) Homepage

      The training and accuracy seem like things that can be overcome, but I would really like to see a solution for things like punctuation and function keys, things that don't naturally come with speaking. Instead of having to say "delete that" or " delete" it would be nice to just have a button that I can hold down when saying things I want interpreted as commands.


      Yes, and to follow along the same line of thought, nobody has ever come out with anything like a speech recogniser designed for programming. Personally, I always figured that a good speech recognition system for both text and commands would need to make use of sounds that don't occur as text. So, you could do something like a special double-whistle to enter command mode, or honk like a goose for undo. Likewise, you could use gibberish words as commands instead of "delete that."

      Obviously, it violates the principle that all computers you can talk to should work like Star Trek. But, it seems that just like a command line interface, a spoken interface could be fantastically useful if only somebody would decide that the operator will need some instruction in a few special arcane incantations.

      Then, all we'll need is an extension to C so that function prototypes include a way to express the pronunciation of a function name, so a spoken interface IDE could use something like intellisense to parse the API I am using and away we go.
      • not exactly true. There's a very interesting project called voice coder developed at nrc-it in Canada. It translates limited English expressions into code. The reason that approach was chosen is because most software using the current style of bmpyNms is literally unpronounceable and would require spelling out letter by letter which does incredible damage to your voice as well as your temper. As I pointed out elsewhere, the complete and total lack of a backdoor API also makes it extremely difficult to
  • by Anonymous Coward on Wednesday January 16, 2008 @03:37AM (#22063974)
    "Computer... computer... hello computer?"
  • by Library Spoff (582122) on Wednesday January 16, 2008 @03:39AM (#22063982) Journal
    Am oosing it two type this comment. Didn't knead the fave mins train ming though...
  • i know the answer. No it doesn't.

    I own a copy of dragon 9 but having to reboot into windows to use it makes it too much of a hassle. Wine doesn't seem to handle it either.

    It actually works quite well, although mileage may vary depending on the sound quality you get from your microphone and soundcard setup.
    • Have you heard of VMWare ?
      • by markdavis (642305)
        In his case, that might be OK.

        But for the rest of us- we choose to use Linux because we want to use Linux. For most Linux users, it doesn't make much sense to buy and install MS-Windows and Dragon to use in the free/open Virtualbox or the proprietary/closed VMware. With such a model, you cannot use the speech recognition in the Linux applications.
    • by markdavis (642305)
      Probably not.

      But I, personally, know several people that would buy a Linux version of NaturallySpeaking... including myself.

      Perhaps the Mac version would be easier to port? Don't know. Best thing to do is send them Email saying you would pay for a Linux version. I did: questions@macspeech.com
  • by Anonymous Coward on Wednesday January 16, 2008 @03:46AM (#22064016)

    I'll have to play with Dragon at some point; I just haven't gotten around to it yet. Aside from accuracy errors, the primary issue that bothers me about speech recognition solutions I've tried is the general lack of being able to recognize speech that seems natural to humans but isn't what the system is expecting as input.

    This is especially true with over-the-telephone solutions. For example, I am with Rogers Wireless carrier here in Canada, and their automated customer service system prompts you for your phone number. My last 4 digits are 2125, and it is very natural to say "twenty-one, twenty-five" when giving the number to a human being. The speech system, unfortunately, is only sophisticated enough to understand one-digit-at-a-time mode, so you have to suffer through saying "two one two five". Which isn't truly a big deal, but it's frustrating having to learn each system's unique quirks and limits. I suppose the same can be said of any technology.
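
The pair-style reading described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any carrier's phone system works: the word tables and the function name `spoken_to_digits` are hypothetical, and the parser assumes that a tens word followed by a ones word ("twenty one") is always meant as a single two-digit pair.

```python
ONES = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
        "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def spoken_to_digits(phrase: str) -> str:
    """Turn a spoken number phrase into a digit string.

    Accepts single digits ("two one two five") as well as two-digit
    pairs ("twenty-one, twenty-five").
    """
    words = phrase.lower().replace(",", " ").replace("-", " ").split()
    digits = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # Assumption: "twenty one" means 21, not the digits 2-0-1.
            if i + 1 < len(words) and ONES.get(words[i + 1], 0) != 0:
                value += ONES[words[i + 1]]
                i += 1
            digits.append(str(value))
        elif word in TEENS:
            digits.append(str(TEENS[word]))
        elif word in ONES:
            digits.append(str(ONES[word]))
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
        i += 1
    return "".join(digits)


if __name__ == "__main__":
    print(spoken_to_digits("twenty-one, twenty-five"))
    print(spoken_to_digits("two one two five"))
```

The greedy pairing is itself a guess: "twenty, one" could just as well mean 2-0-1, which may be part of why such systems fall back to one digit at a time.
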

    Oral dictation (as opposed to fixation) is frustrating at best. Punctuation is a critical item that I can't stand dealing with. Trying to get the goddamn software to insert commas and semi-colons can be difficult enough, let alone wanting to actually insert the word "comma" into a paragraph. Then there's trying to spell out acronyms (aka "aka"), or inserting the contents between and including those parentheses. Until dictation of a document can be done with truly minimal correction and post-editing, and can be spoken at a very comfortable pace, I will stick to a keyboard.

    Of course, the most entertaining aspect of watching someone else play with speech recognition is the inevitable habit of sounding completely unnatural while speaking. The monotone voice and sounding like a robot are bad enough, let alone those who think that shouting or talking ree... aaa... llll... lllyy... sloowwwww.... llly is going to help. The funniest I've seen was a woman who seemed to think that talking in cutesy baby-talk would win the system over to her side. :)

    I just want a system that responds to commands via a programmable keyword. Only when speech recognition is Star Treky enough to respond to its name will I be happy. My computer will be named Minion.

    • Minion, inform the family I love them.
    • Minion, crawl the web for the highest quality, free pr0n you can find
    • Minion, order me my favourite pizza. Oh, and hack a credit card number from the net to pay for it.
    • Minion, tell some slashdoters off for me. Make sure it's worthy of +5 funny.
    • Re: (Score:3, Insightful)

      by LordLucless (582312)
      I used Dragon Naturally Speaking for a while ages ago, and you could program it to respond to its name. Or rather, you set up a "start" sound that would activate the listening algorithm. I had mine set to respond to "computer", but "minion" would work just as well.

      I stopped using it after I accidentally left it on in training mode one day, when I was teaching it the word "bonza". The pet lorikeet outside my room made such a wide variety of noises, that from that time forth, it thought every word I
      • Re: (Score:2, Informative)

        by Narcogen (666692)
        MacOS has had a built-in feature called Speakable Items that does exactly this, and as an option you can have it respond only to things said after a specific key word-- in essence, the machine's name. "Minion" would work fine.

        It is not true dictation. Essentially you create a script and give it a name. When your speech is recognized as the name of a corresponding script, the script is executed.
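
The keyword-plus-script-name behaviour described here amounts to a lookup table. A minimal Python sketch follows, assuming the recognizer has already turned speech into text; `KeywordDispatcher` and its methods are hypothetical names for illustration and are not Apple's Speakable Items API.

```python
from typing import Callable, Dict, Optional


def _normalize(text: str) -> str:
    """Lowercase, drop commas, and collapse whitespace."""
    return " ".join(text.lower().replace(",", " ").split())


class KeywordDispatcher:
    """Runs a named script only when the utterance starts with the keyword."""

    def __init__(self, keyword: str) -> None:
        self.keyword = _normalize(keyword)
        self.scripts: Dict[str, Callable[[], str]] = {}

    def register(self, name: str, script: Callable[[], str]) -> None:
        self.scripts[_normalize(name)] = script

    def hear(self, utterance: str) -> Optional[str]:
        words = _normalize(utterance).split()
        if not words or words[0] != self.keyword:
            return None  # not addressed to the machine, so ignore it
        script = self.scripts.get(" ".join(words[1:]))
        return script() if script is not None else None


if __name__ == "__main__":
    minion = KeywordDispatcher("Minion")
    minion.register("check mail", lambda: "checking mail")
    print(minion.hear("Minion, check mail"))   # runs the script
    print(minion.hear("check mail"))           # ignored: keyword missing
```

Requiring an exact match on the script name keeps the search space tiny, which is one plausible reason short command recognition can work without training.
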

        You can even make scripts that require multiple inputs. Some of the built-in ones in the Mac OS 9 days were knock
        • That's essentially all software dictation is - it recognizes the pattern of your speech, and executes the corresponding instruction (prints the correct word). The thing that really defines quality software is the accuracy of its comparison algorithm, and the speed of its learning algorithm. But essentially they do the same as you describe, just with a much larger search space.
    • Re: (Score:2, Funny)

      by andrewjhall (773595)
      I think I'd name mine Igor. Then, assuming I can find the right USB widgets, I can shout "Igor! Raise the lightning rod and find me a fresh brain" - at which point my life's final ambition will have been achieved.

      That said, the USB iBrainExtractor is probably as much of a technical challenge as producing speech recognition that isn't a pain in the ass.
    • by mwvdlee (775178)
      "twenty-one, twenty-five" = 201205.
      Why do you expect a computer to get this right when humans don't?
  • by Anonymous Coward on Wednesday January 16, 2008 @03:47AM (#22064032)
    iIt iworks iso iwonderfully iand iintegrates iwell iinto ithe iother iiproducts.
  • > The new product is said to reach 99% accuracy after 5 minutes of training.

    According to MacSpeech, I suppose?

    I'll bet what was said was something 99% different to what MacSpeech thought.
  • by Capt'n Hector (650760) on Wednesday January 16, 2008 @04:19AM (#22064204)
    I was a bit put off by their pricing scheme. It's $50 off the normal price (something like $200) if you buy it at Macworld. The only problem is that it's a pre-order, so you can't try before you buy. Also, nobody has reviewed the software, since it doesn't exist yet, so if it turns out to be a stinker you're out $150. And if you don't like the product, their tech support will try and "walk you through" your problem to make it go away. They explicitly said "no refunds". No, thanks.
    • My mother can't type (rheumatoid arthritis) so I bought DNR for her. She couldn't get past the first training sentence, it simply would not recognize her voice, but it worked perfectly well for me. So we called up ScanSoft and tried everything they suggested, and went back to them: No refunds.
      • So we called up ScanSoft and tried everything they suggested, and went back to them: No refunds.

        Thanks for the warning - I won't even bother trying then.

        You might see what your state has to say about defective products, merchantability and such. Your State's AG might have some info. Maybe you can just ask ScanSoft what your State's AG would have to say on the matter.
  • So I've always wanted to rig my house up with voice commands. My guess is I need the following:
    • *Simple* speech recognition. I want it to react to a keyword ("Computer", or "House", or similar sci-fi-ey) and then a few simple commands. Sphinx-2 [sourceforge.net] seems ideal, but I'd need good dictionary files.
    • Ubiquitous microphones (preferably exclusively usable by the speech recognition engine. Setting proper /dev permissions will help). Probably the most difficult/expensive to get right; it needs to work in noisy environ
    • Back in the late 90's using only Applescript and the Apple built in speech recognition I was able to voice automate my music library. I don't remember all the details, but I could start and stop the music and select what artist I wanted to hear. It was pretty neat being able to say "Computer, play Nirvana" and getting my music all from the comfort of my bed.
    • *Simple* speech recognition. I want it to react to a keyword ("Computer", or "House", or similar sci-fi-ey) and then a few simple commands. Sphinx-2 seems ideal, but I'd need good dictionary files.

      Be careful what you use as the trigger, or else you won't be able to use the words "House" or "Computer" in any conversation while at home without the house thinking you are trying to command it, and starting the dishwasher or something. I suppose you could always name your house something sci-fi-ish, or fantasy-

  • by Caspian (99221) on Wednesday January 16, 2008 @05:50AM (#22064614)
    I've worked with Nuance's server product in the Dragon NaturallySpeaking line as a developer. Their API is confusing, their speech recognition SUCKS, and their software bugs out in bizarre ways. It's also slow as a dog, and advanced functionality (like recognizing from wav files, as opposed to from a live audio stream) is so poorly implemented as to seem bolted on.

    And the worst part? Nuance has a virtual monopoly in realistically priced (read: "in a budget that a normal small-to-medium-sized business can afford") general-purpose speech recognition systems. If I recall correctly, they bought out Lernout and Hauspie's speech recognition products and IBM's old consumer-level speech-recognition stuff. So you can't take your business elsewhere; there is no "elsewhere".

    I loathe those guys.
    • "Nuance has a virtual monopoly in realistically priced (read: "in a budget that a normal small-to-medium-sized business can afford") general-purpose speech recognition systems."

      Not really. The best software as usual is free and Open Source. The trouble is that (1) it lacks a marketing budget so few people know about it and (2) it is software "by PhDs and for PhDs" meaning that it is not packaged with a slick installer and GUI.

      The Mac and PC software we are talking about here came out of CMU decades ago a
      • by Caspian (99221)
        I looked into that stuff. It was so un-user-friendly as to be worthless. I can manage sendmail... this stuff is harder to use and more opaque than sendmail.
  • Accessibility (Score:4, Insightful)

    by Selanit (192811) on Wednesday January 16, 2008 @06:53AM (#22064900)
    Five minutes training for most people, but not everyone. My boss uses Dragon NaturallySpeaking, and it took him nearly two weeks to complete the five-minute training due to some complications.

    Namely, he's blind. He cannot read the training phrases off the screen, because he can't see them. Instead he had to have a screen reader (JAWS in this case) read the phrases aloud to him so that he can repeat them back. But of course, Dragon was not expecting to hear audio input from anything other than the user, so that confused things. There were problems even using a headset. And since he can't actually use the program at all without having the screen reader running, it was pretty awful trying to get the training done. I'm not even sure how he finally managed to do it - I suspect he probably got a sighted friend to help. Thankfully the training files can be copied from one computer to another so you don't need to retrain it on each different installation.

    Once the training was finally finished, it worked well. He has poor fine motor control as a result of leukemia treatments - he can type, but only slowly and with a high error rate. His speech is slightly slurred as well, which reduces the accuracy of the transcription. Even so, the Dragon transcriptions are definitely better than manual typing. It's helped him a lot.

    I just wish that the Dragon programmers would come up with a more easily accessible training routine. There aren't a whole lot of users with the same disabilities as my boss, but for the few like him having good, well-trained dictation software is vital. With it, he can control his computer reasonably well, if rather more slowly than a sighted person with normal motor control. Without it, using the computer is basically impractical. When he can't use Dragon, sending a single rather short email can take upwards of an hour.
  • Understanding 99% of what I say correctly after 5 minutes is a lot better than the developers do...
  • by Tibor the Hun (143056) on Wednesday January 16, 2008 @08:27AM (#22065548)
    This is fantastic news for those who need extra accessibility features.
    It may be fine for you or me to hit any key, but there are many other folks with various disabilities for whom such a task is not an easy one. So it may make more sense for them to use their voice and move on.

    If any of us were to lose fingers or hands in an accident, I bet we'd all be using something like Dragon to continue our work, rather than try to become a tap dancer.

    And let's not forget about accessibility in the workplace. This is great news for Mac shops, as now there is one less reason for having to support a rogue Windows machine...
    • by timftbf (48204)
      Thank you. It winds me up seeing the product getting a slamming because it's "only" 99% accurate, or because "it sucks - so much better to type". While they might be marketing it at people who are too lazy to type, or who think it's cool to talk to their computer, it's an absolute boon for people who really *can't* type.

      My wife has been through bouts of severe RSI, and while a lot of the time she can now manage with a specialist keyboard, Dragon kept her able to work and to communicate through a long bad
  • by wonkavader (605434) on Wednesday January 16, 2008 @08:34AM (#22065602)
    It's fine to port this to the Mac. Fine. Good. Whoopie.

    But they are so DROPPING THE BALL. They have the best voice-rec platform. (You can think it's not good enough, but it's still the best.) What they need is to port it to Linux. Duh! Wake UP!

    No, I'm not just saying the usual "Does it run on Linux?" bit. Linux is the now (and coming even more) obvious OS for small devices. When you want to talk to ANY device in your home or car, or your cell phone or PDA, you'll be talking to LINUX. THAT'S where we need a great voice-rec system. We need it ported to Linux and opened for an API. This will catapult this annoying desktop app into a present-on-almost-everything type of software in a matter of a couple of years, as low-power devices provide enough oomph to do what the heavy machines of a few years ago do.
    • by gstoddart (321705)

      No, I'm not just saying the usual "Does it run on Linux?" bit. Linux is the now (and coming even more) obvious OS for small devices. When you want to talk to ANY device in your home or car, or your cell phone or PDA, you'll be talking to LINUX. THAT'S where we need a great voice-rec system. We need it ported to Linux and opened for an API.

      But, really, what is the incentive for a commercial organization to do this? They're not gonna get paid for it. So, other than some altruistic reason so that everyone ca

      • by AnyoneEB (574727)

        He did not ask for a free edition for Linux. In fact, the applications he suggested were for various embedded systems which run Linux which most users would not be modifying. I am sure the hardware developers can handle licensing fees for their own devices if they think voice recognition is worth the cost.

        In short: "For Linux" does not mean "For free".

  • by benmhall (9092) on Wednesday January 16, 2008 @08:38AM (#22065634) Homepage Journal
    My wife needed voice dictation software a year or two ago. She had been a Linux user. I gave her my PowerBook and bought iListen for her. It was terrible. And it was a resource hog. It used the Philips engine and, even with extensive training, was the pits. We even tried several high-quality mics to no avail.

    She went from my G4/1.5GHz/1.25GB RAM PowerBook running iListen to Dragon NaturallySpeaking 8 on an IBM ThinkPad T23. (P3 1GHz, 768MB RAM, WinXP.) The difference was night and day. Not only did Dragon run much faster on the lowly P3, but the quality of speech recognition was _much_ better. As a result of this, she's now back to being a Windows user with Dragon.

    At least it looks like our iListen purchase won't be a complete waste, as we can use it to upgrade to NaturallySpeaking for Mac. I'm glad that MacSpeech has killed iListen. It needed it. It was an embarrassment compared to Dragon.

    Speech recognition has been a big hole in the Mac's software line-up. It looks like that is finally coming to an end. Now if only someone would release something that works for Linux.* I know that we'd have paid $200 for something approaching Dragon 8's capabilities.

    ----
    *Yes, I know about IBM ViaVoice. Good luck getting that to work on any recent distribution. I also know about Sphinx. Unfortunately, it seems to be a perpetual research tool rather than an end-user program.
  • Dragon had a Mac product once before [thefreelibrary.com] - Dragon Power Secretary. It was tied to specific apps. Didn't get much updating or new versions after the initial release and died an agonizing death.
  • Here's hoping they support Linux next.
  • I have used their speech synthesis products and they're quite impressive. I used one of the voices to dictate a textbook into an MP3 file so that I could then do a book-on-tape type thing to play my textbook in my car. The pronunciation was generally pretty good. I had to define the pronunciation of a few words here and there (it had problems with some of the less common geek words, like "macromolecular"). But after giving it the proper pronunciations, it was quite excellent. The voice sounded natural a goo
  • ... has launched a Mac version of Dragon NaturallySpeaking ... The new product is said to reach 99% accuracy after 5 minutes of training.

    About that 99% ... in honor of his Steveness, who tested the software while writing a recent keynote, one out of every 100 words is "Boom." But in true Mac fashion the options panels are severely minimalist, so the Boom feature cannot be disabled.

    • by russotto (537200)

      But in true Mac fashion the options panels are severely minimalist, so the Boom feature cannot be disabled.


      You can turn it off with this command:

      defaults write com.nuance.dragon boomtoday false

      But you have to run it every day, as there's ALWAYS "boom" tomorrow.
  • by esj at harvee (7456) on Wednesday January 16, 2008 @10:32AM (#22066898) Homepage
    Reading the comments I see a bunch of tabs[1] with no clue about being disabled, the speech recognition market, the history of the product, and how Nuance is probably hampered by the management attitude towards money and the history of the code base.

    For someone who's been disabled (temporarily or permanently) speech recognition means the difference between making a living and being able to support oneself, a mortgage, family etc. and sitting around on your ass in section 8 housing on Social Security disability. Pain from RSI once made it extremely difficult to feed myself. When you've experienced that level of pain, disability and the associated despair, you get the attitude that anything that gives a disabled person independence and an ability to make a living should be encouraged with all possible resources.

    Listening to someone dictating using speech recognition will drive you mad. You would have the same problem with a blind person listening to text-to-speech. But that's not the fault of speech recognition or text-to-speech. That's the fault of management not providing the disabled person with an acoustically isolated environment (i.e. reasonable accommodation).

    Desktop speech recognition is a monopoly because it's extremely expensive and difficult to develop speech recognition and there is not a large market. The market consists of lawyers, doctors, and the disabled. There is not enough money to support two companies (or more) to develop desktop speech recognition applications.

    NaturallySpeaking is very buggy. There are bugs that cause people problems that were first seen in NaturallySpeaking 5. These are not hidden or hard-to-find bugs. They don't affect Nuance's ability to sell NaturallySpeaking. There's no reason for them to fix them except for the fact that they interfere with the use of many programs by the disabled. If you are just doing dictation into Microsoft Word or DragonPad, you'll never notice. If you try to dictate into Thunderbird, Firefox, OpenOffice,... you're screwed. For example, I cannot dictate directly into Firefox for this comment, I need to use a workaround for dictation and then paste the result into the text box. The reason why this problem exists is because Nuance management has the reputation of not making any change or feature unless you can make a business case and show them they will get revenue from that change. This is not such a bad model because it can keep Nuance profitable and product available to people who truly need it (i.e. the disabled). The downside is that it doesn't leave room for changes necessary for the disabled.

    I've heard from people working inside Dragon that part of the problem also is the code base. It was written by a bunch of Ph.D.'s who are really really good at speech recognition but are not so good at writing code. Also in the last few years, there has been huge turnover in the people working on the code as NaturallySpeaking was sold first to L&H and then to Nuance. That kind of change alone will wreak havoc on the code base as knowledge is lost and never really acquired by the new people. By the way, I have talked with some people from Nuance, and they are basically good people. They understand the needs of the handicapped but they are constrained in what they can do for us because of budget and resources.

    When people talk about alternatives with open source speech recognition, only a tab would think they would work for the disabled. Their recognition speed is significantly slower, vocabulary size is smaller, and they are really more projects to keep grad students busy than be anything useful in the real world.

    The last problem with speech recognition sits in your lap if you are a manager of a software product or a developer. As far as I can tell, the number of applications that are speech recognition friendly is vanishingly small. It seems to me that software developers go out of their way to make software handicap hostile. It starts with the multiplatform GUI toolkits that do not
    • by gstoddart (321705)

      Listening to someone dictating using speech recognition will drive you mad. You would have the same problem with a blind person listening to text-to-speech. But that's not the fault of speech recognition or text-to-speech. That's the fault of management not providing the disabled person with an acoustically isolated environment (i.e. reasonable accommodation).

      I've only ever known a single blind programmer.

      He had a text-to-speech program running. He had the damned thing set to be so fast (he had astounding hear

  • In my job, I teach clients how to use this software every day.
    The 99% accuracy is after the initial training. Then come the tutorials, which further enhance recognition and use and make it even more accurate. Dragon is invaluable for those who cannot use a computer any other way.
    Accuracy does increase with time and use.
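For a rough sense of what "99% accuracy" means in practice, here is a back-of-the-envelope sketch. It assumes the figure is per-word accuracy and that errors are independent of each other; neither assumption is stated by MacSpeech or Nuance, and the function names are made up for illustration.

```python
def expected_errors(word_count: int, accuracy: float) -> float:
    """Expected number of misrecognized words, assuming per-word accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return word_count * (1.0 - accuracy)


def chance_of_clean_passage(word_count: int, accuracy: float) -> float:
    """Probability that every word is right, assuming independent errors."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return accuracy ** word_count


# A 500-word email at 99% per-word accuracy: about 5 words to correct.
email_errors = expected_errors(500, 0.99)

# A 20-word sentence at 99%: roughly an 82% chance of no errors at all.
clean_sentence = chance_of_clean_passage(20, 0.99)
```

Under the same assumptions, going from 99% to 99.5% halves the corrections per email, which may be why the additional tutorials are worth the time.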
  • I do know that David Pogue uses DNS for all of his writing (NYTimes, books, etc.). He writes about Dragon often, and how he previously used to carry a Windows laptop essentially just for writing.

    Many Mac folks, myself included, have installed Windows via Fusion or Parallels so that we can run DNS alongside OS X. I have got it working reasonably well, and have been doing all my writing and email via speech for about 6 months. There are still some frustrations, but in general it works great and I'm happy to h
  • Let's see,

    I've tried them all - and spent dozens of hours and hundreds of dollars for "custom dictionaries" - as a lawyer I dictate in a language that is almost, but not quite totally, unlike English.

    I've had excellent "luck" with iListen and almost bought their "last" upgrade a few weeks ago. I decided not to because I couldn't get the helpdesk to tell me if my custom dictionaries and profile would transfer to the latest release.

    I know why they didn't answer - and I'm not alone - the support mailing list
