The Coming Wave of Gadgets That Listen and Obey

Posted by Soulskill on Sun Jan 27, 2008 11:36 AM
from the tea-earl-grey-hot dept.
dgan brings us a NYTimes piece about the development of speech recognition for common gadgets. Companies such as Vlingo and Yap are marketing their software to cellular carriers to give consumers a hands-free option for tasks like finding directions and text messaging. Quoting: "Vlingo's service lets people talk naturally, rather than making them use a limited number of set phrases. Dave Grannan, the company's chief executive, demonstrated the Vlingo Find application by asking his phone for a song by Mississippi John Hurt (try typing that with your thumbs), for the location of a local bakery and for a Web search for a consumer product. It was all fast and efficient. Vlingo is designed to adapt to the voice of its primary user, but I was also able to use Mr. Grannan's phone to find an address. The Find application is in the beta test phase at AT&T and Sprint. Consumers who use certain cellphones from those companies can download the application from vlingo.com."
This discussion has been archived. No new comments can be posted.
  • by moogied (1175879) on Sunday January 27 2008, @11:37AM (#22200576)
    5,000 years ago man realized he could not make women listen and obey. So he started a quest to make devices that could...

    Is it possible that all of mankind's dreams are coming true now?!

    • by peragrin (659227) on Sunday January 27 2008, @11:46AM (#22200634)
      nope. because we must select double delete them all.

      Voice recognition is nowhere near reliable. I laugh at my brother as he tries to use voice dial on his cell phone; it takes two or three tries to get it to work. I once sneezed and it dialed my father. A good throat clearing sounds like "mother". I should try farting at it some time to see who that would Dial.

      Seriously, try it sometime. Carefully train the system for your voice, use it for a while, and then start throwing random noise at it. Or take a song in which the music track is quiet enough to hear each word clearly and play that at the microphone. It should give you all the lyrics, yet they can't sort that out. The human ear can, but a computer can't yet. voice recognition is nearly useless until it can.
      • by MonsterOfTheLake (880659) on Sunday January 27 2008, @11:56AM (#22200694) Homepage
        I should try farting at it some time to see who that would Dial.

        #265532 [bash.org]:
        (Sabdo) on one of those speech-to-text programs my friend ripped ass onto the mic.
        (Sabdo) and it typed out "France"
        (Sabdo) we were like, wtf?
      • by ScrewMaster (602015) on Sunday January 27 2008, @12:01PM (#22200736)
        The human ear can, but a computer can't yet. voice recognition is nearly useless until it can.

        Voice recognition is incredibly useful in the right context. A friend of mine is an attorney who happens to be disabled. He makes great use of voice recognition on his computer, does most of his legal work with it. Is it "conversational"? No, but it serves his purposes perfectly.

        So you're right, speech recognition systems aren't as generally versatile or accurate as the human brain, but they're getting better all the time. Give it ten years or so, with improved algorithms and a sixteen core processor to handle them I think we'll be interacting with computers on a much different level. Of course, by then you'll have to know Spanish or Mandarin to use one of them.
        • Funny, I thought I heard that same thing 10 years ago, only it was with the GHz barrier.

          Processing speed has helped a lot and they are getting better, but I think we need to be able to process more than one thing at a time first. Parallel programming will help more than anything else.

        • speech recognition systems aren't as generally versatile or accurate as the human brain, but they're getting better all the time. Give it ten years or so, with improved algorithms and a sixteen core processor to handle them I think we'll be interacting with computers on a much different level.

          I'll believe it when I see it. This is one of those areas where various folks have been promising "[five|ten] more years" since the late sixties. Trouble is, the only thing greater storage and processing capacity get you is bigger personalized dictionaries of memorized [words|phrases|phonemes]. You still have to invest time to train the system in recognizing your speech. The greater capacity/accuracy, the longer it takes to "fine tune" the dictionary. It just doesn't seem like simply a problem of lack of

          • I agree, that's why I said, "with improved algorithms". Ten years, fifty ... eventually it will get done unless we find a better way of communicating with computers. Direct neural interface, perhaps. Something like that would be indistinguishable from telepathy, and a darned sight more useful.
      • The recognition that you describe is poor because the speech recognizer is running on the phone in a tiny memory/cpu footprint.

        Most of the cell phone systems described in the article are likely uploading the audio to a server farm, running recognition there, and then sending back the response.

        • ...what?
          Please mr. guru, tell me how this happens exactly.
          • Re: (Score:3, Interesting)

            ...what?
            Please mr. guru, tell me how this happens exactly.

            I'm not saying it is done that way, but it would be very easy to do it that way. Mobile phones have all the kit needed to digitise speech, and sending that digitised speech over a GPRS connection to a web service that does speech-to-text and returns the text would be trivial. Doesn't need a guru.
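            A minimal sketch of the phone side of that round trip, in Python for readability. Everything named here is an assumption for illustration: the endpoint URL, the 8 kHz mono 16-bit audio format, and the JSON reply with a "text" field are hypothetical, and this is not how Vlingo or any carrier is known to do it. The request is only built, never sent.

```python
import io
import json
import urllib.request
import wave

# Hypothetical recognition endpoint; not a real service.
RECOGNIZER_URL = "http://speech.example.com/recognize"


def pcm_to_wav(pcm_samples, sample_rate=8000):
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono microphone
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_samples)
    return buf.getvalue()


def build_request(pcm_samples, url=RECOGNIZER_URL):
    """Build (but do not send) the HTTP POST carrying the digitised speech."""
    return urllib.request.Request(
        url,
        data=pcm_to_wav(pcm_samples),
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def parse_response(raw):
    """Extract the recognised text from the assumed JSON reply."""
    return json.loads(raw.decode("utf-8"))["text"]
```

            On a real handset the request would go out with urllib.request.urlopen(build_request(samples)) and the reply body would be handed to parse_response. The heavy lifting stays on the server, which is the whole point of the argument above.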

    • That's all fine, but do we really need another idiotic web 2.0 name for a startup? Vlingo?? REALLY!?!? Haven't we had enough of vongo, twitter, oyogi, flickr, xuqa, blinkx, sharkle, squidoo, zemq, diigo, frappr, joost, zingee, vyew, bebo?
  • by bennomatic (691188) on Sunday January 27 2008, @11:44AM (#22200614) Homepage
    "Open the pod bay doors, HAL."

    "I'm afraid I can't do that, Dave."

    • by value_added (719364) on Sunday January 27 2008, @12:51PM (#22201014)
      "Open the pod bay doors, HAL."
      "I'm afraid I can't do that, Dave."


      My take on the matter is that the reason that's all you can think of is that everything else is inappropriate, inefficient or simply too goofy for consideration.

      Not to anthropomorphise electronic devices (I know, they don't like it when you do that), but I think they'd prefer to be treated anonymously and respond to the most basic of instructions only. And we'd prefer they remain that way, except in very limited circumstances where the device is named Lenore.

      In the Star Trek movies you'll find something similar to the above, with an occasional "Tea, Earl Grey, Hot" for good measure, but the rest of the time everyone is interacting with devices using ... wait for it ... keys and buttons. And this is in the technologically advanced future where most everything is a device, including crew members. Seeing Picard, for example, say "Computer, send a message to Data telling him to work on his joke-telling skills", or to use the article's example, [asking] his phone for a song by Mississippi John Hurt, would be seen by everyone as a ridiculous use of technology and dismissed as absurd.

      Voice recognition, in the abstract, is fascinating and no doubt fun, but I wouldn't want to live in a Tourette's-like world where everyone is shouting out instructions to unthinking devices, let alone work in a cubicle where the next guy's phone conversations are competing with the noise of his regular work.

      So past opening and closing doors, keyboards it is. Or for those unskilled in the expressive art of the command-line, a mouse or function buttons.
    • "Doolittle: Fine. Think about this then. How do you know you exist?
      Bomb #20: Well, of course I exist.
      Doolittle: But how do you know you exist?
      Bomb #20: It is intuitively obvious.
      Doolittle: Intuition is no proof. What concrete evidence do you have that you exist?
      Bomb #20: Hmmmm... well... I think, therefore I am.
      Doolittle: That's good. That's very good. But how do you know that anything else exists?
      Bomb #20: My sensory apparatus reveals it to me. This is fun."
      • I *loved* that movie. Thank you for reminding me of it. Jeez... going way back in the memory banks. Dark Star? I was a sophomore in high school when I saw that (not first run) at the UC Theater in Berkeley...

  • by Knave75 (894961) on Sunday January 27 2008, @11:44AM (#22200616)
    User: Please connect me with Hugh Jass
    Gadget: Sorry, I could not find a Hugh Jass
    User: *snicker*
  • I can imagine the day we speak the name of some legislation into the phone and say "vote yes" or "vote no". The results show up on our congressman's web site and on some third-party sites that archive them. This way we take control from the few and transfer it to the less corruptible and wiser "many".
  • by debatem1 (1087307) on Sunday January 27 2008, @11:45AM (#22200624)
    I maintain great skepticism about speech recognition as an interface. It just isn't much faster than typing, even on a cell phone. It's not that it takes so much longer to get an ideal rendering; it's that even a minor error in translation results in about five seconds of prompting followed by reentry. Until they can get that figured out, or get accuracy up to a point where someone unused to giving dictation can use it, it's just not that great a technology.
    • by mdfst13 (664665) on Sunday January 27 2008, @12:16PM (#22200800)

      It just isn't much faster than typing
      Sure, but it's a lot safer to do while, say, driving down the road. The problem with screen output and typed input is that you have to use both eyes and hands to operate the device. By contrast, using speech input and output only requires voice and ears. Of course, there are some circumstances where the screen/type method is superior, e.g. sending emails from your BlackBerry during meetings. However, there are many cases where speech is superior, e.g. driving down the road (or even just walking). Viewing speech as a replacement for screen/type is overzealous. It's really more of an alternative.

      It would probably help if advocates of the technology understood this. It doesn't have to be all or nothing. Two alternative solutions can add up to a more powerful solution than either would be alone.
      • People definitely shouldn't be texting while they drive. People probably shouldn't be talking while they drive either.
        • Yes, in an ideal world, all drivers would devote 100% of their attention to driving safely and not distract themselves.

          Unfortunately, in practice, people are going to zone out, talk to their passengers, mess with their radio, etc. I'd much rather have them ask their car for a song or directions than have them look down to adjust the radio dials or check a map. That's what this technology is trying to address, and I would guess it will eventually make us safer, should they get it adopted and used in a wide
        • So, you advocate that there should be no passengers in motor vehicles? If only more people could understand that talking to someone next to you is just as bad (actually worse, because you are naturally drawn to look at the person you are talking to) as talking to people a hundred miles away. At least there are two of us that don't see cell phones as evil magic delivered by the dark lord.
          • I thought the studies I saw suggested that talking on the phone is a bit worse than talking with passengers. At least the adult passengers can see the circumstances and have a chance to shut up if the situation is tight. Someone on the other end of the line isn't going to get that. Also, an adult riding with you might notice things the driver misses. But talking can be a distraction, no matter who it is or where they are.
            • "I thought the studies I saw suggested that talking on the phone is a bit worse than talking with passengers."

              The studies you saw were specifically designed to find that cell phones are dangerous.

              "At least the adult passengers can see the circumstances and have a chance to shut up if the situation is tight. Someone on the other end of the line isn't going to get that."

              Not only are many passengers not adults, you cannot just hang up on an adult passenger if you need to.

              "Also, an adult riding with
    • That's not really it. Right now you need to tell the computer exactly what to do when you're talking to it, so you need to say "move down five, move left three, press enter". This is done much faster with a keyboard, obviously. What we need is a way to micromanage computers less and have them do what we want, e.g. "find a restaurant in the area that serves seafood". Unfortunately, the less information you give, the more can go wrong, so I'm not sure that movie-like voice recognition will ever catch on...

      Ju
  • I'll only be interested in gadgets which obey only what I tell them to do.
  • Limited phrasebook (Score:4, Interesting)

    by name*censored* (884880) on Sunday January 27 2008, @11:49AM (#22200652)
    Limited phrasebook technology is a lot better than voice recognition technology in a lot of devices. Given that most (well, all) devices have limited functionality (not even Steve Jobs' iPod can do his taxes for him), there's very little point in giving the device the ability to understand possibly-misdirected phrases such as "Honey, have you seen the remote?". A good approach for this technology would be to limit it to understanding alternate ways of phrasing a particular command; "Device, Get Me A Beer"/"Device, Can I Have A Beer"/"I'm Really Thirsty". This way, we'd avoid misdirected speaking (the device thinking you're speaking to it instead of to another), and could also exploit the reduced set of understandable phrases to correct for people with colds/accents/quiet voices/etc, in much the same way as limited-phrasebook devices work (only with more flexibility).
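    A rough sketch of that idea in Python, assuming the speech has already been turned into text by some recognizer. The wake word, the phrasebook entries, the command names, and the 0.75 similarity cutoff are all made up for illustration; the fuzzy matching uses the standard library's difflib, which is a stand-in for whatever a real device would use.

```python
import difflib

# Hypothetical phrasebook: each command accepts a few alternate phrasings.
PHRASEBOOK = {
    "get_beer": ["get me a beer", "can i have a beer", "i'm really thirsty"],
    "lights_off": ["turn off the lights", "lights off"],
}
WAKE_WORD = "device"  # assumed prefix that marks speech aimed at the gadget


def normalize(text):
    """Lowercase, drop commas and question marks, collapse whitespace."""
    cleaned = text.lower().replace(",", " ").replace("?", " ")
    return " ".join(cleaned.split())


def match_command(utterance, cutoff=0.75):
    """Return the command name for an utterance, or None if it is not for us."""
    words = normalize(utterance)
    # Anything not addressed to the device is ignored outright.
    if not words.startswith(WAKE_WORD + " "):
        return None
    spoken = words[len(WAKE_WORD) + 1:]
    # Flatten the phrasebook into phrase -> command for lookup.
    phrase_to_command = {
        phrase: command
        for command, phrases in PHRASEBOOK.items()
        for phrase in phrases
    }
    # A small phrase set lets us tolerate slightly garbled input.
    best = difflib.get_close_matches(
        spoken, list(phrase_to_command), n=1, cutoff=cutoff
    )
    return phrase_to_command[best[0]] if best else None
```

    With this, match_command("Device, can I have a beer") maps to the beer command, a slightly misheard "Device, get me a bear" still lands on it, and "Honey, have you seen the remote?" is dropped because it lacks the wake word. Requiring the wake word on every phrase is a choice of this sketch, not something the comment above insists on.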
    • Instead, we should invent plot-directed recognition technology. I mean, you never see the computer on Star Trek misinterpreting the zillions of conversations as being directed toward it. Why? Because it would bog down the plot, except for those rare occasions where it's funny.

      Same thing applies to the doors. The doors know exactly when someone is going to walk through them, because they are plot-directed. You can stand mere inches away from a door, facing it, but until the plot indicates that the time
    • by niceone (992278) * on Sunday January 27 2008, @12:37PM (#22200922) Journal
      "Honey, have you seen the remote?"

      Phone: Yeah, sure, it's cute enough, but I think I can do better.
    • Given that most (well, all) devices have limited functionality (not even Steve Jobs' iPod can do his taxes for him)

      The hardware problem isn't as big as the software one. Sure, Steve Jobs' iPod can't do his taxes with stock firmware; however, with a different OS I am sure that it could be done. It used to be said that speech recognition would become a reality when your processor was fast enough. Now we have quad-core CPUs running at 3 GHz and it still hasn't been done reliably.
  • "Didn't I ask you to start the Roomba, Dear" (click) "Do it yourself, Roomba my ass...."
  • Seems to be a "layer mismatch". Analogy is the OSI model.

    I'll stick to using voice for "higher layer" communication with actual intelligences like humans and other animals. For "lower layer" comms you don't use your voice.

    If you ride a horse, you do talk to the horse sometimes, but the talking is for the "higher layer"; you use reins and body for the "lower layer".

    The last I checked all these gadgets and devices are pretty stupid, definitely no real AI. So it'll be more gimmicky than actually useful.

    For such t
  • Obedient, huh? Get a job and bring home some cash!
  • I have put some thought into this problem via a hobby of robotics, and consequently have read quite a few papers etc.

    The trouble with this can be summed up like this: Would you typically go through your day with a 6 year old, giving the 6yr old instructions on who to dial, what emails to send etc.?

    No? Then you can forget the voice recognition stuff. Voice recognition substitutes What? for the typical 6yr old's Why?

    There are a lot of people who have VR dialing on their phone now. Do you ever see anyone using
  • (as a world traveller) would be a mobile phone that can pair up with 2 bluetooth headsets, and translate between different languages coming into each. That might make it easier to chat with all the beautiful, but differently-languaged babes the world is so full of. The age-old incentive for development is there, so surely something like this has to appear.
    • That might make it easier to chat with all the beautiful, but differently-languaged babes the world is so full of

      I think I saw a documentary about a prototype for this... it translated anything you said to helpful phrases such as "Free mustache rides" and "Suck it, bitch, suck it dry".
  • There, I said it. Voice-recognition shit (most especially attempts at "natural language" parsing) never, ever, ever works right for me -- or anyone that I know or discuss it with. It never works right. On phone networks we all just wind up frustrated, wasting time, swearing obscenities into the phone until it finally turns us over to a live human operator, in a much-worse mood.

    It sucks and I hate it and it's bullshit and the charlatans selling this shit should be shot in the kneecaps. You're *garbage*.
  • I can see it from here. You will have both a car and a cell phone that are voice-activated. What could possibly go wrong, right? Best case scenario: As you try to send a text message over the phone while driving around, the car will be like, "You talkin' to me? You talkin' to me?" "No, damn it, I'm writing an e-mail!" The phone: "Sorry, I thought you were talking to the car, there. Would you mind repeating that?" You: "Ah, never mind."
  • "Dave Grannan, the company's chief executive, demonstrated the Vlingo Find application by asking his phone for a song by Mississippi John Hurt (try typing that with your thumbs)"

    I am not impressed. I will bet you a nickel that he tried that out prior to the demonstration, and made sure there was nothing similar that might come up by accident. I would be impressed if he had given the mike to reporter Michael Fitzgerald and Fitzgerald had tried it.

    At trade shows, I used to watch all sorts of demonstrations of
  • I am looking forward to the day when I can get a cognitive reply from a GPS navigation device when I shout at it "WHERE THE FUCK AM I?"
    • I'm sorry Dave, but you have got us lost...I can not allow you to drive anymore.

      But yes, that WOULD be a useful thing.
    • Re: (Score:3, Interesting)

      Everything. I personally don't give a rat's ass about cell phones - it's not really a big deal or very innovative until you just have a communicator built in. Everything else though, from doors, lights, running tasks on a computer, etc. is what's really cool. Little inane things that just piss you off in life - like having to get up from the bed with the girl/boy on it to turn the light off, or setting a TV up for a movie, or having the computer do everything you want. I'd much rather say "wait until th
      • Forget voice recognition/synthesis and all that crude claptrap ... I want a brain implant capable of accessing symbolic thought patterns directly. Just think about something and the machine will figure out what it is that you want to know, and feed the information back into your head as if you'd just remembered it naturally. You wouldn't even have to know the difference between a "real" recollection and one that was put there on-the-fly. You would just know stuff. Need to perform some integral calculus? No
    • by lgw (121541) on Sunday January 27 2008, @12:00PM (#22200728) Journal
      I can't get over this "hands free text messaging" option! What engineer had the insight "we need to give customers a way to communicate over the phone just by talking"? It's a strange world.
      • One of the features of my new phone is "Voice SMS."

        Think about that for a moment. It's like a text message, but it's voice. On a phone.

        According to Sprint [sprintpcs.com], the reason this is better than a normal voice mail message is that you're guaranteed to leave a message and not actually reach the person you're calling (which comes up how often?) and that the text message UI is easier to deal with than the voice mail system. (Then why not offer a voice mail UI?)

        And, of course, it wastes both a text message and d

    • Re: (Score:2, Interesting)

      There is no reason you couldn't set your car's speed with your cell phone using Bluetooth. Just say "80 MPH please." Or "reduce rapid, no brake lights, 60." Or speak "reduce 3 spot 60 BL not." That means reduce speed to 60 in 3 seconds, no brake lights. Over the course of 3 microseconds the car determines, based on recently stored values, whether another vehicle is approaching from behind, how close it is, its speed, and whether a collision would result from your command. If not, it executes the command. Everyone could be twea