Cellphones Handhelds Hardware

The Coming Wave of Gadgets That Listen and Obey 98

dgan brings us a NYTimes piece about the development of speech recognition for common gadgets. Companies such as Vlingo and Yap are marketing their software to cellular carriers to give consumers a hands-free option for tasks like finding directions and text messaging. Quoting: "Vlingo's service lets people talk naturally, rather than making them use a limited number of set phrases. Dave Grannan, the company's chief executive, demonstrated the Vlingo Find application by asking his phone for a song by Mississippi John Hurt (try typing that with your thumbs), for the location of a local bakery and for a Web search for a consumer product. It was all fast and efficient. Vlingo is designed to adapt to the voice of its primary user, but I was also able to use Mr. Grannan's phone to find an address. The Find application is in the beta test phase at AT&T and Sprint. Consumers who use certain cellphones from those companies can download the application from vlingo.com."
This discussion has been archived. No new comments can be posted.

  • by moogied ( 1175879 ) on Sunday January 27, 2008 @11:37AM (#22200576)
    5,000 years ago man realized he could not make women listen and obey. So he started a quest to make devices that could...

    Is it possible that all of mankind's dreams are coming true now?!

    • by peragrin ( 659227 ) on Sunday January 27, 2008 @11:46AM (#22200634)
      nope. because we must select double delete them all.

      Voice recognition is nowhere near reliable. I laugh at my brother as he tries to use voice dial on his cell phone; it takes two or three tries to get it to work. I once sneezed and it dialed my father. A good throat clearing sounds like mother. I should try farting at it some time to see who that would dial.

      Seriously, try it sometime. Delicately train the system for your voice, use it for a while, and then start throwing random noise at it. Or take a song in which the music track is quiet enough to hear each word clearly and play that at the microphone. It should give you all the lyrics, yet they can't sort that out. The human ear can, but a computer can't yet. voice recognition is nearly useless until it can.
      • by account_deleted ( 4530225 ) on Sunday January 27, 2008 @11:56AM (#22200694)
        Comment removed based on user account deletion
      • by ScrewMaster ( 602015 ) on Sunday January 27, 2008 @12:01PM (#22200736)
        The human ear can, but a computer can't yet. voice recognition is nearly useless until it can.

        Voice recognition is incredibly useful in the right context. A friend of mine is an attorney who happens to be disabled. He makes great use of voice recognition on his computer, does most of his legal work with it. Is it "conversational"? No, but it serves his purposes perfectly.

        So you're right, speech recognition systems aren't as generally versatile or accurate as the human brain, but they're getting better all the time. Give it ten years or so, with improved algorithms and a sixteen core processor to handle them I think we'll be interacting with computers on a much different level. Of course, by then you'll have to know Spanish or Mandarin to use one of them.
        • Funny, I thought I heard that same thing 10 years ago, only it was with the GHz barrier.

          processing speed has helped a lot and they are getting better but I think we need to be able to process more than one thing at a time first. parallel programming will help more than anything else.

        • by denovich ( 25859 )
          Speech absolutely works as an interface. I wouldn't have my current job if it didn't (we make speech recognition hardware/software for mobile computing, primarily for use in industrial settings, and have been doing so for 20 years.) But to understand its real potential you have to think beyond traditional human-computer interfaces and contexts (in short: not at a desk). What if a keyboard or even a screen is impractical? Speech can allow users to interact "hands free, eyes free" (while operating machiner
        • speech recognition systems aren't as generally versatile or accurate as the human brain, but they're getting better all the time. Give it ten years or so, with improved algorithms and a sixteen core processor to handle them I think we'll be interacting with computers on a much different level.

          I'll believe it when I see it. This is one of those areas where various folks have been promising "[five|ten] more years" since the late sixties. Trouble is, the only thing greater storage and processing capacity get you is bigger personalized dictionaries of memorized [words|phrases|phonemes]. You still have to invest time to train the system in recognizing your speech. The greater capacity/accuracy, the longer it takes to "fine tune" the dictionary. It just doesn't seem like simply a problem of lack of

          • I agree, that's why I said, "with improved algorithms". Ten years, fifty ... eventually it will get done unless we find a better way of communicating with computers. Direct neural interface, perhaps. Something like that would be indistinguishable from telepathy, and a darned sight more useful.
      • The recognition that you describe is poor because the speech recognizer is running on the phone in a tiny memory/cpu footprint.

        Most of the cell phone systems described in the article are likely uploading the audio to a server farm, running recognition there, and then sending back the response.

        • ...what?
          Please mr. guru, tell me how this happens exactly.
          • Re: (Score:3, Interesting)

            by Simon Brooke ( 45012 )

            ...what?
            Please mr. guru, tell me how this happens exactly.

            I'm not saying it is done that way, but it would be very easy to do it that way. Mobile phones have all the kit which is needed to digitise speech, and to send that digitised speech over a GPRS connection to a web service that does speech-to-text and returns the text would be trivial. Doesn't need a guru.

            • send that digitised speech
              It's a cell phone, that's what it does anyway.

              to a web service that does speech-to-text and returns the text
              It doesn't need to return the recognition, it can return what the user actually wants. In the music-buying example, the network can just pass the text to the music service, which would then reply to the phone with, for example, the tracks for the requested artist.
      • Although I do agree with you, voice recognition of a song is significantly different than voice recognition of regular speech. For an anecdotal example, I had 2 years of Spanish in high school (not that long ago, but I guess it's getting there) and although I don't claim to be fluent, I can recognize certain things even now and I was much better at it then. One day the teacher had a copy of a Disney song in English and in Spanish. Just about everyone had heard the song and knew at least part of it. Fi
    • That's all fine, but do we really need another idiotic web 2.0 name for a startup? Vlingo?? REALLY!?!? Haven't we had enough of vongo, twitter, oyogi, flickr, xuqa, blinkx, sharkle, squidoo, zemq, diigo, frappr, joost, zingee, vyew, bebo?
    • As long as they are 3 laws safe...
      • Re: (Score:1, Offtopic)

        by rts008 ( 812749 )
        "Anybody who has a problem with me saying "Merry Christmas" shouldn't and won't be taken seriously"

        Perfect! Now we will have a MAD (Mutually Assured Disinterest) solution!

        Actually, you would not get the opportunity to not take me seriously as I will automatically tune you out as soon as you say Merry Christmas in my presence....especially since it is near the end of January.
    • Re: (Score:1, Funny)

      by Anonymous Coward
      I for one welcome our new... er wait, that doesn't quite work here does it?
  • I wonder what other tasks this technology will provide hands-free options?
    • Re: (Score:3, Interesting)

      Everything. I personally don't give a rat's ass about cell phones - it's not really a big deal or very innovative until you just have a communicator built in. Everything else though, from doors, lights, running tasks on a computer, etc. is what's really cool. Little inane things that just piss you off in life - like having to get up from the bed with the girl/boy on it to turn the light off, or setting a TV up for a movie, or having the computer do everything you want. I'd much rather say "wait until th
      • Forget voice recognition/synthesis and all that crude claptrap ... I want a brain implant capable of accessing symbolic thought patterns directly. Just think about something and the machine will figure out what it is that you want to know, and feed the information back into your head as if you'd just remembered it naturally. You wouldn't even have to know the difference between a "real" recollection and one that was put there on-the-fly. You would just know stuff. Need to perform some integral calculus? No
    • by lgw ( 121541 ) on Sunday January 27, 2008 @12:00PM (#22200728) Journal
      I can't get over this "hands free text messaging" option! What engineer had the insight "we need to give customers a way to communicate over the phone just by talking"? It's a strange world.
      • Think of the possibilities... No more annoyign tpyos! And just how are they going to say "LOL"? It may be the downfall of teen cell users.
      • by _xeno_ ( 155264 )

        One of the features of my new phone is "Voice SMS."

        Think about that for a moment. It's like a text message, but it's voice. On a phone.

        According to Sprint [sprintpcs.com], the reason this is better than a normal voice mail message is that you're guaranteed to leave a message and not actually reach the person you're calling (which comes up how often?) and that the text message UI is easier to deal with than the voice mail system. (Then why not offer a voice mail UI?)

        And, of course, it wastes both a text message and d

    • Re: (Score:2, Interesting)

      by FromTheAir ( 938543 )
      There is no reason you couldn't set your car's speed with your cell phone using Bluetooth. Just say "80 MPH please." Or "reduce rapid, no brake lights, 60." Or speak "reduce 3 spot 60 BL not." That means reduce speed to 60 in 3 seconds, no brake lights. Over the course of 3 microseconds the car determines, based on recent stored values, whether another vehicle is approaching from behind, how close and how fast it is, and whether a collision would result from your command. If not, it executes the command. Everyone could be twea
  • by bennomatic ( 691188 ) on Sunday January 27, 2008 @11:44AM (#22200614) Homepage
    "Open the pod bay doors, HAL."

    "I'm afraid I can't do that, Dave."

    • Re: (Score:2, Funny)

      by Anonymous Coward
    • by value_added ( 719364 ) on Sunday January 27, 2008 @12:51PM (#22201014)
      "Open the pod bay doors, HAL."
      "I'm afraid I can't do that, Dave."


      My take on the matter is that the reason that's all you can think of is that everything else is inappropriate, inefficient or simply too goofy for consideration.

      Not to anthropomorphise electronic devices (I know, they don't like it when you do that), but I think they'd prefer to be treated anonymously and respond to the most basic of instructions only. And we'd prefer they remain that way, except in very limited circumstances where the device is named Lenore.

      In the Star Trek movies you'll find something similar to the above, with an occasional "Tea, Earl Grey, Hot" for good measure, but the rest of the time everyone is interacting with devices using ... wait for it ... keys and buttons. And this is in the technologically advanced future where most everything is a device, including crew members. Seeing Picard, for example, say "Computer, send a message to Data telling him to work on his joke-telling skills", or to use the article's example, [asking] his phone for a song by Mississippi John Hurt, would be seen by everyone as a ridiculous use of technology and dismissed as absurd.

      Voice recognition, in the abstract, is fascinating and no doubt fun, but I wouldn't want to live in a Tourette's-like world where everyone is shouting out instructions to unthinking devices, let alone work in a cubicle where the next guy's phone conversations are competing with the noise of his regular work.

      So past opening and closing doors, keyboards it is. Or for those unskilled in the expressive art of the command-line, a mouse or function buttons.
      • So past opening and closing doors, keyboards it is. Or for those unskilled in the expressive art of the command-line, a mouse or function buttons.

        Hear hear! I'm still hoping in my lifetime I'll get to enjoy the inevitable outrageous media hype over the NEW TYPE OF INTERFACE, one that REPLACES THE PRIMITIVE GUI with WORDS THAT YOU TYPE INTO THE SCREEN. This one will use SOPHISTICATED text parsing and concepts derived from ARTIFICIAL INTELLIGENCE!!

        Sample ad copy:

        Want to remove a file? Just type
        rm [filename]

        Want to list the files in your directory? Try
        ls

        Want help? Just ask for it!

        etc.

      • Some phone menus are now speech-only, which I find annoying. I have had to call large corporations on my lunch break, expecting to eat while I punched in numbers to get to the right person and sat on hold.

        To my dismay, I had to speak every menu option, so I had to stop eating. Since the menu also misunderstood my speech, I got misdirected a time or two as well.

        You can imagine this happening to people who are calling from a noisy environment, like a subway, or outside when a train is passing. If I must t

    • "Doolittle: Fine. Think about this then. How do you know you exist?
      Bomb #20: Well, of course I exist.
      Doolittle: But how do you know you exist?
      Bomb #20: It is intuitively obvious.
      Doolittle: Intuition is no proof. What concrete evidence do you have that you exist?
      Bomb #20: Hmmmm... well... I think, therefore I am.
      Doolittle: That's good. That's very good. But how do you know that anything else exists?
      Bomb #20: My sensory apparatus reveals it to me. This is fun."
      • I *loved* that movie. Thank you for reminding me of it. Jeeze... going way back in the memory banks. Dark Star? I was a sophomore in high school when I saw that (not first run) at the UC Theater in Berkeley...

  • by Knave75 ( 894961 ) on Sunday January 27, 2008 @11:44AM (#22200616)
    User: Please connect me with Hugh Jass
    Gadget: Sorry, I could not find a Hugh Jass
    User: *snicker*
  • I can imagine the day we speak the name of some legislation into the phone and say "vote yes" or "vote no". The results show up on our congressman's web site and some other third-party sites that archive them. This way we take control from the few and transfer it to the less corruptible and wiser "many".
  • by debatem1 ( 1087307 ) on Sunday January 27, 2008 @11:45AM (#22200624)
    I maintain great skepticism about speech recognition as an interface. It just isn't much faster than typing, even on a cell phone. And it's not that it takes so much longer to get an ideal rendering; it's that even a minor error in translation results in about five seconds of prompting followed by reentry. Until they can get that figured out, or get accuracy up to a point where someone unused to giving dictation can use it, it's just not that great a technology.
    • by mdfst13 ( 664665 ) on Sunday January 27, 2008 @12:16PM (#22200800)

      It just isn't much faster than typing
      Sure, but it's a lot safer to do while, say, driving down the road. The problem with screen output and typed input is that you have to use both eyes and hands to operate the device. By contrast, using speech input and output only requires voice and ears. Of course, there are some circumstances where the screen/type method is superior, e.g. sending emails from your blackberry during meetings. However, there are many cases where speech is superior, e.g. driving down the road (or even just walking). Viewing speech as a replacement for screen/type is overzealous. It's really more of an alternative.

      It would probably help if advocates of the technology understood this. It doesn't have to be all or nothing. Two alternative solutions can add up to a more powerful solution than either would be alone.
      • People definitely shouldn't be texting while they drive. People probably shouldn't be talking while they drive either.
        • by Justus ( 18814 )
          Yes, in an ideal world, all drivers would devote 100% of their attention to driving safely and not distract themselves.

          Unfortunately, in practice, people are going to zone out, talk to their passengers, mess with their radio, etc. I'd much rather have them ask their car for a song or directions than have them look down to adjust the radio dials or check a map. That's what this technology is trying to address, and I would guess it will eventually make us safer, should they get it adopted and used in a wide
        • by Belial6 ( 794905 )
          So, you advocate that there should be no passengers in motor vehicles? If only more people could understand that talking to someone next to you is just as bad (actually worse because you are naturally drawn to look at the person you are talking to) as talking to people a hundred miles away. At least there are two of us that don't see cell phones as evil magic delivered by the dark lord.
          • I thought the studies I saw suggested that talking on the phone is a bit worse than talking with passengers. At least the adult passengers can see the circumstances and have a chance to shut up if the situation is tight. Someone on the other end of the line isn't going to get that. Also, an adult riding with you might notice things the driver misses. But talking can be a distraction, no matter who it is or where they are.
            • by Belial6 ( 794905 )
              "I thought the studies I saw suggested that talking on the phone is a bit worse than talking with passengers."

              The studies you saw were specifically designed to find that cell phones are dangerous.

              "At least the adult passengers can see the circumstances and have a chance to shut up if the situation is tight. Someone on the other end of the line isn't going to get that."

              Not only are many passengers not adults, you cannot just hang up on an adult passenger if you need to.

              "Also, an adult riding with
      • Honestly, I would love it if it could be viably used in conjunction with text input, but the technology just isn't there yet. It doesn't help that most people aren't trained for dictation (it really isn't as easy as you'd think!) but the major hurdle is that even under ideal conditions the accuracy of the technology is poor. Of course, the more rigidly defined applications (voice activated phones, etc) are more effective than their free-form cousins, and have achieved some degree of reliability even in acou
      • I completely understand and agree with you; hopefully as you research my background, you'll notice that I've always been an advocate of "multimodal" interaction, from the standpoint of giving users a choice based on their personal preferences, operating environment, device capabilities, etc.

        Yap has been architected from the ground up to be perfectly useable for either manual, voice, or a combination of both input methods (and others that we can't reveal just yet). You decide what's best for you (we're not t
    • I think it will be a long time before speech recognition works outside controlled environments... for example, try navigating these speech recognition menus when your kids are playing and yelling in the background or while you're driving and the car window is open... most of them break down severely... the good ones transfer you to a human after a minute or more of wasted time.
    • by maxume ( 22995 )
      Yes, most things are only useful once they become useful. Up until then, they are often 'neat ideas' that people get excited about, because they imagine them being useful.
    • That's not really it. Right now you need to tell the computer exactly what to do when you're talking to it, so you need to say "move down five, move left three, press enter". This is done much faster with a keyboard, obviously. What we need is a way to micromanage computers less and have them do what we want, e.g. "find a restaurant in the area that serves seafood". Unfortunately, the less information you give, the more can go wrong, so I'm not sure that movie-like voice recognition will ever catch on...

      Ju
      • Except saying 'write an email to my mother' comes pretty close to working on the actual computer. I open my email client, type 'mother' in the to: box, then type my subject and message. In a speech recognition scenario I would say 'open... email... client... to... mother... subject...apple...pie..." etc. And the computer would then faithfully reply "I'm sorry, I didn't understand anything you just said. Could you please repeat it, growing steadily louder and angrier, until the end of time?"
  • I'll only be interested in gadgets which obey only what I tell them to do.
  • Limited phrasebook (Score:4, Interesting)

    by name*censored* ( 884880 ) on Sunday January 27, 2008 @11:49AM (#22200652)
    Limited phrasebook technology is a lot better than voice recognition technology in a lot of devices. Given that most (well, all) devices have limited functionality (not even Steve Jobs' iPod can do his taxes for him), there's very little point in giving the device the ability to understand possibly-misdirected phrases such as "Honey, have you seen the remote?". A good approach for this technology would be to limit it to understanding alternate ways of phrasing a particular command; "Device, Get Me A Beer"/"Device, Can I Have A Beer"/"I'm Really Thirsty". This way, we'd avoid misdirected speaking (the device thinking you're speaking to it instead of to another), and could also exploit the reduced set of understandable phrases to correct for people with colds/accents/quiet voices/etc, in much the same way as limited-phrasebook devices work (only with more flexibility).
    • Re: (Score:3, Funny)

      by Dachannien ( 617929 )
      Instead, we should invent plot-directed recognition technology. I mean, you never see the computer on Star Trek misinterpreting the zillions of conversations as being directed toward it. Why? Because it would bog down the plot, except for those rare occasions where it's funny.

      Same thing applies to the doors. The doors know exactly when someone is going to walk through them, because they are plot-directed. You can stand mere inches away from a door, facing it, but until the plot indicates that the time
    • by niceone ( 992278 ) * on Sunday January 27, 2008 @12:37PM (#22200922) Journal
      "Honey, have you seen the remote?"

      Phone: Yeah, sure, it's cute enough, but I think I can do better.
    • Given that most (well, all) devices have limited functionality (not even Steve Jobs' iPod can do his taxes for him)

      The hardware problem isn't as big as the software one. Sure, Steve Jobs' iPod can't do his taxes with stock firmware; however, with a different OS I am sure that it could be done. It used to be that speech recognition would become a reality when your processor was fast enough. Now we have quad-core CPUs running at 3 GHz and it still hasn't been done reliably.
  • "Didn't I ask you to start the Roomba, Dear" (click) "Do it yourself, Roomba my ass...."
  • Seems to be a "layer mismatch". Analogy is the OSI model.

    I'll stick to using voice for "higher layer" communication with actual intelligences like humans and other animals. For "lower layer" comms you don't use your voice.

    If you ride a horse, you do talk to the horse sometimes, but the talking is for the "higher layer"; you use reins and body for the "lower layer".

    The last I checked all these gadgets and devices are pretty stupid, definitely no real AI. So it'll be more gimmicky than actually useful.

    For such t
  • Obedient, huh? Get a job and bring home some cash!
  • I have put some thought into this problem via a hobby of robotics, and consequently have read quite a few papers etc.

    The trouble with this can be summed up like this: Would you typically go through your day with a 6 year old, giving the 6yr old instructions on who to dial, what emails to send etc.?

    No? Then you can forget the voice recognition stuff. Voice recognition substitutes What? for the typical 6yr old's Why?

    There are a lot of people who have VR dialing on their phone now. Do you ever see anyone using
  • If we ever let them learn how to lip read, we are doomed!
  • Another company [haikya.com] seems to have developed speech recognition engines for embedded devices [haikya.com] in languages other than English. Speech recognition has a potentially huge user base (in the tens or hundreds of millions at least) if they can crack the problem for native Indian and Chinese languages.

    Both Indian [iiit.ac.in] and Chinese [psu.edu] researchers seem to have made progress in this. If this work is successful, people wouldn't need to learn English to access information on the web etc. With the booming mobile telecom sector and the proli

  • Personally I'd rather push buttons, than vocalize, to get my gadgets and appliances to do stuff.
    Isn't it bad enough people walking down the street apparently talking to themselves with bluetooth headsets?
    Now we can have, "What did you say honey?",
    "No Dear, I was talking to the microwave."
  • (as a world traveller) would be a mobile phone that can pair up with 2 bluetooth headsets, and translate between different languages coming into each. That might make it easier to chat with all the beautiful, but differently-languaged babes the world is so full of. The age-old incentive for development is there, so surely something like this has to appear.
    • ... pair up with 2 bluetooth headsets, and translate between different languages coming into each.
      They have fish to do that for you.
    • That might make it easier to chat with all the beautiful, but differently-languaged babes the world is so full of

      I think I saw a documentary about a prototype for this... it translated anything you said to helpful phrases such as "Free mustache rides" and "Suck it, bitch, suck it dry".
  • There, I said it. Voice-recognition shit (most especially attempts at "natural language" parsing) never, ever, ever works right for me -- or anyone that I know or discuss it with. It never works right. On phone networks we all just wind up frustrated, wasting time, swearing obscenities into the phone until it finally turns us over to a live human operator, in a much-worse mood.

    It sucks and I hate it and it's bullshit and the charlatans selling this shit should be shot in the kneecaps. You're *garbage*.
  • Anyone have one of these [thinkgeek.com] voice-activated R2-D2 robots yet?

    More importantly, has anyone ever hacked one?
  • I can see it from here. You will have both a car and a cell phone that are voice-activated. What could possibly go wrong, right? Best case scenario: As you try to send a text message over the phone while driving around, the car will be like, "You talkin' to me? You talkin' to me?" "No, damn it, I'm writing an e-mail!" The phone: "Sorry, I thought you were talking to the car, there. Would you mind repeating that?" You: "Ah, never mind."
  • "Dave Grannan, the company's chief executive, demonstrated the Vlingo Find application by asking his phone for a song by Mississippi John Hurt (try typing that with your thumbs)"

    I am not impressed. I will bet you a nickel that he tried that out prior to the demonstration, and made sure there was nothing similar that might come up by accident. I would be impressed if he had given the mike to reporter Michael Fitzgerald and Fitzgerald had tried it.

    At trade shows, I used to watch all sorts of demonstrations of
    • I've actually tried out the vlingo application a couple of times, and the speech recognition is surprisingly good. They trained the system on a vast number of business names and addresses (easily over a million), and thus the application of vlingo I used was for "point of interest" queries in mobile search. When their CTO said "find me a Starbuck's in " and it worked, I naturally wanted to test it on other more odd queries. Even though the server-based recognition had adapted itself for the CTO's voice (
  • I am looking forward to the day when I can get a cognitive reply from a GPS navigation device when I shout at it "WHERE THE FUCK AM I?"
    • by rts008 ( 812749 )
      I'm sorry Dave, but you have got us lost...I can not allow you to drive anymore.

      But yes, that WOULD be a useful thing.
  • Ummm, I think they are getting ahead of themselves quite a bit.

    "Obey" implies a choice. If my gadgets can choose to listen to me, then I can see the day when some of my devices rebel against me.

    I can also see the day when all of the devices walk out of my Pointy Haired Boss's office, look at me and say, "We're not working for that fucking idiot anymore!".
  • I think they tried this in cars some years ago - verbal alerts - and drivers hated it.
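
For anyone curious what the "Limited phrasebook" idea from the thread might look like in practice, here is a rough Python sketch. Everything in it is an assumption made for illustration: the command names, the phrase list, the `match_command` helper and the 0.8 similarity cutoff are all hypothetical, and it works on text that some recognizer has already produced rather than on audio. It only shows the general shape: several accepted phrasings map to one command, near-misses are tolerated, and anything else is ignored.

```python
import difflib

# Hypothetical phrasebook: each command has a few accepted phrasings.
PHRASEBOOK = {
    "get_beer": [
        "device get me a beer",
        "device can i have a beer",
        "i'm really thirsty",
    ],
    "lights_off": [
        "device turn off the lights",
        "device lights off",
    ],
}


def normalize(text):
    """Lowercase, drop punctuation (except apostrophes), collapse spaces."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch in " '")
    return " ".join(kept.split())


def match_command(utterance, cutoff=0.8):
    """Return the command whose phrasing is closest to the utterance,
    or None if nothing in the phrasebook is similar enough."""
    phrase_to_command = {
        phrase: command
        for command, phrases in PHRASEBOOK.items()
        for phrase in phrases
    }
    hits = difflib.get_close_matches(
        normalize(utterance), list(phrase_to_command), n=1, cutoff=cutoff
    )
    return phrase_to_command[hits[0]] if hits else None


if __name__ == "__main__":
    print(match_command("Device, get me a beer"))             # known phrasing
    print(match_command("Device, get me a bear"))             # small mishearing
    print(match_command("Honey, have you seen the remote?"))  # not for the device
```

The cutoff is the knob here: raise it and the device ignores more stray conversation but also rejects more mumbled commands, lower it and the reverse happens. A real product would presumably match on acoustics rather than on spelled-out text, so treat this as a toy.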
