AI Input Devices Privacy Technology

The Challenges and Threats of Automated Lip Reading

An anonymous reader writes: Speech recognition has gotten pretty good over the past several years. It's reliable enough to be ubiquitous in our mobile devices. But now we have an interesting, related dilemma: should we develop algorithms that can lip read? It's a more challenging problem, to be sure. Sounds can be translated directly into words, but deriving meaning out of the movement of a person's face is much more complex. "During speech, the mouth forms between 10 and 14 different shapes, known as visemes. By contrast, speech contains around 50 individual sounds known as phonemes. So a single viseme can represent several different phonemes. And therein lies the problem. A sequence of visemes cannot usually be associated with a unique word or sequence of words. Instead, a sequence of visemes can have several different solutions." Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
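To make the many-to-one problem in the quote concrete, here is a minimal sketch in Python. The viseme classes and the tiny lexicon are invented for illustration (they are not taken from the article or from any real viseme inventory); the only point is that several phonemes collapse onto one lip shape, so different words end up with the same viseme sequence.

```python
from collections import defaultdict

# Hypothetical, heavily simplified viseme classes: each group of phonemes
# is assumed to look the same on the lips. Real inventories differ.
VISEME_CLASS = {
    "p": "P", "b": "P", "m": "P",
    "t": "T", "d": "T", "n": "T",
    "k": "K", "g": "K",
    "f": "F", "v": "F",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence (unknown symbols pass through)."""
    return tuple(VISEME_CLASS.get(p, p) for p in phonemes)


def group_by_visemes(lexicon):
    """Group words that share the same viseme sequence."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[to_visemes(phonemes)].append(word)
    return dict(groups)


# Toy lexicon with made-up one-symbol-per-sound transcriptions.
LEXICON = {
    "pat": ["p", "a", "t"],
    "bat": ["b", "a", "t"],
    "mad": ["m", "a", "d"],
    "ban": ["b", "a", "n"],
    "fan": ["f", "a", "n"],
    "van": ["v", "a", "n"],
    "cat": ["k", "a", "t"],
}

groups = group_by_visemes(LEXICON)
# Keep only the viseme sequences that match more than one word.
ambiguous = {seq: words for seq, words in groups.items() if len(words) > 1}

for seq, words in ambiguous.items():
    print("-".join(seq), "->", ", ".join(words))
```

In this toy setup "pat", "bat", "mad" and "ban" all share one viseme sequence, which is why a practical system would need context (a language model, or the audio channel) to choose between candidates.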
This discussion has been archived. No new comments can be posted.

  • HAL 9000 (Score:5, Funny)

    by tchuladdiass ( 174342 ) on Saturday September 13, 2014 @10:50AM (#47897095) Homepage

    Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move.

    • Re: (Score:2, Funny)

      by sconeu ( 64226 )

      To everyone else: If something along these lines was NOT your first thought, please turn in your geek card.

      • by flyneye ( 84093 )

        Sorry, I was still stuck on the claims of reliability in the first line of the article. Now my trousers are damp and I must change them.

      • My first thought was "why the hell would I want a machine to lip read?" since lip reading is basically a crutch for humans' inability to hear sufficiently well to extract someone's voice from the surrounding environment.

        We already have laser microphones, which can detect sound vibrations at a distance, and we have sophisticated sound processing methods to extract weak signals from noise, etc. We don't need lip reading, other than maybe as a fun science project for graduates.

        • By analyzing the light in the background of a video you can see what is reflected there (the people behind the camera). If someone in the background of a terrorist vid is talking about their next terrorist strike- I'd want to know what he was saying. It's a pre-recorded vid, you can't set up surveillance gear and the vid isn't good enough to show the sound vibrations.
          • By analyzing the light in the background of a video you can see what is reflected there (the people behind the camera). If someone in the background of a terrorist vid is talking about their next terrorist strike- I'd want to know what he was saying.

            That's ridiculous. If you can lip read the reflection in a terrorist vid, then you can see the person's face, and you don't need to know what he's talking about, you can arrest him for being an accessory. If you can't see the person's face, try using Photos

        • Why not? Apart from the idea that lip reading may complement speech recognition and make it more reliable. Also it may be more useful in a loud environment, which is frequently the case when machines are around, btw. Or in cases where speaking up loud to a computer is not appreciated, such as in office environments. And if all of this would not be enough, note the title of this website: news for nerds. You want a machine to lipread because it CAN (maybe).
        • In addition to the other suggestions made here, one use case for machine lip reading is tracking multiple simultaneous conversations in a crowd. You could theoretically have a searchable index of anything anyone said in view of a particular camera (whereas once more than 2-3 people are talking at once, it becomes almost impossible to separate out their individual speech).
    • by Tablizer ( 95088 )

      Why didn't Slashdot editors use the HAL eye icon (eyecon?) instead of the lock? I'm disappointed and will increase my trolling 35% in protest.

  • by account_deleted ( 4530225 ) on Saturday September 13, 2014 @10:50AM (#47897097)
    Comment removed based on user account deletion
  • NSA probably already has this technology
  • Jesus H Christ! (Score:5, Insightful)

    by mark_reh ( 2015546 ) on Saturday September 13, 2014 @10:52AM (#47897111) Journal

    We're all going to have to start wearing Burkas if we want any privacy at all.

  • Too bad (Score:5, Insightful)

    by ArcadeMan ( 2766669 ) on Saturday September 13, 2014 @10:53AM (#47897115)

    Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist.

    Too bad it never stopped anyone before.

    • In the end, I suspect we'll decide that the advantages outweigh the disadvantages, and pass laws to protect people from the disadvantages. I'm not saying this will be ideal, but it will be the best we can do.

      We have faced, or are facing the same issue with other technologies such as face recognition, profiling, genome sequencing, etc.

    • If lip reading software reaches the courts, suddenly all video recording becomes wiretapping. The courts might resolve that by allowing audio recording wherever they allow video recording. Or by forbidding video recording wherever they forbid audio recording. Or maybe they will finally do something about that ancient "wiretapping" deal they've been twisting into the modern world.

    • It's a load of garbage anyway. There's nothing this technology does to invade privacy that we can't already do.

      You're in the open, then use a parabolic mic to pick up the conversation you're clearly already taping.
      You're behind some glass, then use a laser microphone to pick up the conversation, which, while it sounds James Bondish, actually already exists.

      As a society we're already too little too late on the privacy side.

  • How Naive (Score:5, Insightful)

    by Tanuki64 ( 989726 ) on Saturday September 13, 2014 @10:54AM (#47897125)

    Beyond the computational aspect, we also need to decide, as a society,

  • by Anonymous Coward

    Turning the question around, why should it NOT exist or be looked into? At the very least it's an academic curiosity. If privacy is a concern, there's a very easy way to break the algorithm - talk whilst covering your mouth, which people have been doing whilst whispering to others for a long time. Ventriloquists would probably defeat it easily as well.

    Capture: Lunatic

    • Comment removed (Score:5, Insightful)

      by account_deleted ( 4530225 ) on Saturday September 13, 2014 @11:01AM (#47897151)
      Comment removed based on user account deletion
      • Re: (Score:1, Flamebait)

        by ch-chuck ( 9622 )

        we are morally obligated to develop this technology before the bad guys get it and use it against us.

      • by Jeremiah Cornelius ( 137 ) on Saturday September 13, 2014 @11:22AM (#47897287) Homepage Journal

        Governments and corporations are fictional persons. They have no "moral consciousness" of any kind, outside of rhetorical and ideological fantasy.

        So, this will not be a question of moral or immoral use. It will be amoral, in the hands of those who have advanced themselves through manipulation of the aforementioned ideological rhetoric.

        You continue to believe that there is hope for this modern, post-industrial society. But there is none. We as people have increased the sophistication of our tools and our reach - just as relentlessly as we have avoided the refinement of our own beings.

        In the end you don't get Star Trek. You don't even get Starship Troopers. You get A Scanner Darkly. And hope there is VALIS.

      • by pz ( 113803 )

        related dilemma: should we develop algorithms that can lip read? Of course we should, we should develop any tech. The real question is, will it be used for moral or immoral purposes?

        Certain technology can be declared illegal. Like guns in certain countries. Radar detectors in some US states. Blue lights on non-police cars in most US states. Mechanisms for counterfeiting printed money. Cloning of human embryos. Et cetera. It's perfectly plausible for a society to declare some particular technology illegal.

        Heck, even certain knowledge is illegal for the general public to own, let alone internalize, like plans to make nuclear bombs.

        • Heck, even certain knowledge is illegal for the general public to own, let alone internalize, like plans to make nuclear bombs.

          Designs for nuclear weapons are not too hard to find online. The hard part (thank God) is obtaining the materials to make one, such as enriched uranium, plutonium, deuterium and tritium.

          That said, I agree it would be illegal for a member of the general public to possess classified documents of any kind, without authorization.

      • by hodet ( 620484 )

        Think of the advantages for the deaf and hard of hearing (combined with a HUD). That alone tells me we should develop it. NSA are gonna NSA. Terrorists are going to terrorize. This type of technology has the potential to change countless lives, and for that reason alone we should.

    • by nbauman ( 624611 )

      Grow a big moustache.

  • Pfft (Score:4, Insightful)

    by msobkow ( 48369 ) on Saturday September 13, 2014 @11:09AM (#47897189) Homepage Journal

    Like moral issues have ever stopped anyone. :(

  • The most obvious approach is to combine the 2 methods - much like humans do, especially in noisy environments. It might improve the accuracy of current speech recognition which is, to be honest, still sub-standard.

    Speech recognition as it is now is way too limited. Sure, Siri and the like may work. And some computerized phone systems use it to nag us instead of using reliable button clicking. But it is still far from transcribing an accurate memo. Let alone automated subtitling or other fancy applications.

    So

    • Re:Combined (Score:4, Insightful)

      by Animats ( 122034 ) on Saturday September 13, 2014 @02:12PM (#47897955) Homepage

      The most obvious approach is to combine the 2 methods - much like humans do, especially in noisy environments.

      Right. Especially since, when you're looking at your smartphone, it's looking back at you.

      This would be valuable for vehicle driver speech input, which has to reject a lot of noise.

    • The most obvious approach is to combine the 2 methods - much like humans do, especially in noisy environments.

      Obvious, indeed. There's already a textbook [sciencedirect.com] for the subject, Multimodal Signal Processing [elsevier.com]...available for free online, no less.

      This is exactly the sort of system you'd want on a flight deck, to supplement the accuracy of speech-recognition in the presence of noise, especially intermittent noise such as turbulence. It can also help with speaker identification.

      As for the hopelessly naive idea that "society" should be able to choose whether this sort of thing should exist...the textbook came out in 2009.

  • It will happen, it's just a matter of getting the tech correct.

  • by Aldenissin ( 976329 ) on Saturday September 13, 2014 @11:17AM (#47897253)

    Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.

    How much do they extend beyond that of so called "simple" voice recognition? I suppose one could rarely listen in when they couldn't have with current amplifying audio equipment. As a society, we've already decided that it should exist: "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness."

    Can this be used as a weapon? Yes, so can a hammer. Ban hitting people with hammers, not the hammer.

    • by AmiMoJo ( 196126 ) *

      The problem in the United States is that corporations are legally people. The EU will clamp down on this hard, not allowing corporations to monitor any conversation in range for advertising purposes. Individuals will benefit (I'd love to be able to whisper silently to my phone instead of having to say "OK Google" out loud) but business use will be heavily regulated. New rules already allow for fines of up to 50% of global revenue for privacy violations.

      In the US it will be a conditional issue and corporate l

      • In the US it will be a conditional issue and corporate lawyers/lobbyists will win. People won't speak in public for fear for the adverts they might trigger.

        I doubt they will "win" like you suppose. They are too smart for that. Perhaps they should, and people may start to push back...

  • I seem to recall that this was done previously but the conditions had to be good (e.g. sitting facing the camera with good lighting.)

  • by pubwvj ( 1045960 ) on Saturday September 13, 2014 @11:27AM (#47897311)

    Lip reading is a lot easier than the original poster thinks. There is a lot more data available, especially within context.

    • Try it from across the room and you don't know what the conversation is about. Do it at a bar looking for people using pick up lines and you'll get false positives. As for context, try to figure out how to inject that into the reading algorithms.
  • It's certainly a worthy area of computational linguistic research. But the reason for that is that it's a very hard problem. Automated language processing, with very smart people and very motivated spy agencies working very hard at it, has taken 60 years to get to a point not quite at the level of high school language speakers.

    The privacy concerns are irrelevant. The deaf will demand this, and as long as there are weak-willed politicians and judges more interested in making political statements than disp

  • We are the same species that invented the atomic bomb. If we can think of a technology, someone is already probably working on it.
  • Or perhaps one of the others - the CIA would no doubt appreciate it.
  • You can bet your $THINGOFVALUE here that the CIA and similar organizations are already researching this if they don't have it already.

    Like handwriting recognition this will be full of examples of "bad output" in the early days and there will always be cases where lack of context and/or deliberate obfuscation by the speaker makes this unreliable.

    Let's just assume that this will be as reliable 5 or 10 years from now as automated face recognition is today and within 20 years both will be very reliable. What d

  • 'D' and 'T', 'G' and 'K', and even 'P' and 'B' are frequently all but impossible to discern by lip-reading alone, and can only ever really be discerned when one of the alternatives simply does not make any sense. But this is not always the case.
  • Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist.

    Sorry to break it to you, but society not only doesn't "need" to make this decision, it has no right to make this decision. You don't get to decide what other people invent, and for the most part not even what it is used for.

  • Old George Carlin joke:
    Here’s a good example of practical humor, but you have to be in the right place. When a local television reporter is doing one of those on-the-street reports at the scene of a news story, usually you’ll see some onlookers in the background of the shot, waving and trying to be seen on television. Go over and stand with them but don’t wave. Just stand perfectly still and, without attracting attention, move your lips, forming the words, “I hope all you stupid fu
  • I can see how this would be great for deaf people, using something like Google Glass to get subtitles of conversations around them. How about making something for people who can't speak, but can form the words with their mouth? It might need something like a mic, but with video/lasers for reading the facial movements, that outputs to a speaker.

    Sure it will get used for bad, but that is going to happen regardless anyways. So how about we do some good with it and help out the disabled people with some nice t

  • Could augment by adding other sensors such as microwave, laser or terahertz imaging, to detect signals being generated by tongue and vocal cords, or even to directly image the organs themselves.
    Also it seems possible that since the whole head vibrates, reflections or motions of eye, nose, lips and forehead might provide vibratory cues.

  • If recent history teaches anything about technology, it's that if something is technically possible - and it seems highly improbable that automated lip-reading isn't - someone WILL do it. Further, if it's not actually illegal to do so, someone will make it commercially available in the civil domain. And if it's made illegal in the civil domain, that's very unlikely to stop the security community, in all its sundry forms, from weaponising it (sorry, my Orwellian paranoia is clearly on overdrive; th
