
CMU Video Conference System Gets 3D From Cheap Webcams 94

Posted by timothy
from the little-bit-disorienting dept.
Hesham writes "Carnegie Mellon University's HCI Institute just released details on their "why-didn't-I-think-of-that-style" 3D video conferencing application. Considering how stale development has been in this field, this research seems like a nice solid step towards immersive telepresence. I was really disappointed with the "state-of-the-art" systems demoed at CES this year: they are all still just a flat, square video stream. Hardly anything new. What is really cool about this project is that the researchers avoided building custom hardware no one is ever going to buy, and explored what could be done with just the generic webcams everyone already has. The result is a software-only solution, meaning all the big players (AIM, Skype, MSN, etc.) can release this as a simple software update. 'Enable 3D' checkbox, anyone? YouTube video here. Behind the scenes, it relies on a clever illusory trick (motion parallax) and head-tracking (a la Johnny Lee's Wiimote stuff; same lab, HCII). It was just presented at the IEEE International Symposium on Multimedia in December."

  • 2.5D, not 3D (Score:5, Insightful)

    by adam (1231) * on Thursday January 29, 2009 @03:51PM (#26657505)
    The post title/summary is misleading -- this is actually 2.5D and not 3D at all. (It works on the premise that the background is static: it obtains a matte of the background, uses subtraction to dynamically key/mask the participant out of the image, and then adds the participant as a second, foreground layer. On the viewer side, head tracking gently shifts that layer to reveal the background hidden behind it.)
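
The layering described above can be sketched in a few lines. This is only an illustration of the general idea (background subtraction plus a shifted foreground layer), not the CMU code: the grayscale list-of-lists image format, the threshold, and the `gain` mapping from head position to pixel shift are all assumptions made up for the example.

```python
def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the stored background plate by more than threshold."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def shift_from_head(head_x, gain=0.1):
    """Hypothetical mapping: viewer head offset (pixels) -> layer shift (pixels).

    The sign is negative because a near object appears to move opposite
    to the viewer's head relative to the far background.
    """
    return -round(head_x * gain)


def composite_with_parallax(frame, background, mask, shift):
    """Draw the static background, then the masked person layer moved by `shift` pixels."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in background]  # background layer stays put
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                nx = x + shift
                if 0 <= nx < width:       # pixels pushed off-frame are dropped
                    out[y][nx] = frame[y][x]
    return out
```

Because the background plate is stored separately, the pixels the person used to cover are simply redrawn from the plate when the layer moves, which is the "reveal" described above.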

    For what it's worth, I really don't care for this effect at all. I am not denigrating its inventors in the slightest; this is a novel (read: low cost) approach, and I am sure some people would enjoy having this in their iChat/AIM/skype. To me, it's the equivalent of Apple's Photobooth filters (fisheye, inverted colors, etc) -- a cheap parlor trick that seems nifty for about 5 seconds, and then becomes precipitously distracting. True 3D has its own issues with distraction and visual anomalies (leading to headaches, etc). Even the best 3D cinematographers around have to be very careful to avoid these issues (for instance, Vince Pace, who shoots 3D for James Cameron (Titanic, Terminator, etc) has plenty of headache-inducing scenes in his demoreel, and this is a guy with state-of-the-art facilities who has as much knowledge as anyone about how to do stereoscopic cinematography). Frankly, I think video conferencing is best left 2D, and any efforts toward improving it should be spent increasing framerate/resolution (and reducing lag + dropped frames).
    • Re: (Score:3, Interesting)

      by MightyYar (622222)

      I'm with you - while my inner geek wants to give the developers credit and is impressed, the result is not something I'd want to actually use short of screwing around with it for a few minutes.

      Even if it were improved to the point where it was "perfect", it would still be just a cool trick and not a killer feature.

      • by anilg (961244)

        One thing I found was that the blackness around the edges was annoying.. it "gave" the impression of 2.5D-ness to someone who would otherwise have considered it 3D.. I'm talking Mr. LameO suddenly installing 3DChat-2009, and immediately recognizing it for a "parlor" trick.

        One small thing that would go a long way in alleviating this would be cropping off edges on both sides (only in the viewing windows).. would make for a much more realistic experience.

        Also, I disagree that it's something you wouldn't want.. if

        • by MightyYar (622222)

          You don't think it would still look like "Viewmaster" 3-D, even if they trimmed the edges? To me it was very obvious that a flat person was being moved around on a flat background, making things more cartoonish rather than more realistic.

          But maybe you are right and I would like the effect if it were polished.

          • by anilg (961244)

            With the background being a good distance away (> 10 or 15 feet) and the person within 2 feet, an object moving around a flat background is a good approximation of real 3D.

            I have a strong hunch that in a blind test, people would have a hard time telling a real 3D environment from this 2.5D one; they would look pretty much the same.
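
The distance argument above can be checked with a little trigonometry. The sketch below is an assumption-laden illustration (a point straight ahead of the viewer, a sideways head movement, units in feet): the apparent angular shift of a point is `atan(head_shift / distance)`, so far points barely move while near points move a lot.

```python
import math


def parallax_deg(head_shift, distance):
    """Apparent angular shift (degrees) of a point straight ahead at `distance`
    when the viewer's head moves sideways by `head_shift` (same units)."""
    return math.degrees(math.atan2(head_shift, distance))


# Compare a person at 2 ft with a wall at 15 ft for a half-foot head movement.
person = parallax_deg(0.5, 2.0)
wall = parallax_deg(0.5, 15.0)
print(f"person: {person:.1f} deg, wall: {wall:.1f} deg")
```

Calling the same function with two nearby distances (say the tip of the nose and the ears) gives a rough feel for how much depth variation within the person a single flat layer ignores.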

    • Re: (Score:3, Insightful)

      Why not just use 2 webcams and red/blue filters and a camera on the other end?

      It'll be slightly annoying wearing the glasses, but it'll be much more 'real' than what this appears to be. Set the cameras eye width apart for realism or farther to make the effect more predominant.
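
For what it's worth, the software side of a two-webcam red/blue setup is tiny. A minimal sketch, assuming both frames are already aligned, the same size, and stored as rows of `(r, g, b)` tuples (that format is made up for the example): take the red channel from the left camera and the blue channel from the right one.

```python
def anaglyph(left, right):
    """Combine two same-sized RGB frames into one red/blue anaglyph frame.

    Red comes from the left camera and blue from the right; green is
    zeroed here to match a strict red/blue filter pair.
    """
    return [[(l[0], 0, r[2]) for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]
```

Throwing away two of the three channels from each camera is also why the result loses most of its color, which is the usual complaint about this approach.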

      • glasses, glasses on the other end...

      • by kramulous (977841) *

        Cause with Red/Blue glasses you only see black and white. I would take the colour video over the 2.5D/3D effect any day.

      • Re:2.5D, not 3D (Score:5, Informative)

        by GameMaster (148118) on Thursday January 29, 2009 @04:43PM (#26658167)

        First off, the image would be an ugly red/blue mess. Secondly, even if you used one of the more advanced shutter-glasses or polarized 3D techniques, you'd still end up looking at someone wearing goofy 3D glasses that obscure eye contact. Don't get me wrong, I have no problem with wearing 3D glasses when playing games or watching a movie, but not when I'm trying to converse, face to face, with someone.

        • Re: (Score:3, Funny)

          by blueskies (525815)

          Red/blue contacts.

          • I've seen a couple of RealD 3D movies recently and I liked the effects, and I wonder if there is any reason they couldn't make contacts like that? I've never worn contacts (and my wife didn't like hers before she had laser surgery) so it might not be worth it, but I think that could be cool for certain uses.

            Also, does anyone know if they can make games, etc using this tech? I tried to find something about it the other day and everything was about the red/blue system and that frankly sucks compared to the more

            • by DiLLeMaN (324946)

              Nike actually has MaxSight contacts [see2020now.com] that act like shades. Shouldn't be too hard to make something similar with red/blue, and it'd look even more freaky.

              Dunno if polarisation lenses can be done as contacts, since those have to be exactly right, rotation-wise.

              • by chihowa (366380)

                Dunno if polarisation lenses can be done as contacts, since those have to be exactly right, rotation-wise.

                Most modern polarized monitors use circular polarization, which could be easily implemented in contacts. Linear polarization was an issue even with fixed glasses, as a little tilt of the head would blend the views together and make you feel sick.

              • As the other poster mentioned, modern 3D movie glasses from RealD use circular polarization. I'm not entirely familiar with how that works, but it sounds like it might do the trick. Also, my understanding is that in the case of costume/club contact lenses, like cat's eyes, they make minor changes to the shape of the contact lenses in order to keep them in the right orientation. I think that would also be needed in order to make contact lenses that correct some vision problems like astigmatism, but I could be wr

        • by geekoid (135745)

          Who the hell uses 'red blue' 3d techniques anymore?

          The DM is not always right.

    • Re:2.5D, not 3D (Score:5, Insightful)

      by JustinOpinion (1246824) on Thursday January 29, 2009 @04:22PM (#26657881)

      I agree with you: having this kind of 2.5D experience is neat but not particularly useful.

      But I wonder if this software could be adapted to do something else... One of the things that most people dislike about webcam-conferencing is that the other person is never looking "at" you. They are looking on their screen at an image of you, so they are not looking directly at their camera, and so on your end they seem to be looking away from you. (And they see you looking away from them, too.)

      While this may seem trivial, it is actually a significant roadblock to inter-person tele-communication. People rely on body language and eye contact to establish each other's moods, to really "connect". Webcam-conferencing forces us to violate social conventions (like looking into people's eyes), which can be anywhere from subconsciously bothersome, to somewhat distracting, or even perceived as insulting.

      So what I would like is a multi-camera system that uses similar kinds of interpolation to rebuild the image of the person so that they are looking directly at the camera. So if I put one webcam on either side of my screen, they can combine their images to create a shifted image where I am looking directly at the viewer on the other end.

      Though it is a rather small and subtle addition to tele-conferencing, I believe it would have a bigger impact than what TFA seems to be showing. I think it would make the interaction "more real."

      • Re: (Score:3, Interesting)

        by kramulous (977841) *

        Or, just put the stream of the conferenced person just below/above and centred on the camera. I've operated Access Grid a couple of times and this is the first thing that I do.

      • Re: (Score:2, Insightful)

        by Anonymous Coward

        I did a study about this gaze problem and a possible algorithmic solution, for a videoconf specialist about one year ago.
        My conclusion:
        no algorithm was/is/will be suitable to combine any point of view with any other point of view. Consider an object occluded for each point of view but not occluded for your virtual view (the combination of the two actual views): there is no solution but to guess at areas that can be very wide, in situations that are very frequent (if 2 objects are near, like your hand and your body f

      • Re:2.5D, not 3D (Score:4, Informative)

        by forkazoo (138186) <wrosecransNO@SPAMgmail.com> on Thursday January 29, 2009 @06:50PM (#26659895) Homepage

        So what I would like is a multi-camera system that uses similar kinds of interpolation to rebuild the image of the person so that they are looking directly at the camera. So if I put one webcam on either side of my screen, they can combine their images to create a shifted image where I am looking directly at the viewer on the other end.

        Geometric view interpolation is not unknown in the labs right now, and in some cases is being researched for exactly the reason you suggest. As another poster suggested, there are certainly some cases where the interpolation will break down. (Put a hand in front of each webcam at the side of your monitor, and it won't interpolate two palms to look like your face, for example.) Another one is that anything transparent makes it impossible to estimate the depth at a particular point because there are actually two depth values there. So, the smoke from your cigarette which is an amorphous volume of semitransparency through which you can see a window, the schmutz on the window, a reflection on the window, and something through the window will just ruin any chance of doing the interpolation properly. When you try to shift the pixel correctly to accommodate the view shift, you get like seven different answers for what direction it is supposed to go.
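
To make the failure mode concrete, here is a toy forward warp of a single scanline. It is a sketch under simplifying assumptions, not how any shipping product works: one disparity value per pixel, a made-up sign convention (larger disparity means nearer, and pixels move right by `alpha * disparity`), and nearest-surface-wins when two pixels land on the same spot.

```python
def warp_scanline(pixels, disparity, alpha=0.5):
    """Forward-warp one scanline toward a virtual view partway between two cameras.

    Returns the warped scanline, with None wherever no source pixel landed
    (a disocclusion hole that the algorithm would have to guess at).
    """
    width = len(pixels)
    out = [None] * width
    nearest = [-1.0] * width  # largest disparity written so far at each target
    for x in range(width):
        nx = x + round(alpha * disparity[x])
        if 0 <= nx < width and disparity[x] > nearest[nx]:
            out[nx] = pixels[x]
            nearest[nx] = disparity[x]
    return out


# 'F' is a near foreground object, 'b' is far background.
print(warp_scanline(list("bbFFbb"), [0, 0, 4, 4, 0, 0]))
```

The foreground slides over and covers some background, and the spot it vacated is left as holes: neither this camera nor the warp knows what was behind it. A transparent surface breaks the even more basic assumption that each pixel has exactly one disparity.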

        Still, look up the Foundry's "Ocula" system for 3D cinematography. It's a shipping commercial product that does a lot of strong magic with stereoscopic imagery on a daily basis. (Which I would have assumed was currently impossible.)

        It's too slow to be used for real-time conferencing. You let it cook overnight for a single shot, or a handful of shots, to compute disparity maps offline. It needs to be at least an order of magnitude faster to be practical for real-time work. Thankfully, there are a lot of researchers trying to figure out clever hacks to speed up these sorts of things, and a lot of engineers figuring out ways to build stonking GPUs to run OpenCL in a year or two. My guess is that stereo stuff will become mainstream somewhere around 2011-2012.

      • My first idea in response to this was to put the camera somehow behind the display. Maybe by having a translucent display or perhaps there is some technology out there in which the display emitters could also be used as detectors.

        So I jump on to Google and it turns out Apple has already patented [appleinsider.com] my idea. How did that pass the test of novelty and non-obviousness for a patent claim?

      • by dangitman (862676)

        So what I would like is a multi-camera system that uses similar kinds of interpolation to rebuild the image of the person so that they are looking directly at the camera. So if I put one webcam on either side of my screen, they can combine their images to create a shifted image where I am looking directly at the viewer on the other end.

        Sounds overly complicated. Why not just put the camera behind the screen, so the user is actually looking directly at the camera, rather than faking it?

        • by VanessaE (970834)

          Not to rain on your parade, but there's just no way this would work with current full-sized monitor tech.

          A CRT would need to have, at the very least, some optics embedded into the tube so that the camera itself could remain outside, and then you're interfering with the beam no matter where you put those optics. Besides, CRTs are pretty much obsolete except for a few corner cases.

          An LCD is out because you'd have to poke a small hole in the backlight reflector and diffusing layers for the camera to see out

    • I agree. It's neat, but not really useful for anything.

      I think getting the whole "eye contact" thing worked out would be much more useful as a way of making the experience feel more "natural". I am used to looking at people's faces and having them look at mine when I chat with someone in person. Video chatting requires you to either get accustomed to people looking over your head/off to the side the whole time, or only watch the video with your peripheral vision so that the person you are chatting with
    • Re: (Score:3, Insightful)

      by dinther (738910)

      Very cool, and I like the fact that they give the webcam a double function. The 2.5D effect against a static background is indeed only a novelty, but I see there is some confusion here between 3D and stereo vision.

      I agree with most writers that stereo vision induces headaches. These are simply due to the fact that each eye sees a different image, which gives your brain a depth cue, yet your eyes' focal point (on your screen) conflicts with that depth cue, resulting in a headache. It is unavoidable with normal screen

  • by Quarters (18322) on Thursday January 29, 2009 @03:54PM (#26657531)
    ...but that sample conversation at the end of the video may as well have been between two drunken epilepsy sufferers on boats in the North Atlantic. Who moves around like that while they are talking?
    • by Jonah Hex (651948)

      I kept thinking of Stevie Wonder, and after watching that I understand why only he can move like that, it's extremely visually disconcerting and headache inducing.

      HEX

    • by d0rp (888607)
      Clearly this is meant for playing games of virtual dodgeball!
  • CGNU [homestarrunner.com] Video Conference System Gets 3D From Cheap Webcams
  • This does tons for immersion! It has to be implemented wherever there is a stationary camera (it obviously doesn't work with a camera phone). IIRC, Johnny Lee's work was free to use, so get to it and add that "Enable 3D" checkbox, developers! If only they'd cropped the resulting image to get rid of the black-ground, but that was probably just to show how it works.
    • i thought that failed if you have two eyes?

    • by LingNoi (1066278)

      Johnny Lee didn't invent this; it had been done tons of times before he even started his PhD thesis on using the Wiimote.

      Look at the paper "3D display based on motion parallax using non-contact 3d measurement of head position"

      All Dr Lee did was a much simpler demo with a 3D box and 2D sprites, using a Wiimote instead of a camera, and now everyone worships him like he's so amazing for it.

      If you did some research into it you'd realise that his demo sucks and if you read his paper he doesn't go into any detail a

      • by Zerth (26112)

        Duh, of course he didn't invent anything, he just hooked a wiimote up to a PC and used it to provide positioning for a camera in a virtual scene. Nothing special there.

        The reason it wowed everyone is A) because nobody at Nintendo thought to demo it first and B) it let everyone at home do the same thing for way cheaper than before.

  • Bandwidth reduction? (Score:3, Interesting)

    by Anonymous Coward on Thursday January 29, 2009 @04:50PM (#26658263)

    I wonder if a more practical use would be to use the technique for video bandwidth reduction. If you know where the person is, you could concentrate video bandwidth on the face region, while keeping the rest of the "video" relatively static. No point in continuously compressing and sending boring background. Of course many codecs already do temporal compression that gives a similar effect, but this might increase the efficiency for video chat.
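
A rough way to estimate the potential saving: if the person has already been segmented (for example by background subtraction), the region worth spending bits on is just the bounding box of that mask. The sketch below assumes a boolean list-of-lists mask and ignores everything a real codec does; it only measures how much of the frame the person occupies.

```python
def bounding_box(mask):
    """Smallest inclusive (left, top, right, bottom) box covering all True pixels, or None."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def region_fraction(mask):
    """Fraction of the frame's pixels that fall inside the person's bounding box."""
    box = bounding_box(mask)
    if box is None:
        return 0.0
    left, top, right, bottom = box
    area = (right - left + 1) * (bottom - top + 1)
    return area / (len(mask) * len(mask[0]))
```

As the comment notes, temporal compression in existing codecs already skips most unchanged background, so any real gain would have to be measured against that baseline rather than against raw frames.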

  • Now I can see everyone's zits in 3d.

  • The reason we'd be moving left to right would be to see something which isn't in the frame.

    An idea I have been thinking about for awhile is to have the remote camera move when the user on the other side does. This would be much more convenient instead of having to ask the person to keep adjusting the camera angle to see outside the frame.

  • by dinther (738910) on Thursday January 29, 2009 @05:45PM (#26659061) Homepage

    Inspired by Johnny Lee's stuff, I pulled some old code out over a year ago and turned it into a decent engine that handles multiple screens and head tracking (TrackIR) to achieve the motion parallax effect. Like with all 3D effects, it needs to be seen but the following videos give you a good idea.

    Have a look at these demo videos and you can even download a demo:

    My first test
    http://nz.youtube.com/watch?v=X8PevTuEWlg [youtube.com]

    More accurate tracking
    http://nz.youtube.com/watch?v=yf1hu6GLmf0 [youtube.com]

    Multi screen study
    http://nz.youtube.com/watch?v=ZBdtPz2V_vY [youtube.com]

    Engine complete
    http://nz.youtube.com/watch?v=ku76aHq3pps [youtube.com]
    Download Demo
    http://vandinther.googlepages.com/virtualwindow [googlepages.com]
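
For readers wondering what a "virtual window" engine has to compute each frame, the core is an off-axis (asymmetric) view frustum driven by the tracked head position. The sketch below is a generic textbook-style formulation, not code from the demo above; the coordinate setup (screen centered at the origin in the z = 0 plane, eye at positive z, all lengths in the same units) is an assumption.

```python
def off_axis_frustum(eye, screen_w, screen_h, near):
    """Frustum bounds (left, right, bottom, top) at the near plane for an eye
    at (ex, ey, ez) looking at a screen centered on the origin in z = 0."""
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (ez > 0)")
    scale = near / ez  # similar triangles: project screen edges onto the near plane
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    return left, right, bottom, top
```

These four numbers are what one would feed to a `glFrustum`-style projection call, with the scene also translated by the negative eye position. When the head is centered the frustum is symmetric; as the head moves sideways it skews, which is what produces the motion parallax.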

    • by mugnyte (203225)

        I just ran your demo, quite nice although a click-drag on the head (instead of fly) would be more educational.

        Since head tracking has a common solution, there's no need for IR (although precision is better). You should open this and get it connected to standard head tracking. It'd be quite nauseating, even with the lag. But that's a compliment in this area.

  • It looks like the application for this is chatting when you are drunk, standing up, and swaying about. I don't know anybody who constantly moves their head around when videochatting. They tend to look straight into the camera. And wouldn't you be rather concerned if the person on the other end of your chat did start moving around and looking at you from weird angles?
  • The floating square of background with a floating talking bust reminds me of Max Headroom.
  • While this is pretty neat, I'm not sure it 'enhances interpersonal communication' since everyone using it will be bobbing back and forth like a Stevie Wonder impersonator convention.

    Not to mention some schmuck in the US will soon sue because it made them puke from motion sickness.
  • USB webcams are pretty cheap these days. Why not use two, one on each side of the monitor?
    In fact I've seen web cam kits with 2 in the package.
    That would let you have true parallax, AND would have the benefit of making it appear that you are looking at the viewer.
    Solves the two main problems I see being discussed here for an extra $29.95 or so.
    Plus, it would make cool things like 3D position tracking possible (think Minority Report).
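
The 3D position tracking part comes down to standard stereo triangulation. A minimal sketch, assuming two identical, parallel, already-calibrated cameras (real webcams would need calibration and rectification first); the parameter names are made up for the example. Depth is `Z = f * B / d`, where `f` is the focal length in pixels, `B` the distance between the cameras, and `d` the horizontal disparity of the same point in the two images.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Distance to a point seen by two parallel cameras (same units as baseline)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline / disparity_px


def position_3d(x_px, y_px, cx, cy, focal_px, baseline, disparity_px):
    """Back-project a pixel from the left image to (X, Y, Z) in the left camera's frame.

    (cx, cy) is the assumed principal point (image center) in pixels.
    """
    z = depth_from_disparity(focal_px, baseline, disparity_px)
    return ((x_px - cx) * z / focal_px, (y_px - cy) * z / focal_px, z)
```

With cameras on either side of a monitor the baseline is large, which helps depth precision but makes it harder to find the same point in both images.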
