AI Robotics Technology

Replacing the Turing Test

mikejuk writes: A plan is afoot to replace the Turing test as a measure of a computer's ability to think. The idea is for an annual or bi-annual Turing Championship consisting of three to five different challenging tasks. A recent workshop at the 2015 AAAI Conference on Artificial Intelligence was chaired by Gary Marcus, a professor of psychology at New York University. His opinion is that the Turing Test has reached its expiry date and has become "an exercise in deception and evasion." Marcus points out: the real value of the Turing Test comes from the sense of competition it sparks amongst programmers and engineers, which has motivated the new initiative for a multi-task competition. One of the tasks is based on Winograd Schemas. This requires participants to grasp the meaning of sentences that are easy for humans to understand through their knowledge of the world. One simple example is: "The trophy would not fit in the brown suitcase because it was too big. What was too big?" Another suggestion is for the program to answer questions about a TV program: No existing program -- not Watson, not Goostman, not Siri -- can currently come close to doing what any bright, real teenager can do: watch an episode of "The Simpsons," and tell us when to laugh. Another is called the "Ikea" challenge and asks for robots to co-operate with humans to build flat-pack furniture. This involves interpreting written instructions, choosing the right piece, and holding it in just the right position for a human teammate. This at least is a useful skill that might encourage us to welcome machines into our homes.
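
To make the Winograd Schema task concrete, here is a minimal Python sketch of how one schema could be stored and scored. The class, field, and function names are hypothetical, and the "small" variant is an assumption added for illustration (the summary quotes only the "big" version); the point is that swapping a single word flips the correct referent, so a surface heuristic cannot get both variants right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    template: str      # sentence with a {word} slot for the special word
    candidates: tuple  # the two possible referents of "it"
    answers: dict      # special word -> correct referent


# The "big" variant is from the summary; the "small" variant is an assumed twin.
SCHEMA = WinogradSchema(
    template="The trophy would not fit in the brown suitcase because it was too {word}.",
    candidates=("trophy", "suitcase"),
    answers={"big": "trophy", "small": "suitcase"},
)


def score(resolver, schema):
    """Return the fraction of schema variants the resolver answers correctly."""
    correct = 0
    for word, expected in schema.answers.items():
        sentence = schema.template.format(word=word)
        if resolver(sentence, schema.candidates) == expected:
            correct += 1
    return correct / len(schema.answers)


def first_mentioned(sentence, candidates):
    """Naive baseline: always pick the candidate that appears first."""
    return min(candidates, key=sentence.index)


# The baseline gets one variant right and the other wrong.
print(score(first_mentioned, SCHEMA))
```

Any resolver that ignores the meaning of the special word scores at chance on a schema like this, which is presumably why the format appeals as a test.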

  • by Anonymous Coward

    The Turing test was a CONCEPT, not an actual test.

  • "The trophy would not fit in the brown suitcase because it was too big. What was too big?"

    If you change this to "The trophy would not fit in the brown suitcase snugly because it was too big" I wouldn't be able to answer it, either.

    • by itzly ( 3699663 )

      Why would we need to replace the Turing test, if it's perfectly possible to ask these types of questions as part of the Turing test?

      • by rtb61 ( 674572 )

        The most obvious replacement for the Turing test is the politician test. How 'smart' does a computer need to be to pretend to be smart when speaking on behalf of others? So receiving 'er' contributor input and producing a speech that sounds really good based upon those 'er' contributions, which doesn't mean much and surreptitiously hides the real intent of the 'er' contributors in the speech behind sounds-good messages. Sort of a Bush vs Obama measurement scale, both doing much the same thing but one sound

  • I like the idea of the IKEA challenge but why include a human? I would think having a robot
    open a box, pull out the instructions, and assemble the piece of furniture would be huge.
    Having a person involved just muddles the issue. You obviously might have to start with
    simple furniture but this seems like a worthwhile challenge as assembling furniture at
    times can even stump humans.

    • To be fair, the test also has to be something possible for a human.

      • What's difficult for a human in assembling IKEA furniture is PAYING ATTENTION to the instructions. I have been assembling IKEA furniture for 40 years (since there were only two IKEA stores in the whole world) and I have never had instructions that were not right. When humans screw up an IKEA assembly, it's because of choosing the wrong piece, or not rotating the piece to the correct orientation, or not using the correct screw/bolt/doohickey for the current stage.

        Frankly, the "IKEA test" identifies crea
    • by itzly ( 3699663 )

      The extra human is to see whether the computer will end up in an emotional argument with the human about the best way to interpret the instructions.

    • by Greyfox ( 87712 )
      For some reason I just thought of Calculon breaking down and crying, "Why are there THREE EXTRA SCREWS?! Whyyyyyy!"
      • Calculon is an actor, not a businessrobot.

        It's cheaper to include extra screws than to pay customer service to deal with the complaints of missing parts, or to cover the extra cost of a more thorough inspection process. The easiest way to tell if a unit was packed properly is to weigh it precisely, and only ship the units that weigh the correct amount. A missing screw may fall within the error margin of the scale, so by throwing in an extra screw or three, the risk of actually being short a part is grea
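
The weighing argument above can be checked with a little arithmetic. The following Python sketch uses made-up figures (screw mass, box mass, and scale tolerance are all hypothetical) and hypothetical function names; it asks how few screws a box can contain and still pass the weight check, with and without spares in the packing target.

```python
# Hypothetical figures, chosen only to illustrate the argument.
SCREW_G = 3.0            # mass of one screw, in grams
OTHER_PARTS_G = 12000.0  # mass of everything else in the box
TOLERANCE_G = 5.0        # error margin of the scale
REQUIRED = 20            # screws the assembly actually needs


def passes_weight_check(measured_g, nominal_g, tolerance_g):
    """Accept a box if its weight is within the scale's tolerance of nominal."""
    return abs(measured_g - nominal_g) <= tolerance_g


def screws_in_worst_accepted_box(packed_target):
    """Fewest screws a box can hold and still pass the weight check."""
    nominal = OTHER_PARTS_G + packed_target * SCREW_G
    count = packed_target
    # Keep removing screws for as long as the lighter box would still ship.
    while count > 0 and passes_weight_check(
        OTHER_PARTS_G + (count - 1) * SCREW_G, nominal, TOLERANCE_G
    ):
        count -= 1
    return count


# No spares: a box one screw short still passes, so the customer can be short.
print(screws_in_worst_accepted_box(REQUIRED), "of", REQUIRED)
# Three spares: the worst box that ships still has more than enough screws.
print(screws_in_worst_accepted_box(REQUIRED + 3), "of", REQUIRED)
```

With these numbers a single missing screw (3 g) hides inside the 5 g tolerance, so spares are what keep the worst accepted box at or above the required count.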

    • Presumably to immunize the experiment against shortcomings in the mechanics.

  • Clever programming and mechanics do not make "AI" and human "robots." Interesting machines, but nothing more. Nature is not an idiot.
  • Really -- someone suggests a computer program could identify when to laugh at a sitcom? When humans are likely to disagree rather strongly about which parts are the funniest? Heck, even Mycroft's first jokes were on the weak side of humor. It took a lot of coaching from the humans to get (his) jokes classy.

    • It would be a trivial program to listen for the laugh track.

    • by Sarten-X ( 1102295 ) on Sunday February 08, 2015 @05:04PM (#49012571) Homepage

      The most common underlying basis of humor is subverted expectations. We expect people to behave according to the norms of society, we expect people to act to the best of their intelligence, we expect misfortune to be avoided, and we expect that words will be used according to their common meanings.

      Subvert any of those expectations, and you have various kinds of humor. How funny a particular joke is perceived to be is related to how strongly the viewer is attached to their expectations. Since a computer is only an expert in the things it has been explicitly exposed to, it's very difficult to subvert its expectations. Watson would be familiar with all of the meanings of each word in a script, for example, so it would have a difficult time identifying the usual meaning that a human would expect from a situation, and would therefore likely fail to notice that when a different meaning was used, it was an attempt at humor.

      As another example, consider a military comedy, like Good Morning, Vietnam. Much of the humor is derived from Robin Williams' fast-paced ad-lib radio show contrasting with the rigid military structure, and the inversion where a superior at the radio station is practically inferior in every way. A computer, properly educated in the norms of military behavior, might recognize that the characters' behaviors are contrary to expectations, but then to really understand the jokes, the computer must also have an encyclopedic knowledge of pop culture from the period to understand why Williams' antics were more than just absurd drivel.

      Finally, a computer must also understand that humor is also based largely on the history of humor. Age-old jokes can become funny again simply because they aren't funny in their original context any more, so their use in a new context is a subverted expectation in itself. Common joke patterns have also become fixed in human culture, such that merely following a pattern (like the Russian Reversal) is a joke in itself.

      Algorithms simply haven't combined all of the relevant factors yet to recognize humor. Here on Slashdot, for instance, a computer would need to recognize the intellectual context, the pacing of a comment, the pattern of speech, and even how much class a commenter maintains, in order to realize when someone is trying to be funny.

      Poop.

  • An AI to add a laugh track to the Simpsons so you'll know when there has been a joke.

  • It seems like the startup investors would get sucked in then. Way more cool to be 2.0 than 1.0.

  • by jimmydevice ( 699057 ) on Sunday February 08, 2015 @01:58PM (#49011569)

    And tell it to make something useful.
    Virtual junk is okay and any virtual tools can be used.

  • by hey! ( 33014 ) on Sunday February 08, 2015 @02:11PM (#49011653) Homepage Journal

    And of course there should be. But that doesn't diminish the importance of the Turing test.

    The Turing test has two huge and closely related advantages: (1) it is conceptually simple and (2) it takes no philosophical position on the fundamental nature of "intelligence". That such huge advantages necessarily entail certain disadvantages should come as no surprise to anyone.

    The Turing test treats intelligence as a black box, but once you've contrived to pass the test the next logical step is to open up that black box and ask whether it's reasonable to consider what's inside "intelligent" or a tricky gimmick. That's a messy question, and that's *why* something like the Turing test is valuable. It is what social scientists call an "operational definition"; it allows us to explore an idea we haven't quite nailed down yet, which is a reasonable first step toward creating a useful *conceptual* definition. Science builds theories inductively from observations, after all.

    If the Turing test were a suitable *conceptual* definition of intelligence then an intelligent agent would never fail it, but we know that can and does happen. We have to assume as well that people can be fooled by something nobody would really call "intelligence". Stage magicians do this all the time by manipulating audience expectations and assumptions.

    Think of the Turing test as a screening test. Science is full of such procedures -- simple, cheap tests that aren't definitive but allow us to discard events that are probably not worth putting a lot of effort into.

    • by itzly ( 3699663 )

      the next logical step is to open up that black box and ask whether it's reasonable to consider what's inside "intelligent" or a tricky gimmick

      When a computer truly passes the Turing test, the internal mechanism will be too complex for our judgement.

      • by hey! ( 33014 )

        That's certainly a very interesting conjecture, but it's a little broad for my taste. I should think it more likely that we'll gain some insight, but that these insights will create new questions we can't answer right away.

        • by itzly ( 3699663 )

          I agree it will be interesting to look. At least we'll have better access to the internal information than with our human brains. But if the computer has intelligent behavior in every reasonable sense of the word, I don't think you'll be in a position to judge the internals as a "tricky gimmick" any more than you could call a human brain a tricky gimmick.

  • What has become of those compression tests? Wasn't the answer to AI (at least partially) to be found in the ability to compress?
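
As a rough illustration of what a compression-based measure looks at, here is a small Python sketch using the standard-library `zlib` module. This is only a stand-in: a general-purpose compressor like zlib is far weaker than the specialized models used in compression contests, and the sample strings and the `compression_ratio` helper are hypothetical. The idea it shows is simply that the more regularity a compressor can find in the data, the smaller the output.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by original size (lower means more structure found)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


# Highly regular text: the compressor can predict almost everything.
repetitive = "the cat sat on the mat. " * 200

# Random letters of the same length: almost nothing to predict.
rng = random.Random(0)
noise = "".join(rng.choices("abcdefghijklmnopqrstuvwxyz", k=len(repetitive)))

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"noise:      {compression_ratio(noise):.3f}")
```

A compressor that understood English well enough to predict the next word would do better still on real prose, which is the intuition behind tying compression to intelligence.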

  • A test of intelligence should involve dealing with unforeseen input. The problem with chatbots is that they are just giving pre-planned responses. How about trying to land a rocket on the moon while being bombarded with spurious input [wikipedia.org] from a radar device that was accidentally left on? Given the computers in use by NASA in 1969 that's pretty intelligent behavior.

    Another would be landing a rocket on a small floating platform. We'll see how that plays out tonight.

    • Another would be landing a rocket on a small floating platform. We'll see how that plays out tonight.

      That would be a very impressive display of control algorithms, but not AI. You can build something doing essentially the same tasks with LEGO Mindstorms - taking sensor input, and using that to control physical motion. The only difference is that SpaceX has far more and far better sensors, has some very complicated and impressive intermediate math, and sends the output signals to rocket engines rather than electric motors.
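
The "sensor input in, motion command out" loop described above can be sketched in a few lines of Python. This is a toy proportional controller with a made-up plant model (the position simply moves by whatever is commanded each step); the function names and numbers are hypothetical and bear no relation to any real flight software.

```python
def proportional_step(setpoint, measured, gain):
    """Return a control output proportional to the current error."""
    return gain * (setpoint - measured)


def simulate(setpoint, start, gain, steps):
    """Toy plant: the position changes by exactly the commanded output each step."""
    position = start
    for _ in range(steps):
        # Read the "sensor" (position), compute a command, apply it.
        position += proportional_step(setpoint, position, gain)
    return position


# Start 100 units from the target and halve the remaining error every step.
final = simulate(setpoint=0.0, start=100.0, gain=0.5, steps=20)
print(f"distance from target after 20 steps: {abs(final):.6f}")
```

Nothing in the loop learns or reasons; it just keeps correcting the measured error, which is the distinction the comment draws between control and AI.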

  • That's all we need. Computers with a sense of humour:

    "Oops! I deleted all your files!"

    "Just kidding. I moved them to your Documents folder. :P"

  • The Turing Test has been abused, bypassed, and cheated to the point that almost no one knows what the actual Turing Test is. At this point, a new test needs to be created, a test that is difficult to cheat without making it obvious that it's not the real test. This could be "The Real Turing Test administered by [reputable group]".

    Or we could make a new test under a different name, with incredibly explicit criteria that no one can nerf with a straight face. But from the sounds of it, it would be an easier test

  • by aix tom ( 902140 ) on Sunday February 08, 2015 @02:44PM (#49011837)

    The original Turing Test, as published in "Computing Machinery and Intelligence" as "Imitation Game" was not about whether a machine could successfully pretend to be a human.

    He proposed a test in which a computer and men both pretended to be women, and the test would be passed if the computer were more successful at lying about being a woman than the men were.

    http://en.wikipedia.org/wiki/T... [wikipedia.org]

    • Turing was actually somewhat ambiguous. The first, and only detailed, formulation of the Imitation Game had an interrogator, a man, and a woman. After that, he switched to a person and computer. The question asked in the first game was intended to distinguish between a man and a woman (asking about hair style). The questions listed later aren't gender-related.

      • by aix tom ( 902140 )

        In my opinion the "what questions are asked" by the interrogator is only a small part of the test setup. I think the main point is "what is the question that is asked of the interrogator".

        In that area the question "do you think your opponent is a computer or a human" is influenced hugely by the interrogator's knowledge and perception of what a computer should be able to do and what it should not be able to do. So asking the interrogator "find out if your opponent is a man or woman" might be a good way to have

  • ... FTW.

    "Describe in single words, only the good things that come into your mind. About your mother."

  • A rigorous definition of general intelligence now exists and has been applied by the DeepMind folks. See this video lecture by DeepMind's Shane Legg at Singularity Summit 2010 on a new metric for measuring machine intelligence [youtube.com].

    If you want something more accessible to the general public, The Hutter Prize for Lossless Compression of Human Knowledge [hutter1.net] has the same theoretic basis as the test used by DeepMind and has the virtue that it uses a natural language criterion, in the form of a Wikipedia snapshot. I

  • Turing's test was about the ability to imitate human behavior/knowledge. The real question we need to answer I will call the Mycroft [wikipedia.org] test. The purpose of the test is to determine if the program has earned the right to not be turned off, that is, does it have a right to a trial before it is "terminated"? A program that has earned that right has crossed the blurry line between inanimate and "human" in a way that should be important to us. Defining a test that can measure this is at the heart of deciding what
  • Let me see if I've got this straight. If you can watch an episode of the Simpsons and know when to laugh, then you're intelligent.
    Or at least a real person.

    Better go with answer number two. Doh!
