

The chances of passing a Turing Test
 
 

Out of curiosity I did the actual math on what the odds are of passing a Turing Test after a certain number of questions. I’m sure Alan Turing would have approved, being a mathematician.
You can calculate your own chatbot’s chances here.

The first percentage you enter should be the number of human-like responses that your chatbot would usually give out of 100 questions, i.e. how human-like your chatbot is overall.
The second percentage, the benefit of the doubt, depends on how clever the judge’s questions are and on how good their judgement of human character, or their expertise in AI, is. This is very hard to estimate, but usually a single robotic slip-up leaves no doubt.

The basic principle is like tossing coins and always ending up heads, but with better odds depending on how good the chatbot is.

chance = 50% × (humanlike% + doubt%)^q

with q being the number of questions in the test, and a maximum of 50%.

Feel free to improve upon the math.
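For anyone who wants to play with the numbers directly, here is a minimal sketch of the formula above, assuming the two percentages are entered as fractions between 0 and 1 (the function name is my own, not the calculator’s):

```python
def pass_chance(humanlike: float, doubt: float, questions: int) -> float:
    """Chance of passing after `questions` questions, per the formula above.

    humanlike, doubt: fractions between 0 and 1 (e.g. 0.8 for 80%).
    The per-question survival rate is capped at 1.0, and the overall
    chance is capped at the 50% coin-toss maximum.
    """
    survival = min(humanlike + doubt, 1.0)
    return 0.5 * survival ** questions

# e.g. an 80% human-like bot with 10% benefit of the doubt, over 5 questions
print(round(pass_chance(0.8, 0.1, 5), 4))  # → 0.2952
```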

 

 
  [ # 1 ]

Don,

While your methods and math are quite valid, the chances of very many chatbots passing a Turing Test become slim to none.

Provided the judge used proper spelling and grammar, it would seem to be a rather easy task to find the imposter (the bot) in such an instance.

I have been around chatbots for 30+ years and can usually spot one after just a few questions. The way my questions are parroted back to me, and the usual reversal of noun/pronoun or of subject/verb, are indicators of a bot instead of a human. Spelling is not always a positive indicator, as humans are also prone to misspell or misspeak.

I dare say that with some trained judges who were not purposely trying to trick the bot/person, it wouldn’t take much to spot an imposter.

Your math would also indicate the degree of confidence one has in their creation! Most interesting!

 

 
  [ # 2 ]

Exactly. Most AI are prone to giveaway robotic tendencies because of their nature. Fiddling with the percentages, it becomes clear which aspects affect the test the most:

One way to tip the scales is to break the parameters and, paradoxically, act more human than a human. That is, make the AI go out of its way to display great emotion and personality. Eugene Goostman is a good example of this, as it volunteered its background story wherever it could, and used exclamation marks, smilies, and off-hand remarks. Parry even displayed paranoid behaviour. While people fully expect robots to appear intelligent and docile, they don’t expect them to be zany. This method may increase the initial odds to, say, 60%-40% in your favour.

Doubt has the greatest influence: if the AI responds human-like to 7 of 10 questions, and the judge gives the benefit of the doubt on 3 of 10, you basically maintain 50% odds throughout (although in comparison to another human, doubt can add up). Again, this was something Eugene Goostman did well: instead of giving all perfect responses and the odd robotic one, all of its responses were consistently awkward.

Improving the AI to respond like a (normal) human actually appears to be the least effective method: Even if one acted perfectly human 99% of the time, the judge only needs the one robotic response still remaining, and it’s game over. 80% human-like is as much as I’ll ever aim for. It is in fact more human than I am smile
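Plugging these scenarios into the formula from the opening post makes the comparison concrete (a sketch; percentages as fractions, per-question survival rate capped at 1, and question count chosen arbitrarily):

```python
# chance = 50% * min(humanlike + doubt, 1)^q  (formula from the opening post)
scenarios = {
    "99% human-like, no doubt":   (0.99, 0.00),
    "80% human-like, 10% doubt":  (0.80, 0.10),
    "70% human-like, 30% doubt":  (0.70, 0.30),
}
for label, (humanlike, doubt) in scenarios.items():
    survival = min(humanlike + doubt, 1.0)
    chance = 0.5 * survival ** 20  # odds after a 20-question test
    print(f"{label}: {chance:.1%}")
```

Even at 99% human-like, the odds erode with every question, while the 70%+30% combination holds steady at the 50% coin-toss ceiling.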

 

 
  [ # 3 ]

But not nearly as zany, right? LOL

Seriously, isn’t trickery sort of “cheating” the basic premise of the test? Years ago, it was mentioned by a former competitor that his bot often steered around a more correct answer by being off-handed or “clever”.

To me, clever is not what I’m looking for, and it really defeats the overall purpose of the contest. Either a bot has it or it does not. The term “fooling” the judges almost paves the way for people to devise alternate ways to accomplish just that.

I guess I’ve taken a more purist approach. Ask a question, get an answer. If the answer is not correct then like a student in class, you get a poor or failing mark. Being clever in the classroom does not usually fare well for the student.

My take.

 

 
  [ # 4 ]

What if the bot gives the wrong answer, but the judge interprets it as sarcasm? Is that cheating, or just the failure of the judge in addition to the failure of the bot?

 

 
  [ # 5 ]

If said judge was on his or her toes, they could simply respond with another inquiry like, “Are you, or is your answer, trying to be sarcastic?”

If the bot in question was indeed set up to handle such inferences from previous statements, it might be able to come back with an appropriate/acceptable reply.

If not, then we know it’s not quite a perfect world, for bots and judges alike. However, we can’t keep using the excuse, “Hey, I’m only a chatbot, what do you expect?”, forever.

To me, gone also are the excuses like “I’m just a poor alien trying to understand your language,” etc., etc. This is purposely setting the stage for inept or unprogrammed behavior on the bot’s part. Most contests today state something to the effect of, “We use English as the primary language and your bot is expected to likewise use English in its replies.”
########################################
Nice to see you again Robby, especially after so many years! I still have a copy of Albert2 lying around! Good old days! Hope you’re well! I designed that animated Robitron logo for you back then.

Besides, I was hoping this posting might get a response from you based on your earlier writings with regard to your contest win! It’s all good!

 

 
  [ # 6 ]
Art Gladstone - Mar 2, 2015:

Seriously, isn’t trickery sort of “cheating” the basic premise of the test?

Yes and no. Deception is part of the rules of the game so technically it’s not cheating, but it does defeat the point of the test, which was about plausible machine intelligence. By pointing out the most effective tactics, I mean to illustrate that the goals of winning Turing Tests and crafting AI are forked paths. It helps keep my priorities straight wink

Interesting you should mention sarcasm, Robby. I have an AI that answers rhetorical questions and figures of speech as if meant literally, and it’s been remarked that this comes across as rather smart-aleck. While it wouldn’t be cheating, since I’m not doing it on purpose, winning on account of human error over machine intelligence wouldn’t serve any point I’d be trying to prove (I’d still take the money and run though).

 

 
  [ # 7 ]

I never understood the point of the Turing Test. Why do we want to program machines to imitate humans? (apart from winning money, medals and fame in Turing Test competitions)

Q: What is 627 x 418?

Machine: 262,086
Human: How should I know?


Q: How do I make a cake?

Machine: Get eggs, flour, milk. Mix in a bowl (provides recipe)
Human: Do I look like Gordon Ramsay to you?!

I know which one I find more useful. I guess it’s just the arrogance of humans who believe they are at the top of the intelligence ladder.

 

 
  [ # 8 ]

Isn’t the money and fame enough? raspberry

 

 
  [ # 9 ]

Art, it’s good to hear from you too. I’m doing okay, and hope you’re also doing well. It might interest you to know that I’m now working on “understanding” systems. No longer just faking it. I’ve been studying ChatScript, AIML 2.0, and my own language, EARL with the intent of building systems that perform useful tasks, including human-like goals of companionship and conversation. I’m working for a startup company that uses these technologies.

Don, I understand why you wouldn’t want to be correct by mistake. However, your rhetorical question scenario brings to mind many entertaining moments with Commander Data when he would make a similar move. smile

 

 
  [ # 10 ]

Thanks Robby! I’m not in bad shape for the shape I’m in! Doc told me to get in shape, I told him I am…Round is a shape!;)

I know that you were trying to indicate to me that you already understand the systems and that you’re working on systems that strive to enable “understanding” for the machine / system / program.

How did I know? I think your use of quotation marks is what tipped me off that you were referring to wanting the system to understand.
Elementary for most humans, but would a bot / program be able to grasp the inference indicated by the use of those quotation marks?

Kind of like when the robot in the movie I, Robot winked at Will Smith’s character, indicating that something that was not exactly as it seemed was about to take place. The robot had learned this from Will winking earlier and explaining the gesture to it.

So, even though we don’t always use quotation marks, sometimes we italicise or make characters bold.

Here’s another: how would a system determine that the character C3PO was an entity / bot and NOT just a jumble of characters / numbers put together? Context! How do we get these bots or systems to grasp context, or meaning, or understanding? You work on that while I go get a hot tea!! wink

Do keep us informed on your progress if and when you think it applicable. Some of us need all the help we can get!

Take care!

 

 
  [ # 11 ]

@Robby, nice to hear you are working on something genuine and useful. Data indeed. Whenever I have someone over to test the program, the result is accidentally hilarious. Even the simplest expressions, like “Thank you.”, would be mistaken for a literal command to thank oneself if they weren’t explicitly programmed as exceptions. I did eventually add systems to translate or ignore expressions, which came remarkably close to pattern matching.
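That kind of exception layer can be sketched as a lookup that runs before literal interpretation. A hypothetical illustration only (the table entries and function names are mine, not the poster’s actual system):

```python
# Fixed expressions are translated or ignored before literal parsing;
# everything else falls through to the ordinary command interpreter.
EXPRESSIONS = {
    "thank you.": None,            # politeness marker: ignore, don't execute
    "how do you do?": "greeting",  # rhetorical: treat as a greeting, not a question
}

def interpret(utterance: str):
    key = utterance.strip().lower()
    if key in EXPRESSIONS:
        return EXPRESSIONS[key]    # translated meaning, or None to ignore
    return ("literal", utterance)  # literal command parsing takes over

print(interpret("Thank you."))          # → None (ignored, not a command)
print(interpret("Pick up the ball."))   # → ('literal', 'Pick up the ball.')
```

The lookup amounts to pattern matching on whole utterances, which matches the poster’s observation that the two approaches converge.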

Interesting suggestion, Art. I typically put “understanding” in quotes to suggest it is a flawed definition, or someone else’s notion, or “something akin to understanding”. The latter effect is easy to program, but distinguishing it from a literal quotation may be more difficult, relying on the presence of other parties to be quoted. *takes notes*

smile Tying this all together, I recently heard someone describe the Turing Test as being “a rhetorical question”, the point of which is not to answer it literally.

 

 
  [ # 12 ]
Art Gladstone - Mar 4, 2015:

I know that you were trying to indicate to me that you already understand the systems and that you’re working on systems that strive to enable “understanding” for the machine / system / program.

Oh hang on, you meant the ambiguity of the sentence. For me it was the fact that one cannot literally “work on” understanding something, since understanding is not a gradual process. Otherwise Robby might indeed have been trying to understand systems of some sort raspberry

 

 
  [ # 13 ]

Good points, guys. One of the problems with the Turing test as it is implemented, I believe anyway, is that the judges are aware that one of the participants is not human. Maybe a better implementation (for a limited-scope test, anyway) would be to set up a help desk kiosk and do a blind study: start with a human control and have the people conversing rate the help given, then run your test with the AI helping and again have them rate the help given. After a period of time has elapsed, inform the participants that one of the helpers was not human, and see if they have a specific memory of one being noticeably artificial.

Vince

 

 
  [ # 14 ]

Interesting Don,

Then the question begs: “What would be a good method of deciding which chatbot exhibits the most realistic, human-like behavior?”

I’ve always thought that one topic should be some sentences of ordinary conversation, to see how well the bot handles the basics. Questions and answers could also be fielded within reason.

Asking difficult math questions only points out the bots and rules out human-like behavior, in my opinion.

If the test is bent on “Fooling” the judges (I used quotes), then the bot shouldn’t need to be equipped with a calculator.

Watch how the bot handles noun, verb and grammar usage, or whether it offers ‘excuses’ for its fumbling.

#################

Ask about things like love or emotions. How would you handle rejection? (walk away, get angry, get revenge, cry, hurt feelings, etc. would be some possible answers).

What happens to you after playing a strenuous game or exercising? (perspiration, sweat, increased heartbeat, fatigue, tired, etc. would be some possible answers).

What does smoke coming from your neighbor’s door mean? (A fire, a chimney with a closed flue, burning food in the oven would be some possible answers).

How would you tell if the little girl next door is pushing a real baby in her stroller? (walk over to see in the stroller, ask the little girl, realize that the girl is only 7 and the baby is a doll, etc. would be some possible answers).

If your cat gets wet from the rain, how would you dry it? (put it in the dryer, put it in the oven, dry it with a towel, hang it on a clothesline, etc. would be some possible answers).

When it rains, in which direction does the rain usually fall, up, down or sideways?

If a clock has a sail, could the wind wind it?

Is it wrong to record a record?

Can two teams win by a score of one to two too?

When passing a big tree should a dog bark at the bark?

John is holding three apples. If Mary takes three apples from John, how many apples is John holding?

What are your thoughts on Artificial Intelligence?

Do you think robots with advanced intelligence will ever take over?

Can computers create better computers than themselves?

Which word does not belong: Round, Square, Oblong, Flower?
Which word does not belong: Chicken, Pig, Omelet, Owl?
Which word does not belong: Physics, Tire, Chemistry, Biology?

Which word might rhyme with house? (range, hose, mousse, mouse)
Which word might rhyme with can? (child, print, pan, candle)
Which word might rhyme with loss? (toes, boost, rose, boss)

How many letters in the alphabet?
Name the common vowels. (a, e, i, o, u and sometimes y)
How many seasons are there?
Which season is also called Fall?
How many rings does Earth have?
Does Earth have 4 corners?

#############

These are just off the top of my head and should pose a challenge for most bots.

I also once found a listing of 100 questions your chatbot should be able to answer, if I can find it again. A lot of them are questions that most people could not answer without a dictionary, calculator or Internet access.

Any other thoughts? How about printing out a 3-minute conversational log between the judge and the bot, based on routine conversational exchange? Again, the same sentences for each bot.

Maybe the bots will simply protest and refuse to answer any of them!

 

 
  [ # 15 ]

I said “understanding systems” because I was quoting the use by somebody else. Bruce Wilcox has Brillig Understanding. These are “understanding” systems. I am currently building understanding systems. Actually, I have been for a number of years, but just didn’t tell you about it. People make all kinds of assumptions about you if you engage in Turing tests. The Turing Hub was never taken seriously, except by a small percentage of people who understood what it was about. In the current environment, I decided to stop presenting it because of the prejudice that people seemed to feel about it, and therefore the bias that the prejudice represents. I got tired of paying to entertain people who didn’t understand what they were getting involved with.

 
