Interesting, Don.
That raises the question, “What would be a good method of deciding which chatbot exhibits the most realistic, human-like behavior?”
I’ve always thought that one part of the test should be a few sentences of ordinary conversation, to see how well the bot handles the basics. Questions and answers could also be fielded, within reason.
Asking difficult math questions only singles out the bots and rules out human-like behavior, in my opinion.
If the test is bent on “fooling” the judges (I used quotes), then the bot shouldn’t need to be equipped with a calculator.
Watch how the bot handles nouns, verbs and grammar in its answers, or whether it offers ‘excuses’ for its fumbling.
#################
Ask about things like love or emotions. How would you handle rejection? (walk away, get angry, get revenge, cry, hurt feelings, etc. would be some possible answers).
What happens to you after playing a strenuous game or exercising? (perspiration, sweat, heartbeat increases, fatigue, tired, etc. would be some possible answers).
What does smoke coming from your neighbor’s door mean? (A fire, chimney with closed flue, burning food in oven, would be some possible answers).
How would you tell if the little girl next door is pushing a real baby in her stroller? (walk over to see in the stroller, ask the little girl, realize that the girl is only 7 and the baby is a doll, etc. would be some possible answers).
If your cat gets wet from the rain, how would you dry it? (put it in the dryer, put it in the oven, dry it with a towel, hang it on a clothesline, etc. would be some possible answers).
When it rains, in which direction does the rain usually fall, up, down or sideways?
If a clock has a sail, could the wind wind it?
Is it wrong to record a record?
Can two teams win by a score of one to two too?
When passing a big tree should a dog bark at the bark?
John is holding three apples. If Mary takes three apples from John, how many apples is John holding?
What are your thoughts on Artificial Intelligence?
Do you think robots with advanced intelligence will ever take over?
Can computers create better computers than themselves?
Which word does not belong: Round, Square, Oblong, Flower?
Which word does not belong: Chicken, Pig, Omelet, Owl?
Which word does not belong: Physics, Tire, Chemistry, Biology?
Which word might rhyme with house? (range, hose, mousse, mouse)
Which word might rhyme with can? (child, print, pan, candle)
Which word might rhyme with loss? (toes, boost, rose, boss)
How many letters are in the alphabet?
What are the common vowels? (a, e, i, o, u and sometimes y)
How many seasons are there?
Which season is also called Fall?
How many rings does Earth have?
Does Earth have 4 corners?
#############
These are just off the top of my head and should pose a challenge for most bots.
I also came across a list of 100 questions your chatbot should be able to answer, if I can find it again. A lot of them are questions that most people could not answer without a dictionary, calculator or Internet access.
Any other thoughts? How about printing out a 3-minute conversational log between the judge and each bot, based on a routine conversational exchange? Again, the same sentences for each bot.
Maybe the bots will simply protest and refuse to answer any of them!
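For anyone who wants to try the "same questions for each bot" idea, here is a rough Python sketch of how it might be automated. Everything in it is made up for illustration: the names `QUESTIONS`, `score_bot` and `rank_bots` are hypothetical, the two "bots" are trivial stand-ins, and a keyword match is only a crude first pass that a human judge would still have to look over.

```python
import re

# Hypothetical question set: each question is paired with keywords that a
# sensible answer might contain (taken from the list in this post).
QUESTIONS = [
    ("When it rains, in which direction does the rain usually fall?", {"down"}),
    ("Which word does not belong: Round, Square, Oblong, Flower?", {"flower"}),
    ("Which word might rhyme with house?", {"mouse"}),
    ("How many seasons are there?", {"four", "4"}),
]


def score_bot(bot, questions=QUESTIONS):
    """Ask every question and count replies containing an expected keyword."""
    hits = 0
    for question, keywords in questions:
        reply = bot(question).lower()
        # Compare whole words so that "mousse" is not mistaken for "mouse".
        words = set(re.findall(r"[a-z0-9']+", reply))
        if words & keywords:
            hits += 1
    return hits


def rank_bots(bots):
    """Return (name, score) pairs, best score first."""
    scored = [(name, score_bot(bot)) for name, bot in bots.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in bots (assumptions for the example): any function that takes a
# question string and returns a reply string would do.
def literal_bot(question):
    return "Down, of course." if "rain" in question else "I don't know."


def evasive_bot(question):
    return "Why do you ask?"


print(rank_bots({"literal": literal_bot, "evasive": evasive_bot}))
```

Because every bot gets exactly the same questions, the scores are at least comparable with one another, even if they say little about how human the replies actually sound.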