Gary Dubuque - Aug 29, 2010:
“Jan is my name.” is storing the name for the individual Jan.
“Jan is a person.” is storing the individual Jan in the class of person.
“Jan is old.” is storing the attribute old for the individual Jan.
These are great examples. My bot will know how to differentiate these based on what it knows about the words “name”, “person” and “old”. It will also take into consideration the modifier “a” on “person” to understand that you mean:
Jan is an instance of the class person
I don’t agree with your <subject> is <subject> though. It is often very tempting to ‘force’ how we think of grammar so that it fits a nice simple algorithm, but I think that will lead you to a dead end. Natural language and grammar are exceedingly complex, and any attempt to ‘force’ them to work a certain way, just to make programming easier, will fail. My approach is not to force grammar to fit my algorithm, but to force my algorithm to work as our grammar does.
“Jan is my name”
I think you may be confused by the fact that you have two nouns. “Jan” is the subject noun; “name” is also a noun, but it is the direct complement noun. A direct complement noun forms part of the predicate.
rule: Sentence = subject (nouns) + predicate
subrule: Predicate = verb + complement
subrule: Complement = adjective or noun
This is only one possible definition of a sentence. Some sentences, rather than using a single word for the subject noun, can have a noun clause as the subject. Other sentences do not have any direct complement noun or direct complement adjective, but instead simply a verb that is modified by a prepositional phrase.
Here is how my bot parsed that input:
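To make the rule and subrules above concrete, here is a minimal Python sketch of a parser for just that one sentence shape. It is not CLUES's actual code: the tiny LEXICON, the function names and the returned dictionary layout are all assumptions made up for illustration, and possessives and articles are simply lumped in with adjectives.

```python
# Hypothetical mini-lexicon (an assumption for this sketch, not CLUES data).
# Articles and possessives are tagged as adjectives to keep things simple.
LEXICON = {
    "jan": "noun", "name": "noun", "person": "noun",
    "my": "adjective", "a": "adjective", "old": "adjective",
    "is": "verb",
}


def tag(word):
    """Look up the part of speech of a word, or None if unknown."""
    return LEXICON.get(word.lower())


def parse_noun_phrase(tokens, pos):
    """Parse zero or more adjectives followed by one noun.

    Returns (phrase, next_pos) on success, or None on failure.
    """
    adjectives = []
    while pos < len(tokens) and tag(tokens[pos]) == "adjective":
        adjectives.append(tokens[pos])
        pos += 1
    if pos < len(tokens) and tag(tokens[pos]) == "noun":
        return {"noun": tokens[pos], "adjectives": adjectives}, pos + 1
    return None


def parse_simple_sentence(text):
    """Sentence = subject + predicate; Predicate = verb + complement."""
    tokens = text.strip(" .").split()
    subject = parse_noun_phrase(tokens, 0)
    if subject is None:
        return None
    subject_np, pos = subject
    if pos >= len(tokens) or tag(tokens[pos]) != "verb":
        return None
    verb = tokens[pos]
    pos += 1
    # Complement = noun (with optional adjectives) or a bare adjective.
    complement = parse_noun_phrase(tokens, pos)
    if complement is not None:
        comp, pos = complement
        kind = "noun"
    elif pos < len(tokens) and tag(tokens[pos]) == "adjective":
        comp, pos = {"adjective": tokens[pos]}, pos + 1
        kind = "adjective"
    else:
        return None
    if pos != len(tokens):
        return None  # leftover words this single rule cannot explain
    return {"subject": subject_np, "verb": verb,
            "complement": comp, "complement_kind": kind}


print(parse_simple_sentence("Jan is my name."))
print(parse_simple_sentence("My name is Jan"))
```

Note that the two nouns land in different slots depending on word order, which is the subject versus direct complement distinction described above. Noun clauses and prepositional phrases are deliberately left out.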
pos = simple-sentence
subject.noun1.val = Jan
subject.num-noun = 1
predicate1.dcomp.noun1.adjective1.val = my
predicate1.dcomp.noun1.num-adjective = 1
predicate1.dcomp.noun1.val = name
predicate1.dcomp.num-noun = 1
predicate1.num-verb = 1
predicate1.verb1.val = is
In the above, “dcomp” is short for “direct complement”.
I also ran the reverse ordering through CLUES (“My name is Jan”). Here is what it reported:
pos = simple-sentence
num-predicate = 1
subject.noun1.adjective1.val = my
subject.noun1.num-adjective = 1
subject.noun1.val = name
subject.num-noun = 1
predicate1.dcomp.noun1.val = jan
predicate1.dcomp.num-noun = 1
predicate1.num-verb = 1
predicate1.verb1.val = is
Erwin Van Lun - Aug 29, 2010:
Nevertheless: is it more important to understand the user than to write the best algorithm for understanding proper English?
What I am going to do with my bot is first try proper English grammar. If that fails to parse the input, the system will ignore some rules and try again to see whether it can parse the input.
Think of it as a spell check, where a given word from the user is compared to known words by how many letters they have in common. It is the same idea, except that CLUES will determine how many grammar rules it had to ignore in order to parse your input.
For example, suppose it had 2 grammar rules, Rule 1 and Rule 2.
If Rule 1 was made up of 3 ‘subrules’ (see above), and ignoring 1 of them allowed it to parse the user input,
and Rule 2 was made up of 4 subrules, 2 of which had to be ignored to enable parsing,
then the system would use Rule 1 to interpret the user input.
It could also tell the user, so they could improve their English.
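The Rule 1 versus Rule 2 comparison can be sketched in a few lines of Python. This is only an illustration of the selection step, assuming the parser has already reported how many subrules each rule had to ignore; the RuleAttempt class and best_rule function are hypothetical names, not part of CLUES.

```python
from dataclasses import dataclass


@dataclass
class RuleAttempt:
    """Outcome of trying one grammar rule on the input (hypothetical structure)."""
    name: str
    total_subrules: int
    ignored_subrules: int  # subrules that had to be ignored before the input parsed


def best_rule(attempts):
    """Pick the attempt that needed the fewest ignored subrules."""
    return min(attempts, key=lambda a: a.ignored_subrules)


attempts = [
    RuleAttempt("Rule 1", total_subrules=3, ignored_subrules=1),
    RuleAttempt("Rule 2", total_subrules=4, ignored_subrules=2),
]
print(best_rule(attempts).name)  # Rule 1
```

Ranking by the fraction of subrules ignored (1/3 versus 2/4 here) would be another option; for this example both measures pick Rule 1.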
For example, “You are to fast”:
This is the incorrect form of ‘to’; it should be ‘too’.
to = preposition
too = adverb
If CLUES couldn’t parse the input using ‘to’ as it was written, it may ‘bend the rules’, consider ‘to’ as ‘too’ (adverb), and then try again.
If parsing was possible after changing ‘to’ to ‘too’, it will assume the user meant that instead.
CLUES will know this because it knows ‘fast’ is an adjective, and you don’t have a prepositional phrase of ‘to fast’ with the antecedent of ‘are’.
Instead, ‘fast’ is an adjective being modified by the adverb ‘too’, and ‘fast’ is modifying ‘are’.
Since it will have a rule that relates ‘too fast’ to antecedent ‘are’ but NOT have a rule that relates ‘to fast’ to antecedent ‘are’, after it has changed ‘to’ to ‘too’ it will work (parsing would then be possible).
I will have a mapping of commonly misused words like ‘to’ and ‘too’, and also ‘their’, ‘there’ and ‘they’re’.
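Here is a small Python sketch of the ‘bend the rules’ retry for “You are to fast”. It is an assumption-laden toy, not CLUES itself: the CONFUSABLES mapping, the POS table and the can_parse stand-in (which only accepts pronoun + verb + optional adverbs + adjective) are all invented for illustration.

```python
# Hypothetical mapping of commonly misused words to the forms they get confused with.
CONFUSABLES = {
    "to": ["too"],
    "too": ["to"],
    "their": ["there", "they're"],
    "there": ["their", "they're"],
    "they're": ["their", "there"],
}

# Hypothetical part-of-speech table, just big enough for the example.
POS = {"you": "pronoun", "are": "verb", "to": "preposition",
       "too": "adverb", "fast": "adjective"}


def can_parse(tokens):
    """Toy stand-in for a real parser: pronoun + verb + adverb* + adjective."""
    tags = [POS.get(t.lower()) for t in tokens]
    if len(tags) < 3:
        return False
    if tags[0] != "pronoun" or tags[1] != "verb" or tags[-1] != "adjective":
        return False
    return all(t == "adverb" for t in tags[2:-1])


def parse_with_repair(text):
    """Try the input as written, then retry with one confusable word swapped.

    Returns (tokens, correction). correction is None if no change was needed,
    otherwise (position, original_word, replacement). Returns (None, None)
    if nothing parses.
    """
    tokens = text.split()
    if can_parse(tokens):
        return tokens, None
    for i, word in enumerate(tokens):
        for alt in CONFUSABLES.get(word.lower(), []):
            candidate = tokens[:i] + [alt] + tokens[i + 1:]
            if can_parse(candidate):
                return candidate, (i, word, alt)
    return None, None


tokens, correction = parse_with_repair("You are to fast")
print(tokens, correction)
```

The returned correction is what would let the system tell the user which word it assumed they meant.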
So to answer your question of proper English versus understanding the user: it still needs to be based on grammar, since the system has to know how far off the user is from each proper English grammar rule. The rule he/she was closest to could be what the user really meant.