Thank you, Fatima….. a h*** of a lot of work has gone into this project so far.
Actually, Grace would be extremely capable linguistically by now, but as I mentioned, I’m on the third engine rewrite (two versions in Perl, one in early ‘09 and one in early ‘10, then the C++ rewrite, which started in November ‘10).
I will be tracking Grace’s progress, and very detailed test results like those shown in previous posts, using Google Docs, to which I can grant access for anyone who is interested in really following her progress closely. Later, when I ‘hook her up’ to the internet, perhaps some of you will want to train her. She is not that far away from learning new words and phrases by natural language.
The areas of progress, with test results, that will be tracked on Google Docs are:
1) Grammatical Knowledge….. that is, how a given input could be explained in grammatical structure, of course by all valid grammar constructs (yes, sometimes combinatorial explosions).
2) World Knowledge—this is data Grace uses to promote some parses over others. This of course is a huge job. I will track this in a separate spreadsheet, perhaps on Google Docs.
3) Question answering logic—very generic and flexible routines to do a semantic delta and find the closest fact parse tree to answer questions. This is where, when Grace decides an input string is the parse tree of a fact, or the parse tree representing a question, she knows what to do (how to go about choosing the corresponding parse tree that is a fact which can supply the data requested in the question… true natural language query).
4) Inference - initially, I’ll provide code that will allow Grace to take one or more natural language sentences and instantiate a logical argument to reach a conclusion, which will be in the form of a natural language statement. So this is where, when Grace is asked a question and she doesn’t find any fact parse tree (*.pt file in her KB directory), she will try to generate it… she will find a logic module that has a conclusion of the same form as the desired fact. The logic module will indicate the number and types of propositions required. The inference then will be within the logic of the module. The cool thing will be, when that logic module (call it “LM-1”) requires a proposition, the KB will be consulted first (again, *.pt files in her KB directory). If that proposition (as a ‘fact’ parse tree) doesn’t exist, what will she do? You guessed it… try to find another LM which would generate a conclusion of the form of the required proposition of the first LM… this will go on recursively until she returns to the original LM, hopefully with all required propositions, and deduces the response. I won’t be working on this until probably late next year though… (God, to be able to work on this full time is my dream!!). So the goal here is to have something like forward chaining and backward chaining, but not in representations like F.O.L.—in full, rich NL.
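To make the recursive logic-module idea concrete, here is a toy sketch in Python. Everything here is hypothetical and illustrative: the names (`LogicModule`, `derive`, `kb`) are mine, and plain strings stand in for Grace’s actual fact parse trees (*.pt files).

```python
# Toy sketch of backward chaining over "logic modules" in NL form.
# Strings stand in for fact parse trees; names are illustrative only.

class LogicModule:
    def __init__(self, name, premises, conclusion):
        self.name = name
        self.premises = premises      # forms of the required propositions
        self.conclusion = conclusion  # form of the fact this module can produce

def derive(goal, kb, modules, depth=0, max_depth=10):
    """Return True if `goal` is already in the KB or can be deduced."""
    if goal in kb:                    # fact already stored -> done
        return True
    if depth >= max_depth:            # guard against runaway recursion
        return False
    for lm in modules:
        if lm.conclusion == goal:     # module concludes the needed form
            # recursively try to satisfy each required proposition,
            # consulting the KB first, then other logic modules
            if all(derive(p, kb, modules, depth + 1, max_depth)
                   for p in lm.premises):
                kb.add(goal)          # cache the newly deduced fact
                return True
    return False

# toy KB of "facts" (stand-ins for parse trees in the KB directory)
kb = {"Socrates is a man", "All men are mortal"}
modules = [LogicModule("LM-1",
                       ["Socrates is a man", "All men are mortal"],
                       "Socrates is mortal")]
print(derive("Socrates is mortal", kb, modules))  # True
```

In a real engine the `lm.conclusion == goal` test would be a structural match between parse trees rather than string equality, but the recursion pattern is the same one described above.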
I have still more plans for inference, but I’m not going to torture myself… since I won’t even be able to start on that type of functionality until, as I say, perhaps late next year… for one person, working on weekends, I think items 1 - 3 above are enough.
I see the first job for Grace being an extremely flexible knowledge base query. Natural language will offer flexibility that conventional SQL database type applications can’t provide. Take just a very simple example—storing telephone numbers. This is just a very simple example to get the idea across. I want to store telephone numbers, so let me create a table. What columns do I need? Well, name of person, say 30 characters, and a second column, telephone number. So I enter a bunch of numbers, but then, oh, I need to indicate if the phone number is a cell or landline… oh… I have to go into my DB and issue an “ALTER TABLE” statement to add a column. Use the app for a while… oh no… I realize I should indicate if the given number is the person’s work number or home number… use the app for a while again… oh no… I realize I should differentiate again between x and y,… and on and on we go.
Now, imagine the NL solution….
user: Computer…. John’s phone number is 111-2222.
computer: OK
user: Quick… what is John’s cell phone number???
computer: Well, I don’t know if it is his *CELL* phone number, but you told me his phone number was 111-2222.
user: oh yeah, that’s his cell, thanks.
< system updates its DB itself, using the word “cell” as an adjective modifying “phone number”, and adjusts the applicable “fact tree” parse in its KB >
user: Computer, what is John’s WORK cell phone number?
computer: Hmm, well, you told me his cell phone number was 111-2222, but you didn’t indicate if that was his WORK cell or not.
user: no, that is his personal cell number
computer: OK, well, sorry, I don’t know then.
< system, although it couldn’t supply an answer, has learned something: that 111-2222 is John’s PERSONAL CELL number, and updates its KB >
* * *
So… you get the idea. Instead of manually updating the SQL DB, the bot takes care of that “house work”. I intend to have all of Grace’s knowledge stored as parse trees; that way, no details are lost.
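The refine-as-you-go behavior in the dialog above can be sketched in a few lines. This is a deliberately crude stand-in: Grace would store and match full parse trees, whereas here hypothetical (person, attribute-phrase) tuples and a word-subset test stand in for tree matching; the names `tell` and `ask` are mine.

```python
# Toy sketch of NL fact storage with adjective refinement.
# Tuples stand in for Grace's fact parse trees; names are illustrative.

facts = {}  # (person, attribute-phrase) -> value

def tell(person, attribute, value):
    facts[(person, attribute)] = value

def ask(person, attribute):
    # exact match first: the stored fact is as specific as the question
    if (person, attribute) in facts:
        return facts[(person, attribute)], True
    # otherwise find a less specific stored fact whose words are a subset
    # of the question's words ("phone number" matches "cell phone number")
    q_words = set(attribute.split())
    for (p, a), v in facts.items():
        if p == person and set(a.split()) <= q_words:
            return v, False  # value known, but specificity unconfirmed
    return None, False

tell("John", "phone number", "111-2222")
print(ask("John", "cell phone number"))
# -> ('111-2222', False): "I don't know if it's his CELL number, but..."

# after the user confirms, the fact is refined with the extra adjectives
tell("John", "personal cell phone number", "111-2222")
print(ask("John", "personal cell phone number"))  # -> ('111-2222', True)
```

The `False` flag is what would trigger the hedged “well, I don’t know if it is his *CELL* number, but…” style of answer, and each confirmation from the user tightens the stored attribute—exactly the schema evolution that would have required an ALTER TABLE in the SQL version.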
So, before we can have a dialog like this with Grace, I need to ‘buckle down’ for the next several months and supply her with enough grammar knowledge. (Well, we can right now to a degree: as in the previous posts, she can handle a variety of ‘did-<statement>’ questions, as long as, right now, you limit your topic to people going to different types of social events and going to different rooms in a house LOL… she needs to learn more about the world to “get” other types of statements, like Merlin’s “What gender” question—she’ll handle that when I give her more info about what gender means, e.g. that it has values of male and female, etc.) Boatload of work to do!!