@CR: Yes, the amount of processing required by my algorithm is immense. My first implementation was a ‘recursive descent’ approach; once the input reached 20 words, it took about 10 minutes to generate all trees - yes, unacceptable.
However, since then I have rewritten a huge portion of that part of the bot engine and speeds are now acceptable. For example, a 20-word input now takes about 3 or 4 seconds. And don’t forget, this is in a single thread. Later I will figure out how to have a main process spawn other threads to work in parallel, and run it on a quad-core CPU.
I’ll ask our ‘C, C++, C#’ expert, Chuck, how I can do the threading. Also, like you, I am using an interpreted, dynamic language (also known as a ‘scripting language’): Perl. Later I will convert to C, and I have even thought of coding some of the very CPU-intensive functions in assembly - very tight and efficient code.
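The parallel plan above can be sketched quickly. This is a minimal Python sketch (my engine is actually Perl, and the task here is just a CPU-heavy stand-in, not a real ptgen) showing a main process farming independent sub-problems out to a pool of workers, one per core on a quad-core CPU:

```python
# Sketch: a main process spawning workers to share CPU-heavy work
# across cores. `expensive_parse_step` is a hypothetical stand-in
# for one independent chunk of parse work, not actual engine code.
from multiprocessing import Pool

def expensive_parse_step(n):
    # Stand-in busy-loop for one CPU-intensive ptgen-style task.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(4) as pool:  # e.g. four workers on a quad-core CPU
        results = pool.map(expensive_parse_step, [10_000] * 4)
    print(len(results))  # 4 results, computed in parallel
```

The catch is that the sub-problems must be independent; anything shared (like the cache described below) needs locking or per-worker copies.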
Basically I have my ‘parse tree generators’ and then a ‘driver’ to call them. The old way, recursive descent, was too CPU hungry. I basically turned the ‘ptgen driver’ algorithm upside down: now it works bottom up instead of top down. That seemed to make all the difference in the world.
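To make the top-down vs. bottom-up distinction concrete, here is a tiny CYK-style bottom-up parser sketch in Python (again, my engine is Perl; this toy grammar and its names are purely illustrative, not my actual ptgens). Instead of descending from the start symbol and re-exploring subtrees, it fills a chart from the words upward, so each span is computed once:

```python
# Bottom-up (CYK-style) parsing sketch. Grammar and lexicon are
# hypothetical toy examples, not the bot's real rules.

# Binary rules in Chomsky normal form: (rhs1, rhs2) -> lhs
RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

def parse(words):
    n = len(words)
    # chart[i][j] = set of categories covering words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEXICON[w])
    # Combine smaller spans into larger ones: strictly bottom up.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for a in chart[i][k]:
                    for b in chart[k][j]:
                        if (a, b) in RULES:
                            chart[i][j].add(RULES[(a, b)])
    return chart[0][n]

print(parse("the dog saw the cat".split()))  # {'S'}
```

The key property is that no span is ever analyzed twice, which is exactly what the recursive-descent version kept doing.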
Another very important thing I make extensive use of is caching. Often, two parse tree generators will be trying to figure out the same thing, so it is very important to cache that. The first thing a ptgen (parse tree generator) does is check whether the answer to its question is already in the cache; if not, it figures the answer out and puts it in the cache, so the next ptgen doesn’t have to re-derive it.
The cache is cleared on the next user input, or if you tell the bot a new word or edit a word’s definition.
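The caching scheme above amounts to memoization with explicit invalidation. A minimal Python sketch (all names here - `ptgen_cache`, `cached_ptgen`, the key format - are hypothetical, not my actual Perl code):

```python
# Sketch of the ptgen result cache: look up before computing,
# store after, clear on new input or lexicon edits.

ptgen_cache = {}
calls = {"count": 0}  # counter just to show repeated work being skipped

def cached_ptgen(name, span, compute):
    """First thing a ptgen does: check the cache for its question."""
    key = (name, span)
    if key not in ptgen_cache:
        calls["count"] += 1          # only pay the cost on a miss
        ptgen_cache[key] = compute()
    return ptgen_cache[key]

def clear_cache():
    """Called on each new user input, or when a word/def changes."""
    ptgen_cache.clear()

# Two ptgens asking the same question: the second is a cache hit.
cached_ptgen("NP", (0, 2), lambda: "noun phrase")
cached_ptgen("NP", (0, 2), lambda: "noun phrase")
print(calls["count"])  # 1

clear_cache()  # new user input invalidates everything
cached_ptgen("NP", (0, 2), lambda: "noun phrase")
print(calls["count"])  # 2
```

Clearing wholesale on any lexicon edit is the blunt-but-safe choice: an edited definition could change any cached answer, and a full rebuild is cheap relative to parsing.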
Something else to remember - computers are still using the ‘Von Neumann architecture’, which sure wasn’t designed for something like NLP, but rather for ‘programmable calculators’. There may be no way to do this without employing several CPUs, perhaps racks of CPUs… *or* some very clever, efficient ‘serial’ algorithm designs. Perhaps an NLP algorithm could be implemented as an ASIC (application-specific integrated circuit).
@Erwin - that may not be a bad idea.