First, I should be clear. Skynet-AI does not use Part-Of-Speech (POS) tagging.
I believe ChatScript does, but you would have to ask Bruce about how he uses it.
Some may find it interesting why I don’t use it.
I looked at using POS tagging in a semantic pipeline to aid understanding. My goal for an AI is Natural Language Understanding (NLU): making sense of the input phrase. In such a system the POS tags would be fed to downstream tasks. Some popular taggers include:
- Brill Tagger
- Stanford Part-of-Speech Tagger
- NLTK Tagger
POS Tagger Limitations
In Skynet-AI, the AI interpreter is downloaded each time a user signs on. This constrains practical resource size. Most taggers rely on a dictionary, which adds significant overhead; a standard tagger is larger than all of Skynet-AI. I did create my own real-time tagger with a minimal dictionary and rules. It looked promising, but ultimately was not part of Skynet-AI’s design. If there is enough interest, I could freshen it up and put a demo online.
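To give a feel for the approach, here is a minimal sketch of a dictionary-plus-rules tagger. The lexicon entries, suffix rules, and tag names below are illustrative only, not Skynet-AI's actual tagger:

```python
# Minimal sketch: tiny lexicon lookup first, then suffix rules,
# then a noun default (nouns are the largest open class).
# All entries and rules here are illustrative, not Skynet-AI's.

LEXICON = {
    "the": "DET", "a": "DET", "i": "PRON", "you": "PRON",
    "is": "VERB", "are": "VERB", "dog": "NOUN",
}

def tag(token):
    """Tag a single token using lexicon, then suffix rules, then a default."""
    word = token.lower()
    if word in LEXICON:
        return LEXICON[word]
    if word.endswith("ly"):
        return "ADV"
    if word.endswith("ing") or word.endswith("ed"):
        return "VERB"
    if word.isdigit():
        return "NUM"
    return "NOUN"  # default guess

def tag_sentence(text):
    return [(t, tag(t)) for t in text.split()]

print(tag_sentence("the dog is running quickly"))
# [('the', 'DET'), ('dog', 'NOUN'), ('is', 'VERB'),
#  ('running', 'VERB'), ('quickly', 'ADV')]
```

The appeal of this style is size: a few dozen lexicon entries plus a handful of rules fit easily in a downloaded interpreter, at the cost of accuracy on anything the rules don't cover.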
In his paper, Manning gives a few facts about the accuracy of tagging. Current state of the art is about 97% per token (about the same as a human), but whole-sentence accuracy, the chance that a sentence contains no tagging errors at all, drops dramatically to about 56%. Since the goal is NLU of the input phrase, I am more concerned about the 56%. That would mean roughly every other sentence contains an error. This led me to conclude that humans have some fuzzy mechanism in the process that allows them to account for or ignore these errors and still understand the input.
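The gap between those two numbers follows from compounding per-token accuracy across a sentence. A back-of-the-envelope check, assuming independent errors and roughly 20 tokens per sentence (both simplifying assumptions):

```python
# If each token is tagged correctly with probability 0.97 and errors
# were independent, a 20-token sentence is fully correct with
# probability 0.97 ** 20.
per_token = 0.97
tokens = 20  # rough average sentence length (an assumption)
sentence_acc = per_token ** tokens
print(round(sentence_acc, 2))  # ~0.54, close to the ~56% figure
```

So the ~56% whole-sentence figure is not a surprise; it is roughly what 97% per-token accuracy implies.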
Manning gives a number of reasons for the POS errors:
Frequency of different POS tagging error types:

Class                          Frequency
1. Lexicon gap                     4.5%
2. Unknown word                    4.5%
3. Could plausibly get right      16.0%
4. Difficult linguistics          19.5%
5. Underspecified/unclear         12.0%
6. Inconsistent/no standard       28.0%
7. Gold standard wrong            15.5%
My analysis showed that many of these problems would be compounded by “text speak”, with its typos and shorthand phrases. As a result I took a different approach with Skynet-AI, which has worked out well.
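A quick illustration of why text speak hurts: a lexicon built from standard English misses most shorthand tokens, inflating the "unknown word" and "lexicon gap" error classes above. The toy lexicon here is purely illustrative:

```python
# Illustrative only: a lexicon built from standard English
# misses nearly all "text speak" tokens.
LEXICON = {"you", "are", "great", "see", "later", "the"}

def unknown_rate(tokens):
    """Fraction of tokens not found in the lexicon."""
    misses = [t for t in tokens if t not in LEXICON]
    return len(misses) / len(tokens)

standard = "you are great see you later".split()
textspeak = "u r gr8 c u l8r".split()

print(unknown_rate(standard))   # 0.0
print(unknown_rate(textspeak))  # 1.0
```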
I have also been interested in universal parts of speech from the standpoint of intelligence/NLU. The basics may be: nouns, verbs, adverbs, pronouns, determiners/articles, numbers, punctuation. You might also add tags for conjunctions, particles, and an “other” category (abbreviations, etc.) according to the paper. As the size of a bot grows, and you are looking to handle ever larger numbers of responses, one of the areas where POS tagging may help is Natural Language Generation (NLG).
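The universal-tag idea above amounts to collapsing a fine-grained tagset into a small coarse one. A sketch using a few Penn Treebank tags as input (the mapping below is a tiny illustrative subset, not the full published table):

```python
# Sketch: collapse fine-grained Penn Treebank tags into a small
# universal set like the one described above. Illustrative subset only.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV",
    "PRP": "PRON", "DT": "DET", "CD": "NUM",
    "CC": "CONJ", "RP": "PRT", ".": ".",
}

def to_universal(ptb_tag):
    # Anything unmapped falls into a catch-all "X" (other) class
    return PTB_TO_UNIVERSAL.get(ptb_tag, "X")

print(to_universal("NNS"))  # NOUN
print(to_universal("FW"))   # X
```

For NLG, a coarse tagset like this is often enough: choosing whether a slot needs a noun or a verb does not require the fine distinctions a full treebank tagset makes.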
Other related areas I found interesting:
Link Grammar
Using a Neural Net to learn POS
Neural word vectors/Deep learning for NLP