From the class syllabus:
Week Three: NLP Tasks and Text Similarity
Week Three will cover Vector Semantics, Text Similarity, and Dimensionality Reduction. I will also go through a long list of sample NLP tasks (e.g., Information Extraction, Text Summarization, and Semantic Role Labeling) and introduce each of them briefly.
Reading this, I was thinking:
I want to enter a text, then compare subsequent texts to it to see whether they are similar enough to warrant the same response.
Count the words in each text and report the similarity as the number of words they share?
Suppose there is an array of reference texts, each linked to a potential output text.
For each word in the new input:
If that word is in the reference text, add 1 to the similarity total.
If the word shares an equivalence relation with a word in the reference text, add 1 to the similarity total as well.
[Note: equivalence relations can be stored in and retrieved from http://subbot.org/logicagent/.]
Store the final similarity total for that reference text, go on to the next reference text, and repeat the steps above.
Choose the reference text with the highest similarity total to generate an output.
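The steps above can be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions: the function names (tokenize, similarity, best_response) are made up for illustration, the equivalence relations are modeled as a plain dict mapping a word to a set of equivalent words (a stand-in, not the actual logicagent interface), and a word that matches both exactly and by equivalence is counted once rather than twice.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def equivalents(word, equivalences):
    """Return the word plus everything declared equivalent to it.

    `equivalences` is a hypothetical dict: word -> set of equivalent words.
    """
    return {word} | equivalences.get(word, set())

def similarity(input_text, reference_text, equivalences=None):
    """Count input words that appear in the reference text, exactly or by equivalence."""
    equivalences = equivalences or {}
    ref_words = set(tokenize(reference_text))
    # Collect every reference word together with its equivalents.
    ref_equiv = set()
    for ref_word in ref_words:
        ref_equiv |= equivalents(ref_word, equivalences)

    score = 0
    for word in tokenize(input_text):
        if word in ref_words:
            score += 1  # exact match
        elif equivalents(word, equivalences) & ref_equiv:
            score += 1  # match through an equivalence relation
    return score

def best_response(input_text, references, equivalences=None):
    """Pick the output linked to the highest-scoring reference text.

    `references` is a list of (reference_text, output_text) pairs.
    Returns (score, output_text); ties go to the earliest reference.
    """
    scored = [(similarity(input_text, ref, equivalences), out)
              for ref, out in references]
    return max(scored, key=lambda pair: pair[0])

if __name__ == "__main__":
    references = [
        ("how is the weather today", "It is sunny."),
        ("what time is it", "Check the clock."),
    ]
    equivalences = {"climate": {"weather"}}
    print(best_response("what is the climate today", references, equivalences))
```

Because the score is a raw count, longer inputs and longer reference texts tend to score higher; dividing by the number of input words would be one way to compare scores across texts of different lengths.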
---
Reference texts are trained on, or simply copied and pasted from, the parent posts I replied to, say. Each of my responses is linked to the reference text that provoked it.
Run the similarity algorithm on all the initial reference texts to see if some can be consolidated. (How to “consolidate”?)
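One possible reading of “consolidate”, offered only as a sketch: group together reference texts whose pairwise similarity is above some threshold, so each group can share one response. The code below assumes a normalized word overlap (Jaccard similarity) in place of the raw count, so that long texts do not dominate; the function names and the 0.6 threshold are arbitrary choices for illustration.

```python
import re

def words(text):
    """Return the set of distinct lowercase words in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Shared words divided by total distinct words, between 0.0 and 1.0."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def consolidate(reference_texts, threshold=0.6):
    """Greedily group texts that are similar to a group's first member."""
    groups = []
    for text in reference_texts:
        for group in groups:
            # Compare against the first text in the group only.
            if jaccard(text, group[0]) >= threshold:
                group.append(text)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([text])
    return groups

if __name__ == "__main__":
    texts = [
        "how is the weather today",
        "how is the weather today then",
        "what time is it",
    ]
    for group in consolidate(texts):
        print(group)
```

The grouping is greedy and depends on the order of the texts, so it is a starting point rather than a proper clustering method.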
---
Just some thoughts provoked by looking at the syllabus. Does the class provide any useful plug-and-play programs I could drop into my app to give me a similarity score between two texts?