

Google’s open source offering of SyntaxNet technology that uses TensorFlow for NLU
 
 

In the following article the author announces Google’s open source offering of SyntaxNet technology that uses TensorFlow for NLU:
Google’s Parsey McParseface is a serious IQ boost for computers

SyntaxNet

How does this help with actual understanding? Is TensorFlow only being used to more accurately label POS and functional roles? If we have a tool that is 100% accurate in labeling the POS and a word’s role in a sentence, then what? What do we do with a correctly tagged and labeled parse tree? How do we use that to interface with an ontology or world knowledge and reason about the user input in order to provide an insightful response based on the user’s or the chatbot’s goals?

I am not seeing how statistical training and neural nets are providing real understanding in AI beyond linguistic POS and role tagging.  Perhaps their use in sentiment analysis starts to cross over into some sort of semantic interpretation or understanding.

Are there any tools that actually address extracting meaning from the sentences or match the input to conceptual knowledge beyond syntax and a word’s function in a sentence? A tool that would make it easier than, say, hardcoding each use case in a scripting language depending on the author’s needs?

I know that ChatScript at least tags words with concepts that are user defined in addition to linguistic concepts such as “Verb”, “Noun”, “Main Subject”, “Main Object”, etc. It also uses WordNet, and you can hierarchically define concepts and facts so that your chatbot has some ability to generalize and not every case has to be hardcoded.
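To make the idea of hierarchically defined concepts concrete, here is a minimal Python sketch. It is not ChatScript syntax or its API: the `CONCEPTS` table, the `~` naming, and the helpers `members` and `is_a` are all hypothetical, chosen only to show how one rule written against a broad concept can cover many words.

```python
# Hypothetical concept table: a concept maps to words and/or other concepts.
# The "~name" convention is only an illustration, not real ChatScript code.
CONCEPTS = {
    "~animal": {"~mammal", "~bird"},
    "~mammal": {"cow", "dog"},
    "~bird": {"sparrow"},
}


def members(concept, concepts=CONCEPTS, _seen=None):
    """Return every plain word reachable from a concept, following nesting."""
    seen = set() if _seen is None else _seen
    if concept in seen:  # guard against cyclic definitions
        return set()
    seen.add(concept)
    words = set()
    for item in concepts.get(concept, ()):
        if item in concepts:  # nested concept: expand it recursively
            words |= members(item, concepts, seen)
        else:  # plain word
            words.add(item)
    return words


def is_a(word, concept):
    """True if the word belongs to the concept, directly or through nesting."""
    return word in members(concept)


print(is_a("cow", "~animal"))
print(is_a("grass", "~animal"))
```

A rule keyed on `~animal` then matches "cow", "dog" and "sparrow" without listing each word separately.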

Having a tool that can correctly parse a sentence is a good start, but are there any tools that assist with actual semantic reasoning that are already integrated within a NLU framework?

 

 
  [ # 1 ]

As far as I’ve read their explanation (I don’t read too closely when something’s called “Parsey McParseface”), the neural network is only used to deal with ambiguity. Seeing as this is Google, I assume this means it’ll remember common word combinations from statistical training, and will assume the more common combinations are more likely meant than the less common ones. For instance, in their example “Alice drove down the street in her car”, it will likely never have come across mention of “street in car”, but will more often have read “drove in car”, and so would assume the location applies to the verb/subject. This sort of thing is vital to extract facts correctly. You’d want a program to extract “Alice drove in car” rather than “street was in car”. The strength of a statistical approach is that it’s most often right. The weakness is that it’s always wrong about statistically uncommon situations (see also the Winograd Schema Challenge).
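A toy Python sketch of that guess about frequency-based attachment follows. It is a deliberate simplification and not how SyntaxNet is implemented; the counts in `COUNTS` are invented for illustration, and `attach_pp` is a hypothetical helper.

```python
from collections import Counter

# Invented co-occurrence counts of (head, preposition, object).
# These numbers do not come from any real corpus.
COUNTS = Counter({
    ("drove", "in", "car"): 120,
    ("street", "in", "car"): 0,
})


def attach_pp(verb, noun, prep, pp_object, counts=COUNTS):
    """Attach the prepositional phrase to whichever head was seen with it more often.

    Ties (including the case where neither combination was ever seen)
    fall back to the verb.
    """
    verb_score = counts[(verb, prep, pp_object)]  # Counter yields 0 for unseen keys
    noun_score = counts[(noun, prep, pp_object)]
    return "verb" if verb_score >= noun_score else "noun"


# "Alice drove down the street in her car": "in car" goes with "drove".
print(attach_pp("drove", "street", "in", "car"))

# An unseen combination just gets the default, which is wrong for
# "ate pizza with anchovies" (the anchovies belong to the pizza).
print(attach_pp("ate", "pizza", "with", "anchovies"))
```

The second call shows the weakness mentioned above: with no statistics to go on, the sketch can only fall back to a default.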

Grammar parsing is always a first step towards understanding, a preprocessing phase. Once you can identify main verbs and their subjects, you can at least extract “subject-verbs-object” fact triples from the sentence, which you could then store in a knowledge database or check with an ontology (if it’s a question about whether a subject verbs the object, for instance; whether “a brown cow in winter eats grass in the morning” = cow-eat-grass). But SyntaxNet doesn’t do that for you, nor does it offer a tie-in to WordNet like ChatScript does. So the parser’s function doesn’t differ from any other grammar parser, save for a little extra disambiguation. The semantics and fact extractions are still something you have to do yourself.
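A rough Python sketch of that do-it-yourself step, run on the cow example. The parse is written out by hand rather than produced by a parser, the labels (`nsubj`, `dobj`, `prep`, `pobj`) are assumed to follow Stanford-style dependency conventions, and the lemmas ("eat" for "eats") are assumed to come from a separate lemmatizer. `Token`, `extract_triples` and `KNOWN_FACTS` are hypothetical names.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int    # 1-based position in the sentence
    lemma: str  # base form of the word (assumed to be supplied separately)
    head: int   # idx of the governing token, 0 for the root
    dep: str    # dependency label


# Hand-written parse of "a brown cow in winter eats grass in the morning".
PARSE = [
    Token(1, "a", 3, "det"),
    Token(2, "brown", 3, "amod"),
    Token(3, "cow", 6, "nsubj"),
    Token(4, "in", 3, "prep"),
    Token(5, "winter", 4, "pobj"),
    Token(6, "eat", 0, "ROOT"),
    Token(7, "grass", 6, "dobj"),
    Token(8, "in", 6, "prep"),
    Token(9, "the", 10, "det"),
    Token(10, "morning", 8, "pobj"),
]


def extract_triples(tokens):
    """Collect (subject, verb, object) for every head that has both roles."""
    triples = []
    for head in tokens:
        subjects = [t.lemma for t in tokens if t.head == head.idx and t.dep == "nsubj"]
        objects = [t.lemma for t in tokens if t.head == head.idx and t.dep == "dobj"]
        triples.extend((s, head.lemma, o) for s in subjects for o in objects)
    return triples


# Stand-in for a knowledge database or ontology lookup.
KNOWN_FACTS = {("cow", "eat", "grass")}

triples = extract_triples(PARSE)
print(triples)
print(all(t in KNOWN_FACTS for t in triples))
```

Modifiers such as "brown", "in winter" and "in the morning" are simply ignored here; deciding which of them matter is part of the semantics the parser leaves to you.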

 

 
  [ # 2 ]

It is true that Google’s SyntaxNet doesn’t handle the semantics or any disambiguation.

There are other problems too. If a client of SyntaxNet does better entity recognition than Google does, SyntaxNet can’t properly accept that higher-quality entity recognition. For example, a domain-specific application might be able to do better entity recognition. So how can SyntaxNet incorporate this? It can’t. A Googler said as much in response to a post from someone who had this problem: basically, it was necessary to rewrite the text before SyntaxNet got it, with the rewrite reflecting that superior entity recognition.
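One way such a rewrite could look is sketched below in Python. The exact rewrite the Googler had in mind is not stated, so joining each recognized entity into a single underscore-separated token is purely an assumption, as are the `DOMAIN_ENTITIES` table and the `rewrite_entities` helper.

```python
import re

# Hypothetical output of a domain-specific entity recognizer:
# surface form -> single-token replacement.
DOMAIN_ENTITIES = {
    "New York": "New_York",
    "New York Stock Exchange": "New_York_Stock_Exchange",
    "Parsey McParseface": "Parsey_McParseface",
}


def rewrite_entities(text, entities=DOMAIN_ENTITIES):
    """Replace each known entity with one token before the text reaches the parser."""
    # Longest surface forms first, so "New York Stock Exchange" wins over "New York".
    surfaces = sorted(entities, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(s) for s in surfaces) + r")\b")
    return pattern.sub(lambda match: entities[match.group(0)], text)


print(rewrite_entities("Parsey McParseface parsed news about the New York Stock Exchange"))
```

The parser then sees each entity as one unit, at the cost of having to map the placeholder tokens back to the original text afterwards.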

So yes, SyntaxNet has its limitations.

But I would be slow to criticize it for not providing semantics. Frankly, I don’t know of any system that does. I still think it is up to the application to figure out how it can extract semantics from natural-language utterances and make some kind of sense of them.

In many cases, I expect that extraction to be pretty primitive.

Yes, we need much better methods of extracting semantics.

 

 
  [ # 3 ]

Hi Alaric and others. Extracting meaning with respect to a world model is the purpose of Narwhal (“narrow world language processing”).

Please see: https://github.com/peterwaksman/Narwhal

It is probably not “ready for prime time” but it might be worth a look. Currently we are polishing and adding documentation.

The real barrier to use is how to discover the right keywords and topic categories - for a selected “narrow world”. You can do it manually and learn a lot about your subject in the process but I believe that is where POS should be deployed - as a way of bootstrapping some keywords into a mechanism that discovers others. In other words - start with meanings then work backwards towards the parts of speech.

If you are interested, drop me a line.

 

 

 
  [ # 4 ]

Hi Peter,

I took a brief look at Narwhal.

Narwhal does, at least, attempt to find meaningful patterns in text. That’s a good thing, even though the patterns have to be declared explicitly.

What would be interesting is if there were a way of finding examples of extracted semantic information in a large text corpus. If that existed, then such patterns might be discovered through machine learning. That would scale much more nicely.

I am not personally aware of any such text corpus.

 

 

 

 

 
  [ # 5 ]

Good point. I think there might be such a corpus within Berkeley’s “FrameNet”.

 

 