Posted: Feb 18, 2013 [ # 16 ]
Member
Total posts: 16
Joined: Feb 16, 2013
@Dave - I thought there would be something like that already available. The way I see it, everything can be represented as a concept, and even though the textual representation differs between languages, the meaning would still be the same. Maybe it isn’t quite that way in other languages; I am only fluent in English, so I wouldn’t know.
@Andrew - You have links for everything, don’t you? That’s awesome! You wouldn’t happen to know of any open-source projects using that sort of system, or something similar, would you?
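To make the “everything as a concept” idea a bit more concrete, here is a minimal Python sketch of what such a store might look like. The class and field names (Concept, labels, relations) are made up for illustration; the point is just that the identifier and the relations are language-neutral, while the surface text hangs off per-language labels.

    # Minimal sketch of a language-neutral concept: the identifier and the
    # relations are language-independent, only the labels are per-language.
    class Concept:
        def __init__(self, concept_id):
            self.id = concept_id      # language-independent identifier
            self.labels = {}          # language code -> list of surface forms
            self.relations = []       # (relation name, other concept id)

        def add_label(self, lang, text):
            self.labels.setdefault(lang, []).append(text)

        def add_relation(self, name, other_id):
            self.relations.append((name, other_id))

    # The same concept, rendered differently per language:
    water = Concept("water")
    water.add_label("en", "water")
    water.add_label("de", "Wasser")
    water.add_label("nl", "water")
    water.add_relation("is_a", "liquid")
    print(water.labels["de"])         # ['Wasser']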

Posted: Feb 18, 2013 [ # 17 ]
Administrator
Total posts: 3111
Joined: Jun 14, 2010
Andrew is a veritable cornucopia of useful knowledge and wisdom, and has provided not only me but the entire community with very valuable insights and information.
As for whether what you are looking for already exists, it’s entirely possible. Just because I’m not aware of such a thing doesn’t mean it’s not out there, somewhere.

Posted: Feb 18, 2013 [ # 18 ]
Senior member
Total posts: 697
Joined: Aug 5, 2010
By representing everything as a concept, do you mean something like this: http://bragisoft.com/2013/01/objects-and-assets-abstract-and-concrete/

Posted: Feb 18, 2013 [ # 19 ]
Member
Total posts: 16
Joined: Feb 16, 2013
Interesting read. Yes, that is what I meant. It sounds like it could work well in theory, but I don’t know about in practice. It must not be too bad, though, if it has already been implemented…

Posted: Feb 18, 2013 [ # 20 ]
Senior member
Total posts: 697
Joined: Aug 5, 2010
I’ve tested this on WordNet, which poses no problems in this area.

Posted: Feb 18, 2013 [ # 21 ]
Member
Total posts: 16
Joined: Feb 16, 2013
That is good to know. Since you have more experience with this idea than I, is it something that you think would work well as a general knowledge base of facts and their relationships?

Posted: Feb 18, 2013 [ # 22 ]
Senior member
Total posts: 697
Joined: Aug 5, 2010
Yes. There are some things to watch out for, though: it’s a bit tricky to find the right balance between clustering/grouping/indexing the data for faster searching, streaming between memory and disc for very large data sets, and processing speed.
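To illustrate the memory/disc side of that balance, here is a rough Python sketch (the file name and record format are invented): keep only a small offset index in RAM and stream the matching records from disc, rather than loading the whole data set.

    # Toy illustration of the memory/disc trade-off: a small in-memory index of
    # byte offsets over an on-disk fact file, so only matching records are read.
    # The file format (one "subject|relation|object" line per fact) is made up.
    def build_offset_index(path):
        """Map each subject to the byte offsets of its lines (small, stays in RAM)."""
        index = {}
        with open(path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                subject = line.split(b"|", 1)[0].decode("utf-8")
                index.setdefault(subject, []).append(offset)
        return index

    def lookup(path, index, subject):
        """Stream only the records for one subject from disc."""
        results = []
        with open(path, "rb") as f:
            for offset in index.get(subject, []):
                f.seek(offset)
                results.append(f.readline().decode("utf-8").rstrip("\n"))
        return results

    # Example (assuming a facts.txt with lines like "water|is_a|liquid"):
    # index = build_offset_index("facts.txt")
    # print(lookup("facts.txt", index, "water"))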

Posted: Feb 18, 2013 [ # 23 ]
Experienced member
Total posts: 55
Joined: Mar 21, 2011
If multi-lingual representation is important to you, take a look at ConceptNet (http://conceptnet5.media.mit.edu/). It has good multi-lingual links between concepts.
For open-source NLP, an alternative to OpenNLP is Stanford NLP. I haven’t compared them, but the Stanford parser and named-entity recognizer are fairly well developed and documented (in Java). The parser is available online, so you can test it. Be warned, though: they are memory-hungry. If anyone has compared these two, I would be interested to know.
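For anyone who wants to poke at the ConceptNet suggestion above programmatically, here is a rough Python sketch of hitting its REST API. The URL layout and the JSON field names ("edges", "rel", "start", "end") are based on the 5.x API and may differ between versions, so treat them as assumptions.

    # Rough sketch of querying the ConceptNet 5 REST API for a concept's edges.
    # The URL layout and JSON field names are assumptions about the 5.x API.
    import json
    import urllib.request

    def conceptnet_edges(term, lang="en"):
        url = "http://conceptnet5.media.mit.edu/data/5.1/c/%s/%s" % (lang, term)
        with urllib.request.urlopen(url) as response:
            data = json.loads(response.read().decode("utf-8"))
        return [(e.get("rel"), e.get("start"), e.get("end"))
                for e in data.get("edges", [])]

    # for rel, start, end in conceptnet_edges("water"):
    #     print(rel, start, end)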

Posted: Feb 18, 2013 [ # 24 ]
Member
Total posts: 16
Joined: Feb 16, 2013
At first it won’t matter very much to me. However, if at some point in the future I decide to add multi-lingual capabilities, I don’t want to be hindered because I did not plan for it adequately and then have to restructure most of the code or database because of it.

Posted: Feb 18, 2013 [ # 25 ]
Administrator
Total posts: 3111
Joined: Jun 14, 2010
Chad, I’m going through that exact challenge with Program O version 3. It seems that supporting languages other than English involves a bit more than deciding which character set to use.

Posted: Feb 20, 2013 [ # 26 ]
Senior member
Total posts: 473
Joined: Aug 28, 2010
@Chad - All the links that I post are for open-source software. One of the links that I gave you is actually for a catalog of many different language generation systems.
However, today I believe I have struck gold. Take a look at the Grammatical Framework. It is a high-level language for writing grammars and supports operations such as parsing and generation. It is completely open source, and it already comes with grammar libraries for no less than thirty different natural languages! Even if they are just basic libraries, they will serve as an excellent starting point for numerous projects.
http://www.grammaticalframework.org/

Posted: Feb 22, 2013 [ # 27 ]
Senior member
Total posts: 623
Joined: Aug 24, 2010
Neat thread! And great links, Andrew.
I don’t think phrase recognition can be achieved in a vacuum; it has to include the context of what’s being said. A lookup list works around this because the creator of the list essentially decides the most common context for, using Chad’s example, “United States” or “Atlantic Ocean”. This context is then assumed for all user input.
However, if you want the bot to determine for itself whether a phrase represents a single object, then the context must come from the bot as well. In my own project, I employ a full parsing scheme. Ambiguous noun phrases are only resolved if one grouping leads to a “more complete” parse than another grouping. Parses are also weighted according to what phrases they contain: phrases that have been encountered before are considered more likely to be correct than new constructions. In this way, the bot is effectively building its own database of trusted phrases.
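That phrase-weighting idea is easy to sketch, so here is a toy Python version of it (not my actual code, and the phrase lists are invented): score each candidate grouping by how many of its phrases are already trusted, and fold the winner’s phrases back into the trusted set.

    # Toy version of the weighting idea described above (not the real code):
    # groupings that reuse known phrases score higher, and the winning
    # grouping's phrases become part of the bot's trusted-phrase database.
    trusted_phrases = {"united states", "atlantic ocean"}

    def score(grouping):
        """A grouping is one way of splitting the input into phrases."""
        return sum(1 for phrase in grouping if phrase.lower() in trusted_phrases)

    def pick_grouping(candidates):
        best = max(candidates, key=score)
        trusted_phrases.update(p.lower() for p in best)
        return best

    candidates = [
        ["the", "united states", "borders", "the", "atlantic ocean"],
        ["the", "united", "states", "borders", "the", "atlantic", "ocean"],
    ]
    print(pick_grouping(candidates))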
Edit: Ooo—I haven’t posted for a little while, but it seems like the site is running pretty speedily at the moment. Nice.

Posted: Feb 22, 2013 [ # 28 ]
Administrator
Total posts: 3111
Joined: Jun 14, 2010
C R Hunt - Feb 22, 2013: Edit: Ooo—I haven’t posted for a little while, but it seems like the site is running pretty speedily at the moment. Nice.
SHHHH!!! Don’t jinx it, CR!

Posted: Feb 22, 2013 [ # 29 ]
Senior member
Total posts: 623
Joined: Aug 24, 2010
Chad J - Feb 18, 2013: @Dave - I thought there would be something like that already available. The way I see it, everything can be represented as a concept, and even though the textual representation differs between languages, the meaning would still be the same. Maybe it isn’t quite that way in other languages; I am only fluent in English, so I wouldn’t know.
Google’s translate features work from a premise similar to this. Instead of considering words and grammars, the program is trained by reading translated copies of the same material. Whatever groups of words often appear in proximity together must correlate to the same concept. The results of this approach are pretty good; far beyond what other auto-translation software has achieved, at least.
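As a toy illustration of that “words that appear together must correlate” idea (nowhere near what Google actually does, of course, and the sentence pairs are made up): count how often a word on one side of a sentence pair co-occurs with a word on the other side, and the translation candidates float to the top.

    # Toy co-occurrence counts over a few made-up sentence pairs. Real
    # statistical translation uses far more sophisticated alignment models.
    from collections import Counter

    pairs = [
        ("the house is red", "das Haus ist rot"),
        ("the house is big", "das Haus ist gross"),
        ("the ocean is big", "der Ozean ist gross"),
    ]

    cooccur = Counter()
    for en, de in pairs:
        for e in en.split():
            for d in de.split():
                cooccur[(e, d)] += 1

    # "house" pairs with "Haus" twice but with "rot" only once:
    print(cooccur[("house", "Haus")], cooccur[("house", "rot")])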
But the more I learn German, the more I am convinced that language also influences the way we think about events and therefore the concepts we use. For example, sentences in English seem more likely to be constructed around processes rather than objects and vice versa in German. So whereas an English speaker might think, “thing one actioned thing two” a German speaker would instead frame things as, “thing two, which has been actioned by thing one” (only, because it’s German, they would have one long word to represent this: “der Thingtwoactionedbythingone”).
Not all languages are so disparate—I originally studied French in school and I recall the grammatical constructions being generally similar to English (adjective placement notwithstanding).
So does thinking about processes rather than objects really constitute different “concepts”? A “concept” is such a nebulous idea anyway, but I would argue it does. For example, I once read an article about a study (found it!) that considered the effect of grammar on assigning guilt. Whether or not a language tended to use more passive or active forms of expression changed the way those witnessing an event interpreted it. Their ability to recall who performed what action depended on whether the viewer interpreted that action as intentional or not.
Anyone who’s actually bilingual want to weigh in?
Edit: Dave, I even used the “preview” feature and it worked! In good time too! Okay, I won’t say any more. But yay!

Posted: Feb 22, 2013 [ # 30 ]
Senior member
Total posts: 697
Joined: Aug 5, 2010
In my own project, I employ a full parsing scheme. Ambiguous noun phrases are only resolved if one grouping leads to a “more complete” parse than another grouping. Parses are also weighted according to what phrases they contain
That’s very similar to what I do. I first create all possible combinations that can be built from the input sentence in relation to all the known ‘phrases’ (n-grams, basically). Then the parser figures out which is the best fit against the set of known patterns.
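For what it’s worth, here is a small Python sketch of that combination step (not my implementation; the phrase list is invented): enumerate every way of segmenting the input into known multi-word phrases or single words, so a later stage can score the candidates against the known patterns.

    # Sketch of the combination step: enumerate every segmentation of the
    # input that uses known multi-word phrases (n-grams) or single words.
    known_phrases = {"united states", "atlantic ocean"}

    def segmentations(words):
        if not words:
            yield []
            return
        longest = max((len(p.split()) for p in known_phrases), default=1)
        for n in range(1, min(longest, len(words)) + 1):
            chunk = " ".join(words[:n])
            if n == 1 or chunk.lower() in known_phrases:
                for rest in segmentations(words[n:]):
                    yield [chunk] + rest

    for seg in segmentations("the united states borders the atlantic ocean".split()):
        print(seg)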
But the more I learn German, the more I am convinced that language also influences the way we think about events and therefore the concepts we use.
I think perhaps it’s the other way round (or a feedback loop). A good example is the Dutch language. It’s basically spoken in two regions: the Netherlands and part of Belgium. Though the language is the same (vocabulary and grammar are shared), the cultures are a bit different (though not much). This is often expressed in the way the language is used: the types of words that are chosen and such. I’d say that the Dutch spoken in the Netherlands is harder and more direct (usually also louder), while Belgian Dutch is more indirect, softer, perhaps a little more surreal.