Senior member
Total posts: 179
Joined: Feb 11, 2015
Hi, community.
I’m new here, and I’m also something of a novice at programming: I have a basic understanding of database normalization (MySQL) and of the general concepts of high-level programming languages (Python).
I’m here to ask for your help with implementing a ChatScript bot in Spanish, if that is possible.
I searched for open-source chatbot engines and came up with AIML, RiveScript and ChatScript. I read about their capabilities, some comparison reviews, and parts of their manuals.
I decided to go with ChatScript. I have just finished reading the “ChatScript Basic User Manual”, and it left me with several doubts about whether a ChatScript chatbot that speaks Spanish is viable.
Summarized below are my doubts so far, with the relevant passages from the “ChatScript Basic User Manual”:
Tilde words refer to a concept set of words, a list of words that approximates the ~word. E.g., ~like means any of a number of words that mean to like something. ~animals means any of a large list of names of animals.
In addition to fixed sets (over 1600 of them), the system automatically defines a bunch of dictionary-based sets. These include:
parts-of-speech like ~noun (see POS-Tagging manual)
These generic interjections (which are open to author control via interjections.txt) are:
~yes,~no,~emomaybe,~emohello,~emogoodbye,~emohowzit,~emothanks,
For nouns, the canonical form is the singular. So if your pattern is:
?: (dog) I have a cat
this will respond equally to I like dogs and I have a dog. Whereas the pattern
?: (dogs) I have a cat
will only respond to I like dogs but not to I have a dog.
For verbs, the canonical form is the infinitive tense. If your pattern is:
?: (be *1 correct) Yes.
This will respond equally to Was it correct? and Are you correct? and Is she correct?.
Personal pronouns like me, my, myself, mine move to the subject form I, while whom, whomever, whoever, whose shift to who; anyone, somebody, anybody become someone; whatever becomes what, whenever becomes when, and whichever becomes which.
The file canonical.txt in LIVEDATA controls lots of these.
I also read a thread in this community about issues that came up when trying to implement a Spanish chatbot in AIML:
“I’m trying to figure out how to implement those tags in Spanish; certainly it’s more difficult than in English, because you have to transform the verbs, and this can be a really hard task (mainly for the irregular ones).
Another added difficulty is that the pronoun in Spanish is implicit, so sometimes it’s hard to know if we’re talking about I or YOU.
For example, a valid pattern in English like ‘I EAT *’ in Spanish can’t be translated just as ‘YO COMO *’ because a normal user won’t write YO(=I) but just the verb ‘COMO *’ and then, from the verb form, we can deduce he’s talking about himself.”
https://www.chatbots.org/ai_zone/viewthread/1123/
And this thread about issues when trying to use Dutch:
https://www.chatbots.org/ai_zone/viewthread/1927/
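The implicit-pronoun problem in the quoted AIML thread can be illustrated with a small, hypothetical Python sketch. It is not ChatScript code. It assumes the bot already knows which infinitive a word belongs to, and it uses the regular present-tense ending to recover the implied subject. `PRESENT_ENDINGS` and `implied_subject` are illustrative names. Irregular verbs such as ser or ir are deliberately not handled.

```python
from typing import Optional

# Regular present-indicative endings for -ar, -er and -ir verbs,
# mapped to the subject pronoun they imply. The third-person singular
# ending also covers formal "usted", a built-in ambiguity.
PRESENT_ENDINGS = {
    "ar": {"o": "yo", "as": "tú", "a": "él/ella/usted",
           "amos": "nosotros", "áis": "vosotros", "an": "ellos/ustedes"},
    "er": {"o": "yo", "es": "tú", "e": "él/ella/usted",
           "emos": "nosotros", "éis": "vosotros", "en": "ellos/ustedes"},
    "ir": {"o": "yo", "es": "tú", "e": "él/ella/usted",
           "imos": "nosotros", "ís": "vosotros", "en": "ellos/ustedes"},
}


def implied_subject(form: str, infinitive: str) -> Optional[str]:
    """Return the subject implied by a REGULAR present-tense form of
    `infinitive`, or None if the form does not follow the regular pattern."""
    form, infinitive = form.lower(), infinitive.lower()
    verb_class, stem = infinitive[-2:], infinitive[:-2]
    if verb_class not in PRESENT_ENDINGS or not form.startswith(stem):
        return None
    return PRESENT_ENDINGS[verb_class].get(form[len(stem):])


print(implied_subject("como", "comer"))       # yo  (user talks about himself)
print(implied_subject("comes", "comer"))      # tú  (user talks about the bot)
print(implied_subject("hablamos", "hablar"))  # nosotros
```

The verb alone cannot always settle I versus YOU. For example, come can mean either “he/she eats” or “you (formal) eat”.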
Posted: Feb 12, 2015 [ # 1 ]
Moderator
Total posts: 2372
Joined: Jan 12, 2010
ChatScript natively supports English, and it supports UTF-8, so writing in other languages is feasible but takes more work. Currently CS is being used for Chinese, Tamil, Spanish (AI Soy Robotics), German, and who knows what else. But large swathes of automatic language support are not there, and you have to create what you need. At its most basic, you have to define new concept sets. Since there is no base dictionary in Spanish, you have to define canonical forms yourself, either as concepts or in a LIVEDATA/canonical file. Irregular verbs are a pain in any language. Part-of-speech tagging and parsing are not something you can fake, but they are not normally needed for regular chatbots.
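Defining canonical forms by hand amounts to building a lookup table. The sketch below is a hypothetical Python illustration of what such a table does to input. It does not reproduce the format of ChatScript’s LIVEDATA files, and `CANONICAL` and `canonicalize` are made-up names.

```python
# Hypothetical hand-written table: surface form -> canonical form.
# Without a Spanish dictionary, every entry like these must be supplied by hand.
CANONICAL = {
    # irregular verb forms -> infinitive
    "soy": "ser", "eres": "ser", "es": "ser", "somos": "ser", "era": "ser",
    "voy": "ir", "vas": "ir", "va": "ir", "vamos": "ir",
    # regular verb forms -> infinitive
    "como": "comer", "comes": "comer", "come": "comer", "comemos": "comer",
    # plural nouns -> singular
    "perros": "perro", "luces": "luz", "ciudades": "ciudad",
}


def canonicalize(sentence: str) -> list[str]:
    """Lower-case, split on whitespace and map each word to its canonical
    form. Words missing from the table pass through unchanged."""
    return [CANONICAL.get(word, word) for word in sentence.lower().split()]


print(canonicalize("Voy con mis perros"))  # ['ir', 'con', 'mis', 'perro']
```

A flat table cannot resolve genuinely ambiguous forms. Fue and fui belong to both ser and ir, and como is also the word “as/like”. A hand-written table has to pick one reading or leave such words alone.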
Posted: Feb 13, 2015 [ # 2 ]
Senior member
Total posts: 179
Joined: Feb 11, 2015
Thanks, Bruce Wilcox, for the great tool you developed and for your support, and congratulations on the Loebner Prize.
So you mean there is no Spanish dictionary (WordNet) available on the internet? I read that ChatScript uses an English dictionary at a lower level. If there is no way to get a Spanish dictionary, do I need to disable the English one?
I was wondering if you could name the most important files or processes that I need to disable or remake (translate) in order to get the new bot to work in Spanish.
Thanks in advance.
Posted: Feb 13, 2015 [ # 3 ]
Moderator
Total posts: 2372
Joined: Jan 12, 2010
There is a Spanish WordNet dictionary, with a restricted license.
The ONTOLOGY folder and its concepts.top are the most important script files.
Posted: Feb 14, 2015 [ # 4 ]
Senior member
Total posts: 141
Joined: Apr 24, 2011
Hi Eduardo
As many may know, I have been developing a Spanish chatbot. The effort is really huge: you cannot use English operators, nor AIML-like constructs, because they fail in Spanish, which is a highly inflected language.
You have to get a good analyzer, possibly a stemmer and a parser, and then implement the whole rule engine, maybe copying ChatScript (with the blessing of Bruce, a good and intelligent maker and winner of many prizes). Hi Bruce!
I wish you luck if you are willing to program one yourself. It took me 10 years to develop a decent chatbot for Spanish. My chatbot technology has a lot of documentation (in Spanish), but it is not open source; it is payware!
If you are intrigued or interested, I can send you some documents privately. (Some strategic parts are published in many conference papers, including my own engineering thesis.) Just google my name.
Cheers
Andres
Posted: Feb 16, 2015 [ # 5 ]
Senior member
Total posts: 179
Joined: Feb 11, 2015
Thanks Bruce,
I will read the documentation and test as I go. I hope I can find my own way.
By the way, is it possible to disable the Windows voice when chatting? (It slows down testing.)
I tried typing “%voice” in the chat, and I tried deleting the talk.vbs file before launching it.
The first doesn’t work, and if I do the second, a ‘talk’ error message appears before every chatbot answer. Thanks for everything, Bruce.
Hi Andres Hohendahl,
Yes, I’m just beginning. I thought I had looked at all the alternatives before I came to ChatScript; I would be glad to see yours too. Thanks.
Posted: Feb 16, 2015 [ # 6 ]
Moderator
Total posts: 2372
Joined: Jan 12, 2010
You can disable voice by any of the following:
1. removing talk.vbs from the CS directory
2. removing the entire topic ~XPOSTPROCESS from simplecontrol.top
3. saying “shut up” to it
4. adding this line to the outputmacro: harry() in simplecontrol.top:
$shutup = 1
Posted: Feb 17, 2015 [ # 7 ]
Senior member
Total posts: 141
Joined: Apr 24, 2011
Eduardo, of course if you want to make a Spanish chatbot you must read Spanish, so email me and I’ll send you the documentation (in Spanish) ASAP. Be ready to read a 300-page manual!
Posted: Feb 17, 2015 [ # 8 ]
Senior member
Total posts: 179
Joined: Feb 11, 2015
Thanks, Bruce, it worked.
OK, Andres, I sent you an email.
Posted: Mar 4, 2015 [ # 9 ]
Senior member
Total posts: 179
Joined: Feb 11, 2015
Hi Bruce, I have read the basic user manual, the advanced user manual and the fact manual.
I have to confess that I understood only about half of the advanced manual and the fact manual.
I don’t know why, but when I did the TEST exercise and the HAROLD exercise (creating the Harold bot from the Harry bot, as explained in the manual), everything worked. I even replaced all of HAROLD’s outputs with new sentences in Spanish, including the letters “ñ”, á, é, í, ó, ú, and it worked fine.
But a few days later I tried my Spanish HAROLD bot again, and it cannot output “ñ” or the letters á, é, í, ó, ú; instead it outputs some kind of strange codes (like ±). Why is that? I have tried “:reset” and “:build”, but they didn’t help. What can I do?
By the way, is it possible to replace the English spell check with a Spanish one?
Thanks in advance.
Posted: Mar 4, 2015 [ # 10 ]
Moderator
Total posts: 2372
Joined: Jan 12, 2010
1. The output text window of the Windows version cannot display all foreign characters, even though output to the log file or on a server will be correct. It is a flaw of the Visual C++ display windows.
2. There is no Spanish dictionary, so you can’t do a Spanish spell check. You can disable the English spell check.
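Looking at the bytes helps explain why ñ and the accented vowels can turn into odd symbols such as ±. Each of these letters takes two bytes in UTF-8. If a display reads those bytes as Windows-1252, each letter shows up as two unrelated characters. This is an assumption about the window, made here only for illustration. The helper names below are hypothetical, but the Python calls are standard.

```python
def show_as_cp1252(text: str) -> str:
    """How UTF-8 text looks if its bytes are (wrongly) read as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo exactly that misreading: re-encode as cp1252, decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


# Show the two UTF-8 bytes behind a few letters and how they get misread.
for letter in "ñáéó":
    print(letter, letter.encode("utf-8").hex(" "), "->", show_as_cp1252(letter))

print(show_as_cp1252("mañana"))          # maÃ±ana
print(repair(show_as_cp1252("mañana")))  # mañana
```

The `repair` step only recovers text when the damage was exactly a UTF-8-read-as-Windows-1252 mix-up. Text mangled in some other way will not round-trip.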
Posted: Mar 4, 2015 [ # 11 ]
Senior member
Total posts: 179
Joined: Feb 11, 2015
Thanks Bruce,
I could swear that I got á, é, í, ó, ú, ñ in the output text window the first time I tried it; I will do some tests.
Also, how can I access the log file? (I would like to check it frequently.)
The spell check is a great advantage, because people often misspell words.
Bruce, could you tell me if there is a way to create a kind of spell checking manually, for when people swap the order of letters?
e.g.
BUENOS DIAS = good morning
BEUNOS DIAS = BUENOS DIAS
BUENSO DIAS = BUENOS DIAS
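Outside ChatScript, letter swaps like these can be caught with an edit distance that counts a swap of two adjacent letters as a single edit. The Python sketch below uses the optimal string alignment distance, a restricted form of Damerau-Levenshtein distance, against a small hypothetical vocabulary. It is not a ChatScript feature. `VOCABULARY`, `correct` and `max_distance` are illustrative names.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions
    and swaps of two adjacent letters each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


# Hypothetical vocabulary; a real bot would need a much larger word list.
VOCABULARY = {"buenos", "dias", "tardes", "noches", "hola", "gracias"}


def correct(word: str, max_distance: int = 1) -> str:
    """Return the closest vocabulary word within max_distance,
    otherwise the (lower-cased) word itself."""
    word = word.lower()
    if word in VOCABULARY:
        return word
    # sorted() makes tie-breaking deterministic
    best = min(sorted(VOCABULARY), key=lambda v: osa_distance(word, v))
    return best if osa_distance(word, best) <= max_distance else word


def correct_sentence(text: str) -> str:
    return " ".join(correct(w) for w in text.split())


print(correct_sentence("BEUNOS DIAS"))  # buenos dias
print(correct_sentence("BUENSO DIAS"))  # buenos dias
```

Keeping `max_distance` at 1 limits false corrections. A larger threshold would start mapping unrelated short Spanish words onto each other.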
What about canonization? Is there any way to emulate it in Spanish?
You mentioned a Spanish WordNet dictionary with a restricted license. What would it be used for: to create facts? To create synonyms?
Thanks for your support, Bruce, and thanks in advance.
Posted: Mar 4, 2015 [ # 12 ]
Administrator
Total posts: 3111
Joined: Jun 14, 2010
I would imagine that you could possibly translate the dictionary files into Spanish. That might be a lot of work, but it could be well worth the effort.
Of course this is just a guess, so please don’t take it as-is without verifying.
Posted: Mar 4, 2015 [ # 13 ]
Senior member
Total posts: 179
Joined: Feb 11, 2015
Hi Bruce
What can I do if ChatScript does not react to input with foreign characters, like “mañana” or any word with the letters á, é, í, ó, ú?
For example, if I write a rule that looks for the user input “nos vemos mañana” and then test the script by typing “nos vemos mañana”, the script does not recognize it. So how can I test as I write the script in Spanish? Thanks in advance.
P.S. Thanks, Dave.
Posted: Mar 5, 2015 [ # 14 ]
Moderator
Total posts: 2372
Joined: Jan 12, 2010
If your topic files have been saved as UTF-8 and your patterns contain your Spanish words with their accented letters, ChatScript will respond to input containing them. But your files need to be saved as UTF-8, not ASCII.
Spell check refuses to check words containing unusual UTF-8 characters.
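You can check whether a topic file is really saved as UTF-8. A genuine UTF-8 file decodes without errors. A file saved in a Windows “ANSI” code page that contains ñ or accented vowels usually does not. The hypothetical Python sketch below reports the first offending byte in each .top file under a folder. The function names are illustrative, and the folder you pass in is your own.

```python
from pathlib import Path
from typing import Optional


def first_bad_utf8_byte(raw: bytes) -> Optional[int]:
    """Offset of the first byte that is not valid UTF-8, or None if all is valid."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_top_files(folder: str) -> dict[str, Optional[int]]:
    """Map each .top file under `folder` (searched recursively) to the offset
    of its first invalid UTF-8 byte, or None if the file decodes cleanly."""
    root = Path(folder)
    return {p.relative_to(root).as_posix(): first_bad_utf8_byte(p.read_bytes())
            for p in sorted(root.rglob("*.top"))}
```

For example, `check_top_files("path/to/Harold")` would return a dictionary mapping each .top file to None or to a byte offset. Substitute the real folder for the placeholder path. A file that contains only plain ASCII letters also passes, because ASCII is a subset of UTF-8. The check only catches files whose accented letters were stored in another encoding.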
Posted: Mar 6, 2015 [ # 15 ]
Senior member
Total posts: 179
Joined: Feb 11, 2015
Hi Bruce,
Yes, I had all four of my files:
childhoood.top
introductions.top
keywordless.top
simplecontrol.top
saved as UTF-8 in the Harold folder,
but when I run chatscript.exe and do :build Harold (and even :reset),
the engine does not recognize my pattern: mañana
Why is that? I am using Windows 8.1.
Thanks in advance.
P.S. Let’s say I completely avoid using the unusual characters á, é, í, ó, ú, ñ: isn’t there any way to perform a spell check in Spanish? I ask because it is the most useful tool; it gets used on every single volley.
Thanks in advance, Bruce.
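One way to let users type without accents, or to avoid accents in patterns, is to fold accents away in both the pattern text and the user’s input before comparing them. This hypothetical Python sketch does that with the standard `unicodedata` module. It is not a ChatScript setting. By default it keeps ñ, because año (year) and ano are different words.

```python
import unicodedata


def fold_accents(text: str, keep_enye: bool = True) -> str:
    """Remove accents such as á -> a and ü -> u, optionally keeping ñ/Ñ."""
    folded = []
    for ch in text:
        if keep_enye and ch in "ñÑ":
            folded.append(ch)
            continue
        # NFD splits "á" into "a" plus a combining acute accent;
        # characters with a non-zero combining class are then dropped.
        decomposed = unicodedata.normalize("NFD", ch)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(folded)


print(fold_accents("¿Cómo estás? Nos vemos mañana"))  # ¿Como estas? Nos vemos mañana
print(fold_accents("mañana", keep_enye=False))        # manana
```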