Thread: Dictionary needs to be A LOT richer.

  1. #11
    Join Date
    Jul 2012
    Location
    Toulouse, France
    Posts
    18
    I wonder how Word or Open Office manage to have such a comprehensive vocabulary in every language. If that were accessible, every Android keyboard developer could create an awesome prediction model.

    In the case of Open Office, it is possible to download the dictionaries and see them as text files, but I don't know how the software uses them. In the text file you can see each word but not its plural or any other variations, so I guess that the program knows how to create every word from its "root".

    Here is the link to the dictionaries: http://extensions.openoffice.org/en/search?f[0]=field_project_tags%3A157

  2. #12
    Great suggestion by GerPis right there!

    I've been trying to get hold of a reasonably complete Spanish word list so I could suggest it to Swype's developers, but I didn't think of Word™ or Open Office™.

    It's true, they rarely correct a real word, but I think it's because they have a very complete dictionary. I don't think they create every word from its root. Much like English, Spanish has both regular and irregular verbs, and the conjugation rules aren't really that uniform. Some verbs have unusual forms because of their etymology or because they're words "borrowed" from other languages.

    Swype would be so much better with dictionaries like this!

    GerPis, how do you download the dictionaries as a text file?

  3. #13
    Join Date
    Jul 2012
    Location
    Toulouse, France
    Posts
    18
    Hi Adrian!

    Try this one, for instance http://extensions.openoffice.org/en/...panish-espaņol

    You can open the OXT file with WinRar and, inside, you'll find a DIC file that you can easily open with any text editor. From the format of the text, you will quickly see why I suppose that words are generated from their "root". I have not checked the text thoroughly, so I still do not know how the conjugation is done. I am Spanish too, so we know what we are talking about here.
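    A minimal Python sketch of the steps above, assuming the .oxt file is a ZIP archive (which is why WinRar can open it) and that the dictionary text is UTF-8; older dictionaries may use another encoding, declared on the SET line of the matching .aff file. It lists every .dic file inside the archive and splits each entry into the word and its flags. The function name `read_dic_files` is just an illustration.

```python
import io
import zipfile


def read_dic_files(oxt_bytes: bytes) -> dict[str, list[tuple[str, str]]]:
    """Map each .dic file inside an .oxt (ZIP) archive to its (word, flags) entries."""
    result: dict[str, list[tuple[str, str]]] = {}
    with zipfile.ZipFile(io.BytesIO(oxt_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".dic"):
                continue
            # Assumption: UTF-8 text; check the SET line of the matching .aff file.
            lines = archive.read(name).decode("utf-8").splitlines()
            entries = []
            # Line 1 of a Hunspell .dic file is an approximate entry count, so skip it.
            for line in lines[1:]:
                fields = line.split()
                if not fields:
                    continue
                # Keep only "word/FLAGS"; anything after whitespace is optional extra data.
                word, _, flags = fields[0].partition("/")
                entries.append((word, flags))
            result[name] = entries
    return result
```

    Usage would be something like `with open("es_ES.oxt", "rb") as f: dics = read_dic_files(f.read())`, where the file name is hypothetical. An entry such as `hablar/R` comes back as `("hablar", "R")`, and a word with no flags gets an empty flag string.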

    Maybe it would be worth opening a dedicated thread for this topic... Let's see what the moderators think.

  4. #14
    Thanks, GerPis

    I see what you mean about the format of the text: every word has a sort of flag at the end. After doing some digging into how these dictionaries work, I found this: http://pwet.fr/man/linux/fichiers_speciaux/hunspell. This is for English, but it's the same for Spanish.

    You were right, they do create the words from their roots. Apart from the DIC file, there's also an AFF file which contains all the prefixes and suffixes accepted for a particular word, based on how the word ends and the flag or flags next to it. It's a little more complex than that, but that's basically how it works.
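    To make the root-plus-affix idea concrete, here is a simplified Python sketch of how PFX/SFX rules from the .aff file could be applied to a root from the .dic file. In the Hunspell format each class starts with a header line (`SFX flag Y|N count`), followed by rule lines of the form `SFX flag strip add condition`: if the root matches the condition, remove `strip` and attach `add`, where `0` means empty. The sketch assumes single-character flags (the default FLAG setting) and ignores combined prefix+suffix forms, continuation flags and the many other .aff options; the function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class AffixRule:
    kind: str              # "PFX" or "SFX"
    flag: str
    strip: str             # characters removed from the root before adding the affix
    add: str               # the affix itself
    condition: re.Pattern  # what the start (PFX) or end (SFX) of the root must match


def parse_affixes(aff_text: str) -> dict[str, list[AffixRule]]:
    """Collect PFX/SFX rules by flag (single-character flags only)."""
    rules: dict[str, list[AffixRule]] = {}
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] not in ("PFX", "SFX"):
            continue
        kind, flag = parts[0], parts[1]
        if flag not in rules:
            # The first line for a flag is the header: "SFX flag Y|N count".
            rules[flag] = []
            continue
        strip = "" if parts[2] == "0" else parts[2]
        add = "" if parts[3] == "0" else parts[3].split("/")[0]  # drop continuation flags
        cond = parts[4] if len(parts) > 4 else "."
        pattern = "^" + cond if kind == "PFX" else cond + "$"
        rules[flag].append(AffixRule(kind, flag, strip, add, re.compile(pattern)))
    return rules


def apply_affixes(root: str, flags: str, rules: dict[str, list[AffixRule]]) -> list[str]:
    """Return the root plus every form produced by the rules its flags point to."""
    forms = [root]
    for flag in flags:
        for rule in rules.get(flag, []):
            if not rule.condition.search(root):
                continue
            if rule.kind == "SFX" and root.endswith(rule.strip):
                forms.append(root[: len(root) - len(rule.strip)] + rule.add)
            elif rule.kind == "PFX" and root.startswith(rule.strip):
                forms.append(rule.add + root[len(rule.strip):])
    return forms
```

    With a made-up rule set such as `SFX S 0 s [aeiou]` and `SFX S 0 es [^aeiou]` (not taken from the real Spanish dictionary), `casa/S` expands to casa, casas and `papel/S` to papel, papeles; a rule like `SFX R ar o ar` turns `hablar` into `hablo`.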

    If somehow we could generate all the words using this information (I'm especially interested in imperatives and second person preterite indicatives) and produce a simple text file, maybe we could add these words to Swype and complement its dictionary.
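    Once the forms are generated, complementing the keyboard's dictionary mostly means finding the forms it lacks and writing them out one per line. A small Python sketch, assuming you already have both the generated forms and some export of the existing word list as plain lists of strings (how to get either out of Swype is not covered here); `missing_words` and `save_word_list` are hypothetical helper names. For the expansion itself, Hunspell's own tools (such as `unmunch`) may save you from writing a generator.

```python
from pathlib import Path


def missing_words(generated: list[str], existing: list[str]) -> list[str]:
    """Return the generated forms the existing word list lacks (case-insensitive), sorted."""
    known = {word.casefold() for word in existing}
    return sorted({word for word in generated if word.casefold() not in known})


def save_word_list(words: list[str], path: Path) -> None:
    """Write one word per line as UTF-8 plain text."""
    path.write_text("".join(word + "\n" for word in words), encoding="utf-8")
```

    For example, `missing_words(["hablad", "hablaste", "habla"], ["habla", "hablar"])` keeps only the imperative and the preterite form, which is exactly the kind of gap described above.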

    I hope we can get some feedback from the moderators in this regard.

  5. #15
    Join Date
    Jul 2012
    Location
    Toulouse, France
    Posts
    18
    Will we ever get a reply from Nuance? It is pretty sad how such a good keyboard receives so little dedication from its developers...

  6. #16
    What we need is a morphology dictionary like the (GNU GPL) hunspell ones. These provide instructions on how a word should be conjugated or declined. This way, every verb that conjugates regularly just needs a base form or stem, and pulls in its endings and any other changes (like poder -> puedo). It doesn't really make sense to store 100 variations of every single verb, and I can't imagine that's what they do. They probably just have incomplete or inaccurate morphology tables.
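    A toy Python illustration of the stem-plus-endings idea, covering only the Spanish present indicative with one stem-change pattern (o → ue in the stressed persons, as in poder → puedo). The endings table and the `o_to_ue` switch are assumptions made for this example, not the format of any real morphology dictionary.

```python
# Present-indicative endings for the three regular Spanish verb classes.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}
# yo, tú, él/ella, ellos/ellas: the persons where the stem vowel is stressed.
STRESSED_PERSONS = {0, 1, 2, 5}


def present_indicative(infinitive: str, o_to_ue: bool = False) -> list[str]:
    """Conjugate a verb from its stem, optionally applying the o -> ue stem change."""
    stem, verb_class = infinitive[:-2], infinitive[-2:]
    forms = []
    for person, ending in enumerate(PRESENT_ENDINGS[verb_class]):
        current = stem
        if o_to_ue and person in STRESSED_PERSONS:
            i = current.rfind("o")  # in this sketch the change hits the stem's last "o"
            if i != -1:
                current = current[:i] + "ue" + current[i + 1:]
        forms.append(current + ending)
    return forms
```

    Here `present_indicative("poder", o_to_ue=True)` gives puedo, puedes, puede, podemos, podéis, pueden, while `present_indicative("comer")` needs no flag at all. Truly irregular forms (tengo, soy, voy) would still need an explicit exception list, which matches the earlier point about irregular verbs.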

    These dictionaries already exist and there are very complete ones available for Spanish, such as this one here (under a GPL license, which allows commercial use):
    http://sourceforge.net/projects/gold...phologies/1.0/
