This webpage is temporary, until we get our scripts up and running...
Here you'll find a directory with the dictionary files. The files are named xYYY.dct, where YYY is a Unicode value corresponding to the first code point of the words in that file. This is a quick way of making a hash table, so that a search algorithm doesn't have to load a 3 MB file each time it wants to look up a word.
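As a rough illustration of this bucketing scheme, the sketch below maps a word to its bucket file and scans only that one file. It rests on assumptions not stated on this page: that YYY is the first code point written in lowercase hexadecimal (the Bangla block runs from U+0980 to U+09FF, which gives exactly three hex digits), and that each line of a .dct file begins with the headword, optionally followed by a tab. The function names `dict_filename` and `lookup` are hypothetical and not part of the project's scripts.

```python
import os


def dict_filename(word: str) -> str:
    """Return the bucket file name for a word, e.g. 'x995.dct' for a word starting with U+0995.

    Assumption (hypothetical): YYY is the first code point in lowercase hex.
    """
    if not word:
        raise ValueError("cannot bucket an empty word")
    return "x{:x}.dct".format(ord(word[0]))


def lookup(word: str, dict_dir: str) -> bool:
    """Return True if word is a headword in its bucket file.

    Only the one small bucket file is opened, never the whole dictionary.
    Assumption: one UTF-8 entry per line, headword before an optional tab.
    """
    path = os.path.join(dict_dir, dict_filename(word))
    if not os.path.exists(path):
        return False  # no words start with this code point
    with open(path, encoding="utf-8") as f:
        return any(line.rstrip("\r\n").split("\t", 1)[0] == word for line in f)
```

For example, `dict_filename("কলম")` gives `x995.dct`, since ক is U+0995; a lookup of that word then reads only `x995.dct`.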
There is a tarred gzipped archive of all these files in the same directory.
The mail archives are an interesting place to look through to see how the project is coming along.
The following email from Kaushik states the goal of this project. He has also provided an initial word list of ~6000 words. You can see a screenshot of some of these words here.
From Sun Oct 13 05:44:21 2002
Date: Thu, 3 Oct 2002 08:17:36 -0400 (EDT)
From: Kaushik Ghose <>
Goals:
1. Bangla dictionary
2. Webpage interface to the Bangla dictionary
3. CD version (an offline version of the web interface)
4. Various converters to turn the Bangla dictionary into, say, ISCII or higher ASCII for display in other fonts
5. Various interface programs: a) a dictionary GUI, b) a command-line version of the GUI (can act as a spellchecker for other programs)
Specs:
1. Base dictionary in Unicode, distributed as a tarred gzipped plaintext file along with some interface programs
2. Dictionary in XML format, cut into several files (by alphabet?) under a common directory
3. A hash table or some other helper for sorting/searching, presented as a separate file with pointers to where to look in the other files
4. GUI, with a very simple layout as mentioned in a previous mail for the webpage. Basically the GUI will be a standalone app for a) looking up words etc. and b) offline data entry
5. Some sort of CVS for the files that we create and update
6. Data entry: a) offline: people use the GUI to work on words etc., and when they are done they upload their work into CVS
b) online: we have a web interface, and the updates are sent to CVS by the CGI script that runs the interface
Basically, this is the Bangla dictionary version of the SETI@home project :)
We've got some smart CS people out there. What's the wisdom?
I'm thinking that if we set up a SourceForge page, we get a website and a CVS repository. CVS looks like an ideal place to put the growing dictionary, especially since many people will be hacking away at it at different times!
Should I go ahead and register a new project, or can we use the CVS resources of bengalinux? Taneem, I don't want to hijack what you started. Would this fit in as one of the things bengalinux does?
Immediate Todos:
1. Whitepaper? Is that what they call it? Put up a spec sheet so that anyone wanting to join up can pick a task and hack away at it.
2. Webpage: I volunteer to work on this if no one else wants to. Basically, we can put up the spec sheet and the status of the project, and once it is in some shape, put up the web interface to the dictionary.
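Spec 3 in the email above asks for a separate helper file holding pointers into the other files. One possible shape for it is sketched below, under assumptions that the email does not fix: each .dct file holds one UTF-8 entry per line, the headword comes before the first tab, and the index maps each headword to a (file name, byte offset) pair so a lookup can seek straight to the right line. The functions `build_index`, `write_index`, `load_index` and `read_entry`, and the tab-separated index format, are hypothetical.

```python
import os


def build_index(dict_dir: str) -> dict:
    """Map each headword to (bucket file name, byte offset of its line)."""
    index = {}
    for name in sorted(os.listdir(dict_dir)):
        if not name.endswith(".dct"):
            continue
        offset = 0
        with open(os.path.join(dict_dir, name), "rb") as f:
            for raw in f:  # binary mode, so len(raw) is the line's size in bytes
                headword = raw.decode("utf-8").rstrip("\r\n").split("\t", 1)[0]
                if headword:
                    index.setdefault(headword, (name, offset))
                offset += len(raw)
    return index


def write_index(index: dict, path: str) -> None:
    """Store the index as a separate file of 'word<TAB>file<TAB>offset' lines."""
    with open(path, "w", encoding="utf-8") as out:
        for word in sorted(index):
            name, offset = index[word]
            out.write(f"{word}\t{name}\t{offset}\n")


def load_index(path: str) -> dict:
    """Read an index file produced by write_index."""
    index = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, name, offset = line.rstrip("\n").split("\t")
            index[word] = (name, int(offset))
    return index


def read_entry(dict_dir: str, index: dict, word: str):
    """Seek directly to a word's line instead of scanning its bucket file."""
    if word not in index:
        return None
    name, offset = index[word]
    with open(os.path.join(dict_dir, name), "rb") as f:
        f.seek(offset)
        return f.readline().decode("utf-8").rstrip("\r\n")
```

With an index like this, a web interface or a command-line spellchecker could answer a lookup with a single seek; the trade-off is that the index would need rebuilding whenever entries are added or changed in CVS.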
Ankur is often involved in development based on cutting-edge technology. Ankur developers were the first to come up with a Bangla OpenType font, and the Ankur Bangla Live CD is often considered the best among the localised Live CD distributions out there. If you want to be a member of the Ankur family, please get in touch with us at core at bengalinux org. For more information on volunteering, refer to the Ankur Developers' Guide.