This webpage is temporary, until we get our scripts up and
you'll find a directory with the dictionary files. The files
are named in the format xYYY.dct, where YYY is the Unicode
value of the first code point of the words in that file. This
is a quick way of making a hash table, so that a search
algorithm doesn't have to load a 3 MB file each time it wants
to look up a word.
There is a tarred, gzipped archive of all these files in the
same directory.
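The naming scheme described above can be sketched as a small
lookup routine. This is only an illustration under stated
assumptions, not the project's actual code: it assumes YYY is the
code point written in lowercase hexadecimal (so a word starting
with U+0995 would live in x995.dct), that each .dct file is UTF-8
text with one word per line, and the function names dct_filename
and lookup are hypothetical.

```python
from pathlib import Path


def dct_filename(word: str) -> str:
    """Return the bucket file name for a word.

    Assumption: YYY in xYYY.dct is the first code point of the
    word in lowercase hexadecimal.
    """
    if not word:
        raise ValueError("cannot look up an empty word")
    return f"x{ord(word[0]):x}.dct"


def lookup(word: str, dict_dir: Path) -> bool:
    """Check whether a word is listed in its bucket file.

    Assumption: each .dct file is UTF-8 text, one word per line.
    Only the one small file for the word's first code point is
    read, never the whole dictionary.
    """
    path = dict_dir / dct_filename(word)
    if not path.exists():
        return False
    with path.open(encoding="utf-8") as fh:
        return any(line.strip() == word for line in fh)
```

Bucketing by first code point keeps each lookup to one small
file, at the cost of uneven file sizes when some initial letters
are much more common than others.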
The archives are an interesting place to look through to see
how the project is coming along.
The following email from Kaushik states the goal of this
project. He has also provided an initial word list of ~6000
words. You can see a screenshot of some of these words here.
From firstname.lastname@example.org Sun Oct 13 05:44:21 2002
Date: Thu, 3 Oct 2002 08:17:36 -0400 (EDT)
From: Kaushik Ghose <kghose@...>
1. Bangla dictionary
2. Webpage interface to bangla dictionary
3. CD version = offline version of webinterface
4. Various converters to turn bangla dictionary into say ISCII, higher
ascii for display in other fonts
5. Various interface programs a) a dictionary GUI, b) a commandline
version of the GUI (can act as spellchecker for other progs)
1. Base dictionary in unicode distributed as a tarred gzipped
plaintext file, along with some interface programs
2. Dictionary in XML format cut into several files (by alphabet ?) under a
3. A hash table or some sort of other helper for sorting/searching
presented as a separate file which will have pointers of where to look in
4. GUI, very simple layout format as mentioned in a previous mail for
webpage. Basically the GUI will be standalone app for a) looking up words
etc. b) offline data entry
5. some sort of CVS for the files that we create update
6. Data entry :
a) offline : people use the GUI to work on words etc and then when they are
done they upload it into CVS
b) online : we have a webinterface and the updates are sent to CVS by the
cgi script that runs the interface
Basically this is the bangla dictionary version of the seti@home project
We've got some smart cs people out there, what's the wisdom ?
I'm thinking if we setup a sourceforgepage we get a website and CVS
CVS looks like an ideal thing to put the growing dictionary under
especially since many people will be hacking away at it at different times
Should I go ahead and register a new project or can we use the CVS
resources of bengalinux. Taneem, I don't want to hijack what you started
? would this fit in as one of the things bengalinux does ?
1. Whitepaper ? Is that what they call it. Put up a spec sheet so that
anyone wanting to join up can pick a task and hack away at it.
2. webpage I volunteer to work on this if no one else wants to. basically
we can put up the spec sheet status of the project and once it is in some
shape - put up the webinterface to the dictionary.