Bangla Page (বাংলা পাতা)
Bengali Dictionary Project

This webpage is temporary until we get our scripts up and running.

Here you'll find a directory with the dictionary files. The files are named xYYY.dct, where YYY is the Unicode value of the code point that every word in that file begins with. This is a quick way of making a hash table, so that a search algorithm doesn't have to load a 3 MB file every time it looks up a word.
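A minimal Python sketch of such a lookup follows. It makes two assumptions that the page does not confirm. First, YYY is taken to be the code point in lowercase hexadecimal with no `0x` prefix. Every character in the Bengali block (U+0980 to U+09FF) then gives exactly three digits, so words beginning with অ (U+0985) land in `x985.dct`. Second, each `.dct` file is taken to be UTF-8 text with one entry per line and the headword in the first tab-separated field. `bucket_filename` and `lookup` are hypothetical names.

```python
import os
from typing import Optional


def bucket_filename(word: str) -> str:
    """Map a word to its bucket file, e.g. 'অমর' -> 'x985.dct'.

    Assumption: YYY is the lowercase hex code point of the word's
    first character, written without a '0x' prefix.
    """
    if not word:
        raise ValueError("cannot look up an empty word")
    return f"x{ord(word[0]):x}.dct"


def lookup(word: str, directory: str) -> Optional[str]:
    """Return the matching entry line, or None if the word is absent.

    Only the one bucket file for the word's first code point is read,
    never the whole multi-megabyte word list.
    Assumption: UTF-8 text, one entry per line, headword in the first
    tab-separated field.
    """
    path = os.path.join(directory, bucket_filename(word))
    if not os.path.exists(path):
        return None  # no words start with this code point
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = line.rstrip("\r\n")
            if entry.split("\t", 1)[0] == word:
                return entry
    return None
```

A lookup of অমর therefore opens only `x985.dct`. The trade-off is many small files instead of one large one, so each search scans a single bucket.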

There is a tarred gzipped archive of all these files in the same directory.
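You can also read the archive in place, without unpacking it, using Python's standard `tarfile` module. This is a sketch. `read_bucket_from_archive` is a hypothetical helper, and the members are assumed to be UTF-8 text. The member is matched by its base name, because the archive's internal layout is not described here.

```python
import tarfile


def read_bucket_from_archive(archive_path: str, bucket: str) -> str:
    """Return the text of one .dct bucket (e.g. 'x985.dct') from a .tar.gz.

    Matching on the base name finds the member whether or not the
    archive stores files under a leading directory.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.rsplit("/", 1)[-1] == bucket:
                data = tar.extractfile(member).read()
                return data.decode("utf-8")  # assumption: UTF-8 members
    raise FileNotFoundError(f"{bucket} not found in {archive_path}")
```

For example, `read_bucket_from_archive("bangla-dict.tar.gz", "x985.dct")` returns the contents of one bucket. The archive name is a placeholder, because the real filename is not given on this page.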

The mail archives are an interesting place to look through to see how the project is coming along.

The following email from Kaushik states the goals of this project. He has also provided an initial word list of about 6000 words. You can see a screenshot of some of these words here.

From Sun Oct 13 05:44:21 2002
Date: Thu, 3 Oct 2002 08:17:36 -0400 (EDT)
From: Kaushik Ghose <kghose@...>

1. Bangla dictionary
2. Webpage interface to the Bangla dictionary
3. CD version = offline version of the web interface
4. Various converters to turn the Bangla dictionary into, say, ISCII or
    higher ASCII for display in other fonts
5. Various interface programs: a) a dictionary GUI, b) a command-line
    version of the GUI (which can act as a spellchecker for other programs)

1. Base dictionary in Unicode, distributed as a tarred gzipped
plaintext file along with some interface programs

2. Dictionary in XML format, cut into several files (by alphabet?) under a
common directory

3. A hash table or some other sort of helper for sorting/searching,
presented as a separate file that will have pointers to where to look in
the other files.

4. GUI, with the very simple layout mentioned in a previous mail for the
webpage. Basically the GUI will be a standalone app for a) looking up words
etc. and b) offline data entry

5. Some sort of CVS for the files that we create and update

6. Data entry:
a) offline: people use the GUI to work on words etc., and then when they are
done they upload their work into CVS

b) online: we have a web interface, and the updates are sent to CVS by the
CGI script that runs the interface

Basically, this is the Bangla dictionary version of the SETI@home project.

We've got some smart CS people out there; what's the wisdom?

I'm thinking that if we set up a SourceForge page, we get a website and CVS.
CVS looks like an ideal thing to put the growing dictionary under,
especially since many people will be hacking away at it at different times.

Should I go ahead and register a new project, or can we use the CVS
resources of bengalinux? Taneem, I don't want to hijack what you started.
Would this fit in as one of the things bengalinux does?

Immediate Todos:

1. Whitepaper? Is that what they call it? Put up a spec sheet so that
anyone wanting to join up can pick a task and hack away at it.

2. Webpage: I volunteer to work on this if no one else wants to. Basically,
we can put up the spec sheet and the status of the project, and once it is in
some shape, put up the web interface to the dictionary.


