This page explains why we chose Unicode as the format for representing Bengali text, and provides some pointers for those interested in learning more about Unicode. As such, it is not directly related to the rest of the project, and it is certainly not necessary to understand Unicode in order to follow the rest of the material.
In a sense, Unicode is the only possible choice for a project of this nature, since it is the only open standard, as far as we know, that attempts to define, unambiguously and systematically, an ASCII-like character set covering all languages (well, not all, but...). There have been other efforts in the past to address the problem of non-Latin scripts, some of them pure hacks, some more systematic, but none as widely recognized or as pervasive as Unicode.
The problem is far more complex for Indic languages than for European languages. Unlike in Latin-based scripts, there is a very basic distinction here between a sequence of characters and the rendering of those characters in a visually acceptable form. It should be noted that Unicode says little about how to actually display a sequence of abstract characters. (In theory, that is a problem for the application interpreting and rendering the text. In practice, the most systematic approach to that problem is via OpenType layout tables in OpenType fonts.)
Unicode attempts to list all the characters (as opposed to glyphs, the visual forms representing one or more characters) used in all the major languages of the world. Different groups of languages use different sets of characters, and many characters are common to more than one language (e.g., Hindi and Sanskrit use the same character set; the French character set is a superset of the English one).
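Unicode is similar to ASCII in the sense that it assigns an integer value (a "code point") to each character in the list. ASCII values fit in 7 bits (0 to 127). Unicode was originally designed around 16-bit values (0 to 65535), and most characters in everyday use, including all Bengali characters, still fall in that range. The standard has since been extended, and code points now run from 0 to 10FFFF in hexadecimal (1114111 in decimal).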
It is most common to refer to Unicode characters by their hexadecimal representations, written with at least 4 digits (0000 to FFFF for the original 16-bit range). The numbers 0 (0000) to 127 (007F) correspond exactly to their ASCII counterparts. Bengali characters are numbered from 0980 to 09FF (Devanagari from 0900 to 097F). The correspondence between the integer values and the actual characters can be found at Unicode's website.
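The numbering described above can be explored with a few lines of Python. This is only a small sketch using the standard library; the helper names describe and is_bengali are made up for this illustration and are not part of the project.

```python
import unicodedata

# Bengali block as stated in the text: 0980 to 09FF (hexadecimal).
BENGALI_FIRST, BENGALI_LAST = 0x0980, 0x09FF

def describe(ch):
    """Return the usual U+XXXX label and the official Unicode name of one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

def is_bengali(ch):
    """True if the character's integer value lies in the Bengali block."""
    return BENGALI_FIRST <= ord(ch) <= BENGALI_LAST

print(describe("A"))        # ('U+0041', 'LATIN CAPITAL LETTER A')
print(describe("\u0995"))   # ('U+0995', 'BENGALI LETTER KA')
print(is_bengali("\u0995")) # True
print(is_bengali("\u0915")) # False (0915 is in the Devanagari block)
```

Note that 'A' has the same value (41 hexadecimal, 65 decimal) in both ASCII and Unicode, as the text says.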
As mentioned above, Unicode deals with the characters of the alphabet as abstract entities, and is not concerned with their visual representation. An example should make this precise: consider juktakshars in Bengali / Devanagari. These are combinations of two or more consonants which are represented in the script by a single glyph, a specific example being 'kra' as in 'chakra'. A typographer would think of this as a distinct character, but Unicode represents it as the sequence 'ka' + 'hasanta' + 'ra'.
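To make the 'kra' example concrete, here is a short Python sketch that builds the juktakshar from its three Unicode characters. The variable names are ours; note also that the Unicode standard calls the hasanta character "VIRAMA" in its official character names. Whether the result is drawn as a single glyph depends entirely on the font and rendering software, not on the text itself.

```python
import unicodedata

KA = "\u0995"       # Bengali letter ka
HASANTA = "\u09CD"  # Bengali hasanta (named VIRAMA in the Unicode standard)
RA = "\u09B0"       # Bengali letter ra

# The juktakshar 'kra' is stored as three abstract characters in sequence.
kra = KA + HASANTA + RA

print(len(kra))  # 3
for ch in kra:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0995 BENGALI LETTER KA
# U+09CD BENGALI SIGN VIRAMA
# U+09B0 BENGALI LETTER RA
```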
The advantage of Unicode is that Unicode Bengali characters are uniquely and unquestionably Bengali. Even though the question of how they should be displayed is not settled, the intended contents of the file (as Bengali text) are uniquely defined.
Computer files are sequences of 8-bit bytes. How can they represent Unicode characters, whose values do not fit in 8 bits? Some sort of encoding has to be used. The most common encoding is UTF-8. The details are unimportant here (to give a taste, the 128 byte values 00 to 7F represent the corresponding Unicode characters, so plain ASCII text is unchanged; every Bengali character is represented by a sequence of 3 bytes). Details on the exact algorithm used can be found here.
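For the curious, the 3-byte case can be sketched in a few lines of Python. The function encode_three_byte is a hypothetical helper written for this page; it covers only values from 0800 to FFFF (which includes all of Bengali) and is not a complete UTF-8 encoder. Its output is compared against Python's built-in encoder.

```python
def encode_three_byte(code_point):
    """Encode one value in the range 0800..FFFF as 3 UTF-8 bytes.

    The 16 bits are split 4 + 6 + 6 and placed into the pattern
    1110xxxx 10xxxxxx 10xxxxxx.
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch handles only the 3-byte range")
    return bytes([
        0xE0 | (code_point >> 12),           # top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),   # middle 6 bits
        0x80 | (code_point & 0x3F),          # low 6 bits
    ])

ka = "\u0995"  # Bengali letter ka

print(encode_three_byte(ord(ka)).hex(" "))  # e0 a6 95
print(ka.encode("utf-8").hex(" "))          # e0 a6 95 (built-in encoder agrees)
print("A".encode("utf-8").hex(" "))         # 41 (ASCII is a single, unchanged byte)
```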
The other links provided here, especially the Unicode FAQ, should tell you everything else you ever wanted to know about Unicode.