
Re: Hebrew ANSII codes



This discussion is a little off-topic, but since this is the second
time Stas has mentioned my name hoping I'll answer, I will:

> > There is no algorithm. Or, to be more precise, there is a bunch of them.
> > The Hebrew of Word and the Hebrew of Wordpad are two different Hebrew supports.
> So how about establishing some contrib repository with converters for
> HebrewPad, HebrewWord, etc.? It will save a lot of sleeping hours...
> 
> Eli, if Stanislav says that there is no standard algorithm, can you please
> share with us how you succeeded in supporting all of them?

That is not quite accurate. Until Hebrew NT 4.0, there *was* one
algorithm (ignoring the very early release of Windows 3.1; most of
the 3.1 releases, all of 3.11, and all of 95 use the same algorithm).
The Hebrew of NT is different, not because Microsoft loves to change
algorithms, but because of pressure from standardization bodies. The
new algorithm, Unicode, which was forced by these bodies, is inferior
to the current one and too complex by comparison (Doron, I know you
disagree).

However, Word does not have logical-order Hebrew (yes! You can call
it a scoop!). The Hebrew version of Word uses font-driven
directionality. Unlike other Hebrew supports (such as Hebrew Motif
and Hebrew NT), you don't have one active font (which is
English-Hebrew), but two active fonts (one for English and one for
Hebrew). With letters, there is no problem ("A" must be written with
an English font, and Alef must be written with a Hebrew font). But
with digits, punctuation, and other neutral characters, the behavior
depends on the language of the font. This behavior is very rare:
Word is the main application that uses it. There are dozens of
applications that were localized to Hebrew and use the standard
behavior instead, and thousands that were not localized and inherit
their Hebrew from the system (BiDi-unaware applications). I don't
know of any application other than Word that behaves like that.
Anyway, I recommend using Notepad whenever you are not sure about
anything in Hebrew and/or BiDi.

Still, if you use the "correct" font for each character under Word,
the behavior will be exactly the same as the standard algorithm.
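
To make the font-driven idea concrete, here is a minimal sketch in
Python. It is a toy model, not Word's actual code: the function name,
the "E"/"H" font tags, and the "L"/"R" result letters are all
assumptions made for illustration. Letters get their direction from
the script; every other character gets it from the font it was typed
with.

```python
def is_hebrew_letter(ch):
    # Hebrew letters Alef..Tav occupy U+05D0..U+05EA.
    return "\u05d0" <= ch <= "\u05ea"


def is_latin_letter(ch):
    return ch.isascii() and ch.isalpha()


def font_driven_directions(text, fonts):
    """Return one 'L' or 'R' per character of text.

    fonts is a string of the same length: 'E' for a character typed
    with the English font, 'H' for the Hebrew font (hypothetical tags).
    """
    if len(text) != len(fonts):
        raise ValueError("need exactly one font tag per character")
    result = []
    for ch, font in zip(text, fonts):
        if is_hebrew_letter(ch):
            result.append("R")   # Hebrew letter: always right-to-left
        elif is_latin_letter(ch):
            result.append("L")   # English letter: always left-to-right
        else:
            # Digits, punctuation, spaces: the font decides.
            result.append("R" if font == "H" else "L")
    return "".join(result)


text = "ab, \u05d0\u05d1"   # a, b, comma, space, Alef, Bet
print(font_driven_directions(text, "EEEEHH"))   # comma and space typed in English
print(font_driven_directions(text, "EEHHHH"))   # comma and space typed in Hebrew
```

The two calls use the same six characters and differ only in the
font of the comma and the space, which is exactly where the results
differ.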

There is no way to "convert" Hebrew-English TXT files that were
prepared by Word to the standard algorithm, since the file carries no
hint of the original fonts used for the characters. The same reason
prevents us from building a perfect converter for HTML files
generated by Word. I hope that FrontPage (don't flame me;
UNFORTUNATELY, it is really a great tool) uses the standard logical
directionality.
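
The loss of information can be shown with a small Python sketch. The
run-list representation and the export_txt name are hypothetical, not
Word's file format; the point is only that two documents which differ
in the font of their neutral characters come out as the same TXT
file, so no converter can tell them apart afterwards.

```python
def export_txt(runs):
    """Join the text of all runs; the font of each run is dropped.

    runs is a list of (text, font) pairs, where font is 'E' or 'H'
    (hypothetical tags for the English and the Hebrew font).
    """
    return "".join(text for text, _font in runs)


# Same characters; only the font of the comma and space differs.
doc_a = [("ab", "E"), (", ", "E"), ("\u05d0\u05d1", "H")]
doc_b = [("ab", "E"), (", ", "H"), ("\u05d0\u05d1", "H")]

same_file = export_txt(doc_a) == export_txt(doc_b)
print(same_file)
```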

This anomaly (2 active fonts) was removed in the new Hebrew supports
(e.g. the Hebrew version of NT), which finally adopt our way of
BiDi (i.e. one English-Hebrew font instead of one English font and
one Hebrew font). It is the fourth or fifth principle that I have
used for years and that Microsoft has adopted in place of their
original way. They are good students  :-)

-- 
Eli Marmor
marmor@elmar.co.il
El-Mar Software Ltd.