
Re: hebrew support for wordtrans



On Sun, 18 Nov 2001, guy keren wrote:

>
> On Sun, 18 Nov 2001, Ricardo Villalba wrote:
>
> > At first I was using the "ISO 8859-8" text codec in Qt 2 for
> > converting Hebrew text to Unicode, but this codec doesn't seem to
> > work in Qt 3 (I get only one Hebrew letter for each Hebrew text).
> >
> > So with Qt 3 I used the "ISO 8859-8-I" text codec, and it seems to work,
> > but the results differ from those in Qt 2.
> >
> > I'm attaching two pictures, one from qwordtrans compiled with Qt 2 (and
> > using internally "ISO 8859-8") and another one from qwordtrans compiled
> > with Qt 3 (which uses "ISO 8859-8-I"). Which text is correct?
>
> actually, neither of them looks fully correct, but the qt2 version seems
> better - it puts all words in the right order, but reverses english-letter
> words inside the line (see the 'XINU' part). the qt3 version, on the other
> hand, gets the word ordering itself wrong.

It is not that bad. Besides the problem with the English text ("UNIX"
shown in reverse), the only problem is that the bidi base direction is LTR
(Left-To-Right, e.g. English). If you have to pass a base direction
parameter, it may be safer to pass "Neutral".

In case you want to handle hebrew/arabic code slightly differently:

The bidi renderer (Qt 3, in this case) has to know something about the
context of the text that it renders. The text can be:

* Part of a bigger chunk of LTR text. In this case any LTR character
breaks a sequence of RTL (Right-To-Left, e.g. Hebrew) characters into
subsequences that are "reversed" separately.

* Part of a bigger chunk of RTL text. Similar, but here RTL is the
dominant direction.

* Neutral: either RTL or LTR, determined by the first character that has
a "direction" (e.g. an English letter or a Hebrew letter).
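
To make the "Neutral" case concrete, here is a rough sketch in Python of
picking the base direction from the first character that has a strong
direction. The function name first_strong_direction is made up for this
example, and the sketch is simplified: the full Unicode bidi algorithm has
extra rules (for instance about isolates) that it ignores.

```python
import unicodedata


def first_strong_direction(text, default="LTR"):
    """Return "LTR" or "RTL" based on the first strongly-directional character.

    Hypothetical helper for illustration only; a simplified take on how a
    "neutral" base direction gets resolved.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (e.g. Latin)
            return "LTR"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "RTL"
    # No strong character found (only digits, spaces, punctuation, ...)
    return default
```

A definition that starts with "UNIX" would resolve to LTR here even if the
rest of it is Hebrew, which is exactly the surprise that "Neutral" can give.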

So basically neutral is the safest choice, but it can cause unexpected
results when the first word in the definition happens to be an English
one. In that case, you may add the character RLM (Right-to-Left Mark,
0x200f) to the beginning of the text. This is a zero-width character with
RTL direction.
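
As a sketch of the RLM trick (in Python, with a made-up helper name
force_rtl_paragraph), assuming the text is already a Unicode string in
logical order:

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK: zero-width, strong RTL direction


def force_rtl_paragraph(text):
    """Prefix text with RLM so that a first-strong-character rule sees RTL.

    Hypothetical helper for illustration; it leaves text alone if it
    already starts with an RLM.
    """
    if text.startswith(RLM):
        return text
    return RLM + text
```

The mark is invisible when rendered, so a definition that begins with an
English word keeps its content but should get an RTL base direction from a
renderer that uses the neutral rule.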

Alternatively, if you want to mark some text sequence as RTL, you can
prefix it with RLE (Right-to-Left Embedding, 0x202b) and postfix it with
PDF (Pop Directional Formatting, 0x202c). This will make that text sequence
a separate RTL sequence, independent of the context.
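
A minimal sketch of the RLE/PDF wrapping in Python (embed_rtl is a
hypothetical name, and the input is assumed to be a logical-order Unicode
string):

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed_rtl(text):
    """Wrap text in RLE ... PDF so it is laid out as its own RTL run.

    Hypothetical helper for illustration; it does not check whether text
    already contains unbalanced embedding characters.
    """
    return RLE + text + PDF
```

Unlike a leading RLM, this does not rely on the renderer's base direction:
the wrapped run is treated as RTL whatever surrounds it.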

>
> it _looks_ like some usage of fribidi on the string, before adding it to
> the widget, would help - perhaps someone else here could sketch the
> correct code.

There was a post to ivrix describing a possible solution (can't find the
link at the moment). This should ideally be done in the code that reads
the babylon dictionary, so that the text coming out of it could be
rendered by any renderer that uses the unicode bidi algorithm (qt3, bidi
xterm, etc.)

-- 
Tzafrir Cohen                        /"\
mailto:tzafrir@technion.ac.il        \ /  ASCII Ribbon Campaign
Taub 229, 04-829-3942                 X   Against  HTML  Mail
http://www.technion.ac.il/~tzafrir   / \

