[proposal] The CRCT Standard for bidirectionality support - Version 0.01
First of all, thanks to Peter L. Peres and especially to
Stanislav Malyshev who took the trouble of reading those
inconveniently broken lines, reviewing and looking up
all sorts of related information.
Here is my proposal after revisions.
---------------------------------------------------
Departing, in a rather violent way, from Unicode specifications,
I am proposing the following standard for bidirectionality support
in the world of text processing software (such as word processors,
browsers, Motif edit boxes, etc.).
The proposal is motivated by the following:
1. Simplicity of algorithms for rendering bidirectional text.
2. Full user control when constructing bidirectional text, so
that the user can override automatic software decisions
in pathological cases.
3. Access to logical order of the text, e.g. for use as record
keys when retrieving (or sorting) database records.
4. Only three special codes need to be reserved for handling
bidirectionality (versus the seven reserved by Unicode).
This not only frees codes for other purposes; algorithms
that deal with three codes are also simpler than algorithms
that deal with seven codes.
(The CRCT name, which I am proposing for the standard, has
its historical roots in the choice of the CTRL-R and CTRL-T control
characters as a way to switch the directionality of text in
VIC-20/Commodore-64 based telecommunication software
used by deaf persons for phone communications about 10-15
years ago.)
Principles:
-----------
1. There shall be complete separation between the world of text entry
and editing (as in edit boxes and word processors), and the world of
text rendition (as in browsers).
2. Text renderers shall employ no implicit methods for inferring text
direction from character codes being used. To determine direction,
they shall rely only upon explicit directionality markers embedded in
the text. Those markers may be manually entered by the user or
added automatically by text entry software, according to the properties
of the entered characters.
This is contrary to the behavior mandated by Unicode.
3. Text entry software (such as word processors) shall provide the
user with means to have full control of the directionality of any
entered text.
Text entry/editing:
-------------------
The following levels of control shall be provided:
1. Full user control - to switch the direction of text being entered
(even if the user switched from Hebrew letters to Latin letters or digits),
the user has to explicitly enter the appropriate directionality markers
around the text whose direction he wants to specify.
2. Stupid automatic control - Hebrew letters shall cause automatic
switching to RTL direction, Latin letters and digits shall cause automatic
switching to LTR direction. Any other characters shall leave the
direction as it is.
3. Intelligent automatic control - level of intelligence and performance
similar to that seen in MS-Word.
* In levels 2 and above, it shall be possible to override the
automatically-determined directionality of text by explicit insertion of
the appropriate directionality markers.
* Levels 1 and 2 are mandatory. Level 3 is optional.
* An optional command, which acts on a range of selected text, shall
rearrange the letters in the range so that the visual rendition will remain
the same yet all the letters of the range will have the same directionality.
In other words, all directionality markers will be removed from the selected
text range.
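
To make level 2 concrete, here is a minimal sketch in Python. It is only
an illustration under assumptions of mine, not part of the proposal: the
three marker code points are placeholders (the real codes are still TBD),
the helper names char_direction and auto_mark are hypothetical, and the
text is assumed to sit in an LTR document, so only RTL runs get wrapped.

```python
# Placeholder marker code points (assumption: the proposal leaves them TBD).
RTL_START = "\uE000"
LTR_START = "\uE001"   # not needed here, because the base direction is LTR
RUN_END = "\uE002"


def char_direction(ch):
    """Return 'rtl', 'ltr', or None for a character that implies nothing."""
    if "\u05D0" <= ch <= "\u05EA":        # Hebrew letters alef..tav
        return "rtl"
    if ch.isascii() and ch.isalnum():     # Latin letters and digits
        return "ltr"
    return None                           # anything else keeps the direction


def auto_mark(text):
    """Insert markers as level 2 would while the user types `text`."""
    out = []
    in_rtl = False                        # the document starts out LTR
    for ch in text:
        direction = char_direction(ch)
        if direction == "rtl" and not in_rtl:
            out.append(RTL_START)         # Hebrew letter: switch to RTL
            in_rtl = True
        elif direction == "ltr" and in_rtl:
            out.append(RUN_END)           # Latin letter or digit: back to LTR
            in_rtl = False
        out.append(ch)
    if in_rtl:
        out.append(RUN_END)               # close a run left open at the end
    return "".join(out)
```

Note that a space or punctuation mark typed after Hebrew letters stays
inside the RTL run, because "any other characters" leave the direction
as it is. A user who wants something else would fall back on level 1
and place the markers by hand.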
Directionality markers:
-----------------------
1. There shall be three markers:
RTL-start
LTR-start
directionality-run-end
(Note: this is after simplification, because run-end markers are paired
with the corresponding start markers. The only reason we might need
two separate end markers is for validating the text format, but this
does not justify reservation of another code.)
2. In non-HTML documents, those markers shall be assigned the following
codes (in systems which use encodings based upon Israeli Standard 1311):
[TBD]
(Note: Microsoft uses the codes 0xFD and 0xFE to mark directionality,
and we'll need a third code as well.)
3. In HTML (and related, such as XML) documents, those markers shall
have representations similar to the following:
RTL-start = <span dir=rtl>
LTR-start = <span dir=ltr>
directionality-run-end = </span>
(See also: http://babel.alis.com:8080/web_ml/html/rfc-i18n/rfc-i18n-4.en.html)
4. The start and directionality-run-end markers must be paired with each
other. The effect of a start marker shall be to push the directionality
of the preceding text (in logical order) onto a stack and enter the desired
directionality mode. The effect of the directionality-run-end marker shall
be to pop that stack and so restore the previous directionality.
5. Every document shall be considered as enclosed in implicit LTR-start
and directionality-run-end markers. In other words, it shall be rendered in
LTR direction unless it has an RTL-start marker at its beginning.
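
The push/pop rule in item 4 and the implicit LTR wrapper in item 5 can
be sketched as follows (Python). Again this is an illustration under my
own assumptions, not a reference implementation: the marker code points
are placeholders, the names render, layout and to_html are hypothetical,
and the "visual order" is simply a string whose first character is the
leftmost one on the line. Line breaking and alignment are ignored.

```python
import html

# Placeholder marker code points (assumption: the proposal leaves them TBD).
RTL_START = "\uE000"
LTR_START = "\uE001"
RUN_END = "\uE002"


def layout(direction, items):
    """Join the pieces of one run, leftmost first; an RTL run is reversed."""
    if direction == "rtl":
        items = reversed(items)
    return "".join(items)


def render(text):
    """Turn logical-order text with markers into left-to-right visual order."""
    stack = [["ltr", []]]                 # item 5: implicit document-level LTR
    for ch in text:
        if ch == RTL_START or ch == LTR_START:
            # Start marker: push a new run with the requested direction.
            stack.append(["rtl" if ch == RTL_START else "ltr", []])
        elif ch == RUN_END:
            if len(stack) == 1:
                raise ValueError("run-end marker without a matching start")
            # End marker: pop the run and hand its layout to the outer run.
            direction, items = stack.pop()
            stack[-1][1].append(layout(direction, items))
        else:
            stack[-1][1].append(ch)
    if len(stack) != 1:
        raise ValueError("start marker without a matching run-end")
    return layout(*stack[0])


def to_html(text):
    """Map the three markers to the <span> forms given in item 3."""
    tags = {
        RTL_START: "<span dir=rtl>",
        LTR_START: "<span dir=ltr>",
        RUN_END: "</span>",
    }
    return "".join(tags.get(ch) or html.escape(ch) for ch in text)
```

For example, the logical text ab [RTL-start] XYZ [LTR-start] 12
[run-end] W [run-end] cd comes out of render() as abW12ZYXcd: the inner
LTR run 12 keeps its own order while the RTL run around it is laid out
from the right. No character properties are consulted anywhere, which is
the point of principle 2. Under the same assumptions, applying render()
to a balanced selection inside an LTR context would also give the
marker-free text wanted by the optional range command of the text
entry section.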
Notes:
------
1. This is only the second draft. If there is agreement with the principles
presented in this proposal, but disagreement with the details - then let me
know and I'll revise accordingly.
2. I am sure that I have still overlooked all kinds of subtle points while
trying to simplify the proposed standard. I'll appreciate those points being
brought to my attention.
3. The proposal is geared toward the needs of Hebrew users. Other
RTL languages may have needs which I overlooked. Again, let me
know what is needed!
4. By addition of more directionality markers, it may be possible to
extend the proposal to accommodate languages which employ both
horizontal and vertical directions.
--- Omer
E-mail: omerz@actcom.co.il
SPAM Warning: by sending me UNSOLICITED COMMERCIAL
E-MAIL (known also as "SPAM") you irrevocably agree to pay me
US$500.- plus any legal fees incurred while trying to collect this
amount due - for the service of receiving your UNSOLICITED
COMMERCIAL E-MAIL message.