Saturday, October 28, 2000

14651 table update for Unicode 3.0 (last installment)


WG20 folks,


The complete (draft) version is now up:


-rw-r--r--   1 sc22wg20 3960       675940 Oct 26 03:06 symdump-3.0.1d1.txt
-rw-r--r--   1 sc22wg20 3960       777655 Oct 27 22:37 symdump-3.0.1d2.txt
-rw-r--r--   1 sc22wg20 3960       994805 Oct 28 07:15 symdump-3.0.1d3.txt


I left the two earlier versions, but make sure you pick up the

latest (and biggest) one for review at the meeting.


This version is now complete for the Unicode 3.0 (= ISO/IEC 10646-1:2000) repertoire.



Also, I found and fixed the sifter bug that had led to funny weights for

Hebrew and Arabic letters with vowelings or other marks on them.




I had to throw up my hands for Mongolian. I made an effort to figure

out how to interdigitate standard Mongolian with Todo, Sibe, Manchu,

and Ali Gali, but we are going to need some expert guidance on this.



Also, the treatment of Myanmar and Khmer is very preliminary. I followed

the general Brahmi model (and stayed reasonably close to the charts),

but again, we will need some expert input here.


Some punctuation, especially for the newly encoded Asian scripts, will

need some discussion. I took a first stab, but some things are not

that well-defined.


CJK radicals (and Yi radicals, for that matter) are treated inconsistently,

because of the property differences between symbols and ideographs. We'll

need to decide what to do with these.


Some statistics of interest. The complete table now assigns 6951 distinct

primary weights (not counting Han and Hangul) and 346 distinct

secondary weights. 10,370 new characters are accounted for in the new

table, which has grown from 649,531 bytes (10,472 lines) to

994,805 bytes (17,770 lines).
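
For anyone who wants to sanity-check the growth figures, here is a minimal Python sketch of the arithmetic. Only the four counts come from the numbers quoted above; the variable and function names are made up for illustration.

```python
# Figures quoted in the message above; names are illustrative only.
old_bytes, old_lines = 649_531, 10_472
new_bytes, new_lines = 994_805, 17_770

# Absolute growth of the table between the two versions.
byte_growth = new_bytes - old_bytes
line_growth = new_lines - old_lines


def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, expressed as a percentage."""
    return 100.0 * (new - old) / old


print(f"bytes: +{byte_growth} ({percent_increase(old_bytes, new_bytes):.1f}%)")
print(f"lines: +{line_growth} ({percent_increase(old_lines, new_lines):.1f}%)")
```

The table grew by 345,274 bytes and 7,298 lines. The line count is smaller than the 10,370 new characters because the two figures count different things (lines in the table versus characters accounted for).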


Have fun reviewing and see you all next week!