Issues with ACIP to Unicode Conversion and Mac OS X
I'm really having fun with Unicode and Tibetan. Part of the project is to convert a lot of ACIP-encoded material, both my own and what is available from AsianClassics.org and similar places.
I saw that Jskad offers an ACIP to Unicode (text file) conversion, so I tried it, but the output didn't look like Unicode at all. I opened it in Notepad and in Pages (latest version), but neither displayed Tibetan Unicode text; both showed Roman letters with strange numbers.
Now, it could be operator error: perhaps I need to do something with the text file before using it, or something else is going on. If you have ideas about what is happening and how to fix it, please post a comment. I'd also like to hear about other tools or approaches for converting ACIP encoding to Unicode on the Macintosh platform. If I get this working, a lot of really cool Tibetan material will be posted on dharmadictionary and similar places for public access.
3 comments:
Uh-oh, I just tested to see for myself and got the same gibberish. ACIP>Unicode used to work fine. It doesn't seem like Leopard would have messed up a good thing, but I don't know what else has changed since then. I tried numerous different plain text encodings via TextEdit, but they all had different problems when converted.
That's really a shame. I was banking on ACIP>Uni on a Mac. I'll keep searching for a solution.
Thanks. I tried both the tested version of Jskad and last night's build. I suspect that the file header needs some specific information indicating that this is a UTF-16 or UTF-8 file, but I'm no expert on Unicode files.
Maybe someone from the Jskad team is reading this...
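One way to test the file-header guess above is to look at the raw bytes of the converter output. The following is a minimal Python sketch, not anything that ships with Jskad: it checks for a byte-order mark (BOM), decodes the data, and counts how many characters fall in the Unicode Tibetan block (U+0F00 to U+0FFF). The function names are made up for illustration, and falling back to UTF-8 when no BOM is present is an assumption.

```python
import codecs

# Byte-order marks mapped to a Python codec that consumes them on decode.
# UTF-32 comes first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a codec name if the data starts with a known BOM, else None."""
    for bom, codec_name in BOMS:
        if data.startswith(bom):
            return codec_name
    return None


def count_tibetan(text: str) -> int:
    """Count characters in the Unicode Tibetan block (U+0F00..U+0FFF)."""
    return sum(1 for ch in text if "\u0f00" <= ch <= "\u0fff")


def describe(data: bytes) -> dict:
    """Summarize the raw bytes of a (hypothetical) converter output file."""
    codec_name = sniff_bom(data)
    # Assumption: with no BOM, try UTF-8 and replace undecodable bytes.
    text = data.decode(codec_name or "utf-8", errors="replace")
    return {
        "bom": codec_name,
        "tibetan_chars": count_tibetan(text),
        "total_chars": len(text),
    }
```

To try it, read the output file in binary mode, for example open("output.txt", "rb").read() with your own file name, and pass the bytes to describe(). A tibetan_chars count of zero would suggest the file really contains only Roman letters and digits, rather than Unicode text that the editor is misreading.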
I found the same thing as you when using Jskad.jar, but I did manage to get readable Unicode from an ACIP file in a four-step process.
First go to:
Tools→Launch Converter...→ACIP to Wylie (Text->Text)→Convert
Second:
Close the converter dialogue, open the text file that the converter produced, select all, copy, and paste into Jskad.
Third:
Select all in Jskad, then:
Tools→Convert All→Convert Tibetan Machine Web (non-Unicode) to Unicode.
Fourth:
Select All→Copy and paste into a text file. Save the file (make sure that the encoding is a Unicode one such as UTF-8) and you have a Unicode file.
It's a little clumsy, but not too bad, I think.
*Except that the Unicode stackings are far from perfect - at least in Windows. I still haven't tried with Linux. (I left a message about this at http://jigtenmig.blogspot.com/2008/03/tibetan-unicode-fonts-and-this-blog.html)
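If the text editor's save dialog makes it hard to tell which encoding is used in the fourth step, the encoding can be forced from a script instead. This is only a sketch in Python under the assumption that UTF-8 is the encoding you want; save_utf8 and round_trip_ok are illustrative names, and the file name is a placeholder.

```python
import tempfile
from pathlib import Path


def save_utf8(text: str, path) -> None:
    """Write text to path, explicitly encoded as UTF-8 (no BOM)."""
    Path(path).write_text(text, encoding="utf-8")


def round_trip_ok(text: str) -> bool:
    """Save text as UTF-8 in a temporary folder and check it reads back unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "converted.txt"  # placeholder file name
        save_utf8(text, path)
        return path.read_text(encoding="utf-8") == text
```

Passing the encoding explicitly avoids depending on the platform default, which differs between systems. This does nothing about the stacking problems mentioned above, which are a font and rendering matter rather than a file encoding one.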