Issues with ACIP to Unicode Conversion and MacOSX
I'm really having fun with Unicode and Tibetan. Part of this is converting a lot of ACIP-encoded material, both my own and what is available from AsianClassics.org and similar places.
I saw that JSkad has a conversion from ACIP to Unicode (text file). So I tried it, but the output didn't look like Unicode at all. I opened it in Notepad and Pages (latest), but neither showed Tibetan Unicode text, only Roman letters with strange numbers.
Now, it could be operator error, or I may need to do something with the text file before using it, or it could be something else. If someone has ideas about what is happening and how to fix it, please post a comment. Also post if you know of other tools or ways to convert ACIP encoding to Unicode on the Macintosh platform. If I get this working, a lot of really cool Tibetan material will be posted on dharmadictionary and similar places for public access.

6 comments:
Uh-oh, I just tested to see for myself and got the same gibberish. ACIP>Unicode used to work fine. It doesn't seem like Leopard would have messed up a good thing, but I don't know what else has changed since then. I tried numerous different plain text encodings via TextEdit, but they all had different problems when converted.
That's really a shame. I was banking on ACIP>Uni on a Mac. I'll keep searching for a solution.
Thanks. I tried both the tested version of Jskad and last night's build. I suspect that the file header needs some specific information saying that this is a UTF-16 or UTF-8 file, but I'm no expert on Unicode files.
Maybe someone from the Jskad team is reading this...
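One way to test the file-header suspicion is to look at the first bytes of the converter's output. What follows is only a minimal sketch, assuming a POSIX shell with od available (as on Mac OS X); first_bytes is a made-up helper name and not part of Jskad.

```shell
# Hypothetical helper: print the first three bytes of a file as lowercase hex.
# "efbbbf" means the file starts with a UTF-8 BOM; output starting with
# "fffe" or "feff" means it starts with a UTF-16 BOM.
first_bytes() {
  od -An -tx1 -N3 "$1" | tr -d '[:space:]'
}
```

For example, first_bytes UNICODE_file.txt prints the hex of the first three bytes. Anything else than the patterns above just means there is no BOM; the file may still be valid UTF-8.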
I found the same thing as you when using Jskad.jar, but I did manage to get readable Unicode from an ACIP file in a multi-step process.
First go to:
Tools→Launch Converter...→ACIP to Wylie (Text->Text)→Convert
Second:
Close the converter dialogue, open the text file that the converter produced, select all, copy, and paste into Jskad.
Third:
Select all in Jskad, then:
Tools→Convert All→Convert Tibetan Machine Web (non-Unicode) to Unicode.
Fourth:
Select All→Copy and paste into a text file. Save the file (make sure that the encoding is UTF) and you have a Unicode file.
It's a little clumsy, but not too bad, I think.
*Except that the Unicode stackings are far from perfect - at least in Windows. I still haven't tried with Linux. (I left a message about this at http://jigtenmig.blogspot.com/2008/03/tibetan-unicode-fonts-and-this-blog.html)
I've converted several hundred pages of our project with JSKAD on Mac OS X
(http://www.ittm.org/projects/dataInput/DataInputProject.htm)
If you are familiar with Terminal on Mac OS X, try the following command:
java -Dthdl.acip.to.unicode.conversions.use.0F52.et.cetera=true -cp PATH/lib-vanilla/Jskad.jar org.thdl.tib.input.TibetanConverter --colors no --warning-level None --acip-to-tibetan-warning-and-error-messages long --acip-to-unicode ACIP_file.txt >> UNICODE_file.txt
Replace PATH with the path to your JSKAD folder; ACIP_file.txt is the input file and UNICODE_file.txt is the output file.
BTW, JSKAD doesn't include a UTF-8 BOM (in hex: EF BB BF) at the beginning of the file.
Hope this helps,
Daniel
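If an editor needs the BOM that Daniel mentions in order to recognize the file as UTF-8, it can be added afterwards. This is just a sketch, assuming a POSIX shell; add_bom is a made-up name, and the output file must be different from the input file.

```shell
# Hypothetical helper: write a copy of file $1 to file $2 with the UTF-8 BOM
# (EF BB BF, written here in octal for portability) in front.
# $1 and $2 must not be the same file.
add_bom() {
  { printf '\357\273\277'; cat "$1"; } > "$2"
}
```

Usage would be something like add_bom UNICODE_file.txt UNICODE_file_bom.txt. Not every program wants a BOM, so it is worth keeping the original file as well.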
Thanks, this will be handy, especially for converting a large set of ACIP files with a bash script on the command line.
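A batch run could look roughly like the sketch below. It reuses Daniel's command unchanged apart from writing with > instead of >>, so that running it twice does not append to an old result. The JSKAD_JAR variable, the function names, and the .uni.txt output suffix are my own assumptions, and PATH still has to be replaced with the real JSKAD folder.

```shell
# Assumed location of the jar; replace PATH or set JSKAD_JAR beforehand.
JSKAD_JAR="${JSKAD_JAR:-PATH/lib-vanilla/Jskad.jar}"

# Convert one ACIP file ($1) into one Unicode file ($2).
convert_one() {
  java -Dthdl.acip.to.unicode.conversions.use.0F52.et.cetera=true \
    -cp "$JSKAD_JAR" org.thdl.tib.input.TibetanConverter \
    --colors no --warning-level None \
    --acip-to-tibetan-warning-and-error-messages long \
    --acip-to-unicode "$1" > "$2"
}

# Convert every .txt file in directory $1, writing results into directory $2.
convert_all() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.txt; do
    [ -e "$f" ] || continue   # skip if the directory has no .txt files
    convert_one "$f" "$out_dir/$(basename "$f" .txt).uni.txt"
  done
}
```

After loading these functions, convert_all acip_files unicode_files would turn acip_files/foo.txt into unicode_files/foo.uni.txt, and so on for each file.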
Now, for non-programmers, this could all be daunting. Mac OS X has an option to make an icon that accepts dropped files and runs bash scripts underneath, so if I ever have more spare time, something like that would be handy for those who don't dare to open up the Terminal app.