Quick Start Guide for KoreanCodecs
----------------------------------

$Id: quick_start.txt,v 1.2 2002/04/28 09:10:09 perky Exp $

(This document contains characters encoded in EUC-KR.)


1. Installation

 * Normal

   $ python setup.py install


 * Without aliases (no extra path on Python 2.0)

   $ python setup.py install --without-aliases


 * Without C Extensions

   $ python setup.py install --without-extension


2. Encoding/Decoding

 * EUC-KR Codec (the most widely-used Korean encoding)

>>> unicode("한글이 좋아.", "euc-kr")
u'\ud55c\uae00\uc774 \uc88b\uc544.'
>>> print _.encode("euc-kr")
한글이 좋아.

 * CP949 Codec (another encoding widely used among Microsoft Windows users)

>>> unicode("    ƴ϶   ̶ Ͽϴ  Գ ", "cp949")
u'\uc7a5\uc0ac\ub294 \ub3c8\uc744 ...'
>>> print _[-10:].encode("cp949")
ϴ  Գ 

 * Johab, Unijohab and ISO-2022-KR Codecs

  (used the same way as described above)
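
As a rough illustration of "the same way", the sketch below round-trips one
string through several codecs. It is written for Python 3 and assumes that
the euc-kr, cp949, johab and iso-2022-kr codecs are available from the
standard library there, so it does not exercise this package itself.
Unijohab is left out because it is specific to this package. The helper name
roundtrip is made up for this example.

```python
def roundtrip(text, codec):
    """Encode text with the named codec, then decode the bytes back."""
    encoded = text.encode(codec)
    return encoded, encoded.decode(codec)

# Same string as in the EUC-KR example above, written with escapes.
sample = "\ud55c\uae00\uc774 \uc88b\uc544."

# Assumption: these codec names resolve on the running Python 3 interpreter.
for codec in ("euc-kr", "cp949", "johab", "iso-2022-kr"):
    encoded, decoded = roundtrip(sample, codec)
    assert decoded == sample
    print(codec, len(encoded))
```

The byte lengths differ between codecs because ISO-2022-KR adds escape and
shift sequences, while the other three use two bytes per Hangul syllable.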

 * Qwerty2bul Codec

>>> unicode("원숭이 엉덩이는 빨개", "euc-kr")
u'\uc6d0\uc22d\uc774 \uc5c9\ub369\uc774\ub294 \ube68\uac1c'
>>> _.encode("qwerty2bul")
'dnjstnddl djdejddlsms Qkfro'
>>> unicode("Qkfrks rjtdms tkrhk tkrhksms aktdlTdj", "qwerty2bul")
u'\ube68\uac04 \uac83\uc740 \uc0ac\uacfc \uc0ac\uacfc\ub294 \ub9db\uc788\uc5b4'
>>> print _.encode("euc-kr")
빨간 것은 사과 사과는 맛있어
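
The encoding direction of Qwerty2bul can be approximated in a few lines: each
precomposed syllable is split arithmetically into its initial, medial and
final jamo, and each jamo is replaced by its key on the 2-beolsik keyboard
layout. This is only a sketch for Python 3, not the codec's actual
implementation. The name to_qwerty and the three tables are made up for this
example, and characters outside the precomposed syllable block (including
standalone jamo) are passed through unchanged, which may differ from the real
codec.

```python
# Key sequences for the 2-beolsik layout, listed in Unicode jamo order.
CHO_KEYS = "r R s e E f a q Q t T d w W c z x v g".split()                 # 19 initials
JUNG_KEYS = "k o i O j p u P h hk ho hl y n nj np nl b m ml l".split()     # 21 medials
JONG_KEYS = [""] + ("r R rt s sw sg e f fr fa fq ft fx fv fg "
                    "a q qt t T d w c z x v g").split()                    # 28 finals

def to_qwerty(text):
    """Spell out each Hangul syllable as the keys that would type it."""
    out = []
    for ch in text:
        index = ord(ch) - 0xAC00
        if 0 <= index < 11172:
            # Syllable index = (initial * 21 + medial) * 28 + final
            cho, rest = divmod(index, 21 * 28)
            jung, jong = divmod(rest, 28)
            out.append(CHO_KEYS[cho] + JUNG_KEYS[jung] + JONG_KEYS[jong])
        else:
            out.append(ch)  # spaces, ASCII, standalone jamo: left as-is
    return "".join(out)

print(to_qwerty("\uc6d0\uc22d\uc774 \uc5c9\ub369\uc774\ub294 \ube68\uac1c"))
```

Decoding is harder than encoding, because the decoder has to decide where one
syllable ends and the next begins; that direction is not attempted here.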


3. StreamReader, StreamWriter

>>> import codecs
>>> f = codecs.open("quick_start.txt", encoding="euc-kr")
>>> lines = f.readlines()
>>> len(lines)
103
>>> lines[25]
u'2. Encoding/Decoding\n'
>>> lines[96]
u'>>> print hangul.format(fmt, result=u("\ub7ec\uc2a4\ud2f0\ub124\uc77c"), subj1=u("\uc704\uc2a4\ud0a4"), subj2=u("\ub4dc\ub78c\ubdd4")).encode("euc-kr")\n'

>>> f = codecs.open("testing.txt", "w", encoding="qwerty2bul")
>>> f.write(unicode("핑거휠레 맛있다", "euc-kr"))
>>> f.close()
>>> open("testing.txt").read()
'vldrjgnlffp aktdlTek'
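
The same reader/writer pattern can be sketched without codecs.open by
wrapping a binary file object with codecs.getwriter and codecs.getreader.
The example below targets Python 3 and assumes the euc-kr codec from the
standard library; the function name write_then_read is made up for this
example, and a temporary file stands in for testing.txt.

```python
import codecs
import os
import tempfile

def write_then_read(text, encoding="euc-kr"):
    """Write text through a StreamWriter, then read it back through a StreamReader."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "wb") as raw:
            writer = codecs.getwriter(encoding)(raw)   # StreamWriter over a byte stream
            writer.write(text)
        with open(path, "rb") as raw:
            reader = codecs.getreader(encoding)(raw)   # StreamReader over a byte stream
            lines = reader.readlines()                 # decoded lines, newlines kept
        with open(path, "rb") as raw:
            raw_bytes = raw.read()                     # what is actually on disk
    finally:
        os.remove(path)
    return lines, raw_bytes

lines, raw_bytes = write_then_read("\ud55c\uae00\n\uc88b\uc544\n")
print(lines)
print(len(raw_bytes))
```

Returning the raw bytes next to the decoded lines makes it easy to see that
the file holds encoded bytes while the program only handles text.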


4. Hangul Module

>>> from korean import hangul
>>> dir(hangul)
['A', 'AE', 'B', 'BB', 'BS', 'C', 'CHOSUNG_FILLER', ... ]
>>> print hangul.DD.encode("euc-kr"), hangul.GS.encode("euc-kr")
ㄸ ㄳ
>>> print u', '.join(hangul.Chosung).encode('euc-kr')
ㄱ, ㄲ, ㄴ, ㄷ, ㄸ, ㄹ, ㅁ, ㅂ, ㅃ, ㅅ, ㅆ, ㅇ, ㅈ, ㅉ, ㅊ, ㅋ, ㅌ, ㅍ, ㅎ
>>> hangul.ishangul(u'A')
False
>>> hangul.ishangul(unicode("", "euc-kr"))
True
>>> hangul.isJaeum(unicode("Ƽ", "euc-kr"))
False
>>> hangul.isJaeum(unicode("", "euc-kr"))
True

>>> u = lambda x: unicode(x, "euc-kr")
>>> print u', '.join(hangul.split(u("Ħ"))).encode("euc-kr")
, , 
>>> print hangul.join([hangul.J, hangul.WA, hangul.L]).encode("euc-kr")
좔
>>> print hangul.join([hangul.K, hangul.WAE, hangul.Null]).encode("euc-kr")
쾌
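
For readers curious about what splitting and joining involve, the sketch
below does both with the standard Unicode arithmetic for precomposed Hangul
syllables. It is written for Python 3 and is not taken from the hangul
module; the names split_syllable and join_syllable and the three tables are
made up for this example.

```python
# Compatibility jamo in Unicode syllable order.
CHOSUNG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")               # 19 initials
JUNGSUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")            # 21 medials
JONGSUNG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

def split_syllable(syllable):
    """Return (initial, medial, final) for one precomposed syllable."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError("not a precomposed Hangul syllable")
    cho, rest = divmod(index, 21 * 28)
    jung, jong = divmod(rest, 28)
    return CHOSUNG[cho], JUNGSUNG[jung], JONGSUNG[jong]

def join_syllable(cho, jung, jong=""):
    """Compose one syllable; an empty final means no final consonant."""
    index = (CHOSUNG.index(cho) * 21 + JUNGSUNG.index(jung)) * 28 + JONGSUNG.index(jong)
    return chr(0xAC00 + index)

print(split_syllable("한"))
print(join_syllable("ㅈ", "ㅘ", "ㄹ"))
```

An empty string plays the role of a missing final consonant here, which is
only this sketch's convention.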

>>> u("ζ ҸӴϰ ζ ")
u'\uaf2c\ubd80\ub791 ...
>>> hangul.disjoint(_)
u'\u1101\u1169\u1107\u116e ...
>>> hangul.conjoin(_)
u'\uaf2c\ubd80\ub791 ...
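
The code points shown above are the same ones that Unicode canonical
decomposition produces, so a close analogue can be sketched with the
standard unicodedata module on Python 3. This is an assumption-laden
stand-in, not the hangul module's own code; the names disjoint_nfd and
conjoin_nfc are made up for this example, and the two approaches may differ
for text other than precomposed syllables.

```python
import unicodedata

def disjoint_nfd(text):
    """Decompose precomposed syllables into conjoining jamo (NFD)."""
    return unicodedata.normalize("NFD", text)

def conjoin_nfc(text):
    """Recompose conjoining jamo into precomposed syllables (NFC)."""
    return unicodedata.normalize("NFC", text)

word = "\uaf2c\ubd80\ub791"          # the first word of the example above
jamo = disjoint_nfd(word)
print([hex(ord(ch)) for ch in jamo])
assert conjoin_nfc(jamo) == word
```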

>>> fmt = u("츮  %s(), %s  %s ?")
>>> print hangul.format(fmt, u("ƶ"), u("ƺ"), u("")).encode("euc-kr")
츮  ƶ, ƺ   ?
>>> print hangul.format(fmt, u(""), u(""), u("")).encode("euc-kr")
츮  ,    ?

>>> fmt = u("%(subj1)s %(subj2)s ġ %(result)s ȴ.")
>>> print hangul.format(fmt, result=u("러스티네일"), subj1=u("위스키"), subj2=u("드람뷔")).encode("euc-kr")
Ű ߸ ġ Ƽ ȴ.
>>> print hangul.format(fmt, subj2=u("ĭ"), subj1=u("ĭ"), result=u("66% ĭ")).encode("euc-kr")
ĭ ĭ ġ 66% ḭ̆ ȴ.


Yes, you got it!
