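Handling Korean (Hangul) in python3.

Multibyte character handling in python seems to be annoyingly awkward..

python2 and python3, the two most widely used versions, handle Korean text differently.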

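Things to know first.

python2 and python3 handle strings differently.

python2 defaults to ascii (str is a byte string, and the implicit codec is ascii)

python3 defaults to unicode (str is Unicode text, and source files are read as utf-8)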
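To make the difference concrete, here is a small python3 sketch (my own illustration, not from any library documentation) showing that a str counts characters while the encoded bytes depend on the codec chosen.

```python
text = "한글"                          # python3 str: a sequence of Unicode characters
utf8_bytes = text.encode("utf-8")      # 3 bytes per Hangul syllable
euckr_bytes = text.encode("euc-kr")    # 2 bytes per Hangul syllable

print(len(text))         # 2 (characters)
print(len(utf8_bytes))   # 6 (bytes)
print(len(euckr_bytes))  # 4 (bytes)

# Decoding with the same codec gives back the original text.
print(utf8_bytes.decode("utf-8") == text)  # True
```

In python2 the same literal without a u prefix would be a byte string, so len() would report the byte count instead.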

LINUX and Windows handle Korean differently.

Windows defaults to CP949

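On LINUX it depends on the locale. (It is whatever you declare in .profile; older systems were usually euc-kr, while recent distributions usually ship with utf-8.)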

In the web world, utf-8 is the default.
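Since each environment prefers a different encoding, moving text between them usually means decoding with the source codec and re-encoding with the target one. A minimal sketch, assuming the source encoding is known; transcode is a hypothetical helper name, not a standard function.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from the src encoding, then re-encode them as dst."""
    return data.decode(src).encode(dst)

cp949_bytes = "한글".encode("cp949")                    # e.g. what a Windows tool might write
utf8_bytes = transcode(cp949_bytes, "cp949", "utf-8")   # e.g. what a web page expects

print(cp949_bytes)  # b'\xc7\xd1\xb1\xdb'
print(utf8_bytes)   # b'\xed\x95\x9c\xea\xb8\x80'

# Guessing the wrong source encoding does not give the original text back.
wrong = utf8_bytes.decode("cp949", errors="replace")
print(wrong == "한글")  # False
```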

Handling Korean in python3.

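import sys
# reload(sys) / sys.setdefaultencoding() is a python2-only workaround.
# python3 does not provide it and does not need it, because str is already unicode.
# If the terminal expects a specific encoding, set it on stdout instead (python 3.7+).
sys.stdout.reconfigure(encoding='utf-8') # or sys.stdout.reconfigure(encoding='euc-kr')
a = "한글"
print(a)
print(len(a)) # 2 (characters, not bytes)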
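The same idea applies to files: in python3 it seems safest to pass the encoding explicitly instead of relying on the platform default. A rough sketch, where the file name sample.txt and the choice of euc-kr are only assumptions for illustration.

```python
import os
import tempfile

text = "한글 처리"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    # Write with an explicit encoding rather than the platform default.
    with open(path, "w", encoding="euc-kr") as f:
        f.write(text)
    # Look at the raw bytes that ended up on disk.
    with open(path, "rb") as f:
        raw = f.read()
    # Read back with the same encoding.
    with open(path, "r", encoding="euc-kr") as f:
        restored = f.read()

print(len(raw))          # 9 (four 2-byte syllables plus one space)
print(restored == text)  # True
```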

As a bonus...

If UTF-8 Korean text shows up garbled in vi, add the lines below to .vimrc.

set fileencodings=utf-8,cp949
set encoding=utf-8
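The fileencodings setting makes vim try utf-8 first and fall back to cp949. The same try-in-order idea can be sketched in python3 when the encoding of some input is not known in advance. decode_with_fallback is a hypothetical helper, and the candidate order is an assumption copied from the .vimrc lines above.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp949")):
    """Try each candidate encoding in order; return (text, encoding used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate did not fit, try the next one
    raise ValueError("none of the candidate encodings matched")

text_a, enc_a = decode_with_fallback("한글".encode("utf-8"))
text_b, enc_b = decode_with_fallback("한글".encode("cp949"))

print(enc_a)             # utf-8
print(enc_b)             # cp949
print(text_a == text_b)  # True
```

Putting utf-8 first matters: utf-8 is strict enough that cp949 bytes usually fail to decode as utf-8, while the reverse guess is more likely to "succeed" with wrong characters.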

On the Unix side, the approach below (fixing the locale) seems to work best..

$ echo $LANG
en_US.utf8
$ echo $LC_ALL
C
$ python -c "print (u'voil\u00e0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
$ export LC_ALL='en_US.utf8' # success
# export LC_ALL=ko_KR.utf8:ko:en_US.utf8:en # this attempt is still failing...
$ python -c "print (u'voil\u00e0')"
voilà
$ unset LC_ALL
$ python -c "print (u'voil\u00e0')"
voilà
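When changing the locale is not an option, the PYTHONIOENCODING environment variable can force the stdout encoding for a single python process. The sketch below starts a child interpreter and reports which encoding it picked. child_stdout_encoding is a hypothetical helper name, and this assumes python 3.7+ for capture_output.

```python
import codecs
import os
import subprocess
import sys

def child_stdout_encoding(extra_env):
    """Run a child python with extra environment variables and
    return the (normalized) encoding it chose for sys.stdout."""
    env = dict(os.environ)
    env.update(extra_env)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, capture_output=True, check=True,
    ).stdout
    # Normalize spellings such as UTF-8 / utf8 to the codec's canonical name.
    return codecs.lookup(out.decode("ascii").strip()).name

forced = child_stdout_encoding({"PYTHONIOENCODING": "utf-8"})
print(forced)  # utf-8
```

From the shell, the equivalent would be to prefix the command with PYTHONIOENCODING=utf-8.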
