Python 3: Why URLs Are Encoded, and How to Encode/Decode Them

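If you search Google in Korean, you can see that the Korean text in the q parameter has been encoded into an unfamiliar mix of percent signs, letters, and digits.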

Why encode a URL?

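A URL can only be transmitted using ASCII characters. Anything outside ASCII (such as Korean) and the special characters (the unsafe and reserved sets) are therefore percent-encoded: each octet is written as "%" followed by two hexadecimal digits.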

The octet is a unit of digital information in computing and telecommunications that consists of eight bits.
The term is often used when the term byte might be ambiguous, as the byte has historically been used for storage units of a variety of sizes.
- Source: Wikipedia
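To make the octet form concrete, here is a minimal sketch that percent-encodes a single Korean character by hand and compares the result with urllib's quote. The variable names (char, utf8_bytes, manual) are only illustrative, and UTF-8 is assumed as the character encoding.

```python
from urllib.parse import quote

char = "파"

# Step 1: turn the character into its UTF-8 octets (three bytes here).
utf8_bytes = char.encode("utf-8")

# Step 2: write each octet as "%" followed by two uppercase hex digits.
manual = "".join(f"%{b:02X}" for b in utf8_bytes)

print(utf8_bytes)             # the raw octets
print(manual)                 # the hand-built percent-encoded form
print(manual == quote(char))  # quote() produces the same string
```

One Korean character becomes three "%XX" groups because UTF-8 stores it in three octets.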

The following characters are encoded before use.
- Unsafe characters (RFC 1738)
- "<", ">", """ (double quote), "#", "{", "}", "|", "\", "^", "~", "[", "]", "`"

- Reserved characters (RFC 1738), when they appear as data rather than as delimiters
- ";", "/", "?", ":", "@", "=", "&"

- Whitespace (RFC 2396)
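How the reserved characters are treated in Python depends on the safe parameter of quote, which defaults to "/". The following is a small sketch with made-up sample strings (path and value are hypothetical), showing the difference between encoding a path and encoding a single value.

```python
from urllib.parse import quote

path = "/search/a b"
value = "a&b=c?d"

# Default safe="/" keeps slashes, so this suits path-like strings.
path_default = quote(path)

# safe="" encodes the slash too, which suits a single path segment or value.
path_strict = quote(path, safe="")

# Reserved characters inside a value are percent-encoded.
value_encoded = quote(value, safe="")

print(path_default)
print(path_strict)
print(value_encoded)
```

Note that Python 3.7 and later follow RFC 3986, where "~" is an unreserved character, so quote leaves it as is even though RFC 1738 lists it as unsafe.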

# In Python 3, urllib can handle URL encoding and decoding.
from urllib.parse import unquote, quote, quote_plus, urlencode

url = "https://www.google.com/search?"
search_text = "파이썬 예제"

# urlencode
encoded_text = quote(search_text)
print(f"the text encoded is {encoded_text}")

# urldecode
decoded_text = unquote(encoded_text)
print(f"the text decoded is {decoded_text}")

print(f'a combined url is \n{url + "q="+encoded_text}')

the text encoded is %ED%8C%8C%EC%9D%B4%EC%8D%AC%20%EC%98%88%EC%A0%9C
the text decoded is 파이썬 예제
a combined url is 
https://www.google.com/search?q=%ED%8C%8C%EC%9D%B4%EC%8D%AC%20%EC%98%88%EC%A0%9C

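urlencode takes key/value pairs and encodes them into a query string that fits the URI structure. It uses quote_plus by default, which is why the space appears as + below.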

>>> urlencode({"q":search_text})

'q=%ED%8C%8C%EC%9D%B4%EC%8D%AC+%EC%98%88%EC%A0%9C'
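As a rough sketch of how urlencode behaves with more than one parameter, the example below builds a query string, switches the space handling with quote_via, and parses the result back with parse_qs. The "hl" parameter and its value are assumptions added only for illustration.

```python
from urllib.parse import urlencode, parse_qs, quote

# "hl" is a hypothetical second parameter, used here only as an example.
params = {"q": "파이썬 예제", "hl": "ko"}

# Default: quote_plus is applied to each key and value, so space becomes "+".
query = urlencode(params)

# quote_via=quote produces "%20" for spaces instead.
query_pct = urlencode(params, quote_via=quote)

# parse_qs reverses the process; each value comes back as a list.
parsed = parse_qs(query)

print(query)
print(query_pct)
print(parsed)
```

Letting urlencode join the pairs with "&" and "=" avoids the manual string concatenation used earlier.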

Encoding / decoding with requests

# URL encoding/decoding through requests
import requests

print(f'encode api: {requests.utils.quote(search_text)}')
# Decode the encoded string (the original passed the already-decoded text).
print(f'decode api: {requests.utils.unquote(encoded_text)}')

encode api: %ED%8C%8C%EC%9D%B4%EC%8D%AC%20%EC%98%88%EC%A0%9C
decode api: 파이썬 예제

Setting the encoding

text = "반갑습니다."

encoded = quote(text, encoding='euc-kr')

print("unmatched encoding: ", unquote(encoded, encoding='utf-8'))
print("matched encoding: ", unquote(encoded, encoding='euc-kr'))

unmatched encoding:  �ݰ����ϴ�.
matched encoding:  반갑습니다.
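The garbled output above appears because unquote replaces undecodable bytes by default (errors="replace"). If you would rather detect a mismatch than print replacement characters, one option is to pass errors="strict". This is a minimal sketch; the mismatch_detected and roundtrip names are illustrative.

```python
from urllib.parse import quote, unquote

text = "반갑습니다."
encoded = quote(text, encoding="euc-kr")

# With errors="strict", decoding EUC-KR octets as UTF-8 raises an exception
# instead of silently producing replacement characters.
try:
    unquote(encoded, encoding="utf-8", errors="strict")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

# Decoding with the matching encoding restores the original text.
roundtrip = unquote(encoded, encoding="euc-kr", errors="strict")

print(mismatch_detected)
print(roundtrip)
```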

quote vs quote_plus (the same applies to unquote and unquote_plus)

mix_text = "서울 맛집"

print(quote(mix_text)) # whitespace > %20
print(quote_plus(mix_text)) # whitespace > +

%EC%84%9C%EC%9A%B8%20%EB%A7%9B%EC%A7%91
%EC%84%9C%EC%9A%B8+%EB%A7%9B%EC%A7%91

The _plus variants represent whitespace as + instead of %20.
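The decoding side has the same split. The sketch below, using a made-up sample string, shows that only unquote_plus turns + back into a space, and that quote_plus escapes a literal plus sign as %2B so the two cannot be confused.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

sample = "a+b c"  # hypothetical input containing a literal "+" and a space

# The literal "+" is escaped as %2B, the space is written as "+".
encoded = quote_plus(sample)

# unquote leaves "+" untouched, so the space is not restored.
decoded_plain = unquote(encoded)

# unquote_plus converts "+" back into a space, recovering the input.
decoded_plus = unquote_plus(encoded)

print(encoded)
print(decoded_plain)
print(decoded_plus)
```

In short, decode with the function that matches the one used to encode: unquote_plus for quote_plus or urlencode output, unquote for quote output.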

영원한패밀리