Downloading a Website's HTML and Saving It as an .html File

2022. 7. 30. 14:10


By 수알치 오상문

from datetime import datetime
import asyncio
import pathlib
from aiohttp import ClientSession  # pip install aiohttp

async def main():
    # Fetch the page and return the raw response body as bytes.
    url = "https://www.daum.net"
    async with ClientSession() as session:
        async with session.get(url) as response:
            response.raise_for_status()  # fail early on an HTTP 4xx/5xx response
            return await response.read()

html_data = asyncio.run(main())

now = datetime.now()
filename = now.strftime("%Y%m%d_%H%M%S") + '.html'

# Save the HTML file in the snapshots subdirectory of the current directory.
# The file name has the form YYYYMMDD_HHMMSS.html, e.g. 20200730_140105.html
output_dir = pathlib.Path().resolve() / "snapshots"  # target subdirectory
output_dir.mkdir(parents=True, exist_ok=True)
output_file = output_dir / filename  # join the directory and the file name
output_file.write_text(html_data.decode(), encoding='utf-8')  # without encoding='utf-8' this may raise an error
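
The saving step above can also be wrapped in a small helper, which makes it easier to reuse and to try out without touching the network. This is only a sketch: the function name save_snapshot and its parameters are hypothetical and do not appear in the original script. It assumes that writing the raw bytes is acceptable, which skips the decode step and leaves the page's original encoding untouched.

```python
from datetime import datetime
from pathlib import Path


def save_snapshot(html_bytes: bytes, base_dir: Path, when: datetime) -> Path:
    """Write html_bytes to base_dir/snapshots/<YYYYmmdd_HHMMSS>.html and return the path.

    Hypothetical helper, not part of the original script.
    """
    output_dir = base_dir / "snapshots"
    output_dir.mkdir(parents=True, exist_ok=True)
    output_file = output_dir / (when.strftime("%Y%m%d_%H%M%S") + ".html")
    # Writing bytes avoids decoding, so no encoding argument is needed here.
    output_file.write_bytes(html_bytes)
    return output_file
```

With the script above, it could be called as save_snapshot(html_data, Path.cwd(), datetime.now()).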

[Result] The file has been saved in the snapshots folder.
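
Instead of opening the folder by hand, the saved file can be located from Python. The sketch below relies on the fact that names of the form YYYYMMDD_HHMMSS.html sort chronologically as plain strings. The function name latest_snapshot is a hypothetical helper introduced here for illustration.

```python
from pathlib import Path
from typing import Optional


def latest_snapshot(directory: Path) -> Optional[Path]:
    """Return the newest *.html file in directory, or None if there is none.

    Hypothetical helper, not part of the original script.
    """
    # Timestamped names such as 20220730_140105.html sort in time order.
    files = sorted(directory.glob("*.html"), key=lambda p: p.name)
    return files[-1] if files else None
```

For the script above, latest_snapshot(Path.cwd() / "snapshots") should return the file that was just written.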

[Result] Contents of the saved HTML file
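
A quick way to check the saved contents without a browser or editor is to print the first few characters of the file. This is a minimal sketch: preview_html and max_chars are hypothetical names, and it assumes the file was written as UTF-8, as in the script above.

```python
from pathlib import Path


def preview_html(path: Path, max_chars: int = 200) -> str:
    """Return the first max_chars characters of a saved HTML file.

    Hypothetical helper, not part of the original script.
    """
    # errors="replace" avoids an exception if the file is not valid UTF-8.
    text = path.read_text(encoding="utf-8", errors="replace")
    return text[:max_chars]
```

Calling print(preview_html(output_file)) after the script runs should show the beginning of the downloaded page.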

License: Attribution, NonCommercial, NoDerivatives
