BeautifulSoup Basics

수알치 2022. 7. 10. 20:38

<Reference> ...

1. BeautifulSoup

BeautifulSoup is a Python library for analyzing (parsing) web pages and HTML documents.

html = """
<html>
    <head>
    </head>
    <body>
        <p> hello, world! </p>
        <p> by sualchi </p>
    </body>
</html>
"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser') # html is the document to parse; 'html.parser' is the parser to use

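first_p = soup.find('p') # find() returns only the first match
print(first_p)

This prints the first p tag in the HTML document.

The find() function finds a single tag and returns it, even when several tags match.

Using find_all(), as in the next example, prints every p tag.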

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser') # html is the document to parse; 'html.parser' is the parser to use

for p in soup.find_all('p'):
    print(p)
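
To make the difference in return values concrete, here is a minimal sketch. The sample_html string is a stand-in written for this illustration, modeled on the document used above. It assumes find() gives back a single tag (or None when nothing matches), while find_all() gives back a list-like collection of every match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, modeled on the one used above.
sample_html = """
<html>
  <body>
    <p>hello, world!</p>
    <p>by sualchi</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

first_p = soup.find('p')      # a single tag, or None if nothing matches
all_p = soup.find_all('p')    # a list-like collection of every match

print(first_p.get_text())     # text of the first <p>
print(len(all_p))             # number of <p> tags found
print(soup.find('table'))     # the sample has no <table>, so this is None
```

Because find() can return None, it is usually worth checking the result before calling methods such as get_text() on it.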

2. Difference between find and select

Beautiful Soup offers two families of methods for searching an HTML document.

- find family: searches by HTML tag.

- select family: searches by CSS selector.

find : finds one tag (the first match)
find_all : finds all matching tags
select_one : finds one tag (the first match)
select : finds all matching tags
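
The four methods can be compared side by side. The sketch below is only an illustration: the markup and the class name "intro" are made up for this example. It runs the same query once with the find family (tag name plus a class_ filter) and once with the select family (a CSS selector).

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class name "intro" is made up for illustration.
sample_html = """
<div>
  <p class="intro">first</p>
  <p>second</p>
  <p class="intro">third</p>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# find family: tag name plus keyword filters
# (class_ has a trailing underscore because class is a Python keyword)
found_one = soup.find('p', class_='intro')
found_all = soup.find_all('p', class_='intro')

# select family: the same query written as a CSS selector
selected_one = soup.select_one('p.intro')
selected_all = soup.select('p.intro')

print(found_one.get_text(), selected_one.get_text())
print([p.get_text() for p in found_all])
print([p.get_text() for p in selected_all])
```

Both families return the same tags here, so the choice mostly comes down to whether the query is easier to write as keyword arguments or as a CSS selector.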

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.naver.com')
soup = BeautifulSoup(response.text, 'html.parser')

for p in soup.select('p'):
    print(p) # print every p tag

for link in soup.select('a'):
    print(link.get('href')) # print the href of every a tag
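
One detail worth knowing when collecting links: an a tag without an href attribute makes get('href') return None. The offline sketch below uses a made-up fragment with placeholder URLs (not the real page fetched above) to show this, together with the CSS attribute selector a[href], which matches only anchors that actually carry an href.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the URLs are placeholders.
sample_html = """
<a href="https://example.com/a">A</a>
<a name="anchor-only">no href here</a>
<a href="/relative/path">B</a>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# get('href') returns None when the attribute is missing
all_hrefs = [link.get('href') for link in soup.select('a')]

# a[href] matches only a tags that have an href attribute
real_hrefs = [link['href'] for link in soup.select('a[href]')]

print(all_hrefs)
print(real_hrefs)
```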
