BeautifulSoup Basics

2022. 7. 10. 20:38

<Reference> ...

1. BeautifulSoup

BeautifulSoup is a Python library used to parse (analyze) web pages and HTML documents.

html = """
<html>
    <head>
    </head>
    <body>
        <p> hello, world! </p>
        <p> by sualchi </p>
    </body>
</html>
"""

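from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser') # html is the document to parse, 'html.parser' is the parser to use

p = soup.find('p') # find() returns a single tag, so no loop is needed
print(p)

This prints the first p tag in the HTML document.

The find() function returns only the first match it finds.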

With find_all(), as in the next example, every p tag is printed.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser') # html is the document to parse, 'html.parser' is the parser to use

for p in soup.find_all('p'):
    print(p)
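
The difference in return values between find() and find_all() can be sketched as follows. This is a minimal example on a small sample document like the one above; the variable names (first_p, all_p, missing) are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<html>
    <body>
        <p> hello, world! </p>
        <p> by sualchi </p>
    </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

first_p = soup.find('p')        # a single Tag, or None if nothing matches
all_p = soup.find_all('p')      # a list-like result of Tags (empty if nothing matches)
missing = soup.find('table')    # this sample has no <table>, so the result is None

# get_text(strip=True) returns the text inside a tag without surrounding whitespace
first_text = first_p.get_text(strip=True)
all_texts = [p.get_text(strip=True) for p in all_p]

print(first_text)
print(all_texts)
print(missing)
```

Because find() can return None, it is usually worth checking the result before calling methods such as get_text() on it.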

2. Difference between find and select

Beautiful Soup offers two ways to search an HTML document.

- find family : searches by HTML tag.

- select family : searches by CSS selector.

find : finds one tag (the first match)
find_all : finds all matching tags
select_one : finds one tag (the first match)
select : finds all matching tags
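
The four functions can be compared side by side on a small document. The markup below (the main id and the intro and note classes) is a hypothetical sample made up for illustration, not taken from a real site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup used only for this comparison
sample = """
<div id="main">
    <p class="intro">first</p>
    <p class="note">second</p>
    <p class="note">third</p>
</div>
"""
soup = BeautifulSoup(sample, 'html.parser')

# find family: tag name plus attribute filters (class_ avoids the Python keyword "class")
find_first = soup.find('p', class_='note')
find_every = soup.find_all('p', class_='note')

# select family: the same search written as a CSS selector
select_first = soup.select_one('div#main p.note')
select_every = soup.select('div#main p.note')

find_texts = [p.get_text() for p in find_every]
select_texts = [p.get_text() for p in select_every]

print(find_first.get_text(), select_first.get_text())
print(find_texts)
print(select_texts)
```

Both families reach the same tags here. The select family tends to be more compact when the search depends on nesting, classes, or ids, since all of that fits in one selector string.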

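import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.naver.com')
response.raise_for_status() # stop here if the request failed (4xx or 5xx status)
soup = BeautifulSoup(response.text, 'html.parser')

for p in soup.select('p'): # every p tag
    print(p)
for link in soup.select('a'): # every a tag
    print(link.get('href')) # print its href (None if the tag has no href)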
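
When collecting links this way, a tags without an href yield None, and many href values are relative paths. One possible way to handle both is sketched below. The function name extract_links and the sample string are hypothetical, and the example runs on a local string so that it does not depend on the network.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html_text, base_url):
    """Return an absolute URL for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html_text, 'html.parser')
    links = []
    # The CSS selector a[href] matches only <a> tags that carry an href
    for a in soup.select('a[href]'):
        # urljoin resolves relative paths against base_url and keeps absolute URLs as-is
        links.append(urljoin(base_url, a['href']))
    return links


# Hypothetical sample input: one relative link, one anchor without href, one absolute link
sample = '<a href="/news">News</a><a>no link</a><a href="https://example.com/x">X</a>'
links = extract_links(sample, 'https://www.naver.com')
print(links)
```

With a real page, the same function could be called as extract_links(response.text, response.url), using the response object returned by requests.get().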


수알치 Blog