Selenium: Reading an HTML Table Structure

2022. 7. 10. 11:50

<Reference> https://passwd.tistory.com/187


By 수알치 오상문

1. HTML Table Structure, Example 1

Suppose we want to read a table like the following.

The example HTML is structured as follows.

--------------------------------------------------------------------------------------------

<!DOCTYPE html>
<html lang="en">
<head>
<style>
table, td {
border: 1px solid #444;
}
thead, tfoot {
background-color: #aaa;
color: #111;
}
</style>
</head>

<body>
<h1> Table Example </h1>
<table>
<thead>
<tr>
<th colspan="2">Food Order Menu</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bibimbap (row 1, cell 1)</td>
<td>9000 won (row 1, cell 2)</td>
</tr>
<tr>
<td>Kimchi-jjigae (row 2, cell 1)</td>
<td>8000 won (row 2, cell 2)</td>
</tr>
<tr>
<td>Doenjang-jjigae (row 3, cell 1)</td>
<td>7000 won (row 3, cell 2)</td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="2">Water is self-service.</th>
</tr>
</tfoot>
</table>
</body>
</html>

--------------------------------------------------------------------------

In summary, an HTML table is built from the following structure.

<table>
  <thead>
    <tr> <th colspan="2"> The table header </th> </tr>
  </thead>
  <tbody>
    <tr> <td> ... </td>
         <td> ... </td>
    </tr>
    <tr> <td> ... </td>
         <td> ... </td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
       <td colspan="2"> The table footer </td>
    </tr>
  </tfoot>
</table>
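The thead / tbody / tfoot layout can also be checked offline before driving a browser. The following is a minimal sketch using Python's standard html.parser module, not Selenium; SAMPLE_HTML is a shortened stand-in for the example table, and the names TableSectionParser and parse_table_sections are hypothetical helpers introduced only for this illustration.

```python
from html.parser import HTMLParser

# Shortened stand-in for the example table (hypothetical sample data).
SAMPLE_HTML = """
<table>
  <thead><tr><th colspan="2">Food Order Menu</th></tr></thead>
  <tbody>
    <tr><td>Bibimbap</td><td>9000 won</td></tr>
    <tr><td>Kimchi-jjigae</td><td>8000 won</td></tr>
  </tbody>
  <tfoot><tr><th colspan="2">Water is self-service.</th></tr></tfoot>
</table>
"""


class TableSectionParser(HTMLParser):
    """Collects the rows of each table section as lists of cell text."""

    SECTIONS = ("thead", "tbody", "tfoot")

    def __init__(self):
        super().__init__()
        self.rows = {name: [] for name in self.SECTIONS}
        self._section = None  # section currently being parsed
        self._row = None      # cells of the current <tr>
        self._cell = None     # text fragments of the current <th>/<td>

    def handle_starttag(self, tag, attrs):
        if tag in self.SECTIONS:
            self._section = tag
        elif tag == "tr" and self._section:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is kept only while a cell is open.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows[self._section].append(self._row)
            self._row = None
        elif tag in self.SECTIONS:
            self._section = None


def parse_table_sections(html):
    """Return {"thead": [...], "tbody": [...], "tfoot": [...]} for one table."""
    parser = TableSectionParser()
    parser.feed(html)
    parser.close()
    return parser.rows


sections = parse_table_sections(SAMPLE_HTML)
for name, rows in sections.items():
    print(name, rows)
```

Each section maps to a list of rows, and each row is a list of cell strings, which mirrors the tr / th / td nesting in the summary.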

-----------------------------------------------------------------------------

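The following example extracts data from an HTML table with Selenium.

Let's print only the thead and tbody contents of the table.

<Python crawling code> Older Selenium versions (find_element_by_* methods, removed in Selenium 4.3)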

from selenium import webdriver

driver_path = 'PATH' # path to the Chrome WebDriver
driver = webdriver.Chrome(driver_path)

url = "" # URL to visit
driver.get(url) # load the page before looking up elements

TABLE_XPATH = "" # XPath of the table

table = driver.find_element_by_xpath(TABLE_XPATH)

# Header cells: the <th> elements of the first row in <thead>
thead = table.find_element_by_tag_name("thead")
thead_th = thead.find_element_by_tag_name("tr").find_elements_by_tag_name("th")

for th in thead_th:
    print(th.text) # read and print the text property

tbody = table.find_element_by_tag_name("tbody")

for tr in tbody.find_elements_by_tag_name("tr"):
    for td in tr.find_elements_by_tag_name("td"):
        print(td.get_attribute("innerText")) # print the cell contents

<Python crawling code> Recent Selenium versions

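from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver_path = 'PATH' # path to the Chrome WebDriver
# If service is omitted, Selenium locates chromedriver itself
# (on PATH, or via Selenium Manager in Selenium 4.6 and later).
driver = webdriver.Chrome(service=Service(driver_path))

url = "" # URL to visit
driver.get(url) # load the page before looking up elements

TABLE_XPATH = "" # XPath of the table

table = driver.find_element(By.XPATH, TABLE_XPATH)

# By.TAG_NAME matches the element tag; By.NAME would match the name="..." attribute instead.
thead = table.find_element(By.TAG_NAME, "thead")
thead_th = thead.find_element(By.TAG_NAME, "tr").find_elements(By.TAG_NAME, "th")

for th in thead_th:
    print(th.text) # read and print the text property

tbody = table.find_element(By.TAG_NAME, "tbody")

for tr in tbody.find_elements(By.TAG_NAME, "tr"):
    for td in tr.find_elements(By.TAG_NAME, "td"):
        print(td.get_attribute("innerText")) # print the cell contents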
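Instead of printing each cell, the tbody loop can collect the cells into a list of rows for later processing. The following is a minimal sketch under some assumptions: table_body_rows and FakeElement are hypothetical names introduced for illustration, and the locator is passed as the plain string "tag name" (the value of By.TAG_NAME) so the block can be tried without Selenium or a browser. With a real driver, pass the table WebElement found by XPath.

```python
TAG_NAME = "tag name"  # same string value as selenium's By.TAG_NAME


def table_body_rows(table):
    """Return the tbody of `table` as a list of rows (lists of cell text)."""
    tbody = table.find_element(TAG_NAME, "tbody")
    rows = []
    for tr in tbody.find_elements(TAG_NAME, "tr"):
        cells = tr.find_elements(TAG_NAME, "td")
        rows.append([td.text.strip() for td in cells])
    return rows


class FakeElement:
    """Hypothetical stand-in that imitates the two WebElement methods used above."""

    def __init__(self, tag, text="", children=()):
        self.tag = tag
        self.text = text
        self.children = list(children)

    def find_elements(self, by, value):
        # Only descendant lookup by tag name is imitated; `by` is ignored.
        found = []
        for child in self.children:
            if child.tag == value:
                found.append(child)
            found.extend(child.find_elements(by, value))
        return found

    def find_element(self, by, value):
        return self.find_elements(by, value)[0]


# Stand-in for a table WebElement (hypothetical sample data).
fake_table = FakeElement("table", children=[
    FakeElement("tbody", children=[
        FakeElement("tr", children=[
            FakeElement("td", "Bibimbap"), FakeElement("td", "9000 won")]),
        FakeElement("tr", children=[
            FakeElement("td", "Kimchi-jjigae"), FakeElement("td", "8000 won")]),
    ]),
])

rows = table_body_rows(fake_table)
for row in rows:
    print(row)
```

Returning a list of lists keeps the row and column positions, so a later step can, for example, pair each dish name with its price.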

[Note] See the link below as well.

https://testmanager.tistory.com/127

Handling Dynamic Web Tables with Selenium WebDriver


Attribution-NonCommercial-NoDerivatives
