[Python] Google Image Web Crawling (Selenium)

Attention, Please!!!

G3LU 2024. 3. 5. 14:27

※ This post covers how to crawl images from the web using the Selenium library.

📌 Import Library

# pip install selenium  (install the library)
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

📌 Open Chrome Browser

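# Use a specific chromedriver binary; recent Selenium 4 releases can also locate one automatically (Selenium Manager)
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)  # pass the service, otherwise it is never used
driver.get("https://www.google.co.kr/imghp?hl=ko&ogbl")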

Running the code above opens a Chrome browser on the Google Images page, as shown in the screenshot below.
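The code in this post never closes the browser, so a Chrome window is left open if the script stops partway through. One way to guarantee cleanup is a small context manager that always calls the driver's quit() method. This is a minimal sketch: managed_driver is a hypothetical helper, and the factory you pass in (for example lambda: webdriver.Chrome(service=service)) is assumed to return an object that has a quit() method.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def managed_driver(factory: Callable[[], T]) -> Iterator[T]:
    """Create a driver with factory() and always call its quit() on exit, even after an exception."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs on normal exit and when the with-block raises
```

Usage: with managed_driver(lambda: webdriver.Chrome(service=service)) as driver: followed by the driver.get(...) call and the rest of the crawl inside the block.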

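📌 Search in Chrome Browser

Normally you open a browser, type a query into the search box, and press Enter. The code below automates exactly that.

If you inspect Google's search box element, you can see that it has the attribute class="gLFyf". We therefore use the find_element() method, which returns a single element on the page; in other words, this step locates the search box so we can type into it. (Google generates these class names, so they may have changed since this post was written; check them in your browser's developer tools.)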

# Locate the search box by its class name, type the query, and press Enter
input_element = driver.find_element(By.CLASS_NAME, "gLFyf")
input_element.send_keys("Mercedes" + Keys.ENTER)
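Instead of typing into the search box, you can also open the results page directly by building the search URL. The sketch below assumes Google still serves image results on /search when the tbm=isch query parameter is set; build_image_search_url is a hypothetical helper, not part of Selenium.

```python
from urllib.parse import urlencode


def build_image_search_url(query: str, base: str = "https://www.google.co.kr/search") -> str:
    """Build a Google Images results URL (tbm=isch is assumed to select the image results)."""
    # urlencode percent-encodes the query and turns spaces into '+'
    return f"{base}?{urlencode({'q': query, 'tbm': 'isch'})}"
```

For example, driver.get(build_image_search_url("Mercedes")) should land on the results page without locating gLFyf, which removes one class name that can change.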

📌 Scroll Down & View More

You have probably scrolled down again and again until the picture you wanted appeared, and if it still wasn't there at the very bottom, clicked the "더보기" (Show more results) button. If you plan to use crawled images for image classification, you will need at least a few hundred images. The code below automates this process.

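from selenium.common.exceptions import ElementClickInterceptedException, TimeoutException

# Scroll down: press Page Down 60 times on the page body to trigger lazy loading
elem = driver.find_element(By.TAG_NAME, "body")
for _ in range(60):
    elem.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.1)

# View more: click the "Show more results" button if it becomes clickable within 10 seconds, then keep scrolling
try:
    view_more_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CLASS_NAME, "mye4qd"))
    )
    view_more_button.click()
    for _ in range(80):
        elem.send_keys(Keys.PAGE_DOWN)
        time.sleep(0.1)
except (TimeoutException, ElementClickInterceptedException):
    # The button never appeared or could not be clicked; continue with what has loaded
    pass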
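A fixed number of Page Down presses (60, then 80) may stop too early or waste time. A common alternative, swapped in here rather than taken from the post, is to scroll to the bottom repeatedly until the page height stops growing. The sketch keeps Selenium out of the function so the loop logic can be tested on its own; with Selenium you would pass lambda: driver.execute_script("return document.body.scrollHeight") as get_height and lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") as scroll. scroll_until_stable is a hypothetical name.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll: Callable[[], None],
    pause: float = 1.0,
    max_rounds: int = 50,
) -> int:
    """Scroll until the page height stops changing; return the number of scroll rounds used."""
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll()
        time.sleep(pause)  # give lazy-loaded thumbnails time to appear
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new loaded, so we have reached the end
        last_height = new_height
    return max_rounds  # safety cap reached before the height settled
```

You could call it once, click the "Show more results" button as above, and then call it again to load the second batch.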

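📌 Download Images

In the Google Images results, each thumbnail image carries the classes rg_i and Q4LuWd, which the CSS selector ".rg_i.Q4LuWd" matches. We therefore use the find_elements() method to load every element matching that selector into images, and then read each one's src attribute.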

# Collect every thumbnail element and keep only those that have a src attribute
images = driver.find_elements(By.CSS_SELECTOR, ".rg_i.Q4LuWd")
links = [src for src in (image.get_attribute("src") for image in images) if src is not None]
print("Number of images found:", len(links))
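The src list can still contain empty strings, duplicates, or values that are not image URLs, and some thumbnails may be inlined as base64 data: URIs while others are https URLs. A small filter before downloading keeps only usable links. This is a sketch under those assumptions; clean_links and ALLOWED_PREFIXES are hypothetical names.

```python
from typing import Iterable, List, Optional

# Prefixes treated as downloadable image sources (an assumption; adjust as needed)
ALLOWED_PREFIXES = ("http://", "https://", "data:image/")


def clean_links(srcs: Iterable[Optional[str]]) -> List[str]:
    """Drop None/empty values, non-image schemes, and duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for src in srcs:
        if not src or src in seen:
            continue
        if not src.startswith(ALLOWED_PREFIXES):
            continue
        seen.add(src)
        cleaned.append(src)
    return cleaned
```

Usage: links = clean_links(image.get_attribute("src") for image in images)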

📌 Set the Save Path

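import os
import urllib.request

save_dir = "your_path"  # replace with the folder you want to save images into
os.makedirs(save_dir, exist_ok=True)  # urlretrieve does not create missing folders

for k, url in enumerate(links):
    # Save each image as <save_dir>/<index>.jpg
    urllib.request.urlretrieve(url, os.path.join(save_dir, str(k) + ".jpg"))

driver.quit()  # close the browser once downloading is done
print("Download complete.")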
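In Python 3, urlretrieve accepts both https URLs and data: URIs, but the loop above names every file .jpg regardless of its real format. If you want the extension to match, you can decode data: URIs yourself and pick the extension from the MIME type. A minimal sketch, assuming base64-encoded URIs of the form data:image/<type>;base64,<payload>; decode_data_uri, save_data_uri, and the EXTENSIONS table are hypothetical names, and any type outside the table is saved as .bin.

```python
import base64
import os

# Hypothetical MIME-type-to-extension table covering common image formats only
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/gif": ".gif", "image/webp": ".webp"}


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Split 'data:<mime>;base64,<payload>' into (mime, decoded bytes)."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


def save_data_uri(uri: str, save_dir: str, index: int) -> str:
    """Decode a data URI, write it to save_dir/<index><ext>, and return the file path."""
    mime, data = decode_data_uri(uri)
    ext = EXTENSIONS.get(mime, ".bin")
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, f"{index}{ext}")
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Inside the download loop you could call save_data_uri(url, save_dir, k) when url starts with "data:" and keep urlretrieve for https URLs.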
