[Python] Six Common Web Scraping Examples (with Source Code)

Original

ithorizon, 8 months ago (09-06), 114 views, #Python


```html

Six Common Python Web Scraping Examples (with Source Code)

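1. Scraping Static Web Pages

Scraping a static page is the most basic crawler case. It is typically done with the requests library to fetch the page and BeautifulSoup to parse the HTML.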


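        # Static page scraping example
        import requests
        from bs4 import BeautifulSoup

        url = 'http://example.com'
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
        soup = BeautifulSoup(response.text, 'html.parser')
        # find() returns None when nothing matches, so check before reading .text
        node = soup.find('div', {'class': 'example'})
        print(node.text if node else 'no matching element')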
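
BeautifulSoup's find returns None when no element matches, so reading .text on the result directly can raise AttributeError. Below is a minimal sketch of a guarded lookup, run against an inline HTML string so that it works offline. The helper name first_text and the sample markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page
SAMPLE_HTML = """
<html><body>
  <div class="example"> Hello, crawler </div>
  <div class="other">ignored</div>
</body></html>
"""


def first_text(html, tag, css_class):
    """Return the stripped text of the first matching tag, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    node = soup.find(tag, class_=css_class)
    return node.get_text(strip=True) if node is not None else None


print(first_text(SAMPLE_HTML, 'div', 'example'))
print(first_text(SAMPLE_HTML, 'div', 'missing'))
```

Returning None instead of raising leaves the decision to the caller: skip the page, log it, or retry with a different selector.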

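2. Scraping Dynamically Loaded Data

Many sites load content dynamically with JavaScript, so the data is absent from the initial HTML. In that case Selenium can drive a real browser and read the page after it has rendered.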


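        # Dynamic content scraping example (Selenium 4 API)
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.support.ui import WebDriverWait
        driver = webdriver.Chrome()
        driver.get('http://example.com')
        # find_element_by_css_selector was removed in Selenium 4; use By locators
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '.example')))
        print(element.text)
        driver.quit()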

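3. Simulating a Site Login

Some data is only available after logging in. A requests Session object handles this: it stores the cookies returned by the login request and sends them with every later request.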


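        # Simulated login example
        import requests
        from bs4 import BeautifulSoup
        session = requests.Session()  # the Session keeps cookies across requests
        login_url = 'http://example.com/login'
        data = {
            'username': 'your_username',
            'password': 'your_password'
        }
        login_response = session.post(login_url, data=data, timeout=10)
        login_response.raise_for_status()  # fail early if the login request errors
        response = session.get('http://example.com/user', timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')
        user_info = soup.find('div', {'class': 'user_info'})
        print(user_info.text if user_info else 'user_info not found')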

4. Downloading Images

To download an image, fetch it with requests and write the response bytes to a file opened in binary mode.


        # Image download example
        import requests
        image_url = 'http://example.com/image.jpg'
        response = requests.get(image_url, timeout=10)
        response.raise_for_status()  # avoid saving an error page as image.jpg
        with open('image.jpg', 'wb') as f:
            f.write(response.content)
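
The example above reads the whole body into memory through response.content, which is fine for small files. For larger images, a common alternative is to write the body chunk by chunk. The sketch below shows only the writing half and runs offline on fake byte chunks. The helper name save_chunks and the demo file name are assumptions. With requests, the chunks would typically come from response.iter_content(chunk_size=8192) on a response obtained with requests.get(url, stream=True).

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the byte count."""
    total = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


# Offline demo: fake chunks stand in for a streamed HTTP response body
demo_path = os.path.join(tempfile.gettempdir(), 'demo_image.bin')
written = save_chunks([b'\x89PNG', b'', b'fake-bytes'], demo_path)
print(written, 'bytes written to', demo_path)
```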

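5. Scraping Ajax Request Data

Data loaded through Ajax can be fetched directly with requests once the URL and parameters of the Ajax request are known, for example from the Network panel of the browser's developer tools.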


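        # Ajax request scraping example
        import requests
        url = 'http://example.com/ajax_data'
        params = {
            'param1': 'value1',
            'param2': 'value2'
        }
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()  # fail early on HTTP errors
        print(response.json())  # raises an error if the body is not valid JSON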
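
When matching a request against the one seen in the browser, it helps to check the exact URL that requests will build from the params dictionary. A minimal sketch that prepares the request without sending it, so no network access is needed. The URL and parameter names are the placeholder values from the example above, not a real endpoint.

```python
import requests

url = 'http://example.com/ajax_data'
params = {'param1': 'value1', 'param2': 'value2'}

# Build the request without sending it, then inspect the final URL
prepared = requests.Request('GET', url, params=params).prepare()
print(prepared.url)
```

If the printed URL differs from the one the browser sends, the difference usually points to a missing or misnamed parameter.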