
Day537. requests crawler - Python

Date: 2023-05-29
requests crawler

1. Basic usage

Installation

pip install requests

Response attributes and types
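As a quick reference, the attributes used below and their types, based on the standard requests Response API (a minimal sketch, not tied to any particular site):

import requests

resp = requests.get('http://www.baidu.com')

print(type(resp))              # <class 'requests.models.Response'>
print(type(resp.text))         # str: body decoded using resp.encoding
print(type(resp.content))      # bytes: raw, undecoded body
print(type(resp.status_code))  # int: HTTP status code, e.g. 200
print(type(resp.url))          # str: final URL of the request
print(type(resp.headers))      # case-insensitive dict of response headers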

Basic usage

import requests

url = 'http://www.baidu.com'
resp = requests.get(url)
resp.encoding = 'utf-8'

# print(type(resp))
# print(resp.text)
# print(resp.url)
# print(resp.content)
# print(resp.status_code)
# print(resp.headers)
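Setting resp.encoding = 'utf-8' works here because the page really is UTF-8; when the header-based guess is wrong for some other page, requests also exposes resp.apparent_encoding, detected from the body itself. A minimal sketch:

import requests

resp = requests.get('http://www.baidu.com')
# Fall back to the encoding detected from the response body
resp.encoding = resp.apparent_encoding
print(resp.text[:200])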

GET request

import requests

url = 'https://www.baidu.com/s?'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}
data = {
    'wd': '温州'
}
resp = requests.get(url=url, params=data, headers=headers)
resp.encoding = 'utf-8'
print(resp.text)
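The params dict is percent-encoded and appended to the query string by requests itself, so the Chinese keyword does not need to be encoded by hand; resp.url shows the URL that was actually requested. A small check along the lines of the request above:

import requests

headers = {"User-Agent": 'Mozilla/5.0'}
resp = requests.get('https://www.baidu.com/s?', params={'wd': '温州'}, headers=headers)
# requests percent-encodes the keyword into the query string
print(resp.url)  # e.g. https://www.baidu.com/s?wd=%E6%B8%A9%E5%B7%9E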

POST request

import requests
import json

url = 'https://fanyi.baidu.com/sug'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}
data = {
    'kw': 'eye'
}
resp = requests.post(url=url, data=data, headers=headers)
resp.encoding = 'utf-8'
context = resp.text
print(json.loads(context))
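Because this interface returns JSON, resp.json() can replace the json.loads(resp.text) step: it parses the body directly and raises an error if the body is not valid JSON. A minimal variant of the request above:

import requests

url = 'https://fanyi.baidu.com/sug'
headers = {"User-Agent": 'Mozilla/5.0'}
resp = requests.post(url=url, data={'kw': 'eye'}, headers=headers)
# Equivalent to json.loads(resp.text) for a JSON response body
print(resp.json())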

Proxies

import requests

url = 'https://www.baidu.com/s'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}
data = {
    'wd': 'ip'
}
proxy = {
    'http': '120.220.220.95:8085'
}
resp = requests.get(url=url, params=data, headers=headers, proxies=proxy)
resp.encoding = 'utf-8'
context = resp.text
with open('代理.html', 'w', encoding='utf-8') as fp:
    fp.write(context)
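The proxies dict maps URL schemes to proxy addresses, and the address above is only a sample that may already be offline; adding a timeout and catching RequestException keeps the script from hanging or crashing on a dead proxy. A hedged sketch of the same search with both http and https traffic routed through the proxy:

import requests

proxies = {
    'http': 'http://120.220.220.95:8085',   # sample address from above; may no longer be live
    'https': 'http://120.220.220.95:8085',
}
try:
    resp = requests.get('https://www.baidu.com/s',
                        params={'wd': 'ip'},
                        headers={"User-Agent": 'Mozilla/5.0'},
                        proxies=proxies,
                        timeout=5)
    print(resp.status_code)
except requests.exceptions.RequestException as e:
    print('proxy request failed:', e)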

Scraping the gushiwen.cn (古诗文网) login

import requests
import lxml.etree
import urllib.request

source_url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}

# Fetch the login page and parse the hidden form fields out of it
login_resp = requests.get(url=source_url, headers=headers)
context = login_resp.text
tree = lxml.etree.HTML(context)
# xpath returns a list; take the single matched value
__VIEWSTATE = tree.xpath("//input[@id='__VIEWSTATE']/@value")[0]
__VIEWSTATEGENERATOR = tree.xpath("//input[@id='__VIEWSTATEGENERATOR']/@value")[0]
img_src = 'https://so.gushiwen.cn' + tree.xpath('//img[@id="imgCode"]/@src')[0]
# urllib.request.urlretrieve(img_src, '古诗文网验证码.png')  # pitfall: this issues a separate request, so the captcha would not match the login session

# Turn the requests into one session object so the captcha and the login share cookies
session = requests.session()
code_resp = session.get(img_src)
context_code = code_resp.content
# 'wb' mode writes binary data to the file
with open('古诗文网验证码.png', 'wb') as fp:
    fp.write(context_code)

img_code = input('Enter the captcha: ')

login_url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx'
login_data = {
    '__VIEWSTATE': __VIEWSTATE,
    '__VIEWSTATEGENERATOR': __VIEWSTATEGENERATOR,
    'from': 'http://so.gushiwen.cn/user/collect.aspx',
    'email': '995931576@qq.com',
    'pwd': '852766122',
    'code': img_code,
    'denglu': '登录'
}
# Use the same session object for the login POST
resp = session.post(url=login_url, data=login_data, headers=headers)
with open('gsw.html', 'w', encoding='utf-8') as fp:
    fp.write(resp.text)
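The point of requests.session() is that the cookie set when the captcha image is downloaded is sent back automatically with the login POST, so the code being submitted is the one the server issued to this session. After a successful login the same session can keep fetching pages that require authentication, for example the collection page named in the from parameter. A short continuation of the script above (the collect.html filename is just for illustration):

# Continue with the logged-in session; its cookies are attached automatically
collect_resp = session.get('http://so.gushiwen.cn/user/collect.aspx', headers=headers)
collect_resp.encoding = 'utf-8'
with open('collect.html', 'w', encoding='utf-8') as fp:
    fp.write(collect_resp.text)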

