
Day537. requests crawler - Python

Date: 2023-05-29
requests crawler

1. Basic usage

Installation

pip install requests

Response attributes and types
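As a quick reference, the attributes used below and their types, based on the standard requests Response API (a minimal sketch, not tied to any particular site):

import requests

resp = requests.get('http://www.baidu.com')

print(type(resp))              # <class 'requests.models.Response'>
print(type(resp.text))         # str: body decoded using resp.encoding
print(type(resp.content))      # bytes: raw, undecoded body
print(type(resp.status_code))  # int: HTTP status code, e.g. 200
print(type(resp.url))          # str: final URL of the request
print(type(resp.headers))      # case-insensitive dict of response headers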

Basic usage

import requests

url = 'http://www.baidu.com'
resp = requests.get(url)
resp.encoding = 'utf-8'

# print(type(resp))
# print(resp.text)
# print(resp.url)
# print(resp.content)
# print(resp.status_code)
# print(resp.headers)
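Setting resp.encoding = 'utf-8' works here because the page really is UTF-8; when the header-based guess is wrong for some other page, requests also exposes resp.apparent_encoding, detected from the body itself. A minimal sketch:

import requests

resp = requests.get('http://www.baidu.com')
# Fall back to the encoding detected from the response body
resp.encoding = resp.apparent_encoding
print(resp.text[:200])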

GET request

import requests

url = 'https://www.baidu.com/s?'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}
data = {
    'wd': '温州'
}
resp = requests.get(url=url, params=data, headers=headers)
resp.encoding = 'utf-8'
print(resp.text)
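The params dict is percent-encoded and appended to the query string by requests itself, so the Chinese keyword does not need to be encoded by hand; resp.url shows the URL that was actually requested. A small check along the lines of the request above:

import requests

headers = {"User-Agent": 'Mozilla/5.0'}
resp = requests.get('https://www.baidu.com/s?', params={'wd': '温州'}, headers=headers)
# requests percent-encodes the keyword into the query string
print(resp.url)  # e.g. https://www.baidu.com/s?wd=%E6%B8%A9%E5%B7%9E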

POST request

import requests
import json

url = 'https://fanyi.baidu.com/sug'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}
data = {
    'kw': 'eye'
}
resp = requests.post(url=url, data=data, headers=headers)
resp.encoding = 'utf-8'
context = resp.text
print(json.loads(context))
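Because this interface returns JSON, resp.json() can replace the json.loads(resp.text) step: it parses the body directly and raises an error if the body is not valid JSON. A minimal variant of the request above:

import requests

url = 'https://fanyi.baidu.com/sug'
headers = {"User-Agent": 'Mozilla/5.0'}
resp = requests.post(url=url, data={'kw': 'eye'}, headers=headers)
# Equivalent to json.loads(resp.text) for a JSON response body
print(resp.json())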

Proxies

import requests

url = 'https://www.baidu.com/s'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}
data = {
    'wd': 'ip'
}
proxy = {
    'http': '120.220.220.95:8085'
}
resp = requests.get(url=url, params=data, headers=headers, proxies=proxy)
resp.encoding = 'utf-8'
context = resp.text
with open('代理.html', 'w', encoding='utf-8') as fp:
    fp.write(context)
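The proxies dict maps URL schemes to proxy addresses, and the address above is only a sample that may already be offline; adding a timeout and catching RequestException keeps the script from hanging or crashing on a dead proxy. A hedged sketch of the same search with both http and https traffic routed through the proxy:

import requests

proxies = {
    'http': 'http://120.220.220.95:8085',   # sample address from above; may no longer be live
    'https': 'http://120.220.220.95:8085',
}
try:
    resp = requests.get('https://www.baidu.com/s',
                        params={'wd': 'ip'},
                        headers={"User-Agent": 'Mozilla/5.0'},
                        proxies=proxies,
                        timeout=5)
    print(resp.status_code)
except requests.exceptions.RequestException as e:
    print('proxy request failed:', e)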

Scraping the gushiwen.cn (古诗文网) login

import requests
import lxml.etree
import urllib.request

source_url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}

# Fetch the login page and parse the hidden form fields out of it
login_resp = requests.get(url=source_url, headers=headers)
context = login_resp.text
tree = lxml.etree.HTML(context)
# xpath returns a list; take the single matched value
__VIEWSTATE = tree.xpath("//input[@id='__VIEWSTATE']/@value")[0]
__VIEWSTATEGENERATOR = tree.xpath("//input[@id='__VIEWSTATEGENERATOR']/@value")[0]
img_src = 'https://so.gushiwen.cn' + tree.xpath('//img[@id="imgCode"]/@src')[0]
# urllib.request.urlretrieve(img_src, '古诗文网验证码.png')  # pitfall: this issues a separate request, so the captcha would not match the login session

# Turn the requests into one session object so the captcha and the login share cookies
session = requests.session()
code_resp = session.get(img_src)
context_code = code_resp.content
# 'wb' mode writes binary data to the file
with open('古诗文网验证码.png', 'wb') as fp:
    fp.write(context_code)

img_code = input('Enter the captcha: ')

login_url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx'
login_data = {
    '__VIEWSTATE': __VIEWSTATE,
    '__VIEWSTATEGENERATOR': __VIEWSTATEGENERATOR,
    'from': 'http://so.gushiwen.cn/user/collect.aspx',
    'email': '995931576@qq.com',
    'pwd': '852766122',
    'code': img_code,
    'denglu': '登录'
}
# Use the same session object for the login POST
resp = session.post(url=login_url, data=login_data, headers=headers)
with open('gsw.html', 'w', encoding='utf-8') as fp:
    fp.write(resp.text)
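The point of requests.session() is that the cookie set when the captcha image is downloaded is sent back automatically with the login POST, so the code being submitted is the one the server issued to this session. After a successful login the same session can keep fetching pages that require authentication, for example the collection page named in the from parameter. A short continuation of the script above (the collect.html filename is just for illustration):

# Continue with the logged-in session; its cookies are attached automatically
collect_resp = session.get('http://so.gushiwen.cn/user/collect.aspx', headers=headers)
collect_resp.encoding = 'utf-8'
with open('collect.html', 'w', encoding='utf-8') as fp:
    fp.write(collect_resp.text)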

