Fetching a Website's Source Code
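After getting comfortable with basic Python syntax, I wanted some hands-on practice with one of Python's application areas, and web scraping came to mind. This post covers little more than the most basic step: fetching a website's source code.
Steps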
1. Import the required libraries
import requests
import re
import time
import json
2. Set request headers that mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
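To check that the header is really attached before hitting the site, one option is to build a request without sending it. The sketch below is only an illustration: using a `requests.Session` and the variable names `session`, `request` and `prepared` are assumptions of this example, not part of the original script.

```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}

# A Session keeps the headers and reuses them for every request made through it.
session = requests.Session()
session.headers.update(headers)

# Prepare the request without sending it, so nothing goes over the network.
request = requests.Request('GET', 'https://www.maoyan.com/board/4')
prepared = session.prepare_request(request)

# The prepared request carries the browser-like User-Agent set above.
print(prepared.headers['User-Agent'])
```

Passing `headers=headers` directly to `requests.get`, as the full script does, has the same effect for a single request; a session only pays off once several pages are fetched.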
3. Fetch the page and write it to a file
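Code
import requests
import re    # not used in this minimal example
import time  # not used in this minimal example
import json


def get_one_page(url):
    # Send a browser-like User-Agent with the request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    return None


def write_to_file(content):
    # Append to result.txt. The with block closes the file on exit,
    # so no explicit f.close() is needed.
    with open('result.txt', 'a', encoding='utf-8') as f:
        # content is an HTML string here, so json.dumps writes it as
        # one escaped JSON string per line
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main():
    url = 'https://www.maoyan.com/board/4'
    html = get_one_page(url)
    if html is not None:  # skip writing when the status code was not 200
        write_to_file(html)


if __name__ == '__main__':
    main()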
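Because write_to_file passes the page through json.dumps, result.txt does not contain raw HTML: each page is stored as a single JSON string with its newlines and quotes escaped. The following is a minimal sketch of how such a file could be read back, assuming every record ends with a newline character. The `path` parameter, the `read_pages` helper and `sample_html` are hypothetical names introduced for this example, and the sample string stands in for a downloaded page so that no network access is needed.

```python
import json
import os
import tempfile


def write_to_file(content, path):
    """Append content as one JSON-encoded line, mirroring the script above."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def read_pages(path):
    """Decode every non-empty line of the file back into the original text."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical sample standing in for a downloaded page.
sample_html = '<html>\n<title>"猫眼" TOP100</title>\n</html>'

# Write to a temporary directory so the example leaves no file behind in the project.
path = os.path.join(tempfile.mkdtemp(), 'result.txt')
write_to_file(sample_html, path)
write_to_file(sample_html, path)

pages = read_pages(path)
print(len(pages))               # 2
print(pages[0] == sample_html)  # True
```

If the goal is a file that opens directly in a browser or an HTML parser, writing `content` to the file without json.dumps would be the simpler choice; the JSON-per-line format is more useful once each record is a dictionary of extracted fields.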