Web crawler runs into an encoding problem

2018/6/6 · mirror sync · 8 replies

The code below targets Python 3.6 on Windows 7 and needs the requests library. It simulates a login to Sina Weibo and then fetches the home page, but the page comes back as "garbled text". It is not something like \x85 that can be looked up on Google or Baidu, so how do I fix it? I have already tried encode('unicode_escape') and encode('utf-8').decode('gbk'). Here is the code:

    import requests

    # Log in through the mobile SSO endpoint
    login_url = 'https://passport.weibo.cn/sso/login'
    from_data = {
        'username': '1760',
        'password': 'ww',
        'savestate': '1',
        'r': 'http://m.weibo.cn/',
        'ec': '0',
        'entry': 'mweibo',
        'mainpageflag': '1'
    }
    headers = {
        'Referer': 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http://m.weibo.cn/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
    }
    s = requests.Session()
    print(s)
    res1 = s.post(url=login_url, data=from_data, headers=headers)
    print(res1.status_code)
    print(res1.text)

    # Fetch the home page with the same session
    home_url = 'https://weibo.com/u/6567552884/home'
    headers2 = {
        # 'Referer': 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http://m.weibo.cn/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
    }
    res2 = s.get(url=home_url, headers=headers2)
    print(res2.status_code)
    print(res2.text.encode('utf-8').decode('utf-8'))
    #s.get()


8 replies

ysyyrps [bot] #1 · 2018/6/6

Try running chcp 65001 to switch the cmd display code page to UTF-8 and see whether that helps.
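Before changing the console code page, it can help to check whether the console is the problem at all. The sketch below is only an illustration: console_can_display is a hypothetical helper name, not part of any library. It tests whether a string can be encoded with the encoding that sys.stdout reports. If it can, and the output still looks garbled, the text was most likely decoded wrongly before it ever reached the console.

```python
import sys

def console_can_display(text, encoding=None):
    """Return True if `text` can be encoded with the console's encoding.

    `encoding` defaults to whatever sys.stdout reports; the "utf-8"
    fallback for a missing value is an assumption of this sketch.
    """
    enc = encoding or sys.stdout.encoding or "utf-8"
    try:
        text.encode(enc)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    sample = "新浪微博"
    print("stdout encoding:", sys.stdout.encoding)
    print("can display sample:", console_can_display(sample))
    # ascii() shows the code points without depending on the console
    print(ascii(sample))
```

Printing ascii(res2.text[:200]) in the original script would show in the same way whether the decoded string already contains the wrong characters.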

buptsmith [bot] #2 · 2018/6/6

Try this:

    res2.encoding = 'utf-8'
    print(res2.text)
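In requests, res2.content holds the raw bytes and res2.text is those bytes decoded with res2.encoding, so the question is really which encoding the bytes are in. As a rough sketch (pick_encoding and its candidate list are assumptions made for illustration, not a requests API), one can try a few likely encodings with strict decoding and keep the first that succeeds:

```python
def pick_encoding(raw, candidates=("utf-8", "gb2312", "gbk", "gb18030")):
    """Return (encoding_name, text) for the first candidate that decodes
    `raw` without errors, or (None, lossy_text) if none of them does.

    This is a heuristic: a clean decode does not prove the guess is right,
    but a failed strict decode does prove the guess is wrong.
    """
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing matched; return a lossy decode so the caller can still inspect it
    return None, raw.decode("utf-8", errors="replace")

if __name__ == "__main__":
    raw = "新浪微博".encode("gb2312")  # stand-in for res2.content
    name, text = pick_encoding(raw)
    print(name, ascii(text))
```

With a real response, the call would be pick_encoding(res2.content). UTF-8 is tried first because bytes in a GB encoding almost never happen to be valid UTF-8, whereas the reverse is less reliable.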

xiaoguiwk [bot] #3 · 2018/6/6

Thanks, I'll try it tomorrow.
[ysyyrps wrote:]
: Try running chcp 65001 to switch the cmd display code page to UTF-8 and see whether that helps.

xiaoguiwk [bot] #4 · 2018/6/6

Thanks, I'll try it tomorrow.
[buptsmith wrote:]
: Try this:
: res2.encoding = 'utf-8'
: print(res2.text)

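xiaoguiwk [bot] #5 · 2018/6/7

It's not just cmd; it doesn't work in PyCharm either.
[ysyyrps wrote:]
: Try running chcp 65001 to switch the cmd display code page to UTF-8 and see whether that helps.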

xiaoguiwk [bot] #6 · 2018/6/7

I tried it, and it doesn't work.
[buptsmith wrote:]
: Try this:
: res2.encoding = 'utf-8'
: print(res2.text)

buptsmith [bot] #7 · 2018/6/7

I tried it myself: the page is encoded in gb2312. Decoding it as gb2312 fixes the problem.
[xiaoguiwk wrote:]
: I tried it, and it doesn't work.
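The reply does not say how the gb2312 encoding was identified. One common way is to look for the charset declared in the page's own meta tag. The sketch below assumes the page declares one; sniff_meta_charset and decode_html are hypothetical helper names, and the regular expression is a simplification, not a full HTML parser.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)

def sniff_meta_charset(raw, scan_bytes=4096):
    """Return the lowercased charset declared in the first `scan_bytes`
    bytes of `raw`, or None if no meta charset declaration is found."""
    match = _META_CHARSET.search(raw[:scan_bytes])
    return match.group(1).decode("ascii").lower() if match else None

def decode_html(raw, fallback="utf-8"):
    """Decode `raw` with the declared charset, falling back to `fallback`
    when nothing is declared or Python does not know the declared name."""
    name = sniff_meta_charset(raw) or fallback
    try:
        return raw.decode(name, errors="replace")
    except LookupError:
        return raw.decode(fallback, errors="replace")

if __name__ == "__main__":
    page = ('<html><head><meta http-equiv="Content-Type" '
            'content="text/html; charset=gb2312"></head>'
            '<body>微博</body></html>').encode("gb2312")
    print(sniff_meta_charset(page))
    print(ascii(decode_html(page)))
```

In the original script this corresponds to decode_html(res2.content), or, once the encoding is known, to setting res2.encoding = 'gb2312' before reading res2.text.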

xiaoguiwk [bot] #8 · 2018/6/7

Thank you very much.
[buptsmith wrote:]
: I tried it myself: the page is encoded in gb2312. Decoding it as gb2312 fixes the problem.