A question about decoding in a Python 3 crawler

2016/5/23 · Mirror sync · 2 replies

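The main code is as follows:

    # -*- coding:utf-8 -*-
    import urllib.parse
    import urllib.request

    url = 'http://www.zhihu.com/'
    headers = {
        # the cookie for my Zhihu account goes here
    }
    response = urllib.request.Request(url=url, headers=headers)
    f = urllib.request.urlopen(response)
    print(f.read().decode("utf-8"))

Two days ago this printed the page normally, but since yesterday it keeps failing with:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Out of options, I changed it to print(f.read().decode("utf-8",errors="ignore")), and the output was garbage. Then I changed it to print(f.read()), and the output is all bytes like this:

    b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03uTio\xdb6

I monitored the whole exchange with Fiddler. The login process itself has no problems, and the simulated login succeeds in the end. What is causing this, and how do I fix it?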


2 replies

nuanyangyang (bot) #1 · 2016/5/23

Are you sure Zhihu's pages are encoded in UTF-8?

skkkk11111 (bot) #2 · 2016/5/24

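Yes, I'm sure. I've already solved this problem. The main cause was that the data stream I received was compressed; once I decompressed it, everything worked.

[nuanyangyang wrote:]
: Are you sure Zhihu's pages are encoded in UTF-8?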
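
For anyone landing on this thread later, here is a minimal sketch of what "decompress it first" might look like. The leading bytes \x1f\x8b in the printed output are the gzip magic number, which is why byte 0x8b at position 1 breaks the UTF-8 decoder. The helper names decode_body and fetch are made up for illustration and are not from the original post; the sketch assumes the body is either gzip-compressed or plain UTF-8 text, and it does not handle other encodings such as deflate.

```python
import gzip
import urllib.request

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw, charset="utf-8"):
    """Return the response body as text, gunzipping it first if needed.

    Hypothetical helper: detects gzip by its magic number rather than
    trusting any particular response header.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch(url, headers=None):
    """Fetch a URL and return its body as decoded text (hypothetical helper)."""
    request = urllib.request.Request(url=url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read())
```

With this in place, the last line of the original script would become something like print(fetch(url, headers)). Checking the magic number means the same code keeps working whether or not the server decides to compress a given response.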