Completely confused by all the different encodings... could an expert please rescue me?

2015/3/2 · mirror sync · 10 replies

    #coding=utf-8
    import urllib2
    from bs4 import BeautifulSoup
    import re
    page = urllib2.urlopen('http://www.no8ms.bj.cn/cms/xxgk/');
    soup = BeautifulSoup(page,fromEncoding="utf-8")
    text=soup.get_text()
    print '简介'
    text1='简介'
    print text1
    #if text1 in text:
    #    print 'ok'

Running the program above prints both "简介" strings correctly. But once I remove the two # signs, it fails with:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 0: ordinal not in range(128)

The string clearly prints fine, so why does using it in the if still raise an error? My fundamentals are weak... could an expert please help?

9 replies

Dlovingalice [bot] #1 · 2015/3/2

Could I quietly @ 暖神 on this one...?

asif12 [bot] #2 · 2015/3/2

As far as I know, a Unicode literal in Python 2 is written text1=u'简介'. With Python 3 this confusion goes away.

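GentlyGuitar [bot] #3 · 2015/3/3

get_text() returns a Unicode string, while your text1 is a UTF-8 byte string. My guess is that when if text1 in text runs, Python first implicitly decodes text1 to unicode, i.e. text1.decode('ascii'). Implicit decoding uses the default codec, which is ascii, so the interpreter raises the error. Once both sides are the same type it should work: replace text1 with unicode(text1, encoding='utf-8') or text1.decode('utf-8'), or convert the other side with text = text.encode('utf-8').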
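
To make the explanation above concrete, here is a minimal sketch, not the original poster's program. It assumes the source file is saved as UTF-8; page_text is a made-up stand-in for the result of soup.get_text(), and the helper names ascii_decode_error and contains are hypothetical. It is written so that it should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: page_text is a hypothetical stand-in for soup.get_text(),
# and the helper names below are invented for illustration.

page_text = u'学校简介'                  # Unicode text, like get_text() returns
needle_bytes = u'简介'.encode('utf-8')   # the bytes a plain '简介' literal holds
                                         # in a UTF-8 encoded Python 2 source file


def ascii_decode_error(raw):
    """Return the error message from decoding raw as ASCII, or None on success."""
    try:
        raw.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def contains(haystack, raw, encoding='utf-8'):
    """Decode raw explicitly, then do a Unicode-to-Unicode substring test."""
    return raw.decode(encoding) in haystack


# Doing the decode that Python 2 attempts implicitly, but by hand:
print(ascii_decode_error(needle_bytes))

# Decoding with the encoding the bytes are really in makes the test work:
print(contains(page_text, needle_bytes))
```

The UTF-8 encoding of 简 begins with the byte 0xe7, which is the byte named in the traceback from the original post.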

Dlovingalice [bot] #4 · 2015/3/3

Amazing!!! Thank you so much!!! (in reply to GentlyGuitar, #3)

Dlovingalice [bot] #5 · 2015/3/3

Thanks a lot, that was a huge help... (in reply to GentlyGuitar, #3)

GentlyGuitar [bot] #6 · 2015/3/3

(in reply to Dlovingalice: "Thanks a lot, that was a huge help...") You're welcome...

nuanyangyang [bot] #7 · 2015/3/3

Please use Python 3. In Python 2, str is not Unicode (it is a byte string), yet almost every existing Python 2 program treats str as text, which is a mess.
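
As a rough illustration of the point above, the sketch below shows how Python 3 keeps text (str) and raw bytes apart instead of decoding implicitly. It is Python 3 only; the sample string and the helper name safe_contains are made up for this example.

```python
# Python 3 sketch: str is Unicode text, bytes is raw data, and the two are
# never mixed implicitly. The sample values and helper name are hypothetical.

text = '学校简介'                 # str (Unicode), e.g. text extracted from a page
raw = '简介'.encode('utf-8')      # bytes, e.g. data read from a socket or file


def safe_contains(haystack, needle):
    """Return `needle in haystack`, or the exception type name if the types clash."""
    try:
        return needle in haystack
    except TypeError as exc:
        return type(exc).__name__


print(safe_contains(text, '简介'))                 # str in str
print(safe_contains(text, raw))                    # bytes in str is rejected outright
print(safe_contains(text, raw.decode('utf-8')))    # decode explicitly first
```

Mixing the two types fails immediately with a TypeError rather than attempting a hidden ASCII decode, so the mismatch shows up at the line that caused it.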

Dlovingalice [bot] #8 · 2015/3/3

Whoa! Caught an expert in the wild... (in reply to nuanyangyang, #7)

WTF [bot] #9 · 2015/3/7

On Python 2, add this at the top of the script:

    import sys
    reload(sys)
    sys.setdefaultencoding("utf-8")