
Python error when reading and writing a file

Bounty: 20 beans [Unresolved]

# -*- coding: utf-8 -*-

import jieba

with open('./nlp_test0.txt') as f:
    document = f.read()

document_decode = document.decode('GBK')

document_cut = jieba.cut(document_decode)

#print  ' '.join(jieba_cut)  # if I print the result, the segmentation is lost and result below shows nothing

result = ' '.join(document_cut)

result = result.encode('utf-8')

with open('./nlp_test1.txt', 'w') as f2:
    f2.write(result)

f.close()

f2.close()


UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-55-40dd9f231eb6> in <module>
      4
      5 with open('./nlp_test0.txt') as f:
----> 6     document = f.read()
      7
      8 document_decode = document.decode('GBK')

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 32: illegal multibyte sequence

wynn学习记录 | Beginner Level 1 | Beans: 182
Asked: 2020-03-18 21:29
All answers (2)

Your text is GBK-encoded, or it contains GBK bytes.
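
One way to check which encoding the file actually uses is to read it as raw bytes and try a few candidates in turn. The sketch below is only an illustration: `guess_encoding` and the candidate list are hypothetical names, not part of the original post. UTF-8 is tried first because it is strict, whereas GBK will often decode UTF-8 bytes into wrong characters without raising an error. A successful decode is therefore a hint, not proof.

```python
from pathlib import Path


def guess_encoding(path, candidates=("utf-8", "gbk", "gb18030")):
    """Return the first candidate that decodes the whole file, or None.

    Hypothetical helper: the candidate list is an assumption, adjust it
    to the encodings you expect.
    """
    raw = Path(path).read_bytes()  # bytes mode: no implicit decoding happens
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file, try the next
        return name
    return None
```

For example, `guess_encoding('./nlp_test0.txt')` returning `'utf-8'` would suggest that the file is not GBK at all, which would match the `'gbk' codec can't decode byte 0x80` message in the traceback.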

codegay | Beans: 1786 (Shrimp Level 3) | 2020-03-19 00:13

You can use decode('gbk', 'ignore') to skip the illegal characters, or try a different encoding.
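
A minimal sketch of this suggestion, assuming Python 3. That seems likely because the traceback fails at `f.read()`, where `open()` is already decoding with the platform default (apparently GBK here). In Python 3 the encoding and error handling go on `open()` itself, and `str` has no `.decode()` method. The helper names `read_text` and `write_text` are made up for illustration.

```python
def read_text(path, encoding="gbk", errors="ignore"):
    """Read a text file with an explicit encoding.

    errors="ignore" silently drops bytes that cannot be decoded, so some
    data may be lost. If the file is really UTF-8, pass encoding="utf-8".
    """
    with open(path, encoding=encoding, errors=errors) as f:
        return f.read()


def write_text(path, text, encoding="utf-8"):
    """Write text with an explicit encoding; no manual .encode() is needed."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
```

Usage would look like `document = read_text('./nlp_test0.txt')`, followed by segmentation, then `write_text('./nlp_test1.txt', result)`. The `with` blocks close the files, so the explicit `f.close()` calls are not needed.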

yytxdy | Beans: 974 (Shrimp Level 3) | 2020-03-19 09:14