Reading Non-ASCII Characters From A Text File
I'm using Python 2.7. I've tried many things like codecs, but it didn't work. How can I fix this?

myfile.txt:
wörd

My code:

f = open('myfile.txt', 'r')
for line in f:
    print line
f.close()
Solution 1:
import codecs

# open the file with utf-8 encoding
f = codecs.open("myfile.txt", "r", encoding='utf-8')
# read the file into a unicode string
sfile = f.read()
# check the type: it's unicode
print type(sfile)
# a unicode string should be encoded back to a byte string to display it properly
print sfile.encode('utf-8')
# check the type of the encoded string: it's str
print type(sfile.encode('utf-8'))
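If you can use the io module instead of codecs, the same read can be sketched with io.open, which takes an explicit encoding argument and behaves the same on Python 2.7 and Python 3 (the file name sample.txt here is a made-up example):

```python
# -*- coding: utf-8 -*-
# A minimal sketch, assuming a UTF-8 encoded file named sample.txt.
import io

# create a small test file containing a non-ASCII word
with io.open('sample.txt', 'w', encoding='utf-8') as f:
    f.write(u'w\xf6rd\n')   # u'wörd', written with an escape to stay ASCII-safe

# with an explicit encoding, read() returns a unicode (text) string directly
with io.open('sample.txt', 'r', encoding='utf-8') as f:
    content = f.read()

print(content.strip())
```

Because io.open decodes for you, there is no separate decode step to forget.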
Solution 2:
- First of all, detect the file's encoding:

from chardet import detect
encoding = lambda x: detect(x)['encoding']
print encoding(line)

- Then convert it to unicode (or to your default encoding's str):

n_line = unicode(line, encoding(line), errors='ignore')
print n_line
print n_line.encode('utf8')
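The decode step above can be exercised without chardet itself; a minimal sketch (the byte string raw is a made-up stand-in for a line read from the file) of what errors='ignore' actually does:

```python
# A minimal sketch of decoding with errors='ignore': undecodable bytes
# are silently dropped instead of raising UnicodeDecodeError.
raw = b'w\xc3\xb6rd'            # 'wörd' encoded as UTF-8 bytes

# decode with the correct (detected) encoding
n_line = raw.decode('utf-8', 'ignore')
print(n_line)

# decoding with a deliberately wrong encoding never raises here,
# but the non-ASCII bytes are simply thrown away
wrong = raw.decode('ascii', 'ignore')
print(wrong)
```

This is why errors='ignore' is a double-edged sword: a wrong encoding guess loses characters silently rather than failing loudly.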
Solution 3:
It's the terminal encoding. Try to configure your terminal with the same encoding you are using in your file. I recommend UTF-8.
By the way, it is good practice to decode all your inputs and encode all your outputs to avoid problems:
f = open('test.txt', 'r')
for line in f:
    l = unicode(line, encoding='utf-8')   # decode the input
    print l.encode('utf-8')               # encode the output
f.close()
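To see why the terminal encoding matters, here is a minimal sketch of the mismatch symptom, with Latin-1 standing in for a misconfigured terminal reading UTF-8 output:

```python
# A minimal sketch: the same UTF-8 bytes interpreted with the right and
# the wrong encoding. The wrong one produces mojibake, not an error.
data = u'w\xf6rd'.encode('utf-8')      # the bytes the program writes out

shown_ok = data.decode('utf-8')        # terminal configured as UTF-8
shown_bad = data.decode('latin-1')     # terminal configured as Latin-1

print(shown_ok)
print(shown_bad)                       # each UTF-8 byte shown as its own char
```

If your output looks like the second line (extra accented characters in place of each non-ASCII one), the bytes are fine and only the terminal's encoding is wrong.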