
Reading Non-ascii Characters From A Text File

I'm using Python 2.7. I've tried many things, such as codecs, but nothing worked. How can I fix this?

myfile.txt

wörd

My code

f = open('myfile.txt','r')
for line in f:
    print line
f.c

Solution 1:

import codecs

# open the file with UTF-8 decoding
f = codecs.open("myfile.txt", "r", encoding='utf-8')
# read the whole file into a unicode string
sfile = f.read()
f.close()

# check the type: it's unicode
print type(sfile)
# unicode should be encoded to a byte string (str) to display it properly
print sfile.encode('utf-8')
# check the type of the encoded string: it's str
print type(sfile.encode('utf-8'))
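
As a variation on the snippet above, here is a minimal sketch using io.open, which accepts an encoding argument on both Python 2.7 and Python 3. The helper name read_text and the temporary sample file are made up for illustration; they are not part of the original answer.

```python
from __future__ import print_function
import io
import os
import tempfile


def read_text(path, encoding='utf-8'):
    """Read a whole file and return it decoded (unicode on 2.7, str on 3)."""
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


# Demo: write the sample word from the question as UTF-8 bytes, then read it back.
sample_path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with io.open(sample_path, 'wb') as f:
    f.write(u'w\u00f6rd\n'.encode('utf-8'))

text = read_text(sample_path)
# The decoded word has 4 characters, although it occupies 5 bytes on disk.
print(len(text.strip()))  # 4
```

Because the file is decoded as it is read, the rest of the program only handles text and does not need to call unicode() on each line.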

Solution 2:

  1. First of all, detect the file's encoding:

  from chardet import detect
  encoding = lambda x: detect(x)['encoding']
  # line is a byte string read from the file, as in the question's loop
  print encoding(line)

  2. Then convert it to unicode, or to a str in your default encoding:

  n_line = unicode(line, encoding(line), errors='ignore')
  print n_line
  print n_line.encode('utf8')
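
chardet's guess is statistical and can be wrong on a single short line, so it may help to fall back to a fixed list of encodings. The sketch below is not the answerer's method: decode_bytes, its candidates list and the detector parameter are hypothetical names. It tries an optional detector's guess first (for example chardet.detect) and then each candidate in turn.

```python
from __future__ import print_function


def decode_bytes(raw, candidates=('utf-8', 'latin-1'), detector=None):
    """Decode raw bytes to text and return (text, encoding_used).

    detector is optional: any callable shaped like chardet.detect, i.e. it
    takes bytes and returns a dict with an 'encoding' key (possibly None).
    """
    guesses = []
    if detector is not None:
        guess = detector(raw).get('encoding')
        if guess:
            guesses.append(guess)
    guesses.extend(candidates)

    for name in guesses:
        try:
            return raw.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name: try the next one
    # Last resort: decode as UTF-8 and mark undecodable bytes with U+FFFD.
    return raw.decode('utf-8', 'replace'), 'utf-8'


# The same word stored as UTF-8 and as Latin-1 bytes.
print(decode_bytes(b'w\xc3\xb6rd')[1])  # utf-8
print(decode_bytes(b'w\xf6rd')[1])      # latin-1
```

Note that latin-1 accepts any byte sequence, so it only makes sense as the last candidate; anything listed after it would never be tried.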

Solution 3:

It's the terminal encoding. Try to configure your terminal to use the same encoding as your file. I recommend UTF-8.

By the way, it is good practice to decode all your inputs and encode all your outputs to avoid problems:

f = open('test.txt','r')
for line in f:
    l = unicode(line, encoding='utf-8')  # decode the input
    print l.encode('utf-8')  # encode the output
f.close()
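
The decode-on-input, encode-on-output idea can also be sketched as a small helper. This is only an illustration, assuming the output stream may report no encoding at all (on Python 2.7, sys.stdout.encoding can be None when output is piped). The name to_output_bytes and its parameters are hypothetical.

```python
from __future__ import print_function
import sys


def to_output_bytes(text, stream_encoding=None, default='utf-8'):
    """Encode text for an output stream.

    Falls back to `default` when the stream reports no encoding, and replaces
    characters the target encoding cannot represent with '?'.
    """
    encoding = stream_encoding or default
    return text.encode(encoding, 'replace')


# What the current terminal or pipe reports (environment dependent).
stream_encoding = getattr(sys.stdout, 'encoding', None)
data = to_output_bytes(u'w\u00f6rd', stream_encoding)

# With a UTF-8 terminal the word survives; with an ASCII one it degrades.
print(to_output_bytes(u'w\u00f6rd', 'utf-8') == b'w\xc3\xb6rd')  # True
print(to_output_bytes(u'w\u00f6rd', 'ascii') == b'w?rd')         # True
```

Using 'replace' trades fidelity for robustness: the program keeps running on a misconfigured terminal, but the output shows '?' in place of the characters it could not encode.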
