
Decode Characters Pandas

Below is a sample of my DF:

ROLE             NAME
GESELLSCHAFTER   DUPONT DUPONT
GESCHäFTSFüHRER  DUPONT DUPONT
KOMPLEMENTäR     DU

Solution 1:

You can pass regex=True to replace:

# the included dic seems to have `A` instead of 'Ã'
dic = {'ü': 'U', 'ä': 'A'}

df['ROLE'] = df['ROLE'].replace(dic, regex=True)

Output:

              ROLE           NAME
0   GESELLSCHAFTER  DUPONT DUPONT
1  GESCHAFTSFUHRER  DUPONT DUPONT
2     KOMPLEMENTAR  DUPONT DUPONT
3   GESELLSCHAFTER  DUPONT DUPONT
4     KOMPLEMENTAR  DUPONT DUPONT
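
To see why regex=True matters in Solution 1, the two calls can be compared side by side. This is a minimal sketch: the DataFrame is a hypothetical sample rebuilt from the rows shown in the question, and the names whole_cell and substrings are only for illustration.

```python
import pandas as pd

# Hypothetical sample rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "ROLE": ["GESELLSCHAFTER", "GESCHäFTSFüHRER", "KOMPLEMENTäR"],
    "NAME": ["DUPONT DUPONT"] * 3,
})
dic = {"ü": "U", "ä": "A"}

# Without regex=True the keys are compared against whole cell values,
# so no cell matches and the column comes back unchanged.
whole_cell = df["ROLE"].replace(dic)

# With regex=True each key is treated as a pattern and replaced
# wherever it occurs inside a cell.
substrings = df["ROLE"].replace(dic, regex=True)

print(whole_cell.tolist())
print(substrings.tolist())
```

Because the keys are interpreted as regular expressions, any key that contains regex metacharacters would need escaping (for example with re.escape) before being used this way.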

Solution 2:

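This solution is longer and might be slow on a large dataset, since apply calls a Python function for every row. First decompose each string with unicodedata.normalize (form NFD) so that accents become separate combining characters, then encode to ASCII with errors ignored to drop those characters, and finally decode the bytes back to a string (the code below uses the utf-8-sig codec):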

from unicodedata import normalize
df.ROLE.apply(lambda x: normalize('NFD', x).encode(
    'ascii', 'ignore').decode('utf-8-sig'))

# 0                       AKTIONAR
# 1                       AKTIONAR
# 2                   AUFSICHTSRAT
# 3               AUSABENDE PERSON
# 4               AUSUBENDE PERSON
# 5                    DEFAULT KEY
# 6    GESCHAFTSFAHRENDER DIREKTOR
# 7                GESCHAFTSFAHRER
# Name: ROLE, dtype: object
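
The same decompose, encode, decode chain can also be written with the pandas .str accessor instead of apply. This is a minimal sketch, assuming a hypothetical sample Series with correctly encoded umlauts (not the author's data); whether it is actually faster on a given dataset is not measured here.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
roles = pd.Series(["AKTIONÄR", "GESCHÄFTSFÜHRER", "DEFAULT KEY"], name="ROLE")

ascii_roles = (
    roles.str.normalize("NFD")                  # split "Ä" into "A" + combining diaeresis
         .str.encode("ascii", errors="ignore")  # drop everything that is not ASCII
         .str.decode("ascii")                   # turn the bytes back into str
)

print(ascii_roles.tolist())
```

Characters with no decomposition into an ASCII base letter, such as "ß", are dropped entirely by the encode step, so a dictionary-based replace may still be needed for those.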
