
One Hot Encoding Of Multi Label Images In Keras

I am using the PASCAL VOC 2012 dataset for image classification. Some images have multiple labels, whereas others have a single label, as shown below. 0 2007_000027.jpg

Solution 1:

If the second column contains sets, you can use MultiLabelBinarizer:

print (df)
                 a                        b
0  2007_000027.jpg               {'person'}
1  2007_000032.jpg  {'aeroplane', 'person'}
2  2007_000033.jpg            {'aeroplane'}
3  2007_000039.jpg            {'tvmonitor'}
4  2007_000042.jpg                {'train'}

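import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# fit_transform returns a 0/1 matrix; mlb.classes_ holds the sorted label names
df = pd.DataFrame(mlb.fit_transform(df['b']),columns=mlb.classes_)
print (df)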
   aeroplane  person  train  tvmonitor
0          0       1      0          0
1          1       1      0          0
2          1       0      0          0
3          0       0      0          1
4          0       0      1          0
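
Note that the result above contains only the label columns, so the file names are no longer attached to the rows. A minimal sketch of one way to keep them, assuming the sample frame shown earlier (the names `encoded` and `out` are only illustrative):

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data mirroring the frame printed earlier
df = pd.DataFrame({
    'a': ['2007_000027.jpg', '2007_000032.jpg', '2007_000033.jpg',
          '2007_000039.jpg', '2007_000042.jpg'],
    'b': [{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
          {'tvmonitor'}, {'train'}],
})

mlb = MultiLabelBinarizer()
# Reuse the original index so the encoded rows line up with the file names
encoded = pd.DataFrame(mlb.fit_transform(df['b']),
                       columns=mlb.classes_, index=df.index)

# Join the file-name column with the 0/1 label columns
out = df[['a']].join(encoded)
print(out)
```

Here `out[mlb.classes_].to_numpy()` gives the label matrix on its own if only the array is needed.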

Alternatively, use Series.str.join with Series.str.get_dummies, although this is likely to be slower on a large DataFrame:

df = df['b'].str.join('|').str.get_dummies()
print (df)

   aeroplane  person  train  tvmonitor
0          0       1      0          0
1          1       1      0          0
2          1       0      0          0
3          0       0      0          1
4          0       0      1          0
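
To check that both approaches agree on your own data, a small comparison along these lines may help. This is only a sketch on the sample frame from above; `via_mlb` and `via_dummies` are illustrative names, and it assumes no label contains the `|` character:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data mirroring the frame printed earlier
df = pd.DataFrame({
    'a': ['2007_000027.jpg', '2007_000032.jpg', '2007_000033.jpg',
          '2007_000039.jpg', '2007_000042.jpg'],
    'b': [{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
          {'tvmonitor'}, {'train'}],
})

# Approach 1: MultiLabelBinarizer
mlb = MultiLabelBinarizer()
via_mlb = pd.DataFrame(mlb.fit_transform(df['b']), columns=mlb.classes_)

# Approach 2: join each set into a '|'-separated string, then split into dummies
via_dummies = df['b'].str.join('|').str.get_dummies()

# Compare column names and values (dtypes may differ between the two)
same_columns = list(via_mlb.columns) == list(via_dummies.columns)
same_values = np.array_equal(via_mlb.to_numpy(), via_dummies.to_numpy())
print(same_columns, same_values)
```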
