One Hot Encoding Of Multi Label Images In Keras
I am using the PASCAL VOC 2012 dataset for image classification. Some images have multiple labels, whereas others have a single label, as shown below:
0 2007_000027.jpg
Solution 1:
If the second column contains sets, you can use MultiLabelBinarizer:
print(df)
a b
0 2007_000027.jpg {'person'}
1 2007_000032.jpg {'aeroplane', 'person'}
2 2007_000033.jpg {'aeroplane'}
3 2007_000039.jpg {'tvmonitor'}
4 2007_000042.jpg {'train'}
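To reproduce the example, the sketch below rebuilds the sample frame printed above. It assumes, as the printout suggests, that column a holds image file names and column b holds a Python set of labels per image. Because sets are unordered, the printed order of labels inside a set may differ between runs.

```python
import pandas as pd

# Sample data reconstructed from the printout: 'a' = file name, 'b' = set of labels.
df = pd.DataFrame({
    'a': ['2007_000027.jpg', '2007_000032.jpg', '2007_000033.jpg',
          '2007_000039.jpg', '2007_000042.jpg'],
    'b': [{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
          {'tvmonitor'}, {'train'}],
})
print(df)
```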
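import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Each set in column 'b' becomes a row of 0/1 indicators; columns are the sorted classes.
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(df['b']), columns=mlb.classes_)
print(df)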
aeroplane person train tvmonitor
0 0 1 0 0
1 1 1 0 0
2 1 0 0 0
3 0 0 0 1
4 0 0 1 0
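Since the goal is training in Keras, here is a minimal sketch of turning the encoded labels into a target matrix and mapping predictions back to label names. It assumes the usual multi-label setup (a sigmoid output layer with binary cross-entropy), for which a float32 0/1 array is a convenient target, and a decision threshold of 0.5; the probs values are hypothetical model outputs. MultiLabelBinarizer.inverse_transform converts a 0/1 matrix back into tuples of class names.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data from the question: file names plus a set of labels per image.
df = pd.DataFrame({
    'a': ['2007_000027.jpg', '2007_000032.jpg', '2007_000033.jpg',
          '2007_000039.jpg', '2007_000042.jpg'],
    'b': [{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
          {'tvmonitor'}, {'train'}],
})

mlb = MultiLabelBinarizer()
# Dense 0/1 matrix of shape (n_images, n_classes); column order is mlb.classes_ (sorted).
y = mlb.fit_transform(df['b']).astype(np.float32)

# Keep the file names as the index so each target row stays tied to its image.
targets = pd.DataFrame(y, columns=mlb.classes_, index=df['a'])
print(targets)

# Hypothetical sigmoid outputs for one image, thresholded at an assumed 0.5.
probs = np.array([[0.9, 0.8, 0.1, 0.2]])
predicted = mlb.inverse_transform((probs > 0.5).astype(int))
print(predicted)  # one tuple of label names per row
```

Keeping the fitted mlb object around matters: its classes_ fixes the column order, so the same instance should be used to decode the model's predictions.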
Alternatively, apply Series.str.join followed by Series.str.get_dummies to the original column b, although this is likely slower on a large DataFrame:
# Join each set into one 'x|y' string, then split on '|' into indicator columns.
df = df['b'].str.join('|').str.get_dummies()
print(df)
aeroplane person train tvmonitor
0 0 1 0 0
1 1 1 0 0
2 1 0 0 0
3 0 0 0 1
4 0 0 1 0
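A quick sanity check, on the same sample data, that both approaches yield the same columns and values. str.get_dummies splits on '|' and sorts the resulting column names, so the arbitrary order of elements inside each set does not affect the result. Only column names and values are compared, since the dtypes of the two results are not guaranteed to match.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Same sample data as in the question.
df = pd.DataFrame({
    'a': ['2007_000027.jpg', '2007_000032.jpg', '2007_000033.jpg',
          '2007_000039.jpg', '2007_000042.jpg'],
    'b': [{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
          {'tvmonitor'}, {'train'}],
})

# Approach 1: MultiLabelBinarizer.
mlb = MultiLabelBinarizer()
via_mlb = pd.DataFrame(mlb.fit_transform(df['b']), columns=mlb.classes_)

# Approach 2: join each set into one string, then split back into dummy columns.
via_dummies = df['b'].str.join('|').str.get_dummies()

# Compare column names and 0/1 values; dtypes are deliberately ignored.
same = (list(via_mlb.columns) == list(via_dummies.columns)
        and bool((via_mlb.to_numpy() == via_dummies.to_numpy()).all()))
print(same)
```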