Transformation Of Pandas Dataframe Adds A Blank Row
My original question was posted here. I have a dataframe as follows:

ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11
Solution 1:
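Consider resetting the index after the pivot/unstack operation. The extra row is most likely the index name (ID), which pandas prints on its own line below the column headers; reset_index() turns ID back into an ordinary column, so that line disappears: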
from io import StringIO
import pandas as pd
data='''
ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11 15 1
3 16 17 2
'''
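# read the whitespace-delimited sample into a DataFrame
test_2 = pd.read_table(StringIO(data), sep=r"\s+")
# remember which SEQ values occur (used later to order the columns)
seq = set(test_2['SEQ'].tolist())
# copy SEQ so it can serve as an index level while SEQ itself stays a value column
test_2['SEQ1'] = test_2.SEQ
# pivot_table's default aggfunc is mean; each (ID, SEQ1) pair is unique here,
# so the mean is just the original value. unstack() moves SEQ1 into the columns
test_2 = test_2.pivot_table(index=['ID', 'SEQ1']).unstack()
# sort by sequence number (then by field name, alphabetically)
test_2 = test_2.sort_index(axis=1, level=1)
# flatten the (field, seq) column MultiIndex into names like START_1
test_2.columns = ['_'.join((col[0], str(col[1]))) for col in test_2.columns]
# move ID out of the index so no separate index-name row is printed
test_2 = test_2.reset_index()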
#    ID  END_1  SEQ_1  START_1  END_2  SEQ_2  START_2  END_3  SEQ_3  START_3
# 0   1   12.0    1.0     11.0   14.0    2.0     13.0   15.0    3.0     14.0
# 1   2   14.0    1.0     10.0    NaN    NaN      NaN    NaN    NaN      NaN
# 2   3   15.0    1.0     11.0   17.0    2.0     16.0    NaN    NaN      NaN
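To see where the extra row comes from, here is a small hypothetical frame with a named ID index, like the one the unstack step leaves behind (the frame and its values are only for illustration). Printing it shows "ID" on a line of its own under the header; after reset_index() that line is gone because ID is a regular column.

```python
import pandas as pd

# hypothetical toy frame: a named 'ID' index, like the one unstack() leaves behind
wide = pd.DataFrame({"END_1": [12.0, 14.0]}, index=pd.Index([1, 2], name="ID"))
print(wide)  # the index name 'ID' is printed on its own line below the header

flat = wide.reset_index()  # 'ID' becomes an ordinary column, index is 0..n-1
print(flat)  # header and data rows only, no separate index-name line
```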
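However, as you can see, this changes the column order: within each sequence number the fields are sorted alphabetically (END, SEQ, START). To get START, END, SEQ for each sequence, build the column list with a list comprehension that yields one small list per sequence number, and flatten it with sum():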
seqmax = max(seq)+1
# one [START_i, END_i, SEQ_i] list per sequence number; sum(..., []) concatenates them
colorder = ['ID'] + sum([['START_'+str(i), 'END_'+str(i), 'SEQ_'+str(i)]
                         for i in range(1, seqmax) if i in seq], [])
test_2 = test_2[colorder]
#    ID  START_1  END_1  SEQ_1  START_2  END_2  SEQ_2  START_3  END_3  SEQ_3
# 0   1     11.0   12.0    1.0     13.0   14.0    2.0     14.0   15.0    3.0
# 1   2     10.0   14.0    1.0      NaN    NaN    NaN      NaN    NaN    NaN
# 2   3     11.0   15.0    1.0     16.0   17.0    2.0      NaN    NaN    NaN
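If you'd rather skip pivot_table (which aggregates each (ID, SEQ1) group with mean by default) and the separate sort-then-reorder step, one possible variant is to unstack directly and select the (field, seq) columns in the wanted order. This is a minimal sketch, not the answer's method: it assumes every (ID, SEQ) pair occurs only once (set_index/unstack raises on duplicates), and the names df, wide, seqs and order are illustrative.

```python
from io import StringIO

import pandas as pd

data = """
ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11 15 1
3 16 17 2
"""
df = pd.read_csv(StringIO(data), sep=r"\s+")

# Assumption: each (ID, SEQ) pair occurs once, so no aggregation is needed.
# SEQ stays a value column by indexing on a copy of it (SEQ1), as above.
wide = df.assign(SEQ1=df["SEQ"]).set_index(["ID", "SEQ1"]).unstack()

# Build the desired (field, seq) order explicitly instead of sorting.
seqs = sorted(df["SEQ"].unique())
order = [(field, s) for s in seqs for field in ("START", "END", "SEQ")]
wide = wide.loc[:, order]

# Flatten ('START', 1) -> 'START_1' and bring ID back as a regular column.
wide.columns = [f"{field}_{s}" for field, s in wide.columns]
wide = wide.reset_index()
print(wide)
```

Listing the tuples directly avoids both the alphabetical sort and the sum() flattening; missing sequences still show up as NaN.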