How To Sequence Row Based On Another Row?
I am trying to convert a formula from Excel to pandas. The DataFrame looks like this:
Column A  Column B
H
H
H
J
J
J
J
K
K
I want to fill column B to increm
Solution 1:
This can be done using the following vectorised method:
Code:
>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['H', 'H', 'H', 'J', 'J', 'J', 'J', 'K', 'K']})
>>> df['B'] = df.groupby((df['A'].shift(1) != df['A']).cumsum()).cumcount() + 1
Output:
>>> df
   A  B
0  H  1
1  H  2
2  H  3
3  J  1
4  J  2
5  J  3
6  J  4
7  K  1
8  K  2
Explanation:
First, we use df['A'].shift(1) != df['A'] to compare each value in column A with the value in the row above it. This yields True wherever a new run of identical values begins:
>>> df['A'] != df['A'].shift(1)
0     True
1    False
2    False
3     True
4    False
5    False
6    False
7     True
8    False
Name: A, dtype: bool
Next, we use cumsum() to take the cumulative sum of that boolean column. Each True counts as 1, so every run of consecutive identical values gets its own group number:
>>> (df['A'] != df['A'].shift(1)).cumsum()
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 3
8 3
Name: A, dtype: int32
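The two steps above can also be written out as a small standalone script. This is only a sketch: the names starts_run and run_id are illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['H', 'H', 'H', 'J', 'J', 'J', 'J', 'K', 'K']})

# True wherever a value differs from the one above it. Row 0 is compared
# with the missing value introduced by shift(), so it is always True.
starts_run = df['A'] != df['A'].shift(1)

# Each True adds 1 to the running total, so every consecutive run of
# identical values ends up with its own id.
run_id = starts_run.cumsum()

print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3]
```

Keeping the run ids in a named variable makes it easy to inspect them before passing them to groupby.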
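Now we can use GroupBy.cumcount() on these group numbers to enumerate the rows within each run, adding 1 so that the count starts at 1 rather than 0. Note that we can't just use df.groupby('A').cumcount(), because that groups every occurrence of a value together, even when the occurrences are not adjacent. For example, if we had: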
>>> df
A
0 H
1 H
2 H
3 J
4 J
5 J
6 J
7 K
8 K
9 H
This would give us:
>>> df.groupby('A').cumcount() + 1
0 1
1 2
2 3
3 1
4 2
5 3
6 4
7 1
8 2
9 4
dtype: int64
Note that the final row is 4 and not 1 as expected, because the last H is counted together with the first three.
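For comparison, here is a sketch of the run-based grouping from the solution applied to the same ten-row frame. The helper name count_within_runs is hypothetical, introduced only for this illustration.

```python
import pandas as pd


def count_within_runs(series):
    """Number the rows of each run of consecutive equal values, starting at 1.

    Hypothetical helper wrapping the shift/cumsum/cumcount idea above.
    """
    # A new run starts wherever the value differs from the previous row.
    run_id = (series != series.shift(1)).cumsum()
    # Group by run id (not by value) and count rows within each run.
    return series.groupby(run_id).cumcount() + 1


df = pd.DataFrame({'A': ['H', 'H', 'H', 'J', 'J', 'J', 'J', 'K', 'K', 'H']})
df['B'] = count_within_runs(df['A'])

print(df['B'].tolist())  # [1, 2, 3, 1, 2, 3, 4, 1, 2, 1]
```

Because the trailing H starts a new run with its own id, its count restarts at 1.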