
Retrieving Matching Word Count On A Datacolumn Using Pandas In Python

I have a df,

Name   Description
Ram    Ram is one of the good cricketer
Sri    Sri is one of the member
Kumar  Kumar is a keeper

and a list, my_list=['one','good','ravi',

Solution 1:

Use str.findall to collect every match per row, then str.join to build the keys column and str.len to count the matches:

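# Build an alternation such as (one|good|...) and collect all matches per row
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')')
# Comma-separated matched words
df['keys'] = extracted.str.join(',')
# Number of matches in each row
df['count'] = extracted.str.len()
print(df)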
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
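
For a self-contained run of the steps above, here is a minimal sketch. The list in the question is cut off, so the three-item my_list below is an assumption, as is the way the DataFrame is built.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    'Name': ['Ram', 'Sri', 'Kumar'],
    'Description': ['Ram is one of the good cricketer',
                    'Sri is one of the member',
                    'Kumar is a keeper'],
})
# Assumed keyword list; the original list is truncated in the question.
my_list = ['one', 'good', 'ravi']

# One capture group, so findall returns a list of matched strings per row.
pattern = '(' + '|'.join(my_list) + ')'
extracted = df['Description'].str.findall(pattern)
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
```

With this data, a row without any match (Kumar) gets an empty keys string and a count of 0.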

EDIT: For case-insensitive matching, pass flags=re.IGNORECASE to str.findall:

import re
my_list=["ONE","good"]

extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
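
Note that the pattern above matches substrings, so 'one' is also found inside words such as 'done'. If only whole words should count, one possible variant (a sketch, not part of the original answer) wraps the alternation in \b word boundaries and escapes each keyword with re.escape. The sample Series here is made up for illustration.

```python
import re
import pandas as pd

my_list = ["ONE", "good"]
# Hypothetical sample text chosen to show the difference.
s = pd.Series(["Someone is done", "One good one"])

# Substring matching, as in the answer above.
loose_pattern = '(' + '|'.join(my_list) + ')'
loose = s.str.findall(loose_pattern, flags=re.IGNORECASE).str.len()

# Whole-word matching: escape each keyword and require word boundaries.
strict_pattern = r'\b(?:' + '|'.join(map(re.escape, my_list)) + r')\b'
strict = s.str.findall(strict_pattern, flags=re.IGNORECASE).str.len()

print(loose.tolist())
print(strict.tolist())
```

re.escape also keeps keywords that contain regex metacharacters (for example '+' or '.') from being interpreted as pattern syntax.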

Solution 2:

Took a shot at this with str.findall.

c = df.Description.str.findall('({})'.format('|'.join(my_list)))
df['keys'] = c.apply(','.join) # or c.str.join(',')
df['count'] = c.str.len()

df[df['count'] >0]

  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
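
One thing to keep in mind: str.findall returns every occurrence, so a keyword that appears twice in a description is counted twice. If the number of distinct keywords is wanted instead, a possible sketch is to de-duplicate each match list first. The sample row and the two-item list below are assumptions made for illustration.

```python
import pandas as pd

my_list = ['one', 'good']
# Hypothetical row in which 'one' occurs twice.
df = pd.DataFrame({'Description': ['one by one is good', 'no match here']})

c = df.Description.str.findall('({})'.format('|'.join(my_list)))
# dict.fromkeys drops repeats while keeping first-seen order.
unique = c.apply(lambda matches: list(dict.fromkeys(matches)))

df['count'] = c.str.len()              # every occurrence
df['keys'] = unique.str.join(',')      # each keyword listed once
df['distinct'] = unique.str.len()      # number of different keywords
print(df[df['count'] > 0])
```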
