
Matching Values From One Csv File To Another And Replace Entire Column Using Pandas/python

Consider the following example: I have a dataset of Movielens- u.item.csv ID|MOVIE NAME (YEAR)|REL.DATE|NULL|IMDB LINK|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S| 1|Toy Story (1995)|01-

Solution 1:

I think you need map with a Series created by set_index, which acts as an ID-to-title lookup:

print (df1.set_index('ID')['MOVIE NAME (YEAR)'])
ID
1     Toy Story (1995)
2     GoldenEye (1995)
3    Four Rooms (1995)
Name: MOVIE NAME (YEAR), dtype: object

df2['movie_id'] = df2['movie_id'].map(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id           movie_id  rating  unix_timestamp
0        1   Toy Story (1995)       5       874965758
1        1   GoldenEye (1995)       3       876893171
2        1  Four Rooms (1995)       4       878542960
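
The map approach above can be tried without the CSV files. The following is a minimal sketch, assuming df1 and df2 hold only the columns shown in the printed output; the small hand-built frames and the name id_to_title are illustrative stand-ins, not part of the original data loading.

```python
import pandas as pd

# Hand-built stand-ins for the two CSV files (assumption: only these columns matter here).
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "MOVIE NAME (YEAR)": ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"],
})
df2 = pd.DataFrame({
    "user_id": [1, 1, 1],
    "movie_id": [1, 2, 3],
    "rating": [5, 3, 4],
    "unix_timestamp": [874965758, 876893171, 878542960],
})

# A Series indexed by ID works as an ID -> title lookup table.
id_to_title = df1.set_index("ID")["MOVIE NAME (YEAR)"]

# Each movie_id is looked up in the index of id_to_title and swapped for the title.
df2["movie_id"] = df2["movie_id"].map(id_to_title)
print(df2)
```

Building the lookup Series once and reusing it avoids calling set_index again for every column that needs translating.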

Or use replace:

df2['movie_id'] = df2['movie_id'].replace(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id           movie_id  rating  unix_timestamp
0        1   Toy Story (1995)       5       874965758
1        1   GoldenEye (1995)       3       876893171
2        1  Four Rooms (1995)       4       878542960

The difference is in how unmatched values are handled: map produces NaN, while replace keeps the original value:

print (df2)
   user_id  movie_id  rating  unix_timestamp
0        1         1       5       874965758
1        1         2       3       876893171
2        1         5       4       878542960 <- 5 has no match

df2['movie_id'] = df2['movie_id'].map(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id          movie_id  rating  unix_timestamp
0        1  Toy Story (1995)       5       874965758
1        1  GoldenEye (1995)       3       876893171
2        1               NaN       4       878542960

# applied to the original df2 (with numeric movie_id), not to the result of map
df2['movie_id'] = df2['movie_id'].replace(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id          movie_id  rating  unix_timestamp
0        1  Toy Story (1995)       5       874965758
1        1  GoldenEye (1995)       3       876893171
2        1                 5       4       878542960
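
The contrast between the two methods can be checked side by side on a single column. This is a minimal sketch, assuming the same three titles and an unmatched ID of 5; the names id_to_title, movie_ids, mapped, replaced and mapped_keep are illustrative, and the final fillna step is an optional extra that is not part of the original answer.

```python
import pandas as pd

# Lookup table: ID -> title (same values as in the example above).
id_to_title = pd.Series(
    ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"],
    index=pd.Index([1, 2, 3], name="ID"),
    name="MOVIE NAME (YEAR)",
)

# movie_id column containing one ID (5) that is absent from the lookup.
movie_ids = pd.Series([1, 2, 5], name="movie_id")

# map: every value is looked up; a missing key becomes NaN.
mapped = movie_ids.map(id_to_title)

# replace: only values found in the lookup are swapped; 5 is left untouched.
replaced = movie_ids.replace(id_to_title)

# Optional: map, then fall back to the original ID where no title was found.
mapped_keep = movie_ids.map(id_to_title).fillna(movie_ids)

print(mapped)
print(replaced)
print(mapped_keep)
```

If unmatched IDs should be easy to spot, the NaN from map can be convenient; if the column must stay fully populated, replace or the fillna fallback may be the better fit.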
