Definitive Way to Parse Alphanumeric CSVs in Python with SciPy/NumPy
I've been trying to find a good and flexible way to parse CSV files in Python but none of the standard options seem to fit the bill. I am tempted to write my own but I think that
Solution 1:
Have a look at pandas, which is built on top of numpy.
Here is a small example:
In [7]: df = pd.read_csv('data.csv', sep='\t', index_col='name')
In [8]: df
Out[8]:
        favorite_integer  favorite_float1  favorite_float2        short_description
name
johnny                 5            60.20            0.520  johnny likes fruitflies
bob                    1            17.52            0.001       bob, bobby, robert
In [9]: df.describe()
Out[9]:
       favorite_integer  favorite_float1  favorite_float2
count          2.000000         2.000000         2.000000
mean           3.000000        38.860000         0.260500
std            2.828427        30.179317         0.366988
min            1.000000        17.520000         0.001000
25%            2.000000        28.190000         0.130750
50%            3.000000        38.860000         0.260500
75%            4.000000        49.530000         0.390250
max            5.000000        60.200000         0.520000
In [13]: df.loc['johnny', 'favorite_integer']  # .ix was removed from pandas; use .loc for labels
Out[13]: 5
In [15]: df['favorite_float1'] # or attribute: df.favorite_float1
Out[15]:
name
johnny 60.20
bob 17.52
Name: favorite_float1
In [16]: df['mean_favorite'] = df.mean(axis=1, numeric_only=True)  # skip the text column
In [17]: df.iloc[:, 3:]  # .ix was removed from pandas; use .iloc for positions
Out[17]:
              short_description  mean_favorite
name
johnny  johnny likes fruitflies      21.906667
bob          bob, bobby, robert       6.173667
Solution 2:
matplotlib.mlab.csv2rec returns a numpy recarray, so you can do all the great numpy things to it that you would do with any numpy array. The individual rows, being record instances, can be indexed as tuples but also have attributes automatically named for the columns in your data:
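import matplotlib.mlab
# csv2rec is only available in older matplotlib releases (it was removed in 3.1).
rows = matplotlib.mlab.csv2rec('data.csv')
row = rows[0]
print(row[0])       # index like a tuple
print(row.name)     # attribute named after the column
print(row['name'])  # or index by column name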
csv2rec also understands "quoted strings", unlike numpy.genfromtxt.
In general, I find that csv2rec combines some of the best features of csv.reader and numpy.genfromtxt.
Solution 3:
numpy.genfromtxt()
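A rough sketch of how this might look for a file that mixes text and numeric columns. The sample data is made up to mirror the table in Solution 1, and `load_records` is just a hypothetical helper name. Passing `dtype=None` asks `genfromtxt` to infer a type per column, and `names=True` takes the field names from the header row.

```python
import io

import numpy as np

# Made-up tab-separated sample mirroring the columns shown in Solution 1.
SAMPLE = (
    "name\tfavorite_integer\tfavorite_float1\tfavorite_float2\tshort_description\n"
    "johnny\t5\t60.20\t0.520\tjohnny likes fruitflies\n"
    "bob\t1\t17.52\t0.001\tbob, bobby, robert\n"
)


def load_records(text):
    """Parse tab-separated text into a numpy structured array."""
    return np.genfromtxt(
        io.StringIO(text),
        delimiter="\t",    # split on tabs so commas and spaces stay inside fields
        names=True,        # take field names from the header row
        dtype=None,        # infer a dtype for each column separately
        encoding="utf-8",  # decode text columns as str rather than bytes
    )


records = load_records(SAMPLE)
print(records.dtype.names)                # column names from the header
print(records["favorite_float1"].mean())  # numeric columns behave like arrays
print(records[0]["name"])                 # rows can be indexed by field name
```

For a file on disk you would pass the path instead of the `io.StringIO` object. The result is a structured array rather than a recarray, so fields are reached with `records["name"]` instead of attribute access.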
Solution 4:
Why not just use the stdlib csv.DictReader?
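A minimal sketch of that approach, assuming a comma-separated file with a header row. The sample data and the `read_rows` helper are made up for illustration. `csv.DictReader` handles quoted fields (note the embedded commas in the last row) but returns every value as a string, so the numeric columns have to be converted by hand.

```python
import csv
import io

# Made-up comma-separated sample; the last field of the second row is quoted
# because it contains commas.
SAMPLE = (
    "name,favorite_integer,favorite_float1,favorite_float2,short_description\n"
    "johnny,5,60.20,0.520,johnny likes fruitflies\n"
    'bob,1,17.52,0.001,"bob, bobby, robert"\n'
)


def read_rows(text):
    """Return a list of dicts, one per data row, with numeric fields converted."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # DictReader yields strings only, so convert the numeric columns here.
        row["favorite_integer"] = int(row["favorite_integer"])
        row["favorite_float1"] = float(row["favorite_float1"])
        row["favorite_float2"] = float(row["favorite_float2"])
        rows.append(row)
    return rows


rows = read_rows(SAMPLE)
print(rows[0]["name"])
print(rows[1]["short_description"])
print(sum(r["favorite_float1"] for r in rows) / len(rows))
```

When reading from a real file, open it with `newline=''` as the `csv` module documentation recommends, and pass the file object to `csv.DictReader` in place of the `io.StringIO` wrapper.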