
Definitive Way to Parse Alphanumeric CSVs in Python with SciPy/NumPy

I've been trying to find a good and flexible way to parse CSV files in Python but none of the standard options seem to fit the bill. I am tempted to write my own but I think that

Solution 1:

Have a look at pandas, which is built on top of numpy. Here is a small example:

In [6]: import pandas as pd
In [7]: df = pd.read_csv('data.csv', sep='\t', index_col='name')
In [8]: df
Out[8]: 
        favorite_integer  favorite_float1  favorite_float2        short_description
name                                                                               
johnny                 5            60.20            0.520  johnny likes fruitflies
bob                    1            17.52            0.001       bob, bobby, robert
In [9]: df.describe()
Out[9]: 
       favorite_integer  favorite_float1  favorite_float2
count          2.000000         2.000000         2.000000
mean           3.000000        38.860000         0.260500
std            2.828427        30.179317         0.366988
min            1.000000        17.520000         0.001000
25%            2.000000        28.190000         0.130750
50%            3.000000        38.860000         0.260500
75%            4.000000        49.530000         0.390250
max            5.000000        60.200000         0.520000
In [13]: df.loc['johnny', 'favorite_integer']  # label-based lookup; .ix was removed in pandas 1.0
Out[13]: 5
In [15]: df['favorite_float1'] # or attribute: df.favorite_float1
Out[15]: 
name
johnny    60.20
bob       17.52
Name: favorite_float1
In [16]: df['mean_favorite'] = df.mean(axis=1, numeric_only=True)  # skip the text column
In [17]: df.iloc[:, 3:]  # position-based slice; replaces the removed .ix
Out[17]: 
              short_description  mean_favorite
name                                          
johnny  johnny likes fruitflies      21.906667
bob          bob, bobby, robert       6.173667

Solution 2:

matplotlib.mlab.csv2rec returns a numpy recarray, so you can do all the great numpy things to this that you would do with any numpy array. The individual rows, being record instances, can be indexed as tuples but also have attributes automatically named for the columns in your data. (csv2rec was deprecated in matplotlib 2.2 and removed in 3.1, so this requires an older matplotlib.)

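import matplotlib.mlab

rows = matplotlib.mlab.csv2rec('data.csv')
row = rows[0]

print(row[0])       # index by position
print(row.name)     # attribute named after the column
print(row['name'])  # index by column name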

csv2rec also understands "quoted strings", unlike numpy.genfromtxt.

In general, I find that csv2rec combines some of the best features of csv.reader and numpy.genfromtxt.
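
To illustrate why quoted-string support matters, here is a minimal sketch that compares a naive comma split with the standard library's csv.reader on a single line. The sample line is made up for illustration, and the naive split only stands in for any parser that ignores quoting.

```python
import csv
import io

# Hypothetical comma-separated line whose last field contains commas inside quotes.
line = 'bob,1,"bob, bobby, robert"\n'

# A naive split treats every comma as a delimiter, so the quoted field is broken up.
naive_fields = line.strip().split(",")

# csv.reader honours the quotes and keeps the description as one field.
quoted_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields))   # 5
print(quoted_fields)       # ['bob', '1', 'bob, bobby, robert']
```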

Solution 3:

numpy.genfromtxt()
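
As a rough sketch of what that could look like, the snippet below reads mixed numeric and text columns into a structured array. The in-memory tab-separated sample is an assumption that mirrors the columns shown in Solution 1; with a real file you would pass the filename instead of the StringIO object.

```python
import io
import numpy as np

# Hypothetical tab-separated data, mirroring the columns used in Solution 1.
raw = (
    "name\tfavorite_integer\tfavorite_float1\tfavorite_float2\tshort_description\n"
    "johnny\t5\t60.2\t0.52\tjohnny likes fruitflies\n"
    "bob\t1\t17.52\t0.001\tbob, bobby, robert\n"
)

data = np.genfromtxt(
    io.StringIO(raw),
    delimiter="\t",    # tabs, so commas inside the description are harmless
    names=True,        # take field names from the header row
    dtype=None,        # infer a dtype per column (int, float, or string)
    encoding="utf-8",  # decode text columns as str rather than bytes
)

print(data["favorite_integer"])        # integer column
print(data["favorite_float1"].mean())  # numeric columns support the usual numpy operations
print(data["name"])                    # string column
```

Fields are accessed by name, much like the recarray in Solution 2. Note that this relies on the tab delimiter; it does not handle quoted fields.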

Solution 4:

Why not just use the stdlib csv.DictReader?
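
A minimal sketch of that approach follows. DictReader yields every value as a string, so the converters mapping and the parse_rows helper are hypothetical additions for turning the numeric columns into numbers; the sample data again mirrors Solution 1.

```python
import csv
import io

# Hypothetical tab-separated data, mirroring the columns used in Solution 1.
raw = (
    "name\tfavorite_integer\tfavorite_float1\tfavorite_float2\tshort_description\n"
    "johnny\t5\t60.2\t0.52\tjohnny likes fruitflies\n"
    "bob\t1\t17.52\t0.001\tbob, bobby, robert\n"
)

# Hypothetical per-column converters; columns not listed stay as str.
converters = {
    "favorite_integer": int,
    "favorite_float1": float,
    "favorite_float2": float,
}

def parse_rows(handle):
    """Read rows as dicts keyed by column name and convert the numeric columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {key: converters.get(key, str)(value) for key, value in record.items()}
        for record in reader
    ]

rows = parse_rows(io.StringIO(raw))
print(rows[0]["favorite_integer"])   # 5
print(rows[1]["short_description"])  # bob, bobby, robert
```

This keeps everything in the standard library, at the cost of doing type conversion yourself and getting a list of dicts rather than an array.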
