
Error Reading An Uploaded Csv Using Dask In Django: 'inmemoryuploadedfile' Object Has No Attribute 'startswith'

I'm building a Django app that enables users to upload a CSV via a form using a FormField. Once the CSV is imported I use the Pandas read_csv(filename) command to read in the CSV s

Solution 1:

I finally got it working. Here's a Django-specific solution building on the answer from @mdurant, who thankfully pointed me in the right direction.

By default, Django keeps uploaded files under 2.5 MB in memory (as an InMemoryUploadedFile), so Dask can't read them the way Pandas does: Dask expects a path to a file in actual storage, not a file object. When the file is over 2.5 MB, Django writes it to a temp folder instead, and the path can be retrieved with the uploaded file's temporary_file_path() method. That temp file path can then be passed directly to Dask. I found some really useful information about how Django handles uploaded files in the background in their docs: https://docs.djangoproject.com/en/3.0/ref/files/uploads/#custom-upload-handlers.

If you can't predict the size of your users' uploads in advance (as in my case), some files will be under 2.5 MB and will have no temp file path. To cover that case, change FILE_UPLOAD_HANDLERS in your Django settings file so that every upload is written to the temp storage folder regardless of size, and Dask can always access it.

Here is how I changed my code in case this is helpful for anyone else in the same situation.

In views.py

import dask.dataframe as dd

# ImportFileForm is the app's own upload form (defined in forms.py)
def import_csv(request):

    if request.method == 'POST':
        form = ImportFileForm(request.POST, request.FILES)
        if form.is_valid():

            # temporary_file_path() tells Dask where to find the file on disk
            df_in = dd.read_csv(request.FILES['file_name'].temporary_file_path())

And in settings.py, adding the setting below makes Django always write an uploaded file to temp storage, whether it is under 2.5 MB or not, so Dask can always access it:

FILE_UPLOAD_HANDLERS = ['django.core.files.uploadhandler.TemporaryFileUploadHandler',]
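
If changing the upload handlers for the whole project is not an option, a view-level fallback might work instead. The sketch below is not part of the original answer: path_for_upload is a hypothetical helper name, and it assumes the uploaded object behaves like Django's UploadedFile, i.e. it offers chunks() and, for files already on disk, temporary_file_path(). In-memory uploads are copied to a temp file so that there is always a path to hand to Dask.

```python
import os
import tempfile


def path_for_upload(uploaded_file):
    """Return a filesystem path for an upload, copying it to disk if needed.

    Hypothetical helper: assumes a Django-style UploadedFile interface.
    """
    # Uploads that Django already wrote to disk expose temporary_file_path();
    # in-memory uploads do not have this method.
    if hasattr(uploaded_file, "temporary_file_path"):
        return uploaded_file.temporary_file_path()

    # In-memory upload: copy it to a temp file chunk by chunk.
    # delete=False keeps the file after it is closed, so the caller
    # is responsible for removing it later.
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
        for chunk in uploaded_file.chunks():
            tmp.write(chunk)
        return tmp.name
```

In the view this would be used as dd.read_csv(path_for_upload(request.FILES['file_name'])). Because Dask reads lazily, a temp file created this way should only be deleted (for example with os.remove) after the computation on the dataframe has finished.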

Solution 2:

It seems you are not passing a file on disk, but some Django-specific buffer object. Since you are expecting large files, you probably want to tell Django to stream the uploads directly to disk and give you the filename for Dask; i.e., is request.FILES['file_name'] actually somewhere in your storage? The error message seems to suggest not, in which case you need to configure Django (sorry, I don't know how).

Note that Dask can deal with in-memory file-like objects such as io.BytesIO, using the MemoryFileSystem, but this isn't very typical, and won't help with your memory issues.
