How To Solve 403 Error In Scrapy

December 24, 2023

I'm new to Scrapy and built a Scrapy project to scrape data from a website, but I'm getting the following error logs: 2016-08-29 14:07:57 [scrapy] INFO

Solution 1:

As Avihoo Mamka mentioned in the comments, you need to provide some extra request headers so that this website does not reject your requests.

In this case it seems to be just the User-Agent header. By default, Scrapy identifies itself with the user agent "Scrapy/{version} (+http://scrapy.org)". Some websites reject this for one reason or another.

To avoid this, set the headers parameter of your Request to a common user agent string:

from scrapy import Request

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
yield Request(url, headers=headers)  # goes inside a spider method such as a callback

Large lists of user-agent strings are available online, though you should stick with those of popular web browsers such as Firefox or Chrome for the best results.
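If you want to keep a few popular browser strings on hand rather than a single one, a minimal sketch could look like the following. The names BROWSER_USER_AGENTS and build_headers are hypothetical helpers, not part of Scrapy, and the Chrome string is only an assumed example of a common browser user agent.

```python
import random

# Hypothetical pool of common desktop browser user-agent strings.
# The Firefox entry is the one used in this answer; the Chrome entry
# is an assumed example and can be swapped for any current browser string.
BROWSER_USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
]


def build_headers(rng=random):
    """Return a headers dict with one user agent picked from the pool."""
    return {"User-Agent": rng.choice(BROWSER_USER_AGENTS)}


headers = build_headers()
```

The resulting dict can be passed as the headers argument of a Request in the same way as the single hard-coded string.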

You can apply it to your spider's start_urls too by overriding start_requests:

import scrapy
from scrapy import Request


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = (
        'http://scrapy.org',
    )

    def start_requests(self):
        # Send a browser-like User-Agent with every initial request
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        for url in self.start_urls:
            yield Request(url, headers=headers)

Solution 2:

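Add the following line to your settings.py file. Scrapy only loads uppercase names from settings.py, so use the USER_AGENT setting rather than a lowercase headers variable. This works well if you are combining Selenium with Scrapy.

USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'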
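If the site also checks other request headers, as the "extra request headers" remark in Solution 1 suggests it might, a settings.py fragment along these lines may help. DEFAULT_REQUEST_HEADERS is a standard Scrapy setting; the particular Accept and Accept-Language values below are illustrative assumptions, not something this site is known to require.

```python
# settings.py (fragment)

# Headers Scrapy attaches to every request unless the request sets its own.
# The values below are illustrative assumptions; adjust them to whatever
# a real browser sends to the target site.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
```

Headers passed directly to an individual Request take precedence over these project-wide defaults.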
