How To Solve 403 Error In Scrapy
I'm new to Scrapy and I created a Scrapy project to scrape data. I'm trying to scrape data from the website, but I'm getting the following error logs: 2016-08-29 14:07:57 [scrapy] INFO
Solution 1:
As Avihoo Mamka mentioned in the comments, you need to provide some extra request headers so that this website does not reject your requests. In this case it seems to be just the User-Agent header. By default Scrapy identifies itself with the user agent "Scrapy/{version} (+http://scrapy.org)", and some websites reject this for one reason or another. To avoid this, set the headers parameter of your Request to a common user agent string:
from scrapy import Request

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
yield Request(url, headers=headers)  # inside a spider callback
Long lists of user-agent strings are available online, though you should stick with those of popular web browsers such as Firefox or Chrome for the best results.
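One way to keep a few popular browser strings in one place is a small helper that builds the headers dict. This is only a sketch: the name browser_headers and the list BROWSER_USER_AGENTS are hypothetical, and the Chrome string is an illustrative value that does not come from the original answer.

```python
import random

# Illustrative pool of browser user-agent strings (assumed values).
# The first entry is the Firefox string used in this answer.
BROWSER_USER_AGENTS = [
    'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
]


def browser_headers():
    """Return a headers dict with one randomly chosen browser User-Agent."""
    return {'User-Agent': random.choice(BROWSER_USER_AGENTS)}
```

Inside a spider, the result would be passed the same way as the fixed dict, for example Request(url, headers=browser_headers()).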
The same headers can be applied to your spider's start_urls by overriding start_requests:
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = (
        'http://scrapy.org',
    )

    def start_requests(self):
        # Send a browser-like User-Agent with every initial request
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        for url in self.start_urls:
            yield scrapy.Request(url, headers=headers)
Solution 2:
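Add the following line to your settings.py file so that it applies to every request in the project. Scrapy only reads upper-case names from settings.py, so the user agent goes in the USER_AGENT setting. This works well if you are combining Selenium with Scrapy.
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'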
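If a site checks more than the User-Agent, other headers can be set project-wide as well. Below is a minimal sketch for settings.py, assuming Scrapy's built-in DEFAULT_REQUEST_HEADERS setting. The header values are illustrative assumptions, not taken from the original answer, and may need adjusting for the site in question.

```python
# settings.py (fragment)
# Default headers added to every request that does not set them itself.
# The values below are browser-like examples, assumed for illustration.
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}
```

Headers passed directly to a Request still take precedence over these defaults.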