Scrapy Crawler Spider Doesn't Follow Links
For this, I used the CrawlSpider example from the Scrapy documentation: http://doc.scrapy.org/en/latest/topics/spiders.html. I want to get links from a web page and follow them to parse a table.
Solution 1:
Scrapy is misinterpreting the content type of the start URL.
You can verify this by using scrapy shell:
$ scrapy shell 'http://www.euroleague.net/main'
2013-11-18 16:39:26+0900 [scrapy] INFO: Scrapy 0.21.0 started (bot: scrapybot)
...
AttributeError: 'Response' object has no attribute 'body_as_unicode'
See my previous answer about the missing body_as_unicode attribute. I notice that the server does not set a Content-Type header.
CrawlSpider ignores non-html responses, so the responses are not processed and no links are followed.
I would suggest opening an issue on GitHub, as I think Scrapy should be able to handle this case transparently.
As a workaround, you can override the CrawlSpider parse method, build an HtmlResponse from the response object passed in, and hand that to the superclass parse method.
Solution 2:
Prepend "www" to the domain listed in allowed_domains.