How To Extract ASIN From An Amazon Product Page

I have the following web page (Product page) and I'm trying to get the ASIN from it (in this case ASIN=B014MHZ90M), but I don't have a clue how to get it from the page. I'm using P

Solution 1:

Looking at the Amazon page you linked, the ASIN appears in the "Product Details" section. In the Scrapy shell, the following XPath expression

response.xpath('//li[contains(.,"ASIN: ")]//text()').extract()

returns

[u'ASIN: ', u'B014MHZ90M']

For debugging XPath expressions I always use the Scrapy shell and Firebug for Firefox.
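
The list returned by that XPath still contains the label, so a small post-processing step is needed to isolate the ASIN itself. Here is a minimal sketch of that step, assuming the extracted strings look like the output shown above. The helper name `asin_from_texts` is hypothetical and not part of Scrapy.

```python
import re

# Hypothetical helper: picks the ASIN out of the strings returned by
# response.xpath(...).extract(), e.g. ['ASIN: ', 'B014MHZ90M'].
ASIN_TOKEN = re.compile(r"\b[A-Z0-9]{10}\b")


def asin_from_texts(texts):
    """Return the first 10-character token after the 'ASIN' label, or None."""
    joined = " ".join(t.strip() for t in texts)
    # Only search the text that follows the label, so unrelated tokens
    # appearing before it are ignored.
    _, sep, tail = joined.partition("ASIN")
    if not sep:
        return None
    match = ASIN_TOKEN.search(tail)
    return match.group(0) if match else None


print(asin_from_texts(["ASIN: ", "B014MHZ90M"]))  # B014MHZ90M
```

Returning None when the label is missing lets a spider skip pages whose layout differs instead of raising an exception.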

Solution 2:

I use this:

import re
re.match(r"https?://www\.amazon\.(\w+)(.*)/(dp|gp/product)/(?P<asin>\w+).*", url, flags=re.IGNORECASE)
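
To show how the named group is read back, here is a minimal sketch that wraps a pattern of this shape in a function. The function name `asin_from_url` and the sample URLs are illustrative assumptions. The dots are escaped so that they match literally.

```python
import re

# Same structure as the pattern above, compiled once for reuse.
ASIN_URL = re.compile(
    r"https?://www\.amazon\.(\w+)(.*)/(dp|gp/product)/(?P<asin>\w+).*",
    flags=re.IGNORECASE,
)


def asin_from_url(url):
    """Return the ASIN captured by the 'asin' group, or None if no match."""
    match = ASIN_URL.match(url)
    return match.group("asin") if match else None


print(asin_from_url("https://www.amazon.com/dp/B014MHZ90M"))
print(asin_from_url("https://www.amazon.co.uk/Some-Title/dp/B014MHZ90M/ref=sr_1_1"))
```

Both calls print B014MHZ90M. Because `(.*)` is greedy, a URL containing more than one `/dp/` segment would yield the token after the last one.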

Solution 3:

You can get that from the URL.

import re
r = re.search(r'www\.amazon\.com/dp/([^/?]+)', response.url)  # stop at the next "/" or "?"
print(r.group(1))

Solution 4:

https://www.amazon.com/gp/seller/asin-upc-isbn-info.html

Amazon Standard Identification Numbers (ASINs) are unique blocks of 10 letters and/or numbers that identify items.

Your best option, and probably the easiest one, is to run a regex on the URL that looks for a 10-character string between two "/" characters.

'/\w{10}/'

You can then simply strip the "/" characters from the result.
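
As a minimal sketch of this approach, assuming the ASIN is the first 10-character path segment in the URL: a capturing group returns the ASIN without the slashes, so nothing has to be stripped afterwards. The character class `[A-Z0-9]` is a stricter stand-in for `\w` (which would also accept underscores and lowercase letters), and the lookahead that tolerates a missing trailing slash is an addition to the idea above. The name `find_asin` is hypothetical.

```python
import re

# A "/" followed by exactly 10 uppercase letters or digits, which must be
# followed by "/", "?", "#", or the end of the string.
ASIN_SEGMENT = re.compile(r"/([A-Z0-9]{10})(?=[/?#]|$)")


def find_asin(url):
    """Return the first 10-character uppercase alphanumeric path segment, or None."""
    match = ASIN_SEGMENT.search(url)
    return match.group(1) if match else None


print(find_asin("https://www.amazon.com/Some-Title/dp/B014MHZ90M/ref=sr_1_1"))
```

This prints B014MHZ90M. Any other path segment that happens to be exactly 10 uppercase letters or digits would match as well, so the result is a heuristic, not a guarantee.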
