
Beautiful Soup 4 find_all Doesn't Find Links That Beautiful Soup 3 Finds

I noticed a really annoying bug: BeautifulSoup4 (package: bs4) often finds fewer tags than the previous version (package: BeautifulSoup). Here's a reproducible instance of that iss

Solution 1:

You have lxml installed, which means that BeautifulSoup 4 will use that parser over the standard-library html.parser option.

You can upgrade lxml to 3.2.1, which for me returns 1701 results for your test page. lxml itself is built on libxml2 and libxslt, so either of those may be to blame as well, and you may have to upgrade them instead of, or in addition to, lxml. See the lxml requirements page; currently libxml2 2.7.8 or newer is recommended.
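To see which versions are actually in play before upgrading anything, a small check along these lines may help. This is only a sketch: the helper name `lxml_stack_versions` is made up for illustration, and it returns None when lxml is not installed at all.

```python
def lxml_stack_versions():
    """Return the versions of lxml, libxml2 and libxslt in use, or None without lxml."""
    try:
        from lxml import etree
    except ImportError:
        # No lxml available, so lxml cannot be the parser BeautifulSoup picks.
        return None

    def dotted(version_tuple):
        # lxml exposes versions as integer tuples, e.g. (2, 7, 8).
        return ".".join(str(part) for part in version_tuple)

    return {
        "lxml": dotted(etree.LXML_VERSION),
        "libxml2": dotted(etree.LIBXML_VERSION),
        "libxslt": dotted(etree.LIBXSLT_VERSION),
    }


print(lxml_stack_versions())
```

If the reported libxml2 version is older than the recommended one, upgrading lxml alone may not change the result.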

Alternatively, explicitly specify the standard-library html.parser when parsing the soup:

s4 = bs4.BeautifulSoup(r.text, 'html.parser')
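To check whether the parser choice really is what changes the number of tags found, you can parse the same markup with each parser and compare the counts. The following is a rough sketch, not part of the original question: the `HTML` sample and the `count_links` helper are illustrative stand-ins for your own page and code, and parsers that are not installed are reported as None.

```python
import bs4

# Tiny well-formed sample; substitute r.text from your own request.
HTML = (
    "<html><body>"
    '<a href="/a">A</a>'
    '<p><a href="/b">B</a></p>'
    '<a href="/c">C</a>'
    "</body></html>"
)


def count_links(html, parsers=("html.parser", "lxml", "html5lib")):
    """Map each parser name to the number of <a> tags found with it."""
    counts = {}
    for name in parsers:
        try:
            soup = bs4.BeautifulSoup(html, name)
        except bs4.FeatureNotFound:
            # Raised by bs4 when the requested parser is not installed.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("a"))
    return counts


print(count_links(HTML))
```

On clean markup like this sample the parsers should agree; differing counts on a real page suggest that one parser is handling the broken markup differently from the others.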
