
Python BeautifulSoup: Get Text From Level 1 (Top-Level) Rows Only

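I've looked at the other BeautifulSoup questions about getting elements at the same level, but my case seems slightly different: I want the text of the top-level (level 1) rows only, not the text of the rows nested inside them. Here is the website: http://engine.data.cnzz.com/main.php?s=engine&uv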

Solution 1:

Limit your search to the direct children of the table element by setting the recursive argument to False:

import re
# soup is the parsed page; only <tr> elements that are direct children of <table> are matched
table = soup.find('div', class_='right1').table
rows = table.find_all('tr', {"class": re.compile('list.*')}, recursive=False)
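To see what recursive=False changes, here is a minimal sketch on hypothetical markup (not the real cnzz page) in which every top-level row contains a nested table whose rows also have a "list..." class. It assumes the html.parser backend, which does not insert an implicit <tbody>, so the outer rows really are direct children of <table>.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup loosely modeled on the question: two top-level rows,
# each containing a nested table whose rows also have a "list..." class.
html = """
<div class="right1">
  <table>
    <tr class="list1"><td>Engine A</td>
      <td><table><tr class="list2"><td>nested A</td></tr></table></td></tr>
    <tr class="list1"><td>Engine B</td>
      <td><table><tr class="list2"><td>nested B</td></tr></table></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("div", class_="right1").table  # first (outer) table

# Default (recursive=True): matching <tr> elements at any depth are returned.
all_rows = table.find_all("tr", {"class": re.compile("list.*")})

# recursive=False: only <tr> elements that are direct children of the outer <table>.
top_rows = table.find_all("tr", {"class": re.compile("list.*")}, recursive=False)

print(len(all_rows))                             # 4
print([row.td.get_text() for row in top_rows])   # ['Engine A', 'Engine B']
```

If the real page's HTML contains an explicit <tbody>, the rows are children of that element instead, so recursive=False would have to be applied to the <tbody> rather than the <table>.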

Solution 2:

@MartijnPieters' solution is already perfect, but don't forget that BeautifulSoup also lets you filter on several attributes at once when locating elements. See the following code:

from bs4 import BeautifulSoup as bsoup
import requests as rq
import re

url = "http://engine.data.cnzz.com/main.php?s=engine&uv=&st=2014-03-01&et=2014-03-31"
r = rq.get(url)
r.encoding = "gb2312"  # the page is GB2312-encoded

# Parse r.text, which is decoded with the encoding set above (r.content is raw bytes)
soup = bsoup(r.text, "html.parser")
div = soup.find("div", class_="right1")
# A row must match BOTH attributes: a "list<digits>" class and style="cursor:pointer;"
rows = div.find_all("tr", {"class": re.compile(r"list\d+"), "style": "cursor:pointer;"})

for row in rows:
    first_td = row.find("td")  # first cell of each top-level row
    print(first_td.get_text())

Notice that I also added "style":"cursor:pointer;". This attribute appears only on the top-level rows, not on the inner rows. The code gives the same result as the accepted answer:

百度汇总 (Baidu, aggregated)
360搜索 (360 Search)
新搜狗 (New Sogou)
谷歌 (Google)
微软必应 (Microsoft Bing)
雅虎 (Yahoo)
0
有道 (Youdao)
其他 (Other)
[Finished in 2.6s]
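The multiple-attribute filter can be tried without hitting the live site. This is a small sketch on hypothetical markup in which, as the answer describes for the real page, only the top-level rows carry style="cursor:pointer;". Every key in the attribute dict must match, so nested rows whose class also looks like "list<digits>" are skipped because they have no matching style.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: top-level rows have style="cursor:pointer;",
# nested rows share the "list<digits>" class pattern but have no style.
html = """
<div class="right1">
  <table>
    <tr class="list1" style="cursor:pointer;"><td>Engine A</td>
      <td><table><tr class="list2"><td>nested A</td></tr></table></td></tr>
    <tr class="list3" style="cursor:pointer;"><td>Engine B</td>
      <td><table><tr class="list4"><td>nested B</td></tr></table></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="right1")

# Both conditions must hold: class matches the regex AND style equals the string.
rows = div.find_all("tr", {"class": re.compile(r"list\d+"), "style": "cursor:pointer;"})
names = [row.find("td").get_text() for row in rows]
print(names)  # ['Engine A', 'Engine B']
```

Note that the style match is an exact string comparison, so it would stop working if the page wrote the attribute differently (for example with a space after the colon).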

Hopefully this also helps.
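One detail about the encoding line in the code above: in requests, r.encoding only affects how r.text is decoded; r.content is always the raw bytes. If you prefer to hand the bytes to BeautifulSoup, you can state the encoding with its from_encoding argument instead. This is a minimal sketch in which hypothetical GB2312-encoded bytes stand in for r.content.

```python
from bs4 import BeautifulSoup

# Hypothetical GB2312-encoded bytes standing in for r.content.
raw = '<div class="right1"><table><tr><td>百度汇总</td></tr></table></div>'.encode("gb2312")

# Tell BeautifulSoup how the bytes are encoded rather than letting it guess.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gb2312")
print(soup.td.get_text())  # 百度汇总
```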
