Python Beautifulsoup Level 1 Only Text
I've looked at the other BeautifulSoup "get same level" questions, but mine seems slightly different. Here is the website: http://engine.data.cnzz.com/main.php?s=engine&uv
Solution 1:
Limit your search to direct children of the table element only by setting the recursive argument to False:
import re
# soup is the BeautifulSoup object already built from the page
table = soup.find('div', class_='right1').table
rows = table.find_all('tr', {"class" : re.compile('list.*')}, recursive=False)
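To make the effect of recursive=False concrete, here is a minimal sketch on a small inline document. The HTML below is hypothetical (it is not the markup of the site in the question), and it assumes the "html.parser" backend, which does not insert a tbody element, so the tr elements really are direct children of the table:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two top-level rows, the second one holding a nested table.
HTML = """
<div class="right1">
<table>
<tr class="list1"><td>outer A</td></tr>
<tr class="list2"><td><table><tr class="list3"><td>inner</td></tr></table></td></tr>
</table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("div", class_="right1").table

# Default search descends into the nested table as well.
all_rows = table.find_all("tr", class_=re.compile(r"^list"))

# recursive=False only inspects direct children of the outer table.
top_rows = table.find_all("tr", class_=re.compile(r"^list"), recursive=False)

top_classes = [row["class"][0] for row in top_rows]
print(len(all_rows), top_classes)
```

With a parser that adds tbody (html5lib, for instance), the rows would no longer be direct children of the table, and the non-recursive search would have to start from the tbody element instead.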
Solution 2:
@MartijnPieters' solution is already perfect, but don't forget that BeautifulSoup allows you to use multiple attributes as well when locating elements. See the following code:
from bs4 import BeautifulSoup as bsoup
import requests as rq
import re
url = "http://engine.data.cnzz.com/main.php?s=engine&uv=&st=2014-03-01&et=2014-03-31"
r = rq.get(url)
r.encoding = "gb2312"
soup = bsoup(r.content, "html.parser")
div = soup.find("div", class_="right1")
# Match rows on both the class pattern and the style attribute
rows = div.find_all("tr", {"class":re.compile(r"list\d+"), "style":"cursor:pointer;"})
for row in rows:
    first_td = row.find_all("td")[0]
    # Python 3 print; on Python 2 use: print first_td.get_text().encode("utf-8")
    print(first_td.get_text())
Notice how I also added "style":"cursor:pointer;". This is unique to the top-level rows and is not an attribute of the inner rows. This gives the same result as the accepted answer:
百度汇总
360搜索
新搜狗
谷歌
微软必应
雅虎
0
有道
其他
[Finished in 2.6s]
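If you want to try the multi-attribute filter without hitting the network, the following is a small offline sketch. The HTML is made up for illustration and only assumes that top-level rows carry style="cursor:pointer;" while the other rows do not, as described above:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: only the first and third rows have the style attribute.
HTML = """
<div class="right1">
<table>
<tr class="list1" style="cursor:pointer;"><td>top A</td><td>1</td></tr>
<tr class="list2"><td>nested B</td><td>2</td></tr>
<tr class="list3" style="cursor:pointer;"><td>top C</td><td>3</td></tr>
</table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
div = soup.find("div", class_="right1")

# A row must satisfy every entry of the attribute dictionary to match.
rows = div.find_all(
    "tr",
    {"class": re.compile(r"list\d+"), "style": "cursor:pointer;"},
)

# Text of the first cell of each matching row.
first_cells = [row.find("td").get_text(strip=True) for row in rows]
print(first_cells)
```

Note that the style value is compared as an exact string here, so a row written as style="cursor: pointer" (with a space, or without the semicolon) would not match; a compiled regular expression could be used for that attribute too if the markup is inconsistent.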
Hopefully this also helps.