Reference documentation:
- 20.1. HTMLParser — Simple HTML and XHTML parser
- 20.2. sgmllib — Simple SGML parser
- 20.3. htmllib — A parser for HTML documents
- 21.5. urllib — Open arbitrary resources by URL
- 21.6. urllib2 — extensible library for opening URLs
- 21.7. httplib — HTTP protocol client
http://www.crummy.com/software/BeautifulSoup/
http://www.crummy.com/software/BeautifulSoup/documentation.zh.html
http://www.diveintopython.org/html_processing/index.html

Running JavaScript from a Python crawler:
http://developer.51cto.com/art/201003/190832.htm
Apparently Python + V8 can do this as well.

Some examples and tips:
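
As a starting point, here is a minimal sketch of link extraction using the standard-library HTMLParser from the reference list above. The names `LinkCollector`, `extract_links` and `SAMPLE` are made up for illustration. The import falls back to the Python 2 module name because the module was renamed to `html.parser` in Python 3.

```python
# Minimal sketch: collect href values from <a> tags with the stdlib parser.
try:
    from html.parser import HTMLParser  # Python 3 name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 name, as in the docs listed above


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers the href of every <a> tag it sees."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href of every <a> tag in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Inline sample so the sketch runs without network access.
SAMPLE = ('<p><a href="http://example.com/a">A</a> '
          '<a name="x">no href</a> <A HREF="/b">B</A></p>')

if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

For a live page, the text would come from `urllib2.urlopen(url).read()` on Python 2 (or `urllib.request.urlopen` on Python 3, decoded to a string) instead of `SAMPLE`.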