
Title: [Technical Discussion] Collecting search-engine keywords with Python

Author: ivor    Time: 2016-2-21 22:35    Title: Collecting search-engine keywords with Python

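So many people write crawlers in Python that I thought I'd try one for practice. This script fetches the search suggestions 360 Search (so.com) offers for a few seed terms: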
# Python 3.5.1
# coding: utf-8
# Collect search-engine keyword suggestions
import re
import urllib.parse
import urllib.request

text = ["北京", "上海", "青岛"]  # seed terms: Beijing, Shanghai, Qingdao
for choice in text:
    # Percent-encode the term as UTF-8 so it can go into the query string
    keywords = urllib.parse.quote(choice)
    url = "http://sug.so.360.cn/suggest?callback=suggest_so&encodein=utf-8&encodeout=utf-8&format=json&fields=word,obdata&word=" + keywords
    # Browser-like headers; urllib writes the "GET <url>" request line itself,
    # so it does not belong in the header dict
    headers = {
        "Host": "sug.so.360.cn",
        "Referer": "http://www.so.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
    }
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        html_decode = resp.read().decode("utf-8")
    # Print every quoted string that starts with a CJK ideograph (U+4E00 to U+9FA5)
    result = re.findall("\"([\u4e00-\u9fa5].*?)\"", html_decode)
    for item in result:
        print(item)
# Keep the console window open until the user presses Enter
input("Press Enter to continue...")
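The script builds its URL by gluing a percent-encoded term onto a fixed string. A slightly sturdier sketch lets `urllib.parse.urlencode` encode every parameter. The parameter names and values are copied from the post's URL; `build_suggest_url` is a hypothetical helper name, and whether the 360 endpoint still answers in this form is not verified here. A list of pairs is used instead of a dict because plain dicts do not preserve insertion order on Python 3.5.

```python
from urllib.parse import urlencode

# Endpoint copied from the post; it may no longer respond.
SUGGEST_URL = "http://sug.so.360.cn/suggest"


def build_suggest_url(word):
    """Return the suggestion URL for `word` (hypothetical helper)."""
    # A list of pairs keeps the parameter order stable on Python 3.5.
    params = [
        ("callback", "suggest_so"),
        ("encodein", "utf-8"),
        ("encodeout", "utf-8"),
        ("format", "json"),
        ("fields", "word,obdata"),
        ("word", word),  # urlencode percent-encodes this as UTF-8
    ]
    # safe="," leaves the comma in "word,obdata" unescaped, as in the original URL
    return SUGGEST_URL + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    for city in ["北京", "上海", "青岛"]:
        print(build_suggest_url(city))
```

For "北京" this produces the same URL as the concatenation in the script, and terms containing spaces or `&` are escaped without extra work.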





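Because the URL asks for `callback=suggest_so`, the reply is presumably JSONP: a JSON object wrapped in a `suggest_so(...)` call. The regex in the script returns every quoted string that begins with a Chinese character, which may also pick up fields other than the suggestions. A minimal alternative, assuming that wrapper shape, strips the callback and parses the rest with `json`. The `result`/`word` layout in `extract_words` and in the sample payload is a hypothetical placeholder, not the documented 360 response format; inspect a real response before relying on it.

```python
import json
import re

# Matches "name(...)" with an optional trailing semicolon; group 1 is the JSON body.
JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(body):
    """Strip a JSONP wrapper such as 'callback({...});' and parse the JSON inside."""
    match = JSONP_RE.match(body)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


def extract_words(data):
    """Collect 'word' values from a hypothetical {'result': [{'word': ...}]} payload."""
    return [entry["word"] for entry in data.get("result", []) if "word" in entry]


if __name__ == "__main__":
    # Made-up sample in the assumed shape, not a captured server response
    sample = 'suggest_so({"query":"北京","result":[{"word":"北京天气"},{"word":"北京时间"}]});'
    for word in extract_words(parse_jsonp(sample)):
        print(word)
```

In the sample, the original regex would also return the echoed query "北京", while the JSON route returns only the two suggestion entries.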
Welcome to 批处理之家 (http://bbs.bathome.net/). Powered by Discuz! 7.2