Searching Google with Python

python · 水墨上仙 · 1615 views

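This code does not use the Google API. Instead, it fetches Google's search results pages directly, extracts the result links, and saves them to links.txt. The script is written for Python 2 (it uses urllib2 and print statements). Google's results page markup may change, in which case the link pattern and the "Next" page check in this code will need to be updated.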

import re,urllib,urllib2
class GoogleHarvester:
    re_links = re.compile(r'<a class=l href="(.+?)"',re.IGNORECASE|re.DOTALL)
    def __init__(self):
        pass
    def harvest(self,terms):
        '''Searches Google for these terms. Returns only the links (URLs).
           Input: terms (string) -- one or several words to search.
           Output: A list of urls (strings).
                   Duplicate links are removed; links are sorted.

           Example: print GoogleHarvester().harvest('monty python')
        '''
        print "Google: Searching for '%s'" % terms
        links = {}
        currentPage = 0
        while True:
            print "Google: Querying page %d (%d links found so far)" % (currentPage/100+1, len(links))
            address = "http://www.google.com/search?q=%s&num=100&hl=en&start=%d" % (urllib.quote_plus(terms),currentPage)
            request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
            urlfile = urllib2.urlopen(request)
            page = urlfile.read(200000)
            urlfile.close()
            for url in GoogleHarvester.re_links.findall(page):
                links[url] = 0
            if "</div>Next</a></table></div><center>" in page: # Is there a "Next" link for next page of results ?
                currentPage += 100  # Yes, go to next page of results.
            else:
                break   # No, break out of the while True loop.
        print "Google: Found %d links." % len(links)
        return sorted(links.keys())  
# Example: search for "monty python" and save the links, one per line.
links = GoogleHarvester().harvest('monty python')
with open("links.txt", "wb") as f:  # the with block closes the file after writing
    f.write("\n".join(links))
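
The script above targets Python 2 and relies on a regular expression that matches one specific form of Google's markup. As a rough illustration only, here is a minimal Python 3 sketch of the same two steps: building the paged query URL and pulling out de-duplicated, sorted links. It uses the standard library's html.parser instead of a regex. The names build_search_url, LinkCollector, and extract_links are hypothetical helpers introduced for this example, and the class name "l" is simply carried over from the original pattern. It is an assumption that it matches whatever markup Google serves today. The sketch runs on a made-up sample string and makes no network request.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_search_url(terms, start=0, per_page=100):
    """Build the same kind of query URL the original script requests."""
    query = urlencode({"q": terms, "num": per_page, "hl": "en", "start": start})
    return "http://www.google.com/search?" + query


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags that carry a given CSS class."""

    def __init__(self, css_class="l"):  # "l" is taken from the original regex
        super().__init__()
        self.css_class = css_class
        self.links = set()  # a set drops duplicate URLs automatically

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.css_class in classes:
            self.links.add(href)


def extract_links(page, css_class="l"):
    """Return the sorted, de-duplicated result links found in page."""
    collector = LinkCollector(css_class)
    collector.feed(page)
    return sorted(collector.links)


# Hypothetical sample markup standing in for a downloaded results page.
sample = (
    '<a class=l href="http://example.com/b">B</a>'
    '<a class="l" href="http://example.com/a">A</a>'
    '<a class=l href="http://example.com/b">B again</a>'
    '<a class=other href="http://example.com/skip">skip</a>'
)

print(build_search_url("monty python", start=100))
print(extract_links(sample))
```

Parsing tags rather than matching raw text means quoted and unquoted attributes are handled the same way, so small formatting changes in the page are less likely to break the extraction. A change of class name would still require updating css_class.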