rules = (
    Rule(LinkExtractor(allow=r'WebPage/Company.*'), follow=True, callback='parse_company'),
    Rule(LinkExtractor(allow=r'WebPage/JobDetail.*'), callback='parse_item', follow=True),
)
Tracing into the Rule source shows its constructor parameters:
link_extractor, callback=None, cb_kwargs=None, follow=None, process_links=None, process_request=identity
For LinkExtractor, tracing into the source likewise shows its parameters:
allow=(), deny=(), allow_domains=(), deny_domains=(), restrict_xpaths=(), tags=('a', 'area'), attrs=('href',), canonicalize=False, unique=True, process_value=None, deny_extensions=None, restrict_css=(), strip=True
Passing restrict_css='.jon-info' limits link extraction to the region inside <div class="jon-info">...</div>.
Reposted from: http://lgsdx.baihongyu.com/