Persistent Storage via Terminal Command
The data to store is placed in the return value of the spider file's `parse` method.
The output can only be a text format such as json, csv, or xml.
`scrapy crawl spider_name -o output_path` runs spider_name and writes the output to output_path.
```python
import scrapy


class Spider1Spider(scrapy.Spider):
    name = 'spider1'
    # Restrict crawling to this domain
    allowed_domains = ['www.bilibili.com']
    # The URL the spider starts from
    start_urls = ['https://www.bilibili.com/v/popular/rank/all']

    def parse(self, response):
        # Select the href attribute of each entry link in the ranking list
        selector_list = response.xpath('//li//div[@class="info"]/a/@href')
        data = selector_list.extract()
        # The returned dict is what `scrapy crawl spider1 -o ...` serializes
        return {"data": data}
```
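The XPath logic above can be sanity-checked without running Scrapy at all. The sketch below uses the standard library's `xml.etree.ElementTree` in place of `response.xpath()`, and the HTML fragment is a made-up stand-in for the Bilibili ranking page, not its real markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mimicking the <li><div class="info"><a href=...>
# structure the spider's XPath targets
html = '''<ul>
  <li><div class="info"><a href="https://example.com/v1">v1</a></div></li>
  <li><div class="info"><a href="https://example.com/v2">v2</a></div></li>
</ul>'''

root = ET.fromstring(html)
# Rough equivalent of //li//div[@class="info"]/a/@href
data = [a.get('href') for a in root.findall('.//div[@class="info"]/a')]
print(data)
```

Note that ElementTree only supports a subset of XPath; Scrapy's selectors (backed by parsel/lxml) accept the full `@href` attribute syntax shown in the spider.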
Persistent Storage via Pipelines
	Prerequisite: create the project project2 and the spider file spider2.
```python
import scrapy
from project2.items import Project2Item


class Spider2Spider(scrapy.Spider):
    name = 'spider2'
    # Restrict crawling to this domain
    allowed_domains = ['www.bilibili.com']
    # The URL the spider starts from
    start_urls = ['https://www.bilibili.com/v/popular/rank/all']

    def parse(self, response):
        # Select the href attribute of each entry link in the ranking list
        selector_list = response.xpath('//li//div[@class="info"]/a/@href')
        data = selector_list.extract()
        # Pack the extracted links into an item
        item = Project2Item()
        item["rank_list"] = ' '.join(data)
        # Yielded items are handed to the enabled item pipelines
        yield item
```
```python
import scrapy


class Project2Item(scrapy.Item):
    # One field per piece of data the spider yields
    rank_list = scrapy.Field()
```
	Prerequisite: uncomment ITEM_PIPELINES in settings.py.
```python
ITEM_PIPELINES = {
    # Lower numbers run first; 300 is the default priority
    'project2.pipelines.Project2Pipeline': 300,
}
```
```python
from itemadapter import ItemAdapter


class Project2Pipeline:
    fp = None

    def open_spider(self, spider):
        # Called once when the spider opens; open the output file here
        print("Spider started")
        self.fp = open('./bilibili_rank.txt', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # Called once for every item the spider yields
        data = item["rank_list"]
        self.fp.write(data)
        return item

    def close_spider(self, spider):
        # Called once when the spider closes; release the file handle
        print("Spider finished")
        self.fp.close()
```
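The pipeline's lifecycle can be exercised outside Scrapy, which is a quick way to check `process_item` before a real crawl. The sketch below mimics what the engine does (open, feed items, close); `FilePipeline`, the injected output path, and the `spider=None` stub are all assumptions for illustration:

```python
import os
import tempfile


class FilePipeline:
    # Same shape as Project2Pipeline, but the output path is injected
    # so the sketch stays self-contained (FilePipeline is a made-up name)
    def __init__(self, path):
        self.path = path
        self.fp = None

    def open_spider(self, spider):
        self.fp = open(self.path, 'w', encoding='utf-8')

    def process_item(self, item, spider):
        self.fp.write(item["rank_list"])
        return item

    def close_spider(self, spider):
        self.fp.close()


# Mimic the engine: open the spider, feed one item, close the spider
path = os.path.join(tempfile.gettempdir(), 'bilibili_rank.txt')
pipeline = FilePipeline(path)
pipeline.open_spider(spider=None)
pipeline.process_item({"rank_list": "url1 url2"}, spider=None)
pipeline.close_spider(spider=None)

with open(path, encoding='utf-8') as f:
    content = f.read()
print(content)  # url1 url2
```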
Run `scrapy crawl spider2` to start crawling.
Compared with terminal-command persistence, pipeline-based storage is more flexible: data can be written to any file type or to a database.
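For example, swapping the file handle for a database connection is enough to persist items to SQLite. The sketch below is a hypothetical variant, not part of the original project (`SqlitePipeline`, the table name, and the in-memory database are all assumptions); the driving calls again simulate the engine:

```python
import sqlite3


class SqlitePipeline:
    # Hypothetical pipeline writing items to SQLite instead of a text file
    def open_spider(self, spider):
        self.conn = sqlite3.connect(':memory:')
        self.conn.execute('CREATE TABLE IF NOT EXISTS rank (rank_list TEXT)')

    def process_item(self, item, spider):
        self.conn.execute('INSERT INTO rank (rank_list) VALUES (?)',
                          (item["rank_list"],))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()


# Simulate the engine driving the pipeline
pipeline = SqlitePipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"rank_list": "url1 url2"}, spider=None)
rows = pipeline.conn.execute('SELECT rank_list FROM rank').fetchall()
print(rows)  # [('url1 url2',)]
pipeline.close_spider(spider=None)
```

In a real project the connection parameters would come from settings, and a file-backed database would be used so the data survives the crawl.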