Postgraduate Entrance Exam Vocabulary: Sentence Practice 4
day16
When I said the color of your sweater looks like a cow, it was just an epithet. It’s a blot on our relationship.
How to attain a state of calmness when peeing.
A brilliant shaft of light burst through the mine shaft. I had been held captive there for a month. The incoming warmth really made me feel alive again.
We are only mortals. We are all designed to be mortal. We all have a mortal fear of death.
He was bailed out? Who paid the bail? What if he jumps bail and runs to America? It’s crazy to ...
Postgraduate Entrance Exam Vocabulary: Sentence Practice 3
day11
Flowers can bloom once again, but a person’s youth can’t.
My attitude towards her began to shift.
I dreamed of building a cage with timber or lumber in a rural area. But I need to install some tubes for showering. And if the tubes burst, I need to get a plumber to fix them.
What mortified him was that his wife cheated on him.
Do not hesitate; be a decisive man, especially at decisive moments.
What is your motive for studying? My motive is to pursue the cardinal virtues like truth, goodne ...
Postgraduate Entrance Exam Vocabulary: Sentence Practice 2
day6
The president’s conversion to communism shocked everyone. It must be due to last month’s data conversion, when they found the president’s secret emails with cpp. People started to denounce him as a traitor.
Drinking impairs your mental health. Just decline their invitation to drink; you don’t need to be so obedient.
My health is no longer deteriorating, and I am happy now.
I only have a dim memory of him. He was a formidable guy. His hair was always in wild tangles. And he liked to wear a straw hat and d ...
Postgraduate Entrance Exam Vocabulary: Sentence Practice 1
day1
I never envisaged being so apprehensive about getting her reply. Yesterday, I saw smudges of blood on her face and her lipstick was smudged. I asked whether I could play LoL with her and she consented. Unbelievable, I got her consent.
Seeing what happened to China recently makes me realize it’s quite difficult to administer a country. China didn’t expect a reciprocal agreement with the US in Alaska. The mutability of the US government sucks. The US breached international law lest China figh ...
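《精神焦虑症的自救——病理分析卷》 (Self-Help for Anxiety Neurosis: Pathology Analysis Volume)
How the nervous system works
Voluntary nervous system: controlled at will. It consists of the brain and spinal cord, with nerves extending to the muscles.
Involuntary nervous system: not controlled at will. Its control centers run from the brain to the internal organs, and it regulates the organs, sweat, and saliva.
How nervous exhaustion develops and the principles of treatment
Trigger -> fear -> adrenaline release -> a vicious cycle of even greater fear
Principles of treatment:
Face it rather than run away.
Accept rather than fight.
Float past rather than fixate.
Wait patiently rather than lose patience.
Keep in mind:
The body needs some time to respond to a new mood. During that time it still reflects the tension and fear of the preceding weeks, months, or even years.
True acceptance
If sweaty hands, a pounding heart, and stomach cramps no longer bother you much, you have truly accepted. If you cannot accept calmly at first, that is fine; calm acceptance is almost impossible at this stage.
Practice deliberate inaction: it means sidestepping, going along, floating, and waiting.
Doing nothing and letting things take their course will not cost you your willpower.
Some young people say: "I feel I must stay on guard. If I let things go, one of my nerves will surely fail. So it is absolutely necessary to keep control of myself and not let myself collapse." This worry is unnecessary.
True relaxation
Relaxation is the absence of trying to relax.
You do not need to strive for relaxation; you should wait.
Let your body, without your ...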
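Python Web Crawler - Scrapy strategies for getting past anti-crawling measures
UA spoofing and cookie settings
Set request.headers['User-Agent'] and request.cookies in process_request of the DownloaderMiddleware.
Configure it in settings.py.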
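A minimal sketch of the middleware approach described above. The class name, the User-Agent pool, and the cookie values are hypothetical placeholders, not taken from the original project. It relies on the fact that a Scrapy downloader middleware is a plain Python class, so no Scrapy import is needed to define it. The assumption is that it runs before Scrapy's built-in cookie handling (for example at priority 543) so that request.cookies is still picked up.

```python
import random

# Hypothetical pool of User-Agent strings; replace with current real ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


class UACookieDownloaderMiddleware:
    """Sets a random User-Agent and fixed cookies on every outgoing request."""

    def process_request(self, request, spider):
        # Pick a different User-Agent per request to look less like a bot.
        request.headers['User-Agent'] = random.choice(USER_AGENTS)
        # Placeholder cookie; a real crawl would use values copied from a
        # logged-in browser session.
        request.cookies = {'session_id': 'placeholder'}
        # Returning None tells Scrapy to keep processing this request.
        return None
```

To activate such a class it would be listed in DOWNLOADER_MIDDLEWARES in settings.py, keyed by its import path.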
Proxy IP settings
Set request.meta['proxy'] in process_request and process_exception of the DownloaderMiddleware.
Configure it in settings.py.
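The proxy handling above could look roughly like the following sketch. The proxy addresses, the class name, the 'proxy_retries' meta key, and the retry cap are all assumptions made for illustration. The idea is to assign a proxy on the way out and, when the download raises an exception, switch to a different proxy and hand the request back to Scrapy for rescheduling.

```python
import random

# Hypothetical proxy addresses; a real project would load a maintained pool.
PROXIES = ["http://127.0.0.1:8888", "http://127.0.0.1:8889"]
MAX_PROXY_RETRIES = 3  # assumed cap so a dead site cannot loop forever


class ProxyDownloaderMiddleware:
    """Assigns a proxy to each request and rotates it after a failure."""

    def process_request(self, request, spider):
        request.meta['proxy'] = random.choice(PROXIES)
        return None  # continue normal processing

    def process_exception(self, request, exception, spider):
        retries = request.meta.get('proxy_retries', 0)
        if retries >= MAX_PROXY_RETRIES:
            return None  # give up and let other handlers deal with it
        failed = request.meta.get('proxy')
        # Prefer any proxy other than the one that just failed.
        candidates = [p for p in PROXIES if p != failed] or PROXIES
        request.meta['proxy'] = random.choice(candidates)
        request.meta['proxy_retries'] = retries + 1
        # Avoid the duplicate filter dropping the rescheduled request.
        request.dont_filter = True
        return request  # returning a request asks Scrapy to reschedule it
```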
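Download delay
Do not crawl the target site too fast, or your IP can easily get banned. So set a delay.
Set DOWNLOAD_DELAY in settings.py (at run time, Scrapy by default uses a value between 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY), and enable AUTOTHROTTLE_ENABLED to adjust the download speed dynamically according to the site's load.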
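As a sketch, the relevant part of settings.py might look like the block below. The numeric values are illustrative assumptions, not recommendations from the original post. The helper sample_delay is not part of settings.py; it is included only to show the 0.5x to 1.5x randomization range described above.

```python
import random

# --- settings.py fragment (values are illustrative) ---
DOWNLOAD_DELAY = 2                     # base delay in seconds between requests
RANDOMIZE_DOWNLOAD_DELAY = True        # Scrapy's default; jitters the delay
AUTOTHROTTLE_ENABLED = True            # adapt the delay to server latency
AUTOTHROTTLE_START_DELAY = 5           # initial delay used by AutoThrottle
AUTOTHROTTLE_MAX_DELAY = 60            # upper bound under high latency
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallel requests per site


# --- illustration only, not part of settings.py ---
def sample_delay(base_delay):
    """Draws one delay from the 0.5x..1.5x range around base_delay."""
    return random.uniform(0.5 * base_delay, 1.5 * base_delay)
```

With DOWNLOAD_DELAY = 2, each wait falls somewhere between 1 and 3 seconds.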
AutoThrottle extension Design goals:
be nicer to sites instead of using default download delay of zero
automatic ...
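Python Web Crawler - Crawling a whole site with CrawlSpider in Scrapy
Goal: scrape the book titles from several pages of the popular-science category on Douban Books.
CrawlSpider: a subclass of Spider that makes it easy to extract links from a page and parse the content of the resulting requests.
Generating a CrawlSpider: after creating the Scrapy project, run scrapy genspider -t crawl CrawlSpiderName www.xxx.com.
Using CrawlSpider
Prerequisite: set LOG_LEVEL, ROBOTSTXT_OBEY, DOWNLOADER_MIDDLEWARES, and ITEM_PIPELINES in settings.py.
Overview
LinkExtractor: the link extractor. It extracts the links that match a given regular expression from the pages in start_urls.
Rules: the rule parser. A rule is built from a link extractor and hands the links the extractor finds to the parse function named by callback. The follow parameter of a rule specifies whether extraction is iterative, that is, whether rules keep being applied to the pages behind the links the LinkExtractor extracted.
Rules:
Which is a list of one (or more) Rule objects. ...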
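To make the role of the regular expression concrete, here is a small standard-library sketch of the filtering step only. The pattern and the sample URLs are hypothetical, and Scrapy's real LinkExtractor does much more (it parses the HTML, resolves relative links, and removes duplicates). In a spider, a pattern like this would be passed as Rule(LinkExtractor(allow=...), callback='parse_item', follow=True).

```python
import re

# Hypothetical pattern for pagination links under a tag page.
ALLOW = re.compile(r'/tag/[^/?]+\?start=\d+')

# Hypothetical links as they might appear on a listing page.
SAMPLE_LINKS = [
    'https://book.douban.com/tag/%E7%A7%91%E6%99%AE?start=20&type=T',
    'https://book.douban.com/subject/1234567/',
    'https://book.douban.com/tag/%E7%A7%91%E6%99%AE?start=40&type=T',
]


def extract_allowed(links, pattern=ALLOW):
    """Keeps only the URLs in which the pattern is found anywhere."""
    return [url for url in links if pattern.search(url)]


if __name__ == '__main__':
    for url in extract_allowed(SAMPLE_LINKS):
        print(url)
```

With follow=True, the same filter would be applied again to each page reached through the matching links, which is how the crawl walks through all the pages of a listing.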
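Python Web Crawler - Setting headers and proxy IPs with middlewares in Scrapy
Goal: scrape the book information from three pages of the fiction tag on Douban.
Workflow:
In settings.py, set LOG_LEVEL, ROBOTSTXT_OBEY, ITEM_PIPELINES, DOWNLOADER_MIDDLEWARES, and other options as needed.
In middlewares.py, set request.headers and related fields for UA spoofing, IP proxying, and similar settings.
In items.py, define the data fields that wrap the data parsed by spider_name.py.
In spider_name.py, define the URLs and the parsing logic (this may involve multi-level parsing and parsing of different URLs).
In pipelines.py, define the data storage logic.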
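The last step of the workflow, the storage logic in pipelines.py, can be sketched as below. The class name, the output file name, and the JSON Lines format are assumptions for illustration; the original project's pipeline is not shown in this excerpt. Scrapy item pipelines are plain classes with open_spider, process_item, and close_spider hooks, so the sketch needs only the standard library.

```python
import json


class JsonLinesPipeline:
    """Writes each scraped item as one JSON object per line."""

    def __init__(self, path='books.jl'):  # hypothetical output file
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese titles readable in the file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        # Return the item so later pipelines can also process it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the file handle.
        self.file.close()
```

The pipeline only runs if it is registered in ITEM_PIPELINES in settings.py with a priority number.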
Project structure
Source code of the project's main files
settings.py (modified parts)
LOG_LEVEL = "ERROR"
ROBOTSTXT_OBEY = False
DOWNLOADER_MIDDLEWARES = {
    'project3.middlewares.Project3DownloaderMiddleware': 543,
}
ITEM_PIPELINES ...
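Python Web Crawler - The five core components of Scrapy
As the diagram shows, the Engine acts like a commander-in-chief.
Spiders submit Requests to the Engine.
The Engine hands the Requests to the Scheduler, which filters out duplicate requests, among other things.
The Scheduler returns new Requests to the Engine.
The Engine hands the Requests to the Downloader.
The Downloader fetches the content from the web, wraps it in a Response, and returns it to the Engine.
The Engine hands the Response to the Spiders for parsing, which produces Items or further Requests. In the end, the items must be yielded to the Engine.
The Engine hands the items to the Item Pipelines for persistent storage.
The diagram has two Middlewares. Steps ④⑤⑥ show that Requests and Responses pass through a Middleware, so Requests or Responses can be modified there, for example by changing the headers of a Request or adding a proxy IP.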
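The data flow above can be mimicked with a toy model in plain Python. This is not how Scrapy is implemented (Scrapy is asynchronous and far more elaborate); every name here is made up for illustration, and the "web" is just a dict so the sketch runs offline.

```python
from collections import deque


class Scheduler:
    """Queues request URLs and drops ones that were already seen."""

    def __init__(self):
        self.queue = deque()
        self.seen = set()

    def enqueue(self, url):
        if url not in self.seen:  # duplicate filtering
            self.seen.add(url)
            self.queue.append(url)

    def next_request(self):
        return self.queue.popleft() if self.queue else None


def download(url, fake_web):
    """Stand-in for the Downloader: a dict lookup instead of the network."""
    return {'url': url, 'body': fake_web.get(url, '')}


def parse(response):
    """Stand-in for a Spider callback: yields items or follow-up URLs."""
    for token in response['body'].split():
        if token.startswith('http'):
            yield token  # a further request
        else:
            yield {'title': token, 'source': response['url']}  # an item


def run_engine(start_urls, fake_web):
    """Stand-in for the Engine: moves data between the other components."""
    scheduler, stored = Scheduler(), []
    for url in start_urls:             # Spider -> Engine -> Scheduler
        scheduler.enqueue(url)
    while True:
        url = scheduler.next_request()  # Scheduler -> Engine
        if url is None:
            break
        response = download(url, fake_web)  # Engine -> Downloader -> Engine
        for result in parse(response):      # Engine -> Spider
            if isinstance(result, str):
                scheduler.enqueue(result)   # new request back to Scheduler
            else:
                stored.append(result)       # item on to the Item Pipeline
    return stored
```

In this model a middleware would be a function wrapped around download that edits the request before it goes out or the response before it comes back.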
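Python Web Crawler - Persisting data in Scrapy
Persistence via terminal command
Put the data to be stored in the return value of the parse method in the spider file.
Only text formats such as json, csv, and xml are supported.
scrapy crawl spider_name -o output_path. This runs spider_name and writes the output to output_path.
import scrapy


class Spider1Spider(scrapy.Spider):
    name = 'spider1'
    # List of domains allowed to be crawled
    allowed_domains = ['www.bilibili.com']
    # List of URLs to crawl
    start_urls = ['https://www.bilibili.com/v/popular/rank/all']

    def parse(self, response):
        # The returned content is wrapped in selectors; this returns a list of all selector objects that match.
        selecto ...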