A Systems Engineer Who Wants to Kick a Ball

A systems engineer who loves kicking a ball, and who studies whenever he has spare time so that he can keep time free for kicking one.

2016-10-11

Crawling with cookies set in Scrapy

python scrapy crawler

In the main crawling module inside the spiders package, I overrode make_requests_from_url as shown below and set the cookies inside it. With this, I was able to crawl even a site that requires a login.

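from scrapy.spiders import CrawlSpider


class ExampleSpider(CrawlSpider):

    # ～～～ (other spider attributes and methods omitted)

    def make_requests_from_url(self, url):
        # Build the default Request for each start URL, then attach the cookies to it
        request = super(ExampleSpider, self).make_requests_from_url(url)
        request.cookies['test_key'] = 'value1'
        request.cookies['test_key2'] = 'value2'

        return request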

Note: the spider here is a subclass of CrawlSpider.
I saw somewhere that things seem to work differently for a subclass of scrapy.Spider. (Unverified.)

References

python - how to overwrite / use cookies in scrapy - Stack Overflow