tech-ai's Diary

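Introduction

Collecting training data for deep learning is time-consuming and difficult.
icrawler makes it easy to collect images, so I am leaving these notes for future reference.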

Environment

Google Colaboratory
Python 3

Installing the library

!pip install icrawler

Collecting images

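from icrawler.builtin import BingImageCrawler

# Downloaded images are saved under the "hasimoto" directory.
crawler = BingImageCrawler(storage={"root_dir": "hasimoto"})
# Search Bing for the keyword and download at most 100 images.
crawler.crawl(keyword="橋本奈々未", max_num=100)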

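This time I collected the images with Bing. icrawler also provides built-in crawlers for other sources, such as Google image search.

keyword: the search keyword
max_num: the maximum number of images to download (not the number of searches)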
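For a training set with several classes, the same two parameters can be reused in a loop. The following is only a minimal sketch: the helper names build_jobs and crawl_all, the "dataset" root directory, and the example labels and keywords are assumptions made for illustration, not part of icrawler.

```python
from pathlib import Path


def build_jobs(keywords, root_dir="dataset"):
    """Pair each search keyword with its own output directory.

    `keywords` maps a class label (used as the folder name) to the
    search keyword for that class. Both are hypothetical examples.
    """
    return [
        (keyword, str(Path(root_dir) / label))
        for label, keyword in keywords.items()
    ]


def crawl_all(jobs, max_num=100):
    """Run one Bing crawl per (keyword, directory) job."""
    # Imported here so build_jobs can be used without icrawler installed.
    from icrawler.builtin import BingImageCrawler

    for keyword, directory in jobs:
        crawler = BingImageCrawler(storage={"root_dir": directory})
        # max_num is an upper bound on the images downloaded per keyword.
        crawler.crawl(keyword=keyword, max_num=max_num)


# Example usage (performs real network requests, so it is left commented out):
# jobs = build_jobs({"cat": "cat photo", "dog": "dog photo"})
# crawl_all(jobs, max_num=50)
```

Saving each class to its own folder keeps the label in the directory name, which is a layout many image-loading utilities can read directly.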

Output while running

2020-07-26 04:35:11,479 - INFO - icrawler.crawler - start crawling...
2020-07-26 04:35:11,479 - INFO - icrawler.crawler - starting 1 feeder threads...
2020-07-26 04:35:11,480 - INFO - feeder - thread feeder-001 exit
2020-07-26 04:35:11,481 - INFO - icrawler.crawler - starting 1 parser threads...
2020-07-26 04:35:11,490 - INFO - icrawler.crawler - starting 1 downloader threads...
2020-07-26 04:35:11,919 - INFO - parser - parsing result page https://www.bing.com/images/async?q=橋本奈々未&first=0
　　　　　　　　　　　　・
　　　　　　　　　　　　・
　　　　　　　　　　　　・
　　　　　　　　　　　　・
2020-07-26 04:35:12,895 - INFO - downloader - image #1	
2020-07-26 04:36:38,505 - INFO - downloader - thread downloader-001 exit
2020-07-26 04:36:38,594 - INFO - icrawler.crawler - Crawling task done!
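Because max_num is only an upper limit, the number of files actually saved may be smaller, for example when some downloads fail. A simple way to check is to count the image files in the output directory after the crawl finishes. This is a sketch using only the standard library; the count_images helper and the list of extensions are my own assumptions.

```python
from pathlib import Path

# Extensions treated as images here; adjust as needed (an assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def count_images(root_dir):
    """Count files under root_dir whose extension looks like an image."""
    root = Path(root_dir)
    if not root.is_dir():
        # Nothing was saved, or the directory name is wrong.
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# "hasimoto" is the root_dir used in the crawl above.
print(count_images("hasimoto"))
```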

Conclusion

icrawler is useful both for collecting training data for deep learning and for collecting images as a hobby.

References

pypi.org