Go into the crawler-system directory: cd crawler-system
Use Scrapy's built-in tool to generate a spider template: scrapy genspider <spiderName> <targetUrl>
Configuration setup
In the project directory there is a settings.py script file.
In it you can turn logging, the database, pipelines, middlewares, and other components on or off (ref: pttCrawlerSystem/setting.py).
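As a rough sketch of what such toggles can look like in settings.py: LOG_ENABLED, LOG_LEVEL, ITEM_PIPELINES, and DOWNLOADER_MIDDLEWARES are standard Scrapy settings, while the pipeline class path and the MONGO_URI database setting are hypothetical names used only for illustration and may not match this project.

```python
# Logging: set LOG_ENABLED to False to silence Scrapy's log output.
LOG_ENABLED = True
LOG_LEVEL = "INFO"

# Pipelines: an entry is active while it is listed; comment it out to turn it off.
# The class path below is a hypothetical example, not taken from this project.
ITEM_PIPELINES = {
    "pttCrawlerSystem.pipelines.ExamplePipeline": 300,
}

# Middlewares: mapping a built-in middleware to None disables it.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

# Database: Scrapy has no built-in database setting, so this is a
# project-specific (hypothetical) value that a pipeline would read itself.
MONGO_URI = "mongodb://localhost:27017"
```

The integer next to a pipeline is its order; lower values run first.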
Develop a spider
Go to the main.py script file and add a new line with cmdline.execute("scrapy crawl <spiderName>".split()), then comment out the other cmdline.execute(...) lines so that only your spider runs while you test it.
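A minimal sketch of how main.py might look after this change. The spider names "example" and "other_spider" are hypothetical, and the crawl_command helper is added here only for readability; the project's actual main.py may simply list the cmdline.execute(...) lines directly.

```python
def crawl_command(spider_name):
    """Build the argument list that cmdline.execute() expects."""
    return f"scrapy crawl {spider_name}".split()


def main():
    # Imported inside main() so the helper above works without loading Scrapy.
    from scrapy import cmdline

    # cmdline.execute(crawl_command("other_spider"))  # commented out while testing
    cmdline.execute(crawl_command("example"))  # hypothetical spider under test
```

Call main() at the bottom of main.py and run python main.py from the directory that contains scrapy.cfg. As far as I know, cmdline.execute exits the process once the crawl finishes, so only the first uncommented call would ever run, which is why the other lines are commented out.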