A news crawler for Life-Long Learning LM.

| Media Name | Website | Spider Name |
|---|---|---|
| 自由時報 (Liberty Times) | news.ltn.com.tw | ltn |
| 中央社 (Central News Agency) | www.cna.com.tw | cna |
| 中國時報 (China Times) | www.chinatimes.com | ct |
| 三立新聞 (SET News) | www.setn.com | setn |
| 華視新聞 (CTS News) | news.cts.com.tw | cts |

Run a spider with `scrapy crawl <spider_name>`, where `<spider_name>` is one of the spider names in the table above.

- If you don't want to save data to the database, remove `NewsCrawlerPGStoragePipeline` from `ITEM_PIPELINES` in `settings.py`.
- The PostgreSQL connection settings can be changed through environment variables; see `pipelines.py` for details.
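
As a rough illustration of the environment-variable approach mentioned above, the following is a minimal sketch of how connection settings might be read with fallbacks. The variable names (`POSTGRES_HOST`, `POSTGRES_PORT`, and so on), the default values, and the `load_pg_settings` helper are all assumptions made for this example; the names the crawler actually reads are defined in `pipelines.py`.

```python
import os
from typing import Mapping

# Hypothetical variable names and defaults, for illustration only.
# Check pipelines.py for the names the crawler really uses.
_DEFAULTS = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "news",
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "",
}


def load_pg_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build connection settings, letting environment variables override defaults."""
    values = {name: env.get(name, default) for name, default in _DEFAULTS.items()}
    return {
        "host": values["POSTGRES_HOST"],
        "port": int(values["POSTGRES_PORT"]),  # ports arrive as strings in the environment
        "dbname": values["POSTGRES_DB"],
        "user": values["POSTGRES_USER"],
        "password": values["POSTGRES_PASSWORD"],
    }


if __name__ == "__main__":
    settings = load_pg_settings()
    # Print only non-sensitive fields.
    print(settings["host"], settings["port"], settings["dbname"])
```

With a pattern like this, a setting can be overridden for a single run by exporting the variable in the shell before calling `scrapy crawl <spider_name>`, without editing any source file.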