Minner is an easy way to build a web scraper for data mining. It is written in C++14 and depends on a single shared library, libcurl. Log messages are sent to Slack and to the terminal.
In its original version this scraper was only a service for NF-eBOT (some parts of that still remain), but the goal now is to refactor the project so that more people can use it. Fork it and adapt it to your situation.
## Requirements

- cmake >= 3.5.1 and a C++14-capable compiler (e.g. gcc 5 or later)
- libcurl, installed via your OS package manager (e.g. `apt install libcurl4-openssl-dev` on Debian/Ubuntu)
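For context on the libcurl dependency: the core of any Minner-style scraper is an HTTP fetch like the minimal sketch below. This is illustrative only, not Minner's actual code; the URL is just one of the scraped sources.

```cpp
#include <curl/curl.h>

#include <iostream>
#include <string>

// Append each chunk libcurl delivers to a std::string buffer.
static size_t write_cb(char *data, size_t size, size_t nmemb, void *userp) {
    static_cast<std::string *>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    std::string body;
    curl_easy_setopt(curl, CURLOPT_URL, "http://nfe.fazenda.gov.br/");
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        std::cerr << "fetch failed: " << curl_easy_strerror(res) << "\n";
    else
        std::cout << "fetched " << body.size() << " bytes\n";

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```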
## Build and run

- Create `doc/config.h` from the `doc/config.h.dist` template.
- Build and run:

```sh
cmake . && make
./minner --SCRAPER_KEY
```
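The first step copies the `doc/config.h.dist` template; its real fields are whatever that template defines. As a purely hypothetical illustration (these macro names are invented), such a header might look like:

```cpp
// doc/config.h -- hypothetical example only; copy doc/config.h.dist
// and fill in real values. These macro names are illustrative.
#pragma once

#define SLACK_WEBHOOK_URL "https://hooks.slack.com/services/XXX/YYY/ZZZ"
#define SLACK_CHANNEL     "#scrapers"
#define LOG_TO_TERMINAL   1
```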
### With Docker (best choice for dev and a good choice for production)

- Install Docker.
- Create `doc/config.h` from the `doc/config.h.dist` template.
- Build the image and run it:

```sh
docker build -t nfebot/minner .
docker run -ti --rm nfebot/minner --SCRAPER_KEY
```
### With Vagrant (best choice for Windows and dev)

- Install Vagrant.
- Bring the box up and connect:

```sh
vagrant up && vagrant ssh
```

- Create `doc/config.h` from the `doc/config.h.dist` template.
- Build and run inside the box:

```sh
cd /data && cmake . && make
./minner --SCRAPER_KEY
```
## Scrapers available

| Scraper key | Source |
| --- | --- |
| `--nfe-notas-tecnicas` | nfe.fazenda.gov.br / Notas Técnicas |
| `--nfe-avisos` | nfe.fazenda.gov.br / Avisos |
| `--sped` | sped.rfb.gov.br / Destaques |
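Each key selects one scraper at startup. A hypothetical dispatch over these keys could look like the sketch below (the `scrape_*` functions are invented stand-ins; the real wiring lives in `app/main.cpp`):

```cpp
#include <cstring>
#include <iostream>

// Invented stand-ins for the real scrapers under app/include/parsers.
void scrape_nfe_notas_tecnicas() { std::cout << "scraping Notas Técnicas...\n"; }
void scrape_nfe_avisos()         { std::cout << "scraping Avisos...\n"; }
void scrape_sped()               { std::cout << "scraping Destaques...\n"; }

int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::cerr << "usage: minner --SCRAPER_KEY\n";
        return 1;
    }
    if      (std::strcmp(argv[1], "--nfe-notas-tecnicas") == 0) scrape_nfe_notas_tecnicas();
    else if (std::strcmp(argv[1], "--nfe-avisos") == 0)         scrape_nfe_avisos();
    else if (std::strcmp(argv[1], "--sped") == 0)               scrape_sped();
    else {
        std::cerr << "unknown scraper key: " << argv[1] << "\n";
        return 1;
    }
    return 0;
}
```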
## Project structure

- `app`: application source files
- `app/include`: application lib/module source files
- `app/include/parsers`: web page parsing layer
- `app/include/services`: external web services
- `build`: where the built executable is saved (when you use `./scripts/gcc_build.sh`)
- `doc`: configuration files
- `lib`: vendor libs
- `scripts`: scripts to help build and install
- `spike`: files for testing technologies or ideas
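The `parsers`/`services` split suggests a two-layer design: parsers turn fetched pages into structured items, and services push those items to external endpoints such as Slack. A hypothetical sketch of those interfaces (names invented; the real headers are under `app/include`):

```cpp
#include <string>
#include <vector>

// Hypothetical shapes for the two layers; the real headers live in app/include.
struct Item {
    std::string title;
    std::string url;
};

// app/include/parsers: turn a fetched page into structured items.
struct Parser {
    virtual std::vector<Item> parse(const std::string &html) = 0;
    virtual ~Parser() = default;
};

// app/include/services: deliver items to an external web service (e.g. Slack).
struct Service {
    virtual void notify(const std::vector<Item> &items) = 0;
    virtual ~Service() = default;
};
```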
## TODO

- Make `doc/config.h` simpler
- Change all `#include` directives to use `.h` files
- Make parameters `const` in `include/helpers.h`
- Refactor this code block in `app/main.cpp` (see the sketch after this list):
```cpp
rapidxml::xml_document<> doc;
// Manual buffer copy: new[] with no matching delete[], so it leaks.
char *cstr = new char[res.size() + 1];
strcpy(cstr, res.c_str());
doc.parse<0>(cstr);
```
- And a lot more refactors...
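One possible shape for that refactor, assuming the goal is to drop the manual `new[]`/`strcpy` (a sketch only; `res` is the fetched page body, as in the original block):

```cpp
#include <string>
#include <vector>

#include "rapidxml.hpp"

// Parse a fetched page body without a manual new[]/strcpy/delete[].
// rapidxml mutates the buffer it parses, so copy res into a
// std::vector<char> that frees itself when it goes out of scope.
void parse_page(const std::string &res) {
    std::vector<char> buf(res.begin(), res.end());
    buf.push_back('\0');       // rapidxml expects a null-terminated buffer

    rapidxml::xml_document<> doc;
    doc.parse<0>(buf.data());  // doc points into buf; keep buf alive while doc is used
}
```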
## Thanks

@mattgodbolt @dascandy @famastefano @grisumbras @Corristo and the other folks in the C++ Slack group.