129 changes: 129 additions & 0 deletions .gitignore
@@ -0,0 +1,129 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
29 changes: 29 additions & 0 deletions INSTRUCOES.md
@@ -0,0 +1,29 @@
# BIT Webcrawler Challenge

## Initial Setup

Just make sure you have _Python 3_ and its package installer (_pip_) available, that _MongoDB_ is installed on your machine, and that the packages listed in _"requirements.txt"_ are installed. Once that is done, everything is ready to go!

[MongoDB installation guide](https://docs.mongodb.com/manual/installation/)

_To install the packages..._
```console
$ pip install -r requirements.txt
```


## How to Run

Run the following command (inside the _quotes_project_ folder):
```console
$ scrapy crawl quotes
```

The web crawler will run according to what is defined in _quote_spider.py_. Once it finishes, the scraped data is saved to a database named _quotestoscrape_, inside the _diogo_castro_ collection, as defined by the project's item pipeline.
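For reference, a Scrapy item pipeline that writes each scraped quote to MongoDB usually looks something like the sketch below. This is a minimal illustration under assumptions, not the project's actual pipeline: the class name `QuotesToMongoPipeline` and the hard-coded connection settings are placeholders.

```python
class QuotesToMongoPipeline:
    """Hypothetical sketch: store each scraped quote item in MongoDB."""

    def __init__(self, mongo_uri="mongodb://localhost:27017",
                 db_name="quotestoscrape", collection_name="diogo_castro"):
        # Connection settings; real projects usually read these from settings.py.
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.db = None

    def open_spider(self, spider):
        # Imported lazily so the class can be loaded without pymongo installed.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.db_name]

    def close_spider(self, spider):
        # Release the connection when the crawl ends.
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Insert a plain-dict copy of the item, then pass it on unchanged.
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

Such a pipeline would be enabled via `ITEM_PIPELINES` in the project's `settings.py`.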

Both exported files, in JSON and CSV, are available in the project's main folder (_quotes_project_), in case you need a reference for how the data is structured and laid out.

## Reports

All of the requested queries, with comments explaining what each one returns, are in the _queries.js_ file. All you need to do is run those commands for the results to come back as expected.
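For example, assuming the legacy _mongo_ shell is on your PATH, the whole script can be executed against the database in one go (you can also paste individual queries into an interactive shell session):

```console
$ mongo quotestoscrape queries.js
```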
101 changes: 101 additions & 0 deletions quotes_project/diogo_castro.csv

Large diffs are not rendered by default.
