20 commits
8211778
Minor typo, rephrasing; add HA comments to datapusher-uwsgi.ini
jqnatividad Nov 4, 2020
9c80225
Ensure psycopg2 is installed in datapusher virtualenv
jqnatividad Nov 5, 2020
88b77e8
Assign configuration variables only after it has been loaded by Flask
mdutoo Dec 31, 2021
d9939fa
document the new environment variables
categulario Mar 23, 2022
b0c19f5
Merge pull request #217 from jqnatividad/production-deployment-typo
amercader Mar 29, 2022
7aa0060
Bump dev reqs
amercader Apr 13, 2022
2f2ef9e
Migrate tests to pytest
amercader Apr 13, 2022
99b08fb
Enable github actions
amercader Apr 13, 2022
833b4c1
Tests badge
amercader Apr 13, 2022
2a9ee54
Add pytest-cov requirement
amercader Apr 13, 2022
12f07bf
Migrate missing test file
amercader Apr 13, 2022
7991f0f
Don't test Python 3.10 just yet
amercader Apr 13, 2022
5548ed7
Merge branch 'new-reqs-april-2022'
amercader Apr 13, 2022
09c70df
Merge branch 'master' into categulario/setup-logging
amercader Apr 13, 2022
6f1cf91
Merge pull request #246 from categulario/categulario/setup-logging
amercader Apr 13, 2022
792a6e2
Bump version
amercader Apr 13, 2022
4491c7e
fix configuration loading : 'global' keyword must be used else global…
mdutoo May 13, 2022
a0eb9e4
Assign configuration variables only after it has been loaded by Flask
mdutoo Dec 31, 2021
1ba135b
fix configuration loading : 'global' keyword must be used else global…
mdutoo May 13, 2022
9115eb4
Merge branch 'fix_configuration_loading' of github.com:ozwillo/datapu…
mdutoo May 13, 2022
27 changes: 27 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,27 @@
+name: Tests
+on: [push, pull_request]
+jobs:
+  test:
+    strategy:
+      matrix:
+        python-version: [2.7, 3.6, 3.7, 3.8, 3.9]
+      fail-fast: false
+    name: Python ${{ matrix.python-version }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install requirements (Python 2)
+        if: ${{ matrix.python-version == '2.7' }}
+        run: pip install -r requirements-dev-py2.txt && pip install .
+      - name: Install requirements (Python 3)
+        if: ${{ matrix.python-version != '2.7' }}
+        run: pip install -r requirements-dev.txt && pip install .
+      - name: Run tests
+        run: pytest --cov=datapusher --cov-append --cov-report=xml --disable-warnings tests
+      - name: Upload coverage report to codecov
+        uses: codecov/codecov-action@v1
+        with:
+          file: ./coverage.xml
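Since the workflow calls `pytest` directly, test modules can be plain functions with bare `assert` statements. The following is only a minimal sketch of that style; `chunk_rows` is a hypothetical helper invented for illustration and is not taken from the DataPusher test suite.

```python
import itertools


def chunk_rows(rows, size):
    """Hypothetical helper: yield lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


# pytest collects functions named test_* and reports failing bare asserts,
# so nose-style assertion helpers are not needed.
def test_chunk_rows_splits_evenly():
    assert list(chunk_rows(range(4), 2)) == [[0, 1], [2, 3]]


def test_chunk_rows_keeps_remainder():
    assert list(chunk_rows(range(5), 2)) == [[0, 1], [2, 3], [4]]
```

Saved under `tests/`, a module like this would be picked up by the `pytest ... tests` command in the workflow.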
47 changes: 24 additions & 23 deletions README.md
@@ -1,5 +1,4 @@
-[![Build Status](https://travis-ci.org/ckan/datapusher.png?branch=master)](https://travis-ci.org/ckan/datapusher)
-[![Coverage Status](https://coveralls.io/repos/ckan/datapusher/badge.png?branch=master)](https://coveralls.io/r/ckan/datapusher?branch=master)
+[![Tests](https://github.com/ckan/datapusher/actions/workflows/test.yml/badge.svg)](https://github.com/ckan/datapusher/actions/workflows/test.yml)
 [![Latest Version](https://img.shields.io/pypi/v/datapusher.svg)](https://pypi.python.org/pypi/datapusher/)
 [![Downloads](https://img.shields.io/pypi/dm/datapusher.svg)](https://pypi.python.org/pypi/datapusher/)
 [![Supported Python versions](https://img.shields.io/pypi/pyversions/datapusher.svg)](https://pypi.python.org/pypi/datapusher/)
@@ -67,7 +66,7 @@ If you need to change the host or port, copy `deployment/datapusher_settings.py`

 To run the tests:

-    nosetests
+    pytest

 ## Production deployment

@@ -85,24 +84,24 @@ probably need to set up Nginx as a reverse proxy in front of it and something li
Supervisor to keep the process up.


# Install requirements for the DataPusher
sudo apt install python3-venv python3-dev build-essential
sudo apt-get install python-dev python-virtualenv build-essential libxslt1-dev libxml2-dev git libffi-dev
# Install requirements for the DataPusher
sudo apt install python3-venv python3-dev build-essential
sudo apt-get install python-dev python-virtualenv build-essential libxslt1-dev libxml2-dev git libffi-dev

# Create a virtualenv for datapusher
# Create a virtualenv for datapusher
sudo python3 -m venv /usr/lib/ckan/datapusher

# Create a source directory and switch to it
sudo mkdir /usr/lib/ckan/datapusher/src
cd /usr/lib/ckan/datapusher/src
# Create a source directory and switch to it
sudo mkdir /usr/lib/ckan/datapusher/src
cd /usr/lib/ckan/datapusher/src

# Clone the source (you should target the latest tagged version)
sudo git clone -b 0.0.17 https://github.com/ckan/datapusher.git
# Clone the source (you should target the latest tagged version)
sudo git clone -b 0.0.17 https://github.com/ckan/datapusher.git

# Install the DataPusher and its requirements
cd datapusher
sudo /usr/lib/ckan/datapusher/bin/pip install -r requirements.txt
sudo /usr/lib/ckan/datapusher/bin/python setup.py develop
# Install the DataPusher and its requirements
cd datapusher
sudo /usr/lib/ckan/datapusher/bin/pip install -r requirements.txt
sudo /usr/lib/ckan/datapusher/bin/python setup.py develop

# Create a user to run the web service (if necessary)
sudo addgroup www-data
@@ -132,8 +131,8 @@ The default DataPusher configuration uses SQLite as the backend for the jobs dat
sudo -u postgres createuser -S -D -R -P datapusher_jobs
sudo -u postgres createdb -O datapusher_jobs datapusher_jobs -E utf-8

# Run this in the virtualenv where DataPusher is installed
pip install psycopg2
# Run this in the virtualenv where DataPusher is installed
pip install psycopg2

# Edit SQLALCHEMY_DATABASE_URI in datapusher_settings.py accordingly
# eg SQLALCHEMY_DATABASE_URI=postgresql://datapusher_jobs:YOURPASSWORD@localhost/datapusher_jobs
@@ -143,9 +142,9 @@ The default DataPusher configuration uses SQLite as the backend for the jobs dat

```
# ... rest of datapusher-uwsgi.ini
workers = 3
threads = 3
lazy-apps = true
workers = 3
threads = 3
lazy-apps = true
```

## Configuring
@@ -184,12 +183,14 @@ Here's a summary of the options available.
 | SSL_VERIFY | False | Do not validate SSL certificates when requesting the data file (*Warning*: Do not use this setting in production) |
 | TYPES | [messytables.StringType, messytables.DecimalType, messytables.IntegerType, messytables.DateUtilType] | [Messytables][] types used internally, can be modified to customize the type guessing |
 | TYPE_MAPPING | {'String': 'text', 'Integer': 'numeric', 'Decimal': 'numeric', 'DateUtil': 'timestamp'} | Internal Messytables type mapping |
+| LOG_FILE | `/tmp/ckan_service.log` | Where to write the logs. Use an empty string to disable |
+| STDERR | `True` | Log to stderr? |


-Most of the configuration options above can be also provided as environment variables prepending the name with `DATAPUSHER_`, eg `DATAPUSHER_SQLALCHEMY_DATABASE_URI`, `DATAPUSHER_PORT`, etc.
+Most of the configuration options above can also be provided as environment variables by prepending `DATAPUSHER_` to the name, e.g. `DATAPUSHER_SQLALCHEMY_DATABASE_URI`, `DATAPUSHER_PORT`, etc. For `DATAPUSHER_STDERR` specifically, the accepted values are `1` and `0`.


-By default DataPusher uses SQLite as the database backend for the jobs information. This is fine for local development and sites with low activity, but for sites that need more performance should use Postgres as the backend for the jobs database (eg `SQLALCHEMY_DATABASE_URI=postgresql://datapusher_jobs:YOURPASSWORD@localhost/datapusher_jobs`. See also [High Availability Setup](#high-availability-setup). If SQLite is used, is probably a good idea to store the database in a location other than `/tmp`. This will prevent the database being dropped, causing out of sync errors in the CKAN side. A good place to store it is the CKAN storage folder (if DataPusher is installed in the same server), generally in `/var/lib/ckan/`.
+By default, DataPusher uses SQLite as the database backend for jobs information. This is fine for local development and sites with low activity, but sites that need more performance should use Postgres as the backend for the jobs database (e.g. `SQLALCHEMY_DATABASE_URI=postgresql://datapusher_jobs:YOURPASSWORD@localhost/datapusher_jobs`). See also [High Availability Setup](#high-availability-setup). If SQLite is used, it's probably a good idea to store the database in a location other than `/tmp`. This prevents the database from being dropped, which would cause out-of-sync errors on the CKAN side. A good place to store it is the CKAN storage folder (if DataPusher is installed on the same server), generally `/var/lib/ckan/`.


 ## Usage
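The README advice on the jobs database could be expressed in a settings file roughly as follows. This is a sketch under assumptions: the SQLite file name `datapusher_jobs.db` and the helper `jobs_database_uri` are made up for illustration; only the `DATAPUSHER_SQLALCHEMY_DATABASE_URI` variable name and the `/var/lib/ckan/` location come from the text above.

```python
import os

# Assumption: keep the SQLite jobs database under the CKAN storage folder
# instead of /tmp. The file name is illustrative only.
# Four slashes: "sqlite:///" followed by an absolute path.
DEFAULT_SQLITE_URI = "sqlite:////var/lib/ckan/datapusher_jobs.db"


def jobs_database_uri(environ=None):
    """Return the jobs database URI, preferring the DATAPUSHER_-prefixed variable."""
    environ = os.environ if environ is None else environ
    return environ.get("DATAPUSHER_SQLALCHEMY_DATABASE_URI", DEFAULT_SQLITE_URI)


SQLALCHEMY_DATABASE_URI = jobs_database_uri()
```

Setting `DATAPUSHER_SQLALCHEMY_DATABASE_URI` to a `postgresql://` URI before starting the service would then switch the backend without editing the file.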
2 changes: 1 addition & 1 deletion datapusher/__init__.py
@@ -1 +1 @@
-__version__ = '0.0.17'
+__version__ = '0.0.18'
65 changes: 47 additions & 18 deletions datapusher/jobs.py
@@ -23,27 +23,22 @@
 import ckanserviceprovider.util as util
 from ckanserviceprovider import web

+global_logger = logging.getLogger(__name__)
+global_logger.setLevel(logging.DEBUG)
+
 if locale.getdefaultlocale()[0]:
     lang, encoding = locale.getdefaultlocale()
     locale.setlocale(locale.LC_ALL, locale=(lang, encoding))
 else:
     locale.setlocale(locale.LC_ALL, '')

-MAX_CONTENT_LENGTH = web.app.config.get('MAX_CONTENT_LENGTH') or 10485760
-CHUNK_SIZE = web.app.config.get('CHUNK_SIZE') or 16384
-CHUNK_INSERT_ROWS = web.app.config.get('CHUNK_INSERT_ROWS') or 250
-DOWNLOAD_TIMEOUT = web.app.config.get('DOWNLOAD_TIMEOUT') or 30
-USE_PROXY = 'DOWNLOAD_PROXY' in web.app.config
-if USE_PROXY:
-    DOWNLOAD_PROXY = web.app.config.get('DOWNLOAD_PROXY')
-
-if web.app.config.get('SSL_VERIFY') in ['False', 'FALSE', '0', False, 0]:
-    SSL_VERIFY = False
-else:
-    SSL_VERIFY = True
-
-if not SSL_VERIFY:
-    requests.packages.urllib3.disable_warnings()
+MAX_CONTENT_LENGTH = None
+CHUNK_SIZE = None
+CHUNK_INSERT_ROWS = None
+DOWNLOAD_TIMEOUT = None
+USE_PROXY = None
+DOWNLOAD_PROXY = None
+SSL_VERIFY = None

 _TYPE_MAPPING = {
     'String': 'text',
@@ -57,13 +52,47 @@
 _TYPES = [messytables.StringType, messytables.DecimalType,
           messytables.IntegerType, messytables.DateUtilType]

-TYPE_MAPPING = web.app.config.get('TYPE_MAPPING', _TYPE_MAPPING)
-TYPES = web.app.config.get('TYPES', _TYPES)
-
 DATASTORE_URLS = {
     'datastore_delete': '{ckan_url}/api/action/datastore_delete',
     'resource_update': '{ckan_url}/api/action/resource_update'
 }

+def init():
+    """
+    Updates declared global variables with values taken from datapusher_settings.py user configuration.
+    datapusher_settings.py must have been loaded first (so call it after web.init()), else it won't
+    be taken into account.
+
+    """
+    global_logger.info("init()")
+    # 'global' keyword must be used else global variables won't change :
+    global MAX_CONTENT_LENGTH
+    MAX_CONTENT_LENGTH = web.app.config.get('MAX_CONTENT_LENGTH') or 10485760
+    global_logger.info("MAX_CONTENT_LENGTH=%s", str(MAX_CONTENT_LENGTH))
+    global CHUNK_SIZE
+    CHUNK_SIZE = web.app.config.get('CHUNK_SIZE') or 16384
+    global CHUNK_INSERT_ROWS
+    CHUNK_INSERT_ROWS = web.app.config.get('CHUNK_INSERT_ROWS') or 250
+    global DOWNLOAD_TIMEOUT
+    DOWNLOAD_TIMEOUT = web.app.config.get('DOWNLOAD_TIMEOUT') or 30
+    global USE_PROXY, DOWNLOAD_PROXY
+    USE_PROXY = 'DOWNLOAD_PROXY' in web.app.config
+    if USE_PROXY:
+        DOWNLOAD_PROXY = web.app.config.get('DOWNLOAD_PROXY')
+
+    global SSL_VERIFY
+    if web.app.config.get('SSL_VERIFY') in ['False', 'FALSE', '0', False, 0]:
+        SSL_VERIFY = False
+    else:
+        SSL_VERIFY = True
+
+    if not SSL_VERIFY:
+        requests.packages.urllib3.disable_warnings()
+
+    global TYPE_MAPPING
+    TYPE_MAPPING = web.app.config.get('TYPE_MAPPING', _TYPE_MAPPING)
+    global TYPES
+    TYPES = web.app.config.get('TYPES', _TYPES)
+

 class HTTPError(util.JobError):
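The comment in `init()` about the `global` keyword refers to standard Python scoping: an assignment inside a function creates a local name unless the name is declared `global`. The sketch below isolates that behaviour with a plain dict standing in for `web.app.config`; the two function names are hypothetical and exist only for the comparison.

```python
CHUNK_SIZE = None  # module-level placeholder, like the ones in jobs.py


def init_without_global(config):
    # No `global` declaration: this assignment binds a local variable,
    # so the module-level CHUNK_SIZE is left untouched.
    CHUNK_SIZE = config.get('CHUNK_SIZE') or 16384
    return CHUNK_SIZE


def init_with_global(config):
    # `global` makes the assignment rebind the module-level name.
    global CHUNK_SIZE
    CHUNK_SIZE = config.get('CHUNK_SIZE') or 16384


init_without_global({'CHUNK_SIZE': 4096})
value_after_local_assignment = CHUNK_SIZE   # still None

init_with_global({'CHUNK_SIZE': 4096})
value_after_global_assignment = CHUNK_SIZE  # now 4096
```

Every name that `init()` assigns needs the same declaration, otherwise the module-level placeholder silently keeps its `None` value.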
1 change: 1 addition & 0 deletions datapusher/main.py
@@ -10,6 +10,7 @@

 def serve():
     web.init()
+    jobs.init()
     web.app.run(web.app.config.get('HOST'), web.app.config.get('PORT'))


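The call order matters because a value read before the settings are loaded only ever sees the fallback. A rough illustration, assuming a plain dict as a stand-in for `web.app.config` and a hypothetical `load_settings()` in place of `web.init()`:

```python
# Stand-in for web.app.config; in the real service it is populated
# when the settings file is loaded.
app_config = {}


def load_settings():
    """Hypothetical stand-in for web.init(): loads user configuration."""
    app_config['DOWNLOAD_TIMEOUT'] = 120


def read_timeout():
    """Read the timeout with the same fallback pattern used in jobs.py."""
    return app_config.get('DOWNLOAD_TIMEOUT') or 30


# Reading before the settings are loaded yields only the default...
timeout_before_load = read_timeout()

# ...while reading afterwards picks up the configured value.
load_settings()
timeout_after_load = read_timeout()
```

This is the situation the old module-level assignments were in: they ran at import time, before the settings file had been applied.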
5 changes: 5 additions & 0 deletions deployment/datapusher-uwsgi.ini
@@ -12,3 +12,8 @@ max-requests = 5000
 vacuum = true
 callable = application
 buffer-size = 32768
+
+## see High Availability Setup
+#workers = 3
+#threads = 3
+#lazy-apps = true
3 changes: 2 additions & 1 deletion deployment/datapusher_settings.py
@@ -29,4 +29,5 @@
 SSL_VERIFY = os.environ.get('DATAPUSHER_SSL_VERIFY', True)

 # logging
-#LOG_FILE = '/tmp/ckan_service.log'
+LOG_FILE = os.environ.get('DATAPUSHER_LOG_FILE', '/tmp/ckan_service.log')
+STDERR = bool(int(os.environ.get('DATAPUSHER_STDERR', '1')))
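The `bool(int(...))` wrapper explains why `DATAPUSHER_STDERR` only accepts `1` and `0`: environment variables are strings, and any non-empty string, including `'0'`, is truthy on its own. A small sketch of the same parsing, with `stderr_enabled` as a hypothetical helper that takes the environment as a parameter so it can be exercised without touching `os.environ`:

```python
import os


def stderr_enabled(environ=None):
    """Parse DATAPUSHER_STDERR the same way the STDERR setting does."""
    environ = os.environ if environ is None else environ
    # int() turns '0'/'1' into 0/1 first; bool('0') alone would be True.
    return bool(int(environ.get('DATAPUSHER_STDERR', '1')))


default_value = stderr_enabled({})                            # variable unset
disabled_value = stderr_enabled({'DATAPUSHER_STDERR': '0'})   # explicit 0
naive_value = bool('0')                                       # truthy string
```

Values such as `true` or `no` are not integers, so `int()` raises `ValueError` for them at startup.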
4 changes: 4 additions & 0 deletions requirements-dev-py2.txt
@@ -0,0 +1,4 @@
+-r requirements.txt
+httpretty==0.9.4
+pytest
+pytest-cov
5 changes: 3 additions & 2 deletions requirements-dev.txt
@@ -1,3 +1,4 @@
 -r requirements.txt
-httpretty==0.9.4
-nose
+httpretty==1.1.4
+pytest
+pytest-cov
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,6 +1,6 @@
 argparse
-ckanserviceprovider==0.0.10
+ckanserviceprovider==1.0.0
 html5lib==1.0.1
 messytables==0.15.2
 certifi
-requests[security]==2.24.0
+requests[security]==2.27.1
2 changes: 1 addition & 1 deletion setup.py
@@ -46,7 +46,7 @@
         'Programming Language :: Python :: 3.6',
         'Programming Language :: Python :: 3.7',
         'Programming Language :: Python :: 3.8',
-
+        'Programming Language :: Python :: 3.9',
     ],

     # What does your project relate to?