27 changes: 27 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,27 @@
name: Tests
on: [push, pull_request]
jobs:
  test:
    strategy:
      matrix:
        python-version: [2.7, 3.6, 3.7, 3.8, 3.9]
      fail-fast: false
    name: Python ${{ matrix.python-version }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install requirements (Python 2)
        if: ${{ matrix.python-version == '2.7' }}
        run: pip install -r requirements-dev-py2.txt && pip install .
      - name: Install requirements (Python 3)
        if: ${{ matrix.python-version != '2.7' }}
        run: pip install -r requirements-dev.txt && pip install .
      - name: Run tests
        run: pytest --cov=datapusher --cov-append --cov-report=xml --disable-warnings tests
      - name: Upload coverage report to codecov
        uses: codecov/codecov-action@v1
        with:
          file: ./coverage.xml
47 changes: 24 additions & 23 deletions README.md
@@ -1,5 +1,4 @@
[![Build Status](https://travis-ci.org/ckan/datapusher.png?branch=master)](https://travis-ci.org/ckan/datapusher)
[![Coverage Status](https://coveralls.io/repos/ckan/datapusher/badge.png?branch=master)](https://coveralls.io/r/ckan/datapusher?branch=master)
[![Tests](https://github.com/ckan/datapusher/actions/workflows/test.yml/badge.svg)](https://github.com/ckan/datapusher/actions/workflows/test.yml)
[![Latest Version](https://img.shields.io/pypi/v/datapusher.svg)](https://pypi.python.org/pypi/datapusher/)
[![Downloads](https://img.shields.io/pypi/dm/datapusher.svg)](https://pypi.python.org/pypi/datapusher/)
[![Supported Python versions](https://img.shields.io/pypi/pyversions/datapusher.svg)](https://pypi.python.org/pypi/datapusher/)
@@ -67,7 +66,7 @@ If you need to change the host or port, copy `deployment/datapusher_settings.py`

To run the tests:

nosetests
pytest

## Production deployment

@@ -85,24 +84,24 @@ probably need to set up Nginx as a reverse proxy in front of it and something li
Supervisor to keep the process up.


# Install requirements for the DataPusher
sudo apt install python3-venv python3-dev build-essential
sudo apt-get install python-dev python-virtualenv build-essential libxslt1-dev libxml2-dev git libffi-dev

# Create a virtualenv for datapusher
sudo python3 -m venv /usr/lib/ckan/datapusher

# Create a source directory and switch to it
sudo mkdir /usr/lib/ckan/datapusher/src
cd /usr/lib/ckan/datapusher/src

# Clone the source (you should target the latest tagged version)
sudo git clone -b 0.0.17 https://github.com/ckan/datapusher.git

# Install the DataPusher and its requirements
cd datapusher
sudo /usr/lib/ckan/datapusher/bin/pip install -r requirements.txt
sudo /usr/lib/ckan/datapusher/bin/python setup.py develop

# Create a user to run the web service (if necessary)
sudo addgroup www-data
@@ -132,8 +131,8 @@ The default DataPusher configuration uses SQLite as the backend for the jobs dat
sudo -u postgres createuser -S -D -R -P datapusher_jobs
sudo -u postgres createdb -O datapusher_jobs datapusher_jobs -E utf-8

# Run this in the virtualenv where DataPusher is installed
pip install psycopg2

# Edit SQLALCHEMY_DATABASE_URI in datapusher_settings.py accordingly
# eg SQLALCHEMY_DATABASE_URI=postgresql://datapusher_jobs:YOURPASSWORD@localhost/datapusher_jobs
@@ -143,9 +142,9 @@ The default DataPusher configuration uses SQLite as the backend for the jobs dat

```
# ... rest of datapusher-uwsgi.ini
workers = 3
threads = 3
lazy-apps = true
```
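Taken together with the options already in `deployment/datapusher-uwsgi.ini`, a high-availability section might look like the sketch below. Only `callable`, `buffer-size`, and the three new options come from this PR; the `http` and `wsgi-file` values are illustrative placeholders:

```ini
; deployment/datapusher-uwsgi.ini (sketch)
[uwsgi]
http        = 127.0.0.1:8800   ; illustrative bind address
wsgi-file   = wsgi.py          ; illustrative entry point
callable    = application
buffer-size = 32768
; High Availability Setup: parallel workers with a Postgres jobs database
workers     = 3
threads     = 3
lazy-apps   = true
```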

## Configuring
@@ -184,12 +183,14 @@ Here's a summary of the options available.
| SSL_VERIFY | False | Do not validate SSL certificates when requesting the data file (*Warning*: Do not use this setting in production) |
| TYPES | [messytables.StringType, messytables.DecimalType, messytables.IntegerType, messytables.DateUtilType] | [Messytables][] types used internally, can be modified to customize the type guessing |
| TYPE_MAPPING | {'String': 'text', 'Integer': 'numeric', 'Decimal': 'numeric', 'DateUtil': 'timestamp'} | Internal Messytables type mapping |
| LOG_FILE | `/tmp/ckan_service.log` | Where to write the logs. Use an empty string to disable |
| STDERR | `True` | Log to stderr? |


Most of the configuration options above can be also provided as environment variables prepending the name with `DATAPUSHER_`, eg `DATAPUSHER_SQLALCHEMY_DATABASE_URI`, `DATAPUSHER_PORT`, etc.
Most of the configuration options above can also be provided as environment variables by prepending the name with `DATAPUSHER_`, eg `DATAPUSHER_SQLALCHEMY_DATABASE_URI`, `DATAPUSHER_PORT`, etc. In the specific case of `DATAPUSHER_STDERR`, the possible values are `1` and `0`.
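A quick sketch of how the `DATAPUSHER_`-prefixed environment variables override the settings, mirroring the `os.environ.get` pattern used in `deployment/datapusher_settings.py` (the SQLite default path below is illustrative):

```python
import os

# Simulate a deployment that overrides two settings via environment variables.
os.environ['DATAPUSHER_STDERR'] = '0'
os.environ['DATAPUSHER_SQLALCHEMY_DATABASE_URI'] = (
    'postgresql://datapusher_jobs:YOURPASSWORD@localhost/datapusher_jobs')

# Each setting falls back to a default when the env var is absent.
SQLALCHEMY_DATABASE_URI = os.environ.get(
    'DATAPUSHER_SQLALCHEMY_DATABASE_URI',
    'sqlite:////tmp/job_store.db')  # illustrative default, not the real one
STDERR = bool(int(os.environ.get('DATAPUSHER_STDERR', '1')))

print(SQLALCHEMY_DATABASE_URI.startswith('postgresql://'))  # True
print(STDERR)                                               # False
```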


By default DataPusher uses SQLite as the database backend for the jobs information. This is fine for local development and sites with low activity, but for sites that need more performance should use Postgres as the backend for the jobs database (eg `SQLALCHEMY_DATABASE_URI=postgresql://datapusher_jobs:YOURPASSWORD@localhost/datapusher_jobs`. See also [High Availability Setup](#high-availability-setup). If SQLite is used, is probably a good idea to store the database in a location other than `/tmp`. This will prevent the database being dropped, causing out of sync errors in the CKAN side. A good place to store it is the CKAN storage folder (if DataPusher is installed in the same server), generally in `/var/lib/ckan/`.
By default, DataPusher uses SQLite as the database backend for jobs information. This is fine for local development and sites with low activity, but sites that need more performance should use Postgres as the backend for the jobs database (eg `SQLALCHEMY_DATABASE_URI=postgresql://datapusher_jobs:YOURPASSWORD@localhost/datapusher_jobs`). See also [High Availability Setup](#high-availability-setup). If SQLite is used, it's probably a good idea to store the database in a location other than `/tmp`. This prevents the database being dropped, which causes out-of-sync errors on the CKAN side. A good place to store it is the CKAN storage folder (if DataPusher is installed on the same server), generally `/var/lib/ckan/`.


## Usage
2 changes: 1 addition & 1 deletion datapusher/__init__.py
@@ -1 +1 @@
__version__ = '0.0.17'
__version__ = '0.0.18'
5 changes: 5 additions & 0 deletions deployment/datapusher-uwsgi.ini
@@ -12,3 +12,8 @@ max-requests = 5000
vacuum = true
callable = application
buffer-size = 32768

## see High Availability Setup
#workers = 3
#threads = 3
#lazy-apps = true
3 changes: 2 additions & 1 deletion deployment/datapusher_settings.py
@@ -29,4 +29,5 @@
SSL_VERIFY = os.environ.get('DATAPUSHER_SSL_VERIFY', True)

# logging
#LOG_FILE = '/tmp/ckan_service.log'
LOG_FILE = os.environ.get('DATAPUSHER_LOG_FILE', '/tmp/ckan_service.log')
STDERR = bool(int(os.environ.get('DATAPUSHER_STDERR', '1')))
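The new `STDERR` setting parses its environment variable with `bool(int(...))`, so only numeric strings are accepted. A small standalone sketch of that behaviour (`read_stderr_flag` is a hypothetical helper, not part of datapusher):

```python
def read_stderr_flag(environ):
    # Mirrors the new setting:
    # STDERR = bool(int(os.environ.get('DATAPUSHER_STDERR', '1')))
    return bool(int(environ.get('DATAPUSHER_STDERR', '1')))

print(read_stderr_flag({}))                          # True (default is '1')
print(read_stderr_flag({'DATAPUSHER_STDERR': '0'}))  # False
# Non-numeric values such as 'true' or 'yes' raise ValueError, which is
# why the README documents only '1' and '0' as valid DATAPUSHER_STDERR values.
```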
4 changes: 4 additions & 0 deletions requirements-dev-py2.txt
@@ -0,0 +1,4 @@
-r requirements.txt
httpretty==0.9.4
pytest
pytest-cov
5 changes: 3 additions & 2 deletions requirements-dev.txt
@@ -1,3 +1,4 @@
-r requirements.txt
httpretty==0.9.4
nose
httpretty==1.1.4
pytest
pytest-cov
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,6 +1,6 @@
argparse
ckanserviceprovider==0.0.10
ckanserviceprovider==1.0.0
html5lib==1.0.1
messytables==0.15.2
certifi
requests[security]==2.24.0
requests[security]==2.27.1
2 changes: 1 addition & 1 deletion setup.py
@@ -46,7 +46,7 @@
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',

'Programming Language :: Python :: 3.9',
],

# What does your project relate to?
76 changes: 36 additions & 40 deletions tests/test_acceptance.py
@@ -12,12 +12,10 @@
"""
import os
import json
import unittest
import datetime

from nose.tools import assert_equal, raises
import httpretty
import requests
import pytest

import datapusher.main as main
import datapusher.jobs as jobs
@@ -37,9 +35,9 @@ def get_static_file(filename):
return open(join_static_path(filename), 'rb').read()


class TestImport(unittest.TestCase):
class TestImport():
@classmethod
def setup_class(cls):
def setUpClass(cls):
cls.host = 'www.ckan.org'
cls.api_key = 'my-fake-key'
cls.resource_id = 'foo-bar-42'
@@ -93,18 +91,15 @@ def register_urls(self, filename='simple.csv', format='CSV',
body=json.dumps({'success': True}),
content_type='application/json')


# A URL that mocks checking if a datastore table exists
datastore_check_url = 'http://www.ckan.org/api/3/action/datastore_search'
httpretty.register_uri(httpretty.POST, datastore_check_url,
body=json.dumps({'success': True}),
content_type='application/json')


return source_url, res_url

@httpretty.activate
@raises(util.JobError)
def test_too_large_content_length(self):
"""It should raise JobError if the returned Content-Length header
is too large.
@@ -136,10 +131,10 @@ def test_too_large_content_length(self):
content_length=size,
content_type='application/json')

jobs.push_to_datastore('fake_id', data, True)
with pytest.raises(util.JobError):
jobs.push_to_datastore('fake_id', data, True)

@httpretty.activate
@raises(util.JobError)
def test_too_large_file(self):
"""It should raise JobError if the data file is too large.

@@ -172,7 +167,8 @@
'content-length': None
})

jobs.push_to_datastore('fake_id', data, True)
with pytest.raises(util.JobError):
jobs.push_to_datastore('fake_id', data, True)

@httpretty.activate
def test_content_length_string(self):
@@ -246,12 +242,12 @@ def test_simple_csv(self):

headers, results = jobs.push_to_datastore('fake_id', data, True)
results = list(results)
assert_equal(headers, [{'type': 'timestamp', 'id': 'date'},
assert (headers == [{'type': 'timestamp', 'id': 'date'},
{'type': 'numeric', 'id': 'temperature'},
{'type': 'text', 'id': 'place'}])
assert_equal(len(results), 6)
assert_equal(
results[0],
assert len(results) == 6
assert (
results[0] ==
{'date': datetime.datetime(2011, 1, 1, 0, 0), 'place': 'Galway',
'temperature': 1})

@@ -277,11 +273,11 @@ def test_simple_tsv(self):

headers, results = jobs.push_to_datastore('fake_id', data, True)
results = list(results)
assert_equal(headers, [{'type': 'timestamp', 'id': 'date'},
assert (headers == [{'type': 'timestamp', 'id': 'date'},
{'type': 'numeric', 'id': 'temperature'},
{'type': 'text', 'id': 'place'}])
assert_equal(len(results), 6)
assert_equal(results[0],
assert len(results) == 6
assert (results[0] ==
{'date': datetime.datetime(2011, 1, 1, 0, 0),
'place': 'Galway', 'temperature': 1})

@@ -307,11 +303,11 @@ def test_simple_ssv(self):

headers, results = jobs.push_to_datastore('fake_id', data, True)
results = list(results)
assert_equal(headers, [{'type': 'timestamp', 'id': 'date'},
{'type': 'numeric', 'id': 'temperature'},
assert (headers == [{'type': 'timestamp', 'id': 'date'},
{'type': 'numeric', 'id': 'temperature'},
{'type': 'text', 'id': 'place'}])
assert_equal(len(results), 6)
assert_equal(results[0],
assert len(results) == 6
assert (results[0] ==
{'date': datetime.datetime(2011, 1, 1, 0, 0),
'place': 'Galway', 'temperature': 1})

@@ -336,11 +332,11 @@ def test_simple_xls(self):

headers, results = jobs.push_to_datastore('fake_id', data, True)
results = list(results)
assert_equal(headers, [{'type': 'timestamp', 'id': 'date'},
assert (headers == [{'type': 'timestamp', 'id': 'date'},
{'type': 'numeric', 'id': 'temperature'},
{'type': 'text', 'id': 'place'}])
assert_equal(len(results), 6)
assert_equal(results[0],
assert len(results) == 6
assert (results[0] ==
{'date': datetime.datetime(2011, 1, 1, 0, 0),
'place': 'Galway', 'temperature': 1})

@@ -365,7 +361,7 @@ def test_real_csv(self):

headers, results = jobs.push_to_datastore('fake_id', data, True)
results = list(results)
assert_equal(headers, [{'type': 'text', 'id': 'Directorate'},
assert (headers == [{'type': 'text', 'id': 'Directorate'},
{'type': 'text', 'id': 'Service Area'},
{'type': 'text', 'id': 'Expenditure Category'},
{'type': 'timestamp', 'id': 'Payment Date'},
@@ -376,8 +372,8 @@
{'type': 'text',
'id': 'Cost Centre Description'},
{'type': 'numeric', 'id': 'Grand Total'}])
assert_equal(len(results), 230)
assert_equal(results[0],
assert len(results) == 230
assert (results[0] ==
{'Directorate': 'Adult and Culture',
'Service Area': 'Ad Serv-Welfare Rights- ',
'Expenditure Category': 'Supplies & Services',
@@ -411,12 +407,11 @@ def test_weird_header(self):

headers, results = jobs.push_to_datastore('fake_id', data, True)
results = list(results)
assert_equal(len(headers), 9)
assert_equal(len(results), 82)
assert_equal(headers[0]['id'].strip(), '1985')
assert_equal(results[1]['1993'].strip(), '379')
assert len(headers) == 9
assert len(results) == 82
assert headers[0]['id'].strip() == '1985'
assert results[1]['1993'].strip() == '379'

@raises(util.JobError)
@httpretty.activate
def test_bad_url(self):
"""It should raise HTTPError(JobError) if the resource.url is badly
@@ -436,9 +431,9 @@
}
}

jobs.push_to_datastore('fake_id', data, True)
with pytest.raises(util.JobError):
jobs.push_to_datastore('fake_id', data, True)

@raises(util.JobError)
@httpretty.activate
def test_bad_scheme(self):
"""It should raise HTTPError(JobError) if the resource.url is an
@@ -458,7 +453,8 @@
}
}

jobs.push_to_datastore('fake_id', data, True)
with pytest.raises(util.JobError):
jobs.push_to_datastore('fake_id', data, True)

@httpretty.activate
def test_mostly_numbers(self):
@@ -481,8 +477,8 @@

headers, results = jobs.push_to_datastore('fake_id', data, True)
results = list(results)
assert_equal(len(headers), 19)
assert_equal(len(results), 133)
assert len(headers) == 19
assert len(results) == 133

@httpretty.activate
def test_long_file(self):
@@ -505,8 +501,8 @@

headers, results = jobs.push_to_datastore('fake_id', data, True)
results = list(results)
assert_equal(len(headers), 1)
assert_equal(len(results), 4000)
assert len(headers) == 1
assert len(results) == 4000

@httpretty.activate
def test_do_not_push_when_same_hash(self):
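The recurring change in `tests/test_acceptance.py` replaces nose's `@raises` decorator and `assert_equal` with plain `assert` statements and the `pytest.raises` context manager. A minimal standalone sketch of the pattern (the `JobError` and `push_to_datastore` stand-ins below are illustrative, not datapusher's real implementations):

```python
import pytest

class JobError(Exception):
    """Stand-in for datapusher.util.JobError."""

def push_to_datastore(task_id, data, dry_run=False):
    # Illustrative stand-in that always hits the error path,
    # like the too-large-file tests in this PR.
    raise JobError('the file is too large')

# nose style (removed in this PR):
#     @raises(JobError)
#     def test_too_large_file(self):
#         jobs.push_to_datastore('fake_id', data, True)
#
# pytest style (added in this PR): the expected exception is
# asserted with a context manager around the failing call.
def test_too_large_file():
    with pytest.raises(JobError):
        push_to_datastore('fake_id', {}, True)

test_too_large_file()
print('ok')  # the expected JobError was raised and caught
```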