
Commit 65e6168

macmv and faunaee authored

🎉 New source: Fauna (airbytehq#15274)

* Add fauna source
* Update changelog to include the correct PR
* Improve docs (#1)
* Applied suggestions to improve docs (#2)
* Applied suggestions to improve docs
* Cleaned up the docs
* Apply suggestions from code review
* Update airbyte-integrations/connectors/source-fauna/source_fauna/spec.yaml
* Flake Checker (#3)
* Run ./gradlew :airbyte-integrations:connectors:source-fauna:flakeCheck
* Fix all the warnings
* Set additionalProperties to true to adhere to acceptance tests
* Remove custom fields (airbytehq#4)
* Remove custom fields from source.py
* Remove custom fields from spec.yaml
* Collections that support incremental sync are found correctly
* Run formatter
* Index values and terms are verified
* Stripped additional_columns from collection config and check()
* We now search for an index at the start of each sync
* Add default for missing data in collection
* Add a log message about the index chosen to sync an incremental stream
* Add an example for a configured incremental catalog
* Check test now validates the simplified check function
* Remove collection name from spec.yaml and CollectionConfig
* Update test_util.py to adhere to the new config
* Update the first discover test to validate that we can find indexes correctly
* Remove other discover tests, as they no longer apply
* Full refresh test now works with simplified expanded columns
* Remove unused imports
* Incremental test now adheres to the find_index_for_stream system
* Database test passes, so now all unit tests pass again
* Remove extra fields from required section
* ttl is nullable
* Data defaults to an empty object
* Update tests to reflect ttl and data select changes
* Fix expected records. All unit tests and acceptance tests pass
* Cleanup docs for find_index_for_stream
* Update setup guide to reflect multiple collections
* Add docs to install the fauna shell
* Update examples and README to conform to the removal of additional columns

Co-authored-by: Ewan Edwards <46354154+faunaee@users.noreply.github.com>

1 parent 448828b · commit 65e6168


49 files changed: +4232 −0 lines changed

Lines changed: 12 additions & 0 deletions (file content not captured)

airbyte-integrations/builds.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -33,6 +33,7 @@
 | End-to-End Testing | [![source-e2e-test](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-e2e-test%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-e2e-test) |
 | Exchange Rates API | [![source-exchange-rates](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-exchange-rates%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-exchange-rates) |
 | Facebook Marketing | [![source-facebook-marketing](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-facebook-marketing%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-facebook-marketing) |
+| Fauna | [![source-fauna](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-fauna%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-fauna) |
 | Files | [![source-file](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-file%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-file) |
 | Flexport | [![source-file](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-flexport%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-flexport) |
 | Freshdesk | [![source-freshdesk](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-freshdesk%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-freshdesk) |
```
Lines changed: 6 additions & 0 deletions (likely `.dockerignore`)

```
*
!Dockerfile
!main.py
!source_fauna
!setup.py
!secrets
```
Lines changed: 6 additions & 0 deletions (likely `.gitignore`)

```
# Python version tools
.tool-versions
../../../.tool-versions
# emacs auto-save files
*~
*#
```
Lines changed: 38 additions & 0 deletions (likely `Dockerfile`)

```dockerfile
FROM python:3.9.11-alpine3.15 as base

# build and load all requirements
FROM base as builder
WORKDIR /airbyte/integration_code

# upgrade pip to the latest version
RUN apk --no-cache upgrade \
    && pip install --upgrade pip \
    && apk --no-cache add tzdata build-base

COPY setup.py ./
# install necessary packages to a temporary folder
RUN pip install --prefix=/install .

# build a clean environment
FROM base
WORKDIR /airbyte/integration_code

# copy all loaded and built libraries to a pure basic image
COPY --from=builder /install /usr/local
# add default timezone settings
COPY --from=builder /usr/share/zoneinfo/Etc/UTC /etc/localtime
RUN echo "Etc/UTC" > /etc/timezone

# bash is installed for more convenient debugging.
RUN apk --no-cache add bash

# copy payload code only
COPY main.py ./
COPY source_fauna ./source_fauna

ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=dev
LABEL io.airbyte.name=airbyte/source-fauna
```
Lines changed: 188 additions & 0 deletions (likely the connector `README.md`)

# New Readers

If you know how Airbyte works, read [bootstrap.md](bootstrap.md) for a quick introduction to this source. If you haven't
used Airbyte before, read [overview.md](overview.md) for a longer overview of what this connector is and how to use it.

# For Fauna Developers

## Running locally

First, start a local Fauna container:
```
docker run --rm --name faunadb -p 8443:8443 fauna/faunadb
```

In another terminal, cd into the connector directory:
```
cd airbyte-integrations/connectors/source-fauna
```

Once the container is up, set up the database:
```
fauna eval "$(cat examples/setup_database.fql)" --domain localhost --port 8443 --scheme http --secret secret
```

Finally, run the connector:
```
python main.py spec
python main.py check --config examples/config_localhost.json
python main.py discover --config examples/config_localhost.json
python main.py read --config examples/config_localhost.json --catalog examples/configured_catalog.json
```

To resume after a partial failure, you need to pass in a state file. To test this, induce a crash with bad data (e.g. a missing required field), update `examples/sample_state_full_sync.json` to contain the emitted state, and then run:

```
python main.py read --config examples/config_localhost.json --catalog examples/configured_catalog.json --state examples/sample_state_full_sync.json
```

## Running the integration tests

First, cd into the connector directory:
```
cd airbyte-integrations/connectors/source-fauna
```

The integration tests require a secret config.json. Ping me on Slack to get this file.
Once you have this file, put it in `secrets/config.json`. A sample of this file can be
found at `examples/secret_config.json`. Once the file is created, build the connector:
```
docker build . -t airbyte/source-fauna:dev
```

Now, run the integration tests:
```
python -m pytest -p integration_tests.acceptance
```

# Fauna Source

This is the repository for the Fauna source connector, written in Python.
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/fauna).

## Local development

### Prerequisites
**To iterate on this connector, make sure to complete this prerequisites section.**

#### Minimum Python version required `= 3.9.0`

#### Build & Activate Virtual Environment and install dependencies
From this connector directory, create a virtual environment:
```
python -m venv .venv
```

This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
development environment of choice. To activate it from the terminal, run:
```
source .venv/bin/activate
pip install -r requirements.txt
```
If you are in an IDE, follow your IDE's instructions to activate the virtualenv.

Note that while we are installing dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo and will call `setup.py`.
If this is mumbo jumbo to you, don't worry about it: just put your deps in `setup.py` but install using `pip install -r requirements.txt`, and everything
should work as you expect.

#### Building via Gradle
From the Airbyte repository root, run:
```
./gradlew :airbyte-integrations:connectors:source-fauna:build
```

#### Create credentials
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/fauna)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_fauna/spec.yaml` file.
Note that the `secrets` directory is gitignored by default, so there is no danger of accidentally checking in sensitive information.
See `examples/secret_config.json` for a sample config file.

**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source fauna test creds`
and place them into `secrets/config.json`.

### Locally running the connector
```
python main.py spec
python main.py check --config secrets/config.json
python main.py discover --config secrets/config.json
python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
```

### Locally running the connector docker image

#### Build
First, make sure you build the latest Docker image:
```
docker build . -t airbyte/source-fauna:dev
```

You can also build the connector image via Gradle:
```
./gradlew :airbyte-integrations:connectors:source-fauna:airbyteDocker
```
When building via Gradle, the docker image name and tag, respectively, are the values of the `io.airbyte.name` and `io.airbyte.version` `LABEL`s in
the Dockerfile.

#### Run
Then run any of the connector commands as follows:
```
docker run --rm airbyte/source-fauna:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-fauna:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-fauna:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-fauna:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
```

## Testing
Make sure to familiarize yourself with [pytest test discovery](https://docs.pytest.org/en/latest/goodpractices.html#test-discovery) to know how your test files and methods should be named.
First install test dependencies into your virtual environment:
```
pip install .[tests]
```

### Unit Tests
To run unit tests locally, from the connector directory run:
```
python -m pytest unit_tests
```

### Integration Tests
There are two types of integration tests: Acceptance Tests (Airbyte's test suite for all source connectors) and custom integration tests (which are specific to this connector).

#### Custom Integration tests
Place custom tests inside the `integration_tests/` folder, then, from the connector root, run
```
python -m pytest integration_tests
```

#### Acceptance Tests
Customize the `acceptance-test-config.yml` file to configure the tests. See [Source Acceptance Tests](https://docs.airbyte.io/connector-development/testing-connectors/source-acceptance-tests-reference) for more information.
If your connector requires creating or destroying resources during acceptance tests, create fixtures for them and place them inside `integration_tests/acceptance.py`.
To run your integration tests along with the acceptance tests, from the connector root, run
```
python -m pytest integration_tests -p integration_tests.acceptance
```
To run the acceptance tests in Docker, use the `acceptance-test-docker.sh` script.

### Using gradle to run tests
All commands should be run from the airbyte project root.
To run unit tests:
```
./gradlew :airbyte-integrations:connectors:source-fauna:unitTest
```
To run acceptance and custom integration tests:
```
./gradlew :airbyte-integrations:connectors:source-fauna:integrationTest
```

## Dependency Management
All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
We split dependencies between two groups:
* dependencies required for your connector to work go in the `MAIN_REQUIREMENTS` list.
* dependencies required for testing go in the `TEST_REQUIREMENTS` list.
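A rough sketch of how such a `setup.py` split can look. The package names, pins, and the `tests` extra wiring below are illustrative assumptions for the example, not this connector's actual dependency list:

```python
# Illustrative setup.py sketch; dependency names here are assumptions.
from setuptools import find_packages, setup

MAIN_REQUIREMENTS = [
    "airbyte-cdk",  # assumed runtime dependency
    "faunadb",      # assumed runtime dependency (Fauna Python driver)
]

TEST_REQUIREMENTS = [
    "pytest",       # assumed test-only dependency
]

setup(
    name="source_fauna",
    packages=find_packages(exclude=("unit_tests", "integration_tests")),
    install_requires=MAIN_REQUIREMENTS,
    # `extras_require` is what makes `pip install .[tests]` pull in the
    # test-only group on top of the main requirements.
    extras_require={"tests": TEST_REQUIREMENTS},
)
```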
### Publishing a new version of the connector
You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
1. Make sure your changes are passing unit and integration tests.
1. Bump the connector version in `Dockerfile` -- just increment the value of the `LABEL io.airbyte.version` appropriately (we use [SemVer](https://semver.org/)).
1. Create a Pull Request.
1. Pat yourself on the back for being an awesome contributor.
1. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
Lines changed: 45 additions & 0 deletions (likely `acceptance-test-config.yml`)

```yaml
# See [Source Acceptance Tests](https://docs.airbyte.io/connector-development/testing-connectors/source-acceptance-tests-reference)
# for more information about how to configure these tests
connector_image: airbyte/source-fauna:dev
tests:
  spec:
    - spec_path: "source_fauna/spec.yaml"
  connection:
    - config_path: "secrets/config.json"
      status: "succeed"
    - config_path: "secrets/config-deletions.json"
      status: "succeed"
    - config_path: "integration_tests/config/invalid.json"
      status: "failed"
  discovery:
    - config_path: "secrets/config.json"
    - config_path: "secrets/config-deletions.json"
  basic_read:
    - config_path: "secrets/config.json"
      configured_catalog_path: "integration_tests/configured_catalog.json"
      empty_streams: []
      expect_records:
        path: "integration_tests/expected_records.txt"
        extra_fields: no
        exact_order: yes
        extra_records: no
    - config_path: "secrets/config-deletions.json"
      configured_catalog_path: "integration_tests/configured_catalog_incremental.json"
      empty_streams: []
      expect_records:
        path: "integration_tests/expected_deletions_records.txt"
        extra_fields: no
        exact_order: yes
        extra_records: no
  incremental:
    - config_path: "secrets/config.json"
      configured_catalog_path: "integration_tests/configured_catalog.json"
      # Note that the time in this file was generated with this fql:
      # ToMicros(ToTime(Date("9999-01-01")))
      future_state_path: "integration_tests/abnormal_state.json"
    - config_path: "secrets/config-deletions.json"
      configured_catalog_path: "integration_tests/configured_catalog_incremental.json"
      future_state_path: "integration_tests/abnormal_deletions_state.json"
  full_refresh:
    - config_path: "secrets/config.json"
      configured_catalog_path: "integration_tests/configured_catalog.json"
```
Lines changed: 16 additions & 0 deletions (likely `acceptance-test-docker.sh`)

```sh
#!/usr/bin/env sh

# Build latest connector image
docker build . -t $(cat acceptance-test-config.yml | grep "connector_image" | head -n 1 | cut -d: -f2-)

# Pull latest acctest image
docker pull airbyte/source-acceptance-test:latest

# Run
docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp:/tmp \
    -v $(pwd):/test_input \
    airbyte/source-acceptance-test \
    --acceptance-test-config /test_input
```
Lines changed: 56 additions & 0 deletions (likely `bootstrap.md`)

# Fauna Source

[Fauna](https://fauna.com/) is a serverless "document-relational" database that users interact with via APIs. This connector delivers Fauna as an Airbyte source.

This source is implemented with the [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python).
It also uses the [Fauna Python Driver](https://docs.fauna.com/fauna/current/drivers/python), which
allows the connector to build FQL queries in Python. This driver is what queries the Fauna database.

Fauna has collections (similar to tables) and documents (similar to rows).

Every document has at least 3 fields: `ref`, `ts` and `data`. The `ref` is a unique string identifier
for every document. The `ts` is a timestamp recording when the document was last modified.
The `data` is arbitrary JSON data. Because this data has no fixed shape, we also allow Airbyte users
to specify which fields of the document they want to export as top-level columns.

Users can also choose to export the raw `data` field itself and, in the case of incremental syncs, metadata about when a document was deleted.

We currently provide only a single stream: the collection the user has chosen. This is
because incremental syncs require an index on each collection, so it ends up being easier to have the user
set up the index and tell us the collection and index names they wish to use.

## Full sync

This source simply calls the following [FQL](https://docs.fauna.com/fauna/current/api/fql/): `Paginate(Documents(Collection("collection-name")))`.
This queries all documents in the collection in a paginated manner. The source then iterates over all the results of that query to export data from the connector.

Docs:
[Paginate](https://docs.fauna.com/fauna/current/api/fql/functions/paginate?lang=python).
[Documents](https://docs.fauna.com/fauna/current/api/fql/functions/documents?lang=python).
[Collection](https://docs.fauna.com/fauna/current/api/fql/functions/collection?lang=python).
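The full-sync pagination loop described above can be sketched as follows, with a stubbed client standing in for the Fauna driver. The page shape mirrors Paginate's `data`/`after` results; all function names here are illustrative, not the connector's actual API:

```python
# Sketch of the full-sync pagination loop. The real connector issues
# Paginate(Documents(Collection("..."))) via the Fauna Python driver;
# here a fake client returns pages shaped like Paginate results:
# {"data": [...], "after": <cursor>}, with "after" absent on the last page.

def read_full_sync(run_query, collection, page_size=64):
    """Yield every document ref in `collection`, page by page."""
    cursor = None
    while True:
        page = run_query(collection, size=page_size, after=cursor)
        for ref in page["data"]:
            yield ref
        cursor = page.get("after")
        if cursor is None:  # no "after" cursor means this was the last page
            break

def make_fake_client(refs):
    """An in-memory stand-in for the Fauna client, for illustration only."""
    def run_query(collection, size, after):
        start = after or 0
        page = {"data": refs[start:start + size]}
        if start + size < len(refs):
            page["after"] = start + size  # cursor for the next page
        return page
    return run_query

client = make_fake_client(list(range(10)))
assert list(read_full_sync(client, "users", page_size=4)) == list(range(10))
```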
## Incremental sync

### Updates (uses an index over ts)

The source will call FQL similar to this: `Paginate(Range(Match(Index("index-name")), <last-sync-ts>, []))`.
The index we match against has the values `ts` and `ref`, so it sorts by the time the document
was last modified. The Range() limits the query to just the documents that have been modified
since the last sync.

Docs:
[Range](https://docs.fauna.com/fauna/current/api/fql/functions/range?lang=python).
[Match](https://docs.fauna.com/fauna/current/api/fql/functions/match?lang=python).
[Index](https://docs.fauna.com/fauna/current/api/fql/functions/index?lang=python).
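A minimal sketch of the cursor logic behind that query, with the index modeled as a sorted list of `(ts, ref)` pairs. This is illustration only: in the real connector the filtering happens server-side inside Range(), not in Python:

```python
# Model of the incremental cursor: the source asks Fauna for
# Paginate(Range(Match(Index("index-name")), <last-sync-ts>, [])),
# i.e. index entries (ts, ref) at or after the previous sync's ts.

def incremental_entries(index_entries, last_sync_ts):
    """Return the (ts, ref) pairs modified at or after last_sync_ts,
    mirroring what Range() restricts the Match() set to."""
    return [(ts, ref) for ts, ref in index_entries if ts >= last_sync_ts]

# Index entries sorted by ts, as the index over (ts, ref) would yield them.
index = [(100, "ref1"), (150, "ref2"), (200, "ref3")]
assert incremental_entries(index, 150) == [(150, "ref2"), (200, "ref3")]
```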
### Deletes (uses the events API)

If the user wants deletes, we have a separate query for that:
`Paginate(Events(Documents(Collection("collection-name"))))`. This paginates over all the events
on the documents of the collection. We also filter this down to only the events that occurred since the
last sync. Using these events, we can produce a record with the "deleted at" field set, so
that users know the document has been deleted.

Docs:
[Events](https://docs.fauna.com/fauna/current/api/fql/functions/events?lang=python).
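The deletions pass described above can be sketched with plain Python over simplified event dicts. The event shape and the `"remove"` action name are assumptions for illustration; the real connector consumes Fauna's actual event format through the driver:

```python
# Sketch of the deletions pass: the source paginates
# Events(Documents(Collection("..."))), keeps only removal events newer
# than the previous sync, and emits a record with "deleted_at" set.
# Event dicts here are a simplified stand-in for Fauna's event format.

def deletion_records(events, last_sync_ts):
    """Turn removal events at or after last_sync_ts into deletion records."""
    return [
        {"ref": ev["document"], "deleted_at": ev["ts"]}
        for ev in events
        if ev["action"] == "remove" and ev["ts"] >= last_sync_ts
    ]

events = [
    {"ts": 100, "action": "create", "document": "ref1"},
    {"ts": 180, "action": "remove", "document": "ref1"},
    {"ts": 220, "action": "remove", "document": "ref2"},
]
assert deletion_records(events, 150) == [
    {"ref": "ref1", "deleted_at": 180},
    {"ref": "ref2", "deleted_at": 220},
]
```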
Lines changed: 9 additions & 0 deletions (likely `build.gradle`)

```groovy
plugins {
    id 'airbyte-python'
    id 'airbyte-docker'
    id 'airbyte-source-acceptance-test'
}

airbytePython {
    moduleDirectory 'source_fauna_singer'
}
```
