Data Ductus is migrating from Brilliant to Xledger as a business system. The data stored in Xledger is needed for internal business analytics. This app will fetch the necessary business data from Xledger's GraphQL API and write the data to a Data Lake as .parquet files. This raw data will be transformed and loaded into a structure more suitable for business analytics, but that is outside the scope of this application. The focus of this application is to produce and keep the raw data synchronized in the Data Lake. The application is built on Azure serverless infrastructure.
The app is developed as a function app in Azure (Figure 1). It consists of several Azure functions and associated triggers. Each function is responsible for fetching and keeping a particular type of data synchronized in the Data Lake. For example, one function handles timesheet data, while another handles project data. An App Configuration component is used to store the state of the synchronization. All data is written to the ddbistorage account.
Figure 1: High-level architecture showing the flow of data from timers to Azure Functions and the Data Lake. The illustration only shows one function, but the function app will have a timer and function for each type of business data.
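App Configuration holds the synchronization state for each function, for example the cursor or timestamp of the last successful sync. A minimal sketch of how a function might read and update that state with the `azure-appconfiguration` SDK is shown below; the key naming scheme and the connection-string setting name are assumptions, not the app's actual configuration.

```python
# Minimal sketch: reading and writing sync state in Azure App Configuration.
# The key naming scheme and APP_CONFIG_CONNECTION_STRING are assumed names.
import os
from typing import Optional

from azure.appconfiguration import AzureAppConfigurationClient, ConfigurationSetting
from azure.core.exceptions import ResourceNotFoundError

client = AzureAppConfigurationClient.from_connection_string(
    os.environ["APP_CONFIG_CONNECTION_STRING"]  # hypothetical setting name
)


def get_last_sync_cursor(data_type: str) -> Optional[str]:
    """Return the stored delta cursor for a data type, or None before the first full sync."""
    try:
        setting = client.get_configuration_setting(key=f"{data_type}:last_cursor")
        return setting.value
    except ResourceNotFoundError:  # no cursor stored yet
        return None


def set_last_sync_cursor(data_type: str, cursor: str) -> None:
    """Persist the delta cursor after a successful synchronization."""
    client.set_configuration_setting(
        ConfigurationSetting(key=f"{data_type}:last_cursor", value=cursor)
    )
```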
Performs a full load of all timesheet data using the timesheets endpoint. After a full data load, only new data is retrieved using a combination of the timesheet_deltas and timesheets endpoints. The data includes time reported by Data Ductus employees.
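The delta-based functions all follow the same pattern: page through the full endpoint once, then re-run the same loop against the corresponding `*_deltas` endpoint starting from the stored cursor. A minimal sketch of the full-load part for timesheets, using `requests` against the GraphQL API, is shown below; the node fields, page-size and cursor arguments, and the authorization header format are assumptions that must be checked against the actual Xledger schema.

```python
# Minimal sketch of the full-sync part of the pattern against the Xledger GraphQL API.
# Field names, the `first`/`after` arguments, and the auth header format are assumptions;
# verify them against the schema at https://demo.xledger.net/GraphQL before reusing.
import requests

GRAPHQL_URL = "https://demo.xledger.net/graphql"  # assumed endpoint URL

FULL_SYNC_QUERY = """
query Timesheets($after: String) {
  timesheets(first: 500, after: $after) {
    edges { cursor node { dbId employeeDbId workingHours assignmentDate } }
    pageInfo { hasNextPage }
  }
}
"""


def fetch_all_timesheets(api_key: str) -> list[dict]:
    """Page through the timesheets endpoint until every node has been fetched."""
    headers = {"Authorization": f"token {api_key}"}  # assumed header format
    nodes, cursor = [], None
    while True:
        resp = requests.post(
            GRAPHQL_URL,
            json={"query": FULL_SYNC_QUERY, "variables": {"after": cursor}},
            headers=headers,
            timeout=60,
        )
        resp.raise_for_status()
        connection = resp.json()["data"]["timesheets"]
        edges = connection["edges"]
        nodes.extend(edge["node"] for edge in edges)
        if not connection["pageInfo"]["hasNextPage"]:
            return nodes
        cursor = edges[-1]["cursor"]
```

The subsequent delta sync would reuse the same pagination loop against `timesheet_deltas`, starting from the cursor saved in App Configuration after the last run.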
Performs a full load of all project data using the projects endpoint. After a full data load, only new data is retrieved using a combination of the project_deltas and projects endpoints. The data includes information on ongoing projects for Data Ductus.
Performs a full load of all employee data using the employees endpoint. After a full data load, only new data is retrieved using a combination of the employee_deltas and employees endpoints. The data includes information on all Data Ductus employees, such as contact information, employment type, salary group, etc.
Performs a full load of all customer data using the customers endpoint. After a full data load, only new data is retrieved using a combination of the customer_deltas and customers endpoints. The data includes information on all Data Ductus customers, such as company name and contact information.
Performs a full load of all supplier data using the suppliers endpoint. After a full data load, only new data is retrieved using a combination of the supplier_deltas and suppliers endpoints. The data includes information on all Data Ductus suppliers, such as company name and contact information.
Performs a full load of all transaction data using the transactions endpoint. After a full data load, only new data is retrieved using a combination of the transaction_deltas and transactions endpoints. The data includes all invoices, both incoming and outgoing, for Data Ductus. It corresponds to "Huvudbokstransaktioner" (general ledger transactions) in Xledger.
Performs a full load of all accounts receivable transaction data using the arTransactions endpoint. After a full data load, only new data is retrieved using a combination of the arTransaction_deltas and arTransactions endpoints. These transactions are invoices sent from Data Ductus to external entities (e.g., invoices to customers).
Performs a full load of all accounts payable transaction data using the apTransactions endpoint. After a full data load, only new data is retrieved using a combination of the apTransaction_deltas and apTransactions endpoints. These transactions are invoices sent to Data Ductus (e.g., invoices from suppliers or travel expenses from employees).
Fetches a table of Employee Groups from Xledger using a Flexlink and writes it to the Data Lake.
Fetches a table of Employment Types from Xledger using a Flexlink and writes it to the Data Lake.
Fetches a table of cost category data from Xledger using a Flexlink and writes it to the Data Lake.
Fetches a table of project cost setup data from Xledger using a Flexlink and writes it to the Data Lake.
Fetches a table of cost element data per time type using a Flexlink and writes it to the Data Lake.
Fetches all part-time employees using Flexlinks. A separate Flexlink is used for each subsidiary (Data Ductus Luleå AB, Tromb AB, etc.).
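All Flexlink functions share the same fetch-and-write flow. A minimal sketch, assuming the Flexlink returns an Excel document and using `pandas` together with the Data Lake SDK, is shown below; the environment variable names and the in-memory parquet step are illustrative assumptions.

```python
# Minimal sketch of a Flexlink function: download the Excel export and store it as parquet.
# Environment variable names and the folder layout are assumptions.
import io
import os
from datetime import datetime, timezone

import pandas as pd
import requests
from azure.storage.filedatalake import DataLakeServiceClient


def sync_flexlink_table(flexlink_url: str, folder: str) -> None:
    """Fetch one Flexlink table and write it to the Data Lake as a parquet file."""
    resp = requests.get(flexlink_url, timeout=60)
    resp.raise_for_status()
    df = pd.read_excel(io.BytesIO(resp.content))  # Flexlinks are exported in Excel format

    parquet_bytes = df.to_parquet()  # serialize to parquet in memory

    service = DataLakeServiceClient.from_connection_string(
        os.environ["STORAGE_CONNECTION_STRING"]  # hypothetical setting name
    )
    fs = service.get_file_system_client(os.environ.get("CONTAINER", "xledger-dev"))
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H_%M_%S")
    file_client = fs.get_file_client(f"{folder}/full_sync-{timestamp}-{folder}.parquet")
    file_client.upload_data(parquet_bytes, overwrite=True)
```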
Go to Xledger and look for the correct endpoint for your data (https://demo.xledger.net/GraphQL). If your data has endpoints that support deltas (e.g., timesheet_deltas, employee_deltas), you can perform both full synchronizations and synchronize changes over time. Otherwise, you can only perform a full synchronization each time the function is triggered. Once you have a query you are satisfied with in Xledger, copy the node fields and create a new function following the same template as the existing functions in the function folder. You can reuse most of the code.
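A minimal sketch of what the registration of such a function might look like with the Python v2 programming model is shown below; the schedule, function name, and `sync_timesheets` helper are illustrative assumptions rather than the repository's actual code.

```python
# function_app.py (sketch): register a timer-triggered sync function.
# The schedule, names, and sync_timesheets() helper are illustrative assumptions.
import logging

import azure.functions as func

app = func.FunctionApp()


@app.timer_trigger(schedule="0 0 21 * * *", arg_name="timer", run_on_startup=False)
def timesheets_sync(timer: func.TimerRequest) -> None:
    """Run the timesheet sync once per day at 21:00 UTC."""
    logging.info("Starting timesheet sync (past due: %s)", timer.past_due)
    # sync_timesheets() would perform the full or delta sync described above:
    # from function.timesheets import sync_timesheets
    # sync_timesheets()
```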
- Identify the table in Xledger you want to export.
- Click "Create Export Link".
- Choose to export the link in Excel format. Parameter refinement is not currently used.
- Copy the Flexlink and store it securely in the Azure DevOps pipeline variable group. There is one variable group for dev and one for prod.
- Add the variable to `./infrastructure/environments/[dev|prod]/variables.tf`.
- Update `./infrastructure/environments/modules/function-app/variables.tf` to pass the Flexlink as a variable to the function-app module.
- Update `./infrastructure/environments/modules/function-app/main.tf` to include the variable in `app_settings`.
- Update `./shared/environment_config` to load the variable from the environment.
- Update `./templates/deploy_infrastructure.yml` to deploy the app with the new variable.
- Write the function using existing library code, with `./function/employee_groups` as a template. You can copy most of the function code with a few modifications, but you have to write the `settings.py` file (see the sketch after this list).
- Register the function in `function_app.py`.
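Since `settings.py` is the one file that cannot be copied as-is, here is a minimal sketch of what it might contain; every name below (the Flexlink environment variable, folder name, and file prefix) is a hypothetical example rather than the actual `employee_groups` settings.

```python
# settings.py (sketch) for a new Flexlink function.
# All names below are hypothetical examples; adapt them to the new data type.
import os

# Folder (one per business data type) that the parquet files are written to.
FOLDER = "employee_groups"

# Flexlink URL, injected as an app setting by the Terraform/pipeline changes above.
FLEXLINK_URL = os.environ["EMPLOYEE_GROUPS_FLEXLINK"]

# Prefix used when naming the parquet files in the Data Lake.
FILE_PREFIX = "full_sync"
```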
API keys for the dev and prod environments are generated in an Xledger account; administrator access is required. The demo API keys expire after 2 weeks, so the prod API key is used for both xledger-dev and xledger-prod. The API keys from prod are copied over to the demo environment every 2 weeks along with all other Xledger data. API keys are stored in variable groups in the Azure DevOps pipeline and are deployed as environment variables of the function app by the pipeline. Upon expiry, these keys must be updated to keep the app running.
The files are written to a Data Lake in the ddbistorage account. Depending on the environment, the data is written to either the xledger-dev or xledger-prod container. There are two types of files: full_sync and sync_changes. Full_sync files contain a complete data synchronization, while sync_changes files contain only the items that have changed since the last full synchronization. Within the container, the files are organized in folders, one per business data type.
Below is an example of the file format and structure:
`timesheets/full_sync-20240711_21_17_09-timesheets.parquet`
`timesheets/sync_changes-20240712_21_17_09-timesheets.parquet`
`projects/full_sync-20240711_21_17_09-projects.parquet`
`projects/sync_changes-20240712_21_17_09-projects.parquet`
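The file names follow the pattern `<sync type>-<timestamp>-<data type>.parquet`. A small sketch of how such a path could be built is shown below; the helper name is illustrative.

```python
# Sketch: building a Data Lake path matching the naming scheme above.
from datetime import datetime, timezone


def build_blob_path(data_type: str, full_sync: bool) -> str:
    """Return e.g. 'timesheets/full_sync-20240711_21_17_09-timesheets.parquet'."""
    sync_type = "full_sync" if full_sync else "sync_changes"
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H_%M_%S")
    return f"{data_type}/{sync_type}-{timestamp}-{data_type}.parquet"
```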
A service connection called "xledger" has been configured in Azure DevOps, pointing to an app registration inside Azure Portal. It was configured using the automated approach in Azure DevOps. Service principal properties are stored in a variable group within the pipeline and are loaded into the pipeline for authentication.
The infrastructure is defined in Terraform. An Azure storage account is used as the backend for the state. The function app is deployed as a module, with different values for dev and prod. There is also a shared-data module that contains all existing resources used by the function app; these are resources that Atollo has already deployed and that are not managed by this Terraform code. All deployment variables, backend settings, provider settings, modules, etc. can be found in `infrastructure/`.
The function app is also deployed as part of the Azure DevOps pipeline.
