US Department of Transportation (USDOT) Intelligent Transportation Systems (ITS) Joint Program Office (JPO) Utilities
The JPO ITS utilities repository serves as a central location for deploying open-source utilities used by other JPO-ITS repositories.
- Minimum RAM: 16 GB
- Supported operating systems:
- Ubuntu 22.04 Linux (Recommended)
- Windows 10/11 Professional (Professional version required for Docker virtualization)
- OSX 10 Mojave
- NOTE: Not all images have ARM64 builds (they can still be run through a compatibility layer)
- Docker Compose V2 - compose file format version 3.4 or newer
The jpo-utils repository is intended to be run with Docker Compose V2, as it uses functionality added in the v2 release.
Read the following guides to familiarize yourself with the jpo-utils Docker configuration.
Important!
You must rename sample.env to .env for Docker to automatically read the file. Do not push this file to source control.
A MongoDB instance, initialized as a standalone replica set with configured users, is defined in the docker-compose-mongo file. To use a different setup_mongo.sh or create_indexes.js script, pass in the relative path of the new script by overriding the MONGO_SETUP_SCRIPT_RELATIVE_PATH or MONGO_CREATE_INDEXES_SCRIPT_RELATIVE_PATH environmental variable. These scripts initialize the MongoDB database and create its indexes.
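For example, a custom index script can be wired in through the `.env` file. This is a minimal sketch; the path below is hypothetical and should point to wherever your script actually lives relative to the repository:

```bash
# .env (sketch) -- hypothetical relative path to a custom index script
MONGO_CREATE_INDEXES_SCRIPT_RELATIVE_PATH=./custom-scripts/create_indexes.js
```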
The `COMPOSE_PROFILES` variable in your `.env` file is set as follows:
- `mongo_full` - deploys all resources in the docker-compose-mongo.yml file
- `mongo` - only deploys the `mongo` and `mongo-setup` services
- `mongo_express` - only deploys the `mongo-express` service
1. Create a copy of `sample.env` and rename it to `.env`.
2. Update the variable `DOCKER_HOST_IP` to the local IP address of the system running Docker, which can be found by running the `ifconfig` command.
   - Hint: look for "inet addr:" within "eth0", or "en0" for OSX.
3. Set the `MONGO_ADMIN_DB_PASS` and `MONGO_READ_WRITE_PASS` environmental variables to a secure password.
4. Set the `COMPOSE_PROFILES` variable to: `mongo_full`
5. Run the following command: `docker-compose up -d`
6. Go to `localhost:8082` in your browser and verify that `mongo-express` can see the created database.
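For reference, the resulting `.env` for this walkthrough might look like the minimal sketch below. The IP address and passwords are placeholders, not real values:

```bash
# .env (sketch) -- placeholder values for the MongoDB walkthrough
DOCKER_HOST_IP=192.168.1.10          # replace with the output of ifconfig on your host
MONGO_ADMIN_DB_PASS=<secure-admin-password>
MONGO_READ_WRITE_PASS=<secure-readwrite-password>
COMPOSE_PROFILES=mongo_full
```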
The Bitnami Legacy Kafka image is used as a hybrid controller and broker in the docker-compose-kafka file. To use a different kafka_init.sh script, pass in the relative path of the new script by overriding the KAFKA_INIT_SCRIPT_RELATIVE_PATH environmental variable. This can help with initializing new topics at startup. Note: the intention is to eventually switch to the official Apache Kafka image, since Bitnami is no longer a reliable image provider.
An optional kafka-init, schema-registry, and kafka-ui instance can be deployed by configuring the COMPOSE_PROFILES as follows:
- `kafka_full` - deploys all resources in the docker-compose-kafka.yml file
- `kafka` - only deploys the `kafka` services
- `kafka_setup` - deploys a `kafka-setup` service that creates topics in the `kafka` service
- `kafka_schema_registry` - deploys a `kafka-schema-registry` service that can be used to manage schemas for Kafka topics
- `kafka_ui` - deploys a web interface to interact with the Kafka cluster
During operation, Kafka stores all messages published to topics in files called logs. These logs are stored on the host system and can quickly become quite large for deployments handling real CV data volumes. A Kafka log may be split across multiple files called segments. Each segment is limited in size by the KAFKA_LOG_SEGMENT_BYTES environment variable. The number of segments stored is variable, but collectively the total volume of all segments for one partition will not exceed the value specified in KAFKA_LOG_RETENTION_BYTES.

When Kafka needs to delete data, either because the total KAFKA_LOG_RETENTION_BYTES is exceeded or because data in the oldest segment is older than KAFKA_LOG_RETENTION_HOURS, an entire segment file is deleted. Please note that Kafka never deletes the active log segment. So even if data is far older than the value specified in KAFKA_LOG_RETENTION_HOURS, it may not be deleted if there is not enough newer data to fill the segment and trigger the creation of a new one.

Additionally, the values specified in KAFKA_LOG_RETENTION_BYTES and KAFKA_LOG_SEGMENT_BYTES apply per topic, per partition. For a deployment with multiple partitions and topics, total storage use can become quite large even for what may seem like small individual log and segment sizes.

When configuring the KAFKA_LOG_SEGMENT_BYTES and KAFKA_LOG_RETENTION_BYTES variables, make sure that KAFKA_LOG_RETENTION_BYTES is larger than KAFKA_LOG_SEGMENT_BYTES. If it is not, Kafka will not be able to roll a new log segment before needing to rotate the logs, which will lead to unpredictable behavior.
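As an illustrative sketch only (the sizes below are placeholders, not recommendations), a valid combination keeps the retention limit well above the segment size:

```bash
# .env (sketch) -- illustrative sizes; tune for your deployment.
# Retention must be larger than the segment size so Kafka can roll a new
# segment before old ones become eligible for deletion.
KAFKA_LOG_SEGMENT_BYTES=1073741824      # 1 GiB per segment
KAFKA_LOG_RETENTION_BYTES=10737418240   # 10 GiB per topic-partition
KAFKA_LOG_RETENTION_HOURS=168           # 7 days
```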
The Kafka topics created by the kafka-setup service are configured in the kafka-topics-values.yaml file. The topics in that file are organized by the application, and sorted into "Stream Topics" (those with cleanup.policy = delete) and "Table Topics" (with cleanup.policy = compact).
The following environment variables can be used to configure Kafka topic creation.

| Environment Variable | Description |
|---|---|
| `KAFKA_TOPIC_CREATE_ODE` | Whether to create topics for the ODE |
| `KAFKA_TOPIC_CREATE_GEOJSONCONVERTER` | Whether to create topics for the GeoJSON Converter |
| `KAFKA_TOPIC_CREATE_CONFLICTMONITOR` | Whether to create topics for the Conflict Monitor |
| `KAFKA_TOPIC_CREATE_DEDUPLICATOR` | Whether to create topics for the Deduplicator |
| `KAFKA_TOPIC_CREATE_OTHER` | Whether to create topics for other applications; only useful when you attach a custom kafka-topics-values.yaml file with other topics |
| `KAFKA_TOPICS_VALUES_FILE` | Path to a custom kafka-topics-values.yaml file |
| `KAFKA_TOPIC_PARTITIONS` | Number of partitions |
| `KAFKA_TOPIC_REPLICAS` | Number of replicas |
| `KAFKA_TOPIC_MIN_INSYNC_REPLICAS` | Minimum number of in-sync replicas (for use with `acks=all`) |
| `KAFKA_TOPIC_RETENTION_MS` | Retention time for stream topics, in milliseconds |
| `KAFKA_TOPIC_DELETE_RETENTION_MS` | Tombstone retention time for compacted topics, in milliseconds |
| `KAFKA_CUSTOM_TOPIC_RETENTION_MS` | Retention time for custom stream topics, allowing a secondary retention time. If more granular retention is required, it can be further customized by configuring `retentionMs` per defined topic in the `customTopics` objects within kafka-topics-values.yaml |
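Putting a few of these together, a topic-creation configuration might look like the sketch below. The values are placeholders for a small single-broker development setup, not recommendations:

```bash
# .env (sketch) -- placeholder values for Kafka topic creation
KAFKA_TOPIC_CREATE_ODE=true
KAFKA_TOPIC_CREATE_GEOJSONCONVERTER=true
KAFKA_TOPIC_CREATE_CONFLICTMONITOR=false
KAFKA_TOPIC_CREATE_DEDUPLICATOR=false
KAFKA_TOPIC_PARTITIONS=1
KAFKA_TOPIC_REPLICAS=1
KAFKA_TOPIC_MIN_INSYNC_REPLICAS=1
KAFKA_TOPIC_RETENTION_MS=300000          # stream topics: 5 minutes
KAFKA_TOPIC_DELETE_RETENTION_MS=3600000  # compacted topics: keep tombstones 1 hour
```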
The following environment variables are used to configure the Kafka client for Confluent Cloud.
| Environment Variable | Description |
|---|---|
| `KAFKA_SECURITY_PROTOCOL` | Security protocol for Kafka |
| `KAFKA_SASL_MECHANISM` | SASL mechanism for Kafka |
| `KAFKA_SASL_JAAS_CONFIG` | SASL JAAS configuration for Kafka |
| `KAFKA_SSL_ENDPOINT_ALGORITHM` | SSL endpoint algorithm for Kafka |
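A Confluent Cloud connection typically uses SASL_SSL with the PLAIN mechanism. The sketch below assumes that setup; the API key and secret are placeholders:

```bash
# .env (sketch) -- typical Confluent Cloud client settings; key/secret are placeholders
KAFKA_SECURITY_PROTOCOL=SASL_SSL
KAFKA_SASL_MECHANISM=PLAIN
KAFKA_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";'
KAFKA_SSL_ENDPOINT_ALGORITHM=https
```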
1. Create a copy of `sample.env` and rename it to `.env`.
2. Update the variable `DOCKER_HOST_IP` to the local IP address of the system running Docker, which can be found by running the `ifconfig` command.
   - Hint: look for "inet addr:" within "eth0", or "en0" for OSX.
3. Set the `COMPOSE_PROFILES` variable to: `kafka_full`
4. Run the following command: `docker-compose up -d`
5. Go to `localhost:8001` in your browser and verify that `kafka-ui` can see the created Kafka cluster and initialized topics.
The mongo-connector service connects to specified Kafka topics and deposits these messages to separate collections in the MongoDB database. The codebase that provides this functionality comes from Confluent, using their community-licensed cp-kafka-connect image; see Confluent's documentation for details on this image.
Kafka connectors are managed by the kafka-connect-setup service.
Set the COMPOSE_PROFILES environmental variable as follows:
- `kafka_connect` - will only spin up the `kafka-connect` and `kafka-init` services in docker-compose-connect
  - NOTE: This implies that you will be using a separate Kafka and MongoDB cluster
- `kafka_connect_standalone` - will run the following:
  1. `kafka-connect` service from docker-compose-connect
  2. `kafka-init` service from docker-compose-connect
  3. `kafka` service from docker-compose-kafka
  4. `mongo` and `mongo-setup` services from docker-compose-mongo
The Kafka connectors created by the kafka-connect-setup service are configured in the kafka-connectors-values.yaml file. The connectors in that file are organized by the application, and given parameters to define the Kafka -> MongoDB sync connector:
| Connector Variable | Required | Condition | Description |
|---|---|---|---|
| `topicName` | Yes | Always | The name of the Kafka topic to sync from |
| `collectionName` | Yes | Always | The name of the MongoDB collection to write to |
| `generateTimestamp` | No | Optional | Enable or disable adding a timestamp to each message (true/false) |
| `connectorName` | No | Optional | Override the name of the connector from the `collectionName` to this field instead |
| `useTimestamp` | No | Optional | Converts the `timestampField` field at the top level of the value to a BSON date |
| `timestampField` | No | Required if `useTimestamp` is true | The name of the timestamp field at the top level of the message |
| `useKey` | No | Optional | Override the document `_id` field in MongoDB to use a specified `keyField` from the message |
| `keyField` | No | Required if `useKey` is true | The name of the key field |
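As a hypothetical sketch of how these fields fit together, a custom connector entry could look like the following. The per-connector fields come from the table above, but the surrounding structure (the top-level key and list layout) is assumed; check kafka-connectors-values.yaml in this repository for the actual schema:

```yaml
# Hypothetical sketch of a custom connector entry; the top-level key and
# nesting are assumptions -- see kafka-connectors-values.yaml for the real layout.
customConnectors:
  - topicName: topic.MyCustomJson        # Kafka topic to sync from
    collectionName: MyCustomJson         # MongoDB collection to write to
    generateTimestamp: true              # add a timestamp to each message
    useKey: true                         # use a field from the message as the _id
    keyField: id                         # required because useKey is true
```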
The following environment variables can be used to configure Kafka Connectors:
| Environment Variable | Description |
|---|---|
| `CONNECT_URL` | Kafka Connect API URL |
| `CONNECT_LOG_LEVEL` | Kafka Connect log level (OFF, ERROR, WARN, INFO) |
| `CONNECT_TASKS_MAX` | Number of concurrent tasks to configure on Kafka connectors |
| `CONNECT_CREATE_ODE` | Whether to create Kafka connectors for the ODE |
| `CONNECT_CREATE_GEOJSONCONVERTER` | Whether to create Kafka connectors for the GeoJSON Converter |
| `CONNECT_CREATE_CONFLICTMONITOR` | Whether to create Kafka connectors for the Conflict Monitor |
| `CONNECT_CREATE_DEDUPLICATOR` | Whether to create Kafka connectors for the Deduplicator |
| `CONNECT_KAFKA_CONNECTORS_VALUES_FILE` | Path to a custom kafka-connectors-values.yaml file |
| `CONNECT_CREATE_OTHER` | Whether to create Kafka connectors for other applications; only useful when you attach a custom kafka-connectors-values.yaml file with other connectors |
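For orientation, a connector-setup configuration might be sketched in `.env` as follows. The URL assumes the Connect service is reachable at an in-network hostname on the default Connect REST port (8083); the hostname and values are placeholders:

```bash
# .env (sketch) -- placeholder values for Kafka Connect setup
CONNECT_URL=http://kafka-connect:8083   # assumed hostname; 8083 is the default Connect REST port
CONNECT_LOG_LEVEL=INFO
CONNECT_TASKS_MAX=1
CONNECT_CREATE_ODE=true
CONNECT_CREATE_GEOJSONCONVERTER=true
CONNECT_CREATE_CONFLICTMONITOR=false
CONNECT_CREATE_DEDUPLICATOR=false
```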
1. Create a copy of `sample.env` and rename it to `.env`.
2. Update the variable `DOCKER_HOST_IP` to the local IP address of the system running Docker.
3. Set the `MONGO_ADMIN_DB_PASS` and `MONGO_READ_WRITE_PASS` environmental variables to a secure password.
4. Set the `COMPOSE_PROFILES` variable to: `kafka_connect_standalone,mongo_express,kafka_ui,kafka_setup`
5. Navigate back to the root directory and run the following command: `docker compose up -d`
6. Produce a sample message to one of the sink topics using `kafka-ui`:
   1. Go to `localhost:8001`
   2. Click local -> Topics
   3. Select `topic.OdeBsmJson`
   4. Select `Produce Message`
   5. Leave the defaults except set the `Value` field to `{"foo":"bar"}`
   6. Click `Produce Message`
7. View the synced message in `mongo-express`:
   1. Go to `localhost:8082`
   2. Click `ode` - or click whatever value you set `MONGO_DB_NAME` to
   3. Click `OdeBsmJson`, and you should now see your message!
8. Feel free to test this with other topics or by producing to these topics using the ODE.
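If you want to confirm that the sink connectors were registered, the Kafka Connect REST API can be queried directly. This sketch assumes the default Connect REST port 8083 is published on localhost and that a connector named after the OdeBsmJson collection exists; adjust the host, port, and connector name to match your CONNECT_URL and kafka-connectors-values.yaml:

```bash
# List registered connectors, then check the status of one of them.
# Host, port, and connector name are assumptions for this sketch.
curl -s http://localhost:8083/connectors
curl -s http://localhost:8083/connectors/OdeBsmJson/status
```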
The monitoring stack consists of Prometheus for metrics collection and Grafana for visualization, along with several exporters that collect metrics from different services. The configuration is defined in docker-compose-monitoring.yml.
Set the COMPOSE_PROFILES environmental variable as follows:
- `monitoring_full` - deploys all resources in the docker-compose-monitoring.yml file
- `prometheus` - deploys only the Prometheus service
- `grafana` - deploys only the Grafana service
- `node_exporter` - deploys only the Node Exporter service for system metrics
- `kafka_exporter` - deploys only the Kafka Lag Exporter service
- `mongodb_exporter` - deploys only the MongoDB Exporter service
The following environment variables can be used to configure the monitoring stack:
| Environment Variable | Description |
|---|---|
| `PROMETHEUS_RETENTION` | Data retention period for Prometheus (default: 15d) |
| `GRAFANA_ADMIN_USER` | Grafana admin username (default: admin) |
| `GRAFANA_ADMIN_PASSWORD` | Grafana admin password (default: grafana) |
| `KAFKA_LAG_EXPORTER_ROOT_LOG_LEVEL` | Root log level for the Kafka Lag Exporter (default: WARN) |
| `KAFKA_LAG_EXPORTER_LOG_LEVEL` | Kafka Lag Exporter log level (default: INFO) |
| `KAFKA_LAG_EXPORTER_KAFKA_LOG_LEVEL` | Kafka log level for the Kafka Lag Exporter (default: ERROR) |
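A minimal monitoring configuration in `.env` might look like the sketch below; the values shown are the documented defaults, so in practice you would at least replace the Grafana password:

```bash
# .env (sketch) -- values are the documented defaults; change credentials before use
PROMETHEUS_RETENTION=15d
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=grafana
KAFKA_LAG_EXPORTER_ROOT_LOG_LEVEL=WARN
KAFKA_LAG_EXPORTER_LOG_LEVEL=INFO
KAFKA_LAG_EXPORTER_KAFKA_LOG_LEVEL=ERROR
```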
1. Create a copy of `sample.env` and rename it to `.env`.
2. Set the `COMPOSE_PROFILES` variable to: `monitoring_full`
3. Update any passwords in the `.env` file for security.
4. Run the following command: `docker compose up -d`
5. Access the monitoring interfaces:
   - Grafana: `http://localhost:3000` (default credentials: admin/grafana)
   - Prometheus: `http://localhost:9090`
6. The following metrics endpoints will be available:
   - Node Exporter: `http://localhost:9100/metrics`
   - Kafka Lag Exporter: `http://localhost:8000/metrics`
   - MongoDB Exporter: `http://localhost:9216/metrics`
The scrape configurations for the monitoring stack are defined in the prometheus.yml file. If you would like to add a new scrape configuration, you can do so by adding a new job to the scrape_configs section. Please note that this file doesn't support environment variables, so you will need to manually edit the file.
The following scrape configurations are available:
- `prometheus` - scrapes the Prometheus metrics
- `node_exporter` - scrapes the Node Exporter metrics
- `kafka_exporter` - scrapes the Kafka Lag Exporter metrics
- `mongodb_exporter` - scrapes the MongoDB Exporter metrics
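A new job added to the `scrape_configs` section might look like the following. The job name and target address are hypothetical placeholders for whatever service you want to scrape; since prometheus.yml already contains a `scrape_configs` section, you would append only the new job entry under it:

```yaml
# prometheus.yml (sketch) -- append the new job under the existing scrape_configs.
# "my-new-exporter" and its host:port are placeholders, not services in this repo.
scrape_configs:
  - job_name: "my-new-exporter"
    scrape_interval: 15s
    static_configs:
      - targets: ["my-new-exporter:9105"]
```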
While default passwords are provided for development convenience, it is strongly recommended to:
- Change all passwords before deploying to any environment
- Never use default passwords in production
- Use secure password generation and management practices
- Consider using Docker secrets or environment management tools for production deployments
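As one option for the last point, Docker Compose supports file-based secrets. The sketch below is a generic illustration of the mechanism, not the layout of this repository's compose files; the service, image, and secret names are placeholders:

```yaml
# docker-compose secrets sketch -- generic illustration only.
# "mongo" stands in for whichever service needs the credential.
services:
  mongo:
    image: mongo:7
    secrets:
      - mongo_admin_password
    environment:
      # Many images support *_FILE variants that read the value from a mounted secret.
      MONGO_INITDB_ROOT_PASSWORD_FILE: /run/secrets/mongo_admin_password

secrets:
  mongo_admin_password:
    file: ./secrets/mongo_admin_password.txt
```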