Dumping Machine

Dumping Machine is an application that reads Avro-encoded Kafka topics and writes them to S3 or HDFS as Parquet files.


Installation

  • Clone this repository to your local machine from https://github.com/grupozap/dumping-machine

Build requirements

  • JDK 8

Setup

Edit config/application.yml to match your environment, then build and start the application:

$ ./gradlew clean run

Compatibility


Partition

Output files are partitioned by date and hour under a directory named after the topic:

{TOPIC_NAME}/dt={DATE}/hr={HOUR}/{PARQUET_FILE}

Example:

prod-dataplatform-events/dt=2019-08-30/hr=22/1_78465.parquet
prod-dataplatform-events/dt=2019-08-30/hr=23/3_78977.parquet
prod-dataplatform-events/dt=2019-08-31/hr=00/8_77567.parquet
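
To make the layout above concrete, here is a minimal Java sketch that builds such a path from a topic name and a timestamp. It is an illustration only, not the project's actual code: the class and method names are hypothetical, and it assumes that the partition is derived from an epoch-millisecond timestamp interpreted in UTC and that the caller supplies the Parquet file name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: not part of Dumping Machine itself.
final class PartitionPath {
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("HH");

    // Builds {TOPIC_NAME}/dt={DATE}/hr={HOUR}/{PARQUET_FILE}.
    // Assumption: the timestamp is epoch milliseconds and is bucketed in UTC.
    static String of(String topic, long timestampMillis, String fileName) {
        ZonedDateTime ts = Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC);
        return topic + "/dt=" + DATE.format(ts) + "/hr=" + HOUR.format(ts) + "/" + fileName;
    }

    public static void main(String[] args) {
        long ts = ZonedDateTime.of(2019, 8, 30, 22, 15, 0, 0, ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
        // prints "prod-dataplatform-events/dt=2019-08-30/hr=22/1_78465.parquet"
        System.out.println(of("prod-dataplatform-events", ts, "1_78465.parquet"));
    }
}
```

The dt= and hr= prefixes follow the Hive partition naming convention, which lets query engines map each directory to a partition column value.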

Hive Metastore

Dumping Machine supports Hive Metastore for the following operations:

  • Create database
  • Create table
  • Update table
  • Add partition
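
As a rough illustration of what the "Add partition" operation amounts to, the Java sketch below builds the equivalent HiveQL statement for one date and hour partition. This is an assumption-laden example: how Dumping Machine actually talks to the Metastore is not shown here, and the class name, database name, table name, and bucket are all hypothetical.

```java
// Hypothetical helper: shows the HiveQL equivalent of registering one partition.
final class AddPartitionStatement {

    // baseLocation is the topic directory, for example an S3 or HDFS path (assumed).
    static String build(String database, String table, String date, String hour, String baseLocation) {
        return String.format(
                "ALTER TABLE %s.%s ADD IF NOT EXISTS PARTITION (dt='%s', hr='%s') LOCATION '%s/dt=%s/hr=%s'",
                database, table, date, hour, baseLocation, date, hour);
    }

    public static void main(String[] args) {
        // example_db, prod_dataplatform_events and example-bucket are placeholder names.
        System.out.println(build(
                "example_db",
                "prod_dataplatform_events",
                "2019-08-30",
                "22",
                "s3://example-bucket/prod-dataplatform-events"));
    }
}
```

IF NOT EXISTS keeps the statement safe to repeat, which matters when several files land in the same hourly partition.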

Team

Made with ❤️ by the Grupo ZAP engineering team
