Running a Hadoop cluster locally is a labor-intensive process, especially with Kerberos enabled.
This repository allows you to quickly launch only the necessary Hadoop components (HDFS, Hive, YARN, Spark) in Docker with full Kerberos support.
All components can be run independently of each other!
- krb5 - Kerberos server
- hdfs-nn - HDFS Namenode
- hdfs-dn - HDFS Datanode
- hive-server - Hive Server
- hive-metastore - Hive Metastore
- hive-metastore-db - Hive Metastore database
- nodemanager - YARN Node Manager
- resourcemanager - YARN Resource Manager
- historyserver - YARN History Server
- clients - Hadoop CLI, Hive Client, Spark client
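Since every component is a separate Compose service, you can start any subset and inspect it on its own; for instance (service and file names as used in the commands below):

```bash
# Show which services are currently up
docker-compose -f docker-compose.yml ps
# Tail the namenode logs while it starts
docker-compose -f docker-compose.yml logs -f hdfs-nn
```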
To build all the images, just run the build for docker-vm.yml:

```bash
docker-compose -f docker-vm.yml build
```

All system and user keytabs are located in the /opt/keytabs directory.
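To inspect a keytab before using it, you can list its entries with klist; a minimal sketch, assuming the MIT Kerberos tools are installed and that hdfs.keytab sits in that directory:

```bash
# Show principals and key timestamps stored in the keytab
klist -kt /opt/keytabs/hdfs.keytab
```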
Start only the components you need. HDFS alone:

```bash
docker-compose -f docker-compose.yml up -d krb5 hdfs-nn hdfs-dn
```

HDFS with Hive:

```bash
docker-compose -f docker-compose.yml up -d krb5 hdfs-nn hdfs-dn hive-server hive-metastore hive-metastore-db
```

HDFS with YARN:

```bash
docker-compose -f docker-compose.yml up -d krb5 hdfs-nn hdfs-dn resourcemanager nodemanager historyserver
```

Everything at once:

```bash
docker-compose -f docker-compose.yml up -d krb5 hdfs-nn hdfs-dn hive-server hive-metastore hive-metastore-db resourcemanager nodemanager historyserver
```

Inside the clients container, obtain a Kerberos ticket and connect to Hive via beeline:

```bash
kinit -kt /opt/hadoop/keytabs/hdfs.keytab hdfs/clients.lc.cluster@LC.CLUSTER
beeline -u "jdbc:hive2://hive-server.lc.cluster:10000/default;principal=hive/_HOST@LC.CLUSTER"
```
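With a valid ticket you can also smoke-test HDFS from the clients container (which ships the Hadoop CLI); a hedged example, where the /tmp/smoke-test path is purely illustrative:

```bash
# Open a shell in the clients container (if not already inside)
docker-compose -f docker-compose.yml exec clients bash
# Confirm the ticket obtained via kinit above
klist
# List the HDFS root and create a scratch directory
hdfs dfs -ls /
hdfs dfs -mkdir -p /tmp/smoke-test
```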
From spark-shell you can check both HDFS and Hive access:

```scala
// hdfs
val fs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://ns:8020"), sc.hadoopConfiguration)
val status = fs.listStatus(new org.apache.hadoop.fs.Path("/"))
status.foreach(x => println(x.getPath))

// hive
spark.sql("show databases").show()
```