
Running Spark Applications in container #49

@lqshow


Overview

At work I needed to submit Spark applications from inside a container. These notes record the relevant container configuration.

Configuration is passed to the execution script inside the container mainly through environment variables.

Spark configuration

| Property Name | Description |
| --- | --- |
| spark.driver.bindAddress | Container IP address, or simply `0.0.0.0` |
| spark.driver.host | Host IP address |
| spark.driver.port | Port the driver listens on |
| spark.ui.port | Port of the application dashboard (web UI) |
| spark.blockManager.port | Port the block manager listens on |
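
These settings can also be pinned in `spark-defaults.conf` rather than passed at submit time; a minimal fragment with example values (`192.168.1.10` stands in for the real host IP):

```properties
spark.driver.bindAddress   0.0.0.0
spark.driver.host          192.168.1.10
spark.driver.port          5001
spark.ui.port              5002
spark.blockManager.port    5003
```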

Docker

```shell
docker run \
  -ti \
  --rm \
  -p 5000-5010:5000-5010 \
  -e SPARK_DRIVER_PORT=5001 \
  -e SPARK_UI_PORT=5002 \
  -e SPARK_BLOCKMGR_PORT=5003 \
  -e SPARK_DRIVER_HOST="host.domain" \
  spark-driver
```
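
The `/bin/start.sh` inside the image is not shown here; a minimal sketch of what it might look like, mapping the injected environment variables onto `spark-submit` flags (the defaults and the final `echo` are illustrative only):

```shell
#!/bin/sh
# Hypothetical sketch of /bin/start.sh: translate the environment
# variables injected by `docker run -e ...` into spark-submit flags.
# The defaults below are placeholders, not values from the issue.
SPARK_DRIVER_HOST="${SPARK_DRIVER_HOST:-host.domain}"
SPARK_DRIVER_PORT="${SPARK_DRIVER_PORT:-5001}"
SPARK_UI_PORT="${SPARK_UI_PORT:-5002}"
SPARK_BLOCKMGR_PORT="${SPARK_BLOCKMGR_PORT:-5003}"

# Build the submit command; bindAddress stays 0.0.0.0 so the driver
# listens on all interfaces inside the container.
CMD="spark-submit --deploy-mode client \
--conf spark.driver.bindAddress=0.0.0.0 \
--conf spark.driver.host=${SPARK_DRIVER_HOST} \
--conf spark.driver.port=${SPARK_DRIVER_PORT} \
--conf spark.ui.port=${SPARK_UI_PORT} \
--conf spark.blockManager.port=${SPARK_BLOCKMGR_PORT}"

# Print the command instead of running it, so the sketch works
# without a Spark installation; a real script would exec it with
# the application jar appended.
echo "$CMD"
```

In the Docker example above, `-e SPARK_DRIVER_HOST="host.domain"` overrides the placeholder default at container start.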

Kubernetes

Note: the following approach only applies when the Spark cluster is outside the Kubernetes cluster.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-driver-config
data:
  SPARK_CONF_DIR: "/etc/spark/conf"
  HADOOP_CONF_DIR: "/etc/hadoop/conf"
  HADOOP_USER_NAME: hadoop
  SPARK_DRIVER_PORT: "5001"
  SPARK_UI_PORT: "5002"
  SPARK_BLOCKMGR_PORT: "5003"

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-driver-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-driver
  template:
    metadata:
      labels:
        app: spark-driver
    spec:
      containers:
      - name: spark-driver-container
        image: spark-driver
        imagePullPolicy: Always
        command: ["/bin/start.sh"]
        ports:
          - name: app
            containerPort: 3000
          # hostPort maps the container port directly onto the same
          # port of the node the Pod is scheduled to
          - name: spark-driver
            containerPort: 5001
            hostPort: 5001
          - name: spark-ui
            containerPort: 5002
            hostPort: 5002
          - name: spark-blockmgr
            containerPort: 5003
            hostPort: 5003
        env:
          # Inject the Pod's node (host) IP into the container
          # environment via the Downward API
          - name: SPARK_DRIVER_HOST
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
        envFrom:
          - configMapRef:
              name: spark-driver-config
        # Mount the config volumes at the paths referenced by
        # SPARK_CONF_DIR and HADOOP_CONF_DIR above
        volumeMounts:
          - name: spark-config-volume
            mountPath: /etc/spark/conf
          - name: hadoop-config-volume
            mountPath: /etc/hadoop/conf
      volumes:
        - name: spark-config-volume
          configMap:
            defaultMode: 0744
            name: spark-driver-conf
        - name: hadoop-config-volume
          configMap:
            defaultMode: 0744
            name: hadoop-conf
```
