changing the p9fs options for i/o performance #1095
Description of problem
Tuning the p9fs block size and cache mode gives a significant performance boost (about 10x) in KVM. Is there a way to set these options in cc-runtime?
Example
In KVM, I have found two options that significantly increase IO (see below).
This is on a server with a raidz2 ZFS array of 8x Micron 9100 1.2TB NVMe SSDs. There's plenty of headroom: raw I/O on this array is around 3GB/sec per process, plateauing at about 30GB/sec for 20 processes in iozone3.
With KVM guests using the Plan 9 file system, it looks possible to get about 1GB/sec per CPU, but we're getting only about 130MB/sec with Clear Containers and bind-mounted storage.
Host:
As the filesystem (ZFS) is consistent on the host by design, it's safe to use the passthrough mode:
<filesystem type='mount' accessmode='passthrough'>
<source dir='/export/to/guest'/>
<target dir='mount_tag'/>
</filesystem>
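Since cc-runtime launches QEMU directly rather than through libvirt, a rough sketch of the equivalent QEMU arguments may help frame the request. This assumes the generic QEMU `-fsdev local` / `virtio-9p-pci` options; the helper name `qemu_9p_args` and the `fs0` device id are made up for illustration, and this is not necessarily what cc-runtime emits.

```python
def qemu_9p_args(host_dir, mount_tag, fsdev_id="fs0",
                 security_model="passthrough"):
    """Build QEMU arguments that export host_dir to the guest over virtio-9p.

    Hypothetical helper: fsdev_id is an arbitrary identifier linking the
    -fsdev backend to the -device frontend.
    """
    allowed = {"passthrough", "mapped", "mapped-xattr", "mapped-file", "none"}
    if security_model not in allowed:
        raise ValueError(f"unknown security_model: {security_model}")
    return [
        "-fsdev",
        f"local,id={fsdev_id},path={host_dir},security_model={security_model}",
        "-device",
        f"virtio-9p-pci,fsdev={fsdev_id},mount_tag={mount_tag}",
    ]


# Same directory and tag as the libvirt snippet above.
print(" ".join(qemu_9p_args("/export/to/guest", "mount_tag")))
```

The libvirt `accessmode='passthrough'` attribute corresponds to `security_model=passthrough` here.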
Client
In the mount options, adjusting msize (packet payload in bytes) and disabling the client cache have a huge effect on I/O:
msize=524288,cache=none
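To make the guest side concrete, here is a minimal sketch of how those options would fit into a full 9p mount command. The `trans=virtio` and `version=9p2000.L` options, the mount point `/mnt/guest`, and both helper names are assumptions added for illustration; only `msize=524288,cache=none` comes from my setup above.

```python
def build_9p_mount_options(msize=524288, cache="none",
                           trans="virtio", version="9p2000.L"):
    """Return a comma-separated option string for `mount -t 9p`.

    trans and version are assumed defaults, not values from the issue.
    """
    if msize <= 0:
        raise ValueError("msize must be a positive number of bytes")
    return f"trans={trans},version={version},msize={msize},cache={cache}"


def build_mount_command(mount_tag, mount_point, **opts):
    """Return the argv list for mounting a 9p share inside the guest."""
    return ["mount", "-t", "9p", "-o", build_9p_mount_options(**opts),
            mount_tag, mount_point]


# /mnt/guest is a hypothetical mount point.
print(" ".join(build_mount_command("mount_tag", "/mnt/guest")))
```

Something along these lines, exposed as a configuration option, is what I'm hoping cc-runtime could support.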
Actual result
With my standard KVM clients on the same host, I get about 1GB/sec/process (measured with iozone3). With the cc-runtime backed docker storage (bind mounts) I get only 130MB/sec.
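As a quick sanity check on the size of that gap, the ratio between the two figures can be worked out as below. This assumes 1GB/sec is taken as 1000MB/sec; the function name is just illustrative.

```python
def throughput_ratio(slow_mb_s, fast_mb_s):
    """Return how many times faster fast_mb_s is than slow_mb_s."""
    if slow_mb_s <= 0:
        raise ValueError("slow_mb_s must be positive")
    return fast_mb_s / slow_mb_s


kvm_mb_s = 1000.0  # about 1GB/sec per process (assuming 1 GB = 1000 MB)
cc_mb_s = 130.0    # cc-runtime with bind mounts

print(f"{throughput_ratio(cc_mb_s, kvm_mb_s):.1f}x")
```

So the tuned KVM guest is roughly 7 to 8 times faster per process than the cc-runtime container on the same host.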
Any help in setting these options in the cc-runtime options would be great as they are critical to good performance.
Settings output
[Runtime]
Debug = false
[Runtime.Version]
Semver = "3.0.23"
Commit = "64d2226"
OCI = "1.0.1"
[Runtime.Config]
Path = "/usr/share/defaults/clear-containers/configuration.toml"
[Hypervisor]
MachineType = "pc"
Version = "QEMU emulator version 2.7.1(2.7.1+git.d4a337fe91-11.cc), Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers"
Path = "/usr/bin/qemu-lite-system-x86_64"
Debug = false
BlockDeviceDriver = "virtio-scsi"
[Image]
Path = "/usr/share/clear-containers/cc-20640-agent-6f6e9e.img"
[Kernel]
Path = "/usr/share/clear-containers/vmlinuz-4.14.22-86.container"
Parameters = ""
[Proxy]
Type = "ccProxy"
Version = "Version: 3.0.23+git.3cebe5e"
Path = "/usr/libexec/clear-containers/cc-proxy"
Debug = false
[Shim]
Type = "ccShim"
Version = "shim version: 3.0.23 (commit: 205ecf7)"
Path = "/usr/libexec/clear-containers/cc-shim"
Debug = false
[Agent]
Type = "hyperstart"
Version = "<<unknown>>"
[Host]
Kernel = "4.9.0-6-amd64"
Architecture = "amd64"
VMContainerCapable = true
[Host.Distro]
Name = "Debian GNU/Linux"
Version = "9"
[Host.CPU]
Vendor = "GenuineIntel"
Model = "Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz"
Runtime config files
Runtime default config files
/usr/share/defaults/clear-containers/configuration.toml
/usr/share/defaults/clear-containers/configuration.toml
Runtime config file contents
Output of "cat "/etc/clear-containers/configuration.toml"":
# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "config/configuration.toml.in"
# XXX: Project:
# XXX: Name: Intel® Clear Containers
# XXX: Type: cc
[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
image = "/usr/share/clear-containers/clear-containers.img"
machine_type = "pc"
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""
# Default number of vCPUs per POD/VM:
# unspecified or 0 --> will be set to 1
# < 0 --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores --> will be set to the actual number of physical cores
default_vcpus = 1
# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
# This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0 --> will be set to 1
# > 1 <= 5 --> will be set to the specified number
# > 5 --> will be set to 5
default_bridges = 1
# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons.
# This flag prevents the block device from being passed to the hypervisor,
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false
# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or
# virtio-blk.
block_device_driver = "virtio-scsi"
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
#
# Default false
#enable_debug = true
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"
# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true
[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"
# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true
[agent.cc]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
# - bridged
# Uses a linux bridge to interconnect the container interface to
# the VM. Works for most cases except macvlan and ipvlan.
#
# - macvtap
# Used when the Container network interface can be bridged using
# macvtap.
internetworking_model="bridged"
Output of "cat "/usr/share/defaults/clear-containers/configuration.toml"":
# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "config/configuration.toml.in"
# XXX: Project:
# XXX: Name: Intel® Clear Containers
# XXX: Type: cc
[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
image = "/usr/share/clear-containers/clear-containers.img"
machine_type = "pc"
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""
# Default number of vCPUs per POD/VM:
# unspecified or 0 --> will be set to 1
# < 0 --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores --> will be set to the actual number of physical cores
default_vcpus = 1
# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
# This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0 --> will be set to 1
# > 1 <= 5 --> will be set to the specified number
# > 5 --> will be set to 5
default_bridges = 1
# Default memory size in MiB for POD/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons.
# This flag prevents the block device from being passed to the hypervisor,
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false
# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or
# virtio-blk.
block_device_driver = "virtio-scsi"
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
#
# Default false
#enable_debug = true
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"
# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true
[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"
# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true
[agent.cc]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
# - bridged
# Uses a linux bridge to interconnect the container interface to
# the VM. Works for most cases except macvlan and ipvlan.
#
# - macvtap
# Used when the Container network interface can be bridged using
# macvtap.
internetworking_model="bridged"
Agent
version:
unknown
Logfiles
Runtime logs
/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
No recent runtime problems found in system journal.
Proxy logs
/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
No recent proxy problems found in system journal.
Shim logs
/usr/bin/cc-collect-data.sh: line 242: journalctl: command not found
No recent shim problems found in system journal.
Container manager details
Have docker
Docker
Output of "docker version":
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:11:19 2017
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:09:54 2017
OS/Arch: linux/amd64
Experimental: false
Output of "docker info":
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 4
Server Version: 17.12.0-ce
Storage Driver: zfs
Zpool: error while getting pool information strconv.ParseUint: parsing "": invalid syntax
Zpool Health: not available
Parent Dataset: zpool2/docker
Space Used By Parent: 3466528128
Space Available: 3851731992320
Parent Quota: no
Compression: lz4
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: cc-runtime runc
Default Runtime: cc-runtime
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: 64d2226 (expected: b2567b37d7b75eb4cf325b77297b140ea686ce8f)
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.0-6-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 40
Total Memory: 376.6GiB
Name: <redacted>
ID: <redacted>
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 42
Goroutines: 54
System Time: 2018-04-13T19:42:44.065725616+10:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Output of "systemctl show docker":
/usr/bin/cc-collect-data.sh: line 167: systemctl: command not found
No kubectl
Packages
Have dpkg
Output of "dpkg -l|egrep "(cc-oci-runtime|cc-proxy|cc-runtime|cc-shim|kata-proxy|kata-runtime|kata-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86)"":
ii cc-proxy 3.0.23+git.3cebe5e-27 amd64
ii cc-runtime 3.0.23+git.64d2226-27 amd64
ii cc-runtime-bin 3.0.23+git.64d2226-27 amd64
ii cc-runtime-config 3.0.23+git.64d2226-27 amd64
ii cc-shim 3.0.23+git.205ecf7-27 amd64
ii clear-containers-image 20640-48 amd64 Clear containers image
ii linux-container 4.14.22-86 amd64 linux kernel optimised for container-like workloads.
ii qemu-lite 2.7.1+git.d4a337fe91-11 amd64 linux kernel optimised for container-like workloads.
ii qemu-system-x86 1:2.8+dfsg-6+deb9u3 amd64 QEMU full system emulation binaries (x86)
Have rpm
Output of "rpm -qa|egrep "(cc-oci-runtime|cc-proxy|cc-runtime|cc-shim|kata-proxy|kata-runtime|kata-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86)"":