Metal³
The Metal³ project (pronounced: “Metal Kubed”) provides components for bare metal host management with Kubernetes. You can enrol your bare metal machines, provision operating system images, and then, if you like, deploy Kubernetes clusters to them. From there, operating and upgrading your Kubernetes clusters can be handled by Metal³. Moreover, Metal³ is itself a Kubernetes application, so it runs on Kubernetes, and uses Kubernetes resources and APIs as its interface.
Metal³ is one of the providers for the Kubernetes sub-project Cluster API. Cluster API provides infrastructure-agnostic Kubernetes lifecycle management, and Metal³ brings the bare metal implementation.
This is paired with one of the components from the OpenStack ecosystem, Ironic, for booting and installing machines. Metal³ handles the installation of Ironic as a standalone component (there’s no need to bring along the rest of OpenStack). Ironic is supported by a mature community of hardware vendors and supports a wide range of bare metal management protocols which are continuously tested on a variety of hardware. Backed by Ironic, Metal³ can provision machines, no matter the brand of hardware.
In summary, you can write Kubernetes manifests representing your hardware and your desired Kubernetes cluster layout. Then Metal³ can:
- Discover your hardware inventory
- Configure BIOS and RAID settings on your hosts
- Optionally clean a host’s disks as part of provisioning
- Install and boot an operating system image of your choice
- Deploy Kubernetes
- Upgrade Kubernetes or the operating system in your clusters with a non-disruptive rolling strategy
- Automatically remediate failed nodes by rebooting them and removing them from the cluster if necessary
You can even deploy Metal³ to your clusters so that they can manage other clusters using Metal³…
Metal³ is open-source and welcomes community contributions. The community meets at the following venues:
- #cluster-api-baremetal on Kubernetes Slack
- Metal³ development mailing list
- From the mailing list, you’ll also be able to find the details of a weekly Zoom community call on Wednesdays at 14:00 GMT
About this guide
This user guide aims to explain the Metal³ feature set, and provide how-tos for using Metal³. It’s not a tutorial (for that, see the Getting Started Guide). Nor is it a reference (for that, see the API Reference Documentation and, of course, the code itself).
Baremetal provisioning
This is a guide to provisioning baremetal servers using the Metal³ project. It is a generic guide with a basic implementation; different hardware may require different configuration.
In this guide we will use minikube as the management cluster.
All commands are executed on the host where minikube is set up.
This is a separate machine, e.g. your laptop or one of the servers, that has access to the network where the servers are in order to provision them.
Install requirements on the host
Log in to the host from which you want to provision. The baremetal nodes should be accessible from the host via one of the following protocols.
* IPMI
* Redfish
* WSMAN
* iRMC
* ibmc
* iLO
See Install Ironic for other requirements.
Install the following requirements on the host.
* Python
* Golang
* Docker for Ubuntu and Podman for CentOS
* Ansible
Configure host
- Create network settings. We are creating 2 bridge interfaces: provisioning and external. The provisioning interface is used by Ironic to provision the BareMetalHosts and the external interface allows them to communicate with each other and connect to the internet.

# Create a veth interface peer.
sudo ip link add ironicendpoint type veth peer name ironic-peer

# Create provisioning bridge.
sudo brctl addbr provisioning
sudo ip addr add dev ironicendpoint 172.22.0.1/24
sudo brctl addif provisioning ironic-peer
sudo ip link set ironicendpoint up
sudo ip link set ironic-peer up

# Create the external bridge
sudo brctl addbr external
sudo ip addr add dev external 192.168.111.1/24
sudo ip link set external up

# Add udp forwarding to firewall, this allows to use ipmitool (port 623)
# as well as allowing TFTP traffic outside the host (random port)
sudo iptables -A FORWARD -p udp -j ACCEPT

# Add interface to provisioning bridge
sudo brctl addif provisioning eno1

# Set VLAN interface to be up
sudo ip link set up dev bmext

# Check if bmext interface is added to the bridge
brctl show baremetal | grep bmext

# Add bmext to baremetal bridge
sudo brctl addif baremetal bmext
Prepare image cache
- Start httpd container. This is used to host the OS images that the BareMetalHosts will be provisioned with.
sudo docker run -d --net host --privileged --name httpd-infra -v /opt/metal3-dev-env/ironic:/shared --entrypoint /bin/runhttpd --env
Download the node image and put it in the folder where the httpd container can host it.
wget -O /opt/metal3-dev-env/ironic/html/images https://artifactory.nordix.org/artifactory/metal3/images/k8s_v1.27.1
Convert the qcow2 image to raw format and get the hash of the raw image.

# Change IMAGE_NAME and IMAGE_RAW_NAME according to what you download from artifactory
cd /opt/metal3-dev-env/ironic/html/images
IMAGE_NAME="CENTOS_9_NODE_IMAGE_K8S_v1.27.1.qcow2"
IMAGE_RAW_NAME="CENTOS_9_NODE_IMAGE_K8S_v1.27.1-raw.img"
qemu-img convert -O raw "${IMAGE_NAME}" "${IMAGE_RAW_NAME}"

# Create sha256 hash
sha256sum "${IMAGE_RAW_NAME}" | awk '{print $1}' > "${IMAGE_RAW_NAME}.sha256sum"
Launch management cluster using minikube
- Create a minikube cluster to use as the management cluster.

minikube start

# Configuring ironicendpoint with minikube
minikube ssh sudo brctl addbr ironicendpoint
minikube ssh sudo ip link set ironicendpoint up
minikube ssh sudo brctl addif ironicendpoint eth2
minikube ssh sudo ip addr add 172.22.0.9/24 dev ironicendpoint
- Initialize Cluster API and the Metal3 provider.

kubectl create namespace metal3
clusterctl init --core cluster-api:v1.4.2 --bootstrap kubeadm:v1.4.2 --control-plane kubeadm:v1.4.2 --infrastructure=metal3:v1.4.0 -v5
Install provisioning components
- Launch baremetal operator.

# Clone BMO repo
git clone https://github.com/metal3-io/baremetal-operator.git

# Run deploy.sh
./baremetal-operator/tools/deploy.sh -b -k -t
- Launch Ironic.

# Run deploy.sh
./baremetal-operator/tools/deploy.sh -i -k -t
Create Secrets and BareMetalHosts
Create YAML files for each BareMetalHost that will be used. Below is an example.
---
apiVersion: v1
kind: Secret
metadata:
  name: <<secret_name_bmh1>>
type: Opaque
data:
  username: <<username_bmh1>>
  password: <<password_bmh1>>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: <<id_bmh1>>
spec:
  online: true
  bootMACAddress: <<mac_address_bmh1>>
  bootMode: legacy
  bmc:
    address: <<address_bmh1>> # the address format depends on the protocol (see the list above), which in turn depends on the hardware vendor
    credentialsName: <<secret_name_bmh1>>
    disableCertificateVerification: true
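Values under the data field of a Kubernetes Secret must be base64-encoded, so the username and password placeholders in the example need encoded strings. A minimal sketch of producing them, assuming the hypothetical credentials admin and password and a hypothetical helper name encode_secret_value:

```shell
# Encode a value for use under `data:` in a Kubernetes Secret.
# printf avoids the trailing newline that `echo` would add, and
# tr strips the line wrapping that some base64 implementations insert.
encode_secret_value() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Hypothetical BMC credentials, for illustration only.
encode_secret_value 'admin'; echo      # prints YWRtaW4=
encode_secret_value 'password'; echo   # prints cGFzc3dvcmQ=
```

Alternatively, a Secret accepts plain-text values under stringData instead of data, which avoids the manual encoding step.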
Apply the manifests.
kubectl apply -f ./bmh1.yaml -n metal3
At this point, the BareMetalHosts will go through registering and inspection phases before they become available.
Wait for all of them to be available. You can check their status with kubectl get bmh -n metal3.
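Instead of polling by hand, the wait can be scripted. The following is a sketch only: it assumes kubectl v1.23 or newer (for the --for=jsonpath option), that the state is reported under status.provisioning.state, and the helper name wait_for_bmh_available and its default timeout are made up for illustration.

```shell
# Block until every BareMetalHost in the namespace reports the
# "available" provisioning state, or fail once the timeout expires.
# Assumes kubectl v1.23+ and a kubeconfig pointing at the management cluster.
wait_for_bmh_available() {
    bmh_namespace="${1:-metal3}"
    bmh_timeout="${2:-30m}"
    kubectl wait bmh --all \
        --namespace "$bmh_namespace" \
        --for=jsonpath='{.status.provisioning.state}'=available \
        --timeout="$bmh_timeout"
}
```

For example, wait_for_bmh_available metal3 45m would wait up to 45 minutes; inspection time varies with the hardware, so the timeout is something to tune.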
The next step is to create a workload cluster from these BareMetalHosts.
Create and apply cluster, controlplane and worker template
# The URL of the kernel to deploy.
export DEPLOY_KERNEL_URL="http://172.22.0.1:6180/images/ironic-python-agent.kernel"
# The URL of the ramdisk to deploy.
export DEPLOY_RAMDISK_URL="http://172.22.0.1:6180/images/ironic-python-agent.initramfs"
# The URL of the Ironic endpoint.
export IRONIC_URL="http://172.22.0.1:6385/v1/"
# The URL of the Ironic inspector endpoint.
export IRONIC_INSPECTOR_URL="http://172.22.0.1:5050/v1/"
# Do not use a dedicated CA certificate for Ironic API.
# Any value provided in this variable disables additional CA certificate validation.
# To provide a CA certificate, leave this variable unset.
# If unset, then IRONIC_CA_CERT_B64 must be set.
export IRONIC_NO_CA_CERT=true
# Disables basic authentication for Ironic API.
# Any value provided in this variable disables authentication.
# To enable authentication, leave this variable unset.
# If unset, then IRONIC_USERNAME and IRONIC_PASSWORD must be set.
export IRONIC_NO_BASIC_AUTH=true
# Disables basic authentication for Ironic inspector API.
# Any value provided in this variable disables authentication.
# To enable authentication, leave this variable unset.
# If unset, then IRONIC_INSPECTOR_USERNAME and IRONIC_INSPECTOR_PASSWORD must be set.
export IRONIC_INSPECTOR_NO_BASIC_AUTH=true
# Export node image variable and node image hash variable that we created before.
# Change name according to what was downloaded from artifactory
export IMAGE_RAW_URL=http://172.22.0.1/images/CENTOS_9_NODE_IMAGE_K8S_v1.27.1-raw.img
export IMAGE_RAW_CHECKSUM=http://172.22.0.1/images/CENTOS_9_NODE_IMAGE_K8S_v1.27.1-raw.img.sha256sum
# Generate templates with clusterctl, change control plane and worker count according to
# the number of BareMetalHosts
clusterctl generate cluster capm3-cluster --flavor development \
--kubernetes-version v1.27.0 \
--control-plane-machine-count=3 \
--worker-machine-count=3 \
> capm3-cluster-template.yaml
# Apply the template
kubectl apply -f capm3-cluster-template.yaml
Bare Metal Operator
The Bare Metal Operator (BMO) is a custom Kubernetes controller that deploys baremetal hosts, represented in Kubernetes by BareMetalHost (BMH), as Kubernetes nodes. To this end Ironic is used.
The BMO controller is responsible for the following:
- Inspect the host’s hardware details and report them on the corresponding BareMetalHost. This includes information about CPUs, RAM, disks, NICs, and more.
- Provision hosts with a desired image.
- Clean a host’s disk contents before or after provisioning
The BareMetalHost represents a bare metal host (server). The BareMetalHost contains information about the server as shown below. For brevity, some parts of the output are omitted, but we can classify the fields into the following broad categories.
- known server properties: Fields such as bootMACAddress are properties of the server and are known in advance.
- unknown server properties: Fields such as CPU and disk are properties of the server and are discovered by Ironic.
- user supplied: Fields such as image are supplied by the user to dictate the boot image for the server.
- dynamic fields: Fields such as IP could be dynamically assigned to the server at run time by a DHCP server.
During the life cycle of a bare metal host (an upgrade is one example), some of these fields change with information coming from Ironic or other controllers, while other fields, such as the MAC address, do not change.
BMO can also work with the Cluster API Provider Metal3 (CAPM3) controller. With the involvement of CAPM3 and Ironic, a simplified information flow path and an overview of the BareMetalHost resource are shown below:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
spec:
  bmc:
    address: ipmi://192.168.111.1:6230
    credentialsName: node-0-bmc-secret
  bootMACAddress: 00:5a:91:3f:9a:bd
  image:
    checksum: http://172.22.0.1/images/CENTOS_8_NODE_IMAGE_K8S_v1.22.2-raw.img.md5sum
    url: http://172.22.0.1/images/CENTOS_8_NODE_IMAGE_K8S_v1.22.2-raw.img
  networkData:
    name: test1-workers-tbwnz-networkdata
    namespace: metal3
  online: true
  userData:
    name: test1-workers-vd4gj
    namespace: metal3
status:
  hardware:
    cpu:
      arch: x86_64
      count: 2
    hostname: node-0
    nics:
    - ip: 172.22.0.73
      mac: 00:5a:91:3f:9a:bd
      name: enp1s0
    ramMebibytes: 4096
    storage:
    - hctl: "0:0:0:0"
      name: /dev/sda
      serialNumber: drive-scsi0-0-0-0
      sizeBytes: 53687091200
      type: HDD
It would help to use an example to describe what BMO does. There are two operations of interest, getting hardware details of the server and booting the server with a given image, including user supplied cloud-init data. The BareMetalHost resource contains address and authentication information towards a server.
BMO communicates this information to Ironic and gets hardware details (a.k.a. inspection data), such as CPU and disk, of the server in return. This information is added to the BareMetalHost resource status. In order to get such server related information, the server is booted with a service ramdisk. If there are hardware related changes, the BareMetalHost is updated accordingly.
The following diagrams illustrate the information flow and components involved. From the left, the first two boxes represent Kubernetes custom controllers reconciling the custom resources shown inside. The comments, in yellow, show some relevant fields in these resources.
The rightmost box represents the bare metal server on which the inspection is done, the operating system is installed and the bootstrap script is run. The third box shows Ironic, which synchronizes the information about the bare metal server between the two sides.
Next, with the information coming from the CAPM3 side, the BareMetalHost is updated with image and cloud-init data. That information is also conveyed to Ironic and the server is booted accordingly.
This happens for example when the user scales a MachineDeployment so that the server should be added to the cluster, or during an upgrade when it must change the image it is booting from.
The information flow and operations described above are a bit simplified. CAPM3 provides more data and there are other operations, such as disk cleaning, on the Ironic side as well. However, the overall process remains the same. BMO keeps the server and BareMetalHost resource in sync.
To this end, it takes the server as the source of truth for some fields, such as hardware details. For other fields, such as the boot image, it takes the information from CAPM3 as the source of truth and syncs accordingly.
Install Baremetal Operator
Installing Baremetal Operator (BMO) usually involves three steps:
- Clone the Metal3 BMO repository https://github.com/metal3-io/baremetal-operator.git.
- Adapt the configuration settings to your specific needs.
- Deploy BMO in the cluster with or without Ironic.
Note: This guide assumes that a local clone of the repository is available.
Configuration Settings
Review and edit the file ironic.env found in config/default.
The operator supports several configuration options for controlling its interaction with Ironic.
- DEPLOY_RAMDISK_URL – The URL for the ramdisk of the image containing the Ironic agent.
- DEPLOY_KERNEL_URL – The URL for the kernel to go with the deploy ramdisk.
- DEPLOY_ISO_URL – The URL for the ISO containing the Ironic agent for drivers that support ISO boot. Optional if kernel/ramdisk are set.
- IRONIC_ENDPOINT – The URL for the operator to use when talking to Ironic.
- IRONIC_INSPECTOR_ENDPOINT – The URL for the operator to use when talking to Ironic Inspector.
- IRONIC_CACERT_FILE – The path of the CA certificate file of Ironic, if needed.
- IRONIC_INSECURE – (“True”, “False”) Whether to skip the Ironic certificate validation. It is highly recommended not to set it to True.
- IRONIC_CLIENT_CERT_FILE – The path of the client certificate file of Ironic, if needed. Both the client certificate and the client private key must be defined for client certificate authentication (mTLS) to be enabled.
- IRONIC_CLIENT_PRIVATE_KEY_FILE – The path of the client private key file of Ironic, if needed. Both the client certificate and the client private key must be defined for client certificate authentication (mTLS) to be enabled.
- IRONIC_SKIP_CLIENT_SAN_VERIFY – (“True”, “False”) Whether to skip the Ironic client certificate SAN validation.
- BMO_CONCURRENCY – The number of concurrent reconciles performed by the Operator. Default is the number of CPUs, but no less than 2 and no more than 8.
- PROVISIONING_LIMIT – The desired maximum number of hosts that could be (de)provisioned simultaneously by the Operator. The limit does not apply to hosts that use virtual media for provisioning. The Operator will try to enforce this limit, but overflows could happen in case of slow provisioners and/or a higher number of concurrent reconciles. For such reasons, it is highly recommended to keep the BMO_CONCURRENCY value lower than the requested PROVISIONING_LIMIT. Default is 20.
- IRONIC_EXTERNAL_URL_V6 – This is the URL where Ironic will find the image for nodes that use IPv6. In dual stack environments, this can be used to tell Ironic which IP version it should set on the BMC.
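The BMO_CONCURRENCY default described in the list (number of CPUs, bounded to the range 2 to 8) can be illustrated with a small sketch. This mirrors the stated rule only, it is not the operator's own code, and the function name default_bmo_concurrency is hypothetical.

```shell
# Clamp a CPU count to the documented BMO_CONCURRENCY default range [2, 8].
default_bmo_concurrency() {
    cpu_count="$1"
    if [ "$cpu_count" -lt 2 ]; then
        echo 2
    elif [ "$cpu_count" -gt 8 ]; then
        echo 8
    else
        echo "$cpu_count"
    fi
}

# On a 16-CPU machine the default would be 8, which stays below
# the default PROVISIONING_LIMIT of 20.
default_bmo_concurrency 16   # prints 8
```

Since the upper bound of 8 is below the default PROVISIONING_LIMIT of 20, the recommendation to keep BMO_CONCURRENCY lower than PROVISIONING_LIMIT only needs attention when either value is overridden.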
Kustomization Configuration
It is possible to deploy baremetal-operator with three different operator configurations, namely:
- operator with ironic
- operator without ironic
- ironic without operator
A detailed overview of the configuration is presented in the following sections.
Notes on external Ironic
When an external Ironic is used, the following requirements must be met:
- Either HTTP basic or no-auth authentication must be used (Keystone is not supported).
- API version 1.74 (Xena release cycle) or newer must be available.
Authenticating to Ironic
Because hosts under the control of Metal³ need to contact the Ironic and Ironic Inspector APIs during inspection and provisioning, it is highly advisable to require authentication on those APIs, since the provisioned hosts running user workloads will remain connected to the provisioning network.
Configuration
The baremetal-operator supports connecting to Ironic and Ironic Inspector configured with the following auth_strategy modes:
- noauth (no authentication)
- http_basic (HTTP Basic access authentication)
Note that Keystone authentication methods are not yet supported.
Authentication configuration is read from the filesystem, beginning at the root directory specified in the environment variable METAL3_AUTH_ROOT_DIR. If this variable is empty or not specified, the default is /opt/metal3/auth.
Within the root directory there are separate subdirectories, ironic for Ironic client configuration, and ironic-inspector for Ironic Inspector client configuration. (This allows the data to be populated from separate secrets when deploying in Kubernetes.)
noauth
This is the default, and will be chosen if the auth root directory does not exist. In this mode, the baremetal-operator does not attempt to do any authentication against the Ironic APIs.
http_basic
This mode is configured by files in each authentication subdirectory named username and password, containing the Basic auth username and password, respectively.
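For a local (non-Kubernetes) setup, the file layout described above could be created along these lines. This is a sketch under assumptions: the helper name write_basic_auth is hypothetical, and writing the values without a trailing newline and with restrictive permissions is a cautious choice rather than a documented requirement.

```shell
# Write HTTP basic auth credentials in the layout described above:
#   <root>/ironic/{username,password}
#   <root>/ironic-inspector/{username,password}
# Arguments: service ("ironic" or "ironic-inspector"), username, password.
write_basic_auth() {
    auth_root="${METAL3_AUTH_ROOT_DIR:-/opt/metal3/auth}"
    auth_service="$1"
    mkdir -p "$auth_root/$auth_service"
    printf '%s' "$2" > "$auth_root/$auth_service/username"
    printf '%s' "$3" > "$auth_root/$auth_service/password"
    # Keep the credential files readable by the owner only.
    chmod 600 "$auth_root/$auth_service/username" "$auth_root/$auth_service/password"
}
```

A call such as write_basic_auth ironic-inspector "$IRONIC_INSPECTOR_USERNAME" "$IRONIC_INSPECTOR_PASSWORD" would populate the Inspector subdirectory. When deploying in Kubernetes these files are normally populated from secrets instead.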
Running Bare Metal Operator with or without Ironic
This section explains the scenarios for deploying Bare Metal Operator (BMO) with or without Ironic, as well as deploying only Ironic.
These are the deployment use cases addressed:
- Deploying baremetal-operator with Ironic.
- Deploying baremetal-operator without Ironic.
- Deploying only Ironic.
Current structure of baremetal-operator config directory
tree config/
config/
├── basic-auth
│   ├── default
│   │   ├── credentials_patch.yaml
│   │   └── kustomization.yaml
│   └── tls
│       ├── credentials_patch.yaml
│       └── kustomization.yaml
├── certmanager
│   ├── certificate.yaml
│   ├── kustomization.yaml
│   └── kustomizeconfig.yaml
├── crd
│   ├── bases
│   │   ├── metal3.io_baremetalhosts.yaml
│   │   ├── metal3.io_firmwareschemas.yaml
│   │   └── metal3.io_hostfirmwaresettings.yaml
│   ├── kustomization.yaml
│   ├── kustomizeconfig.yaml
│   └── patches
│       ├── cainjection_in_baremetalhosts.yaml
│       ├── cainjection_in_firmwareschemas.yaml
│       ├── cainjection_in_hostfirmwaresettings.yaml
│       ├── webhook_in_baremetalhosts.yaml
│       ├── webhook_in_firmwareschemas.yaml
│       └── webhook_in_hostfirmwaresettings.yaml
├── default
│   ├── ironic.env
│   ├── kustomization.yaml
│   ├── manager_auth_proxy_patch.yaml
│   ├── manager_webhook_patch.yaml
│   └── webhookcainjection_patch.yaml
├── kustomization.yaml
├── manager
│   ├── kustomization.yaml
│   └── manager.yaml
├── namespace
│   ├── kustomization.yaml
│   └── namespace.yaml
├── OWNERS
├── prometheus
│   ├── kustomization.yaml
│   └── monitor.yaml
├── rbac
│   ├── auth_proxy_client_clusterrole.yaml
│   ├── auth_proxy_role_binding.yaml
│   ├── auth_proxy_role.yaml
│   ├── auth_proxy_service.yaml
│   ├── baremetalhost_editor_role.yaml
│   ├── baremetalhost_viewer_role.yaml
│   ├── firmwareschema_editor_role.yaml
│   ├── firmwareschema_viewer_role.yaml
│   ├── hostfirmwaresettings_editor_role.yaml
│   ├── hostfirmwaresettings_viewer_role.yaml
│   ├── kustomization.yaml
│   ├── leader_election_role_binding.yaml
│   ├── leader_election_role.yaml
│   ├── role_binding.yaml
│   └── role.yaml
├── render
│   └── capm3.yaml
├── samples
│   ├── metal3.io_v1alpha1_baremetalhost.yaml
│   ├── metal3.io_v1alpha1_firmwareschema.yaml
│   └── metal3.io_v1alpha1_hostfirmwaresettings.yaml
├── tls
│   ├── kustomization.yaml
│   └── tls_ca_patch.yaml
└── webhook
    ├── kustomization.yaml
    ├── kustomizeconfig.yaml
    ├── manifests.yaml
    └── service_patch.yaml
The config directory has one top-level folder for deployment, namely default, which deploys only baremetal-operator through a kustomization file that calls the manager folder. In addition, the basic-auth, certmanager, crd, namespace, prometheus, rbac, tls and webhook folders have their own kustomization and YAML files. The samples folder includes YAML representations of sample CRDs.
Current structure of ironic-deployment directory
tree ironic-deployment/
ironic-deployment/
├── base
│   ├── ironic.yaml
│   └── kustomization.yaml
├── components
│   ├── basic-auth
│   │   ├── auth.yaml
│   │   ├── ironic-auth-config
│   │   ├── ironic-auth-config-tpl
│   │   ├── ironic-htpasswd
│   │   ├── ironic-inspector-auth-config
│   │   ├── ironic-inspector-auth-config-tpl
│   │   ├── ironic-inspector-htpasswd
│   │   └── kustomization.yaml
│   ├── keepalived
│   │   ├── ironic_bmo_configmap.env
│   │   ├── keepalived_patch.yaml
│   │   └── kustomization.yaml
│   └── tls
│       ├── certificate.yaml
│       ├── kustomization.yaml
│       ├── kustomizeconfig.yaml
│       └── tls.yaml
├── default
│   ├── ironic_bmo_configmap.env
│   └── kustomization.yaml
├── overlays
│   ├── basic-auth_tls
│   │   ├── basic-auth_tls.yaml
│   │   └── kustomization.yaml
│   └── basic-auth_tls_keepalived
│       └── kustomization.yaml
├── OWNERS
└── README.md
The ironic-deployment folder contains kustomizations for deploying Ironic.
It makes use of kustomize components for basic auth, TLS and keepalived configurations.
This makes it easy to combine the configurations, for example basic auth + TLS.
There are some ready-made overlays in the overlays folder that show how this can be done.
For more information, check the README in the ironic-deployment folder.
Deployment commands
There is a useful deployment script that configures and deploys BareMetal Operator and Ironic. It requires the following variables:
- IRONIC_HOST : domain name for Ironic and inspector
- IRONIC_HOST_IP : IP on which Ironic and inspector are listening
In addition you can configure the following variables. They are optional. If you leave them unset, then passwords and certificates will be generated for you.
- KUBECTL_ARGS : Additional arguments to kubectl apply
- IRONIC_USERNAME : username for ironic
- IRONIC_PASSWORD : password for ironic
- IRONIC_INSPECTOR_USERNAME : username for inspector
- IRONIC_INSPECTOR_PASSWORD : password for inspector
- IRONIC_CACERT_FILE : CA certificate path for ironic
- IRONIC_CAKEY_FILE : CA certificate key path, unneeded if ironic certificates exist
- IRONIC_CERT_FILE : Ironic certificate path
- IRONIC_KEY_FILE : Ironic certificate key path
- IRONIC_INSPECTOR_CERT_FILE : Inspector certificate path
- IRONIC_INSPECTOR_KEY_FILE : Inspector certificate key path
- IRONIC_INSPECTOR_CACERT_FILE : CA certificate path for inspector, defaults to IRONIC_CACERT_FILE
- IRONIC_INSPECTOR_CAKEY_FILE : CA certificate key path, unneeded if inspector certificates exist
- MARIADB_KEY_FILE: Path to the key of MariaDB
- MARIADB_CERT_FILE: Path to the cert of MariaDB
- MARIADB_CAKEY_FILE: Path to the CA key of MariaDB
- MARIADB_CACERT_FILE: Path to the CA certificate of MariaDB
Then run:
./tools/deploy.sh [-b -i -t -n -k]
- -b : deploy BMO
- -i : deploy Ironic
- -t : deploy with TLS enabled
- -n : deploy without authentication
- -k : deploy with keepalived
This will deploy BMO and / or Ironic with the proper configuration.
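Because the script requires IRONIC_HOST and IRONIC_HOST_IP, it can help to check them before invoking it. A minimal sketch, where the helper name check_deploy_env is hypothetical and not part of the repository:

```shell
# Verify that the variables required by the deployment script are set.
# Returns 0 when both are present, 1 otherwise.
check_deploy_env() {
    env_missing=0
    if [ -z "${IRONIC_HOST:-}" ]; then
        echo "IRONIC_HOST is not set" >&2
        env_missing=1
    fi
    if [ -z "${IRONIC_HOST_IP:-}" ]; then
        echo "IRONIC_HOST_IP is not set" >&2
        env_missing=1
    fi
    return "$env_missing"
}
```

With the variables exported (the values are site-specific), a deployment of BMO and Ironic with TLS could then be started as check_deploy_env && ./tools/deploy.sh -b -i -t from the repository root.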
Useful tips
Each configuration is useful in a different situation. For example:
- Only BMO is deployed when Ironic is already running, e.g. as part of Cluster API Provider Metal3 (CAPM3), once pivoting has succeeded and Ironic has been deployed.
- BMO and Ironic are deployed together when CAPM3 is not used and the baremetal-operator and Ironic containers are to be deployed together.
- Only Ironic is deployed when BMO is deployed as part of CAPM3 and only the Ironic setup is needed, e.g. clusterctl provided by Cluster API (CAPI) deploys BMO, so that it can take care of moving the BareMetalHost during pivoting.
Important note: when the baremetal-operator is deployed through metal3-dev-env, the baremetal-operator container inherits the following environment variables through a configmap:
$PROVISIONING_IP
$PROVISIONING_INTERFACE
In case you are deploying baremetal-operator locally, make sure to populate and export these environment variables before deploying.
Automated Cleaning
One of the Ironic features exposed to the Metal3 Baremetal Operator is automated node cleaning. When enabled, automated cleaning kicks off when a node is provisioned for the first time and every time it is deprovisioned.
There are two automated cleaning modes available, which can be set via the automatedCleaningMode field of a BareMetalHost spec.
- metadata to enable the disk cleaning
- disabled to disable the disk cleaning
We named the enabling mode metadata instead of simply enabled because we expect that in the future we will expand the feature to allow selecting certain disks (specified via metadata) of a node to be cleaned, which is currently out of scope.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example-node
spec:
  automatedCleaningMode: metadata
  online: true
  bootMACAddress: 00:8a:b6:8e:ac:b8
  bootMode: legacy
  bmc:
    address: ipmi://192.168.111.1:6230
    credentialsName: example-node-bmc-secret
For a node with the disabled value, no cleaning will be performed during deprovisioning. Note that this might introduce security vulnerabilities if there is sensitive data which must be wiped from the disk when the host is being recycled.
If automatedCleaningMode is not set by the user, it will be set to the default mode metadata. To know more about the cleaning steps that Ironic performs on the node, see the cleaning steps.
If you are using Cluster-api-provider-metal3 on top of Baremetal Operator, then please see this.
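As an illustration, the mode could be switched on an existing host with a merge patch. This is a sketch that assumes the field may be changed on a live BareMetalHost this way; the helper name set_cleaning_mode is hypothetical.

```shell
# Set automatedCleaningMode on an existing BareMetalHost.
# Arguments: host name, mode ("metadata" or "disabled"), optional namespace.
set_cleaning_mode() {
    clean_host="$1"
    clean_mode="$2"
    clean_namespace="${3:-metal3}"
    # Reject anything other than the two documented modes.
    case "$clean_mode" in
        metadata|disabled) ;;
        *) echo "unsupported mode: $clean_mode" >&2; return 1 ;;
    esac
    kubectl patch bmh "$clean_host" --namespace "$clean_namespace" --type merge \
        --patch "{\"spec\":{\"automatedCleaningMode\":\"$clean_mode\"}}"
}
```

For instance, set_cleaning_mode example-node disabled would turn cleaning off for the host from the example above, with the security caveat already mentioned.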
Automatic secure boot
The automatic secure boot feature allows enabling and disabling UEFI (Unified Extensible Firmware Interface) secure boot when provisioning a host. This feature requires supported hardware and a compatible OS image. The drivers that currently support enabling UEFI secure boot are iLO, iRMC and Redfish.
Why do we need it
Automatic secure boot is needed when provisioning a host with high security requirements. Based on checksums and signatures, secure boot protects the host from loading malicious code in the boot process before loading the provisioned operating system.
How to use it
To enable automatic secure boot, first check that the hardware is supported and then specify the value UEFISecureBoot for bootMode in the BareMetalHost custom resource. Please note, it is enabled before booting into the deployed instance and disabled when the ramdisk is running and on tear down. Below you can check the example:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-1
spec:
  online: true
  bootMACAddress: 00:5c:52:31:3a:9c
  bootMode: UEFISecureBoot
  ...
This will enable UEFI secure boot before booting the instance and disable it when the host is deprovisioned. Note that the default value for bootMode is UEFI.
Live ISO
The live-iso API in Metal3 allows booting a BareMetalHost with a live ISO image instead of writing an image to the local disk using the IPA deploy ramdisk.
Why do we need it?
In some circumstances, e.g. to reduce boot time for ephemeral workloads, it may be possible to boot an ISO and not deploy any image to disk (saving the time to write the image and reboot). This API is also useful for integration with third-party installers distributed as a CD image; for example, leveraging existing toolchains like the fedora-coreos installer might be desirable.
How to use it?
Here is an example with a BareMetalHost CRD, where the ISO referenced by the url, with live-iso set as the image format (DiskFormat), will be live-booted without deploying an image to disk. Additionally, live ISO mode is supported with any virtualmedia driver when used as a BMC driver. Also, checksum options are not required in this case, and will be ignored if specified:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: live-iso-booted-node
spec:
  image:
    url: http://1.2.3.4/image.iso
    format: live-iso
  online: true
Note: rootDeviceHints, networkData and userData will not be used since the image is not written to disk.
For more details, please see the design proposal.
Detached annotation
The detached annotation provides a way to prevent management of a BareMetalHost.
It works by deleting the host information from Ironic without triggering deprovisioning.
The BareMetal Operator will recreate the host in Ironic again once the annotation is removed.
This annotation can be used with BareMetalHosts in Provisioned, ExternallyProvisioned or Available states.
Normally, deleting a BareMetalHost will always trigger deprovisioning. This can be problematic and unnecessary if we just want to, for example, move the BareMetalHost from one cluster to another. By applying the annotation before removing the BareMetalHost from the old cluster, we can ensure that the host is not disrupted by this (normally it would be deprovisioned). The next step is then to recreate it in the new cluster without triggering a new inspection. See the status annotation page for how to do this.
The annotation key is baremetalhost.metal3.io/detached and the value can be anything (it is ignored).
Here is an example:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example
  annotations:
    baremetalhost.metal3.io/detached: ""
spec:
  online: true
  bootMACAddress: 00:8a:b6:8e:ac:b8
  bootMode: legacy
  bmc:
    address: ipmi://192.168.111.1:6230
    credentialsName: example-bmc-secret
...
Why is this annotation needed?
- It provides a way to move BareMetalHosts between clusters (essentially deleting them in the old cluster and recreating them in the new) without going through deprovisioning, inspection and provisioning.
- It allows deleting the BareMetalHost object without triggering deprovisioning. This can be used to hand over management of the host to a different system without disruption.
For more details, please see the design proposal.
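On an existing BareMetalHost, the annotation can be set and removed with kubectl annotate rather than by editing the manifest. A minimal sketch, with hypothetical helper names detach_bmh and reattach_bmh and the metal3 namespace assumed as the default:

```shell
# Add the detached annotation; the value is ignored, so it is left empty.
detach_bmh() {
    kubectl annotate bmh "$1" --namespace "${2:-metal3}" \
        baremetalhost.metal3.io/detached=""
}

# Remove the annotation again (a trailing dash deletes an annotation),
# after which the operator recreates the host in Ironic.
reattach_bmh() {
    kubectl annotate bmh "$1" --namespace "${2:-metal3}" \
        baremetalhost.metal3.io/detached-
}
```

For example, detach_bmh example followed later by reattach_bmh example would take the host from the example above out of management and bring it back.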
Status annotation
The status annotation is useful when you need to avoid inspection of a BareMetalHost. This can happen if the status is already known, for example, when moving the BareMetalHost from one cluster to another. When this annotation is set, the BareMetal Operator will take the status of the BareMetalHost directly from the annotation.
The annotation key is baremetalhost.metal3.io/status and the value is a JSON representation of the BareMetalHost's status field.
One simple way of extracting the status and turning it into an annotation is using kubectl like this:
# Save the status in json format to a file
kubectl -n metal3 get bmh <name-of-bmh> -o jsonpath="{.status}" > status.json
# Save the BMH and apply the status annotation to the saved BMH.
kubectl -n metal3 annotate bmh <name-of-bmh> \
baremetalhost.metal3.io/status="$(cat status.json)" \
--dry-run=client -o yaml > bmh.yaml
Note that the above example does not apply the annotation to the BareMetalHost directly, since applying it to one that already has a status is most likely not useful.
Instead it saves the BareMetalHost with the annotation applied to a file bmh.yaml.
This file can then be applied in another cluster.
The status would be discarded at this point since the user is usually not allowed to set it, but the annotation is still there and would be used by the BareMetal Operator to set status again.
Once this is done, the operator will remove the status annotation.
In this situation you may also want to check the detached annotation for how to remove the BareMetalHost from the old cluster without going through deprovisioning.
Here is an example of a BareMetalHost, first without the annotation, but with status and spec, and then the other way around. This shows how the status field is turned into the annotation value.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
spec:
  automatedCleaningMode: metadata
  bmc:
    address: redfish+http://192.168.111.1:8000/redfish/v1/Systems/febc9f61-4b7e-411a-ada9-8c722edcee3e
    credentialsName: node-0-bmc-secret
  bootMACAddress: 00:80:1f:e6:f1:8f
  bootMode: legacy
  online: true
status:
  errorCount: 0
  errorMessage: ""
  goodCredentials:
    credentials:
      name: node-0-bmc-secret
      namespace: metal3
    credentialsVersion: "1775"
  hardwareProfile: ""
  lastUpdated: "2022-05-31T06:33:05Z"
  operationHistory:
    deprovision:
      end: null
      start: null
    inspect:
      end: null
      start: "2022-05-31T06:33:05Z"
    provision:
      end: null
      start: null
    register:
      end: "2022-05-31T06:33:05Z"
      start: "2022-05-31T06:32:54Z"
  operationalStatus: OK
  poweredOn: false
  provisioning:
    ID: 8d566f5b-a28f-451b-a70f-419507c480cd
    bootMode: legacy
    image:
      url: ""
    state: inspecting
  triedCredentials:
    credentials:
      name: node-0-bmc-secret
      namespace: metal3
    credentialsVersion: "1775"

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
  annotations:
    baremetalhost.metal3.io/status: |
      {"errorCount":0,"errorMessage":"","goodCredentials":{"credentials":{"name":"node-0-bmc-secret","namespace":"metal3"},"credentialsVersion":"1775"},"hardwareProfile":"","lastUpdated":"2022-05-31T06:33:05Z","operationHistory":{"deprovision":{"end":null,"start":null},"inspect":{"end":null,"start":"2022-05-31T06:33:05Z"},"provision":{"end":null,"start":null},"register":{"end":"2022-05-31T06:33:05Z","start":"2022-05-31T06:32:54Z"}},"operationalStatus":"OK","poweredOn":false,"provisioning":{"ID":"8d566f5b-a28f-451b-a70f-419507c480cd","bootMode":"legacy","image":{"url":""},"state":"inspecting"},"triedCredentials":{"credentials":{"name":"node-0-bmc-secret","namespace":"metal3"},"credentialsVersion":"1775"}}
spec:
  ...
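The conversion shown above can also be scripted. The following is a minimal Python sketch, assuming the BareMetalHost has already been loaded into a plain dictionary (for example from the output of kubectl with -o json). The function name with_status_annotation and the shortened sample status are illustrative and not part of Metal3.

```python
import copy
import json

STATUS_ANNOTATION = "baremetalhost.metal3.io/status"


def with_status_annotation(bmh: dict) -> dict:
    """Return a copy of a BareMetalHost dict with status moved into the annotation."""
    result = copy.deepcopy(bmh)
    status = result.pop("status", None)
    if status is None:
        raise ValueError("BareMetalHost has no status to convert")
    annotations = result.setdefault("metadata", {}).setdefault("annotations", {})
    # Compact separators produce single-line JSON like the example annotation value.
    annotations[STATUS_ANNOTATION] = json.dumps(status, separators=(",", ":"))
    return result


# Shortened, hypothetical status for illustration only.
bmh = {
    "apiVersion": "metal3.io/v1alpha1",
    "kind": "BareMetalHost",
    "metadata": {"name": "node-0", "namespace": "metal3"},
    "spec": {"online": True},
    "status": {"errorCount": 0, "operationalStatus": "OK", "poweredOn": False},
}
converted = with_status_annotation(bmh)
print(converted["metadata"]["annotations"][STATUS_ANNOTATION])
```

The input dictionary is left untouched, so the original object can still be inspected or saved separately.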
External inspection
Similar to the status annotation, external inspection makes it possible to skip the inspection step.
The difference is that the status annotation can only be used on the very first reconcile and allows setting all the fields under status.
In contrast, external inspection limits the changes so that only HardwareDetails can be modified, and it can be used at any time when inspection is disabled (with the inspect.metal3.io: disabled annotation) or when there is no existing HardwareDetails data.
External inspection is controlled through an annotation on the BareMetalHost.
The annotation key is inspect.metal3.io/hardwaredetails and the value is a JSON representation of the BareMetalHost's status.hardware field.
Here is an example with a BMH that has inspection disabled and is using the external inspection feature to add the HardwareDetails.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
  annotations:
    inspect.metal3.io: disabled
    inspect.metal3.io/hardwaredetails: |
      {"systemVendor":{"manufacturer":"QEMU", "productName":"Standard PC (Q35 + ICH9, 2009)","serialNumber":""}, "firmware":{"bios":{"date":"","vendor":"","version":""}},"ramMebibytes":4096, "nics":[{"name":"eth0","model":"0x1af4 0x0001","mac":"00:b7:8b:bb:3d:f6", "ip":"172.22.0.64","speedGbps":0,"vlanId":0,"pxe":true}], "storage":[{"name":"/dev/sda","rotational":true,"sizeBytes":53687091200, "vendor":"QEMU", "model":"QEMU HARDDISK","serialNumber":"drive-scsi0-0-0-0", "hctl":"6:0:0:0"}],"cpu":{"arch":"x86_64", "model":"Intel Xeon E3-12xx v2 (IvyBridge)","clockMegahertz":2494.224, "flags":["foo"],"count":4},"hostname":"hwdAnnotation-0"}
spec:
  ...
Why is this needed?
- It allows avoiding an extra reboot for live-images that include their own inspection tooling.
- It provides an arguably safer alternative to the status annotation in some cases.
Caveats:
- If both baremetalhost.metal3.io/status and inspect.metal3.io/hardwaredetails are specified on BareMetalHost creation, inspect.metal3.io/hardwaredetails will take precedence and overwrite any hardware data specified via baremetalhost.metal3.io/status.
- If the BareMetalHost is in the Available state the controller will not attempt to match profiles based on the annotation.
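The precedence rule in the first caveat can be illustrated with a small Python model. This is only a sketch of the rule as stated above, not code from the Bare Metal Operator; the function name effective_hardware and the sample values are assumptions made for the example.

```python
import json

STATUS_ANNOTATION = "baremetalhost.metal3.io/status"
HARDWARE_ANNOTATION = "inspect.metal3.io/hardwaredetails"


def effective_hardware(annotations: dict):
    """Return the hardware details that win when a BareMetalHost is created."""
    if HARDWARE_ANNOTATION in annotations:
        # The external inspection annotation takes precedence.
        return json.loads(annotations[HARDWARE_ANNOTATION])
    if STATUS_ANNOTATION in annotations:
        # Otherwise fall back to the hardware field inside the status annotation.
        return json.loads(annotations[STATUS_ANNOTATION]).get("hardware")
    return None


# Hypothetical annotation values for illustration only.
annotations = {
    STATUS_ANNOTATION: json.dumps({"hardware": {"hostname": "from-status"}}),
    HARDWARE_ANNOTATION: json.dumps({"hostname": "from-hardwaredetails"}),
}
print(effective_hardware(annotations)["hostname"])
```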
Inspect annotation
The inspect annotation can be used to request the Bare Metal Operator to (re-)inspect an Available BareMetalHost.
This is useful, for example, after hardware changes.
Note that it is only possible to do this when the BareMetalHost is in the Available state.
If an inspection request is made while the BareMetalHost is in any state other than Available, the request will be ignored.
To request a new inspection, simply annotating the host with inspect.metal3.io is enough.
Once inspection is requested, you should see the BMH in inspecting state until inspection is completed, and by the end of inspection the inspect.metal3.io annotation will be removed automatically.
Here is an example:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example
  annotations:
    # The inspect annotation with no value
    inspect.metal3.io: ""
spec:
  online: true
  bootMACAddress: 00:8a:b6:8e:ac:b8
  bootMode: legacy
  bmc:
    address: ipmi://192.168.111.1:6230
    credentialsName: example-bmc-secret
...
Why is this needed?
- For re-inspecting BareMetalHosts after hardware changes.
Caveats:
- It is only possible to inspect a BareMetalHost when it is in Available state.
Note: For other use cases, like disabling inspection or providing externally gathered inspection data, see external inspection.
Reboot annotation
The reboot annotation can be used for rebooting BareMetalHosts in the provisioned state.
The annotation key takes either of the following forms:
- reboot.metal3.io
- reboot.metal3.io/{key}
In its basic form (reboot.metal3.io), the annotation will trigger a reboot of the BareMetalHost.
The controller will remove the annotation as soon as it has restored power to the host.
The advanced form (reboot.metal3.io/{key}) includes a unique suffix (indicated with {key}).
In this form the host will be kept in PoweredOff state until the annotation has been removed.
This can be useful if some tasks need to be performed while the host is in a known stable state.
The purpose of the {key} is to allow multiple clients to use the API simultaneously in a safe way.
Each client chooses a key and touches only the annotations that have this key to avoid interfering with other clients.
If there are multiple annotations, the controller will wait for all of them to be removed (by the clients) before powering on the host.
Similarly, if both forms of annotations are used, the reboot.metal3.io/{key} form will take precedence.
This ensures that the host stays powered off until all clients are ready (i.e. all annotations are removed).
The annotation value should be a JSON map containing the key mode and a value hard or soft to indicate if a hard or soft reboot should be performed.
It is not necessary to specify the annotation value.
In case it is omitted, the default is to first try a soft reboot, and if that fails, do a hard reboot.
The exact behavior of hard and soft reboot depends on the Ironic configuration.
Please see the Ironic configuration reference for more details on this, e.g. the soft_power_off_timeout variable is relevant.
Here are a few examples of the reboot annotation:
- reboot.metal3.io - immediate reboot via soft shutdown first, followed by a hard shutdown if the soft shutdown fails.
- reboot.metal3.io: '{"mode":"hard"}' - immediate reboot via hard shutdown, potentially allowing for high-availability use-cases.
- reboot.metal3.io/{key} - phased reboot, issued and managed by the client registered with the key, via soft shutdown first, followed by a hard reboot if the soft reboot fails.
- reboot.metal3.io/{key}: '{"mode":"hard"}' - phased reboot, issued and managed by the client registered with the key, via hard shutdown.
And here is a “full” example showing a BareMetalHost with the annotation applied:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example
  annotations:
    # The basic form with no value
    reboot.metal3.io: ""
    # Advanced form with value (the value must be valid JSON)
    reboot.metal3.io/my-unique-key: '{"mode":"soft"}'
spec:
  online: true
  bootMACAddress: 00:8a:b6:8e:ac:b8
  bootMode: legacy
  bmc:
    address: ipmi://192.168.111.1:6230
    credentialsName: example-bmc-secret
...
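The rules described above for keyed annotations and the mode value can be summarised in a small Python model. This is a sketch of the documented behaviour only, not the controller's implementation. It assumes the annotation value is valid JSON with double-quoted strings, and raising an error on an unknown mode is a choice made for this example. The helper names are hypothetical.

```python
import json

REBOOT_PREFIX = "reboot.metal3.io"


def reboot_mode(value: str) -> str:
    """Return 'hard' or 'soft', or 'default' when no value is given."""
    if not value:
        # Per the text above: try a soft reboot first, then a hard one if it fails.
        return "default"
    mode = json.loads(value).get("mode")
    if mode not in ("hard", "soft"):
        raise ValueError(f"unsupported reboot mode: {mode!r}")
    return mode


def keyed_annotations(annotations: dict) -> list:
    """List the advanced-form annotations (reboot.metal3.io/{key})."""
    return sorted(k for k in annotations if k.startswith(REBOOT_PREFIX + "/"))


def must_stay_powered_off(annotations: dict) -> bool:
    """The host stays off until every keyed annotation has been removed."""
    return bool(keyed_annotations(annotations))


annotations = {
    "reboot.metal3.io": "",
    "reboot.metal3.io/client-a": '{"mode":"hard"}',
    "reboot.metal3.io/client-b": "",
}
print(must_stay_powered_off(annotations))  # True while any keyed annotation remains
for key in keyed_annotations(annotations):
    del annotations[key]  # each client removes only its own annotation
print(must_stay_powered_off(annotations))  # False once all keyed annotations are gone
```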
Why is this needed?
- It enables controllers and users to perform reboots.
- It provides a way to remediate failed hosts. (“Have you tried turning it off and on again?”)
- It provides a stable state (powered off) where certain tasks can be performed without risk of interference from the powered off machine.
Caveats:
- Clients using this API must respect each other and clean up after themselves. Otherwise they will step on each other's toes, for example by leaving an annotation indefinitely or by removing someone else's annotation before they were ready.
For more details please check the reboot interface proposal.
Ironic
Ironic is an open-source service for automating provisioning and lifecycle management of bare metal machines. Born as the Bare Metal service of the OpenStack cloud software suite, it has evolved into a semi-autonomous project. It can now be deployed independently as a standalone service, for example using Bifrost, and it integrates with other tools and projects, as in the case of Metal3.
Ironic nowadays supports the two main standard hardware management interfaces, Redfish and IPMI, and thanks to its large community of contributors, it can provide native support for many different bare-metal hardware vendors, such as Dell, Fujitsu, HPE, and Supermicro.
Why Ironic in Metal3
- Ironic is open source! This aligns perfectly with the philosophy behind Metal3.
- Ironic has a vendor agnostic interface provided by a robust set of RESTful APIs.
- Ironic has a vibrant and diverse community, including small and large operators, hardware and software vendors.
- Ironic provides features covering the whole hardware life-cycle: from registration of bare metal machines and retrieval of hardware specifications for newly discovered machines, through configuration and provisioning with custom operating system images, up to machine reset and cleaning for re-provisioning or end-of-life retirement.
How Metal3 uses Ironic
The Metal3 project adopted Ironic as the back-end that manages bare-metal hosts behind a native Kubernetes API.
Bare Metal Operator is the main component that interfaces with the Ironic API for all operations needed to provision bare-metal hosts, such as hardware capabilities inspection, operating system installation, and re-initialization when restoring a bare-metal machine to its original state.
Install Ironic
Metal3 runs Ironic as a set of containers. Those containers can be deployed either in-cluster or out-of-cluster. In both scenarios, several containers must run in order to provision baremetal nodes:
- ironic (the main provisioning service)
- ironic-inspector (the auxiliary inspection service)
- ipa-downloader (init container to download and cache the deployment ramdisk image)
- httpd (HTTP server that serves cached images and iPXE configuration)
A few other containers are optional:
- ironic-endpoint-keepalived (to maintain a persistent IP address on the provisioning network)
- dnsmasq (to support DHCP on the provisioning network and to implement network boot via iPXE)
- ironic-log-watch (to provide access to the deployment ramdisk logs)
- mariadb (the provisioning service database; SQLite can be used as a lightweight alternative)
Prerequisites
Networking
A separate provisioning network is required when network boot is used.
The following ports must be accessible by the hosts being provisioned:
- TCP 6385 (Ironic API)
- TCP 5050 (Inspector API)
- TCP 80 (HTTP server; can be changed via the HTTP_PORT environment variable)
- UDP 67/68/546/547 (DHCP and DHCPv6; when network boot is used)
- UDP 69 (TFTP; when network boot is used)
The main Ironic service must be able to access the hosts’ BMC addresses.
When virtual media is used, the hosts' BMCs must be able to access HTTP_PORT.
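Before provisioning, it can help to confirm that the TCP ports listed above are reachable from the provisioning network. The following Python sketch uses only the standard library. The helper names are made up for this example, the port numbers are taken from the list above, and the UDP ports (DHCP, TFTP) are not covered because a plain connect test does not apply to them.

```python
import socket

# TCP ports from the networking prerequisites; adjust the HTTP port if HTTP_PORT is changed.
IRONIC_TCP_PORTS = {"Ironic API": 6385, "Inspector API": 5050, "HTTP server": 80}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ironic_ports(host: str, ports: dict = IRONIC_TCP_PORTS) -> dict:
    """Map each service name to whether its TCP port is reachable on the host."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

Calling check_ironic_ports with the provisioning IP address, run from a machine on the provisioning network, returns one boolean per service.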
Environmental variables
The following environmental variables can be passed to configure the Ironic services:
- HTTP_PORT - port used by httpd server (default 6180)
- PROVISIONING_IP - provisioning interface IP address to use for ironic, dnsmasq(dhcpd) and httpd (default 172.22.0.1)
- CLUSTER_PROVISIONING_IP - cluster provisioning interface IP address (default 172.22.0.2)
- PROVISIONING_INTERFACE - interface to use for ironic, dnsmasq(dhcpd) and httpd (default ironicendpoint)
- CLUSTER_DHCP_RANGE - dhcp range to use for provisioning (default 172.22.0.10-172.22.0.100)
- DEPLOY_KERNEL_URL - the URL of the kernel to deploy ironic-python-agent
- DEPLOY_RAMDISK_URL - the URL of the ramdisk to deploy ironic-python-agent
- IRONIC_ENDPOINT - the endpoint of ironic
- IRONIC_INSPECTOR_ENDPOINT - the endpoint of the ironic inspector
- CACHEURL - the URL of the cached images
- IRONIC_FAST_TRACK - whether to enable fast_track provisioning or not (default true)
- IRONIC_KERNEL_PARAMS - kernel parameters to pass to IPA (default console=ttyS0)
- IRONIC_INSPECTOR_VLAN_INTERFACES - VLAN interfaces included in introspection. Possible values are: all (default), for all VLANs on all interfaces, using LLDP information; interface, for all VLANs on an interface, using LLDP information; interface.vlan, for a particular VLAN interface, not using LLDP
- IRONIC_BOOT_ISO_SOURCE - where the boot iso image will be served from, possible values are: local (default), to download the image, prepare it and serve it from the conductor; http, to serve it directly from its HTTP URL
- IPA_DOWNLOAD_ENABLED - enables the use of the Ironic Python Agent Downloader container to download IPA archive (default true)
- USE_LOCAL_IPA - enables the use of locally supplied IPA archive. This condition is handled by BMO and this has effect only when IPA_DOWNLOAD_ENABLED is “false”, otherwise IPA_DOWNLOAD_ENABLED takes precedence. (default false)
- LOCAL_IPA_PATH - this has effect only when USE_LOCAL_IPA is set to “true”, points to the directory where the IPA archive is located. This variable is handled by BMO. The variable should contain an arbitrary path pointing to the directory that contains the ironic-python-agent.tar
- GATEWAY_IP - gateway IP address to use for ironic dnsmasq (dhcpd)
- DNS_IP - DNS IP address to use for ironic dnsmasq (dhcpd)
To know how to pass these variables, please see the sections below.
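As an illustration of how a set of these variables might be assembled, the Python sketch below merges custom values over some of the defaults listed above and renders them as KEY=VALUE lines. The KEY=VALUE layout is an assumption about the env file format, and the function name render_env_file is hypothetical; only variables with a documented default are included in DEFAULTS.

```python
# Defaults copied from the variable list above.
DEFAULTS = {
    "HTTP_PORT": "6180",
    "PROVISIONING_IP": "172.22.0.1",
    "CLUSTER_PROVISIONING_IP": "172.22.0.2",
    "PROVISIONING_INTERFACE": "ironicendpoint",
    "CLUSTER_DHCP_RANGE": "172.22.0.10-172.22.0.100",
    "IRONIC_FAST_TRACK": "true",
    "IRONIC_KERNEL_PARAMS": "console=ttyS0",
}


def render_env_file(overrides: dict) -> str:
    """Merge overrides on top of the defaults and render KEY=VALUE lines."""
    merged = {**DEFAULTS, **overrides}
    return "".join(f"{key}={value}\n" for key, value in sorted(merged.items()))


# Hypothetical custom values for illustration only.
content = render_env_file({"PROVISIONING_IP": "192.168.10.1", "DNS_IP": "192.168.10.2"})
print(content)
```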
Ironic in-cluster installation
For in-cluster Ironic installation, we will run a set of containers within
a single pod in a Kubernetes cluster. You can enable TLS or basic auth or even
disable both for Ironic and Inspector communication. Below we will see kustomize
folders that will help us to install Ironic for each mentioned case. In each
of these deployments, a ConfigMap will be created and mounted to the Ironic pod.
The ConfigMap will be populated based on environment variables from
ironic-deployment/default/ironic_bmo_configmap.env. As such, update
ironic_bmo_configmap.env with your custom values before deploying Ironic.
WARNING: Ironic normally listens on the host network of the control plane nodes. If you do not enable authentication, anyone with access to this network can use it to manipulate your nodes. It’s also highly advised to use TLS to prevent eavesdropping.
Installing with Kustomize
We assume you are inside the local baremetal-operator path. If not, you need to
clone it first and cd to the root path.
git clone https://github.com/metal3-io/baremetal-operator.git
cd baremetal-operator
Basic authentication enabled:
kustomize build ironic-deployment/basic-auth | kubectl apply -f -
TLS enabled:
kustomize build ironic-deployment/basic-auth/tls | kubectl apply -f -
Ironic out-of-cluster installation
For out-of-cluster Ironic installation, we will run a set of docker containers outside of a Kubernetes cluster. To pass Ironic settings, export the corresponding environment variables in the current shell before calling the run_local_ironic.sh installation script. This will start the following containers:
- ironic
- ironic-inspector
- ironic-endpoint-keepalived
- ironic-log-watch
- ipa-downloader
- dnsmasq
- httpd
- mariadb; if IRONIC_USE_MARIADB = “true”
For the in-cluster Ironic installation we used different manifests for TLS and basic auth. Here we export environment variables to enable or disable TLS and basic auth, but use the same script.
TLS and Basic authentication disabled
export IRONIC_FAST_TRACK="false" # Example of manipulating Ironic settings
export IRONIC_TLS_SETUP="false" # Disable TLS
export IRONIC_BASIC_AUTH="false" # Disable basic auth
./tools/run_local_ironic.sh
Basic authentication enabled
export IRONIC_TLS_SETUP="false"
export IRONIC_BASIC_AUTH="true"
./tools/run_local_ironic.sh
TLS enabled
export IRONIC_TLS_SETUP="true"
export IRONIC_BASIC_AUTH="false"
./tools/run_local_ironic.sh
Ironic Python Agent (IPA)
IPA is a service written in Python that runs within a ramdisk. It gives the ironic and ironic-inspector services remote access to the managed server so that they can perform various operations on it. It also sends information about the server to Ironic.
By default, we pull IPA images from the Ironic upstream archive where an image is built on every commit to the master git branch.
However, another remote registry or a local IPA archive can be specified. ipa-downloader is responsible for downloading the IPA ramdisk image to a shared volume from where the nodes are able to retrieve it.
Data flow
IPA interacts with other components. The information exchanged and the component it is sent to or received from are described below. The communication between IPA and these components can be encrypted in-transit with SSL/TLS.
- Heartbeat: periodic message informing Ironic that the node is still running.
- Lookup: data sent to Ironic that helps it determine Ironic’s node UUID for the node.
- Introspection: data about hardware details, such as CPU, disk, RAM and network interfaces.
The above data is sent/received as follows.
- Lookup/heartbeats data is sent to Ironic.
- Introspection result is sent to ironic-inspector.
- User supplied boot image that will be written to the node’s disk is retrieved from HTTPD server
Ironic Container Images
The currently available ironic container images are listed below.
Name and link to repository | Content/Purpose |
---|---|
ironic-image | Ironic api and conductor / Ironic Inspector / Sushy tools / virtualbmc |
ironic-ipa-downloader | Distribute the ironic python agent ramdisk |
ironic-hardware-inventory-recorder-image | Ironic python agent hardware collector daemon |
ironic-static-ip-manager | Set and maintain IP for provisioning pod |
ironic-client | Ironic CLI utilities |
How to build a container image
Each repository mentioned in the list contains a Dockerfile that can be used to build the corresponding container image. The build process is as simple as running the docker or podman command and pointing it to the Dockerfile, for example in the case of the ironic-image:
git clone https://github.com/metal3-io/ironic-image.git
cd ironic-image
docker build . -f Dockerfile
In some cases a make sub-command is provided to build the image using docker, usually make docker.
Build ironic-image from source
The standard build command builds the container using RPMs taken from the RDO project, although an alternative build option has been provided for the ironic-image container to use source code instead.
Setting the argument INSTALL_TYPE to source in the build cli command triggers the build from source code:
docker build . -f Dockerfile --build-arg INSTALL_TYPE=source
When building the ironic image from source, it is also possible to specify a different source for ironic, ironic-inspector or the sushy library using the build arguments IRONIC_SOURCE, IRONIC_INSPECTOR_SOURCE, and SUSHY_SOURCE. The accepted formats are gerrit refs, like refs/changes/89/860689/2, commit hashes, like a1fe6cb41e6f0a1ed0a43ba5e17745714f206f1f, or a local directory that needs to be under the sources/ directory in the container context.
An example of a full command installing ironic from a gerrit patch is:
docker build . -f Dockerfile --build-arg INSTALL_TYPE=source --build-arg IRONIC_SOURCE="refs/changes/89/860689/2"
An example using the local directory sources/ironic:
docker build . -f Dockerfile --build-arg INSTALL_TYPE=source --build-arg IRONIC_SOURCE="ironic"
Work with patches in the ironic-image
The ironic-image allows testing patches for Ironic projects by building a container image that includes them. The patches are applied at build time by the patch-image.sh script. To use the script, specify a text file containing the list of patches to be applied as the value of the build argument PATCH_LIST, for example:
docker build . -f Dockerfile --build-arg PATCH_LIST=patch-list.txt
At the moment, only patches coming from opendev.org gerrit are accepted. Include one patch per line in the PATCH_LIST file with the format:
project refspec
where:
- project is the last part of the project url including the org, for example openstack/ironic
- refspec is the gerrit refspec of the patch we want to test, for example refs/changes/67/759567/1
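A patch list in the format described above can be generated and sanity-checked with a few lines of Python. This is a minimal sketch: the function name patch_list_text is hypothetical, and the validation relies on the usual Gerrit refspec layout refs/changes/NN/CHANGE/PATCHSET, where NN is the last two digits of the change number.

```python
import re

REFSPEC_RE = re.compile(r"^refs/changes/(\d{2})/(\d+)/(\d+)$")


def patch_list_text(patches) -> str:
    """Render (project, refspec) pairs as one 'project refspec' line each."""
    lines = []
    for project, refspec in patches:
        if "/" not in project or " " in project:
            raise ValueError(f"project must include the org, got {project!r}")
        match = REFSPEC_RE.match(refspec)
        # The two-digit directory must match the end of the change number.
        if not match or match.group(2).zfill(2)[-2:] != match.group(1):
            raise ValueError(f"not a gerrit change refspec: {refspec!r}")
        lines.append(f"{project} {refspec}")
    return "\n".join(lines) + "\n"


text = patch_list_text([("openstack/ironic", "refs/changes/67/759567/1")])
print(text, end="")
```

The returned text can be written to the file passed as PATCH_LIST, for example patch-list.txt.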
Special resources: sushy-tools and virtualbmc
In the ironic-image container repository, under the resources directory, we find the Dockerfiles needed to build sushy-tools and virtualbmc containers.
They can both be built exactly like the other containers using the docker build command.
Kubernetes Cluster API Provider Metal3
Kubernetes-native declarative infrastructure for Metal3.
What is the Cluster API Provider Metal3
The Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration and management. The API itself is shared across multiple cloud providers. Cluster API Provider Metal3 is one of the providers for Cluster API and enables users to deploy a Cluster API based cluster on top of bare metal infrastructure using Metal3.
Compatibility with Cluster API
CAPM3 version | Cluster API version | CAPM3 Release |
---|---|---|
v1alpha5 | v1alpha4 | v0.5.X |
v1beta1 | v1beta1 | v1.1.X |
v1beta1 | v1beta1 | v1.2.X |
Development Environment
There are multiple ways to set up a development environment:
- Using Tilt
- Other management cluster
- See metal3-dev-env for an end-to-end development and test environment for cluster-api-provider-metal3 and baremetal-operator.
Getting involved and contributing
Are you interested in contributing to Cluster API Provider Metal3? We, the maintainers and community, would love your suggestions, contributions, and help! Also, the maintainers can be contacted at any time to learn more about how to get involved.
To set up your environment, check out the development environment.
In the interest of getting more new people involved, we tag issues with good first issue. These are typically issues that have smaller scope but are good ways to start to get acquainted with the codebase.
We also encourage ALL active community participants to act as if they are maintainers, even if you don’t have “official” write permissions. This is a community effort, we are here to serve the Kubernetes community. If you have an active interest and you want to get involved, you have real power! Don’t assume that the only people who can get things done around here are the “maintainers”.
We also would love to add more “official” maintainers, so show us what you can do!
All the repositories in the Metal3 project, including the Cluster API Provider Metal3 GitHub repository, use the Kubernetes bot commands. The full list of the commands can be found here. Note that some of them might not be implemented in metal3 CI.
Community
Community resources and contact details can be found here.
Github issues
We use GitHub issues to keep track of bugs and feature requests. There are two different templates to help ensure that relevant information is included.
Bugs
If you think you have found a bug please follow the instructions below.
- Please spend a small amount of time giving due diligence to the issue tracker. Your issue might be a duplicate.
- Collect logs from relevant components and make sure to include them in the bug report you are going to open.
- Remember users might be searching for your issue in the future, so please give it a meaningful title to help others.
- Feel free to reach out to the metal3 community.
Tracking new features
We also use the issue tracker to track features. If you have an idea for a feature, or think you can help Cluster API Provider Metal3 become even more awesome, then follow the steps below.
- Open a feature request.
- Remember users might be searching for your feature request in the future, so please give it a meaningful title to help others.
- Clearly define the use case, using concrete examples. e.g.: I type this and cluster-api-provider-metal3 does that.
- Some of our larger features will require proposals. If you would like to include a technical design for your feature please open a feature proposal in metal3-docs using this template.
After the new feature is well understood, and the design agreed upon we can start coding the feature. We would love for you to code it. So please open up a WIP (work in progress) pull request, and happy coding.
Install Cluster-api-provider-metal3
You can either use clusterctl (recommended) to install Metal³ infrastructure provider or kustomize for manual installation. Both methods install provider CRDs, its controllers and Ip-address-manager. Please keep in mind that Baremetal Operator and Ironic are decoupled from CAPM3 and will not be installed when the provider is initialized. As such, you need to install them yourself.
Prerequisites
- Install clusterctl, refer to Cluster API book for installation instructions.
- Install kustomize, refer to official instructions here.
- Install Ironic, refer to this page.
- Install Baremetal Operator, refer to this page.
- Install Cluster API core components i.e., core, bootstrap and control-plane providers. This will also install cert-manager, if it is not already installed.
clusterctl init --core cluster-api:v1.1.4 --bootstrap kubeadm:v1.1.4 \
  --control-plane kubeadm:v1.1.4 -v5
With clusterctl
This method is recommended. You can specify the CAPM3 version you want to install by appending a version tag, e.g. :v1.1.2. If the version is not specified, the latest version available will be installed.
clusterctl init --infrastructure metal3:v1.1.2
With kustomize
To install a specific version, edit the controller-manager image version in config/default/capm3/manager_image_patch.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  template:
    spec:
      containers:
      # Change the value of image/tag to your desired image URL or version tag
      - image: quay.io/metal3-io/cluster-api-provider-metal3:v1.1.2
        name: manager
Apply the manifests
cd cluster-api-provider-metal3
kustomize build config/default | kubectl apply -f -
Remediation Controller and MachineHealthCheck
The Cluster API includes a remediation feature that implements automated health checking of Kubernetes nodes. It deletes an unhealthy Machine and replaces it with a healthy one. This approach can be challenging for providers that use hardware-based clusters because (re)provisioning unhealthy Machines is slow. To overcome this, the CAPI remediation feature was extended to allow plugging in provider-specific external remediation. It is therefore possible to plug in Metal3-specific remediation strategies to remediate unhealthy nodes. In this case, the Cluster API MHC finds unhealthy nodes while the CAPM3 Remediation Controller remediates them.
CAPI Remediation
A MachineHealthCheck is a Cluster API resource which allows users to define conditions under which Machines within a Cluster should be considered unhealthy. Users can also specify a timeout for each of the conditions that they define to check on the Machine's Node. If any of these conditions are met for the duration of the timeout, the Machine will be remediated. CAPM3 will use the MachineHealthCheck to create remediation requests based on Metal3RemediationTemplate and Metal3Remediation CRDs to plug in a remediation solution. For more info, please read the CAPI MHC link.
External Remediation
External remediation provides remediation solutions other than deleting an unhealthy Machine and creating a healthy one. Environments consisting of hardware-based clusters are slower to (re)provision unhealthy Machines, so there is a growing need for a remediation flow that includes external remediation, which can significantly reduce the remediation time. Normally, condition-based remediation offers no option other than deleting an unhealthy Machine and replacing it with a new one. Other environments and vendors can also have specific remediation requirements, so there is a need for a generic mechanism for implementing custom remediation logic. External remediation integrates with CAPI MHC and supports remediation based on power cycling the underlying hardware. It supports the use of the BMO reboot API and the CAPM3 unhealthy annotation as part of the automated remediation cycle. It is a generic mechanism for supporting externally provided custom remediation strategies. If no value for externalRemediationTemplate is defined for the MachineHealthCheck CR, the condition-based flow is continued. For more info, see the External Remediation proposal.
Metal3 Remediation
The CAPM3 remediation controller reconciles Metal3Remediation objects created by the CAPI MachineHealthCheck. It locates a Machine with the same name as the Metal3Remediation object and uses BMO and CAPM3 APIs to remediate the associated unhealthy node. The remediation controller supports a reboot strategy specified in the Metal3Remediation CRD and uses the same object to store the state of the current remediation cycle. The reboot strategy consists of three steps: power off the Machine, delete the related Node, and power the Machine on again. Deleting the Node indicates that the workloads on the Node are not running anymore, which results in quicker rescheduling and lower downtime of the affected workloads.
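The ordering of the three steps can be sketched in Python as follows. This is only an illustration of the sequence described above, not the CAPM3 controller code; the function reboot_strategy and the three callables are hypothetical stand-ins for the real power and Node operations.

```python
from typing import Callable, List


def reboot_strategy(power_off: Callable[[], None],
                    delete_node: Callable[[], None],
                    power_on: Callable[[], None]) -> List[str]:
    """Run the three steps in order and return the names of the completed steps."""
    completed = []
    steps = (("power-off", power_off), ("delete-node", delete_node), ("power-on", power_on))
    for name, step in steps:
        step()  # an exception here stops the sequence before later steps run
        completed.append(name)
    return completed


log = []
result = reboot_strategy(
    lambda: log.append("machine powered off"),
    lambda: log.append("node deleted"),
    lambda: log.append("machine powered on"),
)
print(result)
```

Deleting the Node between the two power operations mirrors the point made above: the Node is removed only once the Machine is known to be off, so workloads can be rescheduled safely.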
Enable remediation for worker nodes
Machines managed by a MachineSet (as identified by the nodepool label) can be remediated. Here is an example MachineHealthCheck and Metal3RemediationTemplate for worker nodes:
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: worker-healthcheck
  namespace: metal3
spec:
  # clusterName is required to associate this MachineHealthCheck with a particular cluster
  clusterName: test1
  # (Optional) maxUnhealthy prevents further remediation if the cluster is already partially unhealthy
  maxUnhealthy: 100%
  # (Optional) nodeStartupTimeout determines how long a MachineHealthCheck should wait for
  # a Node to join the cluster, before considering a Machine unhealthy.
  # Defaults to 10 minutes if not specified.
  # Set to 0 to disable the node startup timeout.
  # Disabling this timeout will prevent a Machine from being considered unhealthy when
  # the Node it created has not yet registered with the cluster. This can be useful when
  # Nodes take a long time to start up or when you only want condition based checks for
  # Machine health.
  nodeStartupTimeout: 0m
  # selector is used to determine which Machines should be health checked
  selector:
    matchLabels:
      nodepool: nodepool-0
  # Conditions to check on Nodes for matched Machines, if any condition is matched for the duration of its timeout, the Machine is considered unhealthy
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s
  remediationTemplate: # added infrastructure reference
    kind: Metal3RemediationTemplate
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    name: worker-remediation-request
Metal3RemediationTemplate for worker nodes:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3RemediationTemplate
metadata:
  name: worker-remediation-request
  namespace: metal3
spec:
  template:
    spec:
      strategy:
        type: "Reboot"
        retryLimit: 2
        timeout: 300s
Enable remediation for control plane nodes
Machines managed by a KubeadmControlPlane are remediated according to the KubeadmControlPlane proposal. At least 2 control plane machines are required in order to use the remediation feature. Control plane nodes are identified by the cluster.x-k8s.io/control-plane label. Here is an example MachineHealthCheck and Metal3RemediationTemplate for control plane nodes:
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: controlplane-healthcheck
  namespace: metal3
spec:
  clusterName: test1
  maxUnhealthy: 100%
  nodeStartupTimeout: 0m
  selector:
    matchLabels:
      cluster.x-k8s.io/control-plane: ""
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
  remediationTemplate: # added infrastructure reference
    kind: Metal3RemediationTemplate
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    name: controlplane-remediation-request
Metal3RemediationTemplate for control plane nodes:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3RemediationTemplate
metadata:
  name: controlplane-remediation-request
  namespace: metal3
spec:
  template:
    spec:
      strategy:
        type: "Reboot"
        retryLimit: 1
        timeout: 300s
Limitations and caveats of Metal3 remediation
- Machines owned by a MachineSet or a KubeadmControlPlane can be remediated by a MachineHealthCheck
- If the Node for a Machine is removed from the cluster, the CAPI MachineHealthCheck considers the Machine unhealthy and remediates it immediately
- If no Node joins the cluster for a Machine within the NodeStartupTimeout, the Machine is remediated
- If a Machine fails for any reason and its FailureReason is set, the Machine is remediated immediately
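As a rough illustration of how the unhealthyConditions from the MachineHealthCheck examples are evaluated, the following Python sketch flags a Machine as unhealthy once a Node condition has matched a rule for at least that rule's timeout. The class and function names are hypothetical and exist only for this example. The real CAPI controller logic is more involved (it also takes maxUnhealthy and nodeStartupTimeout into account), so treat this as a simplified model rather than the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnhealthyCondition:
    """One entry of spec.unhealthyConditions (timeout given in seconds)."""
    type: str
    status: str
    timeout_s: int


@dataclass(frozen=True)
class NodeCondition:
    """A Node condition plus how long it has held its current status.

    seconds_in_status is a hypothetical helper field for this sketch.
    """
    type: str
    status: str
    seconds_in_status: int


def is_unhealthy(node_conditions, rules):
    """Return True if any rule has matched for at least its timeout."""
    for rule in rules:
        for cond in node_conditions:
            if (
                cond.type == rule.type
                and cond.status == rule.status
                and cond.seconds_in_status >= rule.timeout_s
            ):
                return True
    return False


# The two rules used in the examples: Ready=Unknown or Ready=False for 300s.
RULES = [
    UnhealthyCondition("Ready", "Unknown", 300),
    UnhealthyCondition("Ready", "False", 300),
]
```

With these rules, a Node that has reported Ready=False for 301 seconds is considered unhealthy, while one that has reported it for only 120 seconds is not yet.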
Node Reuse
This feature makes it possible to reuse the same BareMetalHosts (referred to as hosts below) during deprovisioning and provisioning, mainly as part of the rolling upgrade process in the cluster.
Importance of scale-in strategy
The logic behind host reuse relies solely on the scale-in upgrade strategy used by the Cluster API objects KubeadmControlPlane and MachineDeployment. During the upgrade of these resources, the machines owned by the KubeadmControlPlane or MachineDeployment are removed one by one before new ones are created (delete-create method). This ensures that the intended host is reused when the upgrade starts: the host freed by the deleted machine is picked up when the next new machine is provisioned.
Note: To achieve the desired delete-first, create-after behavior in the above-mentioned Cluster API objects, the user has to modify:
- the MaxSurge field in KubeadmControlPlane, setting it to 0, with a minimum of 3 control plane machine replicas
- the MaxSurge and MaxUnavailable fields in MachineDeployment, setting them to 0 and 1 respectively
By contrast, if the CAPI objects use the scale-out strategy during the upgrade, they usually follow the create-swap-delete method: a new machine is created first and a new host is picked up for that machine, which breaks the node reuse logic right at the beginning of the upgrade process.
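The difference between the two strategies can be modelled with a small Python sketch. It is a toy simulation under simplifying assumptions, not CAPM3 code: the function name is hypothetical, machines are upgraded strictly one at a time, and spare hosts are consumed in list order.

```python
def rolling_upgrade(in_use, spare, scale_in):
    """Simulate which host each replacement machine lands on.

    in_use:   hosts currently backing the machines to upgrade
    spare:    hosts that are free before the upgrade starts
    scale_in: True for delete-create, False for create-swap-delete
    Returns a list of (old_host, new_host) pairs.
    """
    spare = list(spare)
    placements = []
    for old_host in in_use:
        if scale_in:
            # Delete first: the old machine releases its host,
            # and the replacement machine picks that same host up.
            new_host = old_host
        else:
            # Create first: the old host is still occupied,
            # so the new machine must take a different, spare host.
            if not spare:
                raise RuntimeError("no spare host available for scale-out")
            new_host = spare.pop(0)
            spare.append(old_host)  # freed only after the swap
        placements.append((old_host, new_host))
    return placements


print(rolling_upgrade(["node-0", "node-1"], ["node-2"], scale_in=True))
print(rolling_upgrade(["node-0", "node-1"], ["node-2"], scale_in=False))
```

In the scale-in run every machine returns to the host it came from, whereas in the scale-out run the very first replacement already lands on a different host.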
Workflow
The Metal3MachineTemplate (M3MT) custom resource is the object responsible for enabling the node reuse feature.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: test1-controlplane
  namespace: metal3
spec:
  nodeReuse: True
  template:
    spec:
      image:
      ...
There can be two Metal3MachineTemplate objects, one referenced by the KubeadmControlPlane for control plane nodes and the other by the MachineDeployment for worker nodes. Before performing an upgrade, the user must set the nodeReuse field to true in the Metal3MachineTemplate object whose hosts are to be reused. If left unchanged, the nodeReuse field defaults to false and no host reuse is performed in the workflow. If you would like to know more about the internals of the controller logic, please check the original proposal for the feature.
Once the nodeReuse field is set to true, the user has to make sure that the scale-in feature is enabled as suggested above, and then update the desired fields in the KubeadmControlPlane or MachineDeployment to start a rolling upgrade.
Note: If you are creating a new Metal3MachineTemplate object (for control plane or worker) rather than using the existing one created while provisioning, make sure to reference it from the corresponding Cluster API object (KubeadmControlPlane or MachineDeployment). Also keep in mind that already provisioned Metal3Machines were created from the old Metal3MachineTemplate and consume existing hosts, so setting the nodeReuse field to true in the new Metal3MachineTemplate has no effect on them. To use the new Metal3MachineTemplate in the workflow, the user has to reprovision the nodes, which results in the Cluster API object using the new Metal3MachineTemplate and in Metal3Machines being created from it.
CAPM3 Pivoting
What is pivoting
Cluster API Provider Metal3 (CAPM3) implements support for CAPI’s ‘move/pivoting’ feature.
CAPI pivoting is the process of moving the provider components and declared Cluster API resources from a source management cluster to a target management cluster by using the clusterctl functionality called “move”.
More information about the general CAPI “move” functionality can be found in the Cluster API documentation.
In Metal3, pivoting is performed with the clusterctl tool provided by the Cluster API project. clusterctl refers to pivoting as move.
During the pivot process, clusterctl pauses reconciliation of all CAPI objects, and the pause is propagated to CAPM3 objects as well.
Once all the objects are paused, they are created on the target cluster and deleted from the bootstrap cluster.
Prerequisite
- It is mandatory to use clusterctl for both the bootstrap and target cluster.
  If the provider components are not installed using clusterctl, it will not be able to identify the objects to move. Initializing the cluster using clusterctl essentially adds the following labels to the CRDs of each related object.
  labels:
    clusterctl.cluster.x-k8s.io: ""
    cluster.x-k8s.io/provider: "<provider-name>"
  So if the clusters are not initialized using clusterctl, all the CRDs of the objects to be moved to the target cluster need to have these labels in both the bootstrap cluster and the target cluster before performing the move.
  Note: This is not recommended, since the way clusterctl identifies objects to manage might change in the future, so it is always safer to install CRDs and controllers through the clusterctl init sub-command.
- BareMetalHost objects have correct status annotation.
  Since the BareMetalHost (BMH) status holds important information about the BMH itself, the BMH has to be moved together with its status, and the status has to be reconstructed correctly in the target cluster before the BMH is reconciled. This is now done through the BMH status annotation in BMO.
- Maintain connectivity towards the provisioning network.
  Bare metal machines boot over a network with a DHCP server. This requires maintaining a fixed IP endpoint towards the provisioning network, which is achieved through keepalived. A container named ironic-endpoint-keepalived is added to the Ironic deployment and maintains the Ironic endpoint using keepalived. The motivation is to ensure that the Ironic endpoint IP is also passed on to the target cluster control plane. This also guarantees that once the move is done and the management cluster is taken down, the target cluster control plane can reclaim the Ironic endpoint IP through keepalived. The end goal is to make the Ironic endpoint reachable in the target cluster.
- BMO is deployed as part of CAPM3.
  If not, it has to be deployed before clusterctl init, and the BMH CRDs need to be labeled manually. Separate labeling for BMH CRDs is required because, since CAPM3 release v0.5.0, the BMO/BMH CRDs are no longer deployed as part of the CAPM3 deployment. This is a prerequisite for both the management and the target cluster.
- Objects should have a proper owner reference chain.
  clusterctl move moves all the objects to the target cluster following the owner reference chain. So it is necessary to verify that all the objects that need to be moved to the target cluster have a proper owner reference chain.
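One way to picture the owner reference requirement is a small Python check that walks metadata.ownerReferences upwards until it reaches a Cluster object. This is a sketch under stated assumptions: objects are plain dictionaries shaped like Kubernetes manifests, the function name reaches_cluster is hypothetical, and the actual object discovery performed by clusterctl move is more elaborate than this.

```python
def reaches_cluster(obj, by_uid, _seen=None):
    """Return True if obj is a Cluster or is (transitively) owned by one.

    by_uid maps metadata.uid to the object carrying that uid.
    """
    seen = set() if _seen is None else _seen
    if obj["kind"] == "Cluster":
        return True
    uid = obj["metadata"]["uid"]
    if uid in seen:  # guard against reference cycles
        return False
    seen.add(uid)
    for ref in obj["metadata"].get("ownerReferences", []):
        owner = by_uid.get(ref["uid"])
        if owner is not None and reaches_cluster(owner, by_uid, seen):
            return True
    return False


# Illustrative objects; the uids are made up for the example.
cluster = {"kind": "Cluster", "metadata": {"uid": "c1"}}
machine = {"kind": "Machine",
           "metadata": {"uid": "m1", "ownerReferences": [{"uid": "c1"}]}}
m3m = {"kind": "Metal3Machine",
       "metadata": {"uid": "mm1", "ownerReferences": [{"uid": "m1"}]}}
orphan = {"kind": "Metal3DataTemplate", "metadata": {"uid": "o1"}}

objects = {o["metadata"]["uid"]: o for o in (cluster, machine, m3m, orphan)}
```

Here the Metal3Machine reaches the Cluster through its Machine, while the object without owner references does not and would be worth inspecting before a move.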
Important Notes
The following requirements are essential for the move process to run successfully:
- The move should be performed when the BMHs are in a steady state. BMHs should not be moved while any operation is ongoing, e.g. while a BMH is in the provisioning state. Doing so results in failure, since the interaction between IPA and Ironic is broken; as a result, Ironic’s database might not be repopulated and the cluster eventually ends up in an erroneous state. Moreover, the IP of the BMH might change after the move, and the DHCP leases from the management cluster are not moved to the target cluster.
- Before the move is initiated, it is important to delete the Ironic pod or Ironic containers. If Ironic is deployed in the cluster, the deployment is named metal3-ironic; if it is deployed locally outside the cluster, the user has to make sure that all of the Ironic-related containers are correctly deleted. If Ironic is not deleted before the move, the old Ironic might interfere with the operations of the new Ironic deployed in the target cluster, since the database of the first Ironic instance is not cleaned when the BMHs are moved. Two Ironic deployments would also mean two dnsmasq instances, which is undesirable.
- The provisioning bridge to which the ironic-endpoint-IP is to be attached should have a static IP assigned before the Ironic pod or containers start to operate in the target cluster. This is important because the ironic-endpoint-keepalived container only assigns the ironic-endpoint-IP to the provisioning bridge in the target cluster when the bridge already has an IP. Otherwise it fails to attach the IP and Ironic is unreachable. This is crucial because this interface hosts the DHCP server and therefore cannot itself be configured through DHCP.
Step by step pivoting process
As described in the clusterctl documentation, the whole process, from bootstrapping a management cluster to moving objects to the target cluster, is as follows:
The move process can be combined with the creation of a temporary bootstrap cluster used to provision a target management cluster.
This can be achieved with the following procedure:
- Create a temporary bootstrap cluster, for example with Kind or Minikube. Once the bootstrap cluster is up and running, the CAPI and provider components can be installed into it with clusterctl.
- Install the Ironic components, namely ironic, ironic-inspector, ironic-endpoint-keepalived, httpd and dnsmasq.
- Use clusterctl init to install the provider components.
  Example:
  clusterctl init --infrastructure metal3:v1.1.0 --target-namespace metal3 --watching-namespace metal3
  This command creates the necessary CAPI controllers (CAPI, CABPK, CAKCP) and CAPM3 as the infrastructure provider. All of the controllers are installed in the metal3 namespace and watch objects in the metal3 namespace.
- Provision the target cluster.
  Example:
  clusterctl config cluster ... | kubectl apply -f -
- Wait for the target management cluster to be up and running, then get the kubeconfig for the new target management cluster.
- Use the new cluster’s kubeconfig to install the Ironic components in the target cluster.
- Use clusterctl init with the new cluster’s kubeconfig to install the provider components.
  Example:
  clusterctl init --kubeconfig target.yaml --infrastructure metal3:v1.1.0 --target-namespace metal3 --watching-namespace metal3
- Use clusterctl move to move the Cluster API resources from the bootstrap cluster to the target management cluster.
  Example:
  clusterctl move --to-kubeconfig target.yaml -n metal3 -v 10
- Delete the bootstrap cluster.
Automated Cleaning
Before reading this page, please see Baremetal Operator Automated Cleaning page.
If you are using only Metal3 Baremetal Operator, you can skip this page and refer to Baremetal Operator automated cleaning page instead.
For deployments following the Cluster-api-provider-metal3 (CAPM3) workflow, the recommended way to configure automated cleaning is via CAPM3 custom resources (CR).
There are two automated cleaning modes, which can be set via the automatedCleaningMode field of a Metal3MachineTemplate spec or Metal3Machine spec:
- metadata to enable the cleaning
- disabled to disable the cleaning
When enabled (metadata), automated cleaning kicks off when a node is first provisioned and on every deprovisioning.
There is no default value for automatedCleaningMode in Metal3MachineTemplate and Metal3Machine. If the user does not set a mode, the field is omitted from the spec. Leaving automatedCleaningMode unset in the Metal3MachineTemplate blocks the synchronization of the cleaning mode between the Metal3MachineTemplate and Metal3Machines. This enables the selective operations described below.
Bulk operations
The CAPM3 controller replicates the automated cleaning mode to all Metal3Machines from their referenced Metal3MachineTemplate.
For example, one controlplane and one worker Metal3Machine have automatedCleaningMode set to disabled because it is set to disabled in the template that they both reference.
Note: The CAPM3 controller replicates the cleaning mode from Metal3MachineTemplate to Metal3Machine only if automatedCleaningMode is set (not empty) on the Metal3MachineTemplate resource. In other words, it synchronizes either the disabled or the metadata mode between Metal3MachineTemplate and Metal3Machines.
Selective operations
Normally the automated cleaning mode is replicated from the Metal3MachineTemplate spec to the spec of the Metal3Machines that reference it, and from the Metal3Machine spec to the BareMetalHost spec (if CAPM3 is used). However, sometimes you might want a different automated cleaning mode for one or more Metal3Machines even though they reference the same Metal3MachineTemplate. For example, there is one worker and one controlplane Metal3Machine created from the same Metal3MachineTemplate, and we would like automated cleaning to be enabled (metadata) for the worker and disabled (disabled) for the controlplane.
Here are the steps to achieve that:
- Unset automatedCleaningMode in the Metal3MachineTemplate. The CAPM3 controller then unsets it for the referenced Metal3Machines. Although it is unset in the Metal3Machine, BareMetalHosts get their default automated cleaning mode, metadata. As mentioned earlier, the CAPM3 controller replicates the cleaning mode from Metal3MachineTemplate to Metal3Machine ONLY when it is either metadata or disabled. As such, unsetting the cleaning mode in the Metal3MachineTemplate is enough to block synchronization between Metal3MachineTemplate and Metal3Machine.
- Set automatedCleaningMode to metadata on the worker Metal3Machine spec and to disabled on the controlplane Metal3Machine spec. Since no mode is set on the Metal3MachineTemplate, Metal3Machines can have different automated cleaning modes even if they reference the same Metal3MachineTemplate. The CAPM3 controller copies the cleaning modes from the Metal3Machines to their corresponding BareMetalHosts. As such, we end up with two nodes having different cleaning modes even though they reference the same Metal3MachineTemplate.
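The propagation rules described in this section can be summarised in a short Python sketch. It is a simplified model, not the controller code: the function name is hypothetical, an unset field is represented by None, and the BareMetalHost default of metadata is taken from the description above.

```python
def resolve_cleaning_modes(template_mode, machine_mode, bmh_default="metadata"):
    """Return (metal3machine_mode, baremetalhost_mode).

    template_mode: automatedCleaningMode on the Metal3MachineTemplate, or None
    machine_mode:  automatedCleaningMode set directly on the Metal3Machine, or None
    """
    # A mode set on the template is replicated to every Metal3Machine.
    # An unset template blocks synchronization, so the machine keeps its own value.
    m3m_mode = template_mode if template_mode is not None else machine_mode
    # The Metal3Machine mode is copied to the BareMetalHost.
    # If nothing is set, the BareMetalHost falls back to its default.
    bmh_mode = m3m_mode if m3m_mode is not None else bmh_default
    return m3m_mode, bmh_mode


# Bulk operation: the template value wins for every machine.
print(resolve_cleaning_modes("disabled", "metadata"))
# Selective operation: template unset, per-machine values apply.
print(resolve_cleaning_modes(None, "disabled"))
print(resolve_cleaning_modes(None, None))
```

The second and third calls correspond to the selective case: with the template unset, each Metal3Machine keeps its own mode, and a machine with no mode at all leaves its host on the default.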
IPAM (IP Address Manager)
The IPAM project provides a controller to manage static IP address allocations in Cluster API Provider Metal3.
In CAPM3, the network data needs to be passed to Ironic through the BareMetalHost. CAPI addresses the deployment of Kubernetes clusters and nodes using the Kubernetes API. As such, it uses objects such as MachineDeployments (similar to Deployments for pods) that take care of creating the requested number of machines based on templates. The user can increase the replicas, triggering the creation of new machines based on the provided templates. Given how the KubeadmControlPlane and MachineDeployment features in Cluster API work, it is not possible to provide static IP addresses for each machine before the actual deployment.
In addition, all the resources from the source cluster must support the CAPI pivoting, i.e. being copied and recreated in the target cluster. This means that all objects must contain all needed information in their spec field to recreate the status in the target cluster without losing information. All objects must, through a tree of owner references, be attached to the cluster object, for the pivoting to proceed properly.
Moreover, there are use cases where users want to specify multiple non-contiguous ranges of IP addresses, use the same pool across multiple Template objects, or rule out some IP addresses that might be in use for any reason after the deployment.
The IPAM is introduced to manage the allocation of IP addresses from subnets according to requests, without handling any use of those addresses. It adds flexibility by providing the address right before the node is provisioned. It can share a pool across MachineDeployment or KubeadmControlPlane, allow non-contiguous pools and external IP management by using IPAddress CRs, offer predictable IP addresses, and it is resilient to the clusterctl move operation.
In order to use IPAM, both the CAPI and IPAM controllers are required, since the IPAM controller has a dependency on Cluster API Cluster objects.
IPAM components
- IPPool: A set of IP address pools to be used for IP address allocations
- IPClaim: Request for an IP address allocation
- IPAddress: IP address allocation
IPPool
Example of IPPool:
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool1
  namespace: default
spec:
  clusterName: cluster1
  namePrefix: test1-prov
  pools:
    - start: 192.168.0.10
      end: 192.168.0.30
      prefix: 25
      gateway: 192.168.0.1
    - subnet: 192.168.1.1/26
    - subnet: 192.168.1.128/25
  prefix: 24
  gateway: 192.168.1.1
  preAllocations:
    claim1: 192.168.0.12
The spec field contains the following fields:
- clusterName: Name of the cluster to which this pool belongs, it is used to verify whether the resource is paused.
- namePrefix: The prefix used to generate the IPAddress.
- pools: List of IP address pools
- prefix: Default prefix for this IPPool
- gateway: Default gateway for this IPPool
- preAllocations: Default preallocated IP address for this IPPool
The prefix and gateway can be overridden per pool. Here is the pool definition:
- start: IP range start address; can be omitted if subnet is set.
- end: IP range end address; can be omitted.
- subnet: Subnet for the allocation; can be omitted if start is set. It is used to verify that the allocated address belongs to this subnet.
- prefix: Override of the default prefix for this pool
- gateway: Override of the default gateway for this pool
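The subnet check mentioned in the pool definition can be illustrated with Python's standard ipaddress module. This is only a sketch of the idea, with a hypothetical helper name; it is not how the IPAM controller itself is implemented. Note that the example subnets above are written with host bits set (192.168.1.1/26), so the sketch parses them with strict=False.

```python
import ipaddress


def in_subnet(address, subnet):
    """Return True if address belongs to subnet (given in CIDR notation)."""
    # strict=False accepts values such as 192.168.1.1/26, where host bits are set.
    network = ipaddress.ip_network(subnet, strict=False)
    return ipaddress.ip_address(address) in network


print(in_subnet("192.168.1.40", "192.168.1.1/26"))     # inside 192.168.1.0-63
print(in_subnet("192.168.1.70", "192.168.1.1/26"))     # outside that range
print(in_subnet("192.168.1.200", "192.168.1.128/25"))  # inside 192.168.1.128-255
```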
IPClaim
An IPClaim is an object representing a request for an IP address allocation.
Example of IPClaim:
apiVersion: ipam.metal3.io/v1alpha1
kind: IPClaim
metadata:
  name: test1-controlplane-template-0-pool1
  namespace: default
spec:
  pool:
    name: pool1
    namespace: default
The spec field contains the following:
- pool: Reference to the IPPool from which an address is requested
IPAddress
An IPAddress is an object representing an IP address allocation. It is created by IPAM to fulfil an IPClaim, so that the user does not have to create it manually.
Example IPAddress:
apiVersion: ipam.metal3.io/v1alpha1
kind: IPAddress
metadata:
  name: test1-prov-192-168-0-13
  namespace: default
spec:
  pool:
    name: pool1
    namespace: default
  claim:
    name: test1-controlplane-template-0-pool1
    namespace: default
  address: 192.168.0.13
  prefix: 24
  gateway: 192.168.0.1
The spec field contains the following:
- pool: Reference to the IPPool this address is for
- claim: Reference to the IPClaim this address is for
- address: Allocated IP address
- prefix: Prefix for this address
- gateway: Gateway for this address
Installing IPAM as Deployment
This section will show how IPAM can be installed as a deployment in a cluster.
Deploying controllers
The CAPI and IPAM controllers need to be deployed first. The IPAM controller has a dependency on Cluster API Cluster objects, so the CAPI CRDs and controllers must be deployed and the cluster objects should exist for a successful deployment.
Deployment
The user can create the IPPool object independently. It will wait for its cluster to exist before reconciling. If the user wants to create IPAddress objects manually, they should be created before any claims. It is highly recommended to use the preAllocations field instead, or to have the reconciliation paused.
After an IPClaim object is created, the controller lists all existing IPAddress objects. It then randomly selects an address that has not been allocated yet and is not in the preAllocations map. Finally, it creates an IPAddress object containing the references to the IPPool and IPClaim, the address, the prefix from the address pool or the default prefix, and the gateway from the address pool or the default gateway.
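The allocation step can be sketched in Python as follows. Several points are assumptions made for illustration rather than confirmed controller behaviour: the function names are hypothetical, only a start/end range is handled, a claim whose name appears in preAllocations is assumed to receive its reserved address, and the object name format (namePrefix followed by the address with dots replaced by dashes) is inferred from the IPAddress example above.

```python
import ipaddress
import random


def allocate(claim_name, start, end, allocated, pre_allocations):
    """Pick an address for claim_name from the inclusive range start..end.

    allocated:       addresses already held by existing IPAddress objects
    pre_allocations: mapping of claim name to reserved address
    """
    # Assumption: a pre-allocated claim gets exactly its reserved address.
    if claim_name in pre_allocations:
        return pre_allocations[claim_name]

    taken = set(allocated) | set(pre_allocations.values())
    first = int(ipaddress.ip_address(start))
    last = int(ipaddress.ip_address(end))
    free = [
        str(ipaddress.ip_address(value))
        for value in range(first, last + 1)
        if str(ipaddress.ip_address(value)) not in taken
    ]
    if not free:
        raise RuntimeError("no free address left in the pool")
    # Random choice among addresses that are neither allocated nor reserved.
    return random.choice(free)


def address_object_name(name_prefix, address):
    """Build an IPAddress object name (format inferred from the example)."""
    return f"{name_prefix}-{address.replace('.', '-')}"


addr = allocate("worker-0", "192.168.0.10", "192.168.0.12",
                allocated=["192.168.0.10"],
                pre_allocations={"claim1": "192.168.0.12"})
print(addr, address_object_name("test1-prov", addr))
```

In this usage only 192.168.0.11 is both unallocated and unreserved, so the random choice has a single candidate.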
Deploy IPAM
Deploy the IPAM CRDs and IPAM controllers by running the following Makefile target from inside the cloned IPAM git repo:
make deploy
Run locally
Run the IPAM controller locally:
kubectl scale -n capm3-system deployment.v1.apps/metal3-ipam-controller-manager \
--replicas 0
make run
Deploy an example pool
make deploy-examples
Delete the example pool
make delete-examples
Deletion
When deleting an IPClaim object, the controller will simply delete the associated IPAddress object. Once all IPAddress objects have been deleted, the IPPool object can be deleted. Before that point, the finalizer in the IPPool object will block the deletion.
References
- IPAM.
- IPAM deployment workflow.
- Custom resource (CR) examples in metal3-dev-env, in the templates.
Trying Metal3 on a development environment
Ready to start taking steps towards your first experience with metal3? Follow these commands to get started!
1. Environment Setup
Note: For the v1alpha3 release, the Cluster API provider for Metal3 was renamed from Cluster API provider BareMetal (CAPBM) to Cluster API provider Metal3 (CAPM3). From v1alpha3 onwards it is therefore called Cluster API provider Metal3.
1.1. Prerequisites
- System with CentOS 9 Stream or Ubuntu 22.04
- Bare metal preferred, as we will be creating VMs to emulate bare metal hosts
- Run as a user with passwordless sudo access
- Minimum resource requirements for the host machine: 4 CPU cores, 16 GB RAM
For execution with VMs
- Set up passwordless sudo access
$ sudo visudo
- Include this line at the end of the sudoers file
username ALL=(ALL) NOPASSWD: ALL
- Save and exit
- Manually enable nested virtualization if you don’t have it enabled in your VM
1.2. Setup
Note: If you need detailed information about creating a Metal³ emulated environment using metal3-dev-env, take a look at the blog post “A detailed walkthrough of the Metal³ development environment”.
This is a high-level architecture of the Metal³-dev-env. Note that for an Ubuntu-based setup, either Kind or Minikube can be used to instantiate an ephemeral cluster, while for a CentOS-based setup, only Minikube is currently supported. The ephemeral cluster creation tool can be selected with the EPHEMERAL_CLUSTER environment variable.
The short version is: clone metal³-dev-env and run
make
The Makefile runs a series of scripts, described here:
- 01_prepare_host.sh - Installs all needed packages.
- 02_configure_host.sh - Creates a set of VMs that will be managed as if they were bare metal hosts. It also downloads some images needed for Ironic.
- 03_launch_mgmt_cluster.sh - Launches a management cluster using minikube or kind and runs the baremetal-operator on that cluster.
- 04_verify.sh - Runs a set of tests that verify that the deployment was completed successfully.
When the environment setup is completed, you should be able to see the BareMetalHost (bmh) objects in the Ready state.
1.3. Tear Down
To tear down the environment, run
make clean
Note: When redeploying metal³-dev-env with a different release version of CAPM3, you must set the FORCE_REPO_UPDATE variable in config_${user}.sh to true.
Warning: If you see this error during the installation:
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied
You may need to log out then log in again, and run make clean and make again.
1.4. Using Custom Image
If you want to run target cluster Nodes with your own image, you can override the following three variables: IMAGE_NAME, IMAGE_LOCATION, IMAGE_USERNAME. If the requested image with the name IMAGE_NAME does not exist in the IRONIC_IMAGE_DIR (/opt/metal3-dev-env/ironic/html/images) folder, it is automatically downloaded from the configured IMAGE_LOCATION value.
1.5. Setting environment variables
Note: More information about the specific environment variables used to set up metal3-dev-env can be found in the metal3-dev-env repository.
To set environment variables persistently, export them from the configuration file used by metal³-dev-env scripts:
cp config_example.sh config_$(whoami).sh
vim config_$(whoami).sh
2. Working with the Development Environment
2.1. BareMetalHosts
This environment creates a set of VMs to manage as if they were bare metal hosts.
There are two host OSs on which the metal3-dev-env setup process is tested:
- Host VM/Server on CentOS, while the target can be Ubuntu, CentOS, Cirros, or FCOS.
- Host VM/Server on Ubuntu, while the target can be Ubuntu, CentOS, Cirros, or FCOS.
The way the k8s cluster runs differs between these two scenarios. On CentOS, a minikube cluster is used as the source cluster; on Ubuntu, a kind cluster is created.
As such, when the OS of the host (where the make command was issued) is CentOS, there should be three libvirt VMs, one of which is the minikube VM.
When the host OS is Ubuntu, the k8s source cluster is created using kind, so the minikube VM is not present.
The EPHEMERAL_CLUSTER environment variable configures which tool is used to create the source k8s cluster.
By default, EPHEMERAL_CLUSTER is set to build a minikube cluster on a CentOS host and a kind cluster on an Ubuntu host.
VMs can be listed using the virsh CLI tool.
When the EPHEMERAL_CLUSTER environment variable is set to kind, the list of running virtual machines looks like this:
$ sudo virsh list
Id Name State
--------------------------
1 node_0 running
2 node_1 running
When the EPHEMERAL_CLUSTER environment variable is set to minikube, the list of running virtual machines looks like this:
$ sudo virsh list
Id Name State
--------------------------
1 minikube running
2 node_0 running
3 node_1 running
Each of the VMs (aside from the minikube management cluster VM) is represented by a BareMetalHost object in our management cluster. The yaml definition file used to create these host objects is ${WORKING_DIR}/bmhosts_crs.yaml.
$ kubectl get baremetalhosts -n metal3 -o wide
NAME STATUS STATE CONSUMER BMC HARDWARE_PROFILE ONLINE ERROR AGE
node-0 OK available ipmi://192.168.111.1:6230 unknown true 58m
node-1 OK available redfish+http://192.168.111.1:8000/redfish/v1/Systems/492fcbab-4a79-40d7-8fea-a7835a05ef4a unknown true 58m
You can also look at the details of a host, including the hardware information gathered by doing pre-deployment introspection.
$ kubectl get baremetalhost -n metal3 -o yaml node-0
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"metal3.io/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{},"name":"node-0","namespace":"metal3"},"spec":{"bmc":{"address":"ipmi://192.168.111.1:6230","credentialsName":"node-0-bmc-secret"},"bootMACAddress":"00:ee:d0:b8:47:7d","bootMode":"legacy","online":true}}
  creationTimestamp: "2021-07-12T11:04:10Z"
  finalizers:
    - baremetalhost.metal3.io
  generation: 1
  name: node-0
  namespace: metal3
  resourceVersion: "3243"
  uid: 3bd8b945-a3e8-43b9-b899-2f869680d28c
spec:
  automatedCleaningMode: metadata
  bmc:
    address: ipmi://192.168.111.1:6230
    credentialsName: node-0-bmc-secret
  bootMACAddress: 00:ee:d0:b8:47:7d
  bootMode: legacy
  online: true
status:
  errorCount: 0
  errorMessage: ""
  goodCredentials:
    credentials:
      name: node-0-bmc-secret
      namespace: metal3
    credentialsVersion: "1789"
  hardware:
    cpu:
      arch: x86_64
      clockMegahertz: 2694
      count: 2
      flags:
        - aes
        - apic
        # There are many more flags but they are not listed in this example.
      model: Intel Xeon E3-12xx v2 (Ivy Bridge)
    firmware:
      bios:
        date: 04/01/2014
        vendor: SeaBIOS
        version: 1.13.0-1ubuntu1.1
    hostname: node-0
    nics:
      - ip: 172.22.0.20
        mac: 00:ee:d0:b8:47:7d
        model: 0x1af4 0x0001
        name: enp1s0
        pxe: true
      - ip: fe80::1863:f385:feab:381c%enp1s0
        mac: 00:ee:d0:b8:47:7d
        model: 0x1af4 0x0001
        name: enp1s0
        pxe: true
      - ip: 192.168.111.20
        mac: 00:ee:d0:b8:47:7f
        model: 0x1af4 0x0001
        name: enp2s0
      - ip: fe80::521c:6a5b:f79:9a75%enp2s0
        mac: 00:ee:d0:b8:47:7f
        model: 0x1af4 0x0001
        name: enp2s0
    ramMebibytes: 4096
    storage:
      - hctl: "0:0:0:0"
        model: QEMU HARDDISK
        name: /dev/sda
        rotational: true
        serialNumber: drive-scsi0-0-0-0
        sizeBytes: 53687091200
        type: HDD
        vendor: QEMU
    systemVendor:
      manufacturer: QEMU
      productName: Standard PC (Q35 + ICH9, 2009)
  hardwareProfile: unknown
  lastUpdated: "2021-07-12T11:08:53Z"
  operationHistory:
    deprovision:
      end: null
      start: null
    inspect:
      end: "2021-07-12T11:08:23Z"
      start: "2021-07-12T11:04:55Z"
    provision:
      end: null
      start: null
    register:
      end: "2021-07-12T11:04:55Z"
      start: "2021-07-12T11:04:44Z"
  operationalStatus: OK
  poweredOn: true
  provisioning:
    ID: 8effe29b-62fe-4fb6-9327-a3663550e99d
    bootMode: legacy
    image:
      url: ""
    rootDeviceHints:
      deviceName: /dev/sda
    state: ready
  triedCredentials:
    credentials:
      name: node-0-bmc-secret
      namespace: metal3
    credentialsVersion: "1789"
2.2. Provision Cluster and Machines
This section describes how to trigger the provisioning of a cluster and hosts via Machine objects as part of the Cluster API integration. This uses Cluster API v1beta1 and assumes that metal3-dev-env is deployed with the environment variable CAPM3_VERSION set to v1beta1. This is the default behaviour. The v1beta1 deployment can be done with Ubuntu 22.04 or CentOS 9 Stream target host images. Please make sure to meet the resource requirements for a successful deployment.
See support version for more on CAPI compatibility
The following scripts can be used to provision a cluster, controlplane node and worker node.
./tests/scripts/provision/cluster.sh
./tests/scripts/provision/controlplane.sh
./tests/scripts/provision/worker.sh
At this point, the Machine actuator will respond and try to claim a BareMetalHost for this Metal3Machine. You can check the logs of the actuator.
First, check the names of the pods running in the baremetal-operator-system namespace; the output should be similar to this:
$ kubectl -n baremetal-operator-system get pods
NAME READY STATUS RESTARTS AGE
baremetal-operator-controller-manager-5fd4fb6c8-c9prs 2/2 Running 0 71m
To get the logs of the actuator, query the logs of the baremetal-operator-controller-manager instance with the following command:
$ kubectl logs -n baremetal-operator-system pod/baremetal-operator-controller-manager-5fd4fb6c8-c9prs -c manager
...
{"level":"info","ts":1642594214.3598707,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"metal3/node-1", "provisioningState":"provisioning","requeue":true,"after":10}
...
Keep in mind that the suffix hashes, e.g. 5fd4fb6c8-c9prs, are automatically generated and will differ between deployments.
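Because the pod suffix changes between deployments, it can be convenient to address the logs through the Deployment instead of the pod. The following is a minimal sketch, assuming the Deployment is named baremetal-operator-controller-manager (the name used by the scale command in section 2.4) and that the container is called manager as above. The function name bmo_logs is hypothetical.

```shell
# Hypothetical helper: fetch the actuator logs through the Deployment,
# so the generated pod suffix does not have to be looked up first.
# Extra arguments (for example --tail=20 or -f) are passed on to kubectl logs.
bmo_logs() {
  kubectl logs -n baremetal-operator-system \
    deployment/baremetal-operator-controller-manager -c manager "$@"
}

# Usage (requires access to the cluster):
#   bmo_logs --tail=20
```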
If you look at the YAML representation of the Metal3Machine object, you will see a new annotation that identifies which BareMetalHost was chosen to satisfy this Metal3Machine request.
First list the Metal3Machine objects present in the metal3 namespace:
$ kubectl get metal3machines -n metal3
NAME PROVIDERID READY CLUSTER PHASE
test1-controlplane-jjd9l metal3://d4848820-55fd-410a-b902-5b2122dd206c true test1
test1-workers-bx4wp metal3://ee337588-be96-4d5b-95b9-b7375969debd true test1
Based on the name of the Metal3Machine objects, you can check the YAML representation of the object and see from its annotation which BareMetalHost was chosen.
$ kubectl get metal3machine test1-workers-bx4wp -n metal3 -o yaml
...
annotations:
metal3.io/BareMetalHost: metal3/node-1
...
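To read only the annotation rather than the full YAML, a JSONPath query can be used. This is a sketch under the assumption that the annotation key is metal3.io/BareMetalHost, as in the output above. The helper name m3m_host and its optional namespace argument are hypothetical.

```shell
# Hypothetical helper: print only the BareMetalHost annotation of a Metal3Machine.
# The dot inside the annotation key is escaped for kubectl's JSONPath parser.
# $1 = Metal3Machine name, $2 = namespace (defaults to metal3).
m3m_host() {
  kubectl get metal3machine "$1" -n "${2:-metal3}" \
    -o jsonpath='{.metadata.annotations.metal3\.io/BareMetalHost}'
}

# Usage (requires access to the cluster):
#   m3m_host test1-workers-bx4wp
```

For the example object shown above, this should print metal3/node-1.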
You can also see in the list of BareMetalHosts that one of the hosts is now provisioned and associated with a Metal3Machine by looking at the CONSUMER output column of the following command:
$ kubectl get baremetalhosts -n metal3
NAME STATE CONSUMER ONLINE ERROR AGE
node-0 provisioned test1-controlplane-jjd9l true 122m
node-1 provisioned test1-workers-bx4wp true 122m
It is also possible to check which Metal3Machine serves as the infrastructure for the ClusterAPI Machine objects.
First list the Machine objects:
$ kubectl get machine -n metal3
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
test1-6d8cc5965f-wvzms test1 test1-6d8cc5965f-wvzms metal3://7f51f14b-7701-436a-85ba-7dbc7315b3cb Running 53m v1.22.3
test1-nphjx test1 test1-nphjx metal3://14fbcd25-4d09-4aca-9628-a789ba3e175c Running 55m v1.22.3
As a next step, you can check what serves as the infrastructure backend for a given Machine object, e.g. test1-6d8cc5965f-wvzms:
$ kubectl get machine test1-6d8cc5965f-wvzms -n metal3 -o yaml
...
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3Machine
name: test1-workers-bx4wp
namespace: metal3
uid: 39362b32-ebb7-4117-9919-67510ceb177f
...
Based on the result of the query, the test1-6d8cc5965f-wvzms ClusterAPI Machine object is backed by the test1-workers-bx4wp Metal3Machine object.
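The same mapping can be listed for all Machine objects at once with custom columns. This sketch assumes the infrastructureRef block shown above sits under spec, as it does in the Cluster API v1beta1 Machine type. The helper name machine_backends and the column titles are hypothetical.

```shell
# Hypothetical helper: list every Machine together with the kind and name
# of the infrastructure object that backs it.
# $1 = namespace (defaults to metal3).
machine_backends() {
  kubectl get machines -n "${1:-metal3}" \
    -o custom-columns='MACHINE:.metadata.name,KIND:.spec.infrastructureRef.kind,BACKEND:.spec.infrastructureRef.name'
}

# Usage (requires access to the cluster):
#   machine_backends
```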
You should be able to ssh into your host once provisioning is completed. The default username for both CentOS & Ubuntu images is metal3. For the IP address, you can either use the API endpoint IP of the target cluster, which is 192.168.111.249 by default, or use the predictable IP address of the first master node, 192.168.111.100.
ssh metal3@192.168.111.249
2.3. Deprovision Cluster and Machines
Deprovisioning of the target cluster is done by deleting the Cluster and Machine objects, or by executing the de-provisioning scripts in the reverse order of provisioning:
./tests/scripts/deprovision/worker.sh
./tests/scripts/deprovision/controlplane.sh
./tests/scripts/deprovision/cluster.sh
Note that you can easily de-provision worker Nodes by decreasing the number of replicas in the MachineDeployment object created when executing the provision/worker.sh script:
kubectl scale machinedeployment test1 -n metal3 --replicas=0
Warning: The control plane and the cluster are tightly coupled. This means that you cannot de-provision the control plane of a cluster and then provision a new one within the same cluster. Therefore, if you want to de-provision the control plane, you need to de-provision the cluster as well and provision both again.
The following shows how de-provisioning can be executed more manually by deleting the relevant Custom Resources (CRs).
The order of deletion is:
- Machine objects of the workers
- Metal3Machine objects of the workers
- Machine objects of the control plane
- Metal3Machine objects of the control plane
- The cluster object
An additional detail is that the Machine object test1-workers-bx4wp is controlled by the test1 MachineDeployment object. Thus, to avoid reprovisioning of the Machine object, the MachineDeployment has to be deleted instead of the Machine object in the case of test1-workers-bx4wp.
$ # By deleting the Machine or MachineDeployment object the related Metal3Machine object(s) should be deleted automatically.
$ kubectl delete machinedeployment test1 -n metal3
machinedeployment.cluster.x-k8s.io "test1" deleted
$ # The "machinedeployment.cluster.x-k8s.io "test1" deleted" output will be visible almost instantly but that doesn't mean that the related Machine
$ # object(s) has been deleted right away, after the deletion command is issued the Machine object(s) will enter a "Deleting" state and they could stay in that state for minutes
$ # before they are fully deleted.
$ kubectl delete machine test1-m77bn -n metal3
machine.cluster.x-k8s.io "test1-m77bn" deleted
$ # When a Machine object is deleted directly and not by deleting a MachineDeployment the "machine.cluster.x-k8s.io "test1-m77bn" deleted" will be only visible when the Machine and the
$ # related Metal3Machine object has been fully removed from the cluster. The deletion process could take a few minutes thus the command line will be unresponsive (blocked) for the time being.
$ kubectl delete cluster test1 -n metal3
cluster.cluster.x-k8s.io "test1" deleted
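Since deleting a MachineDeployment returns before the related Machine objects are gone, a script may want to block until they have actually been removed. The following is a sketch using kubectl wait. The helper name wait_machines_deleted and the default timeout of 600s are assumptions, and depending on the kubectl version the command may report an error if no Machine objects exist at all.

```shell
# Hypothetical helper: block until all Machine objects in the namespace are gone,
# or until the timeout expires (kubectl wait then exits with a non-zero status).
# $1 = namespace (defaults to metal3), $2 = timeout (defaults to 600s).
wait_machines_deleted() {
  kubectl wait --for=delete machine --all -n "${1:-metal3}" --timeout="${2:-600s}"
}

# Usage (requires access to the cluster):
#   kubectl delete machinedeployment test1 -n metal3
#   wait_machines_deleted
```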
Once the deletion has finished, you can see that the BareMetalHosts are offline and the Cluster object is no longer present:
$ kubectl get baremetalhosts -n metal3
NAME STATE CONSUMER ONLINE ERROR AGE
node-0 available false 160m
node-1 available false 160m
$ kubectl get cluster -n metal3
No resources found in metal3 namespace.
2.4. Running Custom Baremetal-Operator
The baremetal-operator comes up running in the cluster by default, using an image built from the metal3-io/baremetal-operator repository. If you’d like to test changes to the baremetal-operator, you can follow this process.
First, you must scale down the deployment of the baremetal-operator running in the cluster.
kubectl scale deployment baremetal-operator-controller-manager -n baremetal-operator-system --replicas=0
To be able to run the baremetal-operator locally, you need to install operator-sdk. After that, you can run the baremetal-operator including any custom changes.
cd ~/go/src/github.com/metal3-io/baremetal-operator
make run
2.5. Running Custom Cluster API Provider Metal3
There are two Cluster API-related managers running in the cluster. One includes a set of generic controllers, and the other includes a custom Machine controller for Metal3.
Tilt development environment
The Tilt setup can deploy CAPM3 in a local kind cluster. Since Tilt is applied in the metal3-dev-env deployment, you can make changes inside the cluster-api-provider-metal3 folder and Tilt will deploy the changes automatically.
If you deployed CAPM3 separately and want to make changes to it, then
follow CAPM3 instructions. This will save you from
having to build all of the images for CAPI, which can take a while. If the
scope of your development will span both CAPM3 and CAPI, then follow the
CAPI and CAPM3 instructions.
2.6. Accessing Ironic API
Sometimes you may want to look directly at Ironic to debug something. The metal3-dev-env repository contains a clouds.yaml file with connection settings for Ironic.
Metal3-dev-env will install the unified OpenStack and standalone OpenStack Ironic command-line clients on the provisioning host as part of setting up the cluster.
Note that currently, you can use either a unified OpenStack client or an Ironic client. In this example, we are using an Ironic client to interact with the Ironic API.
Please make sure to export the CONTAINER_RUNTIME environment variable before you execute commands.
Example:
[notstack@metal3 metal3-dev-env]$ export CONTAINER_RUNTIME=docker
[notstack@metal3 metal3-dev-env]$ baremetal node list
+--------------------------------------+---------------+--------------------------------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+---------------+--------------------------------------+-------------+--------------------+-------------+
| b423ee9c-66d8-48dd-bd6f-656b93140504 | metal3~node-1 | 7f51f14b-7701-436a-85ba-7dbc7315b3cb | power off | available | False |
| 882533c5-2f14-49f6-aa44-517e1e404fd8 | metal3~node-0 | 14fbcd25-4d09-4aca-9628-a789ba3e175c | power off | available | False |
+--------------------------------------+---------------+--------------------------------------+-------------+--------------------+-------------+
To view a particular node’s details, run the command below. The last_error, maintenance_reason, and provision_state fields are useful for troubleshooting to find out why a node did not deploy.
[notstack@metal3 metal3-dev-env]$ baremetal node show b423ee9c-66d8-48dd-bd6f-656b93140504
+------------------------+------------------------------------------------------------+
| Field | Value |
+------------------------+------------------------------------------------------------+
| allocation_uuid | None |
| automated_clean | True |
| bios_interface | redfish |
| boot_interface | ipxe |
| chassis_uuid | None |
| clean_step | {} |
| conductor | 172.22.0.2 |
| conductor_group | |
| console_enabled | False |
| console_interface | no-console |
| created_at | 2022-01-19T10:56:06+00:00 |
| deploy_interface | direct |
| deploy_step | {} |
| description | None |
| driver | redfish |
| driver_info | {u'deploy_kernel': u'http://172.22.0.2:6180/images/ironic-python-agent.kernel', u'deploy_ramdisk': u'http://172.22.0.2:6180/images/ironic-python-agent.initramfs', u'redfish_address': u'http://192.168.111.1:8000', u'redfish_password': u'******', u'redfish_system_id': u'/redfish/v1/Systems/492fcbab-4a79-40d7-8fea-a7835a05ef4a', u'redfish_username': u'admin', u'force_persistent_boot_device': u'Default'} |
| driver_internal_info | {u'last_power_state_change': u'2022-01-19T13:04:01.981882', u'agent_version': u'8.3.1.dev2', u'agent_last_heartbeat': u'2022-01-19T13:03:51.874842', u'clean_steps': None, u'agent_erase_devices_iterations': 1, u'agent_erase_devices_zeroize': True, u'agent_continue_if_secure_erase_failed': False, u'agent_continue_if_ata_erase_failed': False, u'agent_enable_nvme_secure_erase': True, u'disk_erasure_concurrency': 1, u'agent_erase_skip_read_only': False, u'hardware_manager_version': {u'generic_hardware_manager': u'1.1'}, u'agent_cached_clean_steps_refreshed': u'2022-01-19 13:03:47.558697', u'deploy_steps': None, u'agent_cached_deploy_steps_refreshed': u'2022-01-19 12:09:34.731244'} |
| extra | {} |
| fault | None |
| inspect_interface | inspector |
| inspection_finished_at | None |
| inspection_started_at | 2022-01-19T10:56:17+00:00 |
| instance_info | {u'capabilities': {}, u'image_source': u'http://172.22.0.1/images/CENTOS_8_NODE_IMAGE_K8S_v1.22.3-raw.img', u'image_os_hash_algo': u'md5', u'image_os_hash_value': u'http://172.22.0.1/images/CENTOS_8_NODE_IMAGE_K8S_v1.22.3-raw.img.md5sum', u'image_checksum': u'http://172.22.0.1/images/CENTOS_8_NODE_IMAGE_K8S_v1.22.3-raw.img.md5sum', u'image_disk_format': u'raw'} |
| instance_uuid | None |
| last_error | None |
| lessee | None |
| maintenance | False |
| maintenance_reason | None |
| management_interface | redfish |
| name | metal3~node-1 |
| network_data | {} |
| network_interface | noop |
| owner | None |
| power_interface | redfish |
| power_state | power off |
| properties | {u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages:true,boot_mode:bios', u'vendor': u'Sushy Emulator', u'local_gb': u'50', u'cpus': u'2', u'cpu_arch': u'x86_64', u'memory_mb': u'4096', u'root_device': {u'name': u's== /dev/sda'}} |
| protected | False |
| protected_reason | None |
| provision_state | available |
| provision_updated_at | 2022-01-19T13:03:52+00:00 |
| raid_config | {} |
| raid_interface | no-raid |
| rescue_interface | no-rescue |
| reservation | None |
| resource_class | None |
| retired | False |
| retired_reason | None |
| storage_interface | noop |
| target_power_state | None |
| target_provision_state | None |
| target_raid_config | {} |
| traits | [] |
| updated_at | 2022-01-19T13:04:03+00:00 |
| uuid | b423ee9c-66d8-48dd-bd6f-656b93140504 |
| vendor_interface | redfish |
+------------------------+------------------------------------------------------------+
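When only the troubleshooting fields are of interest, the output can be narrowed down. This is a sketch assuming the baremetal client in your environment accepts the --fields option of node show. The helper name node_debug is hypothetical.

```shell
# Hypothetical helper: show only the fields most useful for troubleshooting.
# The node UUID (or name) comes first because --fields takes a list of values.
node_debug() {
  baremetal node show "$1" --fields last_error maintenance_reason provision_state
}

# Usage (requires the Ironic client set up by metal3-dev-env):
#   node_debug b423ee9c-66d8-48dd-bd6f-656b93140504
```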
Supported release versions
The Cluster API Provider Metal3 (CAPM3) team maintains the two most recent minor releases; older minor releases are immediately unsupported when a new major/minor release is available. Test coverage will be maintained for all supported minor releases and for one additional release for the current API version in case we have to do an emergency patch release. For example, if v1.4 and v1.5 are currently supported, we will also maintain test coverage for v1.3 for one additional release cycle. When v1.6 is released, tests for v1.3 will be removed.
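The policy above can be sketched as a small function. It assumes all releases share the same major version and API version, and takes only the minor numbers as input. The function name release_status and the "unreleased" label are hypothetical.

```shell
# Hypothetical helper: classify a minor release relative to the latest minor release.
# The two most recent minors are Supported, the one before that is still Tested,
# and everything older is EOL.
# Usage: release_status <latest_minor> <minor>
release_status() {
  age=$(( $1 - $2 ))
  if [ "$age" -lt 0 ]; then
    echo "unreleased"
  elif [ "$age" -le 1 ]; then
    echo "Supported"
  elif [ "$age" -eq 2 ]; then
    echo "Tested"
  else
    echo "EOL"
  fi
}

release_status 5 3   # with v1.5 as the latest release, v1.3 is "Tested"
release_status 6 3   # once v1.6 is released, v1.3 becomes "EOL"
```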
Currently, in the Metal³ organization, only CAPM3 and IPAM follow CAPI release cycles. The supported versions (excluding release candidates) for CAPM3 and IPAM releases are as follows:
Cluster API Provider Metal3
Minor release | API version | Status |
---|---|---|
v1.5 | v1beta1 | Supported |
v1.4 | v1beta1 | Supported |
v1.3 | v1beta1 | Tested |
v1.2 | v1beta1 | EOL |
v1.1 | v1beta1 | EOL |
IP Address Manager
Minor release | API version | Status |
---|---|---|
v1.5 | v1beta1 | Supported |
v1.4 | v1beta1 | Supported |
v1.3 | v1beta1 | Tested |
v1.2 | v1alpha1 | EOL |
v1.1 | v1alpha1 | EOL |
The compatibility of IPAM and CAPM3 API versions with CAPI is discussed here.
Baremetal Operator
Since capm3-v1.1.2, BMO follows the semantic versioning scheme for its own release cycle, the same way as CAPM3 and IPAM. Currently, we have release-0.4 and release-0.3 release branches for the v0.4.x and v0.3.x release cycles respectively, and as such these two branches are maintained as supported releases.
The following table summarizes the BMO release/test process:
Minor release | Status |
---|---|
v0.4 | Supported |
v0.3 | Tested |
v0.2 | Tested |
v0.1 | EOL |
Image tags
The Metal³ team provides container images for all the main projects and also many auxiliary tools needed for tests or otherwise useful. Some of these images are tagged in a way that makes it easy to identify what version of Cluster API provider Metal³ they are tested with. For example, we tag Ironic and MariaDB container images with tags like capm3-v1.4.0, where v1.4.0 would be the CAPM3 release it was tested with. Prior to CAPM3 release v1.1.3 this also applied to the Baremetal Operator.
All container images are published through the Metal³ organization in Quay. Here are some examples:
- quay.io/metal3-io/cluster-api-provider-metal3:v1.5.0
- quay.io/metal3-io/baremetal-operator:v0.4.0
- quay.io/metal3-io/ip-address-manager:v1.5.0
- quay.io/metal3-io/ironic:capm3-v1.5.0
- quay.io/metal3-io/mariadb:capm3-v1.5.0
CI Test Matrix
The tables below describe which branches/image tags are tested in each periodic CI test:
INTEGRATION TESTS | CAPM3 branch | IPAM branch | BMO branch/tag | Keepalived tag | MariaDB tag | Ironic tag |
---|---|---|---|---|---|---|
daily_main_integration_test_ubuntu/centos | main | main | main | latest | latest | latest |
daily_main_e2e_integration_test_ubuntu/centos | main | main | main | latest | latest | latest |
daily_release-1-5_integration_test_ubuntu/centos | release-1.5 | release-1.5 | release-0.4 | v0.4.0 | latest | latest |
daily_release-1-4_integration_test_ubuntu/centos | release-1.4 | release-1.4 | release-0.3 | v0.3.1 | latest | latest |
daily_release-1-3_integration_test_ubuntu/centos | release-1.3 | release-1.3 | v0.2.0 | v0.2.0 | latest | latest |
FEATURE AND E2E TESTS | CAPM3 branch | IPAM branch | BMO branch/tag | Keepalived tag | MariaDB tag | Ironic tag |
---|---|---|---|---|---|---|
daily_main_e2e_feature_test_ubuntu/centos | main | main | main | latest | latest | latest |
daily_release-1-5_e2e_feature_test_ubuntu/centos | release-1.5 | release-1.5 | release-0.4 | v0.4.0 | latest | latest |
daily_release-1-4_e2e_feature_test_ubuntu/centos | release-1.4 | release-1.4 | release-0.3 | v0.3.1 | latest | latest |
daily_release-1-3_e2e_feature_test_ubuntu/centos | release-1.3 | release-1.3 | v0.2.0 | v0.2.0 | latest | latest |
daily_main_feature_tests_ubuntu/centos | main | main | main | latest | latest | latest |
EPHEMERAL TESTS | CAPM3 branch | IPAM branch | BMO branch/tag | Keepalived tag | MariaDB tag | Ironic tag |
---|---|---|---|---|---|---|
daily_main_e2e_ephemeral_test_centos | main | main | main | latest | latest | latest |
All tests use latest images of VBMC and sushy-tools.
Metal3-io security policy
This document explains the general security policy for the whole project. It therefore applies to all of the project’s active repositories, and this file has to be referenced in each repository’s SECURITY_CONTACTS file.
Way to report a security issue
The Metal3 Community asks that all suspected vulnerabilities be disclosed by reporting them to the metal3-security@googlegroups.com mailing list, which will forward the vulnerability report to the Metal3 security committee.
Security issue handling, severity categorization, fix process organization
The actions listed below should be completed within 7 days of the security issue’s disclosure on the metal3-security@googlegroups.com mailing list.
The Security Lead (SL) of the Metal3 Security Committee (M3SC) is tasked with reviewing the security issue disclosure and giving initial feedback to the reporter as soon as possible. Any disclosed security issue will be visible to all M3SC members.
For each reported vulnerability, the SL will work quickly to identify committee members who are able to work on a fix and CC those developers into the disclosure thread. These selected developers are the Fix Team. The Fix Team is also allowed to invite additional developers into the disclosure thread based on the repo’s OWNERS file. They will then also become members of the Fix Team but not the M3SC.
M3SC members are encouraged to volunteer to the Fix Teams even before the SL would contact them if they think they are ready to work on the issue. M3SC members are also encouraged to correct both the SL and each other on the disclosure threads even if they have not been selected to the Fix Team but after reading the disclosure thread they were able to find mistakes.
The Fix Team will start working on the fix either on a private fork of the affected repo or in the public repo, depending on the severity of the issue and the decision of the SL. Based on the issue’s severity level (discussed later in this document), the SL makes the final call about whether the issue can be fixed publicly or should stay on a private fork until the fix is disclosed.
The SL and the Fix Team will create a CVSS score using the CVSS Calculator. The SL makes the final call on the calculated risk.
If the CVSS score is under ~4.0 (a low severity score) or the assessed risk is low, the Fix Team can decide to slow the release process down in the face of holidays, developer bandwidth, etc. These decisions must be discussed on the metal3-security@googlegroups.com mailing list.
If the CVSS score is under ~7.0 (a medium severity score), the SL may choose to carry out the fix semi-publicly. Semi-publicly means that PRs are made directly in the public Metal3-io repositories, while restricting discussion of the security aspects to private channels. The SL will make the determination whether there would be user harm in handling the fix publicly that outweighs the benefits of open engagement with the community.
If the CVSS score is over ~7.0 (high severity score), fixes will typically receive an out-of-band release.
More information can be found about severity scores here.
Note: CVSS is convenient but imperfect. Ultimately, the SL has discretion on classifying the severity of a vulnerability.
No matter the CVSS score, if the vulnerability requires User Interaction, or otherwise has a straightforward, non-disruptive mitigation, the SL may choose to disclose the vulnerability before a fix is developed if they determine that users would be better off being warned against a specific interaction.
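The score thresholds above can be summarised in a small helper. This is only an illustrative sketch: the function name cvss_handling is hypothetical, a score of exactly 7.0 is assumed to count as high (as in the CVSS v3.x rating scale), and the SL retains discretion over the final classification in any case.

```shell
# Hypothetical helper: map a CVSS base score to the handling described above.
# Assumption: a score of exactly 7.0 is treated as high severity.
# awk is used because POSIX shell arithmetic has no floating point support.
cvss_handling() {
  awk -v s="$1" 'BEGIN {
    if (s < 4.0)      print "low: release may be slowed down";
    else if (s < 7.0) print "medium: fix may be handled semi-publicly";
    else              print "high: out-of-band release";
  }'
}

cvss_handling 3.1   # low: release may be slowed down
cvss_handling 5.4   # medium: fix may be handled semi-publicly
cvss_handling 8.8   # high: out-of-band release
```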
Fix Disclosure Process
With the Fix Development underway the SL needs to come up with an overall communication plan for the wider community. This Disclosure process should begin after the Fix Team has developed a Fix or mitigation so that a realistic timeline can be communicated to users. Emergency releases for critical and high severity issues or fixes for issues already made public may affect the below timelines for how quickly or far in advance notifications will occur.
The SL will lead the process of creating a GitHub security advisory for the repository that is affected by the issue. If the SL has no administrator privileges, the advisory will be created in cooperation with a repository admin. The SL will have to request a CVE number for the security advisory. As GitHub is a CVE Numbering Authority (CNA), there is an option to either use an existing CVE number or request a new one from GitHub. More about the GitHub security advisory and the CVE numbering process can be found here.
The original reporter(s) of the security issue has to be notified about the release date of the fix and the advisory and about both the content of the fix and the advisory as soon as the SL has decided a date for the fix disclosure.
If a repository that has a release process requires a high severity fix then the fix has to be released as a patch release for all supported release branches where the fix is relevant as soon as possible.
In case the repository does not have a release process, but it needs a critical fix then the fix has to be merged to the main branch as soon as possible.
In repositories that have a release process Medium and Low severity vulnerability fixes will be released as part of the next upcoming minor or major release whichever happens sooner. Simultaneously with the upcoming release the fix also has to be released to all supported release branches as a patch release if the fix is relevant for given release.
If the fix was developed in a private repository, either the SL or someone designated by the SL has to cherry-pick the fix and push it to the public repository. The SL and the Fix Team have to be able to push the PR through the public repo’s review process as soon as possible and merge it.
Metal3 security committee members
Name | GitHub ID | Affiliation |
---|---|---|
Dmitry Tantsur | dtantsur | Red Hat |
Riccardo Pittau | elfosardo | Red Hat |
Zane Bitter | zaneb | Red Hat |
Furkat Gofurov | furkatgofurov7 | Ericsson Software Technology |
Kashif Khan | kashifest | Ericsson Software Technology |
Lennart Jern | lentzi90 | Ericsson Software Technology |
Tuomo Tanskanen | tuminoid | Ericsson Software Technology |
Adam Rozman | Rozzii | Ericsson Software Technology |
Please don’t report any security vulnerability to the committee members directly.