Upgrade from 1.4.x

This guide describes how to upgrade Burrito Aster to Burrito Begonia.

I assume Burrito Aster 1.4.x is already installed and running. This guide shows how to upgrade it to Burrito Begonia 2.0.9.

The examples in this guide use the following nodes.

Hostname    Management IP
control1    192.168.21.111
control2    192.168.21.112
control3    192.168.21.113
compute1    192.168.21.114
compute2    192.168.21.115

  • keepalived VIP: 192.168.21.110

The Kubernetes cluster component versions change as follows.

Components      Aster 1.4.x   Begonia 2.0.9
containerd      v1.7.7        v1.7.13
kubernetes      v1.28.3       v1.29.2
kubeadm         v1.28.3       v1.29.2
etcd            v3.5.9        v3.5.10
calico          v3.26.3       v3.26.4
coredns         v1.10.1       v1.11.1
helm            v3.13.1       v3.14.2
nerdctl         1.6.0         1.7.1
crictl          v1.28.0       v1.29.0
nodelocaldns    1.22.20       1.22.28
cert-manager    v1.11.1       v1.13.2
metallb         v0.13.9       v0.13.9
registry        2.8.2         2.8.3

Modify kube-apiserver manifest

First, modify the kube-apiserver manifest to prepare for the Kubernetes upgrade.

Set --anonymous-auth to true in /etc/kubernetes/manifests/kube-apiserver.yaml on every control node:

$ sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
...
    - --anonymous-auth=true

Wait until the kubelet restarts kube-apiserver on each control node.

Check that each kube-apiserver answers the health check:

$ curl -sk https://192.168.21.111:6443/healthz
ok
$ curl -sk https://192.168.21.112:6443/healthz
ok
$ curl -sk https://192.168.21.113:6443/healthz
ok
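The three curl calls above can be wrapped in a small loop. This is a minimal sketch, assuming the example control-node IPs from the table at the top of this guide; the helper name check_apiservers is hypothetical. It prints one line per node and returns non-zero if any kube-apiserver does not answer `ok`.

```shell
# Example control-node IPs from this guide; replace them with your own.
CONTROL_IPS="192.168.21.111 192.168.21.112 192.168.21.113"

# Hypothetical helper: query /healthz on each kube-apiserver.
# Returns 0 only if every node answers "ok".
check_apiservers() {
    failed=0
    for ip in $CONTROL_IPS; do
        # -s: silent, -k: skip certificate verification, -m 5: give up after 5 seconds
        status=$(curl -sk -m 5 "https://${ip}:6443/healthz")
        if [ "$status" = "ok" ]; then
            echo "${ip}: ok"
        else
            echo "${ip}: NOT ok (${status:-no response})"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `check_apiservers && echo "all kube-apiservers are healthy"` prints the per-node results followed by the summary line only when every node is healthy.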

Prepare the Begonia ISO

We will use the Burrito Begonia 2.0.9 ISO to upgrade the existing Burrito Aster cluster.

Mount burrito-2.0.9_8.9.iso on /mnt:

$ sudo mount -o loop,ro burrito-2.0.9_8.9.iso /mnt

Extract the burrito-2.0.9 tarball from the ISO:

$ tar xzf /mnt/burrito-2.0.9.tar.gz

Back up localrepo.cfg and registry.cfg in /etc/haproxy/conf.d/ by renaming them:

$ sudo mv /etc/haproxy/conf.d/localrepo.cfg \
            /etc/haproxy/conf.d/localrepo.cfg.bak
$ sudo mv /etc/haproxy/conf.d/registry.cfg \
            /etc/haproxy/conf.d/registry.cfg.bak

Reload haproxy.service on the first control node:

$ sudo systemctl reload haproxy.service

Run the prepare.sh script:

$ cd burrito-2.0.9
$ ./prepare.sh offline

Copy the following files from the existing burrito directory (e.g. $HOME/burrito-1.4.x):

$ cp $HOME/burrito-1.4.x/.vaultpass .
$ cp $HOME/burrito-1.4.x/group_vars/all/vault.yml group_vars/all/
$ cp $HOME/burrito-1.4.x/.offline_flag .

Edit hosts and vars.yml for your environment:

$ vi hosts
$ vi vars.yml

If you modified any group_vars/all/*.yml files for your environment, make the same changes here:

$ vi group_vars/all/netapp_vars.yml
$ vi group_vars/all/ceph_vars.yml

Check node connectivity:

$ ./run.sh ping

Check that keepalived_vip (192.168.21.110) is on the first control node:

$ ip -br a s dev eth1
eth1             UP             192.168.21.111/24 192.168.21.110/32 fe80::5054:ff:feeb:2b8b/64
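To script this check, you could test for the VIP with a small awk filter over the same `ip -br` output. This is a minimal sketch; the helper name vip_is_local is hypothetical, and the VIP and the interface name eth1 are taken from the example above, so adjust both for your nodes.

```shell
VIP="192.168.21.110"   # keepalived_vip from the example table
IFACE="eth1"           # management interface in the example; adjust as needed

# Hypothetical helper: succeed if $VIP is assigned to $IFACE on this node.
vip_is_local() {
    # `ip -br` prints: IFNAME STATE ADDR/PREFIX [ADDR/PREFIX ...]
    ip -br address show dev "$IFACE" | awk -v vip="$VIP" '
        {
            for (i = 3; i <= NF; i++) {
                split($i, parts, "/")    # drop the /prefix part
                if (parts[1] == vip) found = 1
            }
        }
        END { exit(found ? 0 : 1) }'
}
```

Run it on the first control node, for example `vip_is_local && echo "VIP is here" || echo "VIP is elsewhere"`.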

If it is not, move keepalived_vip to the first control node by restarting the keepalived service on the node that currently holds it. For example, if keepalived_vip is on the second control node, restart keepalived there:

$ sudo systemctl restart keepalived.service

keepalived_vip should then move to the first control node. Run the ip command above again to confirm.

Remove the registry, localrepo, and asklepios deployments:

$ sudo kubectl delete deploy registry localrepo asklepios -n kube-system
deployment.apps "registry" deleted
deployment.apps "localrepo" deleted
deployment.apps "asklepios" deleted

These pods will be recreated while upgrading.

Run the preflight playbook:

$ ./run.sh preflight

You are now ready to upgrade the Kubernetes cluster.

Upgrade Kubernetes

Run the k8s playbook with upgrade_cluster_setup=true:

$ ./run.sh k8s -e upgrade_cluster_setup=true

The playbook takes a long time to finish; it took about 52 minutes in my VM environment.

Check that the Kubernetes version is v1.29.2:

$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

Run the patch playbook:

$ ./run.sh patch

Run the registry playbook:

$ ./run.sh registry

Check that the new images (e.g. kube-apiserver:v1.29.2) have been added to the local registry:

$ curl -s 192.168.21.110:32680/v2/kube-apiserver/tags/list
{"name":"kube-apiserver","tags":["v1.28.3","v1.29.2"]}

Run the landing playbook:

$ ./run.sh landing

Check that the new images (e.g. kube-apiserver:v1.29.2) have been added to the genesis registry:

$ curl -s 192.168.21.110:6000/v2/kube-apiserver/tags/list
{"name":"kube-apiserver","tags":["v1.28.3","v1.29.2"]}
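Both registry checks query the same Docker Registry v2 endpoint (`/v2/<name>/tags/list`) and differ only in the port, so they can share one helper. This is a minimal sketch; the name registry_has_tag is hypothetical, and the plain-HTTP URL assumes the registries are reachable without TLS, as the curl commands above suggest.

```shell
# Hypothetical helper: succeed if REPO:TAG is listed by a Docker Registry v2 API.
# Usage: registry_has_tag HOST:PORT REPO TAG
registry_has_tag() {
    registry=$1
    repo=$2
    tag=$3
    # /v2/<name>/tags/list returns {"name":"...","tags":["...",...]}
    curl -s -m 10 "http://${registry}/v2/${repo}/tags/list" | grep -Fq "\"${tag}\""
}
```

For example, `registry_has_tag 192.168.21.110:32680 kube-apiserver v1.29.2 && echo present` checks the local registry, and the same call with port 6000 checks the genesis registry.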

Kubernetes upgrade is done!

Upgrade OpenStack

We will upgrade the OpenStack components one at a time.

The component versions change as follows.

Components   Aster 1.4.x    Begonia 2.0.9
ingress      v1.1.3         v1.8.2
mariadb      10.6.16        10.11.7
rabbitmq     3.11.28        3.12.11
memcached    1.6.17         1.6.22
libvirt      6.0.0          8.0.0
keystone     21.0.2.dev3    23.0.2.dev10
glance       24.2.2.dev1    26.0.1.dev2
placement    7.0.1          9.0.1
neutron      20.5.1.dev28   22.1.1.dev110
nova         25.3.1.dev1    27.2.1.dev19
cinder       20.3.3.dev2    22.1.2.dev10
horizon      22.1.0         23.1.1.dev14
btx          1.2.3          2.0.2

Before upgrading OpenStack

Complete the following tasks before the upgrade.

Stop all VM instances:

root@btx-0:/# o server stop <VM_NAME> [<VM_NAME> ...]
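Instead of typing every VM name, you can record which servers are active and stop them in one call, so that exactly the same set can be started again at the end of the OpenStack upgrade. This is a minimal sketch for the btx shell, assuming the `openstack` CLI is installed there (the `o` command in this guide appears to be a shorthand for it); the function names are hypothetical.

```shell
# Hypothetical helpers for the btx shell; assumes the openstack CLI is installed there.

# Print the IDs of all ACTIVE servers in every project, one per line.
list_active_vms() {
    openstack server list --all-projects --status ACTIVE -f value -c ID
}

# Hypothetical helper: read server IDs from stdin and stop them with a single CLI call.
stop_vms() {
    ids=$(cat)
    [ -n "$ids" ] || return 0
    # $ids is intentionally unquoted so that each ID becomes a separate argument.
    openstack server stop $ids
}

# Hypothetical helper: read server IDs from stdin and start them again.
start_vms() {
    ids=$(cat)
    [ -n "$ids" ] || return 0
    openstack server start $ids
}
```

For example, run `list_active_vms > /tmp/active_vms.txt && stop_vms < /tmp/active_vms.txt` now and `start_vms < active_vms.txt` after the upgrade. Because btx is uninstalled and reinstalled near the end of the upgrade, copy the list out of the pod first (for example with `kubectl cp openstack/btx-0:/tmp/active_vms.txt ./active_vms.txt`).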

Get the ID of each compute node:

root@btx-0:/# o hypervisor list
+--------------------------------------+---------------------+-----------------+----------------+-------+
| ID                                   | Hypervisor Hostname | Hypervisor Type | Host IP        | State |
+--------------------------------------+---------------------+-----------------+----------------+-------+
| 5febbf97-71dc-4ae0-a902-2217ee97cd3b | aster-compute       | QEMU            | 192.168.21.114 | up    |
+--------------------------------------+---------------------+-----------------+----------------+-------+

On each compute node, write that node's hypervisor ID to /var/lib/nova/compute_id:

$ echo 5febbf97-71dc-4ae0-a902-2217ee97cd3b | sudo -u nova tee /var/lib/nova/compute_id
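Each compute node needs its own hypervisor ID, and a mistyped value in this file is easy to miss. A small guard that refuses anything that is not a UUID can help. This is a minimal sketch; the helper names are hypothetical, and the lower-case UUID pattern is an assumption based on the IDs shown by `o hypervisor list` above.

```shell
COMPUTE_ID_FILE=/var/lib/nova/compute_id

# Hypothetical helper: succeed if $1 looks like a lower-case UUID (8-4-4-4-12 hex digits).
is_uuid() {
    printf '%s\n' "$1" | grep -Eqx '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
}

# Hypothetical helper: write the hypervisor ID of *this* compute node
# to $COMPUTE_ID_FILE as the nova user, after validating it.
write_compute_id() {
    if ! is_uuid "$1"; then
        echo "refusing to write '$1': not a UUID" >&2
        return 1
    fi
    echo "$1" | sudo -u nova tee "$COMPUTE_ID_FILE" > /dev/null
}
```

For example, on the compute node shown above: `write_compute_id 5febbf97-71dc-4ae0-a902-2217ee97cd3b`.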

(For NetApp NFS) If NetApp NFS is the default storage backend, preserve the volume behind the nova-instances PVC.

Set the reclaim policy of the PV bound to the nova-instances PVC to Retain:

$ NOVA_INSTANCES_PVC=$(kubectl get pvc nova-instances -n openstack \
    -o jsonpath='{.spec.volumeName}')
$ echo $NOVA_INSTANCES_PVC
pvc-cc0d533d-eaaf-4a8f-81a0-3e11d9720944
$ sudo kubectl patch pv $NOVA_INSTANCES_PVC -p \
    '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
persistentvolume/pvc-cc0d533d-eaaf-4a8f-81a0-3e11d9720944 patched
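Before uninstalling nova, it is worth confirming that the patch took effect: with the Delete reclaim policy, removing the PVC could also delete the NetApp volume that holds the instance disks. This is a minimal sketch with a hypothetical helper name that reads the policy back through kubectl's JSONPath output.

```shell
# Hypothetical helper: succeed if the given PV has persistentVolumeReclaimPolicy=Retain.
pv_is_retained() {
    policy=$(sudo kubectl get pv "$1" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}')
    echo "$1: ${policy:-unknown}"
    [ "$policy" = "Retain" ]
}
```

For example, `pv_is_retained "$NOVA_INSTANCES_PVC" || echo "do not uninstall nova yet"`.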

Uninstall the following components, which cannot be upgraded while they are running:

$ ./scripts/burrito.sh uninstall nova
$ ./scripts/burrito.sh uninstall ingress
$ ./scripts/burrito.sh uninstall mariadb
$ ./scripts/burrito.sh uninstall rabbitmq

(For NetApp NFS) Clear the claim reference on the nova-instances PV so that the new nova-instances PVC can bind to it:

$ sudo kubectl patch pv $NOVA_INSTANCES_PVC -p \
    '{"spec":{"claimRef": {"resourceVersion": null, "uid": null}}}'
persistentvolume/pvc-cc0d533d-eaaf-4a8f-81a0-3e11d9720944 patched

(For NetApp NFS) Add volumeName to openstack-helm/nova/templates/pvc-instances.yaml so that the PVC binds to the retained PV:

spec:
  accessModes: [ "ReadWriteMany" ]
  resources:
    requests:
      storage: {{ .Values.volume.size }}
  storageClassName: {{ .Values.volume.class_name }}
  volumeName: pvc-cc0d533d-eaaf-4a8f-81a0-3e11d9720944
{{- end }}

(For NetApp NFS) Detach all volumes used by ingress, mariadb, and rabbitmq.

Run the remove_volumeattachment.sh script:

$ chmod +x remove_volumeattachment.sh
$ ./remove_volumeattachment.sh
ingress-0 pvc: pvc-28fb7360-cef8-4d64-9edb-08636f6c2e6b
ingress-0 volumeattachment id: csi-63fe882673981ce326e6eb7fbf1da194300e1bed9a30a2a5f7366172f5247887
volumeattachment.storage.k8s.io "csi-63fe882673981ce326e6eb7fbf1da194300e1bed9a30a2a5f7366172f5247887" deleted
...
mariadb-0 pvc: pvc-9b95c69f-2e8e-47ec-ad79-2bc1ca5c0dde
mariadb-0 volumeattachment id: csi-1d8d1834daf738654f4f79f587745f9c5469a3f7c0329b235b368f8b46f5c529
volumeattachment.storage.k8s.io "csi-1d8d1834daf738654f4f79f587745f9c5469a3f7c0329b235b368f8b46f5c529" deleted
...
rabbitmq-2 pvc: pvc-c201cc59-63fb-4cb2-8149-bfafe91d0d14
rabbitmq-2 volumeattachment id: csi-b52b24c2da084ea89f816b2d9bb9417836937784ef9c65d0c36292b3302288bf
volumeattachment.storage.k8s.io "csi-b52b24c2da084ea89f816b2d9bb9417836937784ef9c65d0c36292b3302288bf" deleted

(For NetApp NFS) If NetApp NFS is the default storage backend, unmount /var/lib/nova/instances on every compute node:

$ sudo umount /var/lib/nova/instances

Upgrade OpenStack infra components

Run the burrito playbook with the system tag:

$ ./run.sh burrito --tags=system

Upgrade ingress (v1.1.3 -> v1.8.2):

$ ./scripts/burrito.sh install ingress

Check that the ingress pods are running and ready:

root@btx-0:/# k get po -l application=ingress,component=server
NAME        READY   STATUS    RESTARTS   AGE
ingress-0   1/1     Running   0          2m
ingress-1   1/1     Running   0          96s
ingress-2   1/1     Running   0          57s
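Instead of re-running `k get po` until every pod is ready, `kubectl wait` can block until the pods matching a label selector report the Ready condition. This is a minimal sketch; the helper name, the openstack namespace, and the default 10-minute timeout are assumptions. The same helper should work for the mariadb, rabbitmq, memcached, libvirt, and OpenStack service checks that follow by changing the selector.

```shell
# Hypothetical helper: block until all pods matching a label selector are Ready.
# $1: label selector, e.g. application=ingress,component=server
# $2: optional timeout (default: 10m)
wait_ready() {
    kubectl -n openstack wait pod -l "$1" \
        --for=condition=Ready --timeout="${2:-10m}"
}
```

For example, `wait_ready application=ingress,component=server` or `wait_ready application=mariadb,component=server 15m`. Note that `kubectl wait` fails immediately if no pod matches the selector yet.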

Upgrade mariadb (10.6.16 -> 10.11.7):

$ ./scripts/burrito.sh install mariadb

Check that the mariadb server pods are running and ready:

root@btx-0:/# k get po -l application=mariadb,component=server
NAME               READY   STATUS    RESTARTS   AGE
mariadb-server-0   1/1     Running   0          4m53s
mariadb-server-1   1/1     Running   0          4m53s
mariadb-server-2   1/1     Running   0          4m53s

Upgrade rabbitmq (3.11.28 -> 3.12.11):

$ ./scripts/burrito.sh install rabbitmq

Check that the rabbitmq server pods are running and ready:

root@btx-0:/# k get po -l application=rabbitmq,component=server
NAME                  READY   STATUS    RESTARTS      AGE
rabbitmq-rabbitmq-0   1/1     Running   0             4m25s
rabbitmq-rabbitmq-1   1/1     Running   0             4m25s
rabbitmq-rabbitmq-2   1/1     Running   1 (60s ago)   4m25s

Upgrade memcached (1.6.17 -> 1.6.22):

$ ./scripts/burrito.sh install memcached

Check that the memcached pod is running and ready:

root@btx-0:/# k get po -l application=memcached
NAME                                   READY   STATUS    RESTARTS   AGE
memcached-memcached-5cd8fc7496-cpx5m   1/1     Running   0          22s

Upgrade libvirt (6.0.0 -> 8.0.0):

$ ./scripts/burrito.sh install libvirt

Check that the libvirt pods are running and ready:

root@btx-0:/# k get po -l application=libvirt
NAME                            READY   STATUS    RESTARTS   AGE
libvirt-libvirt-default-rq85p   1/1     Running   0          74s

Upgrade OpenStack components

Upgrade keystone (21.0.2.dev3 -> 23.0.2.dev10):

$ ./scripts/burrito.sh install keystone

Check that the keystone pods are running and ready:

root@btx-0:/# k get po -l application=keystone,component=api
NAME                            READY   STATUS    RESTARTS   AGE
keystone-api-786c7866d6-crzsz   1/1     Running   0          2m9s
keystone-api-786c7866d6-gzc9g   1/1     Running   0          2m9s

If glance-api is a StatefulSet (i.e. the default storage is not NetApp), uninstall glance first:

$ k get po -l application=glance,component=api
glance-api-0                 2/2     Running     0          72m
glance-api-1                 2/2     Running     0          72m
$ ./scripts/burrito.sh uninstall glance

Upgrade glance (24.2.2.dev1 -> 26.0.1.dev2):

$ ./scripts/burrito.sh install glance

(NetApp NFS only) One of the glance-api pods may get stuck in the Init state:

root@btx-0:/# k get po -l application=glance,component=api
NAME                          READY   STATUS     RESTARTS   AGE
glance-api-596d7cf6c8-5srzp   2/2     Running    0          7m22s
glance-api-596d7cf6c8-z4lnr   0/2     Init:0/2   0          7m22s

Delete the pod that is stuck in Init; its replacement should start normally.
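If you prefer not to copy the pod name by hand, the pods in an Init state can be picked out of the STATUS column of `kubectl get po --no-headers` and deleted. This is a minimal sketch; both helper names and the openstack namespace are assumptions. A pod that is still initializing normally also shows Init:*, so run it only after the rollout has had a few minutes to settle.

```shell
# Hypothetical helper: print the names of pods whose STATUS column starts
# with "Init" (e.g. Init:0/2). Reads `kubectl get po --no-headers` output on stdin.
stuck_init_pods() {
    awk '$3 ~ /^Init/ { print $1 }'
}

# Hypothetical helper: delete the Init-state pods matching a label selector
# in the openstack namespace so that their controller recreates them.
delete_stuck_init_pods() {
    for pod in $(kubectl -n openstack get po -l "$1" --no-headers | stuck_init_pods); do
        kubectl -n openstack delete po "$pod"
    done
}
```

For example, `delete_stuck_init_pods application=glance,component=api`; the same call with `application=cinder,component=volume` covers the cinder-volume case later in this guide.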

Check that the glance pods are running and ready.

If glance-api is a Deployment (i.e. the default storage is NetApp), delete the stuck pod first:

root@btx-0:/# k delete po glance-api-596d7cf6c8-z4lnr
root@btx-0:/# k get po -l application=glance,component=api
NAME                          READY   STATUS    RESTARTS   AGE
glance-api-596d7cf6c8-5srzp   2/2     Running   0          9m23s
glance-api-596d7cf6c8-g7s7c   2/2     Running   0          94s

If glance-api is a StatefulSet (i.e. the default storage is not NetApp):

root@btx-0:/# k get po -l application=glance,component=api
NAME           READY   STATUS    RESTARTS   AGE
glance-api-0   2/2     Running   0          25m
glance-api-1   2/2     Running   0          25m

Upgrade placement (7.0.1 -> 9.0.1):

$ ./scripts/burrito.sh install placement

Check that the placement pods are running and ready:

root@btx-0:/# k get po -l application=placement,component=api
NAME                             READY   STATUS    RESTARTS   AGE
placement-api-6d7948d754-9lfrg   1/1     Running   0          2m41s

Upgrade neutron (20.5.1.dev28 -> 22.1.1.dev110):

$ ./scripts/burrito.sh install neutron

Some pods (neutron-{dhcp,l3,metadata}-agent) will stay in the Init state. This is expected; they are waiting for the nova pods.

(For Aster 1.4.3 and earlier) Before upgrading nova, preserve the value of the images_type variable. It is qcow2 in Aster 1.4.3 and earlier, but rbd in Begonia.

Warning

In Aster 1.4.4, images_type is already rbd, so you do not need to change it if you are running Aster 1.4.4.

Set images_type to qcow2 in roles/burrito.openstack/templates/osh/nova.yml.j2:

libvirt:
  volume_use_multipath: {{ enable_multipath }}
  connection_uri: "qemu+tcp://127.0.0.1/system"
  images_type: "qcow2"

Upgrade nova (25.3.1.dev1 -> 27.2.1.dev19):

$ ./scripts/burrito.sh install nova

Check that the nova pods are running and ready:

root@btx-0:/# k get po -l application=nova,component=compute
NAME                         READY   STATUS    RESTARTS        AGE
nova-compute-default-2vn7r   1/1     Running   0               10m

If cinder-volume is a StatefulSet (i.e. the default storage is not NetApp), uninstall cinder first:

$ k get po -l application=cinder,component=volume
cinder-volume-0                   1/1     Running     0          3h16m
cinder-volume-1                   1/1     Running     0          3h16m
$ ./scripts/burrito.sh uninstall cinder

Upgrade cinder (20.3.3.dev2 -> 22.1.2.dev10):

$ ./scripts/burrito.sh install cinder

Check that the cinder pods are running and ready.

If cinder-volume is a Deployment (i.e. the default storage is NetApp):

root@btx-0:/# k get po -l application=cinder,component=volume
NAME                            READY   STATUS     RESTARTS   AGE
cinder-volume-86cf778db-2469r   1/1     Running    0          6m2s
cinder-volume-86cf778db-mhg7b   0/1     Init:0/4   0          6m2s

One of the cinder-volume pods is stuck in the Init state, the same problem as with glance-api. Delete that pod and its replacement should start:

root@btx-0:/# k delete po cinder-volume-86cf778db-mhg7b
root@btx-0:/# k get po -l application=cinder,component=volume
NAME                            READY   STATUS    RESTARTS   AGE
cinder-volume-86cf778db-2469r   1/1     Running   0          7m33s
cinder-volume-86cf778db-pxf5v   1/1     Running   0          28s

If cinder-volume is a StatefulSet (i.e. the default storage is not NetApp):

root@btx-0:/# k get po -l application=cinder,component=volume
NAME              READY   STATUS    RESTARTS   AGE
cinder-volume-0   1/1     Running   0          5m20s
cinder-volume-1   1/1     Running   0          5m20s

Upgrade horizon (22.1.0 -> 23.1.1.dev14):

$ ./scripts/burrito.sh install horizon

Check that the horizon pods are running and ready:

root@btx-0:/# k get po -l application=horizon,component=server
NAME                      READY   STATUS    RESTARTS   AGE
horizon-bfdcc7bd6-4n655   1/1     Running   0          84s
horizon-bfdcc7bd6-w9wqh   0/1     Running   0          84s

Sometimes one of the horizon pods does not become ready. Delete it, and its replacement should become ready:

root@btx-0:/# k delete po horizon-bfdcc7bd6-w9wqh
root@btx-0:/# k get po -l application=horizon,component=server
NAME                      READY   STATUS    RESTARTS   AGE
horizon-bfdcc7bd6-4n655   1/1     Running   0          5m59s
horizon-bfdcc7bd6-pg589   1/1     Running   0          76s

Last but not least, upgrade btx (1.2.3 -> 2.0.1):

$ ./scripts/burrito.sh uninstall btx
$ ./scripts/burrito.sh install btx

Check that btx is running:

$ kubectl get po btx-0 -n openstack
NAME    READY   STATUS    RESTARTS   AGE
btx-0   1/1     Running   0          79s

Enter the btx shell and check the OpenStack services:

$ bts
root@btx-0:/# o compute service list
root@btx-0:/# o volume service list
root@btx-0:/# o network agent list
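The compute and volume service checks can be reduced to a pass/fail result by counting entries whose State column is not `up`. This is a minimal sketch for the btx shell, assuming the `openstack` CLI is available there and that the State column of `openstack compute service list` and `openstack volume service list` reports `up` or `down`; the helper names are hypothetical. Network agents are left to the manual `o network agent list` check above.

```shell
# Hypothetical helper: count input lines that are not exactly "up"
# (one State value per line on stdin).
count_not_up() {
    grep -cvx 'up' || true   # grep exits 1 when it selects no lines; keep going
}

# Hypothetical helper: succeed only if no compute or volume service is down.
check_services() {
    compute_down=$(openstack compute service list -f value -c State | count_not_up)
    volume_down=$(openstack volume service list -f value -c State | count_not_up)
    echo "compute services not up: $compute_down"
    echo "volume services not up:  $volume_down"
    [ "$compute_down" -eq 0 ] && [ "$volume_down" -eq 0 ]
}
```

For example, `check_services && echo "all compute and volume services are up"`.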

If everything is okay, start the VM instances you stopped earlier:

root@btx-0:/# o server start <VM_NAME> [<VM_NAME> ...]

The OpenStack upgrade is done!

Upgrade Ceph

We will upgrade Ceph from Quincy 17.2.7 (Burrito Aster) to Reef 18.2.1 (Burrito Begonia). The legacy Ceph daemons are first adopted by cephadm and then run from the Reef container image.

The examples use the following Ceph nodes.

Host             Role              IP address
aster-storage1   mon,mgr,osd,rgw   192.168.24.116
aster-storage2   mon,mgr,osd,rgw   192.168.24.117
aster-storage3   mon,mgr,osd,rgw   192.168.24.118

  • ceph public/cluster network: 192.168.24.0/24

  • osd devices on each osd node: /dev/sdb, /dev/sdc, /dev/sdd

Preparation

Install cephadm on each Ceph node:

$ sudo dnf -y install cephadm

Register the local registry as an insecure registry on each Ceph node:

$ cat <<EOF|sudo tee /etc/containers/registries.conf.d/999-local-registry.conf
[[registry]]
location = "192.168.21.110:5000"
insecure = true
EOF

Prepare each Ceph node for use by cephadm:

$ sudo cephadm prepare-host
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/bin/podman) version 4.6.1 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK

Set the default container image in the Ceph configuration:

aster-storage1$ sudo ceph config set global container_image \
                    192.168.21.110:5000/ceph/ceph:v18.2.1

Adoption process

Run the following command on all Ceph nodes to migrate the Ceph configuration to the cluster’s central config database:

$ sudo ceph config assimilate-conf -i /etc/ceph/ceph.conf

Adopt the monitor on each monitor node:

aster-storage1$ sudo cephadm \
                --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                adopt --style legacy --name mon.aster-storage1
Pulling container image 192.168.21.110:5000/ceph/ceph:v18.2.1...
Stopping old systemd unit ceph-mon@aster-storage1...
Disabling old systemd unit ceph-mon@aster-storage1...
Moving data...
Chowning content...
Moving logs...
Creating new units...
aster-storage2$ sudo cephadm \
                --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                adopt --style legacy --name mon.aster-storage2
aster-storage3$ sudo cephadm \
                --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                adopt --style legacy --name mon.aster-storage3

Adopt the manager on each manager node:

aster-storage1$ sudo cephadm \
                --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                adopt --style legacy --name mgr.aster-storage1
aster-storage2$ sudo cephadm \
                --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                adopt --style legacy --name mgr.aster-storage2
aster-storage3$ sudo cephadm \
                --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                adopt --style legacy --name mgr.aster-storage3
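The adoption commands differ only in the daemon name, so a small wrapper keeps the image reference in one place. This is a minimal sketch; the helper name adopt_legacy is hypothetical, and deriving the daemon ID from `hostname -s` assumes, as in the example cluster, that mon and mgr IDs equal the short hostname.

```shell
CEPH_IMAGE=192.168.21.110:5000/ceph/ceph:v18.2.1

# Hypothetical helper: adopt one legacy daemon (e.g. mon.aster-storage1) with cephadm.
adopt_legacy() {
    sudo cephadm --image "$CEPH_IMAGE" adopt --style legacy --name "$1"
}
```

Following the order above, run `adopt_legacy "mon.$(hostname -s)"` on every monitor node first, then `adopt_legacy "mgr.$(hostname -s)"` on every manager node.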

Enable cephadm orchestration:

aster-storage1$ sudo ceph mgr module enable cephadm
aster-storage1$ sudo ceph orch set backend cephadm

Generate an SSH key for cephadm:

aster-storage1$ sudo ceph cephadm generate-key
aster-storage1$ sudo ceph cephadm get-pub-key |sudo tee /etc/ceph/ceph.pub

Set the cephadm SSH user to clex:

aster-storage1$ sudo ceph cephadm set-user clex
ssh user set to clex. sudo will be used

Install the cephadm SSH key on each host in the cluster:

aster-storage1$ ssh-copy-id -f -i /etc/ceph/ceph.pub clex@aster-storage1
aster-storage1$ ssh-copy-id -f -i /etc/ceph/ceph.pub clex@aster-storage2
aster-storage1$ ssh-copy-id -f -i /etc/ceph/ceph.pub clex@aster-storage3

Tell cephadm which hosts to manage:

aster-storage1$ sudo ceph orch host add aster-storage1 192.168.24.116 _admin
aster-storage1$ sudo ceph orch host add aster-storage2 192.168.24.117 _admin
aster-storage1$ sudo ceph orch host add aster-storage3 192.168.24.118 _admin
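Installing the key and adding the host are done once per node, so both steps can be driven from one host list. This is a minimal sketch to run on aster-storage1; the helper name and the `hostname:address` list format are hypothetical, and the values are the example nodes from the table above.

```shell
# Example Ceph nodes (hostname:address) from the table above; adjust for your cluster.
CEPH_HOSTS="aster-storage1:192.168.24.116 aster-storage2:192.168.24.117 aster-storage3:192.168.24.118"
CEPH_SSH_USER=clex

# Hypothetical helper: install the cephadm SSH key on every host and add it to the orchestrator.
register_ceph_hosts() {
    for entry in $CEPH_HOSTS; do
        host=${entry%%:*}   # text before the first ':'
        addr=${entry#*:}    # text after the first ':'
        ssh-copy-id -f -i /etc/ceph/ceph.pub "${CEPH_SSH_USER}@${host}" || return 1
        sudo ceph orch host add "$host" "$addr" _admin || return 1
    done
}
```

The function stops at the first failing host, so a typo in the list does not leave later hosts half-configured.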

Verify that the adopted monitor and manager daemons are running:

$ sudo ceph orch ps
NAME                HOST            PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.aster-storage1  aster-storage1         running (7m)    48s ago    -     604M        -  18.2.1   5be31c24972a  04fe0b65df40
mgr.aster-storage2  aster-storage2         running (7m)    37s ago    -     561M        -  18.2.1   5be31c24972a  c6025cccbe80
mgr.aster-storage3  aster-storage3         running (7m)    25s ago    -     561M        -  18.2.1   5be31c24972a  0421411066f9
mon.aster-storage1  aster-storage1         running (9m)    48s ago    -     108M    2048M  18.2.1   5be31c24972a  860cc72485ea
mon.aster-storage2  aster-storage2         running (8m)    37s ago    -     102M    2048M  18.2.1   5be31c24972a  626eb0f7aa47
mon.aster-storage3  aster-storage3         running (8m)    25s ago    -    99.4M    2048M  18.2.1   5be31c24972a  76092ce8ec1c

Get the names of the OSDs on each node:

$ sudo ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                STATUS  REWEIGHT  PRI-AFF
-1         0.43918  root default
-3         0.14639      host aster-storage1
 1    hdd  0.04880          osd.1                up   1.00000  1.00000
 5    hdd  0.04880          osd.5                up   1.00000  1.00000
 7    hdd  0.04880          osd.7                up   1.00000  1.00000
-7         0.14639      host aster-storage2
 0    hdd  0.04880          osd.0                up   1.00000  1.00000
 4    hdd  0.04880          osd.4                up   1.00000  1.00000
 8    hdd  0.04880          osd.8                up   1.00000  1.00000
-5         0.14639      host aster-storage3
 2    hdd  0.04880          osd.2                up   1.00000  1.00000
 3    hdd  0.04880          osd.3                up   1.00000  1.00000
 6    hdd  0.04880          osd.6                up   1.00000  1.00000

Adopt all OSDs in the cluster, running each command on the node that hosts the OSD:

aster-storage1$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.1
aster-storage1$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.5
aster-storage1$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.7
aster-storage2$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.0
aster-storage2$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.4
aster-storage2$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.8
aster-storage3$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.2
aster-storage3$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.3
aster-storage3$ sudo cephadm \
                    --image 192.168.21.110:5000/ceph/ceph:v18.2.1 \
                    adopt --style legacy --name osd.6
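With many OSDs per node, the names can be read from the `ceph osd tree` output instead of being typed by hand. This is a minimal sketch; the helper names are hypothetical, it assumes the CRUSH host bucket names equal `hostname -s` (as in the example), and it parses the plain-text tree layout shown above, so a different output layout would need a different parser.

```shell
CEPH_IMAGE=192.168.21.110:5000/ceph/ceph:v18.2.1

# Hypothetical helper: print the osd.N names listed under host bucket $1
# in `ceph osd tree` output read from stdin.
osd_names_for_host() {
    awk -v host="$1" '
        $1 ~ /^-[0-9]+$/ {                  # a bucket line: root, host, rack, ...
            cur = ""
            for (i = 2; i < NF; i++) if ($i == "host") cur = $(i + 1)
            next
        }
        cur == host {                       # a device line under the wanted host
            for (i = 1; i <= NF; i++) if ($i ~ /^osd\.[0-9]+$/) print $i
        }'
}

# Hypothetical helper: adopt every legacy OSD that the CRUSH map places on this node.
adopt_local_osds() {
    for osd in $(sudo ceph osd tree | osd_names_for_host "$(hostname -s)"); do
        sudo cephadm --image "$CEPH_IMAGE" adopt --style legacy --name "$osd" || return 1
    done
}
```

Run `adopt_local_osds` once on each OSD node; it stops at the first adoption that fails.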

Wait a minute for cephadm to finish all adoptions.

Check that all mon, mgr, and osd daemons are running. Do not proceed if this check shows any problem:

aster-storage1$ sudo ceph orch ps
NAME                HOST            PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.aster-storage1  aster-storage1         running (15m)     3m ago  13m     562M        -  18.2.1   5be31c24972a  1f61f2142f30
mgr.aster-storage2  aster-storage2         running (15m)     2m ago  12m     564M        -  18.2.1   5be31c24972a  c5f5e7b394ee
mgr.aster-storage3  aster-storage3         running (15m)     2m ago  12m     615M        -  18.2.1   5be31c24972a  0218e3f2efed
mon.aster-storage1  aster-storage1         running (16m)     3m ago  13m     109M    2048M  18.2.1   5be31c24972a  d28f9dac2bb9
mon.aster-storage2  aster-storage2         running (16m)     2m ago  12m     105M    2048M  18.2.1   5be31c24972a  ea8790060a7c
mon.aster-storage3  aster-storage3         running (15m)     2m ago  12m     107M    2048M  18.2.1   5be31c24972a  6b0c70e79a38
osd.0               aster-storage1         running (10m)     3m ago    -     127M    4096M  18.2.1   5be31c24972a  6de3a1807f0c
osd.1               aster-storage2         running (9m)      2m ago    -     128M    4096M  18.2.1   5be31c24972a  c96016cc7eb4
osd.2               aster-storage3         running (8m)      2m ago    -     124M    4096M  18.2.1   5be31c24972a  cb75f73f4dd8
osd.3               aster-storage1         running (10m)     3m ago    -    97.0M    4096M  18.2.1   5be31c24972a  9360a39117a5
osd.4               aster-storage2         running (9m)      2m ago    -    92.1M    4096M  18.2.1   5be31c24972a  b77b6a3274c4
osd.5               aster-storage3         running (8m)      2m ago    -    87.0M    4096M  18.2.1   5be31c24972a  4172a2a0f02c
osd.6               aster-storage1         running (9m)      3m ago    -    89.4M    4096M  18.2.1   5be31c24972a  35a70ae40ea4
osd.7               aster-storage3         running (8m)      2m ago    -    92.0M    4096M  18.2.1   5be31c24972a  3b585827aad5
osd.8               aster-storage2         running (9m)      2m ago    -    90.8M    4096M  18.2.1   5be31c24972a  092c67a861f4

Disable the legacy ceph-crash service on each node:

$ sudo systemctl disable --now ceph-crash.service

Disable the legacy RGW daemons on each RGW node:

$ sudo systemctl disable --now ceph-radosgw.target
$ sudo rm -fr /var/lib/ceph/radosgw/ceph-*

Next, create a service specification file for each service. Adjust the hostnames, networks, and device names for your environment.

Create crash.yml and apply it:

$ cat <<EOF > crash.yml
service_type: crash
placement:
  host_pattern: '*'
EOF
$ sudo ceph orch apply -i crash.yml

Check that crash containers are running on all Ceph nodes:

aster-storage1$ sudo ceph orch ls crash
NAME   PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash             3/3  12s ago    21s  *

Create mon.yml and apply it:

$ cat <<EOF > mon.yml
service_type: mon
placement:
  hosts:
    - aster-storage1
    - aster-storage2
    - aster-storage3
networks:
  - 192.168.24.0/24
unmanaged: false
EOF
$ sudo ceph orch apply -i mon.yml

Check that mon containers are running on all monitor nodes:

$ sudo ceph orch ls mon
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mon              3/3  82s ago    6s   aster-storage1;aster-storage2;aster-storage3

Create mgr.yml and apply it:

$ cat <<EOF > mgr.yml
service_type: mgr
placement:
  hosts:
    - aster-storage1
    - aster-storage2
    - aster-storage3
networks:
  - 192.168.24.0/24
unmanaged: false
EOF
$ sudo ceph orch apply -i mgr.yml

Check that mgr containers are running on all mgr nodes:

$ sudo ceph orch ls mgr
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr              3/3  2m ago     7s   aster-storage1;aster-storage2;aster-storage3
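mon.yml and mgr.yml above differ only in service_type, so they can be generated from one function. This is a minimal sketch; the helper name daemon_spec is hypothetical, and CEPH_NETWORK is the example public network from the Ceph node table.

```shell
CEPH_NETWORK=192.168.24.0/24   # Ceph public network from the example

# Hypothetical helper: print a mon or mgr service spec placed on the given hosts.
# Usage: daemon_spec SERVICE_TYPE HOST [HOST ...]
daemon_spec() {
    svc=$1
    shift
    echo "service_type: ${svc}"
    echo "placement:"
    echo "  hosts:"
    for h in "$@"; do
        echo "    - ${h}"
    done
    echo "networks:"
    echo "  - ${CEPH_NETWORK}"
    echo "unmanaged: false"
}
```

For example, `daemon_spec mon aster-storage1 aster-storage2 aster-storage3 > mon.yml` produces the same file as the heredoc above, which is then applied with `sudo ceph orch apply -i mon.yml`.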

Create osd.yml and apply it:

$ cat <<EOF > osd.yml
service_type: osd
placement:
  hosts:
    - aster-storage1
    - aster-storage2
    - aster-storage3
data_devices:
  paths: [{'path': '/dev/sdb'}, {'path': '/dev/sdc'}, {'path': '/dev/sdd'}]
unmanaged: false
EOF
$ sudo ceph orch apply -i osd.yml

Check that osd services are running on all OSD nodes:

$ sudo ceph orch ls osd
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd                9  3m ago     4s   aster-storage1;aster-storage2;aster-storage3

Create rgw.yml and apply it:

$ cat <<EOF > rgw.yml
service_type: rgw
service_id: default_rgw
placement:
  count_per_host: 1
  hosts:
    - aster-storage1
    - aster-storage2
    - aster-storage3
networks:
  - 192.168.24.0/24
spec:
  rgw_frontend_type: "beast"
  rgw_frontend_port: 7480
EOF
$ sudo ceph orch apply -i rgw.yml

Check

Check the Ceph health status:

$ sudo ceph -s
  cluster:
    id:     ba907ff3-fbe7-4c77-8cf3-11e4fc79410a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum aster-storage1,aster-storage2,aster-storage3 (age 26m)
    mgr: aster-storage3(active, since 24m), standbys: aster-storage1, aster-storage2
    osd: 9 osds: 9 up (since 18m), 9 in (since 34m)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    pools:   9 pools, 257 pgs
    objects: 198 objects, 584 KiB
    usage:   327 MiB used, 450 GiB / 450 GiB avail
    pgs:     257 active+clean

Check the Ceph versions:

$ sudo ceph versions
{
    "mon": {
        "ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)": 3
    },
    "mgr": {
        "ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)": 3
    },
    "osd": {
        "ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)": 9
    },
    "rgw": {
        "ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)": 3
    },
    "overall": {
        "ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)": 18
    }
}
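The two checks above can be combined into a single pass/fail test: the cluster should report HEALTH_OK, and the "overall" section of `ceph versions` should list exactly one version. This is a minimal sketch; the helper names are hypothetical, and the awk filter assumes the pretty-printed, one-key-per-line layout shown above.

```shell
# Hypothetical helper: print the number of distinct Ceph versions listed under
# "overall" in pretty-printed `ceph versions` output (stdin). 1 means every daemon agrees.
count_overall_versions() {
    awk '
        /"overall"/                  { in_overall = 1; next }
        in_overall && /}/            { in_overall = 0 }
        in_overall && /ceph version/ { n++ }
        END                          { print n + 0 }'
}

# Hypothetical helper: succeed if the cluster is HEALTH_OK and runs a single version.
ceph_upgrade_ok() {
    health=$(sudo ceph health)
    versions=$(sudo ceph versions | count_overall_versions)
    echo "health: ${health}, distinct versions: ${versions}"
    [ "$health" = "HEALTH_OK" ] && [ "$versions" -eq 1 ]
}
```

For example, `ceph_upgrade_ok && echo "Ceph upgrade verified"`.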

Finally, upgrade the ceph-common client package on all Ceph client nodes:

$ sudo dnf upgrade ceph-common --allowerasing

Ceph upgrade is done!