What's needed to deploy ONAP

...

The OpenStack API is not exposed on the Internet, so all calls must go through a jumphost (rebond.opnfv.fr).

Specificities of ONAP gating on Azure

As Azure has no OpenStack APIs, a small OpenStack instance is created next to each worker using DevStack (via DevStack Automatic Installation).

Gating

Attached files: ONAP-gerrit-2-gitlab.pdf, Azure_ONAP_Use.pptx

Gating is built on top of the "automatic deployment" described earlier.

As for daily deployments, two chains are created in chained ci per gating environment (there are 2 gating environments today):

Infrastructure deployment (Virtual Machines + Kubernetes + Platform services + Dedicated OpenStack)
ONAP deployment and test

One difference is that the first chain does not trigger the second one.

The infrastructure deployment chain is meant to be run once in a while (after ~100 days, the artifacts in GitLab are too old and the infrastructure must be reinstalled).

The ONAP deployment and test chain is meant to be run any time a gate is ready to be launched.

As we have a limited number of platforms and a potentially larger number of gates to run, a queue system needs to be put in front of them.

At the time this gating system was created, no "out of the box" queue system was found (or understood: we never understood how to use Zuul, for example).

So the decision was made to create 4 μservices using an MQTT broker named mosquitto as the messaging system:

Gerrit 2 MQTT: creates topics / messages for every event sent by Gerrit (via SSH)
MQTT 2 Gerrit: sends comments (optionally with a score) to a specific Gerrit review when a message is published on a specific topic
Chained CI MQTT Trigger (master mode): listens to messages and queues those that belong to a wanted topic; resends them when a worker asks for a job
Chained CI MQTT Trigger (worker mode): when free, listens to messages on specific topics and launches a gate (if elected) when it receives one; asks for a job every xx seconds while free
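
To make the topic convention concrete, here is a minimal sketch of how a Gerrit stream event could be mapped to an MQTT topic. The `/onap` prefix and the `<project>/<event type>` layout are inferred from the `/onap/oom/patchset-created` topic mentioned on this page. The function name `topic_for_event` and the sample event are hypothetical. It is not the actual Gerrit 2 MQTT code.

```python
import json

TOPIC_PREFIX = "/onap"  # assumption, inferred from the example topic on this page


def topic_for_event(raw_event: str) -> str:
    """Map one JSON line from `gerrit stream-events` to an MQTT topic.

    Only events that carry a `change` object are handled in this sketch.
    """
    event = json.loads(raw_event)
    project = event["change"]["project"]  # e.g. "oom"
    event_type = event["type"]            # e.g. "patchset-created"
    return f"{TOPIC_PREFIX}/{project}/{event_type}"


# Hypothetical, heavily trimmed event for illustration
sample = json.dumps({
    "type": "patchset-created",
    "change": {"project": "oom", "number": 12345},
    "patchSet": {"number": 2},
})
print(topic_for_event(sample))
```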

Some details are given in the documents attached above, but this is how it works in the two "main" cases:

Workers are free

A new patchset is created on a watched repo (OOM for example)
Gerrit 2 MQTT creates a message on /onap/oom/patchset-created
Chained CI MQTT Trigger master reads the message and puts it in its internal queue
A worker is free and proposes to take the job
Master acknowledges and removes the message from the queue
Worker starts a chained ci and waits for completion. Depending on the completion status, it retrieves the failed jobs and summary messages
Worker sends them to the Gerrit notification topic
MQTT 2 Gerrit sees the message, retrieves the Gerrit change number and patchset number, and uploads the message
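
The last steps (collecting failed jobs, building a summary, and addressing it to a change and patchset) can be sketched as follows. Everything here is an assumption made for illustration: the `JobResult` type, the status strings, the payload keys and the +1/-1 score values are hypothetical and are not taken from the real worker or from MQTT 2 Gerrit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobResult:
    name: str
    status: str  # hypothetical values: "success" or "failed"


def build_notification(change: int, patchset: int, jobs: list[JobResult]) -> dict:
    """Build the payload a worker could publish on the Gerrit notification topic."""
    failed = [job.name for job in jobs if job.status != "success"]
    if failed:
        message = "Gate failed: " + ", ".join(failed)
        score = -1  # assumed score for a failed gate
    else:
        message = "Gate passed"
        score = 1   # assumed score for a successful gate
    return {
        "change": change,        # Gerrit change number
        "patchset": patchset,    # patchset number
        "message": message,
        "score": score,
        "failed_jobs": failed,
    }


jobs = [JobResult("deploy", "success"), JobResult("healthcheck", "failed")]
print(build_notification(12345, 2, jobs))
```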

Workers are not free

A new patchset is created on a watched repo (OOM for example)
Gerrit 2 MQTT creates a message on /onap/oom/patchset-created
Chained CI MQTT Trigger master reads the message and puts it in its internal queue
Later, a worker becomes free and sends a message to its master to announce that it can take a job
Master takes the oldest message from the queue and resends it
Worker proposes to take the job
Master acknowledges and removes the message from the queue
Worker starts a chained ci and waits for completion. Depending on the completion status, it retrieves the failed jobs and summary messages
Worker sends the summary and the list of failed jobs to the Gerrit notification topic
MQTT 2 Gerrit sees the message, retrieves the Gerrit change number and patchset number, and uploads the message
Worker announces that it is free
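
The queueing behaviour of the master in both cases can be modelled with a plain FIFO queue. The sketch below is an in-memory simplification with no MQTT involved. The class and method names (`MasterQueue`, `offer`, `acknowledge`) are hypothetical. It only shows that a message stays queued until a worker's proposal has been acknowledged.

```python
from collections import deque
from typing import Optional


class MasterQueue:
    """In-memory model of the master's internal queue."""

    def __init__(self) -> None:
        self._pending = deque()  # oldest message on the left

    def enqueue(self, message: dict) -> None:
        """Store a message read from a watched topic."""
        self._pending.append(message)

    def offer(self) -> Optional[dict]:
        """Return the oldest message without removing it (the 'resend' step)."""
        return self._pending[0] if self._pending else None

    def acknowledge(self, message: dict) -> bool:
        """Remove the message once a worker has proposed to take it."""
        if self._pending and self._pending[0] == message:
            self._pending.popleft()
            return True
        return False  # message already taken by another worker

    def __len__(self) -> int:
        return len(self._pending)


master = MasterQueue()
master.enqueue({"topic": "/onap/oom/patchset-created", "change": 1})
master.enqueue({"topic": "/onap/oom/patchset-created", "change": 2})

job = master.offer()               # a worker announced that it is free
elected = master.acknowledge(job)  # the worker proposed, the master acknowledges
print(job["change"], elected, len(master))
```

A second worker proposing for the same message would get `False` back from `acknowledge`, which corresponds to not being elected.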

Current deployments

All Gating μservices are deployed on Azure

...

TBD: ONAP "gating" Kubernetes (alongside a Nexus)

Each gating system has a Chained CI MQTT Trigger worker μservice.

One Chained CI MQTT Trigger master is created (we could have several, each monitoring different repos / having different workers).

Maintenance work on gating

Access to gating systems

You must first give your SSH key to one of the admins.

Then put the following in your .ssh/config to access the gating systems:

Code Block

language	text

Host rebond.francecentral.cloudapp.azure.com
  User cloud

Host azure*.onap.eu *.integration.onap.eu
  User cloud
  StrictHostKeyChecking no
  CheckHostIP no
  UserKnownHostsFile /dev/null
  ProxyJump rebond.francecentral.cloudapp.azure.com

# Networks used in Azure cloud
Host 192.168.64.* 192.168.65.* 192.168.53.* 192.168.66.* 192.168.67.*
  ProxyJump rebond.francecentral.cloudapp.azure.com
  StrictHostKeyChecking no
  CheckHostIP no
  UserKnownHostsFile /dev/null
  User cloud

Access to weeklies/dailies in Orange can be done by adding this to your .ssh/config (if granted):

Code Block

language	text

Host rebond.opnfv.fr
  User user

Host master*.onap.eu *.daily.onap.eu *.internal.onap.eu *-weekly.onap.eu istanbul.onap.eu
  User debian
  StrictHostKeyChecking no
  CheckHostIP no
  UserKnownHostsFile /dev/null
  ProxyJump rebond.opnfv.fr

Full filesystem

The filesystem of the jumphost is full, so no tests can be launched.

check (on jumphost):

Code Block

language	bash

df -h

remediation: clean up Docker (this removes all stopped containers, unused networks and unused images)

Code Block

language	bash

docker system prune -a
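
The check can also be scripted. This is a minimal sketch using only the Python standard library. The 90% threshold and the function names are assumptions, and the percentage is an approximation of the `Use%` column of `df` (which does not count reserved blocks in the same way).

```python
import shutil


def used_percent(total: int, used: int) -> float:
    """Percentage of the filesystem in use."""
    return 100.0 * used / total if total else 0.0


def is_nearly_full(path: str = "/", threshold: float = 90.0) -> bool:
    """Return True when the filesystem holding `path` is at or above `threshold` percent."""
    usage = shutil.disk_usage(path)
    return used_percent(usage.total, usage.used) >= threshold


print(f"/ nearly full: {is_nearly_full('/')}")
```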

Full OpenStack

Cleanup of OpenStack has not been performed, so new tests cannot be started on the gate.

check (on jumphost):

Code Block

language	bash

openstack --os-cloud admin stack list

The list should be empty, or contain only stacks with a very recent creation time.

remediation: clean openstack

Code Block

language	bash

openstack --os-cloud admin stack delete UUIDs_OF_STACKS
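
To find which stacks are old enough to delete, the list can be requested as JSON (`openstack --os-cloud admin stack list -f json`) and filtered by age. The sketch below assumes the JSON rows use the column names `ID` and `Creation Time` with ISO 8601 timestamps. Check this against the real output first. The 6-hour limit, the function name and the sample rows are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def stale_stack_ids(stack_list_json: str, now: datetime, max_age: timedelta) -> list:
    """Return the IDs of stacks created more than `max_age` before `now`."""
    stale = []
    for stack in json.loads(stack_list_json):
        # fromisoformat() on older Python versions does not accept a trailing "Z"
        created = datetime.fromisoformat(stack["Creation Time"].replace("Z", "+00:00"))
        if now - created > max_age:
            stale.append(stack["ID"])
    return stale


# Hypothetical sample, shaped like the assumed JSON output
sample = json.dumps([
    {"ID": "aaa", "Stack Name": "old-test", "Creation Time": "2022-03-01T08:00:00Z"},
    {"ID": "bbb", "Stack Name": "running-test", "Creation Time": "2022-03-08T09:30:00Z"},
])
now = datetime(2022, 3, 8, 10, 0, tzinfo=timezone.utc)
print(stale_stack_ids(sample, now, max_age=timedelta(hours=6)))
```

The returned IDs are what would be passed to `openstack --os-cloud admin stack delete`.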

Lost integration images

ONAP images are no longer on nexus3.onap.org.

check: verify that images are present on nexus3

remediation: create a ticket

Too old system

The gate is quite old and no longer works.

check (on jumphost): uptime is more than 100 days and the gates are behaving weirdly

Code Block

language	bash

uptime

remediation: reinstall the gate
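
The 100-day rule can be checked automatically on a Linux jumphost by reading `/proc/uptime`, whose first field is the uptime in seconds. This is a small sketch. The function names are hypothetical and the limit simply mirrors the figure given above.

```python
SECONDS_PER_DAY = 86400


def uptime_days(proc_uptime_text: str) -> float:
    """Convert the content of /proc/uptime into days of uptime."""
    seconds = float(proc_uptime_text.split()[0])
    return seconds / SECONDS_PER_DAY


def needs_reinstall(proc_uptime_text: str, limit_days: float = 100.0) -> bool:
    """Return True when the uptime exceeds the limit given on this page."""
    return uptime_days(proc_uptime_text) > limit_days


sample = "8726400.00 69000000.00"  # made-up content: 101 days of uptime
print(uptime_days(sample), needs_reinstall(sample))
```

On the jumphost itself the text would come from `open("/proc/uptime").read()`. An uptime above the limit is only a hint: the gate also has to be misbehaving before a reinstallation is worthwhile.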

Gating reinstallation

To reinstall an ONAP gating instance, two pipelines are used. Both are saved as "Scheduled" (but disabled) on the Pipeline Schedules page of the Orange-OpenSource / lfn / ci_cd / chained-ci GitLab repo. The schedules are:

ONAP Gating Azure 3 - to recreate Gating 3,
ONAP Gating Azure 4 - to recreate Gating 4.

You only need to run the relevant pipeline (if you are allowed to) and wait for it to finish. The gating system must also be disabled before reinstallation: log in to the gating bastion (rebond.francecentral.cloudapp.azure.com) and scale down the corresponding deployment running in the "onap-gating" Kubernetes namespace:

chained-ci-mqtt-trigger-worker-7 - Gating 3 deployment,
chained-ci-mqtt-trigger-worker-8 - Gating 4 deployment.

To scale it down (disable it), run:

Code Block
$ kubectl -n onap-gating scale deployment/<deployment-you-want-to-scale-down> --replicas=0

After the gating lab has been recreated successfully, bring it back into service with:

Code Block
$ kubectl -n onap-gating scale deployment/<deployment-you-scaled-down> --replicas=1
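
The disable / re-enable steps can be wrapped in a small helper that builds the right `kubectl` command for each gate. The deployment names and namespace come from this page. The helper name `scale_command` is hypothetical, and the sketch only builds the command instead of running it.

```python
NAMESPACE = "onap-gating"

# Gate number -> worker deployment, as listed on this page
GATING_DEPLOYMENTS = {
    3: "chained-ci-mqtt-trigger-worker-7",
    4: "chained-ci-mqtt-trigger-worker-8",
}


def scale_command(gate: int, replicas: int) -> list:
    """Build the kubectl argv that disables (0) or re-enables (1) a gate's worker."""
    deployment = GATING_DEPLOYMENTS[gate]
    return [
        "kubectl", "-n", NAMESPACE, "scale",
        f"deployment/{deployment}", f"--replicas={replicas}",
    ]


print(" ".join(scale_command(3, 0)))  # before reinstalling Gating 3
print(" ".join(scale_command(3, 1)))  # after a successful reinstallation
```

On the bastion, the returned list could be passed to `subprocess.run(..., check=True)` to run the command.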

To log in to a gating system, run the following from the bastion (rebond.francecentral.cloudapp.azure.com):

Gating 3

Code Block
$ ssh azure3.onap.eu

Gating 4

Code Block
$ ssh azure4.onap.eu

Certificate issues

cert-manager is responsible for handling certificates (issued by Let's Encrypt). In case of certificate issues (such as expired certificates), start by analysing the cert-manager logs.

Two issues have occurred so far. After ownership of onap.eu was transferred, cert-manager could no longer issue new certificates because the DNS challenge failed. This was solved by changing the challenge method to HTTP (http01). This is done in the Issuer resource that is responsible for requesting new certificates from Let's Encrypt. After the change, the solvers section looks like this:

Code Block

language	yml

solvers:
- http01:
    ingress:
      class: nginx

Another issue was caused by two ingresses responsible for different subdomains using the same TLS secret. The solution was simple: change the name of the secret in one of the ingresses. cert-manager then automatically requests a new certificate from Let's Encrypt and saves it under the new name. For this to work, the Ingress also needs the following annotations (in the metadata section):

Code Block

language	yml

metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/issuer: "{{ name_of_responsible_issuer }}"

{{ name_of_responsible_issuer }} should be replaced with the name of the appropriate Issuer resource.
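
The shared-secret problem can be detected before it causes trouble. The sketch below looks for TLS secret names referenced by more than one Ingress, using the standard `spec.tls[].secretName` field. It assumes all ingresses come from the same namespace. The function name and the sample objects are hypothetical.

```python
from collections import defaultdict


def shared_tls_secrets(ingresses: list) -> dict:
    """Map each TLS secret used by several ingresses to the names of those ingresses."""
    users = defaultdict(set)
    for ingress in ingresses:
        name = ingress["metadata"]["name"]
        for tls in ingress.get("spec", {}).get("tls", []):
            secret = tls.get("secretName")
            if secret:
                users[secret].add(name)
    return {secret: sorted(names) for secret, names in users.items() if len(names) > 1}


# Hypothetical ingresses, trimmed to the fields used here
sample = [
    {"metadata": {"name": "gating-a"},
     "spec": {"tls": [{"hosts": ["a.example.org"], "secretName": "shared-tls"}]}},
    {"metadata": {"name": "gating-b"},
     "spec": {"tls": [{"hosts": ["b.example.org"], "secretName": "shared-tls"}]}},
    {"metadata": {"name": "gating-c"},
     "spec": {"tls": [{"hosts": ["c.example.org"], "secretName": "c-tls"}]}},
]
print(shared_tls_secrets(sample))
```

The input would typically be the `items` list from `kubectl get ingress -n <namespace> -o json`.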

Version	Old Version 3	New Version Current
Changes made by	Sylvain Desbureaux	Maciej Wereski
Saved on	Mar 08, 2022	Dec 14, 2022
