What's needed to deploy ONAP
...
- Infrastructure deployment (Virtual Machines + Kubernetes + Platform services + Dedicated OpenStack)
- ONAP deployment and test
...
One Chained CI MQTT Trigger master is created (we can have several that would monitor different repos / have different workers)
Maintenance work on gating
Access to gating systems
First, give your SSH public key to one of the admins.
Then add the following to your .ssh/config to access the gating systems:
Code Block | ||
---|---|---|
| ||
Host rebond.francecentral.cloudapp.azure.com
User cloud
Host azure*.onap.eu *.integration.onap.eu
User cloud
StrictHostKeyChecking no
CheckHostIP no
UserKnownHostsFile /dev/null
ProxyJump rebond.francecentral.cloudapp.azure.com
# Networks used in Azure cloud
Host 192.168.64.* 192.168.65.* 192.168.53.* 192.168.66.* 192.168.67.*
ProxyJump rebond.francecentral.cloudapp.azure.com
StrictHostKeyChecking no
CheckHostIP no
UserKnownHostsFile /dev/null
User cloud |
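To confirm that the entries above are picked up, the resolved configuration for a host can be inspected without opening a connection. This is a minimal sketch, assuming an OpenSSH client recent enough to support `ssh -G`; the helper name `ssh_option` is hypothetical and not part of any gating tooling.

```shell
#!/bin/sh
# ssh_option KEY
# Reads `ssh -G <host>` output on stdin ("key value" lines, lowercase keys)
# and prints the first word of the value stored under KEY.
ssh_option() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Usage (not run here):
#   ssh -G azure3.onap.eu | ssh_option proxyjump
#   ssh -G azure3.onap.eu | ssh_option user
```

An empty result for `proxyjump` suggests that the `Host` pattern did not match the name you typed.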
Access to the weeklies/dailies at Orange, if granted, can be set up by adding this to your .ssh/config:
Code Block | ||
---|---|---|
| ||
Host rebond.opnfv.fr
User user
Host master*.onap.eu *.daily.onap.eu *.internal.onap.eu *-weekly.onap.eu istanbul.onap.eu
User debian
StrictHostKeyChecking no
CheckHostIP no
UserKnownHostsFile /dev/null
ProxyJump rebond.opnfv.fr |
Full filesystem
The filesystem of the jumphost is full, so no tests can be launched
check (on jumphost):
Code Block | ||
---|---|---|
| ||
df -h |
remediation: clean docker
Code Block | ||
---|---|---|
| ||
docker system prune -a |
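The check can be narrowed down to the filesystems that are actually close to full. This is a minimal sketch, assuming POSIX `df -P` output; the function name `full_filesystems` and the 90% threshold are illustrative choices, not values taken from the gating setup.

```shell
#!/bin/sh
# full_filesystems [LIMIT]
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for every
# filesystem whose usage is at or above LIMIT percent (default: 90).
# Mount points containing spaces are not handled.
full_filesystems() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the trailing percent sign
        if (use + 0 >= limit) print $6 " " use "%"
    }'
}

# Usage on the jumphost (not run here):
#   df -P | full_filesystems 90
```

If the list is not empty, the docker cleanup above is the first thing to try.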
Full OpenStack
OpenStack cleanup has not been performed, so new tests cannot be started on the gate
check (on jumphost):
Code Block | ||
---|---|---|
| ||
openstack --os-cloud admin stack list |
The list should be empty, or contain only stacks with a very recent creation time
remediation: clean openstack
Code Block | ||
---|---|---|
| ||
openstack --os-cloud admin stack delete UUIDs_OF_STACKS |
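Picking out the stale stacks by hand is error-prone, so the selection can be scripted. This is a minimal sketch under two assumptions: that `stack list` exposes `ID` and `Creation Time` columns, and that creation times are ISO 8601 UTC strings (which sort correctly as plain text). The function name `old_stacks` and the one-day cutoff are hypothetical.

```shell
#!/bin/sh
# old_stacks CUTOFF
# Reads "<stack id> <creation time>" lines on stdin and prints the ids of
# stacks created strictly before CUTOFF. Both timestamps are compared as
# strings, which is only valid for same-format ISO 8601 UTC values.
old_stacks() {
    cutoff="$1"
    awk -v cutoff="$cutoff" '$2 < cutoff { print $1 }'
}

# Usage on the jumphost (not run here; `date -d` assumes GNU date):
#   cutoff=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)
#   openstack --os-cloud admin stack list -f value -c ID -c "Creation Time" \
#       | old_stacks "$cutoff"
```

Review the printed ids before passing them to `stack delete`, since a running gate job may own a recent stack.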
Lost integration images
ONAP images are no longer available on nexus3.onap.org
check: verify that images are present on nexus3
remediation: create a ticket
Too old system
The gate has been running for a long time and no longer works properly
check (on jumphost): uptime is more than 100 days and the gates are behaving weirdly
Code Block | ||
---|---|---|
| ||
uptime |
remediation: reinstall gate
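The 100-day threshold can be checked without reading the `uptime` output by eye. This is a minimal sketch, assuming a Linux jumphost where `/proc/uptime` holds the uptime in seconds as its first field; the function name `uptime_days` is hypothetical.

```shell
#!/bin/sh
# uptime_days [FILE]
# Prints the uptime in whole days, read from FILE (default: /proc/uptime),
# whose first field is the uptime in seconds.
uptime_days() {
    src="${1:-/proc/uptime}"
    awk '{ printf "%d\n", $1 / 86400 }' "$src"
}

# Usage on the jumphost (not run here):
#   [ "$(uptime_days)" -gt 100 ] && echo "consider reinstalling the gate"
```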
Gating reinstallation
Two pipelines are used to reinstall an ONAP Gating instance. Both are saved as "Scheduled" (but disabled) in the Pipeline Schedules · Orange-OpenSource / lfn / ci_cd / chained-ci · GitLab repo. The schedules are:
- ONAP Gating Azure 3 - to recreate Gating 3,
- ONAP Gating Azure 4 - to recreate Gating 4.
You only need to run the relevant pipeline (if your user is allowed to) and wait for it to finish. The gating system must be disabled before reinstallation. To do so, log in to the Gating bastion (rebond.francecentral.cloudapp.azure.com) and scale down the corresponding deployment in the "onap-gating" Kubernetes namespace:
- chained-ci-mqtt-trigger-worker-7 - Gating 3 deployment,
- chained-ci-mqtt-trigger-worker-8 - Gating 4 deployment.
To scale it down (disable it), run:
Code Block |
---|
$ kubectl -n onap-gating scale deployment/<deployment-you-want-to-scale-down> --replicas=0 |
After the gating lab has been recreated successfully, bring it back to work with:
Code Block |
---|
$ kubectl -n onap-gating scale deployment/<deployment-you-scaled-down> --replicas=1 |
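The mapping between gating labs and worker deployments is easy to mix up, so the two scale commands can be wrapped in a small helper. This is a minimal sketch; the function names `gating_deployment`, `scale_gating` and `run`, and the `DRY_RUN` variable, are hypothetical. The deployment names and namespace are the ones listed above.

```shell
#!/bin/sh
# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# gating_deployment N: map a gating lab number to its worker deployment.
gating_deployment() {
    case "$1" in
        3) echo chained-ci-mqtt-trigger-worker-7 ;;
        4) echo chained-ci-mqtt-trigger-worker-8 ;;
        *) echo "unknown gating lab: $1" >&2; return 1 ;;
    esac
}

# scale_gating N REPLICAS: 0 disables the lab, 1 brings it back.
scale_gating() {
    deployment=$(gating_deployment "$1") || return 1
    run kubectl -n onap-gating scale "deployment/$deployment" --replicas="$2"
}

# Usage on the bastion (not run here):
#   DRY_RUN=1 scale_gating 3 0   # show the command only
#   scale_gating 3 0             # disable Gating 3 before reinstallation
#   scale_gating 3 1             # re-enable it afterwards
```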
To log in to a gating system, run the following from the bastion (rebond.francecentral.cloudapp.azure.com):
Gating 3
Code Block |
---|
$ ssh azure3.onap.eu |
Gating 4
Code Block |
---|
$ ssh azure4.onap.eu |
Certificate issues
cert-manager is responsible for handling certificates (issued by Let's Encrypt). In case of issues with certificates (such as expired ones), start by analyzing the cert-manager logs.
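A first pass over the logs and the certificate objects might look like the sketch below. It assumes the upstream default installation, that is, a `cert-manager` deployment in a `cert-manager` namespace; adjust both if the gating cluster differs. The function names and the `DRY_RUN` variable are hypothetical.

```shell
#!/bin/sh
# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# cert_manager_diagnostics [NAMESPACE]
cert_manager_diagnostics() {
    ns="${1:-cert-manager}"   # assumption: default install namespace
    # Recent controller logs (deployment name is the upstream default).
    run kubectl -n "$ns" logs deployment/cert-manager --tail=200
    # Certificates and in-flight ACME objects in all namespaces.
    run kubectl get certificates,certificaterequests,orders,challenges --all-namespaces
}

# Usage (not run here):
#   cert_manager_diagnostics
```

Failed challenges are usually the quickest pointer to the cause, since their status carries the reason reported by the ACME server.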
Two issues have occurred so far. After ownership of onap.eu was transferred, cert-manager was unable to issue new certificates because the DNS challenge failed. This was solved by changing the challenge method to HTTP (http01). The change is made in the Issuer resource that is responsible for requesting new certificates from Let's Encrypt. After the change, the solvers section looks like this:
Code Block | ||
---|---|---|
| ||
solvers:
- http01:
    ingress:
      class: nginx |
The other issue was caused by two ingresses, responsible for different subdomains, using the same TLS secret. The solution was simple: change the name of the secret in one of the ingresses. cert-manager then automatically requests a new certificate from Let's Encrypt and saves it under the new name. For this to work, the Ingress also needs the following annotations (in the metadata section):
Code Block | ||
---|---|---|
| ||
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/issuer: "{{ name_of_responsible_issuer }}" |
{{ name_of_responsible_issuer }} should be replaced with the name of the appropriate Issuer resource.