OOM Guilin release plan



TL;DR;

  • If you don't retrieve your certificates automatically, this is the first thing you must do. See https://github.com/onap/oom/tree/master/kubernetes/nbi as an example. The certInit template (a.k.a. aafInit) is the strongly preferred way to do that.

  • No more than 1 main process per container. If you have more, it'll be a blocker for updates

  • All logs to STDOUT

  • No direct commits to Frankfurt: commit to master first, then cherry-pick

  • All upstream components should use an upstream (dockerhub, googlehub) version

  • Common chart version bump

    • Mariadb common chart will be upgraded to 10.4.12

    • PostgreSQL common chart will be upgraded to 12.2

    • Cassandra common chart will be upgraded to 3.11.6

    • MongoDB common chart will be upgraded to 4.2.2

    • ElasticSearch common chart may be upgraded, waiting for SECCOM proposed version

  • AAF is an optional requirement, meaning your component must work without AAF (certificates and RBAC), even in degraded mode

  • MSB is an optional requirement, meaning your component must work without MSB

  • Your component must be able to use HTTP, both as server and as client

  • Password removal will continue on common charts (PostgreSQL at least) and start on your component; be prepared to receive some calls for help

  • Commit messages must be meaningful and follow the format shown below.

  • Proper crash (if your component fails, it must exit with code > 0, and not wait or exit with code 0)

  • Ingress will be the default deployment option (via Nginx Ingress). No more access via NodePort by default

  • New code will be submitted only if pods + healthchecks + basic tests are OK

  • No root access to any Database from application container

  • No configuration generation using sed in the application container

Certificates

Certificates in Docker images are not allowed in the Frankfurt release, per SECCOM recommendation. They won't be allowed in Helm charts either, starting from the Frankfurt branching. This means you must move to automatic retrieval at boot. You'll either have to:

  • use certInit template (formerly known as aafInit template) (see NBI as an example)

  • do it on your own + explain why you can't use the certInit template (subject to acceptance or rejection by both SECCOM and the OOM team)

No More than 1 main process per container

Several containers run multiple processes (main component + database(s)) in the same Docker image.

Each container should have only one concern. Decoupling applications into multiple containers makes it easier to scale horizontally and reuse containers. For instance, a web application stack might consist of three separate containers, each with its own unique image, to manage the web application, database, and an in-memory cache in a decoupled manner.

Limiting each container to one process is a good rule of thumb, but it is not a hard and fast rule. For example, not only can containers be spawned with an init process, some programs might spawn additional processes of their own accord. For instance, Celery can spawn multiple worker processes, and Apache can create one process per request.

Bundling a main component with its database(s) in one container will not be accepted anymore, as it's a very bad pattern (https://docs.docker.com/develop/develop-images/dockerfile_best-practices/).

All logs to STDOUT

As per SECCOM (and testers ;) ) requests, all logs must go to STDOUT. This will ease troubleshooting and future centralized logging work.
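As an illustration only, here is a minimal Python sketch of a logger that writes exclusively to STDOUT. The function name `build_stdout_logger` and the log format are assumptions made for this example; components written in other languages would configure their own logging framework in an equivalent way.

```python
import logging
import sys


def build_stdout_logger(name: str) -> logging.Logger:
    """Return a logger whose only handler writes to STDOUT."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # do not forward records to the root logger
    logger.handlers.clear()    # drop handlers (e.g. file handlers) set earlier
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling `build_stdout_logger("my-component").info("component started")` prints one line to STDOUT, which `kubectl logs` can then collect without any file access inside the container.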

Smart Healthchecks

If your healthchecks just verify that /status answers 200, they are not useful, as readiness/liveness probes can already do that. So create meaningful healthchecks, or remove them if they can be done via readiness/liveness probes.
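One possible shape for a "meaningful" healthcheck is sketched below: instead of checking a single status endpoint, it runs several named functional checks and reports each result. The function name `run_healthcheck` and the idea of passing checks as callables are assumptions for illustration, not an existing OOM interface.

```python
from typing import Callable, Dict, Tuple


def run_healthcheck(
    checks: Dict[str, Callable[[], bool]]
) -> Tuple[bool, Dict[str, bool]]:
    """Run every named check; a check that raises counts as failed."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    # Overall health is True only if every individual check passed.
    return all(results.values()), results
```

A component could register checks such as "can run a query on its database" or "can publish a test message" (both hypothetical here), and the healthcheck script would exit with a non-zero code when the first element of the returned tuple is False.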

Relevant commit messages:

The commit message (on OOM) must follow this form:

    [NAME_OF_COMPONENT|DOC|COMMON|GENERIC] Meaningful title (from OOM side)

    at least one sentence explaining the change done in this patch, cause and consequences
    and possibly more of course

    Issue-ID: AS_WE_ARE_FORCED_BUT_MEANINGLESS
    Change-ID: xxx
    Sign-off: xxx

The commit message is the last thing that will stay with our code, so it must clearly explain the changes, the "why" and the consequences. If it changes OOM behavior in any way, the documentation must also be updated.

Starting from the Frankfurt branching, merge requests that do not follow this pattern will not be merged.
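The form described above can be checked mechanically before pushing. The following is a rough sketch, not an official OOM tool: the function name `check_commit_message`, the exact regular expressions, and the choice to check only the Issue-ID trailer are assumptions made for this example.

```python
import re
from typing import List

# A tag in square brackets, one space, then a non-empty title.
HEADER_RE = re.compile(r"^\[[A-Za-z0-9_-]+\] \S.*$")
# Heuristic for trailer lines such as "Issue-ID: ...".
# Note: a body line of the form "Word: text" is also treated as a trailer.
TRAILER_RE = re.compile(r"^[A-Za-z-]+: \S")


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.strip().splitlines()
    if not lines or not HEADER_RE.match(lines[0]):
        problems.append("title must look like '[COMPONENT] Meaningful title'")
    if len(lines) < 2 or lines[1].strip():
        problems.append("title must be followed by a blank line")
    body = [l for l in lines[2:] if l.strip() and not TRAILER_RE.match(l)]
    if not body:
        problems.append("body must explain the change, cause and consequences")
    if not any(l.startswith("Issue-ID: ") for l in lines):
        problems.append("missing Issue-ID trailer")
    return problems
```

Such a check could run as a local Git hook; it only validates the shape of the message, so whether the title and body are actually meaningful still has to be judged by a reviewer.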

Please read the following pages and follow the guidelines for writing commit messages contained therein.

  • http://bit.ly/goodcommitmessages

  • http://who-t.blogspot.com/2009/12/on-commit-messages.html

  • http://dep.debian.net/deps/dep3/

No root access to any Database from application container

If you need to create users, tables, etc., do it from an init container. You can either:

  • use common.mariadb-init template for MariaDB (will be extended for PostgreSQL)

  • use your init container + explain why you can't use the common chart

No configuration generation using sed in the application container

Configuration can be delivered as a ConfigMap; if it has to be processed in some way, this should happen in an init container.
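As a sketch of what such processing in an init container could look like, the snippet below fills `${NAME}` placeholders in a template and writes the result to another location. The function names, and the idea of a template mounted from a ConfigMap and an output directory shared with the application container, are assumptions for illustration; other substitution tools would serve equally well.

```python
from pathlib import Path
from string import Template
from typing import Mapping


def render_config(template_text: str, values: Mapping[str, str]) -> str:
    """Fill ${NAME} placeholders; raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)


def render_file(template_path: Path, output_path: Path,
                values: Mapping[str, str]) -> None:
    """Read a template file and write the rendered configuration."""
    output_path.write_text(render_config(template_path.read_text(), values))
```

In an init container, `values` could simply be `os.environ`, with the output path on a volume that the application container mounts read-only. Failing with `KeyError` on a missing value makes the init container crash visibly instead of producing a half-filled configuration.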

Kubernetes upgrade to 1.18

In order to move to Kubernetes 1.18, significant changes will be made to the Helm charts, using templates we've made. This shouldn't be harmful on your side.

Moving to 1.16+ will allow us to use startupProbe. If you have slow-starting components, please tell us!

Databases upgrades

Databases will be upgraded to follow the SECCOM proposal; please let us know if your component is not compatible with the recommended version.

If you're using a specific (out of common) version of a component listed in the page, you'll either have to:

  • move to the common chart (preferred)

  • deal with the upgrade + explain why you can't use the common chart

All upstream components should use an upstream (dockerhub, googlehub) version

All upstream components used directly (databases, kafka, zookeeper, nginx, haproxy, ...) should use the upstream version directly and not embed it into their own Docker image.

AAF component is optional

AAF is used for certificates and RBAC. It's a non-mandatory requirement, meaning your component must be able to work without it for both features (although AAF is enabled by default). The mode can of course be degraded (no HTTPS, no RBAC but basicAuth) when AAF is disabled.

Please bear in mind that we're working on removing the need for AAF for certificates AND the need for AAF for RBAC, so this will be tested (see AAF Removal).
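To make the degraded mode concrete, here is a small sketch of how a component might derive its security settings from a single "AAF enabled" switch. The names `SecuritySettings` and `security_settings`, and the string values used, are purely hypothetical; how each component actually exposes this switch is up to the component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecuritySettings:
    scheme: str         # "https" or "http"
    authorization: str  # "rbac" or "basic-auth"


def security_settings(aaf_enabled: bool) -> SecuritySettings:
    """Pick the normal mode with AAF, the degraded mode without it."""
    if aaf_enabled:
        # Certificates and RBAC are both provided through AAF.
        return SecuritySettings(scheme="https", authorization="rbac")
    # Degraded mode: no HTTPS, no RBAC but basicAuth.
    return SecuritySettings(scheme="http", authorization="basic-auth")
```

The point of the sketch is that the component starts and serves requests in both branches; only the security level differs.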

MSB component is optional

MSB is interesting when used on non-Kubernetes deployments, but the features it offers overlap with basic Kubernetes features. Furthermore, it prevents dependencies between ONAP components from being traced correctly.

As such, your component must be able to work without MSB.

Password removal

Full status: https://wiki.lfnetworking.org/display/LN/2020+April+Technical+Event+Schedule?preview=/34605773/34606179/password_removal_update.pptx

MariaDB password removal is done. The work will continue on PostgreSQL and other common charts. Work will also start on components (Policy is underway); be prepared to receive some calls for help.

Pods requests/limits

Every container (including init containers) should have requests/limits with "good" values (not too high, not too low). A check will be done and some values will be changed.

A "simple" rule for setting the requests and limits:

  • request should be the mean use of the container

  • limits should be somewhat higher than the expected MAX use.

    • For memory, if you can set MAX values in the underlying program (like in Java), set the limit to 1.2 times this value. If not, set it to 1.5 times the max ever used.

    • For CPU, set it to 2 or 3 times the max ever used, as some CPUs may be significantly slower.
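The rule above can be turned into a small calculation. The sketch below assumes you have a series of observed usage samples for a container; the function names, the units (MiB and millicores) and the rounding to whole numbers are choices made for this example.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def memory_settings(samples_mib: Sequence[float],
                    configured_max_mib: Optional[float] = None) -> Dict[str, int]:
    """Request = mean use; limit = 1.2 x configured max, else 1.5 x observed max."""
    if configured_max_mib is not None:
        # e.g. a maximum heap size set in the underlying program
        limit = 1.2 * configured_max_mib
    else:
        limit = 1.5 * max(samples_mib)
    return {"request": round(mean(samples_mib)), "limit": round(limit)}


def cpu_settings(samples_millicores: Sequence[float],
                 factor: float = 2.0) -> Dict[str, int]:
    """Request = mean use; limit = 2 to 3 times the observed max."""
    if not 2.0 <= factor <= 3.0:
        raise ValueError("factor should be between 2 and 3")
    return {
        "request": round(mean(samples_millicores)),
        "limit": round(factor * max(samples_millicores)),
    }
```

For example, memory samples of 400, 500 and 600 MiB with no configurable maximum give a request of 500 MiB and a limit of 900 MiB; the same samples with a configured maximum of 1024 MiB give a limit of 1229 MiB.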

Ingress use as default deployment (instead of node ports)

Instead of using NodePorts, we want to push the use of Ingress by default. Significant changes will be made when this is ready (especially removing "NodePort" as the default service type from most ONAP services).

If your component is using "hardcoded" access to other components via NodePort (like Portal), please let us know, as it won't be possible anymore (but access will still be possible via ClusterIP for internal traffic and via Ingress for external traffic).

Using node ports will still be possible, by configuring the different services.

Dynamic PVC as default deployment (instead of /dockerdata-nfs)

Instead of pushing everything to /dockerdata-nfs, we'll default to using storage class(es). It will still be possible to use /dockerdata-nfs, but it won't be the default.

Service Mesh PoC

Wiki page: https://wiki.onap.org/display/DW/ONAP+Security+Model

A PoC using service mesh (Istio), Ingress (component TBD) and RBAC management (Keycloak) will be started. For that, your components must be able to:

  • use HTTP (and not HTTPS), as Istio handles the TLS part for both the server and client sides

  • disable RBAC / basicAuth, as Keycloak will do the RBAC for your component (meaning that if a request reaches your component, it has already been authorized by Keycloak) for both the server and client sides

Some calls for help may be launched if we don't understand your component's behavior. This work will start with the "Core" components (DMAAP, AAI, SDC, SO, SDNC) and then be extended.