This page describes the scalability, resiliency, and manageability capabilities that are new in the Beijing release. These capabilities apply to the OOM/Kubernetes installation.
Installation
Follow the OOM installation instructions at
http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/index.html
TLab specific installation
The notes below are specific to the TLab environment.
- Find the Rancher IP in the R_Control-Plane tenant. We use the "onap_dev" key and the "ubuntu" user to SSH. For example: "ssh -i onap_dev ubuntu@192.168.31.245".
- Log in as ubuntu, then run "sudo -i" to become root. The "oom" git repo is in the root user's home directory on the Rancher VM, under "/root/oom".
- Edit the portal files under /root/oom/kubernetes, then rebuild the charts:
"make portal"
"make onap"
- Run the command below from the "/root" folder to perform the helm upgrade:
"helm upgrade -i dev local/onap -f integration-override.yaml"
- The Rancher GUI is at 192.168.31.245:8080
- The Rancher GUI also provides an interactive CLI where you can run kubectl commands. The command below lists the portal services:
"kubectl get services --namespace=onap | grep portal"
Overview of the running system
Healthchecks
Verify that the Portal healthcheck run by the Robot Framework passes.
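One way to run this check from the Rancher VM is sketched below. It is only an illustration: the robot directory, the "ete-k8s.sh" script name, and its "onap health" arguments are assumptions that differ between OOM releases, so check the usage text of the script in your own checkout. The function name portal_healthcheck is hypothetical.

```shell
# Assumed location of the robot helper scripts inside the OOM checkout.
ROBOT_DIR="${ROBOT_DIR:-/root/oom/kubernetes/robot}"

# Hypothetical helper: run the Robot health suite and keep only the
# lines that mention Portal. The script name and its arguments are
# assumptions; adjust them to match your OOM release.
portal_healthcheck() {
    (cd "$ROBOT_DIR" && ./ete-k8s.sh onap health) | grep -i portal
}
```

Calling "portal_healthcheck" prints only the Portal lines of the health report; the exit status is non-zero if no such line is found.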
portal-app Scaling
To scale out portal-app, set the replica count appropriately.
The tests below work with the OOM portal component in isolation. In this exercise, we scale portal-app by adding 2 new replicas, for a total of 3 instances.
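The replica count can be changed in several ways. The sketch below uses "kubectl scale", which is one option and not necessarily the method used for the tests on this page. The deployment name "dev-portal-app" is an assumption based on the "dev" release name used earlier; list the deployments in the onap namespace to find the real name. The function names are hypothetical.

```shell
NAMESPACE="${NAMESPACE:-onap}"
# Assumed deployment name (release "dev" + chart "portal-app").
DEPLOYMENT="${DEPLOYMENT:-dev-portal-app}"

# Hypothetical helper: set the total number of portal-app replicas.
scale_portal_app() {
    replicas="$1"
    kubectl scale deployment "$DEPLOYMENT" \
        --replicas="$replicas" --namespace="$NAMESPACE"
}

# Hypothetical helper: list the portal-app pods and their status.
show_portal_app_pods() {
    kubectl get pods --namespace="$NAMESPACE" | grep portal-app
}
```

For the exercise described here, "scale_portal_app 3" would request the original instance plus the 2 new replicas, and "show_portal_app_pods" can then be used to watch them come up.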
portal-app Resiliency
A portal-app container failure can be simulated by stopping the portal-app container. The Kubernetes liveness probe detects that the ports are down, infers that there is a problem with the service, and in turn restarts the container.
Here one of the running instances has been deleted and another is starting.
Here the new instance has started and the old one is still terminating.
After the deleted instance has terminated, all 3 instances are running normally.
During this time there were no issues with the Portal website.
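The deletion test can also be driven from the command line. The following is a minimal sketch, assuming the onap namespace and pod names that contain "portal-app"; the function names are hypothetical. Note that it deletes a whole pod, in which case the replacement pod is created by the Deployment's controller rather than by a liveness-probe restart of the container.

```shell
NAMESPACE="${NAMESPACE:-onap}"

# Hypothetical helper: delete the first pod whose name contains "portal-app".
delete_one_portal_pod() {
    pod=$(kubectl get pods --namespace="$NAMESPACE" -o name \
        | grep portal-app | head -n 1)
    if [ -z "$pod" ]; then
        echo "no portal-app pod found" >&2
        return 1
    fi
    # "$pod" already has the resource-type prefix, e.g. pod/<name>.
    kubectl delete "$pod" --namespace="$NAMESPACE"
}

# Hypothetical helper: count portal-app pods currently shown as Running.
count_running_portal_app_pods() {
    kubectl get pods --namespace="$NAMESPACE" | grep portal-app | grep -c Running
}
```

After "delete_one_portal_pod", repeated calls to "count_running_portal_app_pods" should climb back to 3 once the replacement has started.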
Sanity Tests
Check the Portal UI and perform sanity tests.
- After 3 instances of Portal are up, edit the IP in the /etc/hosts file and log on as the demo user at http://portal.api.simpledemo.onap.org:30215/ONAPPORTAL/login.htm
- With 1 instance then killed, the Portal page continued to work seamlessly.
- In a further failover-timing test, all 3 instances were killed; the new Portal processes came up within 30 seconds.
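The failover timing can be approximated with a small polling loop. The sketch below assumes that /etc/hosts already maps portal.api.simpledemo.onap.org to a cluster node, and that a successful HTTP response from the login page is an acceptable stand-in for "Portal is up". The function name wait_for_portal is hypothetical, and the reported time counts one-second polling intervals only, so it is a rough figure.

```shell
PORTAL_URL="${PORTAL_URL:-http://portal.api.simpledemo.onap.org:30215/ONAPPORTAL/login.htm}"

# Hypothetical helper: poll the login page roughly once per second until
# it answers successfully or the timeout (in seconds, default 60) expires.
wait_for_portal() {
    timeout="${1:-60}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if curl --silent --fail --output /dev/null --max-time 2 "$PORTAL_URL"; then
            echo "Portal answered after about ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "Portal did not answer within ${timeout}s" >&2
    return 1
}
```

Running "wait_for_portal 60" immediately after killing the instances gives an approximate recovery time to compare with the 30 seconds observed above.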