Cloud Native Deployment


Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - Single VM

(overview diagram) ONAP on Kubernetes, deployed by Rancher or RKE, managed by Helm, on VMs - Microsoft Azure, Google Cloud Platform, OpenStack - or managed Kubernetes.

Sponsors: Amazon (384G/m - 201801 to 201808) - thank you, Microsoft (201801+), Amdocs (201903+), Intel/Windriver (2017-)

Page maintained by @Michael O'Brien (201705-201905, 2022+) and @michael (201905+)

This is a private page under daily continuous modification to keep it relevant as a live reference (don't edit it unless something is really wrong)

https://twitter.com/_mikeobrien | https://www.linkedin.com/in/michaelobrien-developer/
http://wiki.obrienlabs.cloud/display/DEV/Architecture

For general support consult the official documentation at http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html and https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_cloud_setup_guide.html and raise DOC JIRAs for any modifications required to them.


This page details deployment of ONAP on any environment that supports Kubernetes based containers.

Chat:  http://onap-integration.eastus.cloudapp.azure.com:3000/group/onap-integration

Separate namespaces - to avoid the 1MB configmap limit - or just helm install/delete everything (no helm upgrade)
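A minimal sketch of the install/delete (no helm upgrade) approach - assuming the umbrella release is named onap and the local/onap chart and dev.yaml override used elsewhere on this page:

sudo helm delete --purge onap
kubectl delete namespace onap
sudo helm deploy onap local/onap --namespace onap -f dev.yaml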

OOM Helm (un)Deploy plugins

https://kubernetes.slack.com/messages/C09NXKJKA/?

https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf

Deployment Profile

28 ONAP components; 196 pods including vvp, without the filebeat sidecars (as of 20181130) - this count is with all replicaSets and DaemonSets set to 1, which becomes 241 instances in the clustered case.
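These counts can be reproduced with the same pod queries used in the details section further down:

kubectl get pods --all-namespaces | grep onap | wc -l
kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l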

Docker images currently size up to 75G as of 20181230

After a docker_prepull.sh:

/dev/sda1      389255816 77322824 311916608  20% /

 

Deployment profiles (deployment status measured roughly 75 minutes post-deploy; branch C = Casablanca):

Full Cluster (14 + 1) - recommended
- VMs: 15
- Total RAM / vCores / HD: 224G / 112 vCores / 100G per VM
- VM flavor: 16G, 8 vCores (C5.2xLarge)
- Deployed total RAM: 187G
- Deployed ONAP RAM: 102G
- Pods: 28
- Containers: 248 total, 241 onap, 217 up, 0 error, 24 config
- Idle vCores: 18
- HD/VM: 6+G master, 14 to 50+G per slave
- HD (NFS only): 8.1G
- Date: 20181106
- Cost: $1.20 US/hour using the spot market
- Branch: C (Casablanca)

Single VM (possible - not recommended)
- VMs: 1
- Total RAM / vCores / HD: 432G / 64 vCores / 180G
- VM flavor: 256G+, 32+ vCores
- K8S/Rancher idle RAM: Rancher 13G, Kubernetes 8G, top 10G
- Deployed total RAM: 165G (after 24h)
- Deployed ONAP RAM: 141G
- Pods: 28
- Containers: 240 total, 233 onap, 200 up, 6 error, 38 config (196 if replicaSets and DaemonSets are set to 1)
- Max vCores: 55
- Idle vCores: 22
- HD/VM: 131G (including 75G of docker images)
- HD (NFS only): n/a
- IOPS: max 550/sec, idle 220/sec
- Date: 20181105 (updated 20190101)
- Branch: C (Casablanca)
- Notes: tested on a 432G/64 vCore Azure VM - Rancher 1.6.22, Kubernetes 1.11

Developer 1-n pods
- VMs: 1
- Total RAM / vCores / HD: 16G / 4 vCores / 100G
- VM flavor: 16/32G, 4-16 vCores
- Deployed total RAM: 14G
- Deployed ONAP RAM: 10G
- Pods: 3+
- HD/VM: 120+G
- HD (NFS only): n/a
- Branch: C (Casablanca)
- Notes: AAI + robot only

Security

The VM security group can be left open with no CIDR restrictions - but lock down ports 10249-10255 (the kubelet/kube-proxy ports) with RBAC or firewall rules.

If you get an error connecting to your rancher server such as "dial tcp 127.0.0.1:8880: getsockopt: connection refused", it is usually security (port) related - for example, this line is the first to fail:

https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh#n117 

Check the server first - with either of these - but if the helm version hangs on "server", the ports have an issue. Run with all TCP/UDP ports open to 0.0.0.0/0 and ::/0, and lock down the API on ports 10249-10255 via GitHub OAuth security from the rancher console to keep out crypto miners.
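As an illustrative host-level sketch only (the mechanism described above is GitHub OAuth plus security-group rules): rules like the following restrict 10249-10255 to the cluster's private subnet - the 10.0.0.0/16 CIDR is an assumption, substitute your own VPC/subnet.

# allow the kubelet/metrics port range only from the private cluster subnet, drop everything else
sudo iptables -A INPUT -p tcp --dport 10249:10255 -s 10.0.0.0/16 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 10249:10255 -j DROP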

Example 15 node (1 master + 14 nodes) OOM Deployment

Rancher 1.6.25, Kubernetes 1.11.5, Docker 17.03, Helm 2.9.1

(screenshot: cluster empty)

(screenshot: with ONAP deployed)

Throughput and Volumetrics

Cloudwatch CPU Average

Specific to logging - we have a problem on any VM that hosts AAI: the logstash container saturates that VM - see the VM at 30+ percent CPU - https://lf-onap.atlassian.net/browse/LOG-376

NFS Throughput for /dockerdata-nfs
Cloudwatch Network In Max
Cost

Using the spot market on AWS, we ran a bill of $10 for 8 hours of 15 C5.2xLarge VMs (includes EBS but not DNS or EFS/NFS).

 

Details: 20181106:1800 EDT master

ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | wc -l 248 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | wc -l 241 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 217 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | grep -E '0/|1/2' | wc -l 24 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces -o wide | grep onap | grep -E '0/|1/2' onap onap-aaf-aaf-sms-preload-lvqx9 0/1 Completed 0 4h 10.42.75.71 ip-172-31-37-59.us-east-2.compute.internal <none> onap onap-aaf-aaf-sshsm-distcenter-ql5f8 0/1 Completed 0 4h 10.42.75.223 ip-172-31-34-207.us-east-2.compute.internal <none> onap onap-aaf-aaf-sshsm-testca-7rzcd 0/1 Completed 0 4h 10.42.18.37 ip-172-31-34-111.us-east-2.compute.internal <none> onap onap-aai-aai-graphadmin-create-db-schema-26pfs 0/1 Completed 0 4h 10.42.14.14 ip-172-31-37-59.us-east-2.compute.internal <none> onap onap-aai-aai-traversal-update-query-data-qlk7w 0/1 Completed 0 4h 10.42.88.122 ip-172-31-36-163.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-gmmvj 0/1 Completed 0 4h 10.42.111.99 ip-172-31-41-229.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-n6fw4 0/1 Error 0 4h 10.42.21.12 ip-172-31-36-163.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-nc8ww 0/1 Error 0 4h 10.42.109.156 ip-172-31-41-110.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-xcxds 0/1 Error 0 4h 10.42.152.223 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-dmaap-dmaap-dr-node-6496d8f55b-jfvrm 0/1 Init:0/1 28 4h 10.42.95.32 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-dmaap-dmaap-dr-prov-86f79c47f9-tldsp 0/1 CrashLoopBackOff 59 4h 10.42.76.248 ip-172-31-34-207.us-east-2.compute.internal <none> onap onap-oof-music-cassandra-job-config-7mb5f 0/1 Completed 0 4h 10.42.38.249 ip-172-31-41-110.us-east-2.compute.internal <none> onap onap-oof-oof-has-healthcheck-rpst7 0/1 Completed 0 4h 10.42.241.223 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-oof-oof-has-onboard-5bd2l 0/1 Completed 0 4h 10.42.205.75 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-portal-portal-db-config-qshzn 0/2 Completed 0 4h 10.42.112.46 ip-172-31-45-152.us-east-2.compute.internal <none> onap onap-portal-portal-db-config-rk4m2 0/2 Init:Error 0 4h 10.42.57.79 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-sdc-sdc-be-config-backend-2vw2q 0/1 Completed 0 4h 10.42.87.181 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-sdc-sdc-be-config-backend-k57lh 0/1 Init:Error 0 4h 10.42.148.79 ip-172-31-45-152.us-east-2.compute.internal <none> onap onap-sdc-sdc-cs-config-cassandra-vgnz2 0/1 Completed 0 4h 10.42.111.187 ip-172-31-34-111.us-east-2.compute.internal <none> onap onap-sdc-sdc-es-config-elasticsearch-lkb9m 0/1 Completed 0 4h 10.42.20.202 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-sdc-sdc-onboarding-be-cassandra-init-7zv5j 0/1 Completed 0 4h 10.42.218.1 ip-172-31-41-229.us-east-2.compute.internal <none> onap onap-sdc-sdc-wfd-be-workflow-init-q8t7z 0/1 Completed 0 4h 10.42.255.91 ip-172-31-41-30.us-east-2.compute.internal <none> onap onap-vid-vid-galera-config-4f274 0/1 Completed 0 4h 10.42.80.200 ip-172-31-33-223.us-east-2.compute.internal <none> onap onap-vnfsdk-vnfsdk-init-postgres-lf659 0/1 Completed 0 4h 10.42.238.204 ip-172-31-38-194.us-east-2.compute.internal <none> ubuntu@ip-172-31-40-250:~$ kubectl get 
nodes -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-172-31-33-223.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.222.148.116 18.222.148.116 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-34-111.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 3.16.37.170 3.16.37.170 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-34-207.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.225.32.201 18.225.32.201 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-36-163.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 13.58.189.251 13.58.189.251 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-37-24.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.224.180.26 18.224.180.26 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-37-59.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.191.248.14 18.191.248.14 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-38-194.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.217.45.91 18.217.45.91 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-38-95.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 52.15.39.21 52.15.39.21 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-39-138.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.224.199.40 18.224.199.40 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-110.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.223.151.180 18.223.151.180 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-229.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.218.252.13 18.218.252.13 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-30.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 3.16.113.3 3.16.113.3 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-42-33.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 13.59.2.86 13.59.2.86 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-45-152.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.219.56.50 18.219.56.50 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ubuntu@ip-172-31-40-250:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-33-223.us-east-2.compute.internal 852m 10% 13923Mi 90% ip-172-31-34-111.us-east-2.compute.internal 1160m 14% 11643Mi 75% ip-172-31-34-207.us-east-2.compute.internal 1101m 13% 7981Mi 51% ip-172-31-36-163.us-east-2.compute.internal 656m 8% 13377Mi 87% ip-172-31-37-24.us-east-2.compute.internal 401m 5% 8543Mi 55% ip-172-31-37-59.us-east-2.compute.internal 711m 8% 10873Mi 70% ip-172-31-38-194.us-east-2.compute.internal 1136m 14% 8195Mi 53% ip-172-31-38-95.us-east-2.compute.internal 1195m 14% 9127Mi 59% ip-172-31-39-138.us-east-2.compute.internal 296m 3% 10870Mi 70% ip-172-31-41-110.us-east-2.compute.internal 2586m 32% 10950Mi 71% ip-172-31-41-229.us-east-2.compute.internal 159m 1% 9138Mi 59% ip-172-31-41-30.us-east-2.compute.internal 180m 2% 9862Mi 64% ip-172-31-42-33.us-east-2.compute.internal 1573m 19% 6352Mi 41% ip-172-31-45-152.us-east-2.compute.internal 1579m 19% 10633Mi 69%

 

Quickstart

Undercloud Install - Rancher/Kubernetes/Helm/Docker

Ubuntu 16.04 Host VM Configuration


Redhat 7.6 Host VM Configuration

see https://gerrit.onap.org/r/#/c/77850/

firewalld off: systemctl disable firewalld

git, make, python: yum install git; yum groupinstall 'Development Tools'

IPv4 forwarding: add "net.ipv4.ip_forward = 1" to /etc/sysctl.conf

Networking enabled: sudo vi /etc/sysconfig/network-scripts/ifcfg-ens33 and set ONBOOT=yes

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html-single/getting_started_with_kubernetes/index

General Host VM Configuration

Follow https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh

Run the following script on a clean Ubuntu 16.04 or Redhat RHEL 7.x (7.6) VM anywhere - it will provision and register your kubernetes system as a collocated master/host.

Ideally you install a clustered set of hosts separate from the master VM - you can do this by deleting the host from the cluster after it is installed below, then running docker, NFS and the rancher agent docker container on each host.
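For reference, the invocation of that script as used in the quickstarts further down this page:

sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh .
sudo ./oom_rancher_setup.sh -b master -s <your domain/ip> -e onap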

vm.max_map_count - raise the 64k default to 256k

The cd.sh script will fix your VM for this limitation, first found in https://lf-onap.atlassian.net/browse/LOG-334. If you don't run the cd.sh script, run the following command manually on each VM so that any elasticsearch container comes up properly - this is a base OS issue.

https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n49

# fix virtual memory for onap-log:elasticsearch under Rancher 1.6.11 - OOM-431
sudo sysctl -w vm.max_map_count=262144

Scripted RKE Kubernetes Cluster install

OOM RKE Kubernetes Deployment

Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - Single VM

Prerequisites

Create a single VM - 256G+

See recommended cluster configurations on ONAP Deployment Specification for Finance and Operations#AmazonAWS

Create a 0.0.0.0/0 and ::/0 open security group

Use GitHub OAuth to authenticate your cluster immediately after installing it.

Last test 20190305 using 3.0.1-ONAP

ONAP Development#Change max-pods from default 110 pod limit

# 0 - verify the security group has all protocols (TCP/UCP) for 0.0.0.0/0 and ::/0 # to be save edit/make sure dns resolution is setup to the host ubuntu@ld:~$ sudo cat /etc/hosts 127.0.0.1 cd.onap.info # 1 - configure combined master/host VM - 26 min sudo git clone https://gerrit.onap.org/r/logging-analytics sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh . sudo ./oom_rancher_setup.sh -b master -s <your domain/ip> -e onap # to deploy more than 110 pods per vm before the environment (1a7) is created from the kubernetes template (1pt2) - at the waiting 3 min mark - edit it via https://wiki.onap.org/display/DW/ONAP+Development#ONAPDevelopment-Changemax-podsfromdefault110podlimit --max-pods=900 https://lists.onap.org/g/onap-discuss/topic/oom_110_kubernetes_pod/25213556?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,25213556 in "additional kubelet flags" --max-pods=500 # on a 244G R4.8xlarge vm - 26 min later k8s cluster is up NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-6cfb49f776-5pq45 1/1 Running 0 10m kube-system kube-dns-75c8cb4ccb-7dlsh 3/3 Running 0 10m kube-system kubernetes-dashboard-6f4c8b9cd5-v625c 1/1 Running 0 10m kube-system monitoring-grafana-76f5b489d5-zhrjc 1/1 Running 0 10m kube-system monitoring-influxdb-6fc88bd58d-9494h 1/1 Running 0 10m kube-system tiller-deploy-8b6c5d4fb-52zmt 1/1 Running 0 2m # 3 - secure via github oauth the master - immediately to lock out crypto miners http://cd.onap.info:8880 # check the master cluster ubuntu@ip-172-31-14-89:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-8-245.us-east-2.compute.internal 179m 2% 2494Mi 4% ubuntu@ip-172-31-14-89:~$ kubectl get nodes -o wide NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-172-31-8-245.us-east-2.compute.internal Ready <none> 13d v1.10.3-rancher1 172.17.0.1 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 # 7 - after cluster is up - run cd.sh script to get onap up - customize your values.yaml - the 2nd time you run the script - a clean install - will clone new oom repo # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp dev.yaml dev0.yaml sudo vi dev0.yaml sudo cp dev0.yaml dev1.yaml sudo cp logging-analytics/deploy/cd.sh . 
# this does a prepull (-p), clones 3.0.0-ONAP, managed install -f true sudo ./cd.sh -b 3.0.0-ONAP -e onap -p true -n nexus3.onap.org:10001 -f true -s 300 -c true -d true -w false -r false # check around 55 min (on a 256G single node - with 32 vCores) pods/failed/up @ min and ram 161/13/153 @ 50m 107g @55 min ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 152 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-deployment-handler-5789b89d4b-s6fzw 1/2 Running 0 8m onap dep-service-change-handler-76dcd99f84-fchxd 0/1 ContainerCreating 0 3m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 53m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 9 53m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 53m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 7 53m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 9 53m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 CrashLoopBackOff 9 53m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 7 52m Note: DCAE has 2 sets of orchestration after the initial k8s orchestration - another at 57 min ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-dcae-prh-6b5c6ff445-pr547 0/2 ContainerCreating 0 2m onap dep-dcae-tca-analytics-7dbd46d5b5-bgrn9 0/2 ContainerCreating 0 1m onap dep-dcae-ves-collector-59d4ff58f7-94rpq 0/2 ContainerCreating 0 1m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 57m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 10 57m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 57m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 8 57m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 11 57m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 Error 10 57m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 9 57m at 1 hour ubuntu@ip-172-31-20-218:~$ free total used free shared buff/cache available Mem: 251754696 111586672 45000724 193628 95167300 137158588 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | wc -l 164 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 155 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' | wc -l 8 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-dcae-ves-collector-59d4ff58f7-94rpq 1/2 Running 0 4m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 59m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 10 59m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 59m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 8 59m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 11 59m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 CrashLoopBackOff 10 59m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 9 59m ubuntu@ip-172-31-20-218:~$ df Filesystem 1K-blocks Used Available Use% Mounted on udev 125869392 0 125869392 0% /dev tmpfs 25175472 54680 25120792 1% /run /dev/xvda1 121914320 91698036 30199900 76% / tmpfs 125877348 30312 125847036 1% /dev/shm tmpfs 5120 0 5120 0% /run/lock tmpfs 125877348 0 125877348 0% /sys/fs/cgroup tmpfs 25175472 0 25175472 0% /run/user/1000 todo: verify the release is there after a helm install - as the configMap size issue is breaking the release for now


Prerequisites

Create a single VM - 256G+

20181015

ubuntu@a-onap-dmz-nodelete:~$ ./oom_deployment.sh -b master -s att.onap.cloud -e onap -r a_ONAP_CD_master -t _arm_deploy_onap_cd.json -p _arm_deploy_onap_cd_z_parameters.json
# register the IP to DNS with route53 for att.onap.info - using this for the ONAP academic summit on the 22nd
# 13.68.113.104 = att.onap.cloud


Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - clustered

Prerequisites

Add an NFS (EFS on AWS) share

Create a 1 + N cluster

See recommended cluster configurations on ONAP Deployment Specification for Finance and Operations#AmazonAWS

Create a 0.0.0.0/0 and ::/0 open security group

Use GitHub OAuth to authenticate your cluster immediately after installing it.

Last tested on ld.onap.info 20181029

# 0 - verify the security group has all protocols (TCP/UCP) for 0.0.0.0/0 and ::/0 # 1 - configure master - 15 min sudo git clone https://gerrit.onap.org/r/logging-analytics sudo logging-analytics/deploy/rancher/oom_rancher_setup.sh -b master -s <your domain/ip> -e onap # on a 64G R4.2xlarge vm - 23 min later k8s cluster is up kubectl get pods --all-namespaces kube-system heapster-76b8cd7b5-g7p6n 1/1 Running 0 8m kube-system kube-dns-5d7b4487c9-jjgvg 3/3 Running 0 8m kube-system kubernetes-dashboard-f9577fffd-qldrw 1/1 Running 0 8m kube-system monitoring-grafana-997796fcf-g6tr7 1/1 Running 0 8m kube-system monitoring-influxdb-56fdcd96b-x2kvd 1/1 Running 0 8m kube-system tiller-deploy-54bcc55dd5-756gn 1/1 Running 0 2m # 2 - secure via github oauth the master - immediately to lock out crypto miners http://ld.onap.info:8880 # 3 - delete the master from the hosts in rancher http://ld.onap.info:8880 # 4 - create NFS share on master https://us-east-2.console.aws.amazon.com/efs/home?region=us-east-2#/filesystems/fs-92xxxxx # add -h 1.2.10 (if upgrading from 1.6.14 to 1.6.18 of rancher) sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n false -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true # 5 - create NFS share and register each node - do this for all nodes sudo git clone https://gerrit.onap.org/r/logging-analytics # add -h 1.2.10 (if upgrading from 1.6.14 to 1.6.18 of rancher) sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true # it takes about 1 min to run the script and 1 minute for the etcd and healthcheck containers to go green on each host # check the master cluster kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-19-9.us-east-2.compute.internal 9036m 56% 53266Mi 43% ip-172-31-21-129.us-east-2.compute.internal 6840m 42% 47654Mi 38% ip-172-31-18-85.us-east-2.compute.internal 6334m 39% 49545Mi 40% ip-172-31-26-114.us-east-2.compute.internal 3605m 22% 25816Mi 21% # fix helm on the master after adding nodes to the master - only if the server helm version is less than the client helm version (rancher 1.6.18 does not have this issue) ubuntu@ip-172-31-14-89:~$ sudo helm version Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"} ubuntu@ip-172-31-14-89:~$ sudo helm init --upgrade $HELM_HOME has been configured at /home/ubuntu/.helm. Tiller (the Helm server-side component) has been upgraded to the current version. 
ubuntu@ip-172-31-14-89:~$ sudo helm version Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} # 7a - manual: follow the helm plugin page # https://wiki.onap.org/display/DW/OOM+Helm+%28un%29Deploy+plugins sudo git clone https://gerrit.onap.org/r/oom sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm cd oom/kubernetes sudo helm serve & sudo make all sudo make onap sudo helm deploy onap local/onap --namespace onap fetching local/onap release "onap" deployed release "onap-aaf" deployed release "onap-aai" deployed release "onap-appc" deployed release "onap-clamp" deployed release "onap-cli" deployed release "onap-consul" deployed release "onap-contrib" deployed release "onap-dcaegen2" deployed release "onap-dmaap" deployed release "onap-esr" deployed release "onap-log" deployed release "onap-msb" deployed release "onap-multicloud" deployed release "onap-nbi" deployed release "onap-oof" deployed release "onap-policy" deployed release "onap-pomba" deployed release "onap-portal" deployed release "onap-robot" deployed release "onap-sdc" deployed release "onap-sdnc" deployed release "onap-sniro-emulator" deployed release "onap-so" deployed release "onap-uui" deployed release "onap-vfc" deployed release "onap-vid" deployed release "onap-vnfsdk" deployed # 7b - automated: after cluster is up - run cd.sh script to get onap up - customize your values.yaml - the 2nd time you run the script # clean install - will clone new oom repo # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp logging-analytics/deploy/cd.sh . sudo ./cd.sh -b master -e onap -c true -d true -w true # rerun install - no delete of oom repo sudo ./cd.sh -b master -e onap -c false -d true -w true


Deployment Integrity based on Pod Dependencies

20181213 running 3.0.0-ONAP

Links

https://lf-onap.atlassian.net/browse/LOG-899

https://lf-onap.atlassian.net/browse/LOG-898

https://lf-onap.atlassian.net/browse/OOM-1547

https://lf-onap.atlassian.net/browse/OOM-1543

Patches

Windriver openstack heat template 1+13 VMs
https://gerrit.onap.org/r/#/c/74781/

docker prepull script - run before cd.sh - https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh

https://gerrit.onap.org/r/#/c/74780/
Not merged with the heat template until the following nexus3 slowdown is addressed:
https://lists.onap.org/g/onap-discuss/topic/nexus3_slowdown_10x_docker/28789709?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,28789709
https://jira.onap.org/browse/TSC-79

Base Platform First

Bring up dmaap and aaf first, then the rest of the pods in the following order.
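A sketch of bringing up just those first two releases with the OOM deploy plugin before enabling the rest - assumes a dev.yaml where all components start disabled; the --set flags are the standard per-component enabled switches:

sudo helm deploy onap local/onap --namespace onap -f dev.yaml --set aaf.enabled=true --set dmaap.enabled=true
# wait until both are fully up before enabling the next components
kubectl get pods -n onap | grep -E 'aaf|dmaap'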

Every 2.0s: helm list Fri Dec 14 15:19:49 2018 NAME REVISION UPDATED STATUS CHART NAMESPACE onap 2 Fri Dec 14 15:10:56 2018 DEPLOYED onap-3.0.0 onap onap-aaf 1 Fri Dec 14 15:10:57 2018 DEPLOYED aaf-3.0.0 onap onap-dmaap 2 Fri Dec 14 15:11:00 2018 DEPLOYED dmaap-3.0.0 onap onap onap-aaf-aaf-cm-5c65c9dc55-snhlj 1/1 Running 0 10m onap onap-aaf-aaf-cs-7dff4b9c44-85zg2 1/1 Running 0 10m onap onap-aaf-aaf-fs-ff6779b94-gz682 1/1 Running 0 10m onap onap-aaf-aaf-gui-76cfcc8b74-wn8b8 1/1 Running 0 10m onap onap-aaf-aaf-hello-5d45dd698c-xhc2v 1/1 Running 0 10m onap onap-aaf-aaf-locate-8587d8f4-l4k7v 1/1 Running 0 10m onap onap-aaf-aaf-oauth-d759586f6-bmz2l 1/1 Running 0 10m onap onap-aaf-aaf-service-546f66b756-cjppd 1/1 Running 0 10m onap onap-aaf-aaf-sms-7497c9bfcc-j892g 1/1 Running 0 10m onap onap-aaf-aaf-sms-preload-vhbbd 0/1 Completed 0 10m onap onap-aaf-aaf-sms-quorumclient-0 1/1 Running 0 10m onap onap-aaf-aaf-sms-quorumclient-1 1/1 Running 0 8m onap onap-aaf-aaf-sms-quorumclient-2 1/1 Running 0 6m onap onap-aaf-aaf-sms-vault-0 2/2 Running 1 10m onap onap-aaf-aaf-sshsm-distcenter-27ql7 0/1 Completed 0 10m onap onap-aaf-aaf-sshsm-testca-mw95p 0/1 Completed 0 10m onap onap-dmaap-dbc-pg-0 1/1 Running 0 17m onap onap-dmaap-dbc-pg-1 1/1 Running 0 15m onap onap-dmaap-dbc-pgpool-c5f8498-fn9cn 1/1 Running 0 17m onap onap-dmaap-dbc-pgpool-c5f8498-t9s27 1/1 Running 0 17m onap onap-dmaap-dmaap-bus-controller-59c96d6b8f-9xsxg 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-db-557c66dc9d-gvb9f 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-node-6496d8f55b-ffgfr 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-prov-86f79c47f9-zb8p7 1/1 Running 0 17m onap onap-dmaap-message-router-5fb78875f4-lvsg6 1/1 Running 0 17m onap onap-dmaap-message-router-kafka-7964db7c49-n8prg 1/1 Running 0 17m onap onap-dmaap-message-router-zookeeper-5cdfb67f4c-5w4vw 1/1 Running 0 17m onap-msb 2 Fri Dec 14 15:31:12 2018 DEPLOYED msb-3.0.0 onap onap onap-msb-kube2msb-5c79ddd89f-dqhm6 1/1 Running 0 4m onap onap-msb-msb-consul-6949bd46f4-jk6jw 1/1 Running 0 4m onap onap-msb-msb-discovery-86c7b945f9-bc4zq 2/2 Running 0 4m onap onap-msb-msb-eag-5f86f89c4f-fgc76 2/2 Running 0 4m onap onap-msb-msb-iag-56cdd4c87b-jsfr8 2/2 Running 0 4m onap-aai 1 Fri Dec 14 15:30:59 2018 DEPLOYED aai-3.0.0 onap onap onap-aai-aai-54b7bf7779-bfbmg 1/1 Running 0 2m onap onap-aai-aai-babel-6bbbcf5d5c-sp676 2/2 Running 0 13m onap onap-aai-aai-cassandra-0 1/1 Running 0 13m onap onap-aai-aai-cassandra-1 1/1 Running 0 12m onap onap-aai-aai-cassandra-2 1/1 Running 0 9m onap onap-aai-aai-champ-54f7986b6b-wql2b 2/2 Running 0 13m onap onap-aai-aai-data-router-f5f75c9bd-l6ww7 2/2 Running 0 13m onap onap-aai-aai-elasticsearch-c9bf9dbf6-fnj8r 1/1 Running 0 13m onap onap-aai-aai-gizmo-5f8bf54f6f-chg85 2/2 Running 0 13m onap onap-aai-aai-graphadmin-9b956d4c-k9fhk 2/2 Running 0 13m onap onap-aai-aai-graphadmin-create-db-schema-s2nnw 0/1 Completed 0 13m onap onap-aai-aai-modelloader-644b46df55-vt4gk 2/2 Running 0 13m onap onap-aai-aai-resources-745b6b4f5b-rj7lm 2/2 Running 0 13m onap onap-aai-aai-search-data-559b8dbc7f-l6cqq 2/2 Running 0 13m onap onap-aai-aai-sparky-be-75658695f5-z2xv4 2/2 Running 0 13m onap onap-aai-aai-spike-6778948986-7h7br 2/2 Running 0 13m onap onap-aai-aai-traversal-58b97f689f-jlblx 2/2 Running 0 13m onap onap-aai-aai-traversal-update-query-data-7sqt5 0/1 Completed 0 13m onap-msb 5 Fri Dec 14 15:51:42 2018 DEPLOYED msb-3.0.0 onap onap onap-msb-kube2msb-5c79ddd89f-dqhm6 1/1 Running 0 18m onap onap-msb-msb-consul-6949bd46f4-jk6jw 1/1 Running 0 18m onap 
onap-msb-msb-discovery-86c7b945f9-bc4zq 2/2 Running 0 18m onap onap-msb-msb-eag-5f86f89c4f-fgc76 2/2 Running 0 18m onap onap-msb-msb-iag-56cdd4c87b-jsfr8 2/2 Running 0 18m onap-esr 3 Fri Dec 14 15:51:40 2018 DEPLOYED esr-3.0.0 onap onap onap-esr-esr-gui-6c5ccd59d6-6brcx 1/1 Running 0 2m onap onap-esr-esr-server-5f967d4767-ctwp6 2/2 Running 0 2m onap-robot 2 Fri Dec 14 15:51:48 2018 DEPLOYED robot-3.0.0 onap onap onap-robot-robot-ddd948476-n9szh 1/1 Running 0 11m onap-multicloud 1 Fri Dec 14 15:51:43 2018 DEPLOYED multicloud-3.0.0 onap

 

Tiller requires wait states between deployments

There is a patch going into 3.0.1 that delays deployments by 3+ seconds so that tiller is not overloaded.

sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm
sudo vi ~/.helm/plugins/deploy/deploy.sh
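The kind of change being made is a short pause after each sub-release - a sketch only, the exact surrounding code in deploy.sh varies by OOM release:

# inside ~/.helm/plugins/deploy/deploy.sh - after each subchart release is deployed,
# pause a few seconds so tiller is not flooded with concurrent release operations
sleep 3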

Use public-cloud.yaml override

Note: your HD/SSD, RAM and CPU configuration will drastically affect deployment. For example, if you are CPU starved, the idle load of already-deployed ONAP pods will delay new pods as more come in; network bandwidth to pull docker containers is also significant, and PV creation is sensitive to filesystem throughput/lag.

Some of the internal pod timings are optimized for a particular Azure deployment.

https://git.onap.org/oom/tree/kubernetes/onap/resources/environments/public-cloud.yaml
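For example (same form as the Casablanca deployment example further down - the path to your dev.yaml copy may differ):

sudo helm deploy onap local/onap --namespace onap -f dev.yaml -f onap/resources/environments/public-cloud.yaml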

Optimizing Docker Image Pulls

https://lists.onap.org/g/onap-discuss/topic/onap_helpdesk_65794_nexus3/28794221?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,28794221

Verify whether the integration docker manifest CSV or the OOM repo values.yaml is the source of truth for image versions (no override required?)

https://lf-onap.atlassian.net/browse/TSC-86

https://lists.onap.org/g/onap-discuss/topic/oom_onap_deployment/28883609?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,28883609

Nexus Proxy

@Soleil, Alain (Deactivated) pointed out the proxy page (he was using a commercial nexus3) - ONAP OOM Beijing - Hosting docker images locally - I had about 4 JIRAs on this and had forgotten about them.

20190121: 

Answered John Lotoski for EKS and his other post on nexus3 proxy failures - looks like an issue with a double proxy between dockerhub - or an issue specific to the dockerhub/registry:2 container - https://lists.onap.org/g/onap-discuss/topic/registry_issue_few_images/29285134?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,29285134

 

Running

https://lf-onap.atlassian.net/browse/LOG-355

nexus3.onap.info:5000 - my private AWS nexus3 proxy of nexus3.onap.org:10001

nexus3.onap.cloud:5000 - azure public proxy - filled with casablanca (will retire after Jan 2)

nexus4.onap.cloud:5000 - azure public proxy - filled with master - and later casablanca

nexus3windriver.onap.cloud:5000 - windriver/openstack lab inside the firewall to use only for the lab - access to public is throttled

Nexus3 proxy setup - host
# from a clean ubuntu 16.04 VM # install docker sudo curl https://releases.rancher.com/install-docker/17.03.sh | sh sudo usermod -aG docker ubuntu # install nexus mkdir -p certs openssl req -newkey rsa:4096 -nodes -sha256 -keyout certs/domain.key -x509 -days 365 -out certs/domain.crt Common Name (e.g. server FQDN or YOUR name) []:nexus3.onap.info sudo nano /etc/hosts sudo docker run -d --restart=unless-stopped --name registry -v `pwd`/certs:/certs -e REGISTRY_HTTP_ADDR=0.0.0.0:5000 -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key -e REGISTRY_PROXY_REMOTEURL=https://nexus3.onap.org:10001 -p 5000:5000 registry:2 sudo docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7f9b0e97eb7f registry:2 "/entrypoint.sh /e..." 8 seconds ago Up 7 seconds 0.0.0.0:5000->5000/tcp registry # test it sudo docker login -u docker -p docker nexus3.onap.info:5000 Login Succeeded # get images from https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=casablanca # use for example the first line onap/aaf/aaf_agent,2.1.8 # or the prepull script in https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh sudo docker pull nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pulling fs layer 819d6de9e493: Downloading [======================================> ] 770.7 kB/1.012 MB # list sudo docker images REPOSITORY TAG IMAGE ID CREATED SIZE registry 2 2e2f252f3c88 3 months ago 33.3 MB # prepull to cache images on the server - in this case casablanca branch sudo wget https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh sudo chmod 777 docker_prepull.sh # prep - same as client vms - the cert sudo mkdir /etc/docker/certs.d sudo mkdir /etc/docker/certs.d/nexus3.onap.cloud:5000 sudo cp certs/domain.crt /etc/docker/certs.d/nexus3.onap.cloud:5000/ca.crt sudo systemctl restart docker sudo docker login -u docker -p docker nexus3.onap.cloud:5000 # prepull sudo nohup ./docker_prepull.sh -b casablanca -s nexus3.onap.cloud:5000 &
Nexus3 proxy usage per cluster node

Cert is on https://lf-onap.atlassian.net/browse/TSC-79

# on each host # Cert is on TSC-79 sudo wget https://jira.onap.org/secure/attachment/13127/domain_nexus3_onap_cloud.crt # or if you already have it scp domain_nexus3_onap_cloud.crt ubuntu@ld3.onap.cloud:~/ # to avoid sudo docker login -u docker -p docker nexus3.onap.cloud:5000 Error response from daemon: Get https://nexus3.onap.cloud:5000/v1/users/: x509: certificate signed by unknown authority # cp cert sudo mkdir /etc/docker/certs.d sudo mkdir /etc/docker/certs.d/nexus3.onap.cloud:5000 sudo cp domain_nexus3_onap_cloud.crt /etc/docker/certs.d/nexus3.onap.cloud:5000/ca.crt sudo systemctl restart docker sudo docker login -u docker -p docker nexus3.onap.cloud:5000 Login Succeeded # testing # vm with the image existing - 2 sec ubuntu@ip-172-31-33-46:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 # vm with layers existing except for last 5 - 5 sec ubuntu@a-cd-master:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Already exists .. 20 49e90af50c7d: Already exists .... acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 # clean AWS VM (clean install of docker) - no pulls yet - 45 sec for everything ubuntu@ip-172-31-14-34:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pulling fs layer 0addb6fece63: Pulling fs layer 78e58219b215: Pulling fs layer eb6959a66df2: Pulling fs layer 321bd3fd2d0e: Pull complete ... acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 ubuntu@ip-172-31-14-34:~$ sudo docker images REPOSITORY TAG IMAGE ID CREATED SIZE nexus3.onap.cloud:5000/onap/aaf/aaf_agent 2.1.8 090b326a7f11 5 weeks ago 1.14 GB # going to test a same size image directly from the LF - with minimal common layers nexus3.onap.org:10001/onap/testsuite 1.3.2 c4b58baa95e8 3 weeks ago 1.13 GB # 5 min in we are still at 3% - numbers below are a min old ubuntu@ip-172-31-14-34:~$ sudo docker pull nexus3.onap.org:10001/onap/testsuite:1.3.2 1.3.2: Pulling from onap/testsuite 32802c0cfa4d: Downloading [=============> ] 8.416 MB/32.1 MB da1315cffa03: Download complete fa83472a3562: Download complete f85999a86bef: Download complete 3eca7452fe93: Downloading [=======================> ] 8.517 MB/17.79 MB 9f002f13a564: Downloading [=========================================> ] 8.528 MB/10.24 MB 02682cf43e5c: Waiting .... 754645df4601: Waiting # in 5 min we get 3% 35/1130Mb - which comes out to 162 min for 1.13G for .org as opposed to 45 sec for .info - which is a 200X slowdown - some of this is due to the fact my nexus3.onap.info is on the same VPC as my test VM - testing on openlab # openlab - 2 min 40 sec which is 3.6 times slower - expected than in AWS - (25 min pulls vs 90min in openlab) - this makes nexus.onap.org 60 times slower in openlab than a proxy running from AWS (2 vCore/16G/ssd VM) ubuntu@onap-oom-obrien-rancher-e4:~$ sudo docker pull nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pull complete ... 
acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 #pulling smaller from nexus3.onap.info 2 min 20 - for 36Mb = 0.23Mb/sec - extrapolated to 1.13Gb for above is 5022 sec or 83 min - half the rough calculation above ubuntu@onap-oom-obrien-rancher-e4:~$ sudo docker pull nexus3.onap.org:10001/onap/aaf/sms:3.0.1 3.0.1: Pulling from onap/aaf/sms c67f3896b22c: Pull complete ... 76eeb922b789: Pull complete Digest: sha256:d5b64947edb93848acacaa9820234aa29e58217db9f878886b7bafae00fdb436 Status: Downloaded newer image for nexus3.onap.org:10001/onap/aaf/sms:3.0.1 # conclusion - nexus3.onap.org is experiencing a routing issue from their DC outbound causing a 80-100x slowdown over a proxy nexus3 - since 20181217 - as local jenkins.onap.org builds complete faster # workaround is to use a nexus3 proxy above

 

Then add the following to values.yaml:

global:
  #repository: nexus3.onap.org:10001
  repository: nexus3.onap.cloud:5000
  repositoryCred:
    user: docker
    password: docker

The windriver lab also has a network issue (for example, a pull from nexus3.onap.cloud:5000 (azure) into an AWS EC2 instance takes 45 sec for 1.1G, while the same pull in an openlab VM is on the order of 10+ min) - therefore you need a local nexus3 proxy if you are inside the openstack lab. I have registered nexus3windriver.onap.cloud:5000 to a nexus3 proxy in my logging tenant - cert above.

Docker Prepull

https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh

using

https://git.onap.org/integration/tree/version-manifest/src/main/resources/docker-manifest.csv?h=casablanca

via

https://gerrit.onap.org/r/#/c/74780/

https://lf-onap.atlassian.net/browse/LOG-905

git clone ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics
cd logging-analytics
git pull ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics refs/changes/80/74780/1
ubuntu@onap-oom-obrien-rancher-e0:~$ sudo nohup ./docker_prepull.sh &
[1] 14488
ubuntu@onap-oom-obrien-rancher-e0:~$ nohup: ignoring input and appending output to 'nohup.out'


POD redeployment/undeploy/deploy

If you need to redeploy a pod due to a job timeout, a failure, or to pick up a config/code change, delete the corresponding /dockerdata-nfs subdirectory (for example /dockerdata-nfs/onap-aai) so that, for example, a db restart does not run into existing data issues.

sudo chmod -R 777 /dockerdata-nfs
sudo rm -rf /dockerdata-nfs/onap-aai
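Putting the whole single-component redeploy together (aai as the example) - a sketch assuming the onap-aai release name created by the deploy plugin and a dev.yaml override as elsewhere on this page:

sudo helm delete --purge onap-aai
sudo chmod -R 777 /dockerdata-nfs
sudo rm -rf /dockerdata-nfs/onap-aai
sudo helm deploy onap local/onap --namespace onap -f dev.yaml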

 

Casablanca Deployment Examples

Deploy to 13+1 cluster

Deploy as one with deploy.sh delays and public.cloud.yaml - single 500G server AWS

sudo helm deploy onap local/onap --namespace $ENVIRON -f ../../dev.yaml -f onap/resources/environments/public-cloud.yaml
where dev.yaml is the same as the one in resources but with all components turned on and IfNotPresent instead of Always

Deploy in sequence with validation on previous pod before proceeding - single 500G server AWS

We are not using the public-cloud.yaml override here, in order to verify just the timing between deploys - each pod waits for the previous one to complete so resources are not in contention.

see update to 

https://git.onap.org/logging-analytics/tree/deploy/cd.sh

https://gerrit.onap.org/r/#/c/75422

DEPLOY_ORDER_POD_NAME_ARRAY=('robot consul aaf dmaap dcaegen2 msb aai esr multicloud oof so sdc sdnc vid policy portal log vfc uui vnfsdk appc clamp cli pomba vvp contrib sniro-emulator')
# don't count completed pods
DEPLOY_NUMBER_PODS_DESIRED_ARRAY=(1 4 13 11 13 5 15 2 6 17 10 12 11 2 8 6 3 18 2 5 5 5 1 11 11 3 1)
# account for pods that have varying deploy times or replicaset sizes
# don't count the 0/1 completed pods - and skip most of the ReplicaSet instances except 1
# dcae bootstrap is problematic
DEPLOY_NUMBER_PODS_PARTIAL_ARRAY=(1 2 11 9 13 5 11 2 6 16 10 12 11 2 8 6 3 18 2 5 5 5 1 9 11 3 1)

Deployment in sequence to Windriver Lab

Note: the Windriver Openstack lab requires that host registration occurs against the private network 10.0.0.0/16 not the 10.12.0.0/16 public network - this is fine in Azure/AWS but not in openstack

The docs will be adjusted https://lf-onap.atlassian.net/browse/OOM-1550

This is bad - a public IP based cluster (screenshot).

This is good - a private IP based cluster (screenshot).

Openstack/Windriver HEAT template for 13+1 kubernetes cluster

https://jira.onap.org/secure/attachment/13010/logging_openstack_13_16g.yaml

https://lf-onap.atlassian.net/browse/LOG-324

see

https://gerrit.onap.org/r/74781

obrienbiometrics:onap_oom-714_heat michaelobrien$ openstack stack create -t logging_openstack_13_16g.yaml -e logging_openstack_oom.env OOM20181216-13 +---------------------+-----------------------------------------+ | Field | Value | +---------------------+-----------------------------------------+ | id | ed6aa689-2e2a-4e75-8868-9db29607c3ba | | stack_name | OOM20181216-13 | | description | Heat template to install OOM components | | creation_time | 2018-12-16T19:42:27Z | | updated_time | 2018-12-16T19:42:27Z | | stack_status | CREATE_IN_PROGRESS | | stack_status_reason | Stack CREATE started | +---------------------+-----------------------------------------+ obrienbiometrics:onap_oom-714_heat michaelobrien$ openstack server list +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+ | ID | Name | Status | Networks | Image Name | +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+ | 7695cf14-513e-4fea-8b00-6c2a25df85d3 | onap-oom-obrien-rancher-e13 | ACTIVE | oam_onap_RNa3=10.0.0.23, 10.12.7.14 | ubuntu-16-04-cloud-amd64 | | 1b70f179-007c-4975-8e4a-314a57754684 | onap-oom-obrien-rancher-e7 | ACTIVE | oam_onap_RNa3=10.0.0.10, 10.12.7.36 | ubuntu-16-04-cloud-amd64 | | 17c77bd5-0a0a-45ec-a9c7-98022d0f62fe | onap-oom-obrien-rancher-e2 | ACTIVE | oam_onap_RNa3=10.0.0.9, 10.12.6.180 | ubuntu-16-04-cloud-amd64 | | f85e075f-e981-4bf8-af3f-e439b7b72ad2 | onap-oom-obrien-rancher-e9 | ACTIVE | oam_onap_RNa3=10.0.0.6, 10.12.5.136 | ubuntu-16-04-cloud-amd64 | | 58c404d0-8bae-4889-ab0f-6c74461c6b90 | onap-oom-obrien-rancher-e6 | ACTIVE | oam_onap_RNa3=10.0.0.19, 10.12.5.68 | ubuntu-16-04-cloud-amd64 | | b91ff9b4-01fe-4c34-ad66-6ffccc9572c1 | onap-oom-obrien-rancher-e4 | ACTIVE | oam_onap_RNa3=10.0.0.11, 10.12.7.35 | ubuntu-16-04-cloud-amd64 | | d9be8b3d-2ef2-4a00-9752-b935d6dd2dba | onap-oom-obrien-rancher-e0 | ACTIVE | oam_onap_RNa3=10.0.16.1, 10.12.7.13 | ubuntu-16-04-cloud-amd64 | | da0b1be6-ec2b-43e6-bb3f-1f0626dcc88b | onap-oom-obrien-rancher-e1 | ACTIVE | oam_onap_RNa3=10.0.0.16, 10.12.5.10 | ubuntu-16-04-cloud-amd64 | | 0ffec4d0-bd6f-40f9-ab2e-f71aa5b9fbda | onap-oom-obrien-rancher-e5 | ACTIVE | oam_onap_RNa3=10.0.0.7, 10.12.6.248 | ubuntu-16-04-cloud-amd64 | | 125620e0-2aa6-47cf-b422-d4cbb66a7876 | onap-oom-obrien-rancher-e8 | ACTIVE | oam_onap_RNa3=10.0.0.8, 10.12.6.249 | ubuntu-16-04-cloud-amd64 | | 1efe102a-d310-48d2-9190-c442eaec3f80 | onap-oom-obrien-rancher-e12 | ACTIVE | oam_onap_RNa3=10.0.0.5, 10.12.5.167 | ubuntu-16-04-cloud-amd64 | | 7c248d1d-193a-415f-868b-a94939a6e393 | onap-oom-obrien-rancher-e3 | ACTIVE | oam_onap_RNa3=10.0.0.3, 10.12.5.173 | ubuntu-16-04-cloud-amd64 | | 98dc0aa1-e42d-459c-8dde-1a9378aa644d | onap-oom-obrien-rancher-e11 | ACTIVE | oam_onap_RNa3=10.0.0.12, 10.12.6.179 | ubuntu-16-04-cloud-amd64 | | 6799037c-31b5-42bd-aebf-1ce7aa583673 | onap-oom-obrien-rancher-e10 | ACTIVE | oam_onap_RNa3=10.0.0.13, 10.12.6.167 | ubuntu-16-04-cloud-amd64 | +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+
# 13+1 vms on openlab available as of 20181216 - running 2 separate clusters # 13+1 all 16g VMs # 4+1 all 32g VMs # master undercloud sudo git clone https://gerrit.onap.org/r/logging-analytics sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh . sudo ./oom_rancher_setup.sh -b master -s 10.12.7.13 -e onap # master nfs sudo wget https://jira.onap.org/secure/attachment/12887/master_nfs_node.sh sudo chmod 777 master_nfs_node.sh sudo ./master_nfs_node.sh 10.12.5.10 10.12.6.180 10.12.5.173 10.12.7.35 10.12.6.248 10.12.5.68 10.12.7.36 10.12.6.249 10.12.5.136 10.12.6.167 10.12.6.179 10.12.5.167 10.12.7.14 #sudo ./master_nfs_node.sh 10.12.5.162 10.12.5.198 10.12.5.102 10.12.5.4 # slaves nfs sudo wget https://jira.onap.org/secure/attachment/12888/slave_nfs_node.sh sudo chmod 777 slave_nfs_node.sh sudo ./slave_nfs_node.sh 10.12.7.13 #sudo ./slave_nfs_node.sh 10.12.6.125 # test it ubuntu@onap-oom-obrien-rancher-e4:~$ sudo ls /dockerdata-nfs/ test.sh # remove client from master node ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e0 Ready <none> 5m v1.11.5-rancher1 ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get pods --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-7b48b696fc-2z47t 1/1 Running 0 5m kube-system kube-dns-6655f78c68-gn2ds 3/3 Running 0 5m kube-system kubernetes-dashboard-6f54f7c4b-sfvjc 1/1 Running 0 5m kube-system monitoring-grafana-7877679464-872zv 1/1 Running 0 5m kube-system monitoring-influxdb-64664c6cf5-rs5ms 1/1 Running 0 5m kube-system tiller-deploy-6f4745cbcf-zmsrm 1/1 Running 0 5m # after master removal from hosts - expected no nodes ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes error: the server doesn't have a resource type "nodes" # slaves rancher client - 1st node # register on the private network not the public IP # notice the CATTLE_AGENT sudo docker run -e CATTLE_AGENT_IP="10.0.0.7" --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.11 http://10.0.16.1:8880/v1/scripts/5A5E4F6388A4C0A0F104:1514678400000:9zpsWeGOsKVmWtOtoixAUWjPJs ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e1 Ready <none> 0s v1.11.5-rancher1 # add the other nodes # the 4 node 32g = 128g cluster ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e1 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e2 Ready <none> 4m v1.11.5-rancher1 onap-oom-obrien-rancher-e3 Ready <none> 5m v1.11.5-rancher1 onap-oom-obrien-rancher-e4 Ready <none> 3m v1.11.5-rancher1 # the 13 node 16g = 208g cluster ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% onap-oom-obrien-rancher-e1 208m 2% 2693Mi 16% onap-oom-obrien-rancher-e10 38m 0% 1083Mi 6% onap-oom-obrien-rancher-e11 36m 0% 1104Mi 6% onap-oom-obrien-rancher-e12 57m 0% 1070Mi 6% onap-oom-obrien-rancher-e13 116m 1% 1017Mi 6% onap-oom-obrien-rancher-e2 73m 0% 1361Mi 8% onap-oom-obrien-rancher-e3 62m 0% 1099Mi 6% onap-oom-obrien-rancher-e4 74m 0% 1370Mi 8% onap-oom-obrien-rancher-e5 37m 0% 1104Mi 6% onap-oom-obrien-rancher-e6 55m 0% 1125Mi 7% onap-oom-obrien-rancher-e7 42m 0% 1102Mi 6% onap-oom-obrien-rancher-e8 53m 0% 1090Mi 6% onap-oom-obrien-rancher-e9 52m 0% 1072Mi 6%
Installing ONAP via cd.sh

The cluster hosting kubernetes is up with 13+1 nodes and 2 network interfaces (the private 10.0.0.0/16 subnet and the 10.12.0.0/16 public subnet)

 

Verify kubernetes hosts are ready

ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e1 Ready <none> 2h v1.11.5-rancher1 onap-oom-obrien-rancher-e10 Ready <none> 25m v1.11.5-rancher1 onap-oom-obrien-rancher-e11 Ready <none> 20m v1.11.5-rancher1 onap-oom-obrien-rancher-e12 Ready <none> 5m v1.11.5-rancher1 onap-oom-obrien-rancher-e13 Ready <none> 1m v1.11.5-rancher1 onap-oom-obrien-rancher-e2 Ready <none> 2h v1.11.5-rancher1 onap-oom-obrien-rancher-e3 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e4 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e5 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e6 Ready <none> 46m v1.11.5-rancher1 onap-oom-obrien-rancher-e7 Ready <none> 40m v1.11.5-rancher1 onap-oom-obrien-rancher-e8 Ready <none> 37m v1.11.5-rancher1 onap-oom-obrien-rancher-e9 Ready <none> 26m v1.11.5-rancher1

Openstack parameter overrides

# manually check out 3.0.0-ONAP (script is written for branches like casablanca) sudo git clone -b 3.0.0-ONAP http://gerrit.onap.org/r/oom sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm # fix tiller bug sudo nano ~/.helm/plugins/deploy/deploy.sh # modify dev.yaml with logging-rc file openstack parameters - appc, sdnc and sudo cp logging-analytics/deploy/cd.sh . sudo cp oom/kubernetes/onap/resources/environments/dev.yaml . sudo nano dev.yaml ubuntu@onap-oom-obrien-rancher-0:~/oom/kubernetes/so/resources/config/mso$ echo -n "Whq..jCLj" | openssl aes-128-ecb -e -K `cat encryption.key` -nosalt | xxd -c 256 -p bdaee....c60d3e09 # so server configuration config: openStackUserName: "michael_o_brien" openStackRegion: "RegionOne" openStackKeyStoneUrl: "http://10.12.25.2:5000" openStackServiceTenantName: "service" openStackEncryptedPasswordHere: "bdaee....c60d3e09"

 

Deploy all or a subset of ONAP

# copy dev.yaml to dev0.yaml
# bring up all of onap in sequence, or adjust the list for a subset specific to the vFW - assumes you already cloned oom
sudo nohup ./cd.sh -b 3.0.0-ONAP -e onap -p false -n nexus3.onap.org:10001 -f true -s 900 -c false -d true -w false -r false &
#sudo helm deploy onap local/onap --namespace $ENVIRON -f ../../dev.yaml -f onap/resources/environments/public-cloud.yaml

The load is distributed across the cluster even for individual pods like dmaap

Verify the ONAP installation

Follow "Accessing the portal" below.
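The robot healthcheck is a quick verification before logging into the portal - a sketch assuming the oom repo is cloned as in the steps above:

cd oom/kubernetes/robot
sudo ./ete-k8s.sh onap health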

 

 

vFW (vFirewall) Workarounds

From @Alexis Chiarello currently verifying

20190125 - these are for the heat environment - not the kubernetes one - following Casablanca Stability Testing Instructions currently


20181213 - thank you Alexis and @Beejal Shah

Something else I forgot to mention, I did change the heat templates to adapt for our Ubuntu images in our env (to enable additional NICs, eth2 / eth3) and also disable gateway by default on the 2 additional subnets created.

See attached for the modified files.

Cheers,

Alexis.

sudo chmod 777 master_nfs_node.sh 

I reran the vFWCL use case in my re-installed Casablanca lab and here is what I had to manually do post-install :

- fix Robot "robot-eteshare-configmap" config map and adjust values that did not match my env (onap_private_subnet_id, sec_group, dcae_collector_ip, Ubuntu image names, etc...).
- fix DEFAULT_KEYSTONE entry in identity_services in SO catalog DB for proper identity_url, mso_id, mso_pass; note that those are populated based on parsing the "so-openstack-adapter-app-configmap" config map, however it seems the config map is not populated with the entries from the kubernetes/onap/values.yaml file. It might be something I do wrong when installing, though I followed steps from Wiki.

Other than that, for closed-loop to work, policies need to be pushed :

- make sure to push the policies from pap (PRELOAD_POLICIES=true then run config/push-policies.sh from /tmp/policy-install folder)
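A sketch of that policy push - the pap container name and the /tmp/policy-install path are taken from the note above; adjust for your environment:

docker exec -it pap bash -c "export PRELOAD_POLICIES=true; cd /tmp/policy-install; ./config/push-policies.sh"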

(the following are for heat not kubernetes)

For the Robot execution :

- ran "demo.sh <namespace> init"
- ran "ete-k8s.sh [namespace] instantiateDemoVFWCL"

Finally, for Policy to actually parse the proper model ID from the AAI response on the named-query, policy-engine needs to be restarted manually; the robot script fails at doing this, so you need to do it manually after the Robot test ends (I did not investigate the robot part, but basically it looks like an ssh is done and fails).

docker exec -t -u policy drools bash -c "source /opt/app/policy/etc/profile.d/env.sh; policy stop"
docker exec -t -u policy drools bash -c "source /opt/app/policy/etc/profile.d/env.sh; policy start"

That's it - in my case, with the above, the vFWCL closed loop works just fine and I am able to see APP-C processing the modifyConfig event and changing the number of streams using netconf to the packet generator.

Cheers,

Alexis.

 

 

 

Full Entrypoint Install

Two choices: run the single oom_deployment.sh via your ARM/CloudFormation/Heat template wrapper as a one-click install, or use the 2 step procedure above.

entrypoint (aws/azure/openstack) → Ubuntu 16 → rancher install → oom deployment → CD script


 

 

 

 

Remove a Deployment

https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n57

see also https://lf-onap.atlassian.net/browse/OOM-1463


This is required for a couple of pods that leave leftover resources, and for the secondary cloudify out-of-band orchestration in DCAEGEN2.

https://lf-onap.atlassian.net/browse/OOM-1089

https://lf-onap.atlassian.net/browse/DCAEGEN2-1067

https://lf-onap.atlassian.net/browse/DCAEGEN2-1068

sudo helm undeploy $ENVIRON --purge
kubectl delete namespace onap
sudo helm delete --purge onap
kubectl delete pv --all
kubectl delete pvc --all
kubectl delete secrets --all
kubectl delete clusterrolebinding --all
sudo rm -rf /dockerdata-nfs/onap-<pod>
# or for a single pod
kubectl delete pod $ENVIRON-aaf-sms-vault-0 -n $ENVIRON --grace-period=0 --force

Using ONAP

Accessing the portal

Access the ONAP portal via the 8989 LoadBalancer that @Mandeep Khinda merged in for https://lf-onap.atlassian.net/browse/OOM-633, documented at http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_user_guide.html#accessing-the-onap-portal-using-oom-and-a-kubernetes-cluster

ubuntu@a-onap-devopscd:~$ kubectl -n onap get services | grep "portal-app"
portal-app LoadBalancer 10.43.145.94 13.68.113.105 8989:30215/TCP,8006:30213/TCP,8010:30214/TCP,8443:30225/TCP 20h

When connecting to openlab through the VPN from your mac, you would need the 2nd number - which will be something like 10.0.0.12 - but use the public IP corresponding to this private network IP, which in this case is the e1 instance with 10.12.7.7 as the externally routable IP.

Add the following entries to your client's /etc/hosts, prefixed with the IP above:

# in this case I am using the public 13... IP (elastic or generated public IP) - AWS in this example
13.68.113.105 portal.api.simpledemo.onap.org
13.68.113.105 vid.api.simpledemo.onap.org
13.68.113.105 sdc.api.fe.simpledemo.onap.org
13.68.113.105 portal-sdk.simpledemo.onap.org
13.68.113.105 policy.api.simpledemo.onap.org
13.68.113.105 aai.api.sparky.simpledemo.onap.org
13.68.113.105 cli.api.simpledemo.onap.org
13.68.113.105 msb.api.discovery.simpledemo.onap.org

launch

http://portal.api.simpledemo.onap.org:8989/ONAPPORTAL/login.htm

login with demo user

Accessing MariaDB portal container

kubectl -n onap exec -it dev-portal-portal-db-b8db58679-q9pjq -- mysql -D mysql -h localhost -e 'select * from user'