Manual mount volume
Persistence: manually add the volume section to the deployment (NFS mode)
Code Block
spec:
  containers:
  - image: hub.baidubce.com/duanshuaixing/tools:v3
    imagePullPolicy: IfNotPresent
    name: test-volume
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /root/
      name: nfs-test
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
  volumes:
  - name: nfs-test
    nfs:
      path: /dockerdata-nfs/test-volume/
      server: 10.0.0.7
- Restart the node to check the NFS automount
Restart the node and check whether the NFS client auto-mounts the share; if not, mount it manually:
df -Th |grep nfs
sudo mount $MASTER_IP:/dockerdata-nfs /dockerdata-nfs/
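To make the mount survive reboots, an /etc/fstab entry can be added. This is a minimal sketch, assuming the same server and path as above; the mount options are examples and should be adjusted to your environment.
Code Block
# Sketch: persist the NFS mount across reboots (server/path as above, options are an example)
echo "$MASTER_IP:/dockerdata-nfs /dockerdata-nfs nfs auto,nofail,noatime,nolock,tcp 0 0" | sudo tee -a /etc/fstab
sudo mount -a        # verify the fstab entry mounts cleanly
df -Th | grep nfs    # confirm the share is mounted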
Reinstall One Project
Code Block
1. Delete a module (take so as an example)
   helm delete dev-so --purge
2. If the delete fails, manually delete the pvc, pv, deployment, configmap, statefulset and job objects
3. Install a module
   cd oom/kubernetes
   make so
   make onap
   helm install local/so --namespace onap --name dev-so
   or, when using a docker proxy repository:
   helm install local/so --namespace onap --name dev-so --set global.repository=172.30.1.66:10001
   To use a proxy repository and also define the image pull policy for the module:
   helm install local/so --namespace onap --name dev-so --set global.repository=172.30.1.66:10001 --set so.pullPolicy=IfNotPresent
4. Clear the /dockerdata-nfs/dev-so directory (it can be moved to a /bak directory instead)
- Helm has no deploy parameter
If the helm deploy parameter is missing, copy the OOM helm plugins into the helm home directory:
cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm/
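To verify that the plugins were picked up (a sanity check, assuming Helm 2.x and the OOM deploy/undeploy plugins), something like the following can be used:
Code Block
helm plugin list     # should now list the deploy and undeploy plugins
helm deploy --help   # should print the plugin usage instead of an unknown-command error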
Helm list shows no release
cp /root/oom/kubernetes/onap/values.yaml /root/integration-override.yaml
helm deploy dev local/onap -f /root/oom/kubernetes/onap/resources/environments/public-cloud.yaml -f /root/integration-override.yaml --namespace onap --verbose
- Force delete all pods
kubectl delete pod $(kubectl get pod -n onap |awk '{print $1}') -n onap --grace-period=0 --force
- Copy file to pod
When copying a file from the local filesystem to a pod, there is a problem with specifying the path.
This can be temporarily worked around by installing lrzsz in the container, or by running docker cp on the node.
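As a sketch of both workarounds (pod name, namespace and file paths below are placeholders):
Code Block
# Option 1: kubectl cp, if tar is available inside the container
kubectl cp /tmp/test.txt onap/dev-uui-uui-server-67fc49b6d9-szr7t:/tmp/test.txt
# Option 2: docker cp, run on the node hosting the container
docker ps | grep uui-server                      # find the container id on that node
docker cp /tmp/test.txt <container_id>:/tmp/test.txt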
Check the port exposed by the pod
Code Block
1. Check the node where the pod is running
   kubectl get pod -n onap -o wide|grep uui-server
2. Check the type of the pod controller (ReplicaSet corresponds to a deployment, StatefulSet to a statefulset)
   kubectl -n onap describe pod dev-uui-uui-server-67fc49b6d9-szr7t|grep Controlled
3. Check the service corresponding to the pod
   kubectl get svc -n onap |grep uui-server
4. Access the pod through the floating IP of the node it runs on and the 30000+ nodePort
Check pod through the port
Code Block
1. Find the service corresponding to the exposed port (take port 30399 as an example)
   kubectl -n onap get svc |grep 30399
2. Check the backend pod IP behind this service
   kubectl get ep uui-server -n onap
3. Find the corresponding pod and node through the pod IP
   kubectl get pod -n onap -o wide|grep 10.42.67.201
4. cat /etc/hosts |grep node4
- Can't start ansible-server
The ansible-server problem is caused by DNS resolution failure; it can be solved by redeploying the kube-dns configmap.
kubectl replace -f kube-dns-configmap.yaml
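The contents of kube-dns-configmap.yaml are not shown here. A minimal sketch, assuming the fix is to add upstream DNS servers to kube-dns (the nameserver IPs are placeholders), might look like:
Code Block
cat <<'EOF' > kube-dns-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  upstreamNameservers: |
    ["10.0.0.10", "8.8.8.8"]
EOF
kubectl replace -f kube-dns-configmap.yaml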
- Close the health check to avoid restarting
Delete or comment out the following section in the deployment or statefulset; the pod will restart after this change.
Code Block
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /manage/health
    port: 8084
    scheme: HTTP
  initialDelaySeconds: 600
  periodSeconds: 60
  successThreshold: 1
  timeoutSeconds: 10
readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  tcpSocket:
    port: 8482
  timeoutSeconds: 1
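Instead of editing the YAML by hand, the probes can also be removed with a JSON patch. This is a sketch, assuming the probes sit on the first container; the deployment name is an example.
Code Block
kubectl -n onap patch deployment dev-uui-uui-server --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}
]'
# the pod restarts once after the patch is applied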
- Restart the container on the node to check whether a newly added file is lost when the pod health check is enabled/disabled
a. With the health check enabled in the deployment, add a test file to the pod and restart the container on the node.
Conclusion: after the container on the node is restarted, a new container is created and the original test file in the pod is lost.
b. With the health check not enabled in the deployment, add a test file to the pod and restart the container on the node.
Conclusion: when the container is restarted, or stopped and started, the data in the pod is not lost.
500 error when SDC distributes a package
Try to restart/reinstall dmaap. Before restarting or reinstalling, delete the dev-dmaap directory in NFS. If the error still occurs, try to restart/reinstall SDC.
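A reinstall of dmaap would follow the same pattern as the "Reinstall One Project" section above. This is a sketch, assuming the release name dev-dmaap and a local chart repository:
Code Block
helm delete dev-dmaap --purge
rm -rf /dockerdata-nfs/dev-dmaap
cd ~/oom/kubernetes && make dmaap && make onap
helm install local/dmaap --namespace onap --name dev-dmaap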
- SDC pod can't start
There are dependencies between the pods. The pod that ultimately affects the other pods is dev-sdc-sdc-cs.
If SDC is redeployed, manually remove /dockerdata-nfs/dev-sdc/
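A sketch of the redeploy, assuming the release name dev-sdc; dev-sdc-sdc-cs should become ready before the other SDC pods:
Code Block
helm delete dev-sdc --purge
rm -rf /dockerdata-nfs/dev-sdc/
cd ~/oom/kubernetes && make sdc && make onap
helm install local/sdc --namespace onap --name dev-sdc
watch 'kubectl -n onap get pod | grep sdc'    # wait for dev-sdc-sdc-cs to come up first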
Sdnc-dgbuilder pod can't start
Code Block
The pod is in Running state but the server does not start:
# set the npm module registry
npm set registry https://registry.npm.taobao.org
# set the node source mirror used by node-gyp to compile dependencies
npm set disturl https://npm.taobao.org/dist
# clean the cache
npm cache clean
./start.sh sdnc1.0 && wait &
Holmes doesn't install automatically
Code Block
Holmes does not auto-deploy; deploy it manually.
1) Enter the dcae bootstrap pod
   kubectl exec -it -n onap dev-dcaegen2-dcae-bootstrap-776cf86d49-mxzq6 /bin/bash
2) Delete the holmes components
   cfy uninstall holmes_rules
   cfy deployments delete -f holmes_rules
   cfy blueprints delete holmes_rules
   cfy blueprints validate k8s-holmes-rules.yaml
   cfy uninstall holmes_engine
   cfy deployments delete -f holmes_engine
   cfy blueprints delete holmes_engine
3) Reinstall the holmes components
   cfy blueprints upload -b holmes_rules /blueprints/k8s-holmes-rules.yaml
   cfy deployments create -b holmes_rules -i /inputs/k8s-holmes_rules-inputs.yaml holmes_rules
   cfy executions start -d holmes_rules install
   cfy blueprints upload -b holmes_engine /blueprints/k8s-holmes-engine.yaml
   cfy deployments create -b holmes_engine -i /inputs/k8s-holmes_engine-inputs.yaml holmes_engine
   cfy executions start -d holmes_engine install

If reinstalling holmes fails, the following error occurs:

[root@dev-dcaegen2-dcae-bootstrap-9b6b4fb77-fnsdk blueprints]# cfy deployments create -b holmes_rules -i /inputs/k8s-holmes_rules-inputs.yaml holmes_rules
Creating new deployment from blueprint holmes_rules...
Deployment created. The deployment's id is holmes_rules
[root@dev-dcaegen2-dcae-bootstrap-9b6b4fb77-fnsdk blueprints]# cfy executions start -d holmes_rules install
Executing workflow install on deployment holmes_rules [timeout=900 seconds]
2018-11-19 10:34:28.961 CFY <holmes_rules> Starting 'install' workflow execution
2018-11-19 10:34:29.541 CFY <holmes_rules> [pgaasvm_p1aax2] Creating node
2018-11-19 10:34:30.550 CFY <holmes_rules> [pgaasvm_p1aax2.create] Sending task 'pgaas.pgaas_plugin.create_database'
2018-11-19 10:34:30.550 CFY <holmes_rules> [pgaasvm_p1aax2.create] Task started 'pgaas.pgaas_plugin.create_database'
2018-11-19 10:34:31.232 LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: create_database(holmes)
2018-11-19 10:34:32.237 LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Error: [Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
2018-11-19 10:34:32.237 LOG <holmes_rules> [pgaasvm_p1aax2.create] ERROR: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
2018-11-19 10:34:33.241 LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Error: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
2018-11-19 10:34:32.237 LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Stack: Traceback (most recent call last):
  File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 337, in getclusterinfo
    with open(fn, 'r') as f:
IOError: [Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
2018-11-19 10:34:33.241 LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Stack: Traceback (most recent call last):
  File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 441, in create_database
    info = dbgetinfo(ctx)
  File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 424, in dbgetinfo
    ret = getclusterinfo(wfqdn, True, '', '', [])
  File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 344, in getclusterinfo
    raiseNonRecoverableError('Cluster must be deployed when using an existing cluster. Check your domain name: fqdn={0}, err={1}'.format(safestr(wfqdn),e))
  File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 167, in raiseNonRecoverableError
    raise NonRecoverableError(msg)
NonRecoverableError: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
2018-11-19 10:34:33.238 CFY <holmes_rules> [pgaasvm_p1aax2.create] Task failed 'pgaas.pgaas_plugin.create_database' -> Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
2018-11-19 10:34:33.553 CFY <holmes_rules> 'install' workflow execution failed: RuntimeError: Workflow failed: Task failed 'pgaas.pgaas_plugin.create_database' -> Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
Execution of workflow install for deployment holmes_rules failed. [error=Traceback (most recent call last):
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/dispatch.py", line 548, in _remote_workflow_child_thread

In that case, execute the following steps first:
1) Delete the holmes components as in step 2 above
2) Reset the Postgres instance
   a. Uninstall the PG initialization blueprint
      cfy uninstall pgaas_initdb
      cfy deployments delete -f pgaas_initdb
      cfy blueprints delete pgaas_initdb
   b. Reset the password of PG via psql
      kubectl exec -it -n onap dev-dcaegen2-dcae-db-0 /bin/sh
      bash-4.2$ psql
      postgres=# ALTER ROLE "postgres" WITH PASSWORD 'onapdemodb';
      ALTER ROLE
      postgres-# \q
   c. Deploy the PG initialization blueprint
      cfy blueprints upload -b pgaas_initdb /blueprints/k8s-pgaas-initdb.yaml
      cfy deployments create -b pgaas_initdb -i /inputs/k8s-pgaas-initdb-inputs.yaml pgaas_initdb
      cfy executions start -d pgaas_initdb install
3) Reinstall holmes as in step 3 above
- dmaap restart sequence
Start dmaap, zookeeper, kafka and message router in sequence, with a one-minute interval between each.
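One way to script such a sequenced restart is to delete the pods of each component in turn and let their controllers recreate them. This is a sketch; the component name patterns used in grep are assumptions and should match your pod names.
Code Block
for comp in dmaap zookeeper kafka message-router; do
    kubectl -n onap get pod | grep "$comp" | awk '{print $1}' | xargs -r kubectl -n onap delete pod
    sleep 60    # one-minute interval between components
done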
- dev-consul-consul takes up a lot of disk space
Problem: Node disk alarm
Troubleshooting: use du -hs * to check the disk usage under /var/lib/docker/; the alarm is caused by relatively large disk usage under the following directory:
/var/lib/docker/aufs/diff/b759b23cb79cff6cecdf0e44f7d9a1fb03db018f0c5c48696edcf7e23e2d045b/home/consul/.kube/http-cache/.diskv-temp/
Using kubectl -n onap get pod -o wide|grep consul, confirm that the pod is dev-consul-consul-6d7675f5b5-sxrmq, and double-check by running kubectl exec into that pod.
Solution: delete all the files under /home/consul/.kube/http-cache/.diskv-temp/ in the pod.
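The cleanup can also be done from outside the pod; a sketch, using the pod name found above:
Code Block
kubectl -n onap exec dev-consul-consul-6d7675f5b5-sxrmq -- sh -c 'rm -rf /home/consul/.kube/http-cache/.diskv-temp/*'
df -h /var/lib/docker    # confirm on the node that the space was released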
Can't delete statefulset
If kubectl is version 1.8.0, it needs to be upgraded to version 1.9.0 or above.
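A sketch of the upgrade, assuming a Linux amd64 node and the standard kubernetes release download layout (pick the version you need):
Code Block
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.9.0/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/kubectl
kubectl version --client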
Rollback after image update
Code Block
1. Check the rollout history of the deployment
   kubectl rollout history deployment nginx-deployment
2. Roll back to the previous revision
   kubectl rollout undo deployment nginx-deployment
3. Roll back to a specified revision
   kubectl rollout undo deployment nginx-deployment --to-revision=2
Update the images in oom using the docker-manifest.csv from the integration repo
Code Block
#!/usr/bin/env bash
cd $HOME
git clone -b casablanca https://gerrit.onap.org/r/integration
cp $HOME/integration/version-manifest/src/main/resources/docker-manifest.csv $HOME/oom/
version_new="$HOME/oom/docker-manifest.csv"
for line in $(tail -n +2 $version_new); do
    image=$(echo $line | cut -d , -f 1)
    tag=$(echo $line | cut -s -d , -f 2)
    perl -p -i -e "s|$image(:.*$\|$)|$image:$tag|g" $(find $HOME/oom/ -name values.yaml)
done
Delete ONAP
Code Block
1. Delete using helm
   helm delete $(helm list|tail -n +2|awk '{print $1}') --purge &
2. Delete the remaining ONAP api objects in k8s
   kubectl -n onap get deployments|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete deployments
   kubectl -n onap get statefulset|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete statefulset
   kubectl -n onap get jobs|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete jobs
   kubectl -n onap get pvc|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete pvc
   kubectl -n onap get secrets|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete secrets
   kubectl -n onap get configmaps|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete configmaps
   kubectl -n onap get svc|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete svc
3. Delete the data in NFS
Missing svc or yaml configuration file
Code Block
1. Export the object from the current environment (take a vfc configmap as an example)
   kubectl -n onap get cm dev-vfc-vfc-catalog-logging-configmap --export -o yaml >>dev-vfc-vfc-catalog-logging-configmap.yaml
2. kubectl -n onap apply -f dev-vfc-vfc-catalog-logging-configmap.yaml
3. Restart the pod
Note: the nodePort field specified in a service deployed by helm is not exported; the reason is still being investigated.
- Creating many k8s api objects at one time causes kubectl to hang
Problem: when creating 60 services at one time, the kubectl command hangs during execution.
Reason: the Java process in the rancher server pod runs into an OOM (out of memory) problem.
Temporary workaround: manipulate the api objects in smaller batches.
Permanent solution: increase the memory limit in the Java parameters.
Specify the Xmx value when installing rancher server; the default is 4096M and it can be increased to 8192M:
docker run -d --restart=unless-stopped -e JAVA_OPTS="-Xmx8192m" -p 8080:8080 --name rancher_server rancher/server:v$RANCHER_VERSION
Filter image version
Filter the image versions of oom in kubernetes (take VFC as an example)
grep -r -E 'image|Image:' ~/oom/kubernetes/|awk '{print $2}'|grep onap|grep vfc
Service Port configuration
Code Block
ports:
- port: 9090
  protocol: TCP
  targetPort: 80
  nodePort: 32000

targetPort is the port on which the container provides the service.
port is the port on which the service is accessed from inside the cluster.
nodePort exposes the service on every node at the given port; by default it is randomly assigned from the range 30000-32767.
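As a usage sketch of how the three ports map at runtime (the IP addresses are placeholders):
Code Block
curl http://<cluster-ip>:9090/    # port: reach the service from inside the cluster
curl http://<node-ip>:32000/      # nodePort: reach the service from outside via any node
# inside the container, the process itself listens on targetPort 80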