CPS-438 Make DB Schema Updates & Data Population More Robust for Kubernetes Environments

References

https://lf-onap.atlassian.net/browse/CPS-438

Background

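Kubernetes restarts the CPS container if it takes too long to start. If the container is restarted while Liquibase is running, Liquibase does not release its changelog lock (held in the DATABASECHANGELOGLOCK table), so later starts are left waiting for a lock that no running instance holds.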
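To make the failure mode concrete, here is a simplified Python model of Liquibase's table-based lock, with an in-memory SQLite database standing in for the CPS database. The table mirrors Liquibase's DATABASECHANGELOGLOCK columns; the `start_pod` function and the pod names are hypothetical, and real Liquibase keeps retrying for a configurable wait time before giving up rather than returning immediately.

```python
import sqlite3

# Simplified model of Liquibase's table-based changelog lock. Column names follow
# Liquibase's DATABASECHANGELOGLOCK table; SQLite stands in for the real database.


def create_lock_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE DATABASECHANGELOGLOCK ("
        "ID INTEGER PRIMARY KEY, LOCKED BOOLEAN NOT NULL, "
        "LOCKGRANTED TIMESTAMP, LOCKEDBY VARCHAR(255))"
    )
    conn.execute("INSERT INTO DATABASECHANGELOGLOCK VALUES (1, 0, NULL, NULL)")
    conn.commit()


def try_acquire(conn: sqlite3.Connection, owner: str) -> bool:
    # Flip LOCKED from false to true in one statement; only one caller can succeed.
    cur = conn.execute(
        "UPDATE DATABASECHANGELOGLOCK SET LOCKED = 1, "
        "LOCKGRANTED = CURRENT_TIMESTAMP, LOCKEDBY = ? WHERE ID = 1 AND LOCKED = 0",
        (owner,),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn: sqlite3.Connection) -> None:
    conn.execute(
        "UPDATE DATABASECHANGELOGLOCK SET LOCKED = 0, LOCKGRANTED = NULL, "
        "LOCKEDBY = NULL WHERE ID = 1"
    )
    conn.commit()


def start_pod(conn: sqlite3.Connection, owner: str, killed_mid_migration: bool) -> str:
    """Hypothetical CPS startup: take the lock, migrate, release the lock."""
    if not try_acquire(conn, owner):
        return "waiting for changelog lock"
    if killed_mid_migration:
        # Kubernetes restarts the container here: release() never runs,
        # but the committed lock row stays behind in the database.
        return "killed while holding lock"
    release(conn)
    return "migrated"


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_lock_table(db)
    print(start_pod(db, "cps-pod-a", killed_mid_migration=True))
    print(start_pod(db, "cps-pod-a-restarted", killed_mid_migration=False))
```

Running it prints `killed while holding lock` followed by `waiting for changelog lock`: the restarted pod is blocked by the lock its own previous incarnation took.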

Possible Fixes:

The options below are compared against these criteria: Name, Description, Cost/Maintainability, Agnostic of database technology, Separation of NCMP Data, Upgradability/Rollback, Additional Pros/Cons, and One instance does initialisation.


Option 1: Liquibase init container

Description:

  1. Add an init container to oom

    1. The restart issues only occur when Kubernetes restarts Liquibase

  2. Add a "depends on" container to cps

    1. A standalone cps deployment is not affected by the Kubernetes issue

Cost/Maintainability:

  1. Double changelog dependency? One in oom, one in cps

  2. Changes to both the oom and cps projects

Agnostic of database technology: Yes, including Neo4J
Separation of NCMP Data: Possible, but needs some refactoring (labeling)
Upgradability/Rollback: Good
Additional Pros/Cons: Good control of database versioning
One instance does initialisation: Yes

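Option 2: Change/add liveness probes?

Description: Kubernetes restarts the Liquibase container because it does not pass its liveness probe within the allowed time. We could extend the time limit, change the restart condition, etc.

Cost/Maintainability:

  1. Timing updates

  2. Changes to oom only

Agnostic of database technology: No change
Separation of NCMP Data: No change
Upgradability/Rollback: Good
Additional Pros/Cons: (none)
One instance does initialisation: Yes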

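Option 3: Startup probe?

Description: A startup probe lets us define a worst-case startup time that Kubernetes will wait for before restarting the container.

Cost/Maintainability:

  1. Timing updates

  2. Changes to oom only

Agnostic of database technology: No change
Separation of NCMP Data: No change
Upgradability/Rollback: Good
Additional Pros/Cons: (none)
One instance does initialisation: Yes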
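The startup probe's worst-case budget is simple arithmetic over its timing fields: Kubernetes restarts the container after `failureThreshold` consecutive failed probes, roughly `initialDelaySeconds + failureThreshold × periodSeconds` after start, and it holds off liveness and readiness checks until the startup probe succeeds. The Python sketch below computes that budget and, since the resolution asks for a recommended startup time based on Liquibase start times, sizes `failureThreshold` from measured durations. `recommend_probe` and its 2.0 headroom factor are hypothetical assumptions, not values from this page; probe timeouts and scheduling jitter are ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupProbe:
    """startupProbe timing fields from a Kubernetes container spec (Kubernetes defaults)."""

    period_seconds: int = 10        # periodSeconds
    failure_threshold: int = 3      # failureThreshold
    initial_delay_seconds: int = 0  # initialDelaySeconds

    def worst_case_startup_seconds(self) -> int:
        # Kubernetes restarts the container after failure_threshold consecutive failed
        # probes, so this approximates the longest startup it will tolerate
        # (probe timeouts and scheduling jitter are ignored).
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds


def recommend_probe(measured_startup_seconds: list[float],
                    headroom: float = 2.0,
                    period_seconds: int = 10) -> StartupProbe:
    """Hypothetical helper: size failureThreshold so the slowest observed startup,
    multiplied by a safety headroom, still fits inside the probe's budget."""
    budget = max(measured_startup_seconds) * headroom
    threshold = max(1, math.ceil(budget / period_seconds))
    return StartupProbe(period_seconds=period_seconds, failure_threshold=threshold)
```

For example, `failureThreshold: 30` with `periodSeconds: 10` gives the container about 300 seconds to start, which is the example used in the Kubernetes documentation.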

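Option 4: Remove Liquibase and replace it with a similar technology

Description: Replace Liquibase with Flyway

Cost/Maintainability:

  1. Migrate from Liquibase

  2. Flyway migrations are written in SQL (or Java), not YAML, so the YAML changelogs would have to be rewritten as SQL

Agnostic of database technology: Flyway does not support NoSQL databases such as Neo4J
Separation of NCMP Data: Possible
Upgradability/Rollback: Good
Additional Pros/Cons:

  1. Would solve https://lf-onap.atlassian.net/browse/CPS-963

  2. Might come with the same issue as Liquibase, since this is more of a Kubernetes issue?

One instance does initialisation: Yes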

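Option 5: Use cps-core API

Description: Code or a script, triggered by Spring Boot, that persists the required data

Cost/Maintainability:

  1. Migrate from Liquibase

Agnostic of database technology: Yes, including Neo4J
Separation of NCMP Data: Easy
Upgradability/Rollback: Requires some code
Additional Pros/Cons:

  1. Would solve https://lf-onap.atlassian.net/browse/CPS-963

  2. Do we still need a database migration technology (rollback, etc.)?

One instance does initialisation: Yes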
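Persisting the data through the cps-core API at startup avoids the lock only if the initialisation is idempotent: every start creates just what is missing, so a restart part-way through is harmless. A minimal Python sketch of that pattern follows; `InMemoryCpsCore`, its methods, and the dataspace/anchor names are hypothetical stand-ins, not the real cps-core API.

```python
class InMemoryCpsCore:
    """Hypothetical stand-in for the cps-core API; method names are illustrative only."""

    def __init__(self) -> None:
        self.anchors: dict[str, set[str]] = {}  # dataspace name -> anchor names

    def dataspace_exists(self, name: str) -> bool:
        return name in self.anchors

    def create_dataspace(self, name: str) -> None:
        if self.dataspace_exists(name):
            raise ValueError(f"dataspace {name!r} already exists")
        self.anchors[name] = set()

    def anchor_exists(self, dataspace: str, anchor: str) -> bool:
        return anchor in self.anchors.get(dataspace, set())

    def create_anchor(self, dataspace: str, anchor: str) -> None:
        if self.anchor_exists(dataspace, anchor):
            raise ValueError(f"anchor {anchor!r} already exists")
        self.anchors[dataspace].add(anchor)


def init_ncmp_data(api: InMemoryCpsCore, dataspace: str, anchor: str) -> list[str]:
    """Create only what is missing; returns the names created on this run."""
    created = []
    if not api.dataspace_exists(dataspace):
        api.create_dataspace(dataspace)
        created.append(dataspace)
    if not api.anchor_exists(dataspace, anchor):
        api.create_anchor(dataspace, anchor)
        created.append(anchor)
    return created
```

The check-then-create shape fits the "one instance does initialisation" assumption; if several replicas could start at once, a real implementation would also need to treat an "already exists" error as success.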

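Option 6: Use a session lock instead of a transaction lock for Liquibase

Description: Use the liquibase-sessionlock extension (https://mvnrepository.com/artifact/com.github.blagerweij/liquibase-sessionlock/1.2.5). The changelog lock is released as soon as the Liquibase container's database session ends.

Cost/Maintainability:

  1. No change

Agnostic of database technology: Yes, including Neo4J
Separation of NCMP Data: Possible, but needs some refactoring (labeling)
Upgradability/Rollback: Good
Additional Pros/Cons:

  1. Depends on a privately owned third-party product (3pp)

  2. Doesn't stop Kubernetes from restarting the container; it only ensures that the changelog lock is not retained

One instance does initialisation: No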
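As we understand the liquibase-sessionlock extension, it replaces the DATABASECHANGELOGLOCK row with a lock owned by the database session (on PostgreSQL, an advisory lock), which the database server drops when the connection closes, including when the pod is killed. The toy Python model below shows only that ownership rule; `DatabaseServer` and its methods are hypothetical. If a whole node disappears rather than just the process, the server may only notice the dead connection after TCP timeouts.

```python
class DatabaseServer:
    """Toy model of session-scoped locks; not a real database API."""

    def __init__(self) -> None:
        self._lock_owner: dict[int, int] = {}  # lock key -> owning session id
        self._next_session = 1

    def open_session(self) -> int:
        session = self._next_session
        self._next_session += 1
        return session

    def try_lock(self, session: int, key: int) -> bool:
        # Grant the lock if it is free; re-entrant for the session that already owns it.
        owner = self._lock_owner.setdefault(key, session)
        return owner == session

    def close_session(self, session: int) -> None:
        # Runs on a clean disconnect *and* when the connection dies (e.g. the pod is
        # killed): every lock owned by that session disappears with it.
        for key in [k for k, s in self._lock_owner.items() if s == session]:
            del self._lock_owner[key]
```

Unlike the lock row, nothing has to clean up after a killed pod: closing the session is enough for the next pod to acquire the lock.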

Option 7: Execute Liquibase logic during Spring Boot service startup

Description: Liquibase startup is contained within CPS startup, so the Kubernetes Liquibase setup can be avoided. See Solution 3 in https://localcoder.org/how-to-solve-liquibase-waiting-for-changelog-lock-problem-in-several-pods-in-ope and https://docs.spring.io/spring-boot/docs/current/reference/html/howto.html#howto.data-initialization.migration-tool.liquibase

Cost/Maintainability:

  1. Change to how Liquibase is started

Agnostic of database technology: Yes, including Neo4J
Separation of NCMP Data: Possible, but needs some refactoring (labeling)
Upgradability/Rollback: Good
Additional Pros/Cons: Solution supported by Spring Boot
One instance does initialisation: Yes

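Option 8: Pre-stop hook?

Description: Remove the changelog lock before the CPS container is restarted

Cost/Maintainability:

  1. No change

Agnostic of database technology: Yes, including Neo4J
Separation of NCMP Data: Possible, but needs some refactoring (labeling)
Upgradability/Rollback: Good
Additional Pros/Cons:

  1. Doesn't stop Kubernetes from restarting the container; it only ensures that the changelog lock is not retained

One instance does initialisation: No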
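A preStop hook could run a small script that clears the lock row. The Python sketch below shows the SQL such a script might run, with SQLite standing in for the real database. It only clears the lock if this pod holds it, assuming the pod can recognise its own LOCKEDBY value (Liquibase records the locking host there, but the exact format is an assumption here). Kubernetes runs preStop hooks when it terminates a container, including after a liveness or startup probe failure, but not when the process crashes by itself. The next start will then proceed over whatever the killed migration left behind, which is only safe if the interrupted changeset was rolled back.

```python
import sqlite3

# Clear the Liquibase lock row, but only when this pod is the recorded owner, so a
# terminating replica never releases a lock that another replica still holds.
RELEASE_OWN_LOCK_SQL = (
    "UPDATE DATABASECHANGELOGLOCK "
    "SET LOCKED = 0, LOCKGRANTED = NULL, LOCKEDBY = NULL "
    "WHERE ID = 1 AND LOCKED = 1 AND LOCKEDBY = ?"
)


def release_own_lock(conn: sqlite3.Connection, pod_identity: str) -> bool:
    """Hypothetical preStop action; returns True if a held lock was released."""
    cur = conn.execute(RELEASE_OWN_LOCK_SQL, (pod_identity,))
    conn.commit()
    return cur.rowcount == 1
```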

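Option 9: Move liveness probes before Liquibase

Description: Start the liveness probes before Liquibase starts

Cost/Maintainability:

  1. No change

Agnostic of database technology: Yes, including Neo4J
Separation of NCMP Data: Possible, but needs some refactoring (labeling)
Upgradability/Rollback: Good
Additional Pros/Cons: (none)
One instance does initialisation: Yes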
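Option 9 separates "the process is alive" from "the schema is migrated". A minimal Python sketch of that split, using a hypothetical `CpsLifecycle` class: liveness reports healthy from process start unless the migration fails, while readiness only turns true once the migration finishes, so a slow Liquibase run no longer looks like a dead container. In Spring Boot this would presumably map onto the separate liveness and readiness health groups, but the class here is illustrative only.

```python
import threading
from typing import Callable


class CpsLifecycle:
    """Hypothetical model: liveness is served from process start, readiness after migration."""

    def __init__(self, migrate: Callable[[], object]) -> None:
        self._migrate = migrate  # e.g. the Liquibase update, run in the background
        self._migrated = threading.Event()
        self._failed = threading.Event()

    def start(self) -> threading.Thread:
        def run() -> None:
            try:
                self._migrate()
                self._migrated.set()
            except Exception:
                self._failed.set()

        worker = threading.Thread(target=run, daemon=True)
        worker.start()
        return worker

    def liveness(self) -> bool:
        # Alive while the migration is running or done; only a failed migration reports dead.
        return not self._failed.is_set()

    def readiness(self) -> bool:
        # Accept traffic only once the schema is up to date.
        return self._migrated.is_set()
```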

Resolution

  1. Agreed to implement solution #3 (startup probe), resulting in https://lf-onap.atlassian.net/browse/CPS-1011

  2. This involves updates to the oom project, pending its upgrade to Kubernetes 1.20+

  3. Update our documentation to show how to implement this change for Kubernetes. A recommended startup time should be proposed based on measured Liquibase start times. https://lf-onap.atlassian.net/browse/CPS-1013

  4. Liquibase performance will be reviewed, and the options above may be revisited for solutions. https://lf-onap.atlassian.net/browse/CPS-1012