OSDF Image optimization

This page is a space for discussing the strategy for building OSDF Docker images. With the switch to the Integration base image, which is Alpine-based, the Python requirements take much longer to install. In particular, the scikit-learn package takes a very long time to install.

Possible Options

Increasing the build timeout in Jenkins

We only need to update the existing Dockerfile to use the Integration base image, adjust the other relevant build steps, and increase the build timeout in Jenkins. This applies to every job that builds the OSDF image (the new CSIT verification jobs, all merge jobs, and the staging jobs).

Drawbacks

  1. This increases the development cycle time, since contributors have to wait longer for the verify jobs to complete.

Creating a separate base image with requirements.txt and using it as the base image

This requires creating a separate repository that holds the base image Dockerfile and the requirements file. The Docker image built from that repository is then used as the base image in the OSDF repository.

Drawbacks

  1. A new base image needs to be built every time we add a new library to the requirements file or change the version of an existing one.

  2. Needs substantial effort

Creating a separate base image with only the slow-to-install requirements, such as scikit-learn

This requires creating a separate repository that holds the base image Dockerfile and a minimal requirements file containing only the slow-to-install packages. The Docker image built from that repository is used as the base image in the OSDF repository, which keeps the comprehensive requirements file. This reduces how often drawback #1 of the previous option occurs, because the base image only needs to be rebuilt when one of the slow-to-install packages changes.

Drawbacks

  1. Needs substantial effort

Importing code from the upstream library

Import the required code directly into our project while retaining the copyright notice of the upstream component. OSDF uses only a single feature of the scikit-learn library, LabelEncoder.

Custom implementation of the functionality

Implement our own version of the required functionality instead of depending on the upstream component. Since OSDF uses only LabelEncoder from the scikit-learn library, the replacement should be small.
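A minimal sketch of what such a custom implementation could look like is shown below. It assumes OSDF only needs `fit`, `transform`, `fit_transform`, `inverse_transform` and the `classes_` attribute; which of these OSDF actually calls is not stated on this page. The class name `SimpleLabelEncoder` is hypothetical, and unlike scikit-learn it returns plain Python lists rather than NumPy arrays.

```python
"""Dependency-free sketch of a label encoder (hypothetical replacement for
scikit-learn's LabelEncoder; not OSDF's actual code)."""
from typing import Dict, Hashable, Iterable, List


class SimpleLabelEncoder:
    """Maps each distinct label to an integer in 0..n_classes-1.

    As in scikit-learn, classes are the sorted unique labels seen by fit(),
    so labels must be hashable and mutually comparable.
    """

    def __init__(self) -> None:
        self.classes_: List[Hashable] = []
        self._index: Dict[Hashable, int] = {}
        self._fitted = False

    def fit(self, y: Iterable[Hashable]) -> "SimpleLabelEncoder":
        # Sorted unique labels define the integer codes.
        self.classes_ = sorted(set(y))
        self._index = {label: code for code, label in enumerate(self.classes_)}
        self._fitted = True
        return self

    def transform(self, y: Iterable[Hashable]) -> List[int]:
        self._check_fitted()
        codes = []
        for label in y:
            if label not in self._index:
                raise ValueError(f"y contains previously unseen label: {label!r}")
            codes.append(self._index[label])
        return codes

    def fit_transform(self, y: Iterable[Hashable]) -> List[int]:
        # Materialize once so a generator is not exhausted by fit().
        labels = list(y)
        return self.fit(labels).transform(labels)

    def inverse_transform(self, codes: Iterable[int]) -> List[Hashable]:
        self._check_fitted()
        n_classes = len(self.classes_)
        labels = []
        for code in codes:
            # Reject negatives explicitly; Python would otherwise index from the end.
            if not 0 <= code < n_classes:
                raise ValueError(f"code {code!r} is outside [0, {n_classes})")
            labels.append(self.classes_[code])
        return labels

    def _check_fitted(self) -> None:
        if not self._fitted:
            raise RuntimeError("SimpleLabelEncoder is not fitted; call fit() first")


if __name__ == "__main__":
    enc = SimpleLabelEncoder()
    print(enc.fit_transform(["paris", "tokyo", "paris", "amsterdam"]))  # [1, 2, 1, 0]
    print(enc.classes_)                    # ['amsterdam', 'paris', 'tokyo']
    print(enc.inverse_transform([0, 2]))   # ['amsterdam', 'tokyo']
```

Keeping the sketch in pure Python means it adds nothing to the Alpine image build. If existing callers expect NumPy arrays, the outputs could be wrapped with `numpy.asarray`, assuming NumPy is already present in the image.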



Note: The last two solutions should be considered only as a last resort if no other alternative works.