OSDF Image optimization
This page is a space for discussing the strategy for building OSDF Docker images. After switching to the Integration base image, which is Alpine-based, the Python requirements take much longer to install. In particular, the scikit-learn package takes a very long time to install.
Possible Options
Increasing the build timeout in Jenkins
We only need to update the existing Dockerfile to use the Integration base image, update the other relevant build steps, and increase the build timeout in Jenkins. This applies to every job that builds the OSDF image (the new CSIT verification jobs, all merge jobs, and the staging jobs).
Drawbacks
This lengthens the development cycle, because contributors have to wait longer for the verify jobs to finish.
Creating a separate base image with requirements.txt and using it as the base image
This requires creating a separate repository that holds the base image Dockerfile and the requirements file. The Docker image built from that repository is then used as the base image in the OSDF repo.
Drawbacks
A new base image needs to be built every time we update the requirements file, whether to add a new library or to change the version of an existing one
Needs substantial effort
Creating a separate base image containing only the slow-to-install requirements, such as scikit-learn
This requires creating a separate repository that holds the base image Dockerfile and a minimal requirements file. The Docker image built from that repository is used as the base image in the OSDF repo, and the OSDF repo keeps the comprehensive requirements file. This way the base image only has to be rebuilt when one of the slow-to-install packages changes, which reduces how often drawback #1 of the previous option occurs.
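One thing to watch with this split is that the pins in the minimal file and the comprehensive file stay in agreement. Otherwise pip would presumably reinstall the slow package during the OSDF build and the benefit would be lost. Below is a minimal sketch of such a check. The function names and the version numbers in the usage example are hypothetical, and it only understands simple `name==version` lines (no extras, markers, or name normalisation beyond lower-casing).

```python
def parse_pins(text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines, comments, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(base_text, full_text):
    """Return {name: (base_version, full_version)} for pins that disagree.

    full_version is None when the package is missing from the
    comprehensive file altogether.
    """
    base = parse_pins(base_text)
    full = parse_pins(full_text)
    return {
        name: (version, full.get(name))
        for name, version in base.items()
        if full.get(name) != version
    }
```

A verify job could read both files, call `find_mismatches`, and fail when the result is non-empty, for example `find_mismatches("scikit-learn==0.20.0", "scikit-learn==0.21.0")` reports the scikit-learn entry as out of sync.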
Drawbacks
Needs substantial effort
Importing code from the upstream library
Import the required code directly into our project, retaining the copyright notice of the upstream component. OSDF uses only a single piece of functionality from the scikit-learn library, the LabelEncoder.
Custom implementation of the functionality
Write our own implementation of the functionality we need from the upstream component. OSDF uses only a single piece of functionality from the scikit-learn library, the LabelEncoder.
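To give a sense of the size of such a replacement, here is a minimal sketch in pure Python. The class name `SimpleLabelEncoder` is hypothetical, and the sketch assumes OSDF needs only the usual fit, transform, and inverse-transform behaviour. It returns plain lists rather than NumPy arrays, so it is not a drop-in replacement for the scikit-learn class.

```python
class SimpleLabelEncoder:
    """Hypothetical pure-Python stand-in for scikit-learn's LabelEncoder."""

    def fit(self, labels):
        # Sorted unique labels, mirroring the ordering of classes_ upstream.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Map each label to its integer code; reject labels not seen in fit.
        try:
            return [self._index[label] for label in labels]
        except KeyError as exc:
            raise ValueError(f"unseen label: {exc.args[0]!r}") from exc

    def fit_transform(self, labels):
        return self.fit(labels).transform(labels)

    def inverse_transform(self, codes):
        # Map integer codes back to labels, rejecting out-of-range codes.
        for code in codes:
            if not 0 <= code < len(self.classes_):
                raise ValueError(f"unknown code: {code!r}")
        return [self.classes_[code] for code in codes]
```

For example, calling `fit_transform(["b", "a", "c", "a"])` on a new encoder yields `[1, 0, 2, 0]`, and `inverse_transform([2, 0])` then gives back `["c", "a"]`. Any OSDF code that relies on the result being a NumPy array would need adjusting.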
Note: The last two options should be considered a last resort, to be used only if we cannot find any other alternative.