Part 2, presenting a pattern for a CI/CD pipeline implemented with Gitlab CI for Continuous Integration and Continuous Deployment of Python projects distributed as wheel packages via PyPI.
As a general rule in my .gitlab-ci.yml implementations I try to limit bash code to glue logic: just creating a bridge between the Gitlab pipeline “session” and some utility that forms the core of the job. As a rule of thumb I would say to limit bash code to no more than 5 lines; once you’re over that it’s time to rethink and start implementing the scripting in a more structured and maintainable language such as Python.
There are some people out there who have created unit test frameworks for bash, but the fundamental language limitations of bash remain and in my opinion these limitations create numerous long term problems for maintainability that are best avoided by the use of a more capable language. My personal preference at this time is Python, but there are other choices that may be more suitable for your environment and projects. For example if your projects are based in golang then that may also be a suitable choice for your tooling, or similarly for other languages provided they have plenty of useful libraries for building pipeline related utilities. Just keep in mind that you want as few different languages for developers to learn as possible; although this should not be considered justification for writing your pipeline tooling in C - that, I think, would just be masochism.
One other rule for pipelines is that developers should be able to run the same code as Gitlab CI to test the pipeline jobs for themselves. This is another reason to minimize the lines of code in the job definitions themselves (and thus the bash code). In a Makefile based project it’s easy to set up the pipeline jobs so they just call a make target, which is very easy for developers to replicate, although care must be taken with pipeline context in the form of environment variables. In a Python project, instead of introducing another Docker image dependency to acquire make, I’ve just used Python script files. Another consideration that I have not explored much is that Gitlab has the capability to run pipeline jobs locally.
Project structure
It’s probably useful to have a quick outline of the project folder structure so that you have an idea of the file system context the pipeline is operating in.
<project repo root>
├── dist
├── <project source dir>
│   ├── __init__.py
│   └── VERSION
├── LICENSE
├── pyproject.toml
├── README.rst
├── scripts
│   ├── test_coverage.py
│   └── update_version.py
└── tests
    └── unit_tests
The dist folder is where flit deposits the wheel packages it builds. The scripts folder is for pipeline related scripts. The tests folder is where unit and integration tests are located, separate from the main project source code. Having the tests folder separate from the main project source code is a practice I’ve only recently adopted, but I have found it to be quite useful: because the tests are separate, the import syntax naturally references the project source paths explicitly, so you usually end up testing the “public” or “exported” interface of modules in the project. In a more complex project it also becomes easier to distinguish and separate unit tests from integration tests and business tests. Finally, packaging becomes easier: when tests are included in the main source tree in respective tests folders, flit automatically packages them in the wheel. The tests are not very useful in the wheel package and make it a little larger (maybe a lot larger if your tests have data files as well).
Pipeline stages
In the download_3gpp Gitlab CI pipeline I’ve defined four stages: pre-package, package, test and publish. In Gitlab CI, stages are executed sequentially in the defined order and, if necessary, dependency artifacts can be passed from a preceding stage to a subsequent stage.
The pre-package stage is basically “stuff that needs to be done before packaging” because the packaging (or subsequent stages) depends on it. The test stage must come after the package stage because the test stage consumes the wheel package generated by the package stage. Once all the tests have passed, the publish stage can be executed.
Package release identity management
For Python package release identification I use this code snippet in the main __init__.py file:
# Docstring required by flit.
"""Command line utility for downloading standards documents from the 3GPP download site."""

import os.path

# The VERSION file is located in the same directory as this __init__.py file.
here = os.path.abspath(os.path.dirname(__file__))
version_file_path = os.path.abspath(os.path.join(here, "VERSION"))

with open(version_file_path) as version_file:
    version = version_file.read().strip()

# Internally the code can consume the release id using the __version__ variable.
__version__ = version
The VERSION file is committed to source control with the content “0.0.0”. This means that by default a generated package will not have a formal release identity which is important for reducing ambiguity around which packages are actually formal releases. Developers can generate their own packages manually if needed and the package will acquire the default “0.0.0” release id which clearly communicates that this is not a formal release.
In principle, for any release there must be a single, unique package that identifies itself as a release package using the semantic version for that release. Per PEP-491, Python wheel package file names take the form <package name>-<PEP-440 release id>-<python tag>-<abi tag>-<platform tag>.whl; for example, a formal 1.2.3 release of this project would yield a pure-Python wheel named something like download_3gpp-1.2.3-py3-none-any.whl.
update-version:
  artifacts:
    paths:
      # Points to the path of the VERSION file.
      - ${VERSION_PATH}
  stage: pre-package
  script:
    - '${VENV_DIR}/python scripts/update_version.py
      ${CI_PIPELINE_ID}
      ${CI_COMMIT_REF_NAME}
      >${VERSION_PATH}'
    # debug logging
    - cat ${CI_PROJECT_NAME}/VERSION
The update-version job changes the VERSION file content slightly depending on whether the pipeline is for a tagged release, a “non-release” branch (master or a feature branch), or a special “dryrun” release tag. The modified VERSION file is then propagated through the pipeline as a dependency of subsequent stages. The dryrun tag takes the form <semantic version>-dryrun<int>, and the resulting release id takes the form <semantic version>.<pipeline number>.
The intention of the dryrun tag is to test as much of the release pipeline code as possible without actually doing the release. In this case the release id acquires the semantic version part of the tag, with the pipeline number appended to indicate that this is not a formal release. In addition, we will see later that publishing the package does not occur for a dryrun tag. The artifacts generated by a dryrun pipeline are still available for download and review via the artifacts capability of Gitlab.
For any pipeline that is not a tagged release the default “0.0.0” semantic version is used along with the pipeline number. This clearly indicates that the generated package is not a formal release, but is an artifact resulting from a pipeline.
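As an aside (my addition, not part of the original pipeline), the packaging library can be used to sanity-check that release ids of these forms are valid PEP-440 versions and order sensibly:

# Not part of the project; requires the "packaging" distribution
# (pip install packaging).
from packaging.version import Version

release = Version("1.2.3")       # formal release tag
dryrun = Version("1.2.3.1234")   # dryrun semantic version plus pipeline id
default = Version("0.0.0.1234")  # non-release branch plus pipeline id

# Non-release builds sort below any formal release...
assert default < release
# ...but note that a dryrun id sorts *after* its formal release counterpart.
assert dryrun > release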
Here is a link to update_version.py for your review. I’ve tried to keep this script extremely simple so that it is easy to maintain, and contrary to my normal practice it does not have any tests. If it evolves to be any more complicated than it is now then it will need unit tests and probably have to be migrated to its own repo with packaging.
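The linked script is the authoritative version; purely as an illustration, a minimal sketch of the kind of logic involved might look like the following (the tag patterns here are my assumptions, not necessarily the actual ones):

#!/usr/bin/env python3
"""Illustrative sketch only; see the linked update_version.py for the real script.

Prints a PEP-440 release id to stdout based on the pipeline ref name.
Assumes a release tag looks like "1.2.3" and a dryrun tag like "1.2.3-dryrun1".
"""
import re
import sys

RELEASE_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)$")
DRYRUN_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)-dryrun\d+$")


def main(pipeline_id: str, ref_name: str) -> None:
    release = RELEASE_PATTERN.match(ref_name)
    dryrun = DRYRUN_PATTERN.match(ref_name)
    if release:
        # Formal release: use the semantic version from the tag verbatim.
        print(release.group(1))
    elif dryrun:
        # Dryrun release: semantic version plus pipeline id, clearly not a
        # formal release.
        print("{0}.{1}".format(dryrun.group(1), pipeline_id))
    else:
        # Master or feature branch: default version plus pipeline id.
        print("0.0.0.{0}".format(pipeline_id))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

Note that the pipeline job redirects the script’s stdout to the VERSION file, so printing the release id is the entire interface.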
Package generation
The download_3gpp project presently only generates the wheel package for installation, but the packaging stage could also include documentation and other artifacts related to the complete delivery of the project. The download_3gpp project only has the README.rst documentation for the moment.
.acquire-project-dependencies:
  before_script:
    - '${VENV_DIR}/flit install -s'

build-package:
  artifacts:
    paths:
      - ${PACKAGE_PATH}
  dependencies:
    - update-version
  stage: package
  script:
    - '${VENV_DIR}/flit build'
  extends:
    - .acquire-project-dependencies
Run tests
In the pipeline, after the package stage comes the test stage. In my experience this tends to be the busiest stage for Python projects, as various types of tests are run against the package.
Test package install
The first test is to try installing the generated package to ensure that
flit is doing it’s job correctly with your pyproject.toml
configuration:
run-wheel-install:
  dependencies:
    - build-package
  stage: test
  script:
    - '${VENV_DIR}/pip install ${PACKAGE_PATH}'
It’s a fairly nominal test, but it is a valid test. Considering how many times I’ve seen or heard about C/C++ code being distributed that literally doesn’t compile, I think it’s important to include even the most nominal tests in your pipeline; after all, just compiling your code, even once, would be a pretty nominal test.
Run unit tests (and integration, business tests)
This project doesn’t have integration or business tests so there is only one job. If there were integration and business tests then I would create separate jobs for them so that the tests all run in parallel in the pipeline.
.acquire-project-dependencies:
  before_script:
    - '${VENV_DIR}/flit install -s'

run-unit-tests:
  artifacts:
    reports:
      junit: junit_report.xml
  stage: test
  script:
    - '${VENV_DIR}/pytest
      --junitxml=junit_report.xml
      ${TEST_DIR}'
  extends:
    - .acquire-project-dependencies
Test coverage analysis
I prefer to run the various coverage analyses in separate jobs, although they
could arguably be consolidated into a single job. If I recall correct the
run-coverage
cannot be consolidated because the stdout that is captured for
the Gitlab coverage reporting is suppressed when outputting to html or xml, so
having all separate jobs seems cleaner to me.
The run-coverage-threshold-test job fails the pipeline if the test coverage is below a defined threshold (a sketch of the test_coverage.py script it relies on follows the job definitions below). My rule of thumb for Python projects is that test coverage must be above 90% and preferably above 95%. I usually aim for 95%, but depending on the project sometimes that needs to be relaxed a little to 90%.
run-coverage:
  coverage: '/TOTAL.*?([0-9]{1,3})%/'
  stage: test
  script:
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      ${TEST_DIR}'
  extends:
    - .acquire-project-dependencies

run-coverage-threshold-test:
  stage: test
  script:
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      ${TEST_DIR} | ${VENV_DIR}/python scripts/test_coverage.py ${COVERAGE_THRESHOLD}'
  after_script:
    # log coverage report for debugging
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      ${TEST_DIR}'
  extends:
    - .acquire-project-dependencies

run-xml-coverage-report:
  artifacts:
    paths:
      - dist/coverage.xml
  stage: test
  script:
    - 'mkdir -p dist'
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      --cov-report=xml
      ${TEST_DIR}'
    - 'mv coverage.xml dist/'
  extends:
    - .acquire-project-dependencies

run-html-coverage-report:
  artifacts:
    paths:
      - dist/htmlcov/*
  stage: test
  script:
    - 'mkdir -p dist'
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      --cov-report=html
      ${TEST_DIR}'
    - 'mv htmlcov dist/'
  extends:
    - .acquire-project-dependencies
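The test_coverage.py script itself is not shown in this post; purely as an illustration, a threshold check along these lines would do the job (the regex and messages are my assumptions about how such a script might work, not the actual implementation):

#!/usr/bin/env python3
"""Illustrative sketch only; see scripts/test_coverage.py in the repo.

Reads pytest-cov terminal output from stdin, extracts the TOTAL coverage
percentage and exits non-zero if it is below the threshold argument.
"""
import re
import sys

# Matches the "TOTAL ... 97%" summary line emitted by pytest-cov.
TOTAL_PATTERN = re.compile(r"^TOTAL\s+.*\s(\d{1,3})%\s*$")


def main(threshold: int) -> int:
    for line in sys.stdin:
        # Echo the report so it still appears in the CI job log.
        sys.stdout.write(line)
        match = TOTAL_PATTERN.match(line.rstrip())
        if match:
            coverage = int(match.group(1))
            if coverage < threshold:
                print("FAILED: coverage {0}% is below threshold {1}%".format(coverage, threshold))
                return 1
            return 0
    print("FAILED: no TOTAL coverage line found in input")
    return 1


if __name__ == "__main__":
    sys.exit(main(int(sys.argv[1])))

Because the job pipes the pytest output into the script, the exit code of the script is what determines whether the job passes or fails.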
Test code style formatting
Here’s where I’ve run into some problems. The implementations of black and isort sometimes have competing interpretations of code style formatting. If you’re lucky, your code won’t trigger the interaction, but most likely it eventually will.
Adding the isort configuration to pyproject.toml does help, but in my experience it doesn’t solve the problem completely. I’ve ended up having to accept failure in the isort formatting job, which further complicates things because care must also be taken when running black and isort manually to apply the formatting before commit; the black formatting must be run last to ensure that its formatting takes precedence over isort. A bit frustrating right now, but hopefully it will be resolved in the not too distant future.
run-black-check:
  stage: test
  script:
    - '${VENV_DIR}/black --check ${CI_PROJECT_NAME}'
    - '${VENV_DIR}/black --check tests'

run-isort-check:
  stage: test
  script:
    - '[[ ! $(${VENV_DIR}/isort -rc --diff ${CI_PROJECT_NAME}) ]]'
    - '[[ ! $(${VENV_DIR}/isort -rc --diff tests) ]]'
  after_script:
    # log diffs for debugging
    - '${VENV_DIR}/isort -rc --diff ${CI_PROJECT_NAME}'
    - '${VENV_DIR}/isort -rc --diff tests'
Publish to PyPI
In the past I’ve had the publish job run only on a release tag, using the only property in the job. The problem with this is that the publish code is only run on release, so if you make changes to the release job in a feature branch then you won’t really know whether the changes are broken until you actually release. Too many times we’ve had the release meeting, everyone agrees that we’re ready to release, we tag the release, and the build fails. This creates a considerable sense of urgency at the eleventh hour (as it should) when everyone is expecting the release artifacts to just emerge. This can be avoided by ensuring that as much as possible of the release code is run on the feature branch without actually publishing a release.
In the case of this pipeline, the only difference in the publishing job between a release pipeline and a non-release pipeline is the use of the flit publish command. In my experience it’s pretty easy to get this right; it’s much more likely that you’ve forgotten to update job dependencies or some other “pipeline machinery”, which will now be exercised in a feature branch before you get to a release.
publish-wheel:
  dependencies:
    - build-package
    # Need to ensure that flit is looking for the correct version when it
    # publishes, even though it is not re-packaging the project in this job.
    - update-version
  stage: publish
  script:
    # debug logging
    - 'ls -l dist/'
    # On a strict release tag publish the package to pypi.org,
    # otherwise just log a message in the CI job log. An if/else is used
    # rather than `&& ||` so that a failed publish fails the job instead
    # of falling through to the echo.
    - 'if [[ ${CI_COMMIT_REF_NAME} =~ ${BASH_RELEASE_PATTERN} ]]; then
         ${VENV_DIR}/flit publish;
       else
         echo "Publish dry run";
       fi'
Conclusions
I’ve described a set of Gitlab CI job patterns for a simple Python project. Hope it’s useful.
For a more complex project that includes docstring documentation and Sphinx documentation, other considerations would include:
- testing docstring style and running docstring test code
- publishing documentation to readthedocs