Part 2, presenting a pattern for a CI/CD pipeline implemented with Gitlab CI for Continuous Integration and Continuous Deployment of Python projects distributed as wheel packages via PyPI.
As a general rule in my .gitlab-ci.yml implementations I try to limit bash code to glue logic: just creating a bridge between the Gitlab pipeline “session” and some utility that forms the core of the job. As a rule of thumb I would say to limit bash code to no more than 5 lines; once you’re over that it’s time to rethink and start implementing the scripting in a more structured and maintainable language such as Python.
There are some people out there who have created unit test frameworks for bash, but the fundamental language limitations of bash remain and in my opinion these limitations create numerous long term problems for maintainability that are best avoided by the use of a more capable language. My personal preference at this time is Python, but there are other choices that may be more suitable for your environment and projects. For example if your projects are based in golang then that may also be a suitable choice for your tooling, or similarly for other languages provided they have plenty of useful libraries for building pipeline related utilities. Just keep in mind that you want as few different languages for developers to learn as possible; although this should not be considered justification for writing your pipeline tooling in C - that, I think, would just be masochism.
One other rule for pipelines is that developers should be able to run the same code as Gitlab CI to test the pipeline jobs for themselves. This is another reason to minimize the lines of code in the job definitions themselves (and thus the bash code). In a Makefile based project it’s easy to set up the pipeline jobs so they just call a make target, which is very easy for developers to replicate, although care must be taken with pipeline context in the form of environment variables. In a Python project, instead of introducing another Docker image dependency to acquire make, I’ve just used Python script files. Another consideration that I have not explored much is that Gitlab has the capability to run pipeline jobs locally.
Project structure
It’s probably useful to have a quick outline of the project folder structure so that you have an idea of the file system context the pipeline is operating in.
<project repo root>
├── dist
├── <project source dir>
│   ├── __init__.py
│   └── VERSION
├── LICENSE
├── pyproject.toml
├── README.rst
├── scripts
│   ├── test_coverage.py
│   └── update_version.py
└── tests
    └── unit_tests
The dist folder is where flit deposits the wheel packages it builds. The scripts folder is for pipeline related scripts. The tests folder is where unit and integration tests are located, separate from the main project source code. Having the tests folder separate from the main project source code is a practice I’ve only recently adopted, but I have found it to be quite useful: because the tests are separate, the import syntax naturally references the project source paths explicitly, so you usually end up testing the “public” or “exported” interface of modules in the project. In a more complex project it also becomes easier to distinguish and separate unit tests from integration tests and business tests. Finally, packaging becomes easier: when tests are included in the main source tree in respective tests folders, flit automatically packages them in the wheel. The tests are not very useful in the wheel package and make it a little larger (maybe a lot larger if your tests have data files as well).
Pipeline stages
In the download_3gpp Gitlab CI pipeline I’ve defined four stages: pre-package, package, test and publish. In Gitlab CI, stages are executed sequentially in the defined order and, if necessary, dependency artifacts can be passed from a preceding stage to a subsequent stage.
The pre-package stage is basically “stuff that needs to be done before packaging” because the packaging (or subsequent stages) depends on it. The test stage must come after the package stage because the test stage consumes the wheel package generated by the package stage. Once all the tests have passed, the publish stage can be executed.
Package release identity management
For Python package release identification I use this code snippet in the main __init__.py file:
# Docstring required by flit.
"""Command line utility for downloading standards documents from the 3GPP download site."""

import os.path

# The VERSION file is located in the same directory as this __init__.py file.
here = os.path.abspath(os.path.dirname(__file__))
version_file_path = os.path.abspath(os.path.join(here, "VERSION"))

with open(version_file_path) as version_file:
    version = version_file.read().strip()

# Internally the code can consume the release id using the __version__ variable.
__version__ = version
The VERSION file is committed to source control with the content “0.0.0”. This means that by default a generated package will not have a formal release identity which is important for reducing ambiguity around which packages are actually formal releases. Developers can generate their own packages manually if needed and the package will acquire the default “0.0.0” release id which clearly communicates that this is not a formal release.
In principle, for any release there must be a single, unique package that identifies itself as a release package using the semantic version for that release. Per PEP-491, Python wheel package file names take the form <package name>-<PEP-440 release id>-<python tag>-<abi tag>-<platform tag>.whl; for example, a formal 1.2.3 release of this project would yield a pure-Python wheel named something like download_3gpp-1.2.3-py3-none-any.whl.
update-version:
  artifacts:
    paths:
      # Points to the path of the VERSION file.
      - ${VERSION_PATH}
  stage: pre-package
  script:
    - '${VENV_DIR}/python scripts/update_version.py
      ${CI_PIPELINE_ID}
      ${CI_COMMIT_REF_NAME}
      >${VERSION_PATH}'
    # debug logging
    - cat ${CI_PROJECT_NAME}/VERSION
The update-version job changes the VERSION file content slightly depending on whether the pipeline is for a tagged release, a “non-release” branch (master or a feature branch), or a special “dryrun” release tag. The modified VERSION file is then propagated through the pipeline as a dependency of subsequent stages. The dryrun tag takes the form <semantic version>-dryrun<int>, and the resulting release id takes the form <semantic version>.<pipeline number>.
The intention of the dryrun tag is to test as much of the release pipeline code as possible without actually doing the release. In this case the release id acquires the semantic version part of the tag, with the pipeline number appended to indicate that this is not a formal release. In addition, we will see later that publishing the package does not occur for a dryrun tag. The artifacts generated by a dryrun pipeline are still available for download and review via the artifacts capability of Gitlab.
For any pipeline that is not a tagged release the default “0.0.0” semantic version is used along with the pipeline number. This clearly indicates that the generated package is not a formal release, but is an artifact resulting from a pipeline.
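As an aside (my addition, not part of the original pipeline), the packaging library can be used to sanity-check that release ids of these forms are valid PEP-440 versions and order sensibly:

# Not part of the project; requires the "packaging" distribution
# (pip install packaging).
from packaging.version import Version

release = Version("1.2.3")       # formal release tag
dryrun = Version("1.2.3.1234")   # dryrun semantic version plus pipeline id
default = Version("0.0.0.1234")  # non-release branch plus pipeline id

# Non-release builds sort below any formal release...
assert default < release
# ...but note that a dryrun id sorts *after* its formal release counterpart.
assert dryrun > release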
Here is a link to update_version.py for your review. I’ve tried to keep this script extremely simple so that it is easy to maintain, and contrary to my normal practice it does not have any tests. If it evolves to be any more complicated than it is now then it will need unit tests and probably have to be migrated to its own repo with packaging.
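The linked script is the authoritative version; purely as an illustration, a minimal sketch of the kind of logic involved might look like the following (the tag patterns here are my assumptions, not necessarily the actual ones):

#!/usr/bin/env python3
"""Illustrative sketch only; see the linked update_version.py for the real script.

Prints a PEP-440 release id to stdout based on the pipeline ref name.
Assumes a release tag looks like "1.2.3" and a dryrun tag like "1.2.3-dryrun1".
"""
import re
import sys

RELEASE_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)$")
DRYRUN_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)-dryrun\d+$")


def main(pipeline_id: str, ref_name: str) -> None:
    release = RELEASE_PATTERN.match(ref_name)
    dryrun = DRYRUN_PATTERN.match(ref_name)
    if release:
        # Formal release: use the semantic version from the tag verbatim.
        print(release.group(1))
    elif dryrun:
        # Dryrun release: semantic version plus pipeline id, clearly not a
        # formal release.
        print("{0}.{1}".format(dryrun.group(1), pipeline_id))
    else:
        # Master or feature branch: default version plus pipeline id.
        print("0.0.0.{0}".format(pipeline_id))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

Note that the pipeline job redirects the script’s stdout to the VERSION file, so printing the release id is the entire interface.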
Package generation
The download_3gpp project presently only generates the wheel package for installation, but the packaging stage could also include documentation and other artifacts related to the complete delivery of the project. The download_3gpp project only has the README.rst documentation for the moment.
.acquire-project-dependencies:
  before_script:
    - '${VENV_DIR}/flit install -s'

build-package:
  artifacts:
    paths:
      - ${PACKAGE_PATH}
  dependencies:
    - update-version
  stage: package
  script:
    - '${VENV_DIR}/flit build'
  extends:
    - .acquire-project-dependencies
Run tests
In the pipeline, after the package stage comes the test stage. In my experience this tends to be the busiest stage for Python projects, as various types of tests are run against the package.
Test package install
The first test is to try installing the generated package to ensure that
flit is doing it’s job correctly with your pyproject.toml
configuration:
run-wheel-install:
  dependencies:
    - build-package
  stage: test
  script:
    - '${VENV_DIR}/pip install ${PACKAGE_PATH}'
It’s a fairly nominal test, but it is a valid test. Considering how many times I’ve seen or heard about C/C++ code being distributed that literally doesn’t compile, I think it’s important to include even the most nominal tests in your pipeline; after all, just compiling your code, even once, would be a pretty nominal test.
Run unit tests (and integration, business tests)
This project doesn’t have integration or business tests so there is only one job. If there were integration and business tests then I would create separate jobs for them so that the tests all run in parallel in the pipeline.
.acquire-project-dependencies:
  before_script:
    - '${VENV_DIR}/flit install -s'

run-unit-tests:
  artifacts:
    reports:
      junit: junit_report.xml
  stage: test
  script:
    - '${VENV_DIR}/pytest
      --junitxml=junit_report.xml
      ${TEST_DIR}'
  extends:
    - .acquire-project-dependencies
Test coverage analysis
I prefer to run the various coverage analyses in separate jobs, although they
could arguably be consolidated into a single job. If I recall correct the
run-coverage
cannot be consolidated because the stdout that is captured for
the Gitlab coverage reporting is suppressed when outputting to html or xml, so
having all separate jobs seems cleaner to me.
The run-coverage-threshold-test job fails the pipeline if the test coverage is below a defined threshold (a sketch of the test_coverage.py script it relies on follows the job definitions below). My rule of thumb for Python projects is that test coverage must be above 90% and preferably above 95%. I usually aim for 95%, but depending on the project sometimes that needs to be relaxed a little to 90%.
run-coverage:
  coverage: '/TOTAL.*?([0-9]{1,3})%/'
  stage: test
  script:
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      ${TEST_DIR}'
  extends:
    - .acquire-project-dependencies

run-coverage-threshold-test:
  stage: test
  script:
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      ${TEST_DIR} | ${VENV_DIR}/python scripts/test_coverage.py ${COVERAGE_THRESHOLD}'
  after_script:
    # log coverage report for debugging
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      ${TEST_DIR}'
  extends:
    - .acquire-project-dependencies

run-xml-coverage-report:
  artifacts:
    paths:
      - dist/coverage.xml
  stage: test
  script:
    - 'mkdir -p dist'
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      --cov-report=xml
      ${TEST_DIR}'
    - 'mv coverage.xml dist/'
  extends:
    - .acquire-project-dependencies

run-html-coverage-report:
  artifacts:
    paths:
      - dist/htmlcov/*
  stage: test
  script:
    - 'mkdir -p dist'
    - '${VENV_DIR}/pytest
      --cov=${CI_PROJECT_NAME}
      --cov-report=html
      ${TEST_DIR}'
    - 'mv htmlcov dist/'
  extends:
    - .acquire-project-dependencies
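The test_coverage.py script itself is not shown in this post; purely as an illustration, a threshold check along these lines would do the job (the regex and messages are my assumptions about how such a script might work, not the actual implementation):

#!/usr/bin/env python3
"""Illustrative sketch only; see scripts/test_coverage.py in the repo.

Reads pytest-cov terminal output from stdin, extracts the TOTAL coverage
percentage and exits non-zero if it is below the threshold argument.
"""
import re
import sys

# Matches the "TOTAL ... 97%" summary line emitted by pytest-cov.
TOTAL_PATTERN = re.compile(r"^TOTAL\s+.*\s(\d{1,3})%\s*$")


def main(threshold: int) -> int:
    for line in sys.stdin:
        # Echo the report so it still appears in the CI job log.
        sys.stdout.write(line)
        match = TOTAL_PATTERN.match(line.rstrip())
        if match:
            coverage = int(match.group(1))
            if coverage < threshold:
                print("FAILED: coverage {0}% is below threshold {1}%".format(coverage, threshold))
                return 1
            return 0
    print("FAILED: no TOTAL coverage line found in input")
    return 1


if __name__ == "__main__":
    sys.exit(main(int(sys.argv[1])))

Because the job pipes the pytest output into the script, the exit code of the script is what determines whether the job passes or fails.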
Test code style formatting
Here’s where I’ve run into some problems. The implementations of black and isort sometimes have competing interpretations of code style formatting. If you’re lucky, your code won’t trigger the interaction, but most likely it eventually will.
Adding the isort configuration to pyproject.toml does help, but in my experience it doesn’t solve the problem completely. I’ve ended up having to accept failure in the isort formatting job, which further complicates things because care must also be taken when running black and isort manually to apply the formatting before commit; the black formatting must be run last to ensure that its formatting takes precedence over isort. A bit frustrating right now, but hopefully it will be resolved in the not too distant future.
run-black-check:
  stage: test
  script:
    - '${VENV_DIR}/black --check ${CI_PROJECT_NAME}'
    - '${VENV_DIR}/black --check tests'

run-isort-check:
  stage: test
  script:
    - '[[ ! $(${VENV_DIR}/isort -rc --diff ${CI_PROJECT_NAME}) ]]'
    - '[[ ! $(${VENV_DIR}/isort -rc --diff tests) ]]'
  after_script:
    # log diffs for debugging
    - '${VENV_DIR}/isort -rc --diff ${CI_PROJECT_NAME}'
    - '${VENV_DIR}/isort -rc --diff tests'
Publish to PyPI
In the past I’ve had the publish job run only on a release tag, using the only property in the job. The problem with this is that the publish code is only run on release, so if you make changes to the release job in a feature branch then you won’t really know whether the changes are broken until you actually release. Too many times we’ve had the release meeting, everyone agrees that we’re ready to release, we tag the release, and the build fails. This creates a considerable sense of urgency at the eleventh hour (as it should) when everyone is expecting the release artifacts to just emerge. This can be avoided by ensuring that as much as possible of the release code is run on the feature branch without actually publishing a release.
In the case of this pipeline, the only difference in the publishing job between a release pipeline and a non-release pipeline is the use of the flit publish command. In my experience it’s pretty easy to get this right; it’s much more likely that you’ve forgotten to update job dependencies or some other “pipeline machinery”, which will now be exercised in a feature branch before you get to a release.
publish-wheel:
  dependencies:
    - build-package
    # Need to ensure that flit is looking for the correct version when it
    # publishes, even though it is not re-packaging the project in this job.
    - update-version
  stage: publish
  script:
    # debug logging
    - 'ls -l dist/'
    # On a strict release tag publish the package to pypi.org,
    # otherwise just log a message in the CI job log. An if/else is used
    # rather than `&& ||` so that a failed publish fails the job instead
    # of falling through to the echo.
    - 'if [[ ${CI_COMMIT_REF_NAME} =~ ${BASH_RELEASE_PATTERN} ]]; then
         ${VENV_DIR}/flit publish;
       else
         echo "Publish dry run";
       fi'
Conclusions
I’ve described a set of Gitlab CI job patterns for a simple Python project. Hope it’s useful.
For a more complex project that includes docstring documentation and Sphinx documentation, other considerations would include:
- testing docstring style and running docstring test code
- publishing documentation to readthedocs