A colleague recently highlighted mutmut as a tool for helping to improve tests. This post is a walk-through of my experience getting started with mutmut on a simple project that I’ve discussed previously.

When you are doing Test Driven Development (TDD), as you should be, it is relatively easy to achieve high levels of test coverage, but the next question that emerges is: how effective are your tests at testing your implementation? Enter mutmut. mutmut is a tool for analysing the effectiveness, or robustness, of your tests via mutation testing. It takes your existing implementation, introduces patterns of small changes (“mutants”) to the code, and analyses whether your tests continue to pass or fail in the face of those changes.

Ultimately you end up with a report of what changes were made to your code and whether your tests caught them, and you then have to decide whether your tests need to be expanded to better cover the cases introduced by mutmut.
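As a minimal, hypothetical illustration (not taken from the download_3gpp project), consider a trivial function and a test that passes but doesn’t pin down the boundary behaviour; a typical mutation of the comparison operator would survive it:

# hypothetical example, not from the download_3gpp project
def is_adult(age: int) -> bool:
    return age >= 18  # a mutant might change ">=" to ">"


def test_is_adult():
    # both assertions still pass against the ">" mutant, so it "survives";
    # adding an assertion for the boundary value, is_adult(18), would kill it
    assert is_adult(21)
    assert not is_adult(10)

A surviving mutant like this is mutmut’s way of pointing out an assertion you probably meant to write.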

For this simple download_3gpp project of just four source files I’m going to aim to entirely eliminate the errors reported by mutmut and see how that goes. For a more complex project it might not be possible to eliminate every mutmut warning, and judgement would be needed to prioritise which issues should be worked on, and whether to neuter the pipeline to prevent failures or flag specific issues to allow a pass.

Getting started

As with many things Pythonic, installation is quite simple using pip in a virtual environment.

pip install mutmut

Running mutmut itself to get an initial run against your tests is also extremely simple. By default mutmut uses pytest to run the tests, so if you’re using neither pytest nor unittest that might be an issue. In addition, in the past I have experienced problems running tests formulated using the unittest framework under pytest, so that is worth bearing in mind. In this case the download_3gpp project already uses pytest, so no problem.
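It’s worth noting that the paths mutmut mutates and the test runner it invokes can be configured in a [mutmut] section of setup.cfg. The values below are only a sketch of what that might look like for a project laid out like this one, not the project’s actual configuration:

# setup.cfg (sketch only; adjust the paths to your own layout)
[mutmut]
paths_to_mutate=download_3gpp/
tests_dir=tests/
runner=python -m pytest -x

None of this was needed here, since the defaults suited the project.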

mutmut run

- Mutation testing starting -

These are the steps:
1. A full test suite run will be made to make sure we
   can run the tests successfully and we know how long
   it takes (to detect infinite loops for example)
2. Mutants will be generated and checked

Results are stored in .mutmut-cache.
Print found mutants with `mutmut results`.

Legend for output:
🎉 Killed mutants.   The goal is for everything to end up in this bucket.
⏰ Timeout.          Test suite took 10 times as long as the baseline so were killed.
🤔 Suspicious.       Tests took a long time, but not long enough to be fatal.
🙁 Survived.         This means your tests needs to be expanded.
🔇 Skipped.          Skipped.

1. Using cached time for baseline tests, to run baseline again delete the cache file

2. Checking mutants
⠴ 108/108  🎉 84  ⏰ 0  🤔 2  🙁 22  🔇 0

From the summary above we see that there are 24 issues that need to be reviewed. I won’t work through all of them here; instead I’ll highlight one with interesting implications.

Start with mutmut results to list the mutant IDs you need in order to narrow down on a particular failure.

To apply a mutant on disk:
    mutmut apply <id>

To show a mutant:
    mutmut show <id>


Suspicious 🤔 (2)

---- download_3gpp/options.py (2) ----

20, 22

Survived 🙁 (22)

---- download_3gpp/__init__.py (1) ----

4

---- download_3gpp/download.py (9) ----

31, 37, 69, 93, 97, 101, 103, 105, 107

---- download_3gpp/entrypoint.py (3) ----

5, 8-9

---- download_3gpp/options.py (9) ----

10-13, 16, 19, 21, 23, 25

Untested (3)

---- download_3gpp/download.py (3) ----

77-78, 81

So I took a look at mutant #4, relating to download_3gpp/__init__.py, because I know it is a simple piece of code that acquires the package release version from a file.

mutmut show 4

--- download_3gpp/__init__.py
+++ download_3gpp/__init__.py
@@ -25,7 +25,7 @@
 version_file_path = os.path.abspath(os.path.join(here, "VERSION"))

 with open(version_file_path) as version_file:
-    version = version_file.read().strip()
+    version = None

 __version__ = version

We see that the mutation forced the version variable to None; making that change did not result in any test failures when it should have. It’s interesting because I hadn’t accounted for that potential failure mode. In the context of my simple utility and how the version file works, do I really need to account for it? Maybe not, but for the sake of this walk-through let’s consider what is needed to resolve the issue.

I decided to refactor the version file loading into its own function that could be tested.

# download_3gpp/__init__.py
from .version import acquire_version

__version__ = acquire_version()

# download_3gpp/version.py
import pathlib


def acquire_version() -> str:
    here = pathlib.Path(__file__).parent.absolute()
    version_file_path = (here / "VERSION").absolute()

    with version_file_path.open(mode="r") as version_file:
        version = version_file.read().strip()

        if not version:
            raise RuntimeError("Version is None")

    return version

# tests/unit_tests/test_version.py
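# note: these tests use the mocker fixture provided by the pytest-mock plugin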
import io

import pytest

from download_3gpp.version import acquire_version


class TestAcquireVersion:
    def test_semantic_version(self, mocker):
        expected_result = "3.1.4"
        mocker.patch(
            "download_3gpp.version.pathlib.Path.open",
            return_value=io.StringIO(expected_result),
        )

        result = acquire_version()

        assert result == expected_result

    def test_failed(self, mocker):
        mocker.patch(
            "download_3gpp.version.pathlib.Path.open", return_value=io.StringIO("")
        )

        with pytest.raises(RuntimeError, match="^Version is None"):
            acquire_version()

Running mutmut again I now have 112 mutants instead of the original 108, BUT I have zero suspicious mutants and one less “survived” mutant. So with this refactoring I’ve actually resolved three issues instead of just one. Nice.

Now the astute among you will notice that my testing above is not complete: a PEP 440-compliant Python version is not necessarily a strict Semantic Version, so my tests still don’t actually cover the required outcomes properly. I haven’t checked the mutmut results thoroughly yet, but it doesn’t seem that mutmut has picked up on this oversight, and maybe it cannot. On the other hand, just having implemented the tests above has got me thinking about their adequacy, so maybe the whole process is working as it should: prompting people to implement better tests and to think about making those tests better.
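For example, a stricter test might parse whatever acquire_version returns with the packaging library rather than comparing against a hard-coded semantic version. This is only a sketch of one possible approach, and it adds packaging as a test dependency:

# sketch only: a stricter test using the packaging library (an extra test dependency)
import io

import pytest
from packaging.version import InvalidVersion, Version

from download_3gpp.version import acquire_version


def test_version_is_pep440_compliant(mocker):
    mocker.patch(
        "download_3gpp.version.pathlib.Path.open",
        return_value=io.StringIO("3.1.4rc1"),  # valid PEP 440, not strict SemVer
    )

    try:
        Version(acquire_version())
    except InvalidVersion:
        pytest.fail("VERSION file does not contain a PEP 440 compliant version")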

mutmut in CI

I found using mutmut in a Continuous Integration environment to be slightly more complicated. Since mutmut calls other tools, it needs the full virtual environment context to be activated, rather than just being called with an explicit path to its executable.

There was also a secondary error relating to the click dependency, which was unhappy about the locale and encoding configuration of the CI terminal.

RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult https://click.palletsprojects.com/python3/ for mitigation steps.
This system supports the C.UTF-8 locale which is recommended. You might be able to resolve your issue by exporting the following environment variables:
    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8

I was able to make this work with the suggested declarations, but it looks like the solution might be specific to the configuration of either the GitLab CI runner node I happen to be using, or perhaps the docker container context. In either case the solution might be susceptible to failure if the host node configuration is changed. Not great, but at least there is a work-around for now. I’ll cross that bridge if I come to it.

Once the terminal and virtual environment issues have been resolved, it is simply a matter of running mutmut run; it exits non-zero when there are any mutation issues, which fails the pipeline. Perfect.

export LC_ALL=C.UTF-8; \
export LANG=C.UTF-8; \
source $(VENV_BINDIR)/activate; \
mutmut run
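For completeness, the same steps can live directly in a GitLab CI job. The image, virtual environment path and dependency list below are assumptions for illustration only, not the project’s actual pipeline definition:

# .gitlab-ci.yml (sketch; image, paths and dependencies are assumptions)
mutation-testing:
  image: python:3.8
  script:
    - python -m venv .venv
    - source .venv/bin/activate
    - pip install -e . mutmut pytest pytest-mock
    - export LC_ALL=C.UTF-8
    - export LANG=C.UTF-8
    - mutmut run  # exits non-zero if any mutants survive, failing the job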

Conclusions

For such a small number of source files I was surprised by the number of failures found. It took time to resolve them, and careful analysis of the mutation diffs to decide on a solution to try. Am I happier about my code now? Yes indeed, and I’ve learned some things that I think will result in a more defensive programming approach, and hopefully fewer similar problems, in future projects.

In future I’ll recommend that new projects adopt mutmut testing from the outset to stay on top of issues. For an existing project, be prepared for some (perhaps a lot of) additional work as mutmut uncovers issues that will most likely involve some thought and refactoring to resolve.