Python has a phenomenal ecosystem of third-party packages.

But, by its very nature, using third-party packages means trusting third-party code. And while it seems there have so far been relatively few security exploits delivered through these packages, how can we reduce how much we have to trust this third-party code?

Isolate

Use either a container (e.g. with podman or docker) or firejail.

This isolates the general filesystem, system services, UNIX sockets, etc. Anything actually needed by the code being run can (and must) be made accessible explicitly.
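As a minimal sketch (the script name myscript.py and image name myimage are placeholders, not from the original), either of the following gives an isolated environment with no network access and only an explicitly granted view of the filesystem:

firejail --net=none --private python3 myscript.py

podman run --rm --network none --volume /mydata:/mydata:ro myimage python3 /mydata/myscript.py

With firejail, --private replaces the home directory with an empty temporary one; with podman, only the explicitly mounted /mydata is visible inside the container, and read-only at that.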

Source code provenance

Many Python packages contain compiled code, which is usually downloaded pre-compiled as part of a “wheel” package. Pre-compiled code is incredibly convenient, but it cannot be inspected in the way Python-language source code can. To disable downloading of pre-compiled code use:

pip install <package name> --no-binary :all:

Be warned though, it can take a surprising amount of time to compile larger packages.
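If you want to look at the source before it is built and installed, pip can also simply download the source distributions into a local directory (the package name and directory here are only illustrative):

pip download --no-binary :all: --no-deps somepackage -d ./sdists

After inspection, the downloaded archives can be installed without further network access with pip install --no-index --find-links ./sdists somepackage.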

Disable application network access at run time

Unless your application, Python notebook or data-processing script positively needs to interact with network resources, it is best to disable access to the network. The easiest way is probably to run inside a docker container started with:

docker run --net none <...>
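For example, a full invocation might look like this (the image name myimage and script process.py are placeholders):

docker run --rm --net none --volume "$PWD":/work --workdir /work myimage python3 process.py

Any attempt by the code to open an outbound network connection will then fail, since the container only has a loopback interface.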

Disable network access during build

If your software stack is of value, it seems beneficial to restrict network access before any untrusted packages come into contact with the proprietary stack. One complicating factor is that, in general, a Python package's build script needs to be executed in order to discover its dependencies (see also pipgrip, pipdeptree, etc.). In the end, there is no easy way to decouple building arbitrary Python packages from internet access.

Ideally, a three-stage docker build is used:

  1. Scratch install stage in order to find all dependencies
  2. Download stage
  3. Local build stage

The relevant pip commands are described at: https://pip.pypa.io/en/stable/user_guide/#installing-from-local-packages
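One possible realisation is sketched below. It collapses the dependency-discovery and download stages into a single pip download step; the base image, the requirements file and the BuildKit RUN --network=none flag are assumptions rather than the only way to do this:

# syntax=docker/dockerfile:1
# Stage 1: resolve and download all dependencies (network access allowed here)
FROM python:3.11-slim AS download
WORKDIR /build
COPY requirements.txt .
RUN pip download -r requirements.txt -d /wheels

# Stage 2: install strictly from the local copies, with no network access
FROM python:3.11-slim AS build
COPY --from=download /wheels /wheels
COPY requirements.txt .
RUN --network=none pip install --no-index --find-links=/wheels -r requirements.txt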

Read-only access to data

Provide access to input data via a read-only docker volume mount:

docker create <..> --volume /mydata:/mydata:ro --volume /myresults:/myresults
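As a quick sanity check (the image name is again a placeholder), writes to the read-only mount should fail inside the container:

docker run --rm --volume /mydata:/mydata:ro myimage touch /mydata/test
# expected: touch: cannot touch '/mydata/test': Read-only file system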

Snapshot writable directories

If processes have regular write access to /myresults, put it on a snapshot-capable filesystem (e.g. btrfs) and use a tool such as snapper:

sudo snapper -c results create-config /myresults

This sets up a configuration under which an organised, automatic series of snapshots is taken, which can be used for recovery if something unexpected happens.
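Assuming /myresults is a btrfs subvolume (or on another filesystem supported by snapper), snapshots can then be created, listed and rolled back, for example:

sudo snapper -c results create --description "before processing run"
sudo snapper -c results list
sudo snapper -c results undochange 1..2

undochange reverts the file changes made between the two listed snapshots, while the snapshots themselves remain available.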

Need more help?

Services related to Python software packaging: https://bnikolic.co.uk/2023/05/22/python-ssc