Python has a vast number of incredibly useful third-party packages that are usually no further than a `pip install` away. This makes it possible to do many things quickly and efficiently, but it also creates a large security surface area: each of these packages may pull in many other packages, some of which are pre-compiled (i.e., wheels). There is no realistic way of fully securing such a large software supply chain; instead, it is best to isolate Python applications that depend on un-audited third-party packages.
The challenge is to create effective isolation while allowing data in and out so that useful computation can be done.
File system isolation
Unless your application is supposed to rummage through all your data, including passwords and other secrets, you should isolate it from all other parts of the file system!
To communicate with the program you can either use a shared directory that is visible to the application, or use explicit staging of input and output data to the program.
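A minimal sketch of the staging approach, with hypothetical file and directory names; only the `sandbox/` directory would be made visible to the isolated program:

```shell
# Stage inputs into a directory the sandboxed program can see,
# run the program, then copy results back out.
mkdir -p sandbox/input sandbox/output
echo "a,b" > data.csv               # example input file
cp data.csv sandbox/input/          # stage input into the sandbox
# ... run the isolated application here; it writes to sandbox/output ...
touch sandbox/output/results.csv    # placeholder for the program's output
cp sandbox/output/results.csv .     # retrieve the result
```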
Unless your application is using the network, you should disable access to all network resources through an O/S-level (container, firewall) or hypervisor-level (virtual machine) feature.
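At the container level this is a single flag; for example (image and script names hypothetical):

```shell
# Run the container with no network stack inside it at all.
docker run --network=none my-analysis-image python analyze.py
```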
Installing packages with `pip` etc. requires access to the relevant package servers: this should be done in a separate run where the network is suitably enabled but none of your own data or source code is present. Don't mix writing code or production runs with `pip install`!
If you are using Jupyter you will need to open the Jupyter port so the application can be reached from the web browser. Inevitably this has security implications and should probably be avoided in the most security-sensitive deployments.
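If you do need Jupyter, expose only that one port and bind it to the loopback interface so the notebook is reachable from the local browser but not from the rest of the network; a sketch using Docker (image name hypothetical):

```shell
# Publish port 8888 on 127.0.0.1 only: local browser access,
# no exposure to other machines on the LAN.
docker run -p 127.0.0.1:8888:8888 my-jupyter-image \
    jupyter notebook --ip=0.0.0.0 --port=8888
```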
There are two tools that make all this quite easy:
Containers (Docker/Singularity/etc.). These are now quite standard technologies. One drawback is that isolation is tied to how the application is built: you containerize the application as you create it, rather than simply wrapping an existing installation.
The firejail program on Linux (https://firejail.wordpress.com/). It allows easy ad-hoc isolation of existing programs on a Linux system.
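For example, an ad-hoc firejail invocation might confine a script to a single project directory and cut off the network entirely (paths hypothetical):

```shell
# Only ~/project is visible and writable; the rest of $HOME is hidden.
# --net=none removes all network access.
firejail --net=none --whitelist=~/project python3 ~/project/analyze.py
```

This requires no image build at all, which makes it a good fit for quickly sandboxing programs that are already installed.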