Reproducibility and user empowerment via Docker

There are two reasons why I think Docker (and similar tools such as Vagrant) is an important technology. This is not just because of its technological aspects but also because of its socio-political implications...


The first one, which is quite obvious and has already been discussed elsewhere, is reproducibility of results, especially in a scientific setting. Analysis environments are getting more and more complex, so reproducing results itself becomes a complex enterprise. And we are not even talking about wet-lab reproducibility here: we are talking about extremely complex software analysis pipelines. Docker and similar technologies change all that: it is now becoming trivial to allow others to reproduce your analysis. Sure, in the past you could replicate the methods (assuming that they were fully documented, and I cannot think of a single case where that happened), but the cost of doing so was normally staggering, since everything had to be set up manually. And if you were the originator of the work, making it easy for the world to replicate it was no trivial task either. While there may be cases where people want to obfuscate their own work, empowering others to reproduce your work is not easy even for those intent on doing so.

Enter Docker and friends: now it is easy for everyone involved to create and use a given analysis pipeline. Furthermore, the process can be standardized and widely adopted. For instance, in the realm of science it is not difficult to envision a time when scientific journals require your work to be easily (automatically) reproducible using a procedure like this.
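To make the idea a bit more concrete, here is a minimal sketch of what "shipping the environment with the analysis" could look like. It only generates the text of a Dockerfile and the two Docker commands a reader would run; it does not call Docker itself. The base image, the pinned package versions, the script name `run_analysis.py`, the image tag and the `/data` mount point are all illustrative assumptions, not something prescribed by Docker or by any journal.

```python
def render_dockerfile(base_image: str, pinned_packages: list[str], script: str) -> str:
    """Return Dockerfile text that freezes the analysis environment.

    All arguments are placeholders chosen by the author of the analysis.
    """
    install = " ".join(pinned_packages)
    lines = [
        f"FROM {base_image}",                          # exact base image to start from
        "WORKDIR /analysis",
        f"RUN pip install --no-cache-dir {install}",   # pinned versions, not "latest"
        f"COPY {script} /analysis/{script}",           # the analysis code itself
        f'CMD ["python", "/analysis/{script}"]',       # what runs when the container starts
    ]
    return "\n".join(lines) + "\n"


def docker_commands(tag: str, data_dir: str) -> list[list[str]]:
    """Return the argv lists a reader would run to rebuild and rerun the analysis."""
    return [
        ["docker", "build", "-t", tag, "."],
        # Mount the reader's local data directory at /data (an assumed path).
        ["docker", "run", "--rm", "-v", f"{data_dir}:/data", tag],
    ]


if __name__ == "__main__":
    print(render_dockerfile(
        "python:3.11-slim",
        ["numpy==1.26.4", "pandas==2.2.2"],
        "run_analysis.py",
    ))
    for argv in docker_commands("my-analysis:1.0", "/path/to/data"):
        print(" ".join(argv))
```

The point of the sketch is that the whole environment is described in a few lines of text that can be published next to the paper, so the reader rebuilds it rather than reconstructing it by hand.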

User empowerment - The big thing?

The deal with cloud computing is well known: you trade control for convenience. Clouds create a dependency relationship between the user and the provider, and it happens to be a nasty kind of relationship: the weak party (the user) becomes increasingly dependent on the strong party (the cloud provider). Things like the disappearance of Google Reader are just a harbinger of what can happen in a cloud environment: your daily tools (which might be critical to your work, something more serious than a news reader) might disappear overnight. Also, do you really trust that your cloud provider does not access your (private?) data?

Now, the convenience of the cloud is undeniable. The interesting question is: can we strike a compromise where the convenience still exists but the user retains a fair share of control?

From my perspective (as a software developer interested in giving users maximum control and convenience) the last few years have been tough. A compromise has not been easy, as there has been a massive amount of development in technologies that are convenient but take control away from the user. A few examples:

With Docker this all changes: you can make a cloud version of your application, but you can also make a version that is trivial to download and install and that runs completely under the control of the user. An application with a complex web interface, an SQL database, a NoSQL database and a MapReduce framework can be trivially installed locally using a Docker container. You can say to your users: "download this link, everything is inside and will be configured for you, like magic (very convenient). And if you do not like magic, feel free to inspect the container... it's all there for you to see and change..."
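As a rough illustration of "everything is inside", the sketch below describes such a multi-part application as a set of container services and prints the description as JSON. The layout follows the Docker Compose convention of a top-level `services` mapping; the application image name `example/webapp:1.0`, the port numbers and the password are hypothetical placeholders, and the choice of PostgreSQL and MongoDB as the SQL and NoSQL stores is mine, purely for the example.

```python
import json


def compose_spec(web_image: str, db_password: str) -> dict:
    """Describe a web app plus SQL and NoSQL stores as Compose-style services."""
    return {
        "services": {
            "web": {
                "image": web_image,              # hypothetical application image
                "ports": ["8080:8080"],          # host:container, illustrative only
                "depends_on": ["sql", "nosql"],  # start the data stores first
            },
            "sql": {
                "image": "postgres:16",
                # The official postgres image reads its password from this variable.
                "environment": {"POSTGRES_PASSWORD": db_password},
            },
            "nosql": {"image": "mongo:7"},
        }
    }


if __name__ == "__main__":
    spec = compose_spec("example/webapp:1.0", "change-me")
    # The user can read, and edit, exactly what will run on their machine.
    print(json.dumps(spec, indent=2))
```

Because JSON is also valid YAML, the printed text could in principle be saved to a file and handed to `docker compose -f <file> up`, though that is an assumption worth checking against the Compose version at hand. What matters for the argument is that the same description works on a cloud host or on the user's own laptop, and that nothing in it is hidden from the user.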

If you are a developer who believes in both user empowerment and convenience, there are no more heart-breaking doubts: you can use modern web and cloud technologies while knowing that your work might be deployed in the cloud OR under total user control.

Now, you might say that the current state of virtualization and container technology is too geeky. I agree, but is it too difficult to envision a world where a very complex application can be downloaded and run with a single click on a web page? Indeed, is Docker very far from this?
