Version Control

A version control system (VCS) is a software tool that allows you to manage a tree of source code files and keep track of who changes what, when, and why. Version control repositories store the whole history of changes so that any past version of the file tree can be recovered. A repository is therefore a place where you can manage a particular tree of files with careful accounting. When collaborating on code with multiple people, using a VCS is a must.
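To make the idea of "careful accounting" concrete, here is a minimal sketch of a repository that records who, when, and why for each version and can recover any past file tree. It is a toy, in-memory illustration only: the names ToyRepository, Commit, commit, checkout, and log are hypothetical, and it stores a full copy of the tree per version, whereas real systems persist history on disk and store it far more compactly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Commit:
    """One recorded version of the file tree plus its accounting metadata."""
    tree: dict           # path -> file contents at this version (what)
    author: str          # who made the change
    message: str         # why the change was made
    timestamp: datetime  # when the change was recorded


@dataclass
class ToyRepository:
    """Hypothetical in-memory repository, for illustration only."""
    history: list = field(default_factory=list)

    def commit(self, tree, author, message):
        # Copy the tree so later edits by the caller cannot alter history.
        snapshot = Commit(dict(tree), author, message,
                          datetime.now(timezone.utc))
        self.history.append(snapshot)
        return len(self.history) - 1  # version number of the new commit

    def checkout(self, version):
        # Recover the file tree exactly as it was at the given version.
        return dict(self.history[version].tree)

    def log(self):
        # List who changed the tree, and why, for every version.
        return [(i, c.author, c.message) for i, c in enumerate(self.history)]


repo = ToyRepository()
files = {"analysis.py": "print('v1')"}
v0 = repo.commit(files, "alice", "Add first analysis script")

files["analysis.py"] = "print('v2')"
v1 = repo.commit(files, "bob", "Update analysis output")

old_tree = repo.checkout(v0)  # the tree as it was before bob's change
```

Because each commit keeps its own copy of the tree, checking out version v0 still returns the original script even after the working files have been modified and committed again.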

But can a VCS be pressed into service to manage data as well as code? In principle, yes: a dataset is just another collection of files. In practice, however, data can be much larger than code, and large files can cripple the performance of a VCS.

Fortunately, a modern VCS running on modern hardware performs reasonably well. Moreover, extensions exist that allow a VCS to handle large files efficiently. It therefore makes sense to use a VCS as the basis for managing both code and data.
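One way such an extension can work is to keep the large content outside the repository and commit only a small pointer to it; Git LFS, for example, takes an approach along these lines. The sketch below illustrates the idea under stated assumptions: the function names, the pointer format, and the storage layout are all hypothetical and do not reproduce any particular tool.

```python
import hashlib
import tempfile
from pathlib import Path


def store_large_file(data: bytes, store_dir: Path) -> str:
    """Save the content outside the repository and return a small pointer."""
    digest = hashlib.sha256(data).hexdigest()
    (store_dir / digest).write_bytes(data)
    # Only this short text would be committed to the VCS (assumed format).
    return f"sha256:{digest} size:{len(data)}"


def load_large_file(pointer: str, store_dir: Path) -> bytes:
    """Resolve a pointer back to the content and verify its integrity."""
    digest = pointer.split()[0].split(":", 1)[1]
    data = (store_dir / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("stored content does not match pointer")
    return data


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    dataset = b"x" * 1_000_000          # stand-in for a large data file
    pointer = store_large_file(dataset, store)
    restored = load_large_file(pointer, store)
```

The VCS then tracks only the pointer, which stays under a hundred bytes no matter how large the dataset is, while the hash in the pointer still identifies exactly which version of the data belongs to each commit.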