Documentation

Useful documentation supports reusability and reproducibility. Metadata and in-code comments are limited in what they can convey. Human-readable documentation can provide the missing context and give detailed guidance.

When managing code or data with Git, at a minimum include a small README text file at the root of the Git repository tree with a brief synopsis and a mention of the license. When the repository is hosted on GitHub or GitLab, writing the README in Markdown (as a README.md at the root of the repository) or in one of the other markup formats supported by GitHub/GitLab provides richer documentation: it renders as a webpage with styling and links when browsing the repository via the GitHub/GitLab web interface, yet can still be read as a text file after cloning the repository for offline use. For an introduction, see the quickstart for writing on GitHub and the description of GitLab README and index files. You can include several such markup files in a repository and link them together to create quite elaborate documentation that is co-managed with your code or data. See, for example, this Markdown README of a repository that includes multiple Markdown files to provide elaborate documentation webpages via GitHub.

When the documentation is an ad-hoc effort shared among collaborators, consider instead using the GitHub wiki or GitLab wiki service associated with the repository. Beware, though, that this makes your documentation dependent on that platform, and that the documentation is no longer included when the repository is cloned. It is therefore important to still include a README that links to the wiki.

For more elaborate needs, many other options exist. For example, the GitHub Pages service lets you set up a static website with hosting included. Generating a static website and other documentation formats from in-code comments, using tools such as roxygen2 or Sphinx, is highly recommended since it makes it easier to keep code and documentation in sync.
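As an illustration of documentation kept in in-code comments, the following minimal sketch shows a Python function whose docstring uses the reStructuredText field-list style that Sphinx's autodoc extension can read. The function name, parameters, and default values are hypothetical and serve only to show the layout.

```python
def carbon_price(year: int, base_price: float = 30.0, growth: float = 0.05) -> float:
    """Return an illustrative carbon price for a given year.

    The price grows at a fixed annual rate from the base year 2020.
    All numbers here are placeholders, not real model parameters.

    :param year: Calendar year, 2020 or later.
    :param base_price: Price in the base year 2020.
    :param growth: Annual growth rate as a fraction, for example 0.05 for 5 %.
    :returns: The price in ``year``.
    :raises ValueError: If ``year`` is before 2020.
    """
    if year < 2020:
        raise ValueError("year must be 2020 or later")
    # Compound the base price over the number of years since 2020.
    return base_price * (1.0 + growth) ** (year - 2020)
```

Because the explanation lives next to the code it describes, a change to the function and the matching change to its documentation can go into the same Git commit. The same text is also available interactively via `help(carbon_price)`.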

Sphinx can also be used to generate documentation from text-based markup files that can be handled conveniently with Git. Sphinx supports cross-reference links and index entries to make the documentation more useful as a reference; see, for example, the Sphinx-generated documentation you are reading now. Sphinx, though very powerful and customizable, has a steep learning curve, mostly on account of the reStructuredText markup, which is more elaborate than Markdown. Particularly when your code is not written in Python, an alternative documentation website generator might be a better choice for your team. Popular alternatives are MkDocs and Jekyll. See the comparison of documentation generators on Wikipedia for further options.
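A Sphinx project is configured through a Python file named conf.py that sits next to the markup files. The sketch below shows what a minimal one might look like; the project name, author, and choice of theme are placeholders, and a real project will usually need more settings.

```python
# conf.py: a hypothetical minimal Sphinx configuration.
# All values below are placeholders chosen for illustration.

project = "Example Model"
author = "Example Team"

# Extensions shipped with Sphinx that this sketch enables.
extensions = [
    "sphinx.ext.autodoc",  # pull documentation from Python docstrings
]

# Paths, relative to this file, that Sphinx should not treat as sources.
exclude_patterns = ["_build"]

# "alabaster" is the theme Sphinx uses by default.
html_theme = "alabaster"
```

Assuming the markup files and this conf.py live in a `doc` subdirectory, running `sphinx-build -b html doc doc/_build/html` from the repository root would build the HTML pages.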

A good example of documentation is the IIASA MESSAGEix model framework repository hosted on GitHub. The repository has a README.md that, among other things, mentions the license, and it also holds a CITATION.cff file. Rich documentation is provided via a MESSAGEix documentation website. The documentation for the Python API is generated from in-code comments. Other content for the website is managed in a doc subdirectory of the repository, from which the website is built using Sphinx. Note that the code and documentation are managed together in a single repository.

Just as with code or data, the evolution of elaborate documentation needs to be managed, and can be managed well with a Git workflow. When documentation is likely to become elaborate, as will be the case for a large model, plan for its maintenance. Information will become outdated. Links to external resources will break. Using a documentation generator such as Sphinx can save a lot of time by automating the creation of updated documentation in multiple formats. Sphinx also reports unresolved cross-references when building, and its linkcheck builder can check external links. When using links in markup files in a repository or a wiki, you can use a link checker such as Lychee and automate it via GitHub Actions or otherwise.
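To make concrete what a link checker does, here is a minimal sketch in Python that finds inline Markdown links and reports relative ones whose target file is missing. It is not how Lychee works and is far less complete: it performs no network requests, ignores reference-style links, and its function names are made up for this example.

```python
import re
from pathlib import Path

# Matches inline Markdown links of the form [text](target) or
# [text](target "title"). The lookbehind skips images: ![alt](file).
LINK_PATTERN = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)[^)]*\)")

# Matches targets that start with a URL scheme such as https: or mailto:
SCHEME_PATTERN = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def extract_links(markdown_text: str) -> list[str]:
    """Return the targets of all inline links found in the text."""
    return LINK_PATTERN.findall(markdown_text)


def broken_relative_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in the file that do not exist on disk."""
    broken = []
    text = markdown_file.read_text(encoding="utf-8")
    for target in extract_links(text):
        # External URLs and in-page anchors are out of scope for this sketch.
        if SCHEME_PATTERN.match(target) or target.startswith("#"):
            continue
        # Drop a trailing "#section" before looking for the file.
        path_part = target.split("#", 1)[0]
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

Calling `broken_relative_links(Path("README.md"))` in a cloned repository would list the relative links in the README that point at files that are not there. For external links and for routine use, a dedicated tool remains the better choice.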