Documentation
Reusability and reproducibility are supported by useful documentation. Metadata and in-code comments are limited. Human-readable documentation can provide the missing context and give detailed guidance.
When managing code or data with Git, at a minimum include a small README text
file at the root of the Git repository tree with a brief synopsis and a mention
of the license. When the repository is hosted on GitHub or GitLab, writing the
README in Markdown (as a README.md at the root of the repository) or in one of
the other markup formats supported by GitHub/GitLab provides richer
documentation: it renders as a webpage with styling and links when browsing the
repository via the GitHub/GitLab web interface, yet can still be read as a text
file after cloning the repository for offline use. For an introduction see the
quickstart for writing on GitHub and the description of GitLab README and index
files.
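The minimum described above (a README at the repository root that mentions the license) can be checked automatically, for instance before a release. The following is only a sketch under stated assumptions: the function names `find_readme` and `readme_mentions_license` and the list of accepted file names are hypothetical, and a simple keyword search is a rough proxy for "mentions the license".

```python
from pathlib import Path

# Hypothetical list of accepted README file names; adjust to your conventions.
README_NAMES = ("README", "README.md", "README.rst", "README.txt")


def find_readme(repo_root):
    """Return the path of the first README found at the repository root, or None."""
    root = Path(repo_root)
    for name in README_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


def readme_mentions_license(readme):
    """Return True if the README text contains the word 'license' or 'licence'."""
    text = Path(readme).read_text(encoding="utf-8").lower()
    return "license" in text or "licence" in text
```

A call such as `find_readme(".")` run from the repository root returns the README path or `None`, which a script can turn into a warning.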
You can include several such markup files in a repository and link them together
to create quite elaborate documentation that is co-managed with your code or
data. See for example this Markdown README of a repository
that includes multiple Markdown files to provide elaborate documentation
webpages via GitHub.
When the documentation is an ad-hoc effort shared among collaborators, consider using the GitHub wiki or GitLab wiki service associated with the repository instead. Beware, though, that this makes your documentation dependent on that platform, and that the documentation will no longer be included when the repository is cloned. It is therefore important to still include a README that links to the wiki.
For more elaborate needs, many other options exist. For example, the GitHub Pages service allows you to set up a static website with hosting included. Generating a static website and other documentation formats from in-code comments using tools such as roxygen2 or Sphinx is highly recommended, since it makes it easier to keep code and documentation in sync.
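To illustrate what "in-code comments" can look like for Python, here is a minimal sketch of a function documented with a docstring in the reStructuredText field-list style that Sphinx's autodoc extension can pick up. The function `discount` and its parameters are hypothetical and serve only as an example.

```python
def discount(value: float, rate: float, years: int) -> float:
    """Return the present value of a future amount.

    :param value: Future amount.
    :param rate: Annual discount rate as a fraction (0.05 for 5 %).
    :param years: Number of years until the amount is received.
    :returns: Discounted present value.
    :raises ValueError: If ``rate`` is not greater than -1.
    """
    if rate <= -1:
        raise ValueError("rate must be greater than -1")
    return value / (1 + rate) ** years
```

Because the description lives next to the code, a change to the function and the matching change to its documentation can be made in the same commit.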
Sphinx can also be used to generate documentation from text-based markup files that can be handled conveniently with Git. Sphinx supports cross-reference links and index entries to make the documentation more useful as a reference; see for example the Sphinx-generated documentation you are reading now. Sphinx, though very powerful and customizable, has a steep learning curve, mostly on account of the reStructuredText markup, which is more elaborate than Markdown. In particular when your code is not Python, an alternative documentation website generator might be a better choice for your team. Popular alternatives are MkDocs and Jekyll. See the comparison of documentation generators on Wikipedia for further options.
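A Sphinx project is configured through a Python file named `conf.py` in the documentation source directory. The following is a minimal sketch, not a recommended setup: the project name and author are hypothetical placeholders, and the selection of extensions is an assumption about what a small project might want.

```python
# conf.py: minimal Sphinx configuration sketch (all project values are placeholders)

project = "example-model"  # hypothetical project name
author = "Example Team"    # hypothetical author

extensions = [
    "sphinx.ext.autodoc",      # pull documentation from Python docstrings
    "sphinx.ext.intersphinx",  # cross-reference other Sphinx-built documentation
]

# Allow cross-references into the Python standard library documentation.
intersphinx_mapping = {"python": ("https://docs.python.org/3", None)}

# Do not treat build output as documentation sources.
exclude_patterns = ["_build"]

# Theme bundled with Sphinx.
html_theme = "alabaster"
```

With such a file in place, running `sphinx-build -b html <sourcedir> <outputdir>` builds the HTML pages from the markup files in the source directory.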
A good example of documentation is the IIASA MESSAGEix model framework
repository hosted on GitHub. It has a README.md that, among other things,
mentions the license, and the repository also holds a CITATION.cff file. Rich
documentation is provided via a MESSAGEix documentation website. The
documentation for the Python API is generated from in-code comments. Other
content for the website is managed in a doc subdirectory of the repository,
from which the website is built using Sphinx. Note that the code and
documentation are managed together in a single repository.
Just as with code or data, the evolution of elaborate documentation needs to be managed, and can be managed well with a Git workflow. When documentation might become elaborate, as will be the case for a large model, plan for its maintenance. Information will become outdated. Links to external resources will break. Using a documentation generator such as Sphinx can save a lot of time by automating the generation of updated documentation in multiple formats. Sphinx can also check links: it warns about unresolved cross-references when building, and its linkcheck builder checks external links. When using links in markup files in a repository or a wiki, you can use a link checker such as Lychee and automate it via GitHub Actions or otherwise.
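To make the idea of link checking concrete, here is a minimal sketch that finds broken relative links in a single Markdown file. It is not a replacement for a dedicated tool such as Lychee: it only handles inline links of the form `[text](target)`, it does not contact external URLs, and the function names `extract_links` and `broken_relative_links` are hypothetical.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](target) or [text](target "title").
LINK_RE = re.compile(r'\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')

# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def extract_links(markdown: str) -> list[str]:
    """Return the targets of all inline links found in the Markdown text."""
    return LINK_RE.findall(markdown)


def broken_relative_links(md_file: Path) -> list[str]:
    """Return relative link targets in md_file that point to missing files."""
    broken = []
    for target in extract_links(md_file.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip external URLs and in-page anchors
        path = target.split("#", 1)[0]  # drop any anchor after the file name
        if not (md_file.parent / path).exists():
            broken.append(target)
    return broken
```

Calling `broken_relative_links(Path("README.md"))` from the repository root lists the relative targets that no longer exist, which could be run as part of an automated check.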