Contributing

All code contributors are listed in the contributors file.

Getting in touch

Any code contribution (bug fixes/tutorials/documentation changes) and feedback is very welcome. Please open a new issue via the issue tracker.

Setting up datafold for development

This section describes all steps to set up datafold for code development.

Quick set up

The following bash snippet collects all steps that are detailed below.

# Clone repository (replace [NAMESPACE] with your fork or "datafold-dev")
git clone git@gitlab.com:[NAMESPACE]/datafold.git
cd ./datafold/

# Recommended: set up virtual environment
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip

# Install package and development dependencies
python -m pip install -r requirements-dev.txt

# Install git hooks and code formatting tools
pre-commit install
pre-commit run --all-files

# Optional: run tests with coverage and pytest
coverage run -m pytest datafold/
coverage html -d coverage/
coverage report

# Optional: test if tutorials run without error
pytest tutorials/

# Optional: build documentation (writes to "docs/build/")
python setup.py build_docs

datafold is not available from the conda package manager. If you manage Python with Anaconda, the recommended way is to set up datafold in a conda environment and install it with pip.

Also note the official instructions for package management in Anaconda, particularly the subsection on how to install non-conda packages.

# Clone repository (replace [NAMESPACE] with your fork or "datafold-dev")
git clone git@gitlab.com:[NAMESPACE]/datafold.git
cd ./datafold/

# Create new conda environment with pip installed
conda create -n .venv
conda activate .venv
conda install pip  # use pip from within the conda environment

# Install package and development dependencies
pip install -r requirements-dev.txt

# Install git hooks and code formatting tools
pre-commit install
pre-commit run --all-files

# Optional: run tests with coverage and pytest
coverage run -m pytest datafold/
coverage html -d coverage/
coverage report

# Optional: test if tutorials run without error
pytest tutorials/

# Optional: build documentation (writes to "docs/build/")
python setup.py build_docs

Fork and create merge requests to datafold

Please read and follow the steps of GitLab’s “Project forking workflow”.

Note

We set up a “Continuous Integration” (CI) pipeline. However, the worker (a gitlab-runner) of the datafold repository is not available for forked projects (for background information see here).

After you have created a fork you can clone the repository with:

git clone git@gitlab.com:[NAMESPACE]/datafold.git

(replace [NAMESPACE] accordingly)
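To keep your fork in sync with the main repository, one common workflow (a sketch, not part of the official instructions) is to add the main repository as a second remote and merge its changes regularly:

# Add the main datafold repository as an "upstream" remote (the remote name is a convention, not required)
git remote add upstream git@gitlab.com:datafold-dev/datafold.git

# Fetch and merge upstream changes into your local branch
# (the branch name "master" is assumed here; adapt it if the default branch differs)
git fetch upstream
git checkout master
git merge upstream/master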

Install development dependencies

The file requirements-dev.txt in the root directory of the repository lists all development dependencies and is readable with pip.

The recommended (but optional) way is to install all dependencies into a virtual environment. This avoids conflicts with other installed packages.

# Create and activate new virtual environment
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip

# Install package and extra dependencies
pip install -r requirements-dev.txt

To install the dependencies without a virtual environment, run only the last statement.

Alternatively, if you work with Anaconda, create a conda environment and install the dependencies with pip from within it:

# Create new conda environment with pip installed
conda create -n .venv
conda activate .venv
conda install pip  # use pip from within the conda environment

# Install package and extra dependencies
pip install -r requirements-dev.txt

Note

While the above procedure works, you may also want to follow Anaconda’s best practices more strictly. In particular, it is recommended to install the dependencies listed in requirements-dev.txt separately with conda install package_name whenever a package is hosted on conda.
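As a sketch of this stricter approach, packages that are hosted on conda could be installed with conda first, while pip then installs whatever remains (the package selection below is only illustrative; check requirements-dev.txt for the actual dependency list):

# Illustrative only: install conda-hosted scientific packages with conda first ...
conda install numpy scipy scikit-learn pandas
# ... and let pip resolve the remaining development requirements
pip install -r requirements-dev.txt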

Install git pre-commit hooks

The datafold source code and configuration files are automatically formatted and checked with

  • black for general code formatting,

  • isort for sorting Python import statements alphabetically and in sections,

  • nbstripout to remove potentially large binary output cells in Jupyter notebooks before the content bloats the git history,

  • mypy for static type checking (where applicable), and

  • various other hooks, such as removing trailing whitespace, validating configuration files or sorting the requirements files.

It is highly recommended to let these tools inspect and format the code before it is committed to the git history. The git hooks alter the source code in a deterministic way: each hook should only need to format the code once to obtain the desired format, and none of the tools should break the code.

Conveniently, all of this is managed via pre-commit (installs with requirements-dev.txt) and the configuration in .pre-commit-config.yaml.

To install the git-hooks locally run from the root directory:

pre-commit install

The git-hooks then run automatically prior to each git commit. To format the current source code without a commit (e.g. for testing purposes or during development), run from the root directory:

pre-commit run --all-files
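The hooks can also be run selectively, which is useful when iterating on a single file. For example, to run only the black hook on one file (the file path is only an example):

# Run a single hook on selected files instead of the whole repository
pre-commit run black --files datafold/__init__.py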

Run tests

The unit tests are executed with the test framework pytest and coverage.py (both install with requirements-dev.txt).

To execute all unit tests locally run from the root directory:

coverage run -m pytest datafold/
coverage html -d coverage/

An HTML coverage report is then located in the folder coverage/. To test whether the tutorials run without raising an error, run:

pytest tutorials/
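During development it is often faster to execute only a subset of the tests. A small sketch using standard pytest and coverage.py options (the path and keyword are placeholders, not prescribed by datafold):

# Run only the tests of one subpackage, filter by a keyword and stop at the first failure
python -m pytest datafold/pcfold/ -k "kernel" -x

# Print a terminal summary (including missing lines) after a previous "coverage run"
coverage report -m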

All tests can also be executed remotely in a Continuous Integration (CI) setup. The pipeline runs with every push to the main repository. The CI configuration is located in the file .gitlab-ci.yml.

Compile and build documentation

The documentation page is built with Sphinx and various extensions (install with requirements-dev.txt). The source code is documented with numpydoc style.

Additional dependencies that are required to build the documentation but do not install with the development dependencies:

  • LaTeX to render equations,

  • mathjax to display equations in the browser,

  • graphviz to render class dependency graphs, and

  • pandoc to convert between formats (required by the nbsphinx Sphinx extension, which includes the Jupyter tutorials in the web page).

On a Debian-like platform, install the packages with

apt install libjs-mathjax fonts-mathjax dvipng pandoc graphviz

(This excludes the LaTeX installation; see the available texlive packages.)
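Which texlive packages are required depends on the LaTeX features used in the documentation; as a rough, non-authoritative starting point on Debian/Ubuntu:

# Illustrative LaTeX selection only; a smaller or larger set may be appropriate
apt install texlive-latex-base texlive-latex-extra texlive-fonts-recommended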

To build the documentation run:

python setup.py build_docs --outdir="./public"

The page entry is then located at ./public/index.html. To include the executed cells of the tutorials, add the flag --runtutorials.
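To inspect the result, open ./public/index.html directly in a browser or serve the folder locally, for example with Python’s built-in web server (the port number is arbitrary):

# Serve the built documentation at http://localhost:8000
python -m http.server --directory ./public 8000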