Contributing

The maintainers of datafold and code contributors are listed here.

Getting in touch

Code contributions (bug fixes, tutorials, documentation changes) and feedback are very welcome. Please open a new issue via

Setting up datafold for development

This section describes all steps to set up datafold for code development.

Note

Many of the steps for setting up the development environment are also available as targets in the Makefile. Run

make help

in the shell to view the available targets with a short description.

On Linux, make is a standard tool and is usually pre-installed.

Warning

The make targets are not fully tested for Windows. Please file an issue if you encounter problems.

On Windows, the recommended way is to use make in Git Bash. To do so, first install Chocolatey (with administrator rights) and then use its package manager, choco, to install make with

choco install make

Chocolatey is also suitable for installing the non-Python dependencies required to build datafold’s HTML documentation.
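The make help target lists the documented Makefile targets. A common way to implement such a target is to annotate each target with a "## description" comment and extract those comments with awk. The sketch below shows that convention; it is an assumption that datafold's Makefile documents its targets this way, and the function name list_make_targets is hypothetical.

```shell
# Hypothetical helper: print Makefile targets annotated with the common
# "target: ## description" convention (datafold's Makefile may differ).
list_make_targets() {
  local makefile="${1:-Makefile}"
  # FS splits a matching line at ": ... ## " into the target name ($1) and
  # the description ($2); recipe lines and unannotated targets are skipped.
  awk 'BEGIN { FS = ":.*## " }
       /^[a-zA-Z0-9_-]+:.*## / { printf "%-20s %s\n", $1, $2 }' "$makefile"
}
```

For example, list_make_targets Makefile prints one line per annotated target: the target name padded to 20 characters, followed by its description.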

Note

The datafold repository also includes a Dockerfile that creates a Docker image suitable for development (e.g. it automatically installs all non-Python dependencies necessary to build the documentation). On Linux, run

docker build -t datafold .

to create the Docker image (this may require sudo rights). To start a new Docker container in an interactive session, run

docker run -v "$(pwd)":/home/datafold-mount -w /home/datafold-mount/ -it --rm --net=host datafold bash

This mounts the datafold repository within the container (all data is shared between the host system and the container). To install the development dependencies within the container, execute:

make install_devdeps

Quick setup

The bash script below includes all steps that are detailed in the following sections.

# Clone repository (replace [NAMESPACE] with your fork or "datafold-dev")
git clone git@gitlab.com:[NAMESPACE]/datafold.git
cd ./datafold/

# Set up Python virtual environment
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip

# Install package and development dependencies
python -m pip install -r requirements-dev.txt

# Install git hooks managed by pre-commit and run them on all files
python -m pre_commit install
python -m pre_commit run --all-files

# Run tests with coverage and pytest
python -m coverage run -m pytest datafold/
python -m coverage html -d coverage/
python -m coverage report

# Test if tutorials run without error
python -m pytest tutorials/

# Build documentation (writes to "pages/html/")
# Note that this requires additional third-party dependencies
python -m sphinx -M html doc/source/ pages/

datafold is not available from the conda package manager. If you manage Python with Anaconda, the recommended way is to set up datafold in a conda environment using pip.

Also see the official Anaconda instructions for package management, particularly the subsection on installing non-conda packages.

# Clone repository (replace [NAMESPACE] with your fork or "datafold-dev")
git clone git@gitlab.com:[NAMESPACE]/datafold.git
cd ./datafold/

# Create new conda environment with pip installed
conda create -n .venv
conda activate .venv
conda install pip  # use pip from within the conda environment

# Install package and development dependencies
pip install -r requirements-dev.txt

# Install git hooks managed by pre-commit and run them on all files
python -m pre_commit install
python -m pre_commit run --all-files

# Run tests with coverage and pytest
python -m coverage run -m pytest datafold/
python -m coverage html -d coverage/
python -m coverage report

# Test if tutorials run without error
python -m pytest tutorials/

# Build documentation (writes to "pages/html/")
# Note that this requires additional third-party dependencies
python -m sphinx -M html doc/source/ pages/

Fork and create merge requests to datafold

Please read and follow the steps of GitLab’s “Project forking workflow”.

Note

The datafold repository uses a continuous integration (CI) pipeline. However, the repository’s worker (a GitLab Runner) is not available for forked projects (for background information see here).

After you have created a fork, you can clone the repository with:

git clone git@gitlab.com:[NAMESPACE]/datafold.git

(replace [NAMESPACE] accordingly)
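In the forking workflow it is common to register the main repository as an additional remote, so that the fork can be kept up to date. The following is a minimal sketch, assuming the remote name upstream (a convention, not a datafold requirement) and the datafold-dev namespace mentioned in the quick-setup script; the function name add_upstream_remote is hypothetical.

```shell
# Hypothetical helper: add the main datafold repository as remote "upstream".
# Run it inside your cloned fork; an existing "upstream" remote is left as is.
add_upstream_remote() {
  local url="${1:-git@gitlab.com:datafold-dev/datafold.git}"
  if git remote get-url upstream >/dev/null 2>&1; then
    echo "remote 'upstream' already exists: $(git remote get-url upstream)"
  else
    git remote add upstream "$url"
    echo "added remote 'upstream' -> $url"
  fi
}
```

Afterwards, git fetch upstream downloads new commits from the main repository, which you can then merge into your local branches.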

Install development dependencies

The file requirements-dev.txt in the root directory of the repository lists all development dependencies in a format that pip can read.

The recommended (but optional) way is to install all dependencies into a virtual environment. This avoids conflicts with other installed packages.

# Create and activate new virtual environment
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip

# Install package and extra dependencies
pip install -r requirements-dev.txt

To install the dependencies without a virtual environment, run only the last statement.

# Create new conda environment with pip installed
conda create -n .venv
conda activate .venv
conda install pip  # use pip from within the conda environment

# Install package and extra dependencies
pip install -r requirements-dev.txt

Note

While the above procedure works, you may want to follow Anaconda’s best practices more strictly. In particular, it is recommended to install the package dependencies listed in requirements-dev.txt separately with conda install package_name if the package is available from a conda channel.
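Before running pip install -r requirements-dev.txt, it can help to confirm that an environment is actually active. The sketch below relies on two environment variables: VIRTUAL_ENV, which the venv activate script sets, and CONDA_DEFAULT_ENV, which conda activate sets. The function name active_python_env is hypothetical.

```shell
# Hypothetical helper: report which Python environment is active.
# Returns a non-zero status if neither a venv nor a conda env is active.
active_python_env() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "venv: ${VIRTUAL_ENV}"
  elif [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    echo "conda: ${CONDA_DEFAULT_ENV}"
  else
    echo "no virtual or conda environment is active"
    return 1
  fi
}
```

Because of the return status, the check can guard the installation step, e.g. active_python_env && pip install -r requirements-dev.txt.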

Install git pre-commit hooks

The datafold source code and configuration files are automatically formatted and checked with:

  • black for general code formatting.

  • isort for sorting Python import statements alphabetically and in sections.

  • nbstripout for removing output cells, which can contain large binary data, from Jupyter notebooks before they bloat the git history.

  • mypy for static type checking (where applicable).

  • Various other hooks, such as removing trailing whitespace, validating configuration files, or sorting the requirements files.

It is highly recommended to let these tools inspect and format the code before it is committed to the git history. The git hooks alter the source code in a deterministic way, so a single run of each hook should produce the desired format, and none of the tools should break the code.
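The property that one run of a hook is enough can be checked by applying the hook twice and verifying that the second pass changes nothing. The sketch below uses a toy stand-in for a trailing-whitespace hook written with sed; it is not the implementation used by pre-commit, and the function names are hypothetical.

```shell
# Toy stand-in for a trailing-whitespace hook (NOT the pre-commit version):
# strip spaces and tabs at the end of every line of a file.
strip_trailing_ws() {
  sed 's/[[:space:]]*$//' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Apply the hook twice; succeed only if the second pass changes nothing,
# i.e. the first run already produced the final format.
check_idempotent() {
  local file="$1" first
  strip_trailing_ws "$file"
  first=$(cksum < "$file")
  strip_trailing_ws "$file"
  [ "$first" = "$(cksum < "$file")" ]
}
```

The sketch writes to a temporary file and moves it back instead of using sed -i, because the in-place flag behaves differently in GNU and BSD sed.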

Conveniently, all of this is managed with pre-commit (installed with requirements-dev.txt) and the configuration in .pre-commit-config.yaml.

To install the git hooks locally, run from the root directory:

python -m pre_commit install

The git hooks then run automatically before each git commit. To format the current source code without committing (e.g. for testing purposes or during development), run from the root directory:

python -m pre_commit run --all-files

Run tests

The unit tests are executed with the test framework pytest and with coverage.py (both are installed with requirements-dev.txt).

To execute all unit tests locally, run from the root directory:

python -m coverage run --branch -m pytest datafold/; \
python -m coverage html -d ./coverage/; \
python -m coverage report;

An HTML coverage report is then located in the folder coverage/. To test whether the tutorials run without raising an error, run:

python -m pytest tutorials/;

All tests can also be executed remotely in a Continuous Integration (CI) setup. The pipeline runs with every push to the main datafold repository. The CI configuration is located in the .gitlab-ci.yml file.

Compile and build documentation

The documentation page is built with Sphinx and various extensions (installed with requirements-dev.txt). The source code is documented in numpydoc style.

The following additional dependencies are required to build the documentation but are not installed with the development dependencies:

  • LaTeX to render equations,

  • MathJax to display equations in the browser,

  • graphviz to render class dependency graphs,

  • pandoc to convert between formats (required by the nbsphinx Sphinx extension, which includes the Jupyter tutorials in the web page).

On Debian-based Linux distributions, install the non-Python software with apt (preferably with sudo):

apt install libjs-mathjax fonts-mathjax dvipng pandoc graphviz texlive-base texlive-latex-extra

On Windows, install the non-Python software with Chocolatey (preferably with administrator rights in Git Bash):

choco install pandoc miktex graphviz

Alternatively, install the non-Python software with the Makefile target (best with administrator rights):

make install_docdeps
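To check that the non-Python tools for the documentation build are actually on the PATH, a small helper can look up the executables. This is a sketch under the assumption that the relevant executables are pandoc (pandoc), dot (graphviz), and latex and dvipng (LaTeX); MathJax is a JavaScript library without an executable, so it cannot be checked this way. The function name missing_tools is hypothetical.

```shell
# Hypothetical helper: print every given command that is not found on PATH.
# Returns 0 if all commands are available, 1 otherwise.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, missing_tools pandoc dot latex dvipng && echo "all documentation tools found" prints the confirmation only when every tool is available and otherwise lists the missing ones.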

To build the documentation run:

python setup.py build_docs --outdir="./public"

The entry page is then located at ./public/index.html. To also execute all cells in the tutorials (Jupyter notebooks), add the flag --runtutorials.