Contributing#
The maintainers of datafold and code contributors are listed here.
Getting in touch#
Any code contribution (bug fixes, tutorials, documentation changes) and feedback are very welcome. Please open a new issue; if you have no gitlab account, you can do so via
email (this opens a confidential issue).
Setting up datafold for development#
This section describes all steps to set up datafold for code development.
Note
Many tasks of setting up the development environment are also included in the Makefile. Run
make help
in the shell to view the available targets with a short description. On Linux, make is a standard tool and pre-installed.
Warning
The make targets are not fully tested on Windows. Please file an issue if you encounter problems.
On Windows, the recommended way is to use make in the git bash. For this you may install Chocolatey first (with administrator rights) and then use the choco software manager to install make with
choco install make
Chocolatey is also suitable to install non-Python dependencies required for building the datafold’s html documentation.
Note
The datafold repository also includes a Dockerfile which creates a Docker image suitable for development (e.g. it automatically installs all non-Python dependencies necessary to build the documentation). In Linux run
docker build -t datafold .
to create the Docker image (this possibly requires sudo rights). To start a new Docker container in an interactive session run
docker run -v `pwd`:/home/datafold-mount -w /home/datafold-mount/ -it --rm --net=host datafold bash
This mounts the datafold repository within the container (all data is shared between the host system and container). To install the dependencies within the container execute:
make install_devdeps
Quick set up#
The following bash script includes all steps that are detailed below.
# Clone repository (replace [NAMESPACE] with your fork or "datafold-dev")
git clone git@gitlab.com:[NAMESPACE]/datafold.git
cd ./datafold/
# Set up Python virtual environment
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
# Install package and development dependencies
python -m pip install -r requirements-dev.txt
# Install and run git hooks managed by pre-commit
python -m pre_commit run --all-files
# Run tests with coverage and pytest
python -m coverage run -m pytest datafold/
python -m coverage html -d coverage/
python -m coverage report
# Test if tutorials run without error
python -m pytest tutorials/
# Build documentation (writes to "pages/")
# Note that this requires additional third-party dependencies
python -m sphinx -M html doc/source/ pages/
datafold is not available from the conda package manager. If you run Python with Anaconda's package manager, the recommended way is to set up datafold in a conda environment by using pip.
Also note the official instructions for package management in Anaconda, particularly the subsection on how to install non-conda packages.
# Clone repository (replace [NAMESPACE] with your fork or "datafold-dev")
git clone git@gitlab.com:[NAMESPACE]/datafold.git
cd ./datafold/
# Create new conda environment with pip installed
conda create -n .venv
conda activate .venv
conda install pip # use pip from within the conda environment
# Install package and development dependencies
pip install -r requirements-dev.txt
# Install and run git hooks managed by pre-commit
python -m pre_commit run --all-files
# Run tests with coverage and pytest
python -m coverage run -m pytest datafold/
python -m coverage html -d coverage/
python -m coverage report
# Test if tutorials run without error
python -m pytest tutorials/
# Build documentation (writes to "pages/")
# Note that this requires additional third-party dependencies
python -m sphinx -M html doc/source/ pages/
Fork and create merge requests to datafold#
Please read and follow the steps of gitlab’s “Project forking workflow”.
Note
We set up a “Continuous Integration” (CI) pipeline. However, the worker (a gitlab-runner) of the datafold repository is not available for forked projects (for background information see here).
After you have created a fork you can clone the repository with:
git clone git@gitlab.com:[NAMESPACE]/datafold.git
(replace [NAMESPACE] accordingly)
Install development dependencies#
The file requirements-dev.txt in the root directory of the repository contains all development dependencies and is readable by pip.
The recommended (but optional) way is to install all dependencies into a virtual environment. This avoids conflicts with other installed packages.
# Create and activate new virtual environment
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
# Install package and extra dependencies
pip install -r requirements-dev.txt
To install the dependencies without a virtual environment, run only the last statement.
# Create new conda environment with pip installed
conda create -n .venv
conda activate .venv
conda install pip # use pip from within the conda environment
# Install package and extra dependencies
pip install -r requirements-dev.txt
Note
While the above procedure works, you may also want to follow the best practices from Anaconda more strictly. In particular, it is recommended to install the package dependencies listed in requirements-dev.txt separately with
conda install package_name
if the package is hosted on conda.
Install git pre-commit hooks#
The datafold source code and configuration files are automatically formatted and checked with
black for general code formatting
isort for sorting Python import statements alphabetically and in sections
nbstripout to remove potentially large binary formatted output cells in Jupyter notebooks before the content bloats the git history
mypy for static type checking (if applicable)
diverse hooks, such as removing trailing whitespaces, validating configuration files or sorting the requirement files
It is highly recommended to let these tools inspect and format the code before it is committed to the git history. The git hooks alter the source code in a deterministic way: each hook should only need to format the code once to obtain the desired format, and none of the tools should break the code.
Conveniently, all of this is managed via pre-commit (installs with requirements-dev.txt) and the configuration in .pre-commit-config.yaml. To install the git-hooks locally, run from the root directory:
python -m pre_commit install
The git-hooks then run automatically prior to each git commit. To format the current source code without a commit (e.g. for testing purposes or during development), run from the root directory:
python -m pre_commit run --all-files
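For orientation, a pre-commit configuration has roughly the following shape. The repository URLs and hook revisions below are illustrative placeholders, not the versions pinned in datafold's actual .pre-commit-config.yaml:

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0  # illustrative revision, not datafold's pinned version
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 5.10.1  # illustrative revision
    hooks:
      - id: isort
  - repo: https://github.com/kynan/nbstripout
    rev: 0.5.0  # illustrative revision
    hooks:
      - id: nbstripout
```

Each entry points pre-commit to a hook repository pinned at a fixed revision, which is what makes the formatting deterministic across all contributors.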
Run tests#
The unit tests are executed with the test suite pytest and coverage.py (both install with requirements-dev.txt).
To execute all unit tests locally run from the root directory:
python -m coverage run --branch -m pytest datafold/; \
python -m coverage html -d ./coverage/; \
python -m coverage report;
An HTML coverage report is then located in the folder coverage/. To test if the tutorials run without raising an error, run:
python -m pytest tutorials/;
All tests can also be executed remotely in a Continuous Integration (CI) setup. The pipeline runs with every push to the main datafold repository. The CI configuration is located in the .gitlab-ci.yml file.
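For background, a GitLab CI job for running the test suite can look roughly like this. The stage, job name and Docker image are illustrative, not datafold's actual .gitlab-ci.yml configuration:

```yaml
stages:
  - test

unittests:  # illustrative job name
  stage: test
  image: python:3.9  # illustrative image
  script:
    - pip install -r requirements-dev.txt
    - python -m coverage run -m pytest datafold/
    - python -m coverage report
```

With every push, the gitlab-runner executes the listed script commands in a fresh container and fails the pipeline if any test fails.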
Compile and build documentation#
The documentation page is built with Sphinx and various extensions (install with requirements-dev.txt). The source code is documented in numpydoc style.
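As an illustration, a numpydoc-style docstring looks like this. The function itself is a made-up example, not part of the datafold API:

```python
import numpy as np


def scale_points(points, factor=1.0):
    """Scale a point cloud by a constant factor.

    Parameters
    ----------
    points : numpy.ndarray
        Point coordinates of shape `(n_samples, n_features)`.
    factor : float, optional
        Scaling factor applied to all coordinates (default is 1.0).

    Returns
    -------
    numpy.ndarray
        The scaled points, same shape as `points`.
    """
    return np.asarray(points) * factor
```

Sphinx (via the numpydoc extension) parses the `Parameters` and `Returns` sections into formatted tables on the documentation page.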
Additional dependencies to build the documentation that do not install with the development dependencies:
LaTeX to render equations
mathjax to display equations in the browser
graphviz to render class dependency graphs
pandoc to convert between formats (required by nbsphinx, the Sphinx extension that includes the Jupyter tutorials in the web page)
On Linux, install the non-Python software with (preferably with sudo)
apt install libjs-mathjax fonts-mathjax dvipng pandoc graphviz texlive-base texlive-latex-extra
On Windows, install the non-Python software with (preferably with administrator rights in the bash)
choco install pandoc miktex graphviz
Alternatively, install the non-Python software with the make target (best with administrator rights)
make install_docdeps
To build the documentation run:
python setup.py build_docs --outdir="./public"
The page entry point is then located at ./public/index.html. To execute all cells in the tutorials (Jupyter notebooks), add the flag --runtutorials.