Overview

django_analyses provides a database-supported pipeline engine meant to facilitate research management.

A general schema for pipeline management is laid out as follows:

[Figure: _images/models.png]

Analyses

Each Analysis may be associated with a number of AnalysisVersion instances, and each of those must be provided with an interface, i.e. a Python class exposing a run() method that returns a dictionary of results.
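As a minimal sketch of what such an interface might look like, the class below exposes a run() method and returns a dictionary of results. The class name WordCount, its lowercase option, and the way inputs are passed to run() are all hypothetical; how django_analyses actually hands configuration and inputs to an interface is not specified here.

```python
class WordCount:
    """Hypothetical interface: counts the words in a text string."""

    def __init__(self, lowercase: bool = True):
        # Assumption: configuration is supplied at construction time.
        self.lowercase = lowercase

    def run(self, text: str) -> dict:
        # Normalize case if requested, then split on whitespace.
        words = text.lower().split() if self.lowercase else text.split()
        # The interface contract described above: return a dictionary of results.
        return {"n_words": len(words), "n_unique": len(set(words))}


results = WordCount().run("The quick brown fox jumps over the lazy dog")
print(results)
```

Each key of the returned dictionary would then correspond to one output of the analysis version.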

Input and Output Specifications

InputSpecification and OutputSpecification each aggregate a number of InputDefinition and OutputDefinition sub-classes (respectively).

Input and Output Definitions

Currently, there are seven types of built-in input definition and two kinds of supported output definition.

Each one of these InputDefinition and OutputDefinition sub-classes provides unique validation rules (default, minimal/maximal value or length, choices, etc.), and you can easily create more definitions to suit your own needs.
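To illustrate the kind of validation rules mentioned above (default, minimal/maximal value, choices), here is a plain-Python sketch that does not use Django or the real django_analyses classes. IntegerInputSketch and its field names are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IntegerInputSketch:
    """Hypothetical stand-in for an integer input definition."""

    key: str
    default: Optional[int] = None
    min_value: Optional[int] = None
    max_value: Optional[int] = None
    choices: Optional[Sequence[int]] = None

    def validate(self, value: Optional[int] = None) -> int:
        # Fall back to the default when no value is provided.
        if value is None:
            value = self.default
        if value is None:
            raise ValueError(f"{self.key}: a value is required")
        # Enforce the optional lower and upper bounds.
        if self.min_value is not None and value < self.min_value:
            raise ValueError(f"{self.key}: {value} < {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            raise ValueError(f"{self.key}: {value} > {self.max_value}")
        # Enforce the optional set of allowed values.
        if self.choices is not None and value not in self.choices:
            raise ValueError(f"{self.key}: {value} is not a valid choice")
        return value


iterations = IntegerInputSketch(key="iterations", default=10, min_value=1, max_value=100)
print(iterations.validate())    # uses the default
print(iterations.validate(50))  # within bounds
```

A custom definition type would follow the same pattern: declare the rules as fields and check them in a single validation step.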

Pipelines

Pipeline instances are used to reference a particular collection of Node and Pipe instances. Each Node defines a particular combination of analysis version and configuration, and each Pipe connects one node’s output definition to another’s input definition.
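The relationship between pipelines, nodes, and pipes can be sketched with plain dataclasses. This is only an illustration of the structure described above, not the actual Django models; the analysis names, configuration keys, and the "out_file"/"in_file" definition keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """A combination of analysis version and configuration."""

    analysis_version: str
    configuration: Tuple[Tuple[str, object], ...] = ()  # hashable (key, value) pairs


@dataclass(frozen=True)
class Pipe:
    """Connects one node's output definition to another's input definition."""

    source: Node
    source_output: str
    destination: Node
    destination_input: str


@dataclass
class Pipeline:
    """References a collection of pipes, and through them, nodes."""

    title: str
    pipes: List[Pipe] = field(default_factory=list)

    def nodes(self) -> List[Node]:
        # Collect each node once, in the order the pipes mention it.
        ordered: List[Node] = []
        for pipe in self.pipes:
            for node in (pipe.source, pipe.destination):
                if node not in ordered:
                    ordered.append(node)
        return ordered


extract = Node("brain_extraction v1.0", (("threshold", 0.5),))
register = Node("registration v2.1", (("dof", 12),))
pipeline = Pipeline("preprocessing", [Pipe(extract, "out_file", register, "in_file")])
print([node.analysis_version for node in pipeline.nodes()])
```

In this sketch, two nodes with the same analysis version and configuration compare equal, so the same node can be reused across several pipes.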

Runs

Run instances are used to keep a record of every time an analysis version is run with a distinct set of inputs and outputs. If we ever try to execute a run with identical parameters, the RunManager will simply return the existing run.
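The "return the existing run" behaviour can be sketched as a get-or-execute lookup. The class below is a hypothetical in-memory stand-in, not the real RunManager: it assumes inputs are JSON-serializable and keys each run on the analysis version plus its inputs only.

```python
import json


class RunRegistrySketch:
    """Hypothetical in-memory stand-in for run lookup and reuse."""

    def __init__(self):
        self._runs = {}
        self.executions = 0  # counts how many times an analysis actually ran

    def get_or_execute(self, analysis_version, interface, **inputs):
        # Sorting keys makes the lookup independent of argument order.
        key = (analysis_version, json.dumps(inputs, sort_keys=True))
        if key not in self._runs:
            self.executions += 1
            self._runs[key] = {"inputs": inputs, "outputs": interface(**inputs)}
        return self._runs[key]


def add(a, b):
    # Hypothetical analysis returning a dictionary of results.
    return {"total": a + b}


registry = RunRegistrySketch()
first = registry.get_or_execute("add v1", add, a=1, b=2)
second = registry.get_or_execute("add v1", add, b=2, a=1)
print(first is second, registry.executions)
```

The second call finds the existing record, so the analysis itself is executed only once.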