Docker Compose deployment

⚠️ Security note

This stack can be deployed in production only if the host is properly secured (firewall enabled, restricted exposed ports, strong credentials, TLS/reverse proxy when needed). Running it on an open host can lead to data leaks or insecure access to internal institutional directories (especially if LDAP is enabled).

🌍 Environment-specific Compose files (DEV / PROD)

The repository ships with additional Compose files to adapt the same modular stack to different environments:

  • DEV overlay (docker-compose.dev.yaml)
    • Docker image tags for components under active development are not pinned (typically :latest).
    • Environment variables that depend on the environment (e.g. APP_ENV, when present) are set to DEV.
  • PROD overlay (docker-compose.prod.yaml)
    • Docker image tags are pinned (explicit versions) to ensure reproducible deployments.
    • Environment variables that depend on the environment (e.g. APP_ENV, when present) are set to PROD.

Use the overlays by combining files with -f (the overlay only overrides what differs from the base file).
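As an illustration (the service name and tag below are hypothetical, not taken from the repository), an overlay only restates the keys it changes; everything else is inherited from the base file:

```yaml
# Sketch of a PROD overlay entry; "example-app" and the tag are illustrative.
services:
  example-app:
    image: registry.example.org/example-app:1.4.2  # pinned tag for reproducibility
    environment:
      APP_ENV: PROD
```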


πŸ—„οΈ Local PostgreSQL vs external PostgreSQL

Several CRISalid components can use either:

  • a local PostgreSQL container managed by Docker Compose
  • or an external PostgreSQL server already provided by the institution

This is handled with two mechanisms:

  • the base Compose files are external-DB compatible by default
  • the file docker/docker-compose.local-postgres.yaml re-adds the depends_on relationships required when using local PostgreSQL containers

Base principle

For each component:

  • the application service uses ${..._DB_HOST} and can therefore point either to a local container or to an external host
  • the local PostgreSQL service is placed in its own dedicated profile
  • local PostgreSQL startup dependencies are added only through docker/docker-compose.local-postgres.yaml
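Schematically (service and variable names below are assumptions based on the ${..._DB_HOST} convention, not copied from the repository), the application service reads its database host from the environment, and the overlay file re-adds the startup dependency:

```yaml
# Sketch only; names are illustrative.
services:
  harvester:
    environment:
      HARVESTER_DB_HOST: ${HARVESTER_DB_HOST}  # local container name or external hostname

# docker/docker-compose.local-postgres.yaml (sketch):
#   services:
#     harvester:
#       depends_on:
#         harvester-db:
#           condition: service_healthy
```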

Dedicated database profiles

The following profiles are used for local PostgreSQL containers:

  • harvester-db
  • sovisuplus-db
  • keycloak-db
  • cdb-db

When these profiles are not enabled, the corresponding application uses the database host configured in its .env file.
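For example, with the harvester-db profile disabled, the harvester's .env would point at the institutional server (the variable names follow the ${..._DB_HOST} convention and the hostname is illustrative):

```shell
# docker/harvester/.env (sketch; variable names assumed, hostname illustrative)
HARVESTER_DB_HOST=pg.example.org
HARVESTER_DB_PORT=5432
```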


πŸ” Choosing the Components

Before you start, review the components available in map/components.qmd and decide which ones you need.

The main Compose file (docker/docker-compose.yaml) is modular. It uses the include directive and profiles to enable only selected components.

Main docker-compose.yaml
name: crisalid

include:
  - path: ./neo4j/neo4j.yaml
    env_file: ./neo4j/.env
    project_directory: ./neo4j
  - path: ./apollo/apollo.yaml
    env_file: ./apollo/.env
    project_directory: ./apollo
  - path: ./crisalid-bus/crisalid-bus.yaml
    env_file: ./crisalid-bus/.env
    project_directory: ./crisalid-bus
  - path: ./harvester/harvester.yaml
    env_file: ./harvester/.env
    project_directory: ./harvester
  - path: ./ikg/ikg.yaml
    env_file: ./ikg/.env
    project_directory: ./ikg
  - path: ./cdb/cdb.yaml
    env_file: ./cdb/.env
    project_directory: ./cdb
  - path: ./sovisuplus/sovisuplus.yaml
    env_file: ./sovisuplus/.env
    project_directory: ./sovisuplus
  - path: ./sovisuplus/sovisuplus-maintenance.yaml
    env_file: ./sovisuplus/.env
    project_directory: ./sovisuplus
  - path: ./keycloak/keycloak.yaml
    env_file: ./keycloak/.env
    project_directory: ./keycloak
  - path: ./ofelia/ofelia.yaml
    env_file: ./ofelia/.env
    project_directory: ./ofelia
  - path: ./cvs/cvs.yaml
    env_file: ./cvs/.env
    project_directory: ./cvs

DEV example (with Docker-managed PostgreSQL)

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.dev.yaml \
  -f docker/docker-compose.local-postgres.yaml \
  --profile neo4j \
  --profile apollo \
  --profile crisalid-bus \
  --profile harvester \
  --profile harvester-db \
  --profile ikg \
  --profile cdb \
  --profile cdb-db \
  --profile keycloak \
  --profile keycloak-db \
  --profile sovisuplus \
  --profile sovisuplus-db \
  --profile ofelia \
  up -d

PROD example (with Docker-managed PostgreSQL)

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.prod.yaml \
  -f docker/docker-compose.local-postgres.yaml \
  --profile neo4j \
  --profile apollo \
  --profile crisalid-bus \
  --profile harvester \
  --profile harvester-db \
  --profile ikg \
  --profile cdb \
  --profile cdb-db \
  --profile keycloak \
  --profile keycloak-db \
  --profile sovisuplus \
  --profile sovisuplus-db \
  --profile ofelia \
  up -d

PROD example (with external PostgreSQL)

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.prod.yaml \
  --profile neo4j \
  --profile apollo \
  --profile crisalid-bus \
  --profile harvester \
  --profile ikg \
  --profile cdb \
  --profile keycloak \
  --profile sovisuplus \
  --profile ofelia \
  up -d

Configuration check

If you are unsure which services and profiles are effectively enabled after variable expansion and file overrides, run the config command with the same files and profiles (add any further --profile flags you want to check):

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.dev.yaml \
  -f docker/docker-compose.local-postgres.yaml \
  --profile neo4j \
  --profile apollo \
  config

🧰 Preparation Steps

1. 🧾 .env Files

Each directory under docker/ (e.g. apollo, crisalid-bus, ikg, neo4j, cdb, harvester, …) has its own .env.sample file.

  • Copy each .env.sample to .env, except in cdb (which is configured by the configure_cdb.sh script).
  • Fill in appropriate values (hostnames, ports, secrets, etc.).
  • The main docker/.env.sample contains values shared by multiple components (such as RabbitMQ or Neo4j credentials).

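The copy step can be scripted; this sketch skips cdb and never overwrites an existing .env (paths assumed to match the layout described above):

```shell
# Copy each .env.sample to .env, skipping cdb (handled by configure_cdb.sh)
for sample in docker/.env.sample docker/*/.env.sample; do
  [ -e "$sample" ] || continue                      # tolerate missing paths
  dir=$(dirname "$sample")
  [ "$(basename "$dir")" = "cdb" ] && continue      # cdb is configured by its script
  [ -e "$dir/.env" ] || cp "$sample" "$dir/.env"    # never overwrite an existing .env
done
```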
If you plan to connect the CRISalid Directory Bridge (cdb) to your institutional LDAP, make sure to set:

LDAP_HOST=
LDAP_BIND_DN=
LDAP_BIND_PASSWORD=

2. πŸ”§ Configure CRISalid Bus

This script reads the .env values and generates the RabbitMQ definitions.json file (exchanges, queues, admin user, etc.).

./docker/configure_crisalid_bus.sh

3. πŸ”§ Configure CRISalid Directory Bridge (CDB)

This script clones the DAGs, generates environment files, and runs the Airflow initialization.

You must specify the target environment:

./docker/configure_cdb.sh dev

or

./docker/configure_cdb.sh prod

4. πŸ” Configure Basic Authentication for SVP Harvester

SVP Harvester supports HTTP Basic authentication for both its web interface and REST API.

Authentication is enabled by default. You control this behavior via the HARVESTER_ENABLE_BASIC_AUTH environment variable in the harvester’s .env file.

Enable / disable authentication

In docker/harvester/.env:

HARVESTER_ENABLE_BASIC_AUTH=true
  • true (default): all /admin/* and /api/* endpoints are protected
  • false: authentication is disabled (all endpoints are public)

⚠️ This authentication mechanism is intended for local development and restricted environments. Always use HTTPS if enabling it in production.


Create the first user

User credentials are stored in a local file mounted into the container at:

app/auth/users.json

Before starting the container for the first time, ensure that this file exists and is initialized with an empty JSON object:

mkdir -p harvester/auth
echo '{}' > harvester/auth/users.json
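To sanity-check the file before starting the container (assuming python3 is available on the host), you can pretty-print it; the command exits non-zero if the file is not valid JSON:

```shell
# Initialize the credentials file and verify it parses as JSON
mkdir -p harvester/auth
echo '{}' > harvester/auth/users.json
python3 -m json.tool harvester/auth/users.json   # prints {}
```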

Once the harvester-ui container is running, create an initial user:

docker exec -it harvester-ui python scripts/add_basic_user.py admin

You will be prompted to enter and confirm a password.

The credentials take effect immediately; no container restart is required.


Remove a user

To remove an existing user:

docker exec -it harvester-ui python scripts/remove_basic_user.py admin

πŸ”„ Optional: Full reset of Airflow state

By default, the script does NOT wipe Airflow state (DAG history, users, variables, connections, etc.).

If you want to completely reset Airflow (including metadata database and volumes), use:

./docker/configure_cdb.sh dev --reset

You will be prompted to type RESET to confirm.

⚠️ This removes Docker volumes for the CDB profile and permanently deletes:

  • DAG execution history
  • Users and passwords
  • Variables and connections
  • XCom data

Use with caution.


ℹ️ In dev, Airflow GUI admin credentials are initialized from environment variables (default: admin:admin unless overridden in docker/.env).


After running the script, if you intend to use the CSV mode for structures and people (instead of LDAP), place your data files in:

docker/cdb/data/
β”œβ”€β”€ structure.csv
└── people.csv

Sample CSVs:

Full documentation (in French):


5. πŸ”‘ Configure Keycloak

Keycloak handles authentication within the system. Multiple client applications (such as SoVisu+) can share the same authentication realm. To set up Keycloak in this environment, follow these steps:

  1. Global .env Configuration

In the global .env file, you will find the shared Keycloak configuration variables, such as the realm name (KEYCLOAK_REALM) and the client secrets (SOVISUPLUS_KEYCLOAK_CLIENT_SECRET). The KEYCLOAK_REALM can be customized (e.g., crisalid-my-university) for readability.

Example:

KEYCLOAK_REALM=crisalid-inst
SOVISUPLUS_KEYCLOAK_CLIENT_SECRET=MY-SECRET-VALUE
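A strong client secret can be generated on the host, for example with openssl (assuming it is installed):

```shell
# Print a 64-character hex string suitable for SOVISUPLUS_KEYCLOAK_CLIENT_SECRET
openssl rand -hex 32
```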
  2. Keycloak Configuration Script

Run the ./configure_keycloak.sh script. This will create the required configuration file from the template (docker/keycloak/config/crisalid-inst.json.template).

  3. Customizing Keycloak .env Settings

The docker/keycloak/.env.sample file provides the environment settings for Keycloak, such as the admin credentials and database configurations. Copy the sample file to .env and modify the settings as needed.

KEYCLOAK_ADMIN=admin
KEYCLOAK_ADMIN_PASSWORD=admin
KEYCLOAK_DB_VENDOR=postgres
KEYCLOAK_DB_HOST=keycloak-db # Only if you are using the local PostgreSQL container for Keycloak. 
# If using an external database, set this to the appropriate hostname or IP address.
KEYCLOAK_DB_PORT=5432
KEYCLOAK_DB_NAME=keycloak
KEYCLOAK_DB_USER=keycloak
KEYCLOAK_DB_PASSWORD=keycloak
  4. Define specific hostnames in /etc/hosts

To ensure that SoVisu+ and Keycloak can be accessed correctly, you need to define specific hostnames in your /etc/hosts file. This is necessary because SoVisu+ uses OAuth2 with ORCID, which requires a specific hostname even for delivering “sandbox” keys.

# Add these lines to your /etc/hosts file
127.0.0.1 sovisuplus.local
127.0.0.1 keycloak.local

6. SoVisu+ custom themes

SoVisu+ allows you to customize its appearance using themes. You can create your own theme by following these steps:

  1. Copy the sample theme directory:
cp -r sovisuplus/theme-sample sovisuplus/theme
  2. Edit the theme files in sovisuplus/theme to customize text and images according to your institution’s branding.

7. SoVisu+ RBAC Roles File

SoVisu+ comes with a sample RBAC configuration you can customize. Start by copying the sample file and editing it:

cp sovisuplus/config/rbac.roles.sample.yaml sovisuplus/config/rbac.roles.yaml
  • Edit sovisuplus/config/rbac.roles.yaml to define your roles by grouping permissions (but don't create new permissions, i.e. new actions/subjects/fields, as the existing ones are referenced in the code).
  • After any change, the Docker startup script copies the file into the container and re-seeds the roles and permissions in the database.
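Purely as an illustrative sketch (the actual file structure and permission names come from rbac.roles.sample.yaml; the role, action, and subject values below are hypothetical), a role grouping existing permissions might look like:

```yaml
# Hypothetical structure; copy real permission entries from rbac.roles.sample.yaml.
roles:
  lab-manager:
    permissions:
      - action: read            # reuse an action already defined in the sample file
        subject: Publication
      - action: update
        subject: ResearcherProfile
```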

8. ⏰ Configure Ofelia (system-wide scheduler)

Ofelia is the scheduler of the whole CRISalid stack. It runs as its own container and periodically triggers tasks for other containers (by running a CLI command or calling an API endpoint).

Ofelia reads a config file and uses the Docker API to:

  • job-exec: run a command inside an existing container (via docker exec)
  • job-local: run a command directly inside the Ofelia container itself (typically, a curl command to call an API)
  • (and job-run: run a one-shot container if needed)

You can find full documentation here:

In this deployment, Ofelia is included as its own Docker Compose profile (ofelia) with:

  • a Compose file: docker/ofelia/ofelia.yaml
  • a default scheduler configuration: docker/ofelia/config.ini
  • an environment template: docker/ofelia/.env.sample

Create Ofelia .env

In docker/ofelia/, copy the sample file:

cp docker/ofelia/.env.sample docker/ofelia/.env

The sample content is:

CONFIG_FILE_NAME=config.ini

This variable is used only by Docker Compose to choose which config file to mount inside Ofelia. You can later duplicate config.ini under another name (for example config.dev.ini, config.prod.ini, etc.) and switch by changing CONFIG_FILE_NAME in .env without editing the Compose file:

CONFIG_FILE_NAME=config.dev.ini
# or:
# CONFIG_FILE_NAME=config.prod.ini

Define the scheduled jobs in config.ini

The default scheduler configuration file is:

docker/ofelia/config.ini

Example content:

; file: docker/ofelia/config.ini

[job-exec "ikg-fetch-pubs"]
schedule = @every 2m
container = crisalid-ikg
command = python -m app.cli people fetch-publication-random

This declares a job named ikg-fetch-pubs that:

  • runs every 2 minutes (@every 2m)
  • uses job-exec: it executes the command inside the running crisalid-ikg container
  • calls the internal CLI: python -m app.cli people fetch-publication-random
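By analogy, a job-local entry runs its command inside the Ofelia container itself, which is typically used for HTTP calls; the endpoint below is illustrative, not a real CRISalid API:

```ini
; Hypothetical job-local example (illustrative URL and port)
[job-local "ikg-health-ping"]
schedule = @every 10m
command = curl -fsS http://crisalid-ikg:8000/health
```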

Run Ofelia with Docker Compose

To start the scheduler along with the rest of the stack, include the ofelia profile, for example:

docker compose \
...
  --profile ofelia \
  up -d

πŸ”Œ Communication with Host Machine

If you want to connect external tools (on your host) to the containers, open the necessary ports.

For example, to expose RabbitMQ’s AMQP port on the host machine, edit docker/crisalid-bus/crisalid-bus.yaml and uncomment the second ports entry:

ports:
  - "${CRISALID_BUS_HTTP_PORT}:15672"
#  - "${CRISALID_BUS_AMQP_PORT}:5672"
expose:
  - "${CRISALID_BUS_AMQP_PORT}"

♻️ Resetting Containers

To stop and delete the containers and volumes for one profile, use the same Compose files and profiles that were used during up.

DEV example

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.dev.yaml \
  --profile cdb \
  down --volumes

PROD example

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.prod.yaml \
  --profile cdb \
  down --volumes

To also delete images:

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.prod.yaml \
  --profile cdb \
  down --volumes --rmi all

πŸ”Ž Removing named volumes manually (if needed)

If some volumes remain, you can remove them explicitly:

docker volume rm postgres-db-volume redis-db-volume data-versioning-redis-volume
docker volume rm keycloak_postgres_data
docker volume rm svp-db-volume
docker volume rm crisalid-bus-volume
docker volume rm neo4j-data-volume neo4j-logs-volume neo4j-import-volume neo4j-plugins-volume neo4j-backups-volume

πŸš€ Starting the Services

The stack is modular. Select the profiles you need and combine the base Compose file with the appropriate environment overlay.

πŸ”§ DEV

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.dev.yaml \
  --profile neo4j \
  --profile apollo \
  --profile crisalid-bus \
  --profile harvester \
  --profile ikg \
  --profile cdb \
  --profile keycloak \
  --profile sovisuplus \
  --profile ofelia \
  up -d

In DEV:

  • Image tags for components under development are not pinned (typically :latest)
  • Environment variables such as APP_ENV (when defined) are set to DEV

🏭 PROD

docker compose \
  -f docker/docker-compose.yaml \
  -f docker/docker-compose.prod.yaml \
  --profile neo4j \
  --profile apollo \
  --profile crisalid-bus \
  --profile harvester \
  --profile ikg \
  --profile cdb \
  --profile keycloak \
  --profile sovisuplus \
  --profile ofelia \
  up -d

In PROD:

  • Image tags are pinned to explicit versions
  • Environment variables such as APP_ENV (when defined) are set to PROD

You can add or remove profiles depending on the components required by the institution.

βœ… Next Steps

Once your services are up, follow the component-specific instructions in each section of the documentation. You can now:

🧭 Back to Development Index