Introduction

Eka CI is a Continuous Integration server purpose-built for Nix projects. It is designed to make reviewing Nix-based pull requests fast and trustworthy, especially for repositories that are too large to be reviewed by hand on every change.

The goal is to answer one question as quickly and as reliably as possible:

Should I merge this PR?

To do that, Eka CI focuses on the things that actually matter for a Nix repository:

  • Does evaluation still succeed?
  • Which packages were added, removed, newly succeed, or newly fail?
  • What is the closure-size and dependency impact of the change?
  • What does the rebuild blast radius look like across systems?

Manual review processes do not scale to repositories the size of Nixpkgs. Eka CI replaces that workflow with a small set of strong signals attached directly to each pull request.

What Eka CI provides

  • GitHub App integration — webhook-based event handling, check runs, merge queue support, and fine-grained credential management.
  • Nix-aware build orchestration — dependency graph tracking with an LRU cache, multi-tier build queues, dedicated FOD queue, remote builders, and requiredSystemFeatures support.
  • Binary cache integration — S3, Cachix, and Attic, with credential sources ranging from environment variables to Vault, AWS Secrets Manager, and systemd-creds.
  • Change summaries and rebuild impact — per-PR diffs of which packages changed and how many derivations have to rebuild, posted as a single GitHub check.
  • Build metrics — output (NAR) size and closure size tracked over time, compared against the base branch, with configurable thresholds.
  • PR comment commands — @eka-ci merge and friends for queueing merges from a comment.

Components

Eka CI is a Cargo workspace with two main binaries:

  • eka-ci-server — the long-running CI server that talks to GitHub and orchestrates builds.
  • ekaci — a CLI client that talks to the server over a Unix socket.

A web frontend (written in Elm) lives alongside these binaries but is only partially implemented; the HTTP API and WebSocket endpoints are the supported integration surface today.

How to read these docs

If you are setting Eka CI up for the first time, start with Quick Start and then work through Installation and GitHub App Setup.

If you are operating an existing deployment, the LRU Cache Tuning and Monitoring & Metrics pages are the most useful starting points.

For a deeper picture of how the server is built, see Architecture.

Quick Start

This page walks through the minimum set of steps required to get Eka CI watching a single repository. For deeper detail on each step, follow the linked pages.

Prerequisites

  • Nix package manager installed (with flakes enabled)
  • A GitHub organization with admin access
  • A publicly reachable HTTPS endpoint for receiving webhooks

1. Create a GitHub App

Eka CI authenticates to GitHub as a GitHub App. Create one at https://github.com/organizations/YOUR_ORG/settings/apps with:

  • Permissions: Checks (read/write), Contents (read), Pull Requests (read)
  • Events: pull_request, workflow_run, merge_group, installation

Generate and download the private key. The full walkthrough, including all eight credential sources, lives in GitHub App Setup.

2. Configure the server

Create ~/.config/ekaci/ekaci.toml:

[[github_apps]]
id = "main"
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }

[[caches]]
id = "production-s3"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
credentials = { env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] } }

[caches.permissions]
allow_all = false
allowed_repos = ["myorg/*"]
allowed_branches = ["main", "release/*"]

See Server Configuration and Configuring Caches for the full set of options.

3. Configure the repository

Add .eka-ci/config.json at the root of any repository you want Eka CI to build:

{
  "jobs": {
    "my-package": {
      "file": "default.nix",
      "allow_eval_failures": true,
      "caches": ["production-s3"]
    }
  },
  "checks": {
    "nixfmt": {
      "shell": "formatting",
      "command": "nixfmt --check **/*.nix",
      "allow_network": false
    }
  }
}

See Repository Configuration for the full schema.
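
Before committing, it can be worth checking that the file parses as valid JSON, e.g. with jq (assuming it is installed):

jq . .eka-ci/config.json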

4. Run the server

nix build
./result/bin/eka-ci-server

For a long-running deployment, run it under systemd. A minimal unit file is included in Installation.

5. Open a pull request

Once the server is running and the GitHub App is installed on a repository, opening a pull request will trigger:

  1. An evaluation of the repository against the PR head and base.
  2. A diff of derivations and a queued build for the changes.
  3. One or more check runs reporting build status.
  4. An EkaCI: Change Summary check posting a per-PR summary of changed packages and rebuild impact (see Change Summaries).

From there, reviewers can merge through the GitHub UI or by commenting @eka-ci merge — see PR Comment Commands.
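
While the PR is being processed you can follow along in the server logs; a quick sketch (the grep patterns are illustrative):

journalctl -u eka-ci -f | grep -Ei 'webhook|queued|check'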

Installation

Eka CI is distributed as a Nix flake. The build produces two binaries:

  • eka-ci-server — the CI server daemon.
  • ekaci — a CLI client that talks to the server over a Unix socket.

Build from source

git clone https://github.com/ekala-project/eka-ci.git
cd eka-ci
nix build
./result/bin/eka-ci-server --help

The flake exposes the standard packages.default attribute, so it can also be consumed from another flake:

{
  inputs.eka-ci.url = "github:ekala-project/eka-ci";

  outputs = { self, nixpkgs, eka-ci, ... }: {
    # ...
    nixosConfigurations.example = nixpkgs.lib.nixosSystem {
      modules = [
        ({ pkgs, ... }: {
          environment.systemPackages = [ eka-ci.packages.${pkgs.system}.default ];
        })
      ];
    };
  };
}

Run as a systemd service

A minimal unit file:

[Unit]
Description=eka-ci server
After=network.target

[Service]
Type=simple
ExecStart=/path/to/eka-ci-server
Restart=on-failure
User=eka-ci
Environment="RUST_LOG=info"
# Example: provide credentials for cache backends
Environment="VAULT_TOKEN=s.your-token"

[Install]
WantedBy=multi-user.target

For production deployments you will likely want to add systemd hardening (ProtectSystem=strict, PrivateTmp=true, NoNewPrivileges=true, etc.) and to load secrets via LoadCredential= and the systemd credential source documented in GitHub App Setup.
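
For example, a drop-in override can add hardening and credential loading without editing the unit file in place (a sketch; adjust paths to your deployment):

sudo systemctl edit eka-ci
# In the editor that opens, add:
#   [Service]
#   ProtectSystem=strict
#   PrivateTmp=true
#   NoNewPrivileges=true
#   LoadCredential=github-app-credentials:/etc/eka-ci/github-app.json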

Required state directories

By default the server stores state under paths that can be overridden in ekaci.toml:

  • SQLite database: ~/.local/share/ekaci/sqlite.db (setting: db_path)
  • Build logs: ~/.local/share/ekaci/logs (setting: logs_dir)
  • Unix socket: $XDG_RUNTIME_DIR/ekaci.sock (setting: socket_path)

For a multi-user system service you typically want these under /var/lib/ekaci and /var/log/ekaci. See Server Configuration.
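
A sketch of that layout (paths and ownership are illustrative; the corresponding settings are documented under Server Configuration):

sudo mkdir -p /var/lib/ekaci /var/log/ekaci
sudo chown -R eka-ci:eka-ci /var/lib/ekaci /var/log/ekaci
# Then point the server at them in ekaci.toml:
#   db_path = "/var/lib/ekaci/sqlite.db"
#   logs_dir = "/var/log/ekaci"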

Verify the install

Once the server is running you can ping it via the CLI:

ekaci status

And confirm the metrics endpoint is reachable:

curl http://127.0.0.1:3030/metrics | head

If both succeed, continue to GitHub App Setup.

GitHub App Setup and Configuration Guide

Complete guide for creating, configuring, and securing GitHub Apps for eka-ci.

Introduction

What is a GitHub App?

A GitHub App is a first-class integration with GitHub that provides:

  • Fine-grained permissions
  • Webhook-based event delivery
  • Organization-wide installation
  • Higher API rate limits
  • Better security than personal access tokens

Why GitHub Apps?

eka-ci uses GitHub Apps because they offer:

  • Fine-grained permissions - request only the access you need
  • Organization-wide installation - one setup covers all repositories
  • Better security - credentials can't be used to access user data
  • Webhook integration - automatic notifications for CI events
  • Rate limit advantages - higher API rate limits (5,000 vs 1,000 requests/hour)

Prerequisites

Before you begin:

  1. Administrative access to the GitHub organization where you want to install eka-ci
  2. A running eka-ci server with a publicly accessible URL (for webhooks)
  3. Access to secure credential storage (Vault, AWS Secrets Manager, or similar for production)

Part 1: Creating the GitHub App

Step 1: Navigate to GitHub App Settings

  1. Go to your organization's settings page:

    https://github.com/organizations/YOUR_ORG/settings/apps
    

    Or for personal accounts:

    https://github.com/settings/apps
    
  2. Click "New GitHub App"

Step 2: Configure Basic Information

Fill in the basic app information:

  • GitHub App name: eka-ci (or your preferred name) — must be unique across GitHub
  • Homepage URL: https://your-eka-ci-server.com — your eka-ci server's public URL
  • Description: Continuous Integration for Nix projects — optional but recommended
  • Callback URL: leave empty — not used by eka-ci
  • Setup URL: leave empty — not used by eka-ci

Step 3: Configure Webhook Settings

This is critical for eka-ci to receive events:

  • Webhook URL: https://your-eka-ci-server.com/github/webhook — must be publicly accessible
  • Webhook secret: generate a strong secret — IMPORTANT: save this securely!

Generating a webhook secret:

# Generate a random secret
openssl rand -hex 32

# Example output:
# 3f8a9c7b2e1d6f4a8b9c7e2d1f6a4b9c8e7d2f1a6b4c9e8d7f2a1b6c4e9d8f7

⚠️ Security Note: The webhook secret verifies that webhook payloads come from GitHub. While eka-ci currently doesn't verify this signature (pending implementation), you should still configure it for future use.

Webhook settings:

  • Content type: application/json
  • SSL verification: ✅ Enable SSL verification (required for production)
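
Once the server is running, you can sanity-check that the endpoint is reachable from the outside. GET requests are rejected by design, so a 405 response is the expected result (see Troubleshooting):

curl -i https://your-eka-ci-server.com/github/webhook
# Expect: HTTP/1.1 405 Method Not Allowed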

Step 4: Configure Permissions

eka-ci requires these Repository permissions:

  • Checks: Read & Write — create and update CI check runs on PRs
  • Contents: Read only — clone repositories and read source code
  • Pull requests: Read only — receive PR events and read PR metadata
  • Metadata: Read only — default permission (automatically included)

Do NOT grant:

  • Write access to Contents, Pull Requests, or Issues (not needed)
  • Any Organization permissions
  • Any Account permissions

Step 5: Subscribe to Events

Enable these webhook events:

  • Pull request - Triggers builds on PR open, update, close
  • Pull request review - Re-checks auto-merge eligibility when maintainers approve or dismiss reviews
  • Workflow run - For approval workflow integration
  • Merge group - For GitHub merge queue support
  • Installation - Tracks when app is installed/uninstalled
  • Installation repositories - Tracks repository access changes

Do NOT enable:

  • Push events (eka-ci is PR-focused)
  • Issue events (not used)
  • Other events (creates unnecessary webhook traffic)

Step 6: Installation Scope

Choose "Only on this account" unless you plan to distribute eka-ci as a public service.

Step 7: Create the App

  1. Review your settings
  2. Click "Create GitHub App"
  3. You'll be redirected to your app's settings page

Part 2: Obtaining Credentials

After creating the app, you need two pieces of information:

App ID

  1. On your GitHub App's settings page, find "App ID" near the top
  2. It's a numeric value like 123456
  3. Save this - you'll need it for eka-ci configuration

Private Key

  1. Scroll down to the "Private keys" section
  2. Click "Generate a private key"
  3. A .pem file will download automatically
  4. CRITICAL: Store this file securely - it cannot be recovered if lost!

The downloaded file looks like:

your-app-name.YYYY-MM-DD.private-key.pem

Contents:

-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA1234567890abcdefghijklmnopqrstuvwxyz...
...multiple lines of base64-encoded key data...
-----END RSA PRIVATE KEY-----

Part 3: Securing Your Credentials

⚠️ NEVER commit credentials to version control!

eka-ci supports 8 different methods for securing GitHub App credentials. Choose based on your environment:

Development: Environment Variables

Pros: Simple, quick setup
Cons: Not suitable for production, credentials in memory

export GITHUB_APP_ID=123456
export GITHUB_APP_PRIVATE_KEY="$(cat your-app-name.private-key.pem)"
./eka-ci-server

eka-ci configuration (~/.config/ekaci/ekaci.toml):

# No configuration needed - automatic fallback to environment variables
# OR explicitly:
[[github_apps]]
id = "dev-app"
credentials = { env = { vars = ["GITHUB_APP_ID", "GITHUB_APP_PRIVATE_KEY"] } }

⚠️ Not recommended for production! Use one of the secure methods below.

Production Option 1: File-Based with Restricted Permissions

Pros: Simple, no external dependencies
Cons: Credentials on disk, manual rotation

  1. Create a secure directory:
sudo mkdir -p /etc/eka-ci
sudo chmod 700 /etc/eka-ci
  2. Create a credentials file (JSON format):
sudo tee /etc/eka-ci/github-app.json <<EOF
{
  "GITHUB_APP_ID": "123456",
  "GITHUB_APP_PRIVATE_KEY": "$(cat your-app-name.private-key.pem | sed 's/$/\\n/' | tr -d '\n')"
}
EOF

sudo chmod 600 /etc/eka-ci/github-app.json
sudo chown eka-ci:eka-ci /etc/eka-ci/github-app.json
  3. Configure eka-ci:
[[github_apps]]
id = "production"
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }

Supported file formats:

JSON:

{
  "GITHUB_APP_ID": "123456",
  "GITHUB_APP_PRIVATE_KEY": "-----BEGIN RSA PRIVATE KEY-----\n..."
}

Key=value:

GITHUB_APP_ID=123456
GITHUB_APP_PRIVATE_KEY=-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----

Production Option 2: GitHub App Key File

Pros: Keeps private key in original PEM format
Cons: Credentials on disk

  1. Store the private key securely:
sudo cp your-app-name.private-key.pem /etc/eka-ci/github-app-key.pem
sudo chmod 600 /etc/eka-ci/github-app-key.pem
sudo chown eka-ci:eka-ci /etc/eka-ci/github-app-key.pem
  2. Configure eka-ci:
[[github_apps]]
id = "production"
credentials = { github-app-key-file = {
    app_id_env = "GITHUB_APP_ID",
    key_file = "/etc/eka-ci/github-app-key.pem"
}}
  3. Set the App ID:
export GITHUB_APP_ID=123456

Production Option 3: HashiCorp Vault

Pros: Best security, audit logs, automatic rotation
Cons: Requires Vault infrastructure

  1. Store credentials in Vault:
# First, format the private key for JSON (escape newlines)
PRIVATE_KEY=$(cat your-app-name.private-key.pem | sed 's/$/\\n/' | tr -d '\n')

# Store in Vault
vault kv put secret/eka-ci/github-app \
  GITHUB_APP_ID="123456" \
  GITHUB_APP_PRIVATE_KEY="${PRIVATE_KEY}"
  2. Configure eka-ci:
[[github_apps]]
id = "production"

[github_apps.credentials.vault]
address = "https://vault.example.com:8200"
secret_path = "eka-ci/github-app"
token_env = "VAULT_TOKEN"
namespace = "production"  # Optional, for Vault Enterprise
  3. Run eka-ci with Vault token:
export VAULT_TOKEN=s.your-vault-token
./eka-ci-server

Production Option 4: AWS Secrets Manager

Pros: Managed service, integrates with IAM
Cons: AWS-specific, costs money

  1. Create secret in AWS Secrets Manager:
# Format the private key
PRIVATE_KEY=$(cat your-app-name.private-key.pem | sed 's/$/\\n/' | tr -d '\n')

# Create secret
aws secretsmanager create-secret \
  --name eka-ci/github-app \
  --description "eka-ci GitHub App credentials" \
  --secret-string "{\"GITHUB_APP_ID\":\"123456\",\"GITHUB_APP_PRIVATE_KEY\":\"${PRIVATE_KEY}\"}"
  2. Configure eka-ci:
[[github_apps]]
id = "production"

[github_apps.credentials.aws-secrets-manager]
secret_name = "eka-ci/github-app"
region = "us-east-1"  # Optional, defaults to AWS_REGION env var
  3. Ensure eka-ci has IAM permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:eka-ci/github-app-*"
    }
  ]
}

Production Option 5: systemd Credentials (Linux with TPM2)

Pros: Hardware-encrypted, no external dependencies
Cons: Requires systemd 250+, TPM2 chip

  1. Create credential file:
# Format credentials as JSON
cat > /tmp/github-app.json <<EOF
{
  "GITHUB_APP_ID": "123456",
  "GITHUB_APP_PRIVATE_KEY": "$(cat your-app-name.private-key.pem | sed 's/$/\\n/' | tr -d '\n')"
}
EOF
  2. Encrypt with systemd:
# Encrypt using TPM2
sudo systemd-creds encrypt \
  --name=github-app-credentials \
  /tmp/github-app.json \
  /var/lib/systemd/credential/github-app.cred

# Clean up plaintext
shred -u /tmp/github-app.json
  3. Configure systemd service:
[Service]
LoadCredential=github-app-credentials:/var/lib/systemd/credential/github-app.cred
  4. Configure eka-ci:
[[github_apps]]
id = "production"
credentials = { systemd-credential = { name = "github-app-credentials" } }

Production Option 6: Instance Metadata (Cloud VMs)

Pros: No credentials on disk, automatic rotation
Cons: Requires IAM role setup, cloud-specific

For EC2/GCP/Azure instances with IAM roles that can access AWS Secrets Manager:

[[github_apps]]
id = "cloud-production"
credentials = "instance-metadata"

The instance profile must have permissions to access Secrets Manager (see Option 4).
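
To confirm the role is wired up, you can try reading the secret from the instance itself (a sketch using the AWS CLI; the secret name follows Option 4):

aws secretsmanager get-secret-value \
  --secret-id eka-ci/github-app \
  --query SecretString --output text | head -c 40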

Production Option 7: AWS Profile

Pros: Uses existing AWS credentials
Cons: AWS-specific

Use credentials from ~/.aws/credentials:

[[github_apps]]
id = "aws-profile-app"
credentials = { aws-profile = { profile = "eka-ci-production" } }

~/.aws/credentials:

[eka-ci-production]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Part 4: Installing the GitHub App

Step 1: Install on Your Organization

  1. Go to your GitHub App's settings page

  2. Click "Install App" in the left sidebar

  3. Select your organization

  4. Choose repository access:

    • "All repositories" - eka-ci will build all repos (recommended)
    • "Only select repositories" - Choose specific repos
  5. Click "Install"

Step 2: Verify Installation

eka-ci automatically tracks installations via webhooks. Check the logs:

# Look for installation confirmation
journalctl -u eka-ci -f | grep -i "installation"

# Expected output:
# INFO eka_ci_server::github::webhook: Received installation event: created
# INFO eka_ci_server::db: Stored GitHub installation: id=12345678

Part 5: Configuring eka-ci Server

Basic Configuration

Create or edit ~/.config/ekaci/ekaci.toml:

# GitHub App configuration
[[github_apps]]
id = "main"

# Choose ONE credential source from Part 3
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }

# Permissions (optional, defaults to allow all)
[github_apps.permissions]
allow_all = true

All Credential Source Options

Quick reference for all available credential sources:

# 1. Environment Variables
[[github_apps]]
id = "env-based"
credentials = { env = { vars = ["GITHUB_APP_ID", "GITHUB_APP_PRIVATE_KEY"] } }

# 2. File (JSON or key=value)
[[github_apps]]
id = "file-based"
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }

# 3. GitHub App Key File
[[github_apps]]
id = "key-file"
credentials = { github-app-key-file = {
    app_id_env = "GITHUB_APP_ID",
    key_file = "/etc/eka-ci/github-app-key.pem"
}}

# 4. HashiCorp Vault
[[github_apps]]
id = "vault"
[github_apps.credentials.vault]
address = "https://vault.example.com:8200"
secret_path = "eka-ci/github-app"
token_env = "VAULT_TOKEN"
namespace = "production"  # Optional

# 5. AWS Secrets Manager
[[github_apps]]
id = "aws-sm"
[github_apps.credentials.aws-secrets-manager]
secret_name = "eka-ci/github-app"
region = "us-east-1"  # Optional

# 6. systemd Credentials
[[github_apps]]
id = "systemd"
credentials = { systemd-credential = { name = "github-app-credentials" } }

# 7. Instance Metadata
[[github_apps]]
id = "imds"
credentials = "instance-metadata"

# 8. AWS Profile
[[github_apps]]
id = "aws-profile"
credentials = { aws-profile = { profile = "eka-ci-production" } }

Permission Controls

GitHub App permissions allow you to restrict which repositories and branches can use specific GitHub App credentials.

Allow All (Default)

[[github_apps]]
id = "unrestricted-app"
credentials = { /* ... */ }

[github_apps.permissions]
allow_all = true

Repository Restrictions

Only specific repositories can use this GitHub App:

[[github_apps]]
id = "restricted-app"
credentials = { /* ... */ }

[github_apps.permissions]
allow_all = false
allowed_repos = [
    "myorg/repo1",
    "myorg/repo2",
    "anotherorg/special-repo"
]

Branch Restrictions

Restrict to specific branches or branch patterns:

[[github_apps]]
id = "production-only-app"
credentials = { /* ... */ }

[github_apps.permissions]
allow_all = false
allowed_repos = ["myorg/*"]
allowed_branches = [
    "main",
    "master",
    "release/*",
    "hotfix/*"
]

Glob pattern support:

  • "main" - exact match
  • "release/*" - prefix match (e.g., release/v1.0, release/v2.0)
  • "*/staging" - suffix match
  • "*" - match all

Complete Configuration Examples

Development Setup

Simple environment variable-based setup:

# ~/.config/ekaci/ekaci.toml

# No github_apps section needed - falls back to environment variables
# Set: GITHUB_APP_ID and GITHUB_APP_PRIVATE_KEY

Or explicitly:

[[github_apps]]
id = "dev"
credentials = { env = { vars = ["GITHUB_APP_ID", "GITHUB_APP_PRIVATE_KEY"] } }

[github_apps.permissions]
allow_all = true

Production with Vault

Multi-environment setup with Vault:

# Production app - restricted to production repos
[[github_apps]]
id = "production"

[github_apps.credentials.vault]
address = "https://vault.prod.example.com:8200"
secret_path = "eka-ci/github-app-prod"
token_env = "VAULT_TOKEN"
namespace = "production"

[github_apps.permissions]
allow_all = false
allowed_repos = ["company/production-*"]
allowed_branches = ["main", "release/*"]

# Staging app - restricted to staging repos
[[github_apps]]
id = "staging"

[github_apps.credentials.vault]
address = "https://vault.staging.example.com:8200"
secret_path = "eka-ci/github-app-staging"
token_env = "VAULT_TOKEN"
namespace = "staging"

[github_apps.permissions]
allow_all = false
allowed_repos = ["company/staging-*"]
allowed_branches = ["main", "develop", "feature/*"]

AWS-Based Production

Using AWS Secrets Manager with instance metadata:

[[github_apps]]
id = "aws-production"

[github_apps.credentials.aws-secrets-manager]
secret_name = "prod/eka-ci/github-app"
region = "us-east-1"

[github_apps.permissions]
allow_all = false
allowed_repos = ["mycompany/*"]
allowed_branches = ["main", "release/*"]

Part 6: Testing the Setup

Test 1: Check Server Startup

Start eka-ci and verify GitHub App registration:

./eka-ci-server

# Expected log output:
# INFO eka_ci_server::github: Registering GitHub App from configuration: main
# INFO eka_ci_server::github: Successfully registered as GitHub app

Test 2: Create a Test Pull Request

  1. Create a test branch in one of your repositories
  2. Make a trivial change and open a PR
  3. Check that eka-ci creates a check run on the PR

What to look for:

  • A check run appears on the PR (usually named "eka-ci")
  • Initial status is "Queued" or "In Progress"
  • Check the eka-ci logs for webhook receipt:
    journalctl -u eka-ci -f | grep webhook
    
    # Expected:
    # INFO eka_ci_server::github::webhook: Received pull_request event: opened
    # INFO eka_ci_server::scheduler: Queued build for PR #123
    

Test 3: Verify Webhook Delivery

On GitHub:

  1. Go to your GitHub App's settings
  2. Click "Advanced" tab
  3. Scroll to "Recent Deliveries"
  4. Verify webhooks are being delivered successfully (green checkmarks)
  5. If you see red X's, click to view the error details

Security Best Practices

Protect Your Private Key

  • DO: Store in a secret manager (Vault, AWS Secrets Manager)
  • DO: Use file permissions 600 (owner read/write only)
  • DO: Encrypt with TPM2 (systemd credentials)
  • DON'T: Commit to Git
  • DON'T: Store in Docker images
  • DON'T: Share via chat/email
  • DON'T: Log to files or stdout

Rotate Credentials Regularly

How to rotate the private key:

  1. Generate a new private key on GitHub App settings
  2. Update the key in your secret manager
  3. Restart eka-ci to load the new key
  4. Delete the old key from GitHub
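
A sketch of this rotation flow, assuming the key is stored in Vault as in Production Option 3 (paths and names are illustrative):

# 1. Re-encode the freshly downloaded key for JSON storage
PRIVATE_KEY=$(sed 's/$/\\n/' new-key.pem | tr -d '\n')

# 2. Update the secret in place
vault kv put secret/eka-ci/github-app \
  GITHUB_APP_ID="123456" \
  GITHUB_APP_PRIVATE_KEY="${PRIVATE_KEY}"

# 3. Restart the server so it re-reads credentials
sudo systemctl restart eka-ci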

Recommended rotation schedule:

  • Production: Every 90 days
  • Staging: Every 180 days
  • Development: Yearly

Use Webhook Secrets

Configure a webhook secret and verify signatures in your eka-ci deployment.

⚠️ Current Status: Webhook signature verification is planned but not yet implemented in eka-ci. You should still configure a webhook secret for future use.

To implement verification (for contributors):

// Sketch: verify the HMAC-SHA256 webhook signature (not yet implemented).
// GitHub sends "X-Hub-Signature-256: sha256=<hex HMAC of the raw body>".
// hex_encode, hmac_sha256, and constant_time_eq are illustrative helpers.
let signature = headers.get("X-Hub-Signature-256"); // e.g. "sha256=3f8a..."
let payload = request.body();
let expected = format!("sha256={}", hex_encode(hmac_sha256(webhook_secret, payload)));
// Compare in constant time to avoid timing side channels.
if !constant_time_eq(signature.as_bytes(), expected.as_bytes()) {
    return Err(WebhookError::InvalidSignature);
}

Principle of Least Privilege

Only grant the minimum required permissions:

  • ✅ Checks: Read & Write (required)
  • ✅ Contents: Read only (required)
  • ✅ Pull Requests: Read only (required)
  • Never grant write access to Contents, PRs, or Issues
  • Never grant Organization or Account permissions

Monitor and Audit

Monitor webhook delivery:

# Check for webhook failures
journalctl -u eka-ci | grep -i "webhook.*error"

# Monitor installation changes
journalctl -u eka-ci | grep -i "installation"

Audit credential access:

  • Enable Vault audit logging
  • Enable AWS CloudTrail for Secrets Manager
  • Review systemd journal for credential loads

Additional security practices:

  • Never commit credentials to version control
  • Use .gitignore for credential files
  • Always use secret management systems in production
  • Create separate GitHub Apps for different environments
  • Use permission restrictions to limit blast radius
  • Enable audit logging in Vault/AWS
  • Monitor who accesses GitHub App credentials
  • Review permission configurations regularly
  • Use instance metadata in cloud deployments to avoid storing long-lived credentials

Network Security

Webhook endpoint security:

  • ✅ Use HTTPS (required for production)
  • ✅ Use a valid SSL certificate
  • ✅ Configure firewall to allow GitHub IPs only (optional)
  • ✅ Use webhook secrets when implemented

GitHub IP ranges (for firewall rules):

# Download GitHub's IP ranges
curl https://api.github.com/meta | jq -r '.hooks[]'

# Example firewall rule (iptables)
iptables -A INPUT -p tcp --dport 443 -s 192.30.252.0/22 -j ACCEPT

Secure the Server

Server hardening checklist:

  • ✅ Run eka-ci as non-root user
  • ✅ Use systemd sandboxing features
  • ✅ Enable SELinux or AppArmor
  • ✅ Keep dependencies updated
  • ✅ Enable automatic security updates
  • ✅ Monitor logs for suspicious activity

Troubleshooting

GitHub App Registration Fails

Error: failed to locate $GITHUB_APP_ID

Cause: Credentials not properly configured

Solution:

  1. Check your configuration file syntax
  2. Verify the credentials file exists and has correct permissions
  3. For Vault/AWS, verify connectivity and permissions
  4. Check eka-ci logs for detailed error messages

If using the configuration file, ensure you have:

[[github_apps]]
id = "..."
credentials = { /* valid credential source */ }

Webhooks Not Received

Symptoms: PRs don't trigger builds

Debugging steps:

  1. Verify webhook URL is correct:

    curl https://your-eka-ci-server.com/github/webhook
    # Should return 405 Method Not Allowed (GET not supported)
    
  2. Check GitHub webhook deliveries:

    • Go to GitHub App settings → Advanced → Recent Deliveries
    • Look for failed deliveries (red X)
    • Click to see error details
  3. Common webhook errors:

    • SSL certificate error: Fix your SSL cert or disable verification (dev only)
    • Timeout: Server is slow or down
    • Connection refused: Firewall blocking GitHub IPs
  4. Check eka-ci logs:

    journalctl -u eka-ci -n 100 | grep webhook
    

Permission Denied Errors

Error: Repository myorg/myrepo is not allowed to use GitHub App production

Solution: Update permissions in config:

[github_apps.permissions]
allow_all = false
allowed_repos = ["myorg/myrepo", "myorg/*"]

Or if the issue is GitHub App permissions:

Cause: GitHub App doesn't have required permissions

Solution:

  1. Go to GitHub App settings → Permissions
  2. Verify:
    • Checks: Read & Write
    • Contents: Read
    • Pull Requests: Read
  3. If you changed permissions, you must reinstall the app:
    • Go to Installations
    • Click "Configure"
    • Accept new permissions

Check Runs Not Appearing

Symptoms: Webhook received but no check run created

Debugging:

  1. Check logs for errors:

    journalctl -u eka-ci -f | grep -E "(check|error)"
    
  2. Verify repository is configured:

    • Repository must have .eka-ci/config.json
    • Configuration must define jobs
  3. Check GitHub API rate limits:

    # The eka-ci server logs should show rate limit status
    journalctl -u eka-ci | grep "rate limit"
    

Private Key Format Errors

Error: "invalid value for $GITHUB_APP_PRIVATE_KEY"

Cause: Private key not properly formatted for storage

Solution:

For JSON files, escape newlines:

# Convert PEM to JSON-safe format
PRIVATE_KEY=$(cat key.pem | sed 's/$/\\n/' | tr -d '\n')
echo "{\"GITHUB_APP_PRIVATE_KEY\":\"${PRIVATE_KEY}\"}"

For environment variables, preserve newlines:

# Use actual newlines
export GITHUB_APP_PRIVATE_KEY="$(cat key.pem)"

Vault Connection Fails

Error: Failed to read secret from Vault path

Checklist:

  • Verify Vault address is correct
  • Check VAULT_TOKEN environment variable is set
  • Verify secret path exists: vault kv get secret/eka-ci/github-app
  • Check Vault namespace if using enterprise Vault
  • Ensure Vault token has read permissions

AWS Secrets Manager Fails

Error: Failed to retrieve secret from AWS Secrets Manager

Checklist:

  • Verify AWS credentials are configured (env vars or instance profile)
  • Check secret name is correct
  • Verify region setting
  • Ensure IAM permissions include secretsmanager:GetSecretValue
  • Check secret is in JSON format with correct keys

Multiple GitHub Apps Not Supported

Current Status: eka-ci uses the first configured GitHub App for all repositories.

Workaround: Use permission restrictions to limit apps to specific repos:

[[github_apps]]
id = "prod"
credentials = { /* ... */ }

[github_apps.permissions]
allowed_repos = ["myorg/prod-*"]

[[github_apps]]
id = "dev"
credentials = { /* ... */ }

[github_apps.permissions]
allowed_repos = ["myorg/dev-*"]

⚠️ Note: Only the first app will be used currently. Multi-app support is planned.


Advanced Topics

Approval Workflow Integration

eka-ci supports requiring approval before running builds (to prevent malicious PRs from external contributors):

  1. Enable approval requirement:

    ./eka-ci-server --require-approval
    
  2. Or in systemd:

    [Service]
    Environment="EKA_CI_REQUIRE_APPROVAL=true"
    
  3. Approve users via the web UI or API

Merge Queue Support

eka-ci supports GitHub's merge queue feature:

  1. Enable merge queue on your repository (Settings → General → Merge queue)
  2. eka-ci will automatically receive merge_group events
  3. Builds will run for merge queue entries

OAuth Integration (Optional)

eka-ci also supports OAuth for web UI authentication:

[oauth]
client_id = "your-oauth-app-client-id"
client_secret = "your-oauth-app-client-secret"
redirect_url = "https://your-eka-ci-server.com/github/auth/callback"

Note: This is separate from the GitHub App and is optional.

Migration from Environment Variables

From Environment Variables to Vault

Before:

export GITHUB_APP_ID=123456
export GITHUB_APP_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----..."
./eka-ci-server

After:

  1. Store credentials in Vault:
vault kv put secret/eka-ci/github-app \
  GITHUB_APP_ID="123456" \
  GITHUB_APP_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----..."
  2. Update configuration:
[[github_apps]]
id = "main"

[github_apps.credentials.vault]
address = "https://vault.example.com:8200"
secret_path = "eka-ci/github-app"
token_env = "VAULT_TOKEN"
  3. Start server with Vault token:
export VAULT_TOKEN=your-vault-token
./eka-ci-server

API Reference

CredentialSource Enum

All available credential source variants:

pub enum CredentialSource {
    // Environment variables
    Env { vars: Vec<String> },

    // File-based (JSON or key=value)
    File { path: PathBuf },

    // AWS profile from ~/.aws/credentials
    AwsProfile { profile: String },

    // Cachix token (for backward compatibility)
    CachixToken { env_var: String },

    // HashiCorp Vault
    Vault {
        address: String,
        secret_path: String,
        token_env: String,
        namespace: Option<String>,
    },

    // AWS Secrets Manager
    AwsSecretsManager {
        secret_name: String,
        region: Option<String>,
    },

    // systemd credentials
    SystemdCredential { name: String },

    // Instance metadata service
    InstanceMetadata,

    // GitHub App key file
    GitHubAppKeyFile {
        app_id_env: String,
        key_file: PathBuf,
    },

    // No authentication
    None,
}

GitHubAppConfig Structure

pub struct GitHubAppConfig {
    pub id: String,
    pub credentials: CredentialSource,
    pub permissions: GitHubAppPermissions,
}

pub struct GitHubAppPermissions {
    pub allow_all: bool,
    pub allowed_repos: Vec<String>,
    pub allowed_branches: Vec<String>,
}

FAQ

Q: Can I use the same GitHub App for multiple eka-ci servers? A: Not recommended. Each eka-ci instance should have its own GitHub App to avoid conflicts.

Q: What happens if my private key is compromised? A: Immediately revoke it on GitHub App settings and generate a new one. Update your secret manager and restart eka-ci.

Q: Can I use a GitHub Personal Access Token instead? A: No, eka-ci requires a GitHub App. Personal access tokens don't support the required webhooks and permissions model.

Q: Do I need a separate GitHub App for each repository? A: No, one GitHub App can be installed on multiple repositories in the same organization.

Q: How do I migrate from environment variables to Vault? A: See Migration from Environment Variables.

Q: Is webhook signature verification implemented? A: Not yet. It's mentioned in the architecture but not implemented. You should still configure a webhook secret for future use.

Q: Which credential source should I use for production? A: HashiCorp Vault is recommended for best security. AWS Secrets Manager is good for AWS deployments. systemd credentials are excellent for single-server deployments with TPM2.

Q: Can I configure multiple GitHub Apps? A: Yes, you can configure multiple apps in the config file, but currently only the first one will be used. Use permission controls to route different repos to different apps (with the caveat that only the first app is active).



Contributing

If you encounter issues with this setup process:

  1. Check existing issues: https://github.com/ekala-project/eka-ci/issues
  2. Report bugs with detailed logs and configuration (redact secrets!)
  3. Contribute improvements to this documentation

Last Updated: 2024-04-10

NixOS Module

The eka-ci flake provides a NixOS module at nixosModules.daemon that exposes the service under services.eka-ci. The module uses the RFC-42 "settings" pattern: most configuration is freeform TOML that gets serialized to ekaci.toml, with common fields typed for validation and auto-generated documentation.

Quick Start

{
  inputs.eka-ci.url = "github:ekala-project/eka-ci";

  outputs = { self, nixpkgs, eka-ci, ... }: {
    nixosConfigurations.example = nixpkgs.lib.nixosSystem {
      modules = [
        eka-ci.nixosModules.daemon
        {
          services.eka-ci = {
            enable = true;
            environmentFile = "/run/secrets/eka-ci.env";
            settings = {
              github_apps = [{
                id = "main";
                credentials.systemd-credential.name = "github-app-key";
              }];
              security.allow_insecure_webhooks = false;
            };
          };
        }
      ];
    };
  };
}

The service runs as a systemd DynamicUser by default, stores state under /var/lib/eka-ci, and listens on 127.0.0.1:3030.
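
After switching to the new configuration, a quick sanity check (the /metrics endpoint is the same one described under Installation):

systemctl status eka-ci
curl -s http://127.0.0.1:3030/metrics | head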

Top-Level Options

services.eka-ci.enable

Type: boolean Default: false

Enable the EkaCI server.

services.eka-ci.package

Type: package Default: pkgs.eka-ci

Package providing the eka_ci_server binary.

services.eka-ci.user / services.eka-ci.group

Type: string Default: "eka-ci"

User and group the service runs as when dynamicUser = false. Ignored when dynamicUser = true.

services.eka-ci.dynamicUser

Type: boolean Default: true

Use systemd's DynamicUser= to run the service under an ephemeral user/group. Recommended unless you need a stable UID for filesystem permissions on shared storage.

services.eka-ci.openFirewall

Type: boolean Default: false

Open settings.web.port in the system firewall.

services.eka-ci.environmentFile

Type: null or path Default: null

Path to a file passed to systemd as EnvironmentFile=. Use this to provide secrets such as WEBHOOK_SECRET, GITHUB_OAUTH_CLIENT_SECRET, JWT_SECRET, VAULT_TOKEN, GITEA_TOKEN, GITLAB_TOKEN, AWS keys, and any environment variables referenced from settings.caches.*.credentials.env.vars.

The file is read by systemd at start time and never enters the Nix store.

Example (/run/secrets/eka-ci.env):

WEBHOOK_SECRET=your-webhook-secret
GITHUB_OAUTH_CLIENT_SECRET=...
JWT_SECRET=...
VAULT_TOKEN=s.abc123...
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
# For Gitea integration (single instance)
GITEA_TOKEN=your-gitea-token
GITEA_DOMAIN=gitea.example.com
# For GitLab integration (single instance)
GITLAB_TOKEN=glpat-xxxxxxxxxxxxxxxx
GITLAB_DOMAIN=gitlab.com

services.eka-ci.credentials

Type: attribute set of path Default: {}

Map of credential name to file path, wired through systemd's LoadCredential=. Each entry becomes available inside the unit at $CREDENTIALS_DIRECTORY/<name> and can be referenced from ekaci.toml via the systemd-credential credential source.

Example:

services.eka-ci = {
  credentials.github-app-key = "/run/secrets/github-app.json";
  settings.github_apps = [{
    id = "main";
    credentials.systemd-credential.name = "github-app-key";
  }];
};

Pairs naturally with sops-nix, agenix, or systemd-creds.

services.eka-ci.extraEnvironment

Type: attribute set of string Default: {}

Additional Environment= entries passed to the systemd unit.

Example:

services.eka-ci.extraEnvironment = {
  RUST_LOG = "eka_ci_server::scheduler=debug,info";
};

Settings Options

The services.eka-ci.settings submodule is freeform: any key not explicitly listed below is still accepted and serialized to ekaci.toml as-is. Typed options provide validation and documentation for the most common fields.

settings.db_path

Type: null or path Default: null

SQLite database path. When null the server falls back to $XDG_DATA_HOME/ekaci/sqlite.db, which under this module resolves to /var/lib/eka-ci/ekaci/sqlite.db.

settings.logs_dir

Type: null or path Default: null

Directory where build logs are stored. When null the server falls back to $XDG_DATA_HOME/ekaci/build-logs.

settings.require_approval

Type: boolean Default: false

Require maintainer approval before building PRs from external contributors.

settings.merge_queue_require_approval

Type: boolean Default: false

Require approval before building entries pulled from the GitHub merge queue.

settings.build_no_output_timeout_seconds

Type: integer between 30 and 86400 Default: 1200

Number of seconds with no build output after which a build is considered hung.

settings.build_max_duration_seconds

Type: integer between 60 and 604800 Default: 14400

Hard upper bound, in seconds, on total build wall-clock time.

settings.graph_lru_capacity

Type: positive integer Default: 100000

Capacity of the in-memory derivation-graph LRU cache, in nodes. See LRU Cache Tuning for sizing guidance.

settings.default_merge_method

Type: one of "merge", "squash", "rebase" Default: "squash"

Default merge method used by the @eka-ci merge PR comment command.

settings.web

Type: submodule

HTTP server settings.

  • web.address (string, default "127.0.0.1"): IPv4 address the HTTP server binds to.
  • web.port (port, default 3030): TCP port the HTTP server binds to.
  • web.bundle_path (null or path, default null): Optional path to a pre-built web UI bundle.
  • web.allowed_origins (list of string, default []): CORS allow-list. Each entry must be a fully-qualified http:// or https:// origin with no path, query, fragment, or * wildcard. An empty list rejects all cross-origin requests.

settings.unix

Type: submodule

Unix domain socket settings used by the CLI client.

  • unix.socket_path (null or path, default null): Unix domain socket the CLI client connects to. When null the server falls back to $XDG_RUNTIME_DIR/ekaci.socket, which under this module resolves to /run/eka-ci/ekaci.socket.

settings.oauth

Type: submodule

OAuth settings for the (optional) web UI.

  • oauth.client_id (null or string, default null): GitHub OAuth client ID. May also be supplied via the GITHUB_OAUTH_CLIENT_ID environment variable (preferred — see environmentFile).
  • oauth.client_secret (null or string, default null): GitHub OAuth client secret. Avoid setting this in Nix — values here end up in the world-readable Nix store. Use environmentFile to supply GITHUB_OAUTH_CLIENT_SECRET instead.
  • oauth.redirect_url (null or string, default null): OAuth callback URL. Defaults to http://{web.address}:{web.port}/github/auth/callback when unset.
  • oauth.jwt_secret (null or string, default null): JWT signing secret. Avoid setting this in Nix. Provide JWT_SECRET via environmentFile. When omitted entirely, the server generates an ephemeral 256-bit secret on each start (sessions invalidate across restarts).

settings.security

Type: submodule

Security-related settings.

  • security.max_hook_timeout_seconds (integer between 1 and 86400, default 300): Maximum wall-clock time, in seconds, that any post-build hook is allowed to run.
  • security.audit_hooks (boolean, default true): Emit structured audit log records every time a hook runs.
  • security.webhook_secret (null or string, default null): Webhook HMAC secret used for all platforms (GitHub, GitLab, and Gitea). Avoid setting this in Nix. Provide WEBHOOK_SECRET via environmentFile. The server refuses to start if no webhook secret is available unless allow_insecure_webhooks is true.
  • security.allow_insecure_webhooks (boolean, default false): Allow the server to start without a webhook secret. Intended for local development only; never enable in production.
  • security.allow_private_cache_hosts (boolean, default false): Allow cache destinations whose DNS resolves to private/loopback addresses. Disables built-in SSRF protection; only enable in trusted, isolated networks.
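
For example, to generate a webhook secret and supply it via environmentFile (a sketch; the variable name follows the settings above):

openssl rand -hex 32
# Then add to your environment file, e.g. /run/secrets/eka-ci.env:
#   WEBHOOK_SECRET=<generated value>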

settings.caches

Type: list of submodule Default: []

List of binary caches the server may push to.

Each cache entry has the following fields:

  • id (string, required): Cache identifier referenced from .eka-ci/config.json.
  • cache_type (one of "nix-copy", "cachix", "attic", required): Backend type for this cache.
  • destination (string, required): Destination URL passed to the chosen backend. Validated for SSRF unless settings.security.allow_private_cache_hosts is set.
  • credentials (freeform, required): Credential source. See Credential Sources below.
  • permissions (submodule, default allows all): Repository/branch access control.
    • allow_all (boolean, default true): When true, ignores allowed_repos and allowed_branches and grants access to every repository and branch.
    • allowed_repos (list of string, default []): Glob patterns of owner/repo strings that are permitted to use this entry.
    • allowed_branches (list of string, default []): Glob patterns of branch names permitted to use this entry.

Example:

settings.caches = [{
  id = "production-s3";
  cache_type = "nix-copy";
  destination = "s3://my-bucket/nix-cache?region=us-east-1";
  credentials.env.vars = [ "AWS_ACCESS_KEY_ID" "AWS_SECRET_ACCESS_KEY" ];
  permissions = {
    allow_all = false;
    allowed_repos = [ "myorg/*" ];
    allowed_branches = [ "main" "release/*" ];
  };
}];

settings.github_apps

Type: list of submodule Default: []

List of GitHub Apps the server authenticates as.

Each GitHub App entry has the following fields:

  • id (string, required): GitHub App identifier.
  • credentials (freeform, required): Credential source. See Credential Sources below.
  • permissions (submodule, default allows all): Same structure as settings.caches.*.permissions.

Example:

settings.github_apps = [{
  id = "main";
  credentials.file.path = "/run/secrets/github-app.json";
  permissions = {
    allow_all = false;
    allowed_repos = [ "myorg/*" ];
  };
}];

settings.gitea_instances

Type: list of submodule Default: []

List of Gitea instances the server integrates with. Each instance requires a domain and access token. Supports both Gitea.com and self-hosted instances.

Each Gitea instance entry has the following fields:

  • domain (string, required): Gitea instance domain (without protocol), e.g., "gitea.example.com".
  • token (null or string, default null): Gitea access token. Avoid setting this in Nix — use environmentFile to supply GITEA_TOKEN instead (for single instance setups).

Example:

settings.gitea_instances = [
  {
    domain = "gitea.example.com";
    token = null;  # Provided via environmentFile
  }
  {
    domain = "code.company.net";
    token = null;  # Provided via environmentFile
  }
];

For single-instance setups, you can use environment variables:

# In environmentFile
GITEA_TOKEN=your-gitea-access-token
GITEA_DOMAIN=gitea.example.com

settings.gitlab_instances

Type: list of submodule Default: []

List of GitLab instances the server integrates with. Each instance requires a domain and project access token. Supports both GitLab.com and self-hosted instances.

Each GitLab instance entry has the following fields:

  • domain (string, required): GitLab instance domain (without protocol), e.g., "gitlab.com" or "gitlab.example.com".
  • token (null or string, default null): GitLab project access token (starts with glpat-). Avoid setting this in Nix — use environmentFile to supply GITLAB_TOKEN instead (for single instance setups).

Example:

settings.gitlab_instances = [
  {
    domain = "gitlab.com";
    token = null;  # Provided via environmentFile
  }
  {
    domain = "gitlab.enterprise.com";
    token = null;  # Provided via environmentFile
  }
];

For single-instance setups, you can use environment variables:

# In environmentFile
GITLAB_TOKEN=glpat-xxxxxxxxxxxxxxxx
GITLAB_DOMAIN=gitlab.com

Credential Sources

Both settings.caches.*.credentials and settings.github_apps.*.credentials accept one of ten credential source variants. The field is freeform (not exhaustively typed) so all variants serialize correctly. Choose the one that matches your secret-management setup:

1. Environment variables

credentials.env.vars = [ "AWS_ACCESS_KEY_ID" "AWS_SECRET_ACCESS_KEY" ];

The server reads the listed environment variables at runtime. Provide them via environmentFile.

2. File

credentials.file.path = "/etc/eka-ci/creds.json";

The server reads a JSON or KEY=VALUE file at the given path.

3. AWS profile

credentials.aws-profile.profile = "production";

The server reads credentials from ~/.aws/credentials using the named profile.

4. Cachix token

credentials.cachix-token.env_var = "CACHIX_AUTH_TOKEN";

The server reads a Cachix auth token from the named environment variable.

5. HashiCorp Vault

credentials.vault = {
  address = "https://vault.example.com:8200";
  secret_path = "secret/data/eka-ci/s3-cache";
  token_env = "VAULT_TOKEN";  # optional, defaults to "VAULT_TOKEN"
  namespace = "production";    # optional
};

The server authenticates to Vault using the token from token_env and reads the secret at secret_path.

6. AWS Secrets Manager

credentials.aws-secrets-manager = {
  secret_name = "eka-ci/s3-credentials";
  region = "us-east-1";  # optional, falls back to AWS_REGION env var
};

The server uses AWS SDK credential resolution (environment, instance metadata, profiles) to authenticate to AWS Secrets Manager and reads the named secret.

7. systemd credential

credentials.systemd-credential.name = "github-app-key";

The server reads the credential from $CREDENTIALS_DIRECTORY/<name>. Pair this with the top-level services.eka-ci.credentials option:

services.eka-ci.credentials.github-app-key = "/run/secrets/github-app.json";

8. Instance metadata

credentials = "instance-metadata";

The server retrieves credentials from EC2/GCP/Azure instance metadata. No configuration needed.

9. GitHub App key file

credentials.github-app-key-file = {
  app_id_env = "GITHUB_APP_ID";
  key_file = "/etc/eka-ci/github-app.pem";
};

The server reads the GitHub App ID from the named environment variable and the PEM-encoded private key from the file.

10. None

credentials = "none";

No authentication. Only valid for public caches.

Systemd Hardening

The module applies aggressive systemd hardening by default:

  • DynamicUser = true (ephemeral user/group)
  • ProtectSystem = "strict" (read-only /usr, /boot, /efi)
  • ProtectHome = true (no access to /home, /root)
  • PrivateTmp = true (isolated /tmp)
  • PrivateDevices = true (empty /dev)
  • NoNewPrivileges = true (no privilege escalation)
  • ProtectKernelModules/Tunables/Logs = true
  • ProtectControlGroups/Clock/Hostname = true
  • RestrictNamespaces/Realtime/SUIDSGID = true
  • RestrictAddressFamilies = [ "AF_UNIX" "AF_INET" "AF_INET6" ]
  • LockPersonality = true
  • MemoryDenyWriteExecute = true
  • SystemCallArchitectures = "native"
  • SystemCallFilter = [ "@system-service" "~@privileged" "~@resources" ]
  • Empty CapabilityBoundingSet and AmbientCapabilities
  • UMask = "0077"

If you need to relax any of these, override systemd.services.eka-ci.serviceConfig in your configuration.
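
You can inspect the effective sandboxing of the running unit with systemd's own tooling:

# Show selected hardening settings in effect for the unit
systemctl show eka-ci | grep -E 'DynamicUser|ProtectSystem|SystemCallFilter'
# Summarize the unit's overall exposure
systemd-analyze security eka-ci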

Complete Example

{ config, ... }:
{
  services.eka-ci = {
    enable = true;
    openFirewall = false;  # Behind a reverse proxy

    environmentFile = config.sops.secrets.eka-ci-env.path;

    credentials = {
      github-app-key = config.sops.secrets.github-app-json.path;
      s3-creds       = config.sops.secrets.s3-json.path;
    };

    extraEnvironment.RUST_LOG = "info";

    settings = {
      web = {
        address = "127.0.0.1";
        port = 3030;
        allowed_origins = [ "https://ci.example.com" ];
      };

      graph_lru_capacity = 200000;  # Large repo
      default_merge_method = "squash";

      security = {
        audit_hooks = true;
        allow_insecure_webhooks = false;
      };

      github_apps = [{
        id = "main";
        credentials.systemd-credential.name = "github-app-key";
        permissions = {
          allow_all = false;
          allowed_repos = [ "myorg/*" ];
        };
      }];

      gitea_instances = [{
        domain = "gitea.example.com";
        token = null;  # Provided via environmentFile
      }];

      gitlab_instances = [{
        domain = "gitlab.com";
        token = null;  # Provided via environmentFile
      }];

      caches = [
        {
          id = "s3-production";
          cache_type = "nix-copy";
          destination = "s3://my-bucket/nix-cache?region=us-east-1";
          credentials.systemd-credential.name = "s3-creds";
          permissions = {
            allow_all = false;
            allowed_repos = [ "myorg/production-*" ];
            allowed_branches = [ "main" ];
          };
        }
        {
          id = "cachix-public";
          cache_type = "cachix";
          destination = "myorg";
          credentials.cachix-token.env_var = "CACHIX_AUTH_TOKEN";
        }
      ];
    };
  };

  # Reverse proxy
  services.nginx.virtualHosts."ci.example.com" = {
    enableACME = true;
    forceSSL = true;
    locations."/" = {
      proxyPass = "http://127.0.0.1:3030";
      proxyWebsockets = true;
    };
  };
}

NixOS Module Reference

This page is auto-generated from the NixOS module options schema. For a user-friendly guide, see NixOS Module.

services.eka-ci.enable

Whether to enable EkaCI, a Nix-aware Continuous Integration server.

Type: boolean

Default: false

Example: true

services.eka-ci.package

The eka-ci package to use.

Type: package

Default: pkgs.eka-ci

services.eka-ci.credentials

Map of credential name to file path, wired through systemd’s LoadCredential=. Each entry becomes available inside the unit at $CREDENTIALS_DIRECTORY/<name> and can be referenced from ekaci.toml via the systemd-credential credential source, e.g.

services.eka-ci.settings.github_apps = [
  {
    id = "main";
    credentials.systemd-credential.name = "github-app-key";
  }
];

Type: attribute set of absolute path

Default: { }

Example:

{
  github-app-key = "/run/secrets/github-app.json";
  s3-creds       = "/run/secrets/s3.json";
}

services.eka-ci.dynamicUser

Use systemd’s DynamicUser= to run the service under an ephemeral user/group. Recommended unless you need a stable UID for filesystem permissions on shared storage.

Type: boolean

Default: true

services.eka-ci.environmentFile

Path to a file passed to systemd as EnvironmentFile=. Use this to provide secrets such as GITHUB_WEBHOOK_SECRET, GITHUB_OAUTH_CLIENT_SECRET, JWT_SECRET, VAULT_TOKEN, AWS keys, and any environment variables referenced from settings.caches.*.credentials.env.vars. The file is read by systemd at start time and never enters the Nix store.

Type: null or absolute path

Default: null

Example: "/run/secrets/eka-ci.env"

services.eka-ci.extraEnvironment

Additional Environment= entries passed to the systemd unit.

Type: attribute set of string

Default: { }

Example:

{
  RUST_LOG = "eka_ci_server=debug,info";
}

services.eka-ci.group

Group the service runs as when dynamicUser is false. Ignored when dynamicUser = true.

Type: string

Default: "eka-ci"

services.eka-ci.openFirewall

Open settings.web.port in the system firewall.

Type: boolean

Default: false

services.eka-ci.settings

Configuration for EkaCI, serialised verbatim to ekaci.toml. The submodule is freeform: any key not explicitly modelled here is still accepted and forwarded as-is to the TOML output.

Type: open submodule of (TOML value)

Default: { }

services.eka-ci.settings.build_max_duration_seconds

Hard upper bound, in seconds, on total build wall-clock time.

Type: integer between 60 and 604800 (both inclusive)

Default: 14400

services.eka-ci.settings.build_no_output_timeout_seconds

Number of seconds with no build output after which a build is considered hung.

Type: integer between 30 and 86400 (both inclusive)

Default: 1200

services.eka-ci.settings.caches

List of binary caches the server may push to.

Type: list of (open submodule of (TOML value))

Default: [ ]

services.eka-ci.settings.caches.*.cache_type

Backend type for this cache.

Type: one of “nix-copy”, “cachix”, “attic”

Example: "nix-copy"

services.eka-ci.settings.caches.*.credentials

Credential source. One of:

  • { env = { vars = [ ... ]; }; }
  • { file = { path = "/etc/..."; }; }
  • { aws-profile = { profile = "..."; }; }
  • { cachix-token = { env_var = "..."; }; }
  • { vault = { address; secret_path; token_env ? "VAULT_TOKEN"; namespace ? null; }; }
  • { aws-secrets-manager = { secret_name; region ? null; }; }
  • { systemd-credential = { name = "..."; }; }
  • "instance-metadata"
  • { github-app-key-file = { app_id_env; key_file; }; }
  • "none"

Prefer systemd-credential paired with the top-level services.eka-ci.credentials option to keep secrets out of the world-readable Nix store.

Type: TOML value

Example:

{
  env = {
    vars = [
      "AWS_ACCESS_KEY_ID"
      "AWS_SECRET_ACCESS_KEY"
    ];
  };
}

services.eka-ci.settings.caches.*.destination

Destination URL passed to the chosen backend. Validated for SSRF unless settings.security.allow_private_cache_hosts is set.

Type: string

Example: "s3://my-bucket/nix-cache?region=us-east-1"

services.eka-ci.settings.caches.*.id

Cache identifier referenced from .eka-ci/config.json.

Type: string

Example: "production-s3"

services.eka-ci.settings.caches.*.permissions

Repository/branch access control for this cache.

Type: submodule

Default: { }

services.eka-ci.settings.caches.*.permissions.allow_all

When true, ignores allowed_repos and allowed_branches and grants access to every repository and branch.

Type: boolean

Default: true

services.eka-ci.settings.caches.*.permissions.allowed_branches

Glob patterns of branch names permitted to use this entry. Ignored when allow_all is true.

Type: list of string

Default: [ ]

Example:

[
  "main"
  "release/*"
]

services.eka-ci.settings.caches.*.permissions.allowed_repos

Glob patterns of owner/repo strings that are permitted to use this entry. Ignored when allow_all is true.

Type: list of string

Default: [ ]

Example:

[
  "myorg/*"
]

services.eka-ci.settings.db_path

SQLite database path. When null the server falls back to $XDG_DATA_HOME/ekaci/sqlite.db which, under this module, resolves to /var/lib/eka-ci/ekaci/sqlite.db.

Type: null or absolute path

Default: null

services.eka-ci.settings.default_merge_method

Default merge method used by the @eka-ci merge PR comment command.

Type: one of “merge”, “squash”, “rebase”

Default: "squash"

services.eka-ci.settings.gitea_instances

List of Gitea instances the server integrates with. Each instance requires a domain and access token. Supports both Gitea.com and self-hosted instances.

Type: list of (open submodule of (TOML value))

Default: [ ]

services.eka-ci.settings.gitea_instances.*.domain

Gitea instance domain (without protocol).

Type: string

Example: "gitea.example.com"

services.eka-ci.settings.gitea_instances.*.token

Gitea access token. Avoid setting this in Nix — values here end up in the world-readable Nix store. Use services.eka-ci.environmentFile to supply GITEA_TOKEN instead (for a single instance), or configure tokens via systemd credentials.

Type: null or string

Default: null

services.eka-ci.settings.github_apps

List of GitHub Apps the server authenticates as.

Type: list of (open submodule of (TOML value))

Default: [ ]

services.eka-ci.settings.github_apps.*.credentials

Credential source. Same shape as services.eka-ci.settings.caches.*.credentials.

Type: TOML value

Example:

{
  file = {
    path = "/etc/eka-ci/github-app.json";
  };
}

services.eka-ci.settings.github_apps.*.id

GitHub App identifier referenced from per-app permission lookups.

Type: string

Example: "main"

services.eka-ci.settings.github_apps.*.permissions

Repository/branch access control for this GitHub App.

Type: submodule

Default: { }

services.eka-ci.settings.github_apps.*.permissions.allow_all

When true, ignores allowed_repos and allowed_branches and grants access to every repository and branch.

Type: boolean

Default: true

services.eka-ci.settings.github_apps.*.permissions.allowed_branches

Glob patterns of branch names permitted to use this entry. Ignored when allow_all is true.

Type: list of string

Default: [ ]

Example:

[
  "main"
  "release/*"
]

services.eka-ci.settings.github_apps.*.permissions.allowed_repos

Glob patterns of owner/repo strings that are permitted to use this entry. Ignored when allow_all is true.

Type: list of string

Default: [ ]

Example:

[
  "myorg/*"
]

services.eka-ci.settings.gitlab_instances

List of GitLab instances the server integrates with. Each instance requires a domain and project access token. Supports both GitLab.com and self-hosted instances.

Type: list of (open submodule of (TOML value))

Default: [ ]

services.eka-ci.settings.gitlab_instances.*.domain

GitLab instance domain (without protocol).

Type: string

Example: "gitlab.com"

services.eka-ci.settings.gitlab_instances.*.token

GitLab project access token. Avoid setting this in Nix — values here end up in the world-readable Nix store. Use services.eka-ci.environmentFile to supply GITLAB_TOKEN instead (for a single instance), or configure tokens via systemd credentials.

Type: null or string

Default: null

services.eka-ci.settings.graph_lru_capacity

Capacity of the in-memory derivation-graph LRU cache, in nodes. See docs/lru-cache-tuning.md for sizing guidance.

Type: positive integer, meaning >0

Default: 100000

services.eka-ci.settings.logs_dir

Directory where build logs are stored. When null the server falls back to $XDG_DATA_HOME/ekaci/build-logs.

Type: null or absolute path

Default: null

services.eka-ci.settings.merge_queue_require_approval

Require approval before building entries pulled from the GitHub merge queue.

Type: boolean

Default: false

services.eka-ci.settings.oauth

OAuth settings for the (optional) web UI.

Type: open submodule of (TOML value)

Default: { }

services.eka-ci.settings.oauth.client_id

GitHub OAuth client ID. May also be supplied via the GITHUB_OAUTH_CLIENT_ID environment variable (preferred — see services.eka-ci.environmentFile).

Type: null or string

Default: null

services.eka-ci.settings.oauth.client_secret

GitHub OAuth client secret. Avoid setting this in Nix — values here end up in the world-readable Nix store. Use services.eka-ci.environmentFile to supply GITHUB_OAUTH_CLIENT_SECRET instead.

Type: null or string

Default: null

services.eka-ci.settings.oauth.jwt_secret

JWT signing secret. Avoid setting this in Nix. Provide JWT_SECRET via services.eka-ci.environmentFile. When omitted entirely, the server generates an ephemeral 256-bit secret on each start (sessions invalidate across restarts).

Type: null or string

Default: null

services.eka-ci.settings.oauth.redirect_url

OAuth callback URL. Defaults to http://{web.address}:{web.port}/github/auth/callback when unset.

Type: null or string

Default: null

services.eka-ci.settings.require_approval

Require maintainer approval before building PRs from external contributors.

Type: boolean

Default: false

services.eka-ci.settings.security

Security-related settings.

Type: open submodule of (TOML value)

Default: { }

services.eka-ci.settings.security.allow_insecure_webhooks

Allow the server to start without a webhook secret. Intended for local development only; never enable in production.

Type: boolean

Default: false

services.eka-ci.settings.security.allow_private_cache_hosts

Allow cache destinations whose DNS resolves to private/loopback addresses. Disables built-in SSRF protection; only enable in trusted, isolated networks.

Type: boolean

Default: false

services.eka-ci.settings.security.audit_hooks

Emit structured audit log records every time a hook runs.

Type: boolean

Default: true

services.eka-ci.settings.security.max_hook_timeout_seconds

Maximum wall-clock time, in seconds, that any post-build hook is allowed to run.

Type: integer between 1 and 86400 (both inclusive)

Default: 300

services.eka-ci.settings.security.webhook_secret

GitHub webhook HMAC secret. Avoid setting this in Nix. Provide GITHUB_WEBHOOK_SECRET via services.eka-ci.environmentFile.

The server refuses to start if no webhook secret is available unless allow_insecure_webhooks is true.

Type: null or string

Default: null

services.eka-ci.settings.unix

Unix-domain-socket settings used by the CLI client.

Type: open submodule of (TOML value)

Default: { }

services.eka-ci.settings.unix.socket_path

Unix domain socket the CLI client connects to. When null the server falls back to $XDG_RUNTIME_DIR/ekaci.socket, which under this module resolves to /run/eka-ci/ekaci.socket.

Type: null or absolute path

Default: null

services.eka-ci.settings.web

HTTP server settings.

Type: open submodule of (TOML value)

Default: { }

services.eka-ci.settings.web.address

IPv4 address the HTTP server binds to.

Type: string

Default: "127.0.0.1"

services.eka-ci.settings.web.allowed_origins

CORS allow-list. Each entry must be a fully-qualified http:// or https:// origin with no path, query, fragment, or * wildcard. An empty list rejects all cross-origin requests.

Type: list of string

Default: [ ]

Example:

[
  "https://app.example.com"
]

services.eka-ci.settings.web.bundle_path

Optional path to a pre-built web UI bundle.

Type: null or absolute path

Default: null

services.eka-ci.settings.web.port

TCP port the HTTP server binds to.

Type: 16 bit unsigned integer; between 0 and 65535 (both inclusive)

Default: 3030

services.eka-ci.user

User the service runs as when dynamicUser is false. Ignored when dynamicUser = true.

Type: string

Default: "eka-ci"

Server Configuration

The server is configured via a single TOML file, by default at ~/.config/ekaci/ekaci.toml. This page covers the most common settings; for credential sources see GitHub App Setup and Configuring Caches.

Minimal example

[[github_apps]]
id = "main"
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }

Full example

# Web server
[web]
address = "127.0.0.1"
port = 3030

# State paths
db_path  = "/var/lib/ekaci/sqlite.db"
logs_dir = "/var/log/ekaci"

# Build behaviour
build_no_output_timeout_seconds = 1200   # 20 minutes
graph_lru_capacity              = 100000 # see lru-cache-tuning.md
require_approval                = false  # require approval for external PRs

# OAuth (optional, for the web UI)
[oauth]
client_id     = "github-oauth-client-id"
client_secret = "github-oauth-client-secret"
redirect_url  = "https://your-server.com/github/auth/callback"
jwt_secret    = "your-jwt-secret"

# Security
[security]
max_hook_timeout_seconds = 300
audit_hooks              = true

# GitHub App credentials
[[github_apps]]
id = "production"

[github_apps.credentials.vault]
address     = "https://vault.example.com:8200"
secret_path = "eka-ci/github-app"
token_env   = "VAULT_TOKEN"

[github_apps.permissions]
allow_all     = false
allowed_repos = ["myorg/*"]

# Binary caches
[[caches]]
id           = "s3-cache"
cache_type   = "nix-copy"
destination  = "s3://bucket/path"

[caches.credentials.aws-secrets-manager]
secret_name = "eka-ci/s3-credentials"
region      = "us-east-1"

[caches.permissions]
allow_all       = false
allowed_repos   = ["myorg/production-*"]
allowed_branches = ["main"]

Key settings

[web]

The HTTP API and Prometheus /metrics endpoint bind to address:port. For production deployments behind a reverse proxy, bind to 127.0.0.1 and let the proxy terminate TLS.
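
In ekaci.toml that typically looks like the following (a minimal sketch; the origin is illustrative):

[web]
address = "127.0.0.1"  # reverse proxy terminates TLS and forwards here
port    = 3030
allowed_origins = ["https://ci.example.com"]  # only needed when the UI is served from another origin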

graph_lru_capacity

Capacity of the in-memory derivation graph cache. Larger repositories need a larger cache; see LRU Cache Tuning for sizing guidance.

build_no_output_timeout_seconds

A build is considered hung if it produces no output for this many seconds. The default of 20 minutes is appropriate for most Nixpkgs-style packages; bump it for repos with very slow fixed-output derivations.
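
For example, a repository with hour-long silent source fetches might set:

build_no_output_timeout_seconds = 3600  # tolerate an hour of silence before declaring a build hung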

require_approval

When true, builds for pull requests from external (non-collaborator) authors are queued but not executed until a maintainer approves. The approval workflow is partially implemented — see the project README for current status.

[security]

max_hook_timeout_seconds caps the wall-clock time of any post-build hook. audit_hooks enables structured audit log records every time a hook runs.

Credentials

All credential blocks (GitHub Apps, caches, OAuth) use a tagged enum:

credentials = { env  = { vars = ["..."] } }
credentials = { file = { path = "/etc/..." } }
credentials = { vault = { address = "...", secret_path = "...", token_env = "..." } }
credentials = { aws-secrets-manager = { secret_name = "...", region = "..." } }
credentials = { systemd-credential = { name = "..." } }
credentials = { instance-metadata = {} }
credentials = { aws-profile = { profile = "..." } }
credentials = { cachix-token = { env_var = "..." } }
credentials = { github-app-key-file = { app_id_env = "...", key_file = "..." } }
credentials = { none = {} }

Each source is documented in GitHub App Setup and Configuring Caches.

Permissions

Both [[github_apps]] and [[caches]] accept a permissions block:

[caches.permissions]
allow_all        = false
allowed_repos    = ["myorg/*"]
allowed_branches = ["main", "release/*"]

Glob patterns use *-style matching. When allow_all = true, the other lists are ignored.

Repository Configuration

Repositories opt in to Eka CI by adding a .eka-ci/config.json file. This file is untrusted: it declares jobs and checks and references caches defined on the server, but it can never inject credentials, host paths, or arbitrary commands beyond what the server allows.

Schema

{
  "jobs": {
    "package-name": {
      "file": "path/to/file.nix",
      "attr_path": "optional.attr.path",
      "allow_eval_failures": false,
      "caches": ["cache-id-from-server-config"],
      "size_check": {
        "max_increase_percent": 10.0,
        "base_branch": "main"
      }
    }
  },
  "checks": {
    "check-name": {
      "shell": "shell-derivation-attr",
      "command": "command to run",
      "allow_network": false,
      "ro_bind": ["/path/to/readonly/bind"]
    }
  }
}

Jobs

A job describes a Nix expression to evaluate and the derivations to build from it.

  • file (required) — Path to a .nix file relative to the repository root.
  • attr_path (optional) — Sub-attribute path inside the file.
  • allow_eval_failures (optional) — If true, evaluation errors do not fail the check.
  • caches (optional) — List of cache IDs (defined server-side) to push successful builds to.
  • size_check (optional) — Configures output- and closure-size monitoring.

Size checks

When size_check is set, Eka CI:

  1. Calculates output (NAR) and closure size for each successful build.
  2. Stores sizes in historical tables, keyed by commit and repository.
  3. Compares against the most recent successful build on base_branch.
  4. Logs warnings (and surfaces them in the change summary) when the increase exceeds max_increase_percent.
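
For example, a job that warns when its closure grows more than 5% relative to main could be declared like this (a minimal sketch; the job and file names are illustrative):

{
  "jobs": {
    "my-service": {
      "file": "service.nix",
      "size_check": {
        "max_increase_percent": 5.0,
        "base_branch": "main"
      }
    }
  }
}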

Checks

A check runs a sandboxed command in a shell derivation defined in the repository.

  • shell (required) — Attribute name of a shell derivation that provides the tools.
  • command (required) — The command line to run inside the sandbox.
  • allow_network (optional, default false) — When true, the check is allowed network access.
  • ro_bind (optional) — Additional read-only bind mounts to expose to the sandbox.

Checks are sandboxed via birdcage with no filesystem write access outside their working directory and no network access by default. See Architecture for details on the security model.
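
For example, a formatting check could be declared like this (a sketch; the shell attribute and command are illustrative):

{
  "checks": {
    "fmt": {
      "shell": "formatterShell",
      "command": "treefmt --fail-on-change",
      "allow_network": false
    }
  }
}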

Cache references

The caches field on a job lists cache IDs — string identifiers from the server's [[caches]] blocks. The repository never sees the underlying credentials, destinations, or permissions.

If a job references a cache it is not allowed to push to (per the cache's allowed_repos/allowed_branches), the push is skipped and a warning is logged. The build itself still succeeds. See Configuring Caches.

Configuring Caches in EKA-CI

This guide explains how to configure binary caches for EKA-CI, allowing build outputs to be pushed to various cache backends.

Overview

EKA-CI uses a two-tier configuration model for security:

  1. Server Configuration (trusted): Defines available caches, credentials, and permissions
  2. Repository Configuration (untrusted): References caches by ID only

This separation ensures that repository contributors cannot inject arbitrary commands or access credentials directly.

Server Configuration

Cache definitions are stored in the server configuration file (typically ~/.config/ekaci/ekaci.toml or specified via --config-file).

Basic Structure

# Security settings for hook execution
[security]
max_hook_timeout_seconds = 300  # Maximum time for cache push operations
audit_hooks = true              # Enable audit logging of all cache operations

# Cache definitions
[[caches]]
id = "production-s3"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
credentials = { env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] } }

[[caches]]
id = "public-cachix"
cache_type = "cachix"
destination = "my-cache-name"
credentials = { cachix-token = { env_var = "CACHIX_AUTH_TOKEN" } }

Cache Types

1. Nix Copy (S3/HTTP Binary Caches)

Uses nix copy to push derivations to S3-compatible storage or HTTP binary caches.

[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-west-2"

# Option 1: Environment variables
[caches.credentials]
env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] }

# Option 2: AWS profile
# [caches.credentials]
# aws-profile = { profile = "production" }

# Option 3: Credential file
# [caches.credentials]
# file = { path = "/etc/eka-ci/aws-credentials" }

Supported S3 destinations:

  • s3://bucket/path?region=REGION - S3 with explicit region
  • s3://bucket/path?profile=PROFILE - S3 using AWS profile
  • s3://bucket/path?endpoint=URL - S3-compatible services (MinIO, etc.)
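
Query parameters can be combined, for example an S3-compatible MinIO endpoint with an explicit region (illustrative values):

destination = "s3://nix-cache/store?region=us-east-1&endpoint=https://minio.internal:9000"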

HTTP binary caches:

[[caches]]
id = "http-cache"
cache_type = "nix-copy"
destination = "https://cache.example.com"
credentials = { none = {} }  # Public cache, no auth needed

2. Cachix

Uses Cachix for binary cache storage with built-in authentication.

[[caches]]
id = "my-cachix"
cache_type = "cachix"
destination = "my-cache-name"  # Your Cachix cache name

[caches.credentials]
cachix-token = { env_var = "CACHIX_AUTH_TOKEN" }

Getting a Cachix token:

  1. Sign up at cachix.org
  2. Create a cache
  3. Generate an auth token
  4. Set CACHIX_AUTH_TOKEN environment variable when running EKA-CI

3. Attic

Uses Attic for self-hosted binary caches.

[[caches]]
id = "attic-cache"
cache_type = "attic"
destination = "https://attic.example.com/my-cache"

[caches.credentials]
env = { vars = ["ATTIC_TOKEN"] }

Credential Sources

EKA-CI supports multiple credential sources, including secure secret management systems to avoid storing plain-text credentials.

HashiCorp Vault

Retrieve credentials from HashiCorp Vault:

[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"

[caches.credentials.vault]
address     = "https://vault.example.com:8200"
secret_path = "secret/data/eka-ci/s3-cache"
token_env   = "VAULT_TOKEN"  # Optional, defaults to VAULT_TOKEN
namespace   = "prod"         # Optional, for Vault Enterprise

Vault secret format (KV v2):

{
  "data": {
    "AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  }
}

Benefits:

  • Secrets never stored on disk in plain text
  • Automatic secret rotation support
  • Audit logging of secret access
  • Fine-grained access control

AWS Secrets Manager

Retrieve credentials from AWS Secrets Manager:

[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"

[caches.credentials.aws-secrets-manager]
secret_name = "eka-ci/s3-cache-credentials"
region      = "us-east-1"  # Optional, defaults to AWS_REGION env var

Secret format (JSON):

{
  "AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
  "AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}

Benefits:

  • Native AWS integration
  • Automatic encryption at rest
  • IAM-based access control
  • Secret rotation with Lambda

systemd Credentials (Linux Systems)

Use systemd's encrypted credentials feature:

[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"

[caches.credentials]
systemd-credential = { name = "s3-cache-creds" }

Setup:

# Encrypt credential
echo -n "AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=wJal..." | \
  systemd-creds encrypt --name=s3-cache-creds - \
  /etc/credstore.encrypted/s3-cache-creds

# Service loads it automatically
systemctl restart eka-ci.service
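
If the unit does not already import the credential itself, a systemd drop-in can load it explicitly; a sketch using the standard LoadCredentialEncrypted= directive:

# /etc/systemd/system/eka-ci.service.d/creds.conf
[Service]
LoadCredentialEncrypted=s3-cache-creds:/etc/credstore.encrypted/s3-cache-creds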

Benefits:

  • Encrypted at rest with TPM2 or system key
  • Integrated with systemd services
  • No external dependencies
  • OS-level security

Instance Metadata Service (Cloud VMs)

Use IAM roles/service accounts without explicit credentials:

[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"

[caches.credentials]
instance-metadata = {}

Supported platforms:

  • AWS EC2 with IAM roles
  • Google Cloud with service accounts
  • Azure VMs with managed identities

Benefits:

  • No credentials to manage
  • Automatic credential rotation
  • Follows cloud best practices
  • Reduced attack surface

Environment Variables

Read credentials from environment variables (simple but less secure):

[caches.credentials]
env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] }

Note: Environment variables are visible in /proc/<pid>/environ and process listings.

File-based Credentials

Read credentials from a file (ensure proper file permissions):

[caches.credentials]
file = { path = "/etc/eka-ci/cache-credentials" }

File format: Key-value pairs, one per line

AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Security: Set file permissions to 600 (readable only by EKA-CI user):

chmod 600 /etc/eka-ci/cache-credentials
chown eka-ci:eka-ci /etc/eka-ci/cache-credentials

AWS Profile

Use credentials from ~/.aws/credentials:

[caches.credentials]
aws-profile = { profile = "production" }

AWS credentials file (~/.aws/credentials):

[production]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
region = us-west-2

Cachix Token

Specific to Cachix authentication:

[caches.credentials]
cachix-token = { env_var = "CACHIX_AUTH_TOKEN" }

No Authentication

For public caches that don't require authentication:

[caches.credentials]
none = {}

Credential Source Comparison

  • HashiCorp Vault — security ⭐⭐⭐⭐⭐, medium complexity, automatic rotation, audited. Best for enterprise production.
  • AWS Secrets Manager — security ⭐⭐⭐⭐⭐, low complexity, automatic rotation, audited. Best for AWS environments.
  • systemd Credentials — security ⭐⭐⭐⭐, medium complexity, manual rotation, limited audit. Best for Linux systemd systems.
  • Instance Metadata — security ⭐⭐⭐⭐⭐, low complexity, automatic rotation, audited. Best for cloud VMs.
  • AWS Profile — security ⭐⭐⭐, low complexity, manual rotation, no audit. Best for development.
  • Environment Variables — security ⭐⭐, low complexity, manual rotation, no audit. Best for development/testing.
  • File-based — security ⭐⭐, low complexity, manual rotation, no audit. Best for simple deployments.

Cache Permissions

Control which repositories and branches can use each cache.

Allow All (Default)

[[caches]]
id = "public-cache"
# ... other config ...

[caches.permissions]
allow_all = true  # Any repository can use this cache

Specific Repositories

[[caches]]
id = "org-cache"
# ... other config ...

[caches.permissions]
allow_all = false
allowed_repos = [
    "myorg/repo1",
    "myorg/repo2",
    "anotherorg/special-repo"
]

Branch Restrictions

[[caches]]
id = "production-cache"
# ... other config ...

[caches.permissions]
allow_all = false
allowed_repos = ["myorg/myrepo"]
allowed_branches = [
    "main",           # Exact match
    "release/*",      # Prefix wildcard
    "*"               # Match all branches (if repo is allowed)
]

Branch pattern syntax:

  • main - Exact match only
  • release/* - Matches release/v1.0, release/v2.0, etc.
  • */hotfix - Matches any branch ending with /hotfix
  • * - Matches all branches

Repository Configuration

In your repository's .eka-ci/config.json, reference caches by ID:

{
  "jobs": {
    "my-package": {
      "file": "default.nix",
      "caches": ["production-s3", "public-cachix"]
    },
    "another-package": {
      "file": "package.nix",
      "caches": ["production-s3"]
    }
  }
}

Security Note: Repository contributors can only reference cache IDs. They cannot:

  • Define arbitrary commands
  • Access credentials
  • Push to caches they don't have permission for
  • Create new caches

Complete Examples

Example 1: Public Open Source Project

# Server config: ~/.config/ekaci/ekaci.toml

[security]
max_hook_timeout_seconds = 300
audit_hooks = true

[[caches]]
id = "public-cachix"
cache_type = "cachix"
destination = "my-oss-project"
credentials = { cachix-token = { env_var = "CACHIX_AUTH_TOKEN" } }
permissions = { allow_all = true }

Repository config: .eka-ci/config.json

{
  "jobs": {
    "stdenv": {
      "file": "default.nix",
      "caches": ["public-cachix"]
    }
  }
}

Example 2: Private Company Repository

# Server config: /etc/eka-ci/ekaci.toml

[security]
max_hook_timeout_seconds = 600
audit_hooks = true

[[caches]]
id = "dev-cache"
cache_type = "nix-copy"
destination = "s3://company-dev-cache/nix?region=us-east-1"
credentials = { aws-profile = { profile = "dev" } }
permissions = { allow_all = false, allowed_repos = ["company/*"] }

[[caches]]
id = "prod-cache"
cache_type = "nix-copy"
destination = "s3://company-prod-cache/nix?region=us-east-1"
credentials = { aws-profile = { profile = "production" } }

[caches.permissions]
allow_all = false
allowed_repos = ["company/backend", "company/frontend"]
allowed_branches = ["main", "release/*"]

Repository config: .eka-ci/config.json

{
  "jobs": {
    "backend": {
      "file": "backend.nix",
      "caches": ["dev-cache", "prod-cache"]
    }
  }
}

Example 3: Production with HashiCorp Vault

Secure production setup using Vault for secret management:

# Server config: /etc/eka-ci/ekaci.toml

[security]
max_hook_timeout_seconds = 600
audit_hooks = true

[[caches]]
id = "prod-s3"
cache_type = "nix-copy"
destination = "s3://company-prod-cache/nix?region=us-east-1"

[caches.credentials.vault]
address     = "https://vault.company.internal:8200"
secret_path = "secret/data/eka-ci/prod-s3"
namespace   = "production"

[caches.permissions]
allow_all = false
allowed_repos = ["company/backend", "company/frontend"]
allowed_branches = ["main", "release/*"]

[[caches]]
id = "staging-s3"
cache_type = "nix-copy"
destination = "s3://company-staging-cache/nix?region=us-east-1"

[caches.credentials.vault]
address     = "https://vault.company.internal:8200"
secret_path = "secret/data/eka-ci/staging-s3"
namespace   = "production"

[caches.permissions]
allow_all = false
allowed_repos = ["company/*"]
allowed_branches = ["develop", "feature/*", "main"]

Vault setup:

# Store S3 credentials in Vault
vault kv put secret/eka-ci/prod-s3 \
  AWS_ACCESS_KEY_ID="AKIA..." \
  AWS_SECRET_ACCESS_KEY="wJal..."

vault kv put secret/eka-ci/staging-s3 \
  AWS_ACCESS_KEY_ID="AKIA..." \
  AWS_SECRET_ACCESS_KEY="wJal..."

# Grant EKA-CI service access
vault policy write eka-ci-policy - <<EOF
path "secret/data/eka-ci/*" {
  capabilities = ["read"]
}
EOF

vault token create -policy=eka-ci-policy

Repository config:

{
  "jobs": {
    "backend": {
      "file": "backend.nix",
      "caches": ["staging-s3", "prod-s3"]
    }
  }
}

Example 4: Multi-Cache Strategy

Push to both a fast internal cache and a public Cachix:

[[caches]]
id = "internal-s3"
cache_type = "nix-copy"
destination = "s3://internal-cache/nix?region=us-west-2&endpoint=https://minio.internal"
credentials = { env = { vars = ["MINIO_ACCESS_KEY", "MINIO_SECRET_KEY"] } }
permissions = { allow_all = false, allowed_repos = ["myorg/*"] }

[[caches]]
id = "public-fallback"
cache_type = "cachix"
destination = "myorg-public"
credentials = { cachix-token = { env_var = "CACHIX_AUTH_TOKEN" } }
permissions = { allow_all = false, allowed_repos = ["myorg/*"] }

Repository config: .eka-ci/config.json

{
  "jobs": {
    "my-app": {
      "file": "default.nix",
      "caches": ["internal-s3", "public-fallback"]
    }
  }
}

Example 5: Cloud VM with IAM Roles

AWS EC2 instance using IAM role (no credentials needed):

# Server config on EC2 instance

[security]
max_hook_timeout_seconds = 300
audit_hooks = true

[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-cache/nix?region=us-east-1"
credentials = { instance-metadata = {} }  # Uses EC2 IAM role
permissions = { allow_all = true }

Required IAM role policy:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:PutObject",
      "s3:GetObject",
      "s3:ListBucket"
    ],
    "Resource": [
      "arn:aws:s3:::my-cache/*",
      "arn:aws:s3:::my-cache"
    ]
  }]
}

Operational Considerations

Setting Environment Variables

When running EKA-CI as a systemd service:

# /etc/systemd/system/eka-ci.service
[Service]
Environment="AWS_ACCESS_KEY_ID=AKIA..."
Environment="AWS_SECRET_ACCESS_KEY=wJal..."
Environment="CACHIX_AUTH_TOKEN=eyJ..."
EnvironmentFile=/etc/eka-ci/secrets.env

Secrets Management

Recommended: Use secure credential sources (see Credential Sources section)

Production deployments should use one of:

  • HashiCorp Vault - Enterprise secret management with rotation and audit
  • AWS Secrets Manager - Native AWS secret storage
  • systemd Credentials - Encrypted credentials with TPM2 support
  • Instance Metadata - Cloud IAM roles (no credentials to manage)

For development/testing only:

Environment file:

# /etc/eka-ci/secrets.env
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
CACHIX_AUTH_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

Warning: Plain-text environment files and environment variables should only be used for development. Production systems should use Vault, AWS Secrets Manager, systemd credentials, or instance metadata.

Monitoring and Auditing

When audit_hooks = true, all cache operations are logged:

[INFO] Sent hook task for drv /nix/store/abc-foo.drv (job: my-package)
[WARN] Permission denied for cache 'prod-cache' in myorg/myrepo: Branch develop is not allowed
[WARN] Cache ID 'nonexistent-cache' not found in server registry, skipping

Testing Cache Configuration

  1. Verify server config loads:

    eka-ci-server --config-file ekaci.toml
    # Check logs for "Loading configuration file from..."
    
  2. Test permissions: Create a test PR and check logs for permission warnings

  3. Verify credentials:

    # For S3
    nix copy /nix/store/some-drv --to 's3://bucket/path?region=us-east-1'
    
    # For Cachix
    cachix push my-cache /nix/store/some-drv
    

Troubleshooting

Cache push fails silently

Check that:

  1. Cache ID in .eka-ci/config.json matches server config
  2. Repository has permission to use the cache
  3. Credentials are valid and accessible
  4. Server logs show the hook execution

Permission denied

[WARN] Permission denied for cache 'prod-cache' in myorg/myrepo

Solutions:

  • Add repository to allowed_repos list
  • Check branch name matches allowed_branches pattern
  • Set allow_all = true if appropriate

Credentials not found

[ERROR] Failed to execute hook: Environment variable AWS_ACCESS_KEY_ID not set

Solutions:

  • Ensure environment variables are set when starting server
  • Check systemd service file for Environment= or EnvironmentFile=
  • Verify file paths for file-based credentials

Timeout errors

[WARN] Hook execution timed out after 300 seconds

Solutions:

  • Increase max_hook_timeout_seconds in security config
  • Check network connectivity to cache destination
  • Verify cache backend is responsive

Security Best Practices

  1. Use minimal permissions: Only grant cache access to repositories that need it
  2. Separate dev/prod caches: Use branch restrictions to prevent dev builds in production caches
  3. Rotate credentials: Regularly rotate AWS keys and Cachix tokens
  4. Audit logs: Monitor audit_hooks output for unauthorized access attempts
  5. File permissions: Ensure credential files are readable only by the EKA-CI service user
  6. Environment isolation: Use systemd's PrivateTmp, ProtectSystem, etc. for additional security (see the sketch below)
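
A drop-in along these lines enables the common protections (a sketch; adjust ReadWritePaths to your state directory):

# /etc/systemd/system/eka-ci.service.d/hardening.conf
[Service]
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
NoNewPrivileges=yes
ReadWritePaths=/var/lib/eka-ci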

Migration from Arbitrary Hooks

If you previously used arbitrary post-build hooks, migrate to the secure cache reference system:

Before (insecure):

{
  "jobs": {
    "my-package": {
      "file": "default.nix",
      "post_build_hooks": [{
        "name": "push-to-s3",
        "command": ["nix", "copy", "--to", "s3://bucket/path"],
        "env": {
          "AWS_ACCESS_KEY_ID": "hardcoded-key",
          "AWS_SECRET_ACCESS_KEY": "hardcoded-secret"
        }
      }]
    }
  }
}

After (secure):

Server config:

[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://bucket/path?region=us-east-1"
credentials = { env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] } }

Repository config:

{
  "jobs": {
    "my-package": {
      "file": "default.nix",
      "caches": ["s3-cache"]
    }
  }
}

Rebuild Detection

When a pull request changes a Nix expression, Eka CI computes which derivations need to rebuild. This is a key input to both the change summary and the build queue.

How it works

For each opened or updated pull request, the server:

  1. Evaluates the base ref to produce a base jobset of derivations.
  2. Evaluates the head ref to produce a head jobset.
  3. Diffs the two jobsets, classifying each derivation as one of:
    • Added — present at head, missing at base.
    • Removed — present at base, missing at head.
    • Rebuild — same attribute but a different .drv hash.
  4. For each rebuild target, walks the in-memory dependency graph to compute its blast radius — the count of transitive dependents that must also rebuild.

The resulting set of derivations is what gets enqueued for the platform-specific build queues.

Configuration

Rebuild detection is controlled by a small number of settings in ekaci.toml:

[rebuild]
# Maximum number of derivations to rebuild before classifying a PR as
# "wide" and skipping per-package builds.
max_rebuild_count = 20000

# Skip rebuild evaluation entirely for PRs that touch any of these paths.
skip_paths = ["doc/**", "**/CHANGELOG.md"]

The exact set of available settings is evolving; consult the source of truth at backend/server/src/config.rs. Repositories can additionally constrain rebuild detection via change_set rules, allowing maintainers to mark certain files as "rebuild-only" or "docs-only" without re-evaluating Nix.

Change sets

A change set is a per-repository declaration of which file globs map to which kind of change. They are evaluated cheaply from the Git diff before a full Nix evaluation runs. Typical uses:

  • Marking README.md and doc/**/*.md as documentation-only.
  • Marking flake.lock updates as triggering a full rebuild.
  • Treating CI-only files (like .github/**) as no-op.

When a PR's diff is fully covered by change-set rules that imply "no rebuild", Eka CI can skip the build phase entirely and post a "no rebuilds expected" change summary.
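
The on-disk shape of these rules is owned by the server source rather than this page. Purely as an illustration of the idea (every field name below is hypothetical; consult backend/server/src/config.rs for the real schema):

# Hypothetical sketch only, not the real schema
[[change_sets]]
paths = ["README.md", "doc/**/*.md"]
kind  = "docs-only"      # implies no rebuild

[[change_sets]]
paths = ["flake.lock"]
kind  = "full-rebuild"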

Metrics

Rebuild detection emits Prometheus metrics; see Monitoring & Metrics for the exact metric names. Useful series include rebuild counts per system, blast-radius histograms, and skip-due-to-change-set counters.

Change Summary Operational Runbook

Table of Contents

  1. Overview
  2. Per-Repo Configuration
  3. Metrics
  4. Endpoints
  5. GitHub Check Posting
  6. Truncation Strategy
  7. Cache
  8. Troubleshooting
  9. Alerts

Overview

The change-summary pipeline computes a per-PR view that combines:

  • Package changes (A1): structured diff between head and base jobsets — Added / Removed / VersionBump / LicenseChange / MaintainerChange / RebuildOnly.
  • Rebuild impact (A2): per-system rebuild counts plus per-package "blast radius" (count of transitive dependents) across the in-memory build graph.

Outputs:

  • GET /v1/commits/{sha}/package-changes — JSON, structured diff only.
  • GET /v1/commits/{sha}/rebuild-impact — JSON, impact only.
  • GET /v1/commits/{sha}/change-summary — JSON, combined + pre-rendered markdown.
  • GET /v1/commits/{sha}/change-summary.md — text/markdown, ready to paste.
  • A single GitHub check run per PR head (EkaCI: Change Summary), idempotently created/patched on a 5-minute debounce after each jobset evaluation.

Per-Repo Configuration

Behaviour can be tuned per repository via .eka-ci/config.json. Both blocks are optional; when absent, the engine defaults apply.

{
  "package_change_summary": {
    "enabled": true,
    "max_packages_listed": 100,
    "include_rebuild_only": false
  },
  "rebuild_impact": {
    "enabled": true,
    "max_top_blast_radius": 5,
    "compute_full_blast_radius": false
  }
}

package_change_summary

  • enabled (default: true) — Hides the package-changes section of the check when false.
  • max_packages_listed (default: 100) — Soft cap on table rows before the renderer collapses to counts-only. The web endpoint clamps user-supplied max_packages_listed query params to at most 10× this value.
  • include_rebuild_only (default: false) — When true, RebuildOnly rows render alongside Added/Removed/Bumped. Counts are still surfaced in the rebuild-only summary line regardless.

rebuild_impact

  • enabled (default: true) — Hides the blast-radius section of the check when false.
  • max_top_blast_radius (default: 5) — Number of top-rebuild packages reported. The web endpoint clamps user-supplied max_top_blast_radius query params to at most 10× this value.
  • compute_full_blast_radius (default: false) — When true, walks the full transitive dependent set (expensive on large jobsets — use sparingly). The default mode reports per-seed direct rebuild counts.

Schema is parsed by CIConfig in backend/server/src/ci/config.rs. Partial blocks are accepted; missing inner fields fall back to the defaults shown above.
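
For example, a repository that only wants more blast-radius rows can supply just that field and inherit everything else:

{
  "rebuild_impact": {
    "max_top_blast_radius": 10
  }
}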


Metrics

All metrics are exposed at GET /v1/metrics with the namespace eka_ci_.

  • eka_ci_change_summary_total_duration_seconds (histogram; label: phase) — Wall-clock time per phase: classify, impact, render, end_to_end.
  • eka_ci_change_summary_cache_hits_total (counter) — RebuildImpactCache lookups served from SQLite.
  • eka_ci_change_summary_cache_misses_total (counter) — RebuildImpactCache lookups that fell through to a cold compute.
  • eka_ci_change_summary_metadata_unavailable_total (counter) — Calls where head-side pname/version/license/maintainers were entirely missing. Indicates the eval pipeline did not populate package metadata.
  • eka_ci_change_summary_truncated_total (counter; label: level) — Truncation events by drop level: columns (one of maintainers/license/rebuild-only dropped) or summary (table collapsed to counts-only).
  • eka_ci_rebuild_impact_traversal_duration_seconds (histogram; label: system) — Per-system blast-radius traversal duration.
  • eka_ci_rebuild_impact_seeds_total (histogram) — Per-call distribution of the changed-drv seed count fed into the BFS.

Useful queries

Cache hit ratio (target: > 0.8 in steady state for re-rendered PRs):

rate(eka_ci_change_summary_cache_hits_total[5m])
  /
ignoring() (
  rate(eka_ci_change_summary_cache_hits_total[5m])
  + rate(eka_ci_change_summary_cache_misses_total[5m])
)

End-to-end p95:

histogram_quantile(0.95,
  sum by (le) (rate(eka_ci_change_summary_total_duration_seconds_bucket{phase="end_to_end"}[5m]))
)

Truncation rate (target: < 5% of renders):

sum(rate(eka_ci_change_summary_truncated_total[15m]))
  /
sum(rate(eka_ci_change_summary_total_duration_seconds_count{phase="end_to_end"}[15m]))

Endpoints

  • GET /v1/commits/{sha}/package-changes (auth required) — Returns the full structured diff; never truncated by the orchestrator. Query: base_sha, job, max_packages_listed.
  • GET /v1/commits/{sha}/rebuild-impact (auth required) — Read-through RebuildImpactCache. Query: base_sha, job, max_top_blast_radius.
  • GET /v1/commits/{sha}/change-summary (auth required) — Combined view; includes pre-rendered markdown.
  • GET /v1/commits/{sha}/change-summary.md (public) — Returns the same markdown that posts to the GitHub check. Public per design §10.1 — the same data is visible on the PR check tab.

max_packages_listed and max_top_blast_radius are clamped to at most 10× their defaults to keep payload sizes predictable.
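
For ad-hoc inspection the endpoints can be queried directly; a sketch assuming the default 127.0.0.1:3030 bind (the authenticated routes additionally need whatever credential your deployment uses):

# Rebuild impact for a head/base pair
curl -s "http://127.0.0.1:3030/v1/commits/$HEAD_SHA/rebuild-impact?base_sha=$BASE_SHA&max_top_blast_radius=10"

# Markdown summary; this route is public
curl -s "http://127.0.0.1:3030/v1/commits/$HEAD_SHA/change-summary.md"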


GitHub Check Posting

  • One check run per PR head SHA, titled EkaCI: Change Summary.
  • Posted with status=Completed, conclusion=Neutral (informational; does not gate merge).
  • 5-minute debounce after the last CreateJobSet for a head SHA so all jobsets contribute to a single aggregated render.
  • Idempotent: subsequent renders for the same head PATCH the same check run id.
  • Defense-in-depth: a 65,500-byte sender-side cap protects against GitHub's 65,535-char output.summary ceiling. Hits append a _…truncated by sender safety net_ footer; this is rare in practice (the markdown renderer's 60,000-byte soft limit fires first).

Truncation Strategy

The renderer drops content in priority order until the markdown fits under 60,000 bytes:

  1. Drop maintainer rows from the package table.
  2. Drop license rows.
  3. Drop the rebuild-only count line.
  4. Collapse the entire change table to a counts-only summary.

Every step that fires increments eka_ci_change_summary_truncated_total (level=columns for steps 1-3, level=summary for step 4).


Cache

The RebuildImpactCache SQLite table memoises rebuild-impact responses keyed by (head_sha, base_sha, job).

  • Pruned on startup: rows older than 7 days are dropped (DEFAULT_CACHE_TTL_DAYS).
  • Cache write failures are logged at WARN; the freshly-computed answer is still returned (just unmemoised).
  • 404s (head jobset missing) are not cached.

To force recompute of a specific entry:

DELETE FROM RebuildImpactCache WHERE head_sha = ? AND base_sha = ? AND job = ?;

Troubleshooting

"No change summary check appearing on a PR"

  1. Check that the PR target jobset finished evaluating (Job rows for the head SHA exist).
  2. Confirm the GitHub App has checks: write permission for the repo.
  3. The 5-minute debounce means the check appears at least 5 minutes after the last jobset. Verify by waiting or by checking the change_summary_pending log line in the GitHub service.
  4. If a base SHA is missing from the PR (rare; happens on detached PR heads), the change-summary check is skipped — this is intentional.

"Rendered summary is truncated more than expected"

  • Inspect eka_ci_change_summary_truncated_total rates. A spike usually correlates with a large fan-out PR (touches many packages).
  • Per design §11.1, the per-repo max_packages_listed and max_top_blast_radius knobs can be raised, but GitHub's 65,535-byte cap is the hard ceiling. Larger PRs benefit from the change-summary.md endpoint, which always returns the full markdown.

"Cache hit ratio is low"

  • Expected the first time a (head, base, job) triple is queried. Subsequent renders should hit.
  • A persistent miss rate means re-evaluations are producing different head_sha values (e.g., force-pushes). This is normal for active PRs.
  • If miss rate is high without new commits, check for RebuildImpactCache write failures in the WARN log.

"Metadata unavailable counter is non-zero"

  • Means the eval pipeline did not populate pname/version/license/maintainers for any drv on the head side. Check the nix-eval-jobs invocation produced meta blocks.
  • Affects display only — Added/Removed classification falls back to RebuildOnly rows.

Alerts

Suggested Prometheus alert rules:

- alert: ChangeSummaryEndToEndSlow
  expr: |
    histogram_quantile(0.95,
      sum by (le) (rate(eka_ci_change_summary_total_duration_seconds_bucket{phase="end_to_end"}[5m]))
    ) > 10
  for: 15m
  annotations:
    summary: "change-summary p95 > 10s"

- alert: ChangeSummaryCacheMissesElevated
  expr: |
    rate(eka_ci_change_summary_cache_misses_total[15m])
      / (rate(eka_ci_change_summary_cache_hits_total[15m])
         + rate(eka_ci_change_summary_cache_misses_total[15m])) > 0.5
  for: 30m
  annotations:
    summary: "change-summary cache miss ratio > 50%"

- alert: ChangeSummaryTruncationSpike
  expr: |
    sum(rate(eka_ci_change_summary_truncated_total{level="summary"}[15m])) > 0
  for: 30m
  annotations:
    summary: "change-summary collapsing tables to counts-only"

GitHub PR Comment Commands

eka-ci listens for comments on pull requests that mention the bot and dispatches supported commands. This document lists the commands that are currently recognized, the conditions under which they succeed, and the feedback users should expect.

Summary

  • @eka-ci merge — Queue the PR to be merged once CI passes, using the repository's default merge method (or the configured default, squash).
  • @eka-ci merge merge — Same, with an explicit merge-commit method.
  • @eka-ci merge squash — Same, with an explicit squash method.
  • @eka-ci merge rebase — Same, with an explicit rebase method.
  • @eka-ci merge cancel — Withdraw a previously-issued @eka-ci merge request.

The bot mention is case-insensitive (@eka-ci, @Eka-CI, @EKA-CI all work). The command verb and method are also case-insensitive.

Where commands are accepted

The comment must be on a pull request (comments on plain issues are ignored). Additionally:

  • Only newly-created comments trigger commands — edits and deletions do not revoke or re-issue commands. If you want to cancel, post @eka-ci merge cancel as a new comment.
  • Bot-authored comments are ignored (the User.type field on the comment author must not be Bot).
  • The @eka-ci mention must be the first non-whitespace token of a line. Mentions embedded in prose — e.g. cc @eka-ci please help — are deliberately ignored to prevent accidental triggers.
  • A single comment may span multiple lines; the first line that parses as a command wins. Other lines are treated as prose.

Parser behavior

  • The bot handle match is case-insensitive (ASCII).
  • The command verb (merge) must immediately follow the mention.
  • Unknown verbs (e.g. @eka-ci rebuild) are silently ignored.
  • Unknown methods (e.g. @eka-ci merge squashh) fall back to a bare @eka-ci merge rather than rejecting the whole comment. This permissive behavior means typos still queue a merge.
  • Trailing tokens after the method are ignored: @eka-ci merge squash please thanks parses as a squash-merge request.

@eka-ci merge [method]

Queues the PR for auto-merge once all CI gates pass.

Authorization

The comment author must satisfy at least one of:

  1. Repo permission of write, maintain, or admin on the repository the PR targets, OR
  2. Be a registered maintainer of every package whose source is changed by the PR (per the eka-ci maintainers table).

If neither condition holds:

  • The bot reacts -1 to the command comment.
  • The bot posts a reply explaining that the command was denied.
  • No merge request is recorded.

Push-time (force-push) protection

To guard against commits landing between when a reviewer types the command and when the webhook is processed, the bot performs a best-effort timestamp check before accepting the request:

  • It fetches the current head commit via the GitHub API and reads the committer.date.
  • If that timestamp is more than 30 seconds after the created_at of the triggering comment, the bot refuses:
    • Reacts -1 on the command comment.
    • Posts a reply naming the head commit and asking the user to review the new changes and re-issue the command.

The 30-second grace window absorbs clock skew between GitHub's event recorder and the commit-metadata service. If the API call fails or the committer date cannot be parsed, the check fails open — the bot proceeds and relies on the post-acceptance SHA-drift check (below) as a second line of defense.

Caveat: Because the signal is the commit's own committer.date, a force-push of a much older, cherry-picked commit will not trigger this check. The post-acceptance SHA-drift hook still catches that case.

Request recording and acknowledgement

On acceptance, the bot:

  1. Records a pending comment-merge request pinned to the current head SHA, together with the requested merge method (if any), the requester's GitHub user id/login, and the comment id.
  2. Reacts +1 on the command comment as a visible ack.
  3. Kicks the auto-merge evaluator immediately so that if CI gates are already green, the merge lands right away.

Merge method selection

When the auto-merger eventually runs, it selects the merge method in this order of preference:

  1. The method explicitly given in the comment (merge / squash / rebase), if any.
  2. The PR's stored merge-method preference (set via the UI), if any.
  3. squash (the default fallback).

If the selected method is disabled in the repository's merge settings, the bot logs a warning and skips auto-merge. No comment is posted in that case — the requester is expected to re-issue with an allowed method.

Post-acceptance SHA-drift protection

After the comment-merge is recorded, the bot continues to monitor the PR. If any new commit lands on the head branch before the merge completes (PR Synchronize webhook), the request is cancelled:

  1. A :confused: reaction is added to the original command comment.
  2. A reply is posted naming the expected (pinned) and current head SHAs, and instructing the user to re-issue the command against the updated PR.
  3. The pending merge request is cleared from the database.

This is a hard guarantee: the merge bot will never land a commit that the requester did not explicitly target.

Gates the merge still has to pass

@eka-ci merge is not a force-merge. It opts the PR into the auto-merge evaluator, which still requires:

  • The head commit's jobset has fully concluded with no failing new-or-changed jobs (pr_head_build_succeeded).
  • The merge method selected is allowed by the repository settings.
  • Any other CI gates configured on the commit are passing (these are enforced by GitHub's own branch-protection rules independently of eka-ci).

Note that the package-maintainer approval gate used for UI-triggered auto-merges is skipped for comment-driven merges, because the requester's authorization was already verified at command time.

@eka-ci merge cancel

Withdraws an outstanding comment-merge request.

Behavior when nothing is pending

If no @eka-ci merge is currently pending on the PR, the bot silently no-ops. It does not react, does not post, and does not write to the database. This is intentional: it denies unauthorized commenters any signal about bot state.

Authorization

The comment author must satisfy at least one of:

  1. Be the original requester of the pending merge (identified by GitHub user id). This is a fast path that skips the permission API call.
  2. Have write, maintain, or admin on the repository.
  3. Be a maintainer of every changed package in the PR.

If none of these hold:

  • The bot reacts -1 on the cancel comment.
  • The bot posts a reply explaining why the cancel was denied.
  • The pending merge request remains in place.

This prevents random commenters from griefing pending maintainer merges.

On acceptance

  • The pending merge request is cleared from the database.
  • The bot reacts +1 on the cancel comment.
  • Any subsequent auto-merge evaluation proceeds as if no comment-merge had ever been issued (ambient auto-merge remains in effect if it was separately enabled via the UI).

Rate limiting

To protect the installation's shared GitHub API budget (5000 req/hr), commands are rate-limited per (user_id, owner, repo) triple at the webhook boundary:

  • Minimum interval: 5 seconds between accepted commands from the same user on the same repository.
  • Rejections are silent: no reaction, no comment, no DB write. Feedback would itself amplify the spam the limit is designed to contain.
  • The rate-limit state is process-local and non-persistent; it resets on server restart. It protects against burst spam only; sustained abuse is left to GitHub's own abuse-detection systems.

If you legitimately need to correct a just-issued command (e.g. wrong method), wait 5 seconds before re-issuing, or use @eka-ci merge cancel followed by the new command.

User-visible reactions

The bot uses the following reactions on the triggering comment as a compact status signal:

  • +1 — Command accepted (merge queued, or cancel recorded).
  • -1 — Command denied (unauthorized, or refused due to push-time drift).
  • rocket — The requested merge succeeded (added after the PR merges).
  • confused — A previously-accepted comment-merge was cancelled due to post-acceptance SHA drift.

Examples

Queue a squash-merge:

Looks good to me!
@eka-ci merge squash

Queue the repository-default merge method:

@eka-ci merge

Withdraw a pending request:

Actually, hold off — I want to add another commit.
@eka-ci merge cancel

Multi-line comment where prose precedes the command:

LGTM after the last round of fixes.
@eka-ci merge rebase
Thanks for the reviews!

Things that are NOT supported

These are intentionally out of scope as of this writing:

  • Commands from comment edits or deletions — only newly-created comments trigger anything.
  • Commands embedded inside prose (cc @eka-ci please merge). The mention must be the first non-whitespace token on its line.
  • Verbs other than merge (@eka-ci rebuild, @eka-ci retry, etc.) are parsed and silently ignored. They may be added in future versions.
  • Queueing multiple merge requests against the same PR — the most recent accepted request overwrites the previous one.
  • Explicit SHA arguments (@eka-ci merge <sha>). The current head SHA is always captured implicitly at command time.

Post-Build Hooks Implementation

Overview

This document describes the implementation of Nix-style post-build hooks in eka-ci, allowing per-job configuration of cache push and other post-build operations.

Status: ✅ Production Ready - Cache push functionality is fully implemented and operational.

Architecture

Components

  1. Hook Types (backend/server/src/hooks/types.rs)

    • PostBuildHook: Configuration for individual hooks
    • HookTask: Task sent to the executor
    • HookContext: Build context passed to hooks
    • HookResult: Result of hook execution
  2. Hook Executor (backend/server/src/hooks/executor.rs)

    • Async service that processes hook tasks
    • Executes hook commands with environment variable substitution
    • Logs output to {logs_dir}/{drv_hash}/hook-{name}.log
  3. Database Integration

    • Migration: backend/server/sql/migrations/20260409_job_config.sql
    • Stores job config JSON in GitHubJobSets.config_json
    • Tracks hook executions in HookExecution table
  4. Recorder Integration (backend/server/src/scheduler/recorder.rs)

    • Executes hooks after successful builds
    • Retrieves job config from database
    • Sends hook tasks to HookExecutor (non-blocking)

New as of 2024-04-11: eka-ci supports automatic cache push using server-side cache configuration with multi-source credential support.

Instead of configuring post-build hooks manually, you can use the built-in cache push system which provides:

  • ✅ Secure credential management (Vault, AWS Secrets Manager, systemd, etc.)
  • ✅ Permission controls (repository and branch restrictions)
  • ✅ Automatic credential loading
  • ✅ Support for multiple caches per job

Server-Side Cache Configuration

Configure caches in your server config (~/.config/ekaci/ekaci.toml):

[[caches]]
id = "production-s3"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"

[caches.credentials.vault]
address     = "https://vault.example.com:8200"
secret_path = "eka-ci/s3-credentials"
token_env   = "VAULT_TOKEN"

[caches.permissions]
allow_all = false
allowed_repos = ["myorg/*"]
allowed_branches = ["main", "release/*"]

Repository Cache Reference

Reference caches by ID in your .eka-ci/config.json:

{
  "jobs": {
    "my-package": {
      "file": "default.nix",
      "caches": ["production-s3"]
    }
  }
}

See configure-caches.md for detailed cache configuration options.


Manual Post-Build Hooks (Advanced)

For custom post-build operations beyond cache push, you can configure manual hooks.

Job Configuration Format

{
  "jobs": {
    "my-package": {
      "file": "default.nix",
      "post_build_hooks": [
        {
          "name": "push-to-cache",
          "command": ["nix", "copy", "--to", "s3://my-cache"],
          "env": {
            "AWS_PROFILE": "production"
          }
        }
      ],
      "fod_post_build_hooks": [
        {
          "name": "push-fods-public",
          "command": ["cachix", "push", "public-cache"]
        }
      ]
    }
  }
}

Hook Fields

  • name: Unique identifier for the hook (used in logging)
  • command: Array of command and arguments
  • env: (Optional) Additional environment variables

Hook Execution Behavior

  • Regular hooks: Run for all successful builds
  • FOD hooks: Run in addition to regular hooks for fixed-output derivations
  • Additive: Both regular and FOD-specific hooks execute for FODs
  • Async: Hooks run asynchronously and don't block build recording
  • Failure handling: Hook failures are logged but don't fail the build

Environment Variables

Each hook receives:

Nix-Compatible Variables

  • DRV_PATH: Path to the derivation file
  • OUT_PATHS: Space-separated list of output store paths

Extended eka-ci Variables

  • EKA_JOB_NAME: Name of the job from config
  • EKA_IS_FOD: "true" or "false"
  • EKA_SYSTEM: Build system (e.g., "x86_64-linux")
  • EKA_PNAME: Package name (if available)
  • EKA_BUILD_LOG_PATH: Path to build log
  • EKA_COMMIT_SHA: Git commit SHA

Custom Variables

Any additional variables defined in the hook's env field.
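
As an illustration, a hook can point at a small script that consumes these variables (a sketch; the script path and contents are hypothetical):

#!/usr/bin/env bash
# /etc/eka-ci/hooks/report.sh: example consumer of the hook environment
set -euo pipefail

echo "job=${EKA_JOB_NAME} system=${EKA_SYSTEM} fod=${EKA_IS_FOD}"

# OUT_PATHS is space-separated; rely on word splitting to iterate
for out in ${OUT_PATHS}; do
  echo "built ${out} (from ${DRV_PATH})"
done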

Example Use Cases

Push to S3 Binary Cache

{
  "post_build_hooks": [
    {
      "name": "push-s3",
      "command": ["nix", "copy", "--to", "s3://my-cache?region=us-west-2"],
      "env": {
        "AWS_PROFILE": "ci"
      }
    }
  ]
}

Push to Cachix

{
  "post_build_hooks": [
    {
      "name": "push-cachix",
      "command": ["cachix", "push", "mycache", "$OUT_PATHS"],
      "env": {
        "CACHIX_AUTH_TOKEN": "secret-token"
      }
    }
  ]
}

Different Caches for FODs

{
  "post_build_hooks": [
    {
      "name": "push-private",
      "command": ["nix", "copy", "--to", "s3://private-cache"]
    }
  ],
  "fod_post_build_hooks": [
    {
      "name": "push-public",
      "command": ["nix", "copy", "--to", "s3://public-cache"]
    }
  ]
}

Implementation Status

✅ Completed (Production Ready)

  • Hook types and data structures
  • Hook executor service with async processing
  • Database schema for config storage and hook tracking
  • Integration with RecorderService
  • Environment variable setup (Nix-compatible + extended)
  • FOD detection and additive hook execution
  • Logging infrastructure
  • HookExecutor initialized in SchedulerService
  • Job config stored in database when creating jobsets
  • Actual output paths queried from nix-store (implemented 2024-04-11)
  • Automatic cache push with credential loading (implemented 2024-04-11)
  • Support for all cache types (NixCopy, Cachix, Attic)

🚧 TODO (Future Enhancements)

  • Query pname from DrvInfo for richer context (low priority)
  • Implement actual log path lookup (low priority)
  • Add metrics for hook execution
  • Add tests for hook functionality

Testing

Testing Automatic Cache Push

To test the automatic cache push:

  1. Configure a cache in server config (~/.config/ekaci/ekaci.toml)
  2. Reference the cache in repository .eka-ci/config.json:
    {
      "jobs": {
        "my-package": {
          "file": "default.nix",
          "caches": ["production-s3"]
        }
      }
    }
    
  3. Trigger a build by opening a PR
  4. Monitor logs for cache push execution:
    journalctl -u eka-ci -f | grep -E "(cache|hook|nix copy)"
    
  5. Verify artifacts appear in your cache (S3, Cachix, etc.)

Expected log output:

DEBUG eka_ci_server::scheduler::recorder: Loaded credentials for cache 'production-s3'
DEBUG eka_ci_server::scheduler::recorder: Created cache push hook for cache 'production-s3'
DEBUG eka_ci_server::scheduler::recorder: Found 1 output path(s) for drv
DEBUG eka_ci_server::hooks::executor: Executing hook: push-production-s3
INFO  eka_ci_server::hooks::executor: Hook 'push-production-s3' completed successfully

Testing Manual Post-Build Hooks

To test manual hook implementation:

  1. Create a .eka-ci/config.json with post_build_hooks
  2. Trigger a build
  3. Check logs in {logs_dir}/{drv_hash}/hook-{name}.log
  4. Verify hook environment variables are set correctly
  5. Confirm FOD-specific hooks run for FODs

Future Enhancements

  1. Conditional Execution: Allow hooks to specify conditions (e.g., only on main branch)
  2. Retry Logic: Implement retry with backoff for failed hooks
  3. Hook Templates: Define reusable hook templates
  4. Dependency Graph: Allow hooks to depend on other hooks
  5. Timeout Configuration: Per-hook timeout configuration
  6. Rate Limiting: Limit concurrent hook executions to prevent resource exhaustion

Migration Guide

From No Hooks to Post-Build Hooks

  1. Run database migration: 20260409_job_config.sql
  2. Update .eka-ci/config.json to include post_build_hooks
  3. Deploy updated server with HookExecutor initialized
  4. Monitor hook execution logs

Nix post-build-hook Equivalents

| Nix Feature | eka-ci Equivalent |
|---|---|
| post-build-hook in nix.conf | post_build_hooks in job config |
| $OUT_PATHS env var | $OUT_PATHS env var |
| $DRV_PATH env var | $DRV_PATH env var |
| Global hook script | Per-job hook configuration |
| Synchronous execution | Asynchronous execution (non-blocking) |

Implementation Details

Cache Push Implementation (2026-04-11)

The automatic cache push feature was completed with the following additions to backend/server/src/scheduler/recorder.rs:

  1. build_cache_push_hook() - Async function that:

    • Loads credentials from configured source (Vault, AWS SM, etc.)
    • Builds appropriate command based on cache type (NixCopy, Cachix, Attic)
    • Returns PostBuildHook with credentials in environment
  2. get_drv_output_paths() - Queries actual output paths using nix-store --query --outputs

  3. Updated execute_hooks_for_drv() - Integration that:

    • Resolves cache IDs from job config
    • Checks cache permissions (repo/branch restrictions)
    • Loads credentials and builds hooks
    • Queries output paths from nix-store
    • Sends hook tasks to HookExecutor

How It Works

  1. After successful build, RecorderService calls execute_hooks_for_drv()
  2. Job config is retrieved from database (contains cache IDs)
  3. For each cache:
    • Cache config is looked up from server registry
    • Permissions are checked
    • Credentials are loaded asynchronously
    • Hook command is built
  4. Output paths are queried from nix-store
  5. HookTask is sent to HookExecutor with all hooks and credentials
  6. HookExecutor runs each hook sequentially, logging output

Security

  • Credentials loaded fresh for each build
  • Credentials passed through environment, never logged
  • Permission checks before any cache access
  • Separate credentials for each cache
  • Async execution doesn't block builds
  • Failures don't cascade (one bad cache doesn't affect others)

References

  • Cache Configuration Guide - Detailed cache setup
  • GitHub App Setup Guide - Credential sources
  • Nix post-build-hook documentation
  • Database schema: backend/server/sql/migrations/20260409_job_config.sql
  • Hook executor: backend/server/src/hooks/executor.rs
  • Cache push implementation: backend/server/src/scheduler/recorder.rs (lines 462-551, 630-644, 663-676)

LRU Cache Operational Runbook

Version: 1.0 · Date: 2026-04-07 · Status: Production Ready

Table of Contents

  1. Quick Reference
  2. Monitoring
  3. Capacity Tuning
  4. Troubleshooting
  5. Alerts
  6. Performance Optimization

Quick Reference

Configuration

Environment Variable:

export EKA_CI_GRAPH_LRU_CAPACITY=100000

Config File (~/.config/ekaci/ekaci.toml):

graph_lru_capacity = 100000

Default: 100,000 nodes

Key Metrics

| Metric | Description | Healthy Range |
|---|---|---|
| eka_ci_graph_cache_utilization | Cache fullness (0.0-1.0) | 0.5 - 0.8 |
| eka_ci_graph_cache_reloads_total | Cache misses (counter) | < 100/day |
| eka_ci_graph_pinned_nodes_total | Protected nodes | 50 - 500 |
| eka_ci_graph_nodes_total | Total nodes | < capacity |

Log Messages

Normal Operation:

INFO Cache status: 45000/100000 nodes (45.0% utilized), 123 pinned

Warning (80% utilization):

WARN Cache utilization elevated (82.3%): Monitor for potential capacity issues

Critical (90% utilization):

WARN Cache utilization HIGH (93.1%): Consider increasing EKA_CI_GRAPH_LRU_CAPACITY (current: 100000)

Monitoring

Grafana Dashboard

Panel 1: Cache Utilization (Gauge)

eka_ci_graph_cache_utilization * 100
  • Unit: Percent
  • Thresholds:
    • Green: < 70%
    • Yellow: 70-85%
    • Red: > 85%

Panel 2: Cache Size (Graph)

sum(eka_ci_graph_nodes_total)
  • Unit: Nodes
  • Show: Current, Max capacity

Panel 3: Cache Reload Rate (Graph)

rate(eka_ci_graph_cache_reloads_total[5m]) * 60
  • Unit: Reloads/min
  • Alert: > 10/min for 15 minutes

Panel 4: Reload Latency (Graph)

histogram_quantile(0.50, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m]))
histogram_quantile(0.90, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m]))
  • Unit: Seconds
  • Labels: p50, p90, p99

Panel 5: Pinned Nodes (Stat)

eka_ci_graph_pinned_nodes_total
  • Unit: Nodes
  • Description: Active builds

Panel 6: Eviction Candidates by Tier (Stacked Graph)

eka_ci_graph_eviction_candidates_total{tier="tier1_transitive_failure"}
eka_ci_graph_eviction_candidates_total{tier="tier2_completed_failure"}
eka_ci_graph_eviction_candidates_total{tier="tier3_completed_success"}

Key Performance Indicators (KPIs)

Healthy System:

  • Utilization: 50-70%
  • Reload rate: < 5/min
  • Reload latency (p99): < 50ms
  • Pinned nodes: 50-200

Concerning:

  • Utilization: > 80%
  • Reload rate: > 10/min
  • Reload latency (p99): > 100ms
  • Pinned nodes: > 1000

Critical:

  • Utilization: > 90%
  • Reload rate: > 50/min
  • Reload latency (p99): > 500ms
  • Cache thrashing

Capacity Tuning

Determining Optimal Capacity

Formula:

Optimal Capacity = (Peak Node Count × 1.5) + Buffer

Example:

  • Peak node count: 60,000
  • Optimal capacity: 60,000 × 1.5 = 90,000
  • Add buffer: 90,000 + 10,000 = 100,000

Capacity Sizing Guide

| Workload | Node Count | Recommended Capacity | Memory Usage |
|---|---|---|---|
| Small | < 10k | 20,000 | ~22 MB |
| Medium | 10k - 50k | 75,000 | ~83 MB |
| Large | 50k - 100k | 150,000 | ~165 MB |
| Very Large | 100k - 200k | 300,000 | ~330 MB |
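
These figures work out to roughly 1.1 KB per cached node (for example, 20,000 nodes × ~1.1 KB ≈ 22 MB), so memory usage scales linearly with the configured capacity.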

Increasing Capacity

When to increase:

  • Utilization consistently > 80%
  • Reload rate > 10/min
  • Warnings in logs every 5 minutes

How to increase:

  1. Calculate new capacity:

    New Capacity = Current Capacity × 1.5
    
  2. Set the environment variable (for a systemd service, set it in the unit via systemctl edit eka-ci or use graph_lru_capacity in ekaci.toml; an export in an interactive shell does not reach the service):

    export EKA_CI_GRAPH_LRU_CAPACITY=150000
    
  3. Restart service:

    systemctl restart eka-ci
    
  4. Monitor for 1 hour:

    eka_ci_graph_cache_utilization
    
  5. Verify:

    • Utilization < 70%
    • Reload rate < 5/min
    • No warnings

Decreasing Capacity

When to decrease:

  • Utilization consistently < 30%
  • Memory usage high (> 200 MB)
  • Zero cache reloads for 24+ hours

How to decrease:

  1. Calculate new capacity:

    New Capacity = Peak Node Count × 1.3
    
  2. Set environment variable:

    export EKA_CI_GRAPH_LRU_CAPACITY=75000
    
  3. Restart service:

    systemctl restart eka-ci
    
  4. Monitor closely for 24 hours:

    • Watch reload rate (should stay < 10/min)
    • Monitor utilization (should be 50-70%)

Troubleshooting

Problem 1: High Utilization (> 90%)

Symptoms:

  • Log warnings every 5 minutes
  • Potential cache thrashing
  • Slow build dispatch

Diagnosis:

# Check utilization
eka_ci_graph_cache_utilization

# Check growth over the last hour (delta of a gauge via a subquery)
delta(sum(eka_ci_graph_nodes_total)[1h:])

Solution:

  1. Immediate: Increase capacity by 50%

    export EKA_CI_GRAPH_LRU_CAPACITY=150000
    systemctl restart eka-ci
    
  2. Long-term: Calculate proper capacity based on workload

Prevention:

  • Set alert for 85% utilization
  • Review capacity quarterly

Problem 2: High Reload Rate (> 10/min)

Symptoms:

  • Frequent cache misses
  • Elevated database load
  • Slow API responses

Diagnosis:

# Reload rate
rate(eka_ci_graph_cache_reloads_total[5m]) * 60

# Which nodes are being reloaded?
# Check logs for "Cache miss: reloading"

Possible Causes:

Cause 1: Capacity Too Small

  • Utilization > 85%
  • Solution: Increase capacity

Cause 2: Workload Pattern Changed

  • Many terminal nodes evicted, then accessed again
  • Solution: Increase tier age thresholds

Cause 3: Hot Path Not Protected

  • is_buildable() nodes being evicted
  • Solution: Ensure touch_buildable_check() is called

Problem 3: High Memory Usage

Symptoms:

  • Process memory > 500 MB
  • OOM risk
  • Swap usage

Diagnosis:

# Memory estimate
eka_ci_graph_memory_bytes_estimate

# Utilization
eka_ci_graph_cache_utilization

Solutions:

If utilization < 50%:

  • Cause: Capacity too large
  • Fix: Decrease capacity to match peak workload

If utilization > 80%:

  • Cause: Legitimate high usage
  • Fix: Add more RAM or optimize elsewhere

Problem 4: Zero Reloads Despite Low Utilization

Symptoms:

  • Utilization < 30%
  • Zero cache reloads for days
  • High memory usage

Diagnosis:

# Reload count
eka_ci_graph_cache_reloads_total

# Utilization
eka_ci_graph_cache_utilization

Cause: Capacity oversized

Solution:

  1. Decrease capacity to improve efficiency
  2. Free up memory for other services

Problem 5: Slow Reload Latency (p99 > 100ms)

Symptoms:

  • High reload latency
  • Slow API responses
  • Database contention

Diagnosis:

# Reload latency
histogram_quantile(0.99, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m]))

# Reload rate
rate(eka_ci_graph_cache_reloads_total[5m])

Possible Causes:

Cause 1: High Reload Rate

  • Too many concurrent reloads
  • Database overwhelmed
  • Solution: Increase capacity to reduce reload frequency

Cause 2: Database Slow

  • Check database metrics
  • Optimize queries
  • Add indexes if needed

Cause 3: Large Nodes

  • Nodes with many dependencies
  • Solution: Optimize edge loading (future work)

Alerts

Prometheus Alert Rules

groups:
  - name: lru_cache_alerts
    rules:
      # Warning: High utilization
      - alert: LRUCacheUtilizationHigh
        expr: eka_ci_graph_cache_utilization > 0.90
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "LRU cache utilization is high ({{ $value | humanizePercentage }})"
          description: "Cache is {{ $value | humanizePercentage }} full. Consider increasing capacity."

      # Info: Elevated utilization
      - alert: LRUCacheUtilizationElevated
        expr: eka_ci_graph_cache_utilization > 0.80
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "LRU cache utilization is elevated ({{ $value | humanizePercentage }})"
          description: "Cache is {{ $value | humanizePercentage }} full. Monitor for growth."

      # Warning: High reload rate
      - alert: LRUCacheReloadRateHigh
        expr: rate(eka_ci_graph_cache_reloads_total[5m]) * 60 > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Cache reload rate is high ({{ $value }} reloads/min)"
          description: "Frequent cache misses detected. Capacity may be too small."

      # Info: Slow reloads
      - alert: LRUCacheReloadSlow
        expr: histogram_quantile(0.99, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m])) > 0.1
        for: 15m
        labels:
          severity: info
        annotations:
          summary: "Cache reloads are slow (p99: {{ $value }}s)"
          description: "Database may be under load or capacity is causing thrashing."

      # Info: Many pinned nodes
      - alert: LRUCacheManyPinnedNodes
        expr: eka_ci_graph_pinned_nodes_total > 1000
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "Many nodes pinned ({{ $value }})"
          description: "High number of active builds. This is normal during large builds."

Performance Optimization

Best Practices

  1. Set Capacity to 1.5× Peak Usage

    • Provides headroom for growth
    • Minimizes reload rate
    • Optimal utilization: 60-70%
  2. Call touch_buildable_check() After is_buildable()

    • Protects hot path nodes
    • Prevents thrashing on active builds
    if graph_handle.is_buildable(&drv_id) {
        graph_handle.touch_buildable_check(&drv_id);
        // ... dispatch build ...
    }
  3. Monitor Utilization Trends

    • Review every quarter
    • Adjust capacity as workload changes
    • Plan for growth
  4. Avoid Frequent Restarts

    • LRU cache is warmed up over time
    • Restarts cause cold cache (100% reload rate initially)
    • Allow 1 hour for warmup

Capacity Planning

Formula for Growth:

Future Capacity = Current Peak × Growth Factor × Headroom

Where:
- Growth Factor = Expected growth (1.2 = 20% growth)
- Headroom = Safety margin (1.5 = 50% headroom)

Example:

  • Current peak: 50,000 nodes
  • Expected 20% growth: 50,000 × 1.2 = 60,000
  • With 50% headroom: 60,000 × 1.5 = 90,000

Common Scenarios

Scenario 1: Large Build (200k drvs)

Expected Behavior:

  • Utilization rises to 80-90%
  • Pinned nodes: 500-2000 (active builds)
  • Reload rate: 5-10/min (terminal nodes evicted)
  • Warnings logged (normal)

Action: Monitor, no action needed unless reload rate > 20/min


Scenario 2: Idle System

Expected Behavior:

  • Utilization: 5-10% (only completed builds)
  • Pinned nodes: 0-5
  • Reload rate: 0/min
  • No warnings

Action: Consider decreasing capacity to save memory


Scenario 3: Continuous Integration

Expected Behavior:

  • Utilization: 40-60% (steady state)
  • Pinned nodes: 50-200 (concurrent builds)
  • Reload rate: < 5/min
  • No warnings

Action: Optimal state, no action needed


Maintenance

Quarterly Review

  1. Check peak utilization (last 90 days):

    max_over_time(eka_ci_graph_cache_utilization[90d])
    
  2. Check reload rate:

    avg_over_time(rate(eka_ci_graph_cache_reloads_total[1h])[90d:1h]) * 60
    
  3. Adjust capacity if needed:

    • If peak > 80%: Increase by 50%
    • If peak < 40%: Decrease by 25%

Version Upgrades

Before upgrade:

  • Note current capacity setting
  • Export metrics for comparison

After upgrade:

  • Verify capacity setting persists
  • Compare metrics (should be similar)
  • Monitor for 24 hours

Emergency Procedures

Cache Thrashing (Reload Rate > 50/min)

Immediate Action:

  1. Double capacity:

    export EKA_CI_GRAPH_LRU_CAPACITY=200000
    systemctl restart eka-ci
    
  2. Monitor for 15 minutes

  3. If still thrashing, double again

Follow-up:

  • Investigate root cause
  • Review workload patterns
  • Consider permanent capacity increase

Out of Memory

Immediate Action:

  1. Stop the service (clears the in-memory cache):

    systemctl stop eka-ci
    
  2. Reduce capacity by 50% (in the unit environment or ekaci.toml):

    export EKA_CI_GRAPH_LRU_CAPACITY=50000
    
  3. Start the service again:

    systemctl start eka-ci
    
  4. Monitor memory usage

Follow-up:

  • Identify memory leak (if any)
  • Right-size capacity for available RAM
  • Consider adding more RAM

Support

Logs to Collect

# Cache status logs (last hour)
journalctl -u eka-ci --since "1 hour ago" | grep "Cache status"

# Warnings (last 24 hours)
journalctl -u eka-ci --since "1 day ago" | grep -E "WARN|ERROR"

# Cache misses (last hour)
journalctl -u eka-ci --since "1 hour ago" | grep "Cache miss"

Metrics to Export

# Current state
curl http://localhost:8080/metrics | grep eka_ci_graph

# Or via Prometheus query
eka_ci_graph_cache_utilization
eka_ci_graph_cache_reloads_total
eka_ci_graph_nodes_total

Summary

Key Takeaways:

  1. Monitor utilization - Keep between 50-80%
  2. Watch reload rate - Should be < 5/min normally
  3. Tune capacity - 1.5× peak usage is optimal
  4. Set alerts - For 85% utilization and high reload rate
  5. Review quarterly - Adjust as workload changes

Healthy System Checklist:

  • ✅ Utilization: 50-70%
  • ✅ Reload rate: < 5/min
  • ✅ No warnings in logs
  • ✅ Pinned nodes: 50-500
  • ✅ Reload latency (p99): < 50ms

Monitoring & Metrics

Eka CI exposes Prometheus metrics and structured logs. Together they cover build queue health, cache utilization, GitHub integration, and rebuild detection.

Prometheus metrics

Metrics are served at /metrics on the address configured in [web]. Common series:

MetricTypeDescription
eka_ci_build_queue_depthgaugePending builds per platform queue.
eka_ci_build_duration_secondshistogramEnd-to-end build wall time.
eka_ci_build_outcome_totalcounterBuilds by outcome (success, failed, cancelled).
eka_ci_graph_cache_hits_totalcounterLRU cache hits for the dependency graph.
eka_ci_graph_cache_misses_totalcounterLRU cache misses.
eka_ci_graph_cache_sizegaugeCurrent number of nodes in the LRU cache.
eka_ci_webhook_processing_secondshistogramWebhook handler latency.
eka_ci_rebuild_counthistogramRebuilds detected per PR.
eka_ci_change_summary_render_secondshistogramChange-summary render time.

For deeper guidance on the cache metrics specifically, see LRU Cache Tuning.

Useful queries

# Build queue depth, per platform
eka_ci_build_queue_depth

# Cache hit rate over 5 minutes
rate(eka_ci_graph_cache_hits_total[5m])
  / (rate(eka_ci_graph_cache_hits_total[5m])
     + rate(eka_ci_graph_cache_misses_total[5m]))

# 95th percentile webhook latency
histogram_quantile(0.95,
  rate(eka_ci_webhook_processing_seconds_bucket[5m]))

Logging

Logs are emitted via the tracing crate as structured records. Verbosity is controlled through RUST_LOG:

# Set a global level
RUST_LOG=info eka-ci-server

# Per-module filters
RUST_LOG=eka_ci_server::scheduler=debug,eka_ci_server=info eka-ci-server

When run under systemd, view logs with:

journalctl -u eka-ci -f

Key log targets:

  • eka_ci_server::scheduler — build scheduling and queue transitions.
  • eka_ci_server::webhooks — incoming GitHub events.
  • eka_ci_server::graph — dependency graph and LRU cache activity.
  • eka_ci_server::change_summary — change-summary pipeline.
  • eka_ci_server::cache_push — cache push results and post-build hooks.

A starting set of alerts for production:

  • eka_ci_build_queue_depth is high for too long — pending work is not draining.
  • Webhook 5xx rate non-zero — GitHub deliveries are being rejected.
  • Cache hit rate < 0.6 sustained — LRU is undersized; see LRU Cache Tuning.
  • Change-summary check stuck pending > 10 minutes — see Change Summaries.

The runbook pages for the LRU cache and change-summary pipeline include more specific threshold and remediation guidance.

Eka CI Backend Architecture

Overview

Eka CI is a Nix-based Continuous Integration server built in Rust using async/await with Tokio. The system follows a multi-service actor-based architecture where independent services communicate via message passing through Tokio's mpsc channels.

The backend is designed to efficiently build large Nix monorepos (like Nixpkgs) with intelligent scheduling, remote builder support, and comprehensive build state tracking.

Core Design Principles

  • Service isolation: Each major component runs as an independent service with its own message queue
  • Async-first: Built on Tokio for high concurrency and I/O efficiency
  • Type-safe state machine: Build states are explicitly modeled with exhaustive pattern matching
  • Dependency awareness: Tracks derivation dependency graphs for intelligent scheduling
  • GitHub-native: First-class integration with GitHub Pull Requests and check runs

System Architecture

Services Overview

The system consists of eight core services initialized in backend/server/src/services/mod.rs:

  1. DbService - SQLite database operations with connection pooling
  2. GitService - Git repository cloning, fetching, and worktree management
  3. RepoReader - CI configuration parsing from .ekaci/config.json
  4. EvalService - Nix expression evaluation using nix-eval-jobs
  5. SchedulerService - Multi-tier build orchestration (composed of 3 sub-services)
  6. GitHubService - GitHub API integration (check runs, PR status updates via Octocrab)
  7. WebService - HTTP API and web interface (Axum)
  8. UnixService - Unix domain socket for CLI client communication

Service Communication Pattern

┌─────────────┐
│   GitHub    │
│   Webhook   │
└──────┬──────┘
       │
       ▼
┌─────────────┐    GitTask     ┌─────────────┐
│ WebService  │───────────────▶│ GitService  │
└─────────────┘                └──────┬──────┘
                                      │ RepoTask
                                      ▼
                               ┌─────────────┐
                               │ RepoReader  │
                               └──────┬──────┘
                                      │
                        ┌─────────────┴─────────────┐
                        │                           │
                   EvalTask                    CheckTask
                        │                           │
                        ▼                           ▼
                 ┌─────────────┐           ┌──────────────┐
                 │ EvalService │           │ChecksExecutor│
                 └──────┬──────┘           └──────┬───────┘
                        │                         │
                  IngressTask                CheckResult
                        │                         │
                        ▼                         │
              ┌──────────────────┐                │
              │ IngressService   │                │
              └────────┬─────────┘                │
                       │                          │
                 BuildRequest                     │
                       │                          │
                       ▼                          │
              ┌──────────────────┐                │
              │   BuildQueue     │                │
              └────────┬─────────┘                │
                       │                          │
        ┌──────────────┼──────────────┐           │
        │              │              │           │
     [FOD]         [Local]        [Remote]        │
        │              │              │           │
        └──────────────┴──────────────┘           │
                       │                          │
                       ▼                          │
              ┌──────────────────┐                │
              │  BuilderThread   │                │
              └────────┬─────────┘                │
                       │                          │
                  NixBuild                        │
                       │                          │
                       ▼                          │
              ┌──────────────────┐                │
              │ RecorderService  │◀───────────────┘
              └────────┬─────────┘
                       │
          ┌────────────┴────────────┐
          │                         │
    Update Database          GitHubTask
          │                         │
          ▼                         ▼
    ┌──────────┐           ┌──────────────┐
    │ DbService│           │GitHubService │
    └──────────┘           └──────────────┘

Data Model

Build State Machine

The DrvBuildState enum (backend/server/src/ci/mod.rs:44) defines a comprehensive state machine, sketched as a Rust enum after the state guarantees below:

Queued
  │
  ├─▶ Buildable ─────▶ Building ─────▶ Completed(Success)
  │                                 │
  │                                 └─▶ Completed(Failure)
  │                                      │
  │                                      └─▶ FailedRetry ─┐
  │                                           (1st fail)   │
  │                                                        │
  │◀───────────────────────────────────────────────────────┘
  │
  ├─▶ TransitiveFailure (dep failed, propagated)
  │
  ├─▶ Blocked (dep interrupted)
  │
  └─▶ Interrupted(kind)
       ├─ OutOfMemory
       ├─ Timeout
       ├─ Cancelled
       └─ ProcessDeath

State Guarantees:

  • Queued: All drvs start here when discovered by evaluation
  • Buildable: Scheduler guarantees all dependencies are successful
  • FailedRetry: Automatic retry (one chance for transient failures)
  • TransitiveFailure: Permanent block until upstream fixed
  • Completed: Terminal state, never changes
  • All transitions recorded in DrvBuildEvent with timestamps
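
As a reading aid, the states above map onto a Rust enum along these lines (a hypothetical reconstruction; the real definition in backend/server/src/ci/mod.rs may differ in detail):

enum DrvBuildState {
    Queued,
    Buildable,
    Building,
    FailedRetry,
    TransitiveFailure,
    Blocked,
    Interrupted(InterruptKind),
    Completed(BuildOutcome),
}

enum InterruptKind {
    OutOfMemory,
    Timeout,
    Cancelled,
    ProcessDeath,
}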

CI Workflow Processing

Configuration Format

CI configurations are stored in .ekaci/config.json at the repository root:

{
  "jobs": {
    "job-name": {
      "file": "/path/to/nix/file.nix",
      "allow_eval_failures": true
    }
  }
}

Jobs - Nix-based builds evaluated by nix-eval-jobs
Checks - Sandboxed imperative commands (linters, formatters, tests)

Complete End-to-End Flow

1. GitHub Trigger

GitHub PR opened/updated → Webhook → WebService → GitTask::GitHubCheckout

2. Repository Checkout (backend/server/src/services/git.rs)

GitService:
  - Clone base repo if not exists
  - Fetch PR branch
  - Create git worktrees for base and head commits
  - Send RepoTask::ReadGitHub to RepoReader

Git worktrees allow parallel access to different commits without checkout conflicts.

3. Configuration Parsing (backend/server/src/services/repo_reader.rs)

RepoReader:
  - Read .ekaci/config.json
  - Parse via CIConfig::from_str() (serde)
  - For each job:
    - Check if already processed (db.has_jobset)
    - Create EvalJob with file path
    - Send EvalTask::GithubJobPR to EvalService
  - For each check:
    - Send CheckTask to ChecksExecutor

4. Nix Evaluation (backend/server/src/services/eval.rs)

EvalService:
  - Create "CI Configure Gate" GitHub check run
  - Run: nix-eval-jobs --flake .#{job.file}
  - Parse JSON stream of NixEvalDrv structs

  If evaluation fails and allow_failures=false:
    - Send GitHubTask::FailCIEvalJob
    - STOP (don't queue builds)

  - Send GitHubTask::CreateJobSet to GitHubService
  - deep_traverse() to discover all dependencies:
    - Run: nix-store --query --requisites {drv}
    - Fetch info: nix derivation show {drv}...
    - Batch insert into database (150 drvs at a time)
    - Insert dependency relationships (DrvRefs)
    - Send IngressTask::EvalRequest for each drv

Optimization: LRU cache (5000 entries) avoids re-fetching drv info.

5. Build Scheduling

IngressService (backend/server/src/scheduler/ingress.rs)
IngressService receives IngressTask::EvalRequest:
  - Skip if drv in terminal state
  - Query: is_drv_buildable()
    - Check all dependencies in DrvRefs
    - Ensure all are Completed(Success)
  - If buildable:
    - Update state to Buildable
    - Send BuildRequest to BuildQueue

BuildQueue (backend/server/src/scheduler/build/queue.rs)

Routes builds to platform-specific queues based on drv.system:

  • x86_64-linux
  • aarch64-linux
  • x86_64-darwin
  • aarch64-darwin

PlatformQueue (backend/server/src/scheduler/build/system_queue.rs)

Intelligent routing based on build characteristics:

if drv.is_fod {
    → FOD Builder (dedicated for Fixed-Output Derivations)
} else if drv.prefer_local_build {
    → Local Builder
} else {
    → Remote Builder Pool or Local
}

Rationale: FOD builds (like fetchurl) are network-bound and benefit from dedicated capacity to avoid blocking compute-heavy builds.

6. Build Execution

BuilderThread (backend/server/src/scheduler/build/builder_thread.rs)
- Manages JoinSet of concurrent builds
- Respects max_jobs limit per builder
- Spawns NixBuild task for each build

NixBuild (backend/server/src/scheduler/build/nix_build.rs)

Core build executor:

1. Create log file: {logs_dir}/{drv_hash}/build.log
2. Spawn: nix-build {drv_path} --builders '{config}'
3. Stream stdout/stderr to log file in real-time
4. Monitor for timeout:
   - Reset timer on each line of output
   - Kill process if no_output_timeout_seconds exceeded
5. On completion:
   - Success: Attempt to fetch substituter logs via 'nix log'
   - Failure: Return BuildOutcome::Failure
   - Timeout: Return BuildOutcome::Timeout

Timeout Behavior: The timeout is output-based, not total time. A build can run indefinitely as long as it produces output.

7. Build Recording (backend/server/src/scheduler/recorder.rs)

RecorderService receives RecorderTask:
  - Update Drv.build_state in database

  On Success:
    - Clear transitive failures for this drv
    - Re-queue blocked downstream drvs
    - Send IngressTask::CheckBuildable for each downstream drv

  On Failure (first time):
    - Update state to FailedRetry
    - Re-queue immediately (same IngressTask)

  On Failure (second time):
    - Mark as permanent Completed(Failure)
    - Propagate TransitiveFailure to all downstream drvs

  - Send GitHubTask::UpdateBuildStatus

  If all jobs in jobset concluded:
    - Determine conclusion (success if no new/changed failures)
    - Send GitHubTask::CompleteCIEvalJob

Retry Logic: Single automatic retry handles transient failures (network issues, temporary resource constraints).

8. GitHub Status Updates (backend/server/src/services/github.rs)

GitHubService:
  - Use Octocrab (GitHub API client)
  - Create check runs (lazily on failure to reduce noise)
  - Update check run status:
    - queued → in_progress → completed
  - Set conclusion:
    - success / failure / timed_out / cancelled
  - Complete eval gate check runs when jobset finished

Lazy Check Run Creation: Only create check runs for new/changed jobs that fail. This reduces PR noise for large derivation sets where most builds are unchanged and successful.

Scheduler Architecture Deep Dive

The scheduler is the most complex component, consisting of 3 tiers:

Tier 1: IngressService

Responsibility: Determine build eligibility

// backend/server/src/scheduler/ingress.rs
IngressTask::EvalRequest { drv_path } → {
    if is_terminal_state(drv_path) {
        return; // Already built
    }

    if is_drv_buildable(drv_path) {
        update_state(drv_path, Buildable);
        send(BuildRequest { drv_path });
    }
}

Key query: is_drv_buildable() checks that all dependencies are Completed(Success).

Tier 2: BuildQueue

Responsibility: Platform routing

// backend/server/src/scheduler/build/queue.rs
BuildRequest { drv } → {
    match drv.system {
        "x86_64-linux" => x86_64_linux_queue,
        "aarch64-linux" => aarch64_linux_queue,
        "x86_64-darwin" => x86_64_darwin_queue,
        "aarch64-darwin" => aarch64_darwin_queue,
    }
}

Tier 3: PlatformQueue

Responsibility: Builder selection and capacity management

// backend/server/src/scheduler/build/system_queue.rs
struct PlatformQueue {
    fod_builder: BuilderHandle,          // Fixed-Output Derivations
    local_builder: BuilderHandle,        // Local builds
    remote_builders: Vec<BuilderHandle>, // Remote builder pool
}

impl PlatformQueue {
    fn route_build(&self, drv: &Drv) -> BuilderHandle {
        if drv.is_fod {
            self.fod_builder
        } else if drv.prefer_local_build {
            self.local_builder
        } else {
            // Round-robin or capacity-aware selection
            self.select_remote_builder()
        }
    }
}

Builder Types:

  • FOD Builder: Dedicated capacity for network-bound builds (fetchurl, fetchFromGitHub)
  • Local Builder: Runs nix-build on the CI server itself
  • Remote Builder: SSH-based remote Nix builders (configured via RemoteBuilder)

Capacity Management: Each builder has a max_jobs limit. The PlatformQueue tracks active builds per builder and queues excess work.
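
A capacity-aware selection for the remote pool might look like the following sketch (the BuilderHandle accessors here are hypothetical; the real selection logic may differ):

fn select_remote_builder(builders: &[BuilderHandle]) -> Option<&BuilderHandle> {
    builders
        .iter()
        .filter(|b| b.active_jobs() < b.max_jobs()) // skip saturated builders
        // Prefer higher speed_factor, then the most free capacity.
        .max_by_key(|b| (b.speed_factor(), b.max_jobs() - b.active_jobs()))
}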

Builder Health Checking

// backend/server/src/scheduler/build/builder.rs
use tokio::process::Command;

impl Builder {
    async fn is_available(&self) -> bool {
        // Run: nix store ping --store {uri}
        Command::new("nix")
            .args(["store", "ping", "--store", &self.uri])
            .status()
            .await
            .map(|s| s.success())
            .unwrap_or(false)
    }
}

Builders are health-checked before use to avoid queueing builds to unavailable remotes.

Checks Security Model

Sandboxing via birdcage:

  • Filesystem isolation (only sees /nix/store and checkout directory)
  • Network isolation (configurable per-check)
  • No access to home directory or system files
  • Prevents arbitrary file access outside checkout

Nix Package Provisioning:

nix-shell -p nixfmt statix --run 'env'

Eka CI captures the PATH and environment that nix-shell produces with the requested packages available, then runs the check command in that environment inside the sandbox.

Use Cases

  • Linters: nixfmt --check, statix check
  • Tests: pytest, cargo test (without Nix wrapping)
  • Formatters: prettier --check, black --check
  • Custom scripts: Any command that doesn't require a full Nix build

Database Storage

GitHubCheckSets:
  (sha, check_name, owner, repo_name) → check_id

CheckResult:
  check_id → (success, exit_code, stdout, stderr, duration_ms, executed_at)

CheckRunInfo:
  check_id → (check_run_id, check_run_node_id)

Similar to job-based builds, checks integrate with GitHub check runs for PR status reporting.

Build Features

Remote Builders

Configured via RemoteBuilder struct:

struct RemoteBuilder {
    uri: String,              // ssh://user@host or nix-daemon:///
    platforms: Vec<String>,   // ["x86_64-linux", "i686-linux"]
    max_jobs: u32,            // Concurrent build limit
    speed_factor: u32,        // Priority hint (higher = prefer)
}

Passed to nix-build via --builders flag:

nix-build /nix/store/xxx.drv --builders 'ssh://builder1 x86_64-linux,i686-linux 10 1'
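
The mapping from a RemoteBuilder to one entry in that string is mechanical: URI, comma-separated platforms, max_jobs, then speed_factor. A hypothetical rendering helper:

fn to_builders_entry(b: &RemoteBuilder) -> String {
    // Format: "<uri> <platform,...> <max_jobs> <speed_factor>"
    format!("{} {} {} {}", b.uri, b.platforms.join(","), b.max_jobs, b.speed_factor)
}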

Build Timeout Handling

Output-based timeout (not total duration):

// backend/server/src/scheduler/build/nix_build.rs
let no_output_timeout = Duration::from_secs(config.no_output_timeout_seconds);

loop {
    // Wait for the next output line, but only as long as the timeout allows.
    match tokio::time::timeout(no_output_timeout, stdout.next_line()).await {
        Ok(line_result) => match line_result? {
            // Any output resets the timeout window.
            Some(line) => log_file.write_all(line.as_bytes()).await?,
            // Stream closed: the build process finished.
            None => break,
        },
        Err(_) => {
            // No output within the window: kill the hung build.
            process.kill().await?;
            return BuildOutcome::Timeout;
        }
    }
}

Rationale: Large builds (like LLVM) may run for hours but should timeout if they hang (no output).

Log Capture

Real-time streaming:

let log_path = format!("{}/{}/build.log", logs_dir, drv_hash);
let mut log_file = BufWriter::new(File::create(log_path).await?);

while let Some(line) = stdout.next_line().await? {
    log_file.write_all(line.as_bytes()).await?;
}

log_file.flush().await?;

Post-build log fetching:

nix log /nix/store/xxx.drv

For substituted builds (downloaded from a cache), nix-build may not produce output, so the server attempts to fetch the log from the binary cache instead.

Log serving: WebService exposes logs via /logs/{drv_hash} endpoint.

Dependency Graph Tracking

Insertion (backend/server/src/services/eval.rs:deep_traverse):

// Query all dependencies
let deps = nix_store_query_requisites(drv_path).await?;

// Batch insert relationships
db.insert_drv_refs(deps).await?;

Query (backend/server/src/scheduler/ingress.rs:is_drv_buildable):

SELECT COUNT(*) FROM DrvRefs
WHERE referrer = ?
  AND reference NOT IN (
    SELECT drv_path FROM Drv
    WHERE build_state = 'Completed(Success)'
  )

If the count is greater than zero, the drv still has dependencies that have not completed successfully, so it is not yet buildable.
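
Issued through SQLx, this reduces to a scalar query. A minimal sketch (assuming a SqlitePool handle; the real function lives in backend/server/src/scheduler/ingress.rs):

async fn is_drv_buildable(pool: &sqlx::SqlitePool, drv_path: &str) -> sqlx::Result<bool> {
    let unmet: i64 = sqlx::query_scalar(
        "SELECT COUNT(*) FROM DrvRefs
         WHERE referrer = ?
           AND reference NOT IN (
             SELECT drv_path FROM Drv
             WHERE build_state = 'Completed(Success)')",
    )
    .bind(drv_path)
    .fetch_one(pool)
    .await?;
    Ok(unmet == 0)
}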

Transitive Failure Propagation

When a drv fails permanently:

// backend/server/src/scheduler/recorder.rs
async fn propagate_transitive_failure(drv_path: &str) {
    // Find all downstream drvs
    let downstream = db.query_downstream_drvs(drv_path).await?;

    for dep_drv in downstream {
        db.insert_transitive_failure(dep_drv, drv_path).await?;
        db.update_build_state(dep_drv, TransitiveFailure).await?;
    }
}

Benefits:

  • Prevents wasting resources building drvs that cannot succeed
  • Clear attribution of why a build is blocked
  • Can be cleared if upstream drv is fixed and rebuilt

GitHub Integration

OAuth Authentication

Flow (backend/server/src/auth/):

  1. User clicks "Login with GitHub"
  2. Redirect to GitHub OAuth authorize URL
  3. GitHub redirects back with code
  4. Exchange code for access token
  5. Fetch user info, create JWT session token
  6. Store in browser cookie

Authorization:

  • Optional require_approval flag
  • ApprovedUsers table tracks allowed GitHub usernames
  • Non-approved users can view but not trigger builds

Check Runs

Lazy Creation Strategy:

Only create GitHub check runs for:

  • New jobs (not in base commit)
  • Changed jobs (drv_path differs between base and head)
  • AND the job failed

Rationale: Large repos may have thousands of derivations. Creating check runs for every successful unchanged build clutters the PR.

Implementation (backend/server/src/services/github.rs:update_build_status):

async fn update_build_status(drv: &Drv, job: &Job) {
    // Only create check run if:
    // 1. Job is New or Changed
    // 2. Build failed
    if (job.difference_type == DifferenceType::New ||
        job.difference_type == DifferenceType::Changed) &&
       drv.build_state.is_failure() {

        let check_run = octocrab
            .checks(owner, repo)
            .create_check_run(job.job_name, sha)
            .status(Status::Completed)
            .conclusion(Conclusion::Failure)
            .send()
            .await?;

        db.insert_check_run_info(drv.drv_path, check_run.id).await?;
    }
}

Eval Gate Check Run

Purpose: Indicates whether CI configuration evaluation succeeded.

Created for: Every jobset (combination of commit + job_name)

Completion: When all drvs in the jobset reach terminal state, determine overall conclusion:

  • Success: No new or changed drvs failed
  • Failure: At least one new or changed drv failed

This provides a single aggregated status for the entire jobset.
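
The aggregation rule amounts to a small predicate over the jobset's jobs (a hypothetical sketch reusing the DifferenceType and Conclusion names from this page):

fn jobset_conclusion(jobs: &[Job]) -> Conclusion {
    // Only failures in new or changed jobs count against the jobset;
    // failures inherited from the base commit do not.
    let relevant_failure = jobs.iter().any(|j| {
        matches!(j.difference_type, DifferenceType::New | DifferenceType::Changed)
            && j.build_state.is_failure()
    });
    if relevant_failure {
        Conclusion::Failure
    } else {
        Conclusion::Success
    }
}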

Monitoring and Observability

Prometheus Metrics

Exposed endpoint: /metrics

Metrics (backend/server/src/metrics.rs):

# Build queue depth by platform
active_builds{platform="x86_64-linux"} 15
queued_builds{platform="x86_64-linux"} 42

# Process metrics (via process_collector)
process_cpu_seconds_total 1234.56
process_resident_memory_bytes 524288000
process_virtual_memory_bytes 2147483648

Grafana Dashboard (suggested):

  • Build throughput (builds/minute)
  • Queue depth trends
  • Builder utilization
  • Failure rate by job

Structured Logging

Libraries: tracing, tracing-subscriber

Log levels:

  • ERROR: Build failures, service panics
  • WARN: Retry attempts, transitive failures
  • INFO: Build starts/completions, state transitions
  • DEBUG: Service message passing, database queries
  • TRACE: Detailed build output

Key spans:

#[tracing::instrument(skip(db))]
async fn build_drv(drv_path: &str, db: &DbService) -> BuildResult {
    // Automatically logs function entry/exit with timing
}

Build Logs

Storage: {logs_dir}/{drv_hash}/build.log

Rotation: Not implemented (logs accumulate)

Access:

  • Web UI: View builds and their logs
  • API: GET /logs/{drv_hash}
  • CLI: ekaci logs {drv_hash}

Configuration

Server Configuration

Environment variables:

DATABASE_URL=sqlite:///var/lib/eka-ci/eka.db
LOGS_DIR=/var/lib/eka-ci/logs
GITHUB_APP_ID=123456
GITHUB_APP_PRIVATE_KEY=/path/to/key.pem
BIND_ADDR=0.0.0.0:8080

Builder Configuration

Local builder:

Builder::local(max_jobs: 40)

Remote builders:

RemoteBuilder {
    uri: "ssh://builder@10.0.0.5".to_string(),
    platforms: vec!["x86_64-linux".to_string()],
    max_jobs: 20,
    speed_factor: 1,
}

FOD builder (special local pool):

Builder::local(max_jobs: 10) // Separate capacity for FODs

Timeout Configuration

no_output_timeout_seconds: 3600  // 1 hour default

Graceful Shutdown

Signal handling (backend/server/src/main.rs):

let cancellation_token = CancellationToken::new();

tokio::signal::ctrl_c().await?;
cancellation_token.cancel();

// All services use run_until_cancelled()
tokio::select! {
    _ = service.run() => {}
    _ = cancellation_token.cancelled() => {}
}

// Wait for all services to drain
tokio::time::sleep(Duration::from_secs(5)).await;

// Close database pool
db.close().await;

Service shutdown behavior:

  • Services drain message queues (process pending tasks)
  • In-flight builds are not interrupted (complete naturally)
  • Database writes are flushed before exit

Performance Characteristics

Concurrency

  • Service isolation: Each service runs independently, maximizing CPU utilization
  • Per-builder parallelism: Configurable max_jobs (default: 40 local, 10 FOD)
  • Database pooling: SQLite connection pool with multiple readers
  • Async I/O: Tokio runtime with work-stealing scheduler

Scalability

Horizontal scaling (not implemented):

  • Services could be split across processes
  • Message passing via network channels (e.g., Redis pub/sub)
  • Distributed builder pool

Vertical scaling:

  • Increase max_jobs for builders
  • Add remote builders for more capacity
  • Increase database connection pool size

Bottlenecks

  1. SQLite: Single-writer limitation for database updates
  2. Evaluation: nix-eval-jobs is CPU-bound (mitigated by caching)
  3. Local builder: Limited by machine resources
  4. Log I/O: Large builds produce MB of logs (mitigated by streaming)

Security Considerations

Checks Sandboxing

Threat model: Malicious .ekaci/config.json in untrusted PR

Mitigations:

  • Filesystem isolation (only sees checkout + /nix/store)
  • Network isolation (disabled by default)
  • No home directory access
  • No access to CI server's SSH keys, secrets, etc.

Limitations:

  • Commands run as the CI server user (within sandbox)
  • No resource limits (memory, CPU time) enforced
  • DoS possible via infinite loops (timeouts required)

Nix Build Isolation

Threat model: Malicious Nix expressions in untrusted PR

Mitigations:

  • Nix sandbox (enabled by default)
    • Isolated /tmp
    • No network access (except FODs)
    • Limited filesystem view
  • Separate user account for Nix builds (nixbld group)

Limitations:

  • Sandbox escapes may exist in Nix itself
  • Resource exhaustion possible (disk, memory)
  • Recommendation: Run CI server in isolated environment (container, VM)

GitHub Authentication

Webhook validation:

  • HMAC signature verification using GitHub App secret (see the sketch below)
  • Prevents forged webhook events
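
A minimal sketch of that verification using the hmac, sha2, and hex crates (assuming the raw request body and the X-Hub-Signature-256 header value):

use hmac::{Hmac, Mac};
use sha2::Sha256;

fn verify_webhook(secret: &[u8], body: &[u8], signature_header: &str) -> bool {
    // GitHub sends "sha256=<hex digest>" in X-Hub-Signature-256.
    let Some(hex_sig) = signature_header.strip_prefix("sha256=") else {
        return false;
    };
    let Ok(expected) = hex::decode(hex_sig) else {
        return false;
    };
    let mut mac =
        Hmac::<Sha256>::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(body);
    // verify_slice performs a constant-time comparison.
    mac.verify_slice(&expected).is_ok()
}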

API authentication:

  • GitHub App installation tokens (short-lived)
  • Scoped permissions (checks:write, contents:read)

Session management:

  • JWT tokens with expiration
  • HttpOnly cookies (XSS mitigation)

Future Enhancements

Potential Improvements

  1. Distributed architecture: Split services across machines for scale
  2. PostgreSQL support: Overcome SQLite concurrency limits
  3. Build caching: Integrate with Nix binary caches (Cachix, Attic)
  4. Resource limits: Enforce memory/CPU limits via cgroups
  5. Multi-tenancy: Support multiple organizations with isolation
  6. Build prioritization: Prioritize PRs from approved contributors
  7. Log compression: Gzip old logs to reduce storage
  8. Web UI improvements: Real-time build progress, failure analysis
  9. Metrics dashboards: Built-in Grafana/Prometheus integration
  10. Notification system: Email/Slack alerts for build failures

Known Limitations

  1. SQLite scaling: Single writer becomes bottleneck at high concurrency
  2. No build artifact storage: Relies on Nix store (garbage collected)
  3. Limited failure analysis: No automatic error categorization
  4. Manual builder management: No auto-scaling of remote builders
  5. No incremental evaluation: Re-evaluates entire job on every commit

References

Key Files

  • Service initialization: backend/server/src/services/mod.rs
  • CI configuration: backend/server/src/ci/config.rs
  • Build state machine: backend/server/src/ci/mod.rs
  • Scheduler tiers: backend/server/src/scheduler/
  • Nix build execution: backend/server/src/scheduler/build/nix_build.rs
  • Checks executor: backend/server/src/checks/executor.rs
  • GitHub integration: backend/server/src/services/github.rs
  • Database migrations: backend/server/sql/migrations/

External Dependencies

  • Tokio: Async runtime
  • Axum: Web framework
  • SQLx: Async SQL with compile-time query checking
  • Octocrab: GitHub API client
  • Serde: JSON serialization
  • Birdcage: Sandboxing via Linux namespaces
  • nix-eval-jobs: Parallel Nix evaluation (external tool)
  • Nix: Build execution and derivation management

Last Updated: 2026-02-13 · Nix Version: 2.x compatible · Rust Edition: 2021