Introduction
Eka CI is a Continuous Integration server purpose-built for Nix projects. It is designed to make reviewing Nix-based pull requests fast and trustworthy, especially for repositories that are too large to be reviewed by hand on every change.
The goal is to answer one question as quickly and as reliably as possible:
Should I merge this PR?
To do that, Eka CI focuses on the things that actually matter for a Nix repository:
- Does evaluation still succeed?
- Which packages were added, removed, newly succeed, or newly fail?
- What is the closure-size and dependency impact of the change?
- What does the rebuild blast radius look like across systems?
Manual review processes do not scale to repositories the size of Nixpkgs. Eka CI replaces that workflow with a small set of strong signals attached directly to each pull request.
What Eka CI provides
- GitHub App integration — webhook-based event handling, check runs, merge queue support, and fine-grained credential management.
- Nix-aware build orchestration — dependency graph tracking with an LRU cache, multi-tier build queues, a dedicated FOD queue, remote builders, and requiredSystemFeatures support.
- Binary cache integration — S3, Cachix, and Attic, with credential sources ranging from environment variables to Vault, AWS Secrets Manager, and systemd-creds.
- Change summaries and rebuild impact — per-PR diffs of which packages changed and how many derivations have to rebuild, posted as a single GitHub check.
- Build metrics — output (NAR) size and closure size tracked over time, compared against the base branch, with configurable thresholds.
- PR comment commands — @eka-ci merge and friends for queueing merges from a comment.
Components
Eka CI is a Cargo workspace with two main binaries:
- eka-ci-server — the long-running CI server that talks to GitHub and orchestrates builds.
- ekaci — a CLI client that talks to the server over a Unix socket.
A web frontend (Elm) lives alongside the server but is only partially implemented; the HTTP API and WebSocket endpoints are the supported integration surface today.
How to read these docs
If you are setting Eka CI up for the first time, start with Quick Start and then work through Installation and GitHub App Setup.
If you are operating an existing deployment, the LRU Cache Tuning and Monitoring & Metrics pages are the most useful starting points.
For a deeper picture of how the server is built, see Architecture.
Quick Start
This page walks through the minimum set of steps required to get Eka CI watching a single repository. For deeper detail on each step, follow the linked pages.
Prerequisites
- Nix package manager installed (with flakes enabled)
- A GitHub organization with admin access
- A publicly reachable HTTPS endpoint for receiving webhooks
1. Create a GitHub App
Eka CI authenticates to GitHub as a GitHub App. Create one at
https://github.com/organizations/YOUR_ORG/settings/apps with:
- Permissions: Checks (read/write), Contents (read), Pull Requests (read)
- Events: pull_request, workflow_run, merge_group, installation
Generate and download the private key. The full walkthrough, including all eight credential sources, lives in GitHub App Setup.
2. Configure the server
Create ~/.config/ekaci/ekaci.toml:
[[github_apps]]
id = "main"
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }
[[caches]]
id = "production-s3"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
credentials = { env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] } }
[caches.permissions]
allow_all = false
allowed_repos = ["myorg/*"]
allowed_branches = ["main", "release/*"]
See Server Configuration and Configuring Caches for the full set of options.
3. Configure the repository
Add .eka-ci/config.json at the root of any repository you want Eka CI to build:
{
"jobs": {
"my-package": {
"file": "default.nix",
"allow_eval_failures": true,
"caches": ["production-s3"]
}
},
"checks": {
"nixfmt": {
"shell": "formatting",
"command": "nixfmt --check **/*.nix",
"allow_network": false
}
}
}
See Repository Configuration for the full schema.
4. Run the server
nix build
./result/bin/eka-ci-server
For a long-running deployment, run it under systemd. A minimal unit file is included in Installation.
5. Open a pull request
Once the server is running and the GitHub App is installed on a repository, opening a pull request will trigger:
- An evaluation of the repository against the PR head and base.
- A diff of derivations and a queued build for the changes.
- One or more check runs reporting build status.
- An EkaCI: Change Summary check posting a per-PR summary of changed packages and rebuild impact (see Change Summaries).
From there, reviewers can merge through the GitHub UI or by commenting @eka-ci merge —
see PR Comment Commands.
Installation
Eka CI is distributed as a Nix flake. The build produces two binaries:
- eka-ci-server — the CI server daemon.
- ekaci — a CLI client that talks to the server over a Unix socket.
Build from source
git clone https://github.com/ekala-project/eka-ci.git
cd eka-ci
nix build
./result/bin/eka-ci-server --help
The flake exposes the standard packages.default attribute, so it can also be consumed from
another flake:
{
inputs.eka-ci.url = "github:ekala-project/eka-ci";
outputs = { self, nixpkgs, eka-ci, ... }: {
# ...
nixosConfigurations.example = nixpkgs.lib.nixosSystem {
modules = [
({ pkgs, ... }: {
environment.systemPackages = [ eka-ci.packages.${pkgs.system}.default ];
})
];
};
};
}
Run as a systemd service
A minimal unit file:
[Unit]
Description=eka-ci server
After=network.target
[Service]
Type=simple
ExecStart=/path/to/eka-ci-server
Restart=on-failure
User=eka-ci
Environment="RUST_LOG=info"
# Example: provide credentials for cache backends
Environment="VAULT_TOKEN=s.your-token"
[Install]
WantedBy=multi-user.target
For production deployments you will likely want to add systemd hardening
(ProtectSystem=strict, PrivateTmp=true, NoNewPrivileges=true, etc.) and to load
secrets via LoadCredential= and the systemd credential source documented in
GitHub App Setup.
Required state directories
By default the server stores state under paths that can be overridden in
ekaci.toml:
| Purpose | Default | Setting |
|---|---|---|
| SQLite database | ~/.local/share/ekaci/sqlite.db | db_path |
| Build logs | ~/.local/share/ekaci/logs | logs_dir |
| Unix socket | $XDG_RUNTIME_DIR/ekaci.sock | socket_path |
For a multi-user system service you typically want these under /var/lib/ekaci and
/var/log/ekaci. See Server Configuration.
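As a sketch, these locations could be overridden in ekaci.toml like so; the setting names come from the table above, but the exact paths and the assumption that these keys sit at the top level of the file are illustrative:

```toml
# Hypothetical multi-user layout; setting names are from the table above
db_path = "/var/lib/ekaci/sqlite.db"
logs_dir = "/var/log/ekaci"
socket_path = "/run/ekaci/ekaci.sock"
```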
Verify the install
Once the server is running you can ping it via the CLI:
ekaci status
And confirm the metrics endpoint is reachable:
curl http://127.0.0.1:3030/metrics | head
If both succeed, continue to GitHub App Setup.
GitHub App Setup and Configuration Guide
Complete guide for creating, configuring, and securing GitHub Apps for eka-ci.
Table of Contents
- Introduction
- Part 1: Creating the GitHub App
- Part 2: Obtaining Credentials
- Part 3: Securing Your Credentials
- Development: Environment Variables
- Production Option 1: File-Based with Restricted Permissions
- Production Option 2: GitHub App Key File
- Production Option 3: HashiCorp Vault (Recommended)
- Production Option 4: AWS Secrets Manager
- Production Option 5: systemd Credentials (Linux with TPM2)
- Production Option 6: Instance Metadata (Cloud VMs)
- Production Option 7: AWS Profile
- Part 4: Installing the GitHub App
- Part 5: Configuring eka-ci Server
- Part 6: Testing the Setup
- Security Best Practices
- Troubleshooting
- Advanced Topics
- API Reference
- FAQ
- Related Documentation
Introduction
What is a GitHub App?
A GitHub App is a first-class integration with GitHub that provides:
- Fine-grained permissions
- Webhook-based event delivery
- Organization-wide installation
- Higher API rate limits
- Better security than personal access tokens
Why GitHub Apps?
eka-ci uses GitHub Apps because they offer:
- ✅ Fine-grained permissions - Request only the access you need
- ✅ Organization-wide installation - One setup for all repositories
- ✅ Better security - Credentials can't be used to access user data
- ✅ Webhook integration - Automatic notifications for CI events
- ✅ Rate limit advantages - Higher API rate limits (5,000 vs 1,000 requests/hour)
Prerequisites
Before you begin:
- Administrative access to the GitHub organization where you want to install eka-ci
- A running eka-ci server with a publicly accessible URL (for webhooks)
- Access to secure credential storage (Vault, AWS Secrets Manager, or similar for production)
Part 1: Creating the GitHub App
Step 1: Navigate to GitHub App Settings
- Go to your organization's settings page: https://github.com/organizations/YOUR_ORG/settings/apps (for personal accounts, use https://github.com/settings/apps)
- Click "New GitHub App"
Step 2: Configure Basic Information
Fill in the basic app information:
| Field | Value | Notes |
|---|---|---|
| GitHub App name | eka-ci (or your preferred name) | Must be unique across GitHub |
| Homepage URL | https://your-eka-ci-server.com | Your eka-ci server's public URL |
| Description | Continuous Integration for Nix projects | Optional but recommended |
| Callback URL | Leave empty | Not used by eka-ci |
| Setup URL | Leave empty | Not used by eka-ci |
Step 3: Configure Webhook Settings
This is critical for eka-ci to receive events:
| Field | Value | Notes |
|---|---|---|
| Webhook URL | https://your-eka-ci-server.com/github/webhook | Must be publicly accessible |
| Webhook secret | Generate a strong secret | IMPORTANT: Save this securely! |
Generating a webhook secret:
# Generate a random secret
openssl rand -hex 32
# Example output:
# 3f8a9c7b2e1d6f4a8b9c7e2d1f6a4b9c8e7d2f1a6b4c9e8d7f2a1b6c4e9d8f7
⚠️ Security Note: The webhook secret verifies that webhook payloads come from GitHub. While eka-ci currently doesn't verify this signature (pending implementation), you should still configure it for future use.
Webhook settings:
- Content type: application/json
- SSL verification: ✅ Enable SSL verification (required for production)
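GitHub computes the X-Hub-Signature-256 header it sends with each delivery as an HMAC-SHA256 of the raw request body, keyed with this webhook secret. A minimal sketch of computing the same value with openssl; the secret and payload here are placeholders:

```shell
# Compute the X-Hub-Signature-256 value GitHub would send for a payload.
# SECRET and PAYLOAD are placeholders for illustration.
SECRET="replace-with-your-webhook-secret"
PAYLOAD='{"action":"opened"}'
# openssl prints "…(stdin)= <hex>"; awk keeps just the hex digest
SIG="sha256=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $NF}')"
echo "$SIG"
```

A server that verifies signatures would compare this value, using a constant-time comparison, against the header on each delivery.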
Step 4: Configure Permissions
eka-ci requires these Repository permissions:
| Permission | Access Level | Purpose |
|---|---|---|
| Checks | Read & Write | Create and update CI check runs on PRs |
| Contents | Read only | Clone repositories and read source code |
| Pull requests | Read only | Receive PR events and read PR metadata |
| Metadata | Read only | Default permission (automatically included) |
Do NOT grant:
- Write access to Contents, Pull Requests, or Issues (not needed)
- Any Organization permissions
- Any Account permissions
Step 5: Subscribe to Events
Enable these webhook events:
- ✅ Pull request - Triggers builds on PR open, update, close
- ✅ Pull request review - Re-checks auto-merge eligibility when maintainers approve or dismiss reviews
- ✅ Workflow run - For approval workflow integration
- ✅ Merge group - For GitHub merge queue support
- ✅ Installation - Tracks when app is installed/uninstalled
- ✅ Installation repositories - Tracks repository access changes
Do NOT enable:
- Push events (eka-ci is PR-focused)
- Issue events (not used)
- Other events (creates unnecessary webhook traffic)
Step 6: Installation Scope
Choose "Only on this account" unless you plan to distribute eka-ci as a public service.
Step 7: Create the App
- Review your settings
- Click "Create GitHub App"
- You'll be redirected to your app's settings page
Part 2: Obtaining Credentials
After creating the app, you need two pieces of information:
App ID
- On your GitHub App's settings page, find "App ID" near the top
- It's a numeric value like 123456
- Save this - you'll need it for eka-ci configuration
Private Key
- Scroll down to the "Private keys" section
- Click "Generate a private key"
- A .pem file will download automatically
- CRITICAL: Store this file securely - it cannot be recovered if lost!
The downloaded file looks like:
your-app-name.YYYY-MM-DD.private-key.pem
Contents:
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA1234567890abcdefghijklmnopqrstuvwxyz...
...multiple lines of base64-encoded key data...
-----END RSA PRIVATE KEY-----
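It is worth sanity-checking the key before storing it. A sketch using openssl; the filename is the example from above, and the self-contained fallback that generates a throwaway key is only there so the snippet runs as-is:

```shell
# Path to the downloaded key (example filename from above)
KEY=your-app-name.YYYY-MM-DD.private-key.pem
# Demo fallback: generate a throwaway key if the real one is absent.
# In practice, skip this line and point KEY at your downloaded file.
[ -f "$KEY" ] || openssl genrsa -out "$KEY" 2048 2>/dev/null
# Verify the PEM parses as a valid RSA private key (exit status 0 on success)
openssl rsa -in "$KEY" -check -noout
```

The app settings page on GitHub also displays a fingerprint for each key, which you can compare against a locally computed hash of the public half to confirm you stored the right file.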
Part 3: Securing Your Credentials
⚠️ NEVER commit credentials to version control!
eka-ci supports 8 different methods for securing GitHub App credentials. Choose based on your environment:
Development: Environment Variables
Pros: Simple, quick setup
Cons: Not suitable for production, credentials in memory
export GITHUB_APP_ID=123456
export GITHUB_APP_PRIVATE_KEY="$(cat your-app-name.private-key.pem)"
./eka-ci-server
eka-ci configuration (~/.config/ekaci/ekaci.toml):
# No configuration needed - automatic fallback to environment variables
# OR explicitly:
[[github_apps]]
id = "dev-app"
credentials = { env = { vars = ["GITHUB_APP_ID", "GITHUB_APP_PRIVATE_KEY"] } }
⚠️ Not recommended for production! Use one of the secure methods below.
Production Option 1: File-Based with Restricted Permissions
Pros: Simple, no external dependencies
Cons: Credentials on disk, manual rotation
- Create a secure directory:
sudo mkdir -p /etc/eka-ci
sudo chown eka-ci:eka-ci /etc/eka-ci
sudo chmod 700 /etc/eka-ci
- Create a credentials file (JSON format):
sudo tee /etc/eka-ci/github-app.json <<EOF
{
"GITHUB_APP_ID": "123456",
"GITHUB_APP_PRIVATE_KEY": "$(cat your-app-name.private-key.pem | sed 's/$/\\n/' | tr -d '\n')"
}
EOF
sudo chmod 600 /etc/eka-ci/github-app.json
sudo chown eka-ci:eka-ci /etc/eka-ci/github-app.json
- Configure eka-ci:
[[github_apps]]
id = "production"
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }
Supported file formats:
JSON:
{
"GITHUB_APP_ID": "123456",
"GITHUB_APP_PRIVATE_KEY": "-----BEGIN RSA PRIVATE KEY-----\n..."
}
Key=value:
GITHUB_APP_ID=123456
GITHUB_APP_PRIVATE_KEY=-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
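Whichever format you use, it is worth validating the file before pointing the server at it. A sketch for the JSON form, assuming jq is available; it is demonstrated on a throwaway sample file, so point CREDS at /etc/eka-ci/github-app.json in practice:

```shell
# Validate a credentials file: valid JSON containing both required keys.
# CREDS points at a throwaway sample here; use the real path in practice.
CREDS=$(mktemp)
printf '{"GITHUB_APP_ID":"123456","GITHUB_APP_PRIVATE_KEY":"dummy"}' > "$CREDS"
# jq -e exits non-zero if the expression is false or null,
# so this doubles as a scriptable check
jq -e 'has("GITHUB_APP_ID") and has("GITHUB_APP_PRIVATE_KEY")' "$CREDS" \
  && echo "credentials file OK"
```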
Production Option 2: GitHub App Key File
Pros: Keeps private key in original PEM format
Cons: Credentials on disk
- Store the private key securely:
sudo cp your-app-name.private-key.pem /etc/eka-ci/github-app-key.pem
sudo chmod 600 /etc/eka-ci/github-app-key.pem
sudo chown eka-ci:eka-ci /etc/eka-ci/github-app-key.pem
- Configure eka-ci:
[[github_apps]]
id = "production"
credentials = { github-app-key-file = {
app_id_env = "GITHUB_APP_ID",
key_file = "/etc/eka-ci/github-app-key.pem"
}}
- Set the App ID:
export GITHUB_APP_ID=123456
Production Option 3: HashiCorp Vault (Recommended)
Pros: Best security, audit logs, automatic rotation
Cons: Requires Vault infrastructure
- Store credentials in Vault:
# First, format the private key for JSON (escape newlines)
PRIVATE_KEY=$(cat your-app-name.private-key.pem | sed 's/$/\\n/' | tr -d '\n')
# Store in Vault
vault kv put secret/eka-ci/github-app \
GITHUB_APP_ID="123456" \
GITHUB_APP_PRIVATE_KEY="${PRIVATE_KEY}"
- Configure eka-ci:
[[github_apps]]
id = "production"
[github_apps.credentials.vault]
address = "https://vault.example.com:8200"
secret_path = "eka-ci/github-app"
token_env = "VAULT_TOKEN"
namespace = "production" # Optional, for Vault Enterprise
- Run eka-ci with Vault token:
export VAULT_TOKEN=s.your-vault-token
./eka-ci-server
Production Option 4: AWS Secrets Manager
Pros: Managed service, integrates with IAM
Cons: AWS-specific, costs money
- Create secret in AWS Secrets Manager:
# Format the private key
PRIVATE_KEY=$(cat your-app-name.private-key.pem | sed 's/$/\\n/' | tr -d '\n')
# Create secret
aws secretsmanager create-secret \
--name eka-ci/github-app \
--description "eka-ci GitHub App credentials" \
--secret-string "{\"GITHUB_APP_ID\":\"123456\",\"GITHUB_APP_PRIVATE_KEY\":\"${PRIVATE_KEY}\"}"
- Configure eka-ci:
[[github_apps]]
id = "production"
[github_apps.credentials.aws-secrets-manager]
secret_name = "eka-ci/github-app"
region = "us-east-1" # Optional, defaults to AWS_REGION env var
- Ensure eka-ci has IAM permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "secretsmanager:GetSecretValue",
"Resource": "arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:eka-ci/github-app-*"
}
]
}
Production Option 5: systemd Credentials (Linux with TPM2)
Pros: Hardware-encrypted, no external dependencies
Cons: Requires systemd 250+, TPM2 chip
- Create credential file:
# Format credentials as JSON
cat > /tmp/github-app.json <<EOF
{
"GITHUB_APP_ID": "123456",
"GITHUB_APP_PRIVATE_KEY": "$(cat your-app-name.private-key.pem | sed 's/$/\\n/' | tr -d '\n')"
}
EOF
- Encrypt with systemd:
# Encrypt using TPM2
sudo systemd-creds encrypt \
--name=github-app-credentials \
/tmp/github-app.json \
/var/lib/systemd/credential/github-app.cred
# Clean up plaintext
shred -u /tmp/github-app.json
- Configure systemd service:
[Service]
LoadCredential=github-app-credentials:/var/lib/systemd/credential/github-app.cred
- Configure eka-ci:
[[github_apps]]
id = "production"
credentials = { systemd-credential = { name = "github-app-credentials" } }
Production Option 6: Instance Metadata (Cloud VMs)
Pros: No credentials on disk, automatic rotation
Cons: Requires IAM role setup, cloud-specific
For EC2/GCP/Azure instances with IAM roles that can access AWS Secrets Manager:
[[github_apps]]
id = "cloud-production"
credentials = "instance-metadata"
The instance profile must have permissions to access Secrets Manager (see Option 4).
Production Option 7: AWS Profile
Pros: Uses existing AWS credentials
Cons: AWS-specific
Use credentials from ~/.aws/credentials:
[[github_apps]]
id = "aws-profile-app"
credentials = { aws-profile = { profile = "eka-ci-production" } }
~/.aws/credentials:
[eka-ci-production]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Part 4: Installing the GitHub App
Step 1: Install on Your Organization
- Go to your GitHub App's settings page
- Click "Install App" in the left sidebar
- Select your organization
- Choose repository access:
  - "All repositories" - eka-ci will build all repos (recommended)
  - "Only select repositories" - Choose specific repos
- Click "Install"
Step 2: Verify Installation
eka-ci automatically tracks installations via webhooks. Check the logs:
# Look for installation confirmation
journalctl -u eka-ci -f | grep -i "installation"
# Expected output:
# INFO eka_ci_server::github::webhook: Received installation event: created
# INFO eka_ci_server::db: Stored GitHub installation: id=12345678
Part 5: Configuring eka-ci Server
Basic Configuration
Create or edit ~/.config/ekaci/ekaci.toml:
# GitHub App configuration
[[github_apps]]
id = "main"
# Choose ONE credential source from Part 3
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }
# Permissions (optional, defaults to allow all)
[github_apps.permissions]
allow_all = true
All Credential Source Options
Quick reference for all available credential sources:
# 1. Environment Variables
[[github_apps]]
id = "env-based"
credentials = { env = { vars = ["GITHUB_APP_ID", "GITHUB_APP_PRIVATE_KEY"] } }
# 2. File (JSON or key=value)
[[github_apps]]
id = "file-based"
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }
# 3. GitHub App Key File
[[github_apps]]
id = "key-file"
credentials = { github-app-key-file = {
app_id_env = "GITHUB_APP_ID",
key_file = "/etc/eka-ci/github-app-key.pem"
}}
# 4. HashiCorp Vault
[[github_apps]]
id = "vault"
[github_apps.credentials.vault]
address = "https://vault.example.com:8200"
secret_path = "eka-ci/github-app"
token_env = "VAULT_TOKEN"
namespace = "production" # Optional
# 5. AWS Secrets Manager
[[github_apps]]
id = "aws-sm"
[github_apps.credentials.aws-secrets-manager]
secret_name = "eka-ci/github-app"
region = "us-east-1" # Optional
# 6. systemd Credentials
[[github_apps]]
id = "systemd"
credentials = { systemd-credential = { name = "github-app-credentials" } }
# 7. Instance Metadata
[[github_apps]]
id = "imds"
credentials = "instance-metadata"
# 8. AWS Profile
[[github_apps]]
id = "aws-profile"
credentials = { aws-profile = { profile = "eka-ci-production" } }
Permission Controls
GitHub App permissions allow you to restrict which repositories and branches can use specific GitHub App credentials.
Allow All (Default)
[[github_apps]]
id = "unrestricted-app"
credentials = { /* ... */ }
[github_apps.permissions]
allow_all = true
Repository Restrictions
Only specific repositories can use this GitHub App:
[[github_apps]]
id = "restricted-app"
credentials = { /* ... */ }
[github_apps.permissions]
allow_all = false
allowed_repos = [
"myorg/repo1",
"myorg/repo2",
"anotherorg/special-repo"
]
Branch Restrictions
Restrict to specific branches or branch patterns:
[[github_apps]]
id = "production-only-app"
credentials = { /* ... */ }
[github_apps.permissions]
allow_all = false
allowed_repos = ["myorg/*"]
allowed_branches = [
"main",
"master",
"release/*",
"hotfix/*"
]
Glob pattern support:
- "main" - exact match
- "release/*" - prefix match (e.g., release/v1.0, release/v2.0)
- "*/staging" - suffix match
- "*" - match all
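These patterns map directly onto shell case globs, which makes it easy to check a branch name against your configuration by hand. A sketch mirroring the branch-restriction example above (the pattern list and branch names are illustrative):

```shell
# Check a branch name against the allowed_branches patterns from the example above
branch_allowed() {
  case "$1" in
    main|master|release/*|hotfix/*) return 0 ;;
    *) return 1 ;;
  esac
}

branch_allowed "release/v1.0" && echo "release/v1.0: allowed"
branch_allowed "feature/x" || echo "feature/x: denied"
```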
Complete Configuration Examples
Development Setup
Simple environment variable-based setup:
# ~/.config/ekaci/ekaci.toml
# No github_apps section needed - falls back to environment variables
# Set: GITHUB_APP_ID and GITHUB_APP_PRIVATE_KEY
Or explicitly:
[[github_apps]]
id = "dev"
credentials = { env = { vars = ["GITHUB_APP_ID", "GITHUB_APP_PRIVATE_KEY"] } }
[github_apps.permissions]
allow_all = true
Production with Vault
Multi-environment setup with Vault:
# Production app - restricted to production repos
[[github_apps]]
id = "production"
[github_apps.credentials.vault]
address = "https://vault.prod.example.com:8200"
secret_path = "eka-ci/github-app-prod"
token_env = "VAULT_TOKEN"
namespace = "production"
[github_apps.permissions]
allow_all = false
allowed_repos = ["company/production-*"]
allowed_branches = ["main", "release/*"]
# Staging app - restricted to staging repos
[[github_apps]]
id = "staging"
[github_apps.credentials.vault]
address = "https://vault.staging.example.com:8200"
secret_path = "eka-ci/github-app-staging"
token_env = "VAULT_TOKEN"
namespace = "staging"
[github_apps.permissions]
allow_all = false
allowed_repos = ["company/staging-*"]
allowed_branches = ["main", "develop", "feature/*"]
AWS-Based Production
Using AWS Secrets Manager with instance metadata:
[[github_apps]]
id = "aws-production"
[github_apps.credentials.aws-secrets-manager]
secret_name = "prod/eka-ci/github-app"
region = "us-east-1"
[github_apps.permissions]
allow_all = false
allowed_repos = ["mycompany/*"]
allowed_branches = ["main", "release/*"]
Part 6: Testing the Setup
Test 1: Check Server Startup
Start eka-ci and verify GitHub App registration:
./eka-ci-server
# Expected log output:
# INFO eka_ci_server::github: Registering GitHub App from configuration: main
# INFO eka_ci_server::github: Successfully registered as GitHub app
Test 2: Create a Test Pull Request
- Create a test branch in one of your repositories
- Make a trivial change and open a PR
- Check that eka-ci creates a check run on the PR
What to look for:
- A check run appears on the PR (usually named "eka-ci")
- Initial status is "Queued" or "In Progress"
- Check the eka-ci logs for webhook receipt:
journalctl -u eka-ci -f | grep webhook
# Expected:
# INFO eka_ci_server::github::webhook: Received pull_request event: opened
# INFO eka_ci_server::scheduler: Queued build for PR #123
Test 3: Verify Webhook Delivery
On GitHub:
- Go to your GitHub App's settings
- Click "Advanced" tab
- Scroll to "Recent Deliveries"
- Verify webhooks are being delivered successfully (green checkmarks)
- If you see red X's, click to view the error details
Security Best Practices
Protect Your Private Key
- ✅ DO: Store in a secret manager (Vault, AWS Secrets Manager)
- ✅ DO: Use file permissions 600 (owner read/write only)
- ✅ DO: Encrypt with TPM2 (systemd credentials)
- ❌ DON'T: Commit to Git
- ❌ DON'T: Store in Docker images
- ❌ DON'T: Share via chat/email
- ❌ DON'T: Log to files or stdout
Rotate Credentials Regularly
How to rotate the private key:
- Generate a new private key on GitHub App settings
- Update the key in your secret manager
- Restart eka-ci to load the new key
- Delete the old key from GitHub
Recommended rotation schedule:
- Production: Every 90 days
- Staging: Every 180 days
- Development: Yearly
Use Webhook Secrets
Configure a webhook secret and verify signatures in your eka-ci deployment.
⚠️ Current Status: Webhook signature verification is planned but not yet implemented in eka-ci. You should still configure a webhook secret for future use.
To implement verification (for contributors):
// In the webhook handler, verify the HMAC-SHA256 signature
let signature = headers.get("X-Hub-Signature-256");
let payload = request.body();
let expected = hmac_sha256(webhook_secret, payload);
assert_eq!(signature, expected);
Principle of Least Privilege
Only grant the minimum required permissions:
- ✅ Checks: Read & Write (required)
- ✅ Contents: Read only (required)
- ✅ Pull Requests: Read only (required)
- ❌ Never grant write access to Contents, PRs, or Issues
- ❌ Never grant Organization or Account permissions
Monitor and Audit
Monitor webhook delivery:
# Check for webhook failures
journalctl -u eka-ci | grep -i "webhook.*error"
# Monitor installation changes
journalctl -u eka-ci | grep -i "installation"
Audit credential access:
- Enable Vault audit logging
- Enable AWS CloudTrail for Secrets Manager
- Review systemd journal for credential loads
Additional security practices:
- Never commit credentials to version control
- Use .gitignore for credential files
- Always use secret management systems in production
- Create separate GitHub Apps for different environments
- Use permission restrictions to limit blast radius
- Enable audit logging in Vault/AWS
- Monitor who accesses GitHub App credentials
- Review permission configurations regularly
- Use instance metadata in cloud deployments to avoid storing long-lived credentials
Network Security
Webhook endpoint security:
- ✅ Use HTTPS (required for production)
- ✅ Use a valid SSL certificate
- ✅ Configure firewall to allow GitHub IPs only (optional)
- ✅ Use webhook secrets when implemented
GitHub IP ranges (for firewall rules):
# Download GitHub's IP ranges
curl https://api.github.com/meta | jq -r '.hooks[]'
# Example firewall rule (iptables)
iptables -A INPUT -p tcp --dport 443 -s 192.30.252.0/22 -j ACCEPT
Secure the Server
Server hardening checklist:
- ✅ Run eka-ci as non-root user
- ✅ Use systemd sandboxing features
- ✅ Enable SELinux or AppArmor
- ✅ Keep dependencies updated
- ✅ Enable automatic security updates
- ✅ Monitor logs for suspicious activity
Troubleshooting
GitHub App Registration Fails
Error: failed to locate $GITHUB_APP_ID
Cause: Credentials not properly configured
Solution:
- Check your configuration file syntax
- Verify the credentials file exists and has correct permissions
- For Vault/AWS, verify connectivity and permissions
- Check eka-ci logs for detailed error messages
If using the configuration file, ensure you have:
[[github_apps]]
id = "..."
credentials = { /* valid credential source */ }
Webhooks Not Received
Symptoms: PRs don't trigger builds
Debugging steps:
- Verify the webhook URL is correct:
curl https://your-eka-ci-server.com/github/webhook
# Should return 405 Method Not Allowed (GET not supported)
- Check GitHub webhook deliveries:
  - Go to GitHub App settings → Advanced → Recent Deliveries
  - Look for failed deliveries (red X)
  - Click to see error details
- Common webhook errors:
  - SSL certificate error: Fix your SSL cert or disable verification (dev only)
  - Timeout: Server is slow or down
  - Connection refused: Firewall blocking GitHub IPs
- Check eka-ci logs:
journalctl -u eka-ci -n 100 | grep webhook
Permission Denied Errors
Error: Repository myorg/myrepo is not allowed to use GitHub App production
Solution: Update permissions in config:
[github_apps.permissions]
allow_all = false
allowed_repos = ["myorg/myrepo", "myorg/*"]
Or if the issue is GitHub App permissions:
Cause: GitHub App doesn't have required permissions
Solution:
- Go to GitHub App settings → Permissions
- Verify:
- Checks: Read & Write
- Contents: Read
- Pull Requests: Read
- If you changed permissions, you must reinstall the app:
- Go to Installations
- Click "Configure"
- Accept new permissions
Check Runs Not Appearing
Symptoms: Webhook received but no check run created
Debugging:
- Check logs for errors:
journalctl -u eka-ci -f | grep -E "(check|error)"
- Verify the repository is configured:
  - The repository must have .eka-ci/config.json
  - The configuration must define jobs
- Check GitHub API rate limits:
# The eka-ci server logs should show rate limit status
journalctl -u eka-ci | grep "rate limit"
Private Key Format Errors
Error: "invalid value for $GITHUB_APP_PRIVATE_KEY"
Cause: Private key not properly formatted for storage
Solution:
For JSON files, escape newlines:
# Convert PEM to JSON-safe format
PRIVATE_KEY=$(cat key.pem | sed 's/$/\\n/' | tr -d '\n')
echo "{\"GITHUB_APP_PRIVATE_KEY\":\"${PRIVATE_KEY}\"}"
For environment variables, preserve newlines:
# Use actual newlines
export GITHUB_APP_PRIVATE_KEY="$(cat key.pem)"
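If you are unsure whether an escaped key survived intact, you can round-trip it: escape the newlines exactly as the storage snippets above do, decode with jq, and compare against the original. A sketch on a stand-in file; the path and PEM contents are placeholders:

```shell
# Stand-in for the real PEM file (contents are placeholders)
ORIG=$(mktemp)
printf -- '-----BEGIN RSA PRIVATE KEY-----\nMIIEplaceholder\n-----END RSA PRIVATE KEY-----\n' > "$ORIG"

# Escape newlines the same way as the storage snippets above
ESCAPED=$(sed 's/$/\\n/' "$ORIG" | tr -d '\n')

# Decode via jq (-j emits the raw string with no trailing newline) and compare
printf '{"GITHUB_APP_PRIVATE_KEY":"%s"}' "$ESCAPED" \
  | jq -j '.GITHUB_APP_PRIVATE_KEY' > "${ORIG}.roundtrip"
diff "$ORIG" "${ORIG}.roundtrip" && echo "key round-trips cleanly"
```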
Vault Connection Fails
Error: Failed to read secret from Vault path
Checklist:
- Verify the Vault address is correct
- Check that the VAULT_TOKEN environment variable is set
- Verify the secret path exists: vault kv get secret/eka-ci/github-app
- Check the Vault namespace if using Vault Enterprise
- Ensure the Vault token has read permissions
AWS Secrets Manager Fails
Error: Failed to retrieve secret from AWS Secrets Manager
Checklist:
- Verify AWS credentials are configured (env vars or instance profile)
- Check secret name is correct
- Verify region setting
- Ensure IAM permissions include secretsmanager:GetSecretValue
- Check the secret is in JSON format with the correct keys
Multiple GitHub Apps Not Supported
Current Status: eka-ci uses the first configured GitHub App for all repositories.
Workaround: Use permission restrictions to limit apps to specific repos:
[[github_apps]]
id = "prod"
credentials = { /* ... */ }
[github_apps.permissions]
allowed_repos = ["myorg/prod-*"]
[[github_apps]]
id = "dev"
credentials = { /* ... */ }
[github_apps.permissions]
allowed_repos = ["myorg/dev-*"]
⚠️ Note: Only the first app will be used currently. Multi-app support is planned.
Advanced Topics
Approval Workflow Integration
eka-ci supports requiring approval before running builds (to prevent malicious PRs from external contributors):
- Enable the approval requirement:
./eka-ci-server --require-approval
- Or in systemd:
[Service]
Environment="EKA_CI_REQUIRE_APPROVAL=true"
- Approve users via the web UI or API
Merge Queue Support
eka-ci supports GitHub's merge queue feature:
- Enable merge queue on your repository (Settings → General → Merge queue)
- eka-ci will automatically receive merge_group events
- Builds will run for merge queue entries
OAuth Integration (Optional)
eka-ci also supports OAuth for web UI authentication:
[oauth]
client_id = "your-oauth-app-client-id"
client_secret = "your-oauth-app-client-secret"
redirect_url = "https://your-eka-ci-server.com/github/auth/callback"
Note: This is separate from the GitHub App and is optional.
Migration from Environment Variables
From Environment Variables to Vault
Before:
export GITHUB_APP_ID=123456
export GITHUB_APP_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----..."
./eka-ci-server
After:
- Store credentials in Vault:
vault kv put secret/eka-ci/github-app \
GITHUB_APP_ID="123456" \
GITHUB_APP_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----..."
- Update configuration:
[[github_apps]]
id = "main"
[github_apps.credentials.vault]
address = "https://vault.example.com:8200"
secret_path = "eka-ci/github-app"
token_env = "VAULT_TOKEN"
- Start server with Vault token:
export VAULT_TOKEN=your-vault-token
./eka-ci-server
API Reference
CredentialSource Enum
All available credential source variants:
pub enum CredentialSource {
    // Environment variables
    Env { vars: Vec<String> },
    // File-based (JSON or key=value)
    File { path: PathBuf },
    // AWS profile from ~/.aws/credentials
    AwsProfile { profile: String },
    // Cachix token (for backward compatibility)
    CachixToken { env_var: String },
    // HashiCorp Vault
    Vault {
        address: String,
        secret_path: String,
        token_env: String,
        namespace: Option<String>,
    },
    // AWS Secrets Manager
    AwsSecretsManager {
        secret_name: String,
        region: Option<String>,
    },
    // systemd credentials
    SystemdCredential { name: String },
    // Instance metadata service
    InstanceMetadata,
    // GitHub App key file
    GitHubAppKeyFile {
        app_id_env: String,
        key_file: PathBuf,
    },
    // No authentication
    None,
}
GitHubAppConfig Structure
pub struct GitHubAppConfig {
    pub id: String,
    pub credentials: CredentialSource,
    pub permissions: GitHubAppPermissions,
}

pub struct GitHubAppPermissions {
    pub allow_all: bool,
    pub allowed_repos: Vec<String>,
    pub allowed_branches: Vec<String>,
}
FAQ
Q: Can I use the same GitHub App for multiple eka-ci servers? A: Not recommended. Each eka-ci instance should have its own GitHub App to avoid conflicts.
Q: What happens if my private key is compromised? A: Immediately revoke it on GitHub App settings and generate a new one. Update your secret manager and restart eka-ci.
Q: Can I use a GitHub Personal Access Token instead? A: No, eka-ci requires a GitHub App. Personal access tokens don't support the required webhooks and permissions model.
Q: Do I need a separate GitHub App for each repository? A: No, one GitHub App can be installed on multiple repositories in the same organization.
Q: How do I migrate from environment variables to Vault? A: See Migration from Environment Variables.
Q: Is webhook signature verification implemented? A: Not yet. It's mentioned in the architecture but not implemented. You should still configure a webhook secret for future use.
Q: Which credential source should I use for production? A: HashiCorp Vault is recommended for best security. AWS Secrets Manager is good for AWS deployments. systemd credentials are excellent for single-server deployments with TPM2.
Q: Can I configure multiple GitHub Apps? A: You can list multiple apps in the config file, but currently only the first one is used. The per-app permission controls for routing different repos to different apps exist in the schema, but they only take effect once multi-app support lands.
Related Documentation
- Configuring Caches - Similar credential source configuration for caches
- Security Best Practices - General security guidelines (if available)
- Deployment Guide - Production deployment strategies (if available)
Contributing
If you encounter issues with this setup process:
- Check existing issues: https://github.com/ekala-project/eka-ci/issues
- Report bugs with detailed logs and configuration (redact secrets!)
- Contribute improvements to this documentation
Last Updated: 2024-04-10 eka-ci Version: Latest
NixOS Module
The eka-ci flake provides a NixOS module at nixosModules.daemon that exposes the
service under services.eka-ci. The module uses the RFC-42 "settings" pattern: most
configuration is freeform TOML that gets serialized to ekaci.toml, with common fields
typed for validation and auto-generated documentation.
Quick Start
{
inputs.eka-ci.url = "github:ekala-project/eka-ci";
outputs = { self, nixpkgs, eka-ci, ... }: {
nixosConfigurations.example = nixpkgs.lib.nixosSystem {
modules = [
eka-ci.nixosModules.daemon
{
services.eka-ci = {
enable = true;
environmentFile = "/run/secrets/eka-ci.env";
settings = {
github_apps = [{
id = "main";
credentials.systemd-credential.name = "github-app-key";
}];
security.allow_insecure_webhooks = false;
};
};
}
];
};
};
}
The service runs as a systemd DynamicUser by default, stores state under
/var/lib/eka-ci, and listens on 127.0.0.1:3030.
Top-Level Options
services.eka-ci.enable
Type: boolean
Default: false
Enable the EkaCI server.
services.eka-ci.package
Type: package
Default: pkgs.eka-ci
Package providing the eka_ci_server binary.
services.eka-ci.user / services.eka-ci.group
Type: string
Default: "eka-ci"
User and group the service runs as when dynamicUser = false. Ignored when
dynamicUser = true.
services.eka-ci.dynamicUser
Type: boolean
Default: true
Use systemd's DynamicUser= to run the service under an ephemeral user/group. Recommended
unless you need a stable UID for filesystem permissions on shared storage.
services.eka-ci.openFirewall
Type: boolean
Default: false
Open settings.web.port in the system firewall.
services.eka-ci.environmentFile
Type: null or path
Default: null
Path to a file passed to systemd as EnvironmentFile=. Use this to provide secrets such
as WEBHOOK_SECRET, GITHUB_OAUTH_CLIENT_SECRET, JWT_SECRET, VAULT_TOKEN,
GITEA_TOKEN, GITLAB_TOKEN, AWS keys, and any environment variables referenced from
settings.caches.*.credentials.env.vars.
The file is read by systemd at start time and never enters the Nix store.
Example (/run/secrets/eka-ci.env):
WEBHOOK_SECRET=your-webhook-secret
GITHUB_OAUTH_CLIENT_SECRET=...
JWT_SECRET=...
VAULT_TOKEN=s.abc123...
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
# For Gitea integration (single instance)
GITEA_TOKEN=your-gitea-token
GITEA_DOMAIN=gitea.example.com
# For GitLab integration (single instance)
GITLAB_TOKEN=glpat-xxxxxxxxxxxxxxxx
GITLAB_DOMAIN=gitlab.com
services.eka-ci.credentials
Type: attribute set of path
Default: {}
Map of credential name to file path, wired through systemd's LoadCredential=. Each entry
becomes available inside the unit at $CREDENTIALS_DIRECTORY/<name> and can be referenced
from ekaci.toml via the systemd-credential credential source.
Example:
services.eka-ci = {
credentials.github-app-key = "/run/secrets/github-app.json";
settings.github_apps = [{
id = "main";
credentials.systemd-credential.name = "github-app-key";
}];
};
Pairs naturally with sops-nix, agenix, or systemd-creds.
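As a minimal sketch of the sops-nix pairing (the secret name and sops file are illustrative):

```nix
{ config, ... }:
{
  # Decrypt the GitHub App credential file at activation time (sops-nix)
  sops.secrets."eka-ci/github-app" = {
    sopsFile = ./secrets/eka-ci.yaml;
  };

  services.eka-ci = {
    enable = true;
    # Wire the decrypted file through systemd LoadCredential=
    credentials.github-app-key = config.sops.secrets."eka-ci/github-app".path;
    settings.github_apps = [{
      id = "main";
      credentials.systemd-credential.name = "github-app-key";
    }];
  };
}
```

The decrypted file never enters the Nix store: sops-nix places it under /run/secrets, and systemd copies it into the unit's private credentials directory.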
services.eka-ci.extraEnvironment
Type: attribute set of string
Default: {}
Additional Environment= entries passed to the systemd unit.
Example:
services.eka-ci.extraEnvironment = {
RUST_LOG = "eka_ci_server::scheduler=debug,info";
};
Settings Options
The services.eka-ci.settings submodule is freeform: any key not explicitly listed below is
still accepted and serialized to ekaci.toml as-is. Typed options provide validation and
documentation for the most common fields.
settings.db_path
Type: null or path
Default: null
SQLite database path. When null the server falls back to $XDG_DATA_HOME/ekaci/sqlite.db,
which under this module resolves to /var/lib/eka-ci/ekaci/sqlite.db.
settings.logs_dir
Type: null or path
Default: null
Directory where build logs are stored. When null the server falls back to
$XDG_DATA_HOME/ekaci/build-logs.
settings.require_approval
Type: boolean
Default: false
Require maintainer approval before building PRs from external contributors.
settings.merge_queue_require_approval
Type: boolean
Default: false
Require approval before building entries pulled from the GitHub merge queue.
settings.build_no_output_timeout_seconds
Type: integer between 30 and 86400
Default: 1200
Number of seconds with no build output after which a build is considered hung.
settings.build_max_duration_seconds
Type: integer between 60 and 604800
Default: 14400
Hard upper bound, in seconds, on total build wall-clock time.
settings.graph_lru_capacity
Type: positive integer
Default: 100000
Capacity of the in-memory derivation-graph LRU cache, in nodes. See LRU Cache Tuning for sizing guidance.
settings.default_merge_method
Type: one of "merge", "squash", "rebase"
Default: "squash"
Default merge method used by the @eka-ci merge PR comment command.
settings.web
Type: submodule
HTTP server settings.
- web.address (string, default "127.0.0.1"): IPv4 address the HTTP server binds to.
- web.port (port, default 3030): TCP port the HTTP server binds to.
- web.bundle_path (null or path, default null): Optional path to a pre-built web UI bundle.
- web.allowed_origins (list of string, default []): CORS allow-list. Each entry must be a fully-qualified http:// or https:// origin with no path, query, fragment, or * wildcard. An empty list rejects all cross-origin requests.
settings.unix
Type: submodule
Unix domain socket settings used by the CLI client.
- unix.socket_path (null or path, default null): Unix domain socket the CLI client connects to. When null the server falls back to $XDG_RUNTIME_DIR/ekaci.socket, which under this module resolves to /run/eka-ci/ekaci.socket.
settings.oauth
Type: submodule
OAuth settings for the (optional) web UI.
- oauth.client_id (null or string, default null): GitHub OAuth client ID. May also be supplied via the GITHUB_OAUTH_CLIENT_ID environment variable (preferred — see environmentFile).
- oauth.client_secret (null or string, default null): GitHub OAuth client secret. Avoid setting this in Nix — values here end up in the world-readable Nix store. Use environmentFile to supply GITHUB_OAUTH_CLIENT_SECRET instead.
- oauth.redirect_url (null or string, default null): OAuth callback URL. Defaults to http://{web.address}:{web.port}/github/auth/callback when unset.
- oauth.jwt_secret (null or string, default null): JWT signing secret. Avoid setting this in Nix. Provide JWT_SECRET via environmentFile. When omitted entirely, the server generates an ephemeral 256-bit secret on each start (sessions invalidate across restarts).
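A sketch of an OAuth setup under this module (the hostname and client ID are placeholders; the secret values arrive via environmentFile):

```nix
services.eka-ci = {
  # Supplies GITHUB_OAUTH_CLIENT_SECRET and JWT_SECRET at start time
  environmentFile = "/run/secrets/eka-ci.env";
  settings.oauth = {
    client_id = "your-oauth-app-client-id";
    redirect_url = "https://ci.example.com/github/auth/callback";
  };
};
```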
settings.security
Type: submodule
Security-related settings.
- security.max_hook_timeout_seconds (integer between 1 and 86400, default 300): Maximum wall-clock time, in seconds, that any post-build hook is allowed to run.
- security.audit_hooks (boolean, default true): Emit structured audit log records every time a hook runs.
- security.webhook_secret (null or string, default null): Webhook HMAC secret used for all platforms (GitHub, GitLab, and Gitea). Avoid setting this in Nix. Provide WEBHOOK_SECRET via environmentFile. The server refuses to start if no webhook secret is available unless allow_insecure_webhooks is true.
- security.allow_insecure_webhooks (boolean, default false): Allow the server to start without a webhook secret. Intended for local development only; never enable in production.
- security.allow_private_cache_hosts (boolean, default false): Allow cache destinations whose DNS resolves to private/loopback addresses. Disables built-in SSRF protection; only enable in trusted, isolated networks.
settings.caches
Type: list of submodule
Default: []
List of binary caches the server may push to.
Each cache entry has the following fields:
- id (string, required): Cache identifier referenced from .eka-ci/config.json.
- cache_type (one of "nix-copy", "cachix", "attic", required): Backend type for this cache.
- destination (string, required): Destination URL passed to the chosen backend. Validated for SSRF unless settings.security.allow_private_cache_hosts is set.
- credentials (freeform, required): Credential source. See Credential Sources below.
- permissions (submodule, default allows all): Repository/branch access control.
  - allow_all (boolean, default true): When true, ignores allowed_repos and allowed_branches and grants access to every repository and branch.
  - allowed_repos (list of string, default []): Glob patterns of owner/repo strings that are permitted to use this entry.
  - allowed_branches (list of string, default []): Glob patterns of branch names permitted to use this entry.
Example:
settings.caches = [{
id = "production-s3";
cache_type = "nix-copy";
destination = "s3://my-bucket/nix-cache?region=us-east-1";
credentials.env.vars = [ "AWS_ACCESS_KEY_ID" "AWS_SECRET_ACCESS_KEY" ];
permissions = {
allow_all = false;
allowed_repos = [ "myorg/*" ];
allowed_branches = [ "main" "release/*" ];
};
}];
settings.github_apps
Type: list of submodule
Default: []
List of GitHub Apps the server authenticates as.
Each GitHub App entry has the following fields:
- id (string, required): GitHub App identifier.
- credentials (freeform, required): Credential source. See Credential Sources below.
- permissions (submodule, default allows all): Same structure as settings.caches.*.permissions.
Example:
settings.github_apps = [{
id = "main";
credentials.file.path = "/run/secrets/github-app.json";
permissions = {
allow_all = false;
allowed_repos = [ "myorg/*" ];
};
}];
settings.gitea_instances
Type: list of submodule
Default: []
List of Gitea instances the server integrates with. Each instance requires a domain and access token. Supports both Gitea.com and self-hosted instances.
Each Gitea instance entry has the following fields:
- domain (string, required): Gitea instance domain (without protocol), e.g., "gitea.example.com".
- token (null or string, default null): Gitea access token. Avoid setting this in Nix — use environmentFile to supply GITEA_TOKEN instead (for single-instance setups).
Example:
settings.gitea_instances = [
{
domain = "gitea.example.com";
token = null; # Provided via environmentFile
}
{
domain = "code.company.net";
token = null; # Provided via environmentFile
}
];
For single-instance setups, you can use environment variables:
# In environmentFile
GITEA_TOKEN=your-gitea-access-token
GITEA_DOMAIN=gitea.example.com
settings.gitlab_instances
Type: list of submodule
Default: []
List of GitLab instances the server integrates with. Each instance requires a domain and project access token. Supports both GitLab.com and self-hosted instances.
Each GitLab instance entry has the following fields:
- domain (string, required): GitLab instance domain (without protocol), e.g., "gitlab.com" or "gitlab.example.com".
- token (null or string, default null): GitLab project access token (starts with glpat-). Avoid setting this in Nix — use environmentFile to supply GITLAB_TOKEN instead (for single-instance setups).
Example:
settings.gitlab_instances = [
{
domain = "gitlab.com";
token = null; # Provided via environmentFile
}
{
domain = "gitlab.enterprise.com";
token = null; # Provided via environmentFile
}
];
For single-instance setups, you can use environment variables:
# In environmentFile
GITLAB_TOKEN=glpat-xxxxxxxxxxxxxxxx
GITLAB_DOMAIN=gitlab.com
Credential Sources
Both settings.caches.*.credentials and settings.github_apps.*.credentials accept one of
ten credential source variants. The field is freeform (not exhaustively typed) so all variants
serialize correctly. Choose the one that matches your secret-management setup:
1. Environment variables
credentials.env.vars = [ "AWS_ACCESS_KEY_ID" "AWS_SECRET_ACCESS_KEY" ];
The server reads the listed environment variables at runtime. Provide them via
environmentFile.
2. File
credentials.file.path = "/etc/eka-ci/creds.json";
The server reads a JSON or KEY=VALUE file at the given path.
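For example, in KEY=VALUE form such a file might contain (the key names here match the AWS variables used elsewhere in this guide):

```
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
```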
3. AWS profile
credentials.aws-profile.profile = "production";
The server reads credentials from ~/.aws/credentials using the named profile.
4. Cachix token
credentials.cachix-token.env_var = "CACHIX_AUTH_TOKEN";
The server reads a Cachix auth token from the named environment variable.
5. HashiCorp Vault
credentials.vault = {
address = "https://vault.example.com:8200";
secret_path = "secret/data/eka-ci/s3-cache";
token_env = "VAULT_TOKEN"; # optional, defaults to "VAULT_TOKEN"
namespace = "production"; # optional
};
The server authenticates to Vault using the token from token_env and reads the secret at
secret_path.
6. AWS Secrets Manager
credentials.aws-secrets-manager = {
secret_name = "eka-ci/s3-credentials";
region = "us-east-1"; # optional, falls back to AWS_REGION env var
};
The server uses AWS SDK credential resolution (environment, instance metadata, profiles) to authenticate to AWS Secrets Manager and reads the named secret.
7. systemd credential
credentials.systemd-credential.name = "github-app-key";
The server reads the credential from $CREDENTIALS_DIRECTORY/<name>. Pair this with the
top-level services.eka-ci.credentials option:
services.eka-ci.credentials.github-app-key = "/run/secrets/github-app.json";
8. Instance metadata
credentials = "instance-metadata";
The server retrieves credentials from EC2/GCP/Azure instance metadata. No configuration needed.
9. GitHub App key file
credentials.github-app-key-file = {
app_id_env = "GITHUB_APP_ID";
key_file = "/etc/eka-ci/github-app.pem";
};
The server reads the GitHub App ID from the named environment variable and the PEM-encoded private key from the file.
10. None
credentials = "none";
No authentication. Only valid for public caches.
Systemd Hardening
The module applies aggressive systemd hardening by default:
- DynamicUser = true (ephemeral user/group)
- ProtectSystem = "strict" (read-only /usr, /boot, /efi)
- ProtectHome = true (no access to /home, /root)
- PrivateTmp = true (isolated /tmp)
- PrivateDevices = true (empty /dev)
- NoNewPrivileges = true (no privilege escalation)
- ProtectKernelModules/Tunables/Logs = true
- ProtectControlGroups/Clock/Hostname = true
- RestrictNamespaces/Realtime/SUIDSGID = true
- RestrictAddressFamilies = [ "AF_UNIX" "AF_INET" "AF_INET6" ]
- LockPersonality = true
- MemoryDenyWriteExecute = true
- SystemCallArchitectures = "native"
- SystemCallFilter = [ "@system-service" "~@privileged" "~@resources" ]
- Empty CapabilityBoundingSet and AmbientCapabilities
- UMask = "0077"
If you need to relax any of these, override systemd.services.eka-ci.serviceConfig in your
configuration.
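For example, a sketch relaxing a single hardening option, using lib.mkForce to override the module's default (the motivating scenario is hypothetical):

```nix
{ lib, ... }:
{
  # Hypothetical: a post-build hook needs writable+executable memory
  systemd.services.eka-ci.serviceConfig.MemoryDenyWriteExecute = lib.mkForce false;
}
```

Prefer relaxing the narrowest option possible rather than disabling DynamicUser or ProtectSystem wholesale.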
Complete Example
{ config, ... }:
{
services.eka-ci = {
enable = true;
openFirewall = false; # Behind a reverse proxy
environmentFile = config.sops.secrets.eka-ci-env.path;
credentials = {
github-app-key = config.sops.secrets.github-app-json.path;
s3-creds = config.sops.secrets.s3-json.path;
};
extraEnvironment.RUST_LOG = "info";
settings = {
web = {
address = "127.0.0.1";
port = 3030;
allowed_origins = [ "https://ci.example.com" ];
};
graph_lru_capacity = 200000; # Large repo
default_merge_method = "squash";
security = {
audit_hooks = true;
allow_insecure_webhooks = false;
};
github_apps = [{
id = "main";
credentials.systemd-credential.name = "github-app-key";
permissions = {
allow_all = false;
allowed_repos = [ "myorg/*" ];
};
}];
gitea_instances = [{
domain = "gitea.example.com";
token = null; # Provided via environmentFile
}];
gitlab_instances = [{
domain = "gitlab.com";
token = null; # Provided via environmentFile
}];
caches = [
{
id = "s3-production";
cache_type = "nix-copy";
destination = "s3://my-bucket/nix-cache?region=us-east-1";
credentials.systemd-credential.name = "s3-creds";
permissions = {
allow_all = false;
allowed_repos = [ "myorg/production-*" ];
allowed_branches = [ "main" ];
};
}
{
id = "cachix-public";
cache_type = "cachix";
destination = "myorg";
credentials.cachix-token.env_var = "CACHIX_AUTH_TOKEN";
}
];
};
};
# Reverse proxy
services.nginx.virtualHosts."ci.example.com" = {
enableACME = true;
forceSSL = true;
locations."/" = {
proxyPass = "http://127.0.0.1:3030";
proxyWebsockets = true;
};
};
}
See Also
- Module Reference — auto-generated complete option reference
- Server Configuration — detailed ekaci.toml reference
- Configuring Caches — cache setup and credential management
- GitHub App Setup — creating and configuring GitHub Apps
- LRU Cache Tuning — sizing graph_lru_capacity
NixOS Module Reference
This page is auto-generated from the NixOS module options schema. For a user-friendly guide, see NixOS Module.
services.eka-ci.enable
Whether to enable EkaCI, a Nix-aware Continuous Integration server.
Type: boolean
Default:
false
Example:
true
services.eka-ci.package
The eka-ci package to use.
Type: package
Default:
pkgs.eka-ci
services.eka-ci.credentials
Map of credential name to file path, wired through systemd’s
LoadCredential=. Each entry becomes available inside the unit at
$CREDENTIALS_DIRECTORY/<name> and can be referenced from
ekaci.toml via the systemd-credential credential source, e.g.
services.eka-ci.settings.github_apps = [
{
id = "main";
credentials.systemd-credential.name = "github-app-key";
}
];
Type: attribute set of absolute path
Default:
{ }
Example:
{
github-app-key = "/run/secrets/github-app.json";
s3-creds = "/run/secrets/s3.json";
}
services.eka-ci.dynamicUser
Use systemd’s DynamicUser= to run the service under an ephemeral
user/group. Recommended unless you need a stable UID for filesystem
permissions on shared storage.
Type: boolean
Default:
true
services.eka-ci.environmentFile
Path to a file passed to systemd as EnvironmentFile=. Use this to
provide secrets such as GITHUB_WEBHOOK_SECRET,
GITHUB_OAUTH_CLIENT_SECRET, JWT_SECRET, VAULT_TOKEN, AWS keys,
and any environment variables referenced from
settings.caches.*.credentials.env.vars. The file is read
by systemd at start time and never enters the Nix store.
Type: null or absolute path
Default:
null
Example:
"/run/secrets/eka-ci.env"
services.eka-ci.extraEnvironment
Additional Environment= entries passed to the systemd unit.
Type: attribute set of string
Default:
{ }
Example:
{
RUST_LOG = "eka_ci_server=debug,info";
}
services.eka-ci.group
Group the service runs as when dynamicUser is false.
Ignored when dynamicUser = true.
Type: string
Default:
"eka-ci"
services.eka-ci.openFirewall
Open settings.web.port in the system firewall.
Type: boolean
Default:
false
services.eka-ci.settings
Configuration for EkaCI, serialised verbatim to ekaci.toml. The
submodule is freeform: any key not explicitly modelled here is still
accepted and forwarded as-is to the TOML output.
Type: open submodule of (TOML value)
Default:
{ }
services.eka-ci.settings.build_max_duration_seconds
Hard upper bound, in seconds, on total build wall-clock time.
Type: integer between 60 and 604800 (both inclusive)
Default:
14400
services.eka-ci.settings.build_no_output_timeout_seconds
Number of seconds with no build output after which a build is considered hung.
Type: integer between 30 and 86400 (both inclusive)
Default:
1200
services.eka-ci.settings.caches
List of binary caches the server may push to.
Type: list of (open submodule of (TOML value))
Default:
[ ]
services.eka-ci.settings.caches.*.cache_type
Backend type for this cache.
Type: one of “nix-copy”, “cachix”, “attic”
Example:
"nix-copy"
services.eka-ci.settings.caches.*.credentials
Credential source. One of:
- { env = { vars = [ ... ]; }; }
- { file = { path = "/etc/..."; }; }
- { aws-profile = { profile = "..."; }; }
- { cachix-token = { env_var = "..."; }; }
- { vault = { address; secret_path; token_env ? "VAULT_TOKEN"; namespace ? null; }; }
- { aws-secrets-manager = { secret_name; region ? null; }; }
- { systemd-credential = { name = "..."; }; }
- "instance-metadata"
- { github-app-key-file = { app_id_env; key_file; }; }
- "none"
Prefer systemd-credential paired with the top-level
services.eka-ci.credentials option to keep secrets out of
the world-readable Nix store.
Type: TOML value
Example:
{
env = {
vars = [
"AWS_ACCESS_KEY_ID"
"AWS_SECRET_ACCESS_KEY"
];
};
}
services.eka-ci.settings.caches.*.destination
Destination URL passed to the chosen backend. Validated for SSRF
unless settings.security.allow_private_cache_hosts is set.
Type: string
Example:
"s3://my-bucket/nix-cache?region=us-east-1"
services.eka-ci.settings.caches.*.id
Cache identifier referenced from .eka-ci/config.json.
Type: string
Example:
"production-s3"
services.eka-ci.settings.caches.*.permissions
Repository/branch access control for this cache.
Type: submodule
Default:
{ }
services.eka-ci.settings.caches.*.permissions.allow_all
When true, ignores allowed_repos and
allowed_branches and grants access to every repository
and branch.
Type: boolean
Default:
true
services.eka-ci.settings.caches.*.permissions.allowed_branches
Glob patterns of branch names permitted to use this entry. Ignored
when allow_all is true.
Type: list of string
Default:
[ ]
Example:
[
"main"
"release/*"
]
services.eka-ci.settings.caches.*.permissions.allowed_repos
Glob patterns of owner/repo strings that are permitted to use this
entry. Ignored when allow_all is true.
Type: list of string
Default:
[ ]
Example:
[
"myorg/*"
]
services.eka-ci.settings.db_path
SQLite database path. When null the server falls back to
$XDG_DATA_HOME/ekaci/sqlite.db which, under this module, resolves
to /var/lib/eka-ci/ekaci/sqlite.db.
Type: null or absolute path
Default:
null
services.eka-ci.settings.default_merge_method
Default merge method used by the @eka-ci merge PR comment command.
Type: one of “merge”, “squash”, “rebase”
Default:
"squash"
services.eka-ci.settings.gitea_instances
List of Gitea instances the server integrates with. Each instance requires a domain and access token. Supports both Gitea.com and self-hosted instances.
Type: list of (open submodule of (TOML value))
Default:
[ ]
services.eka-ci.settings.gitea_instances.*.domain
Gitea instance domain (without protocol).
Type: string
Example:
"gitea.example.com"
services.eka-ci.settings.gitea_instances.*.token
Gitea access token. Avoid setting this in Nix — values
here end up in the world-readable Nix store. Use
services.eka-ci.environmentFile to supply
GITEA_TOKEN instead (for single instance) or configure
tokens via systemd credentials.
Type: null or string
Default:
null
services.eka-ci.settings.github_apps
List of GitHub Apps the server authenticates as.
Type: list of (open submodule of (TOML value))
Default:
[ ]
services.eka-ci.settings.github_apps.*.credentials
Credential source. Same shape as
services.eka-ci.settings.caches.*.credentials.
Type: TOML value
Example:
{
file = {
path = "/etc/eka-ci/github-app.json";
};
}
services.eka-ci.settings.github_apps.*.id
GitHub App identifier referenced from per-app permission lookups.
Type: string
Example:
"main"
services.eka-ci.settings.github_apps.*.permissions
Repository/branch access control for this GitHub App.
Type: submodule
Default:
{ }
services.eka-ci.settings.github_apps.*.permissions.allow_all
When true, ignores allowed_repos and
allowed_branches and grants access to every repository
and branch.
Type: boolean
Default:
true
services.eka-ci.settings.github_apps.*.permissions.allowed_branches
Glob patterns of branch names permitted to use this entry. Ignored
when allow_all is true.
Type: list of string
Default:
[ ]
Example:
[
"main"
"release/*"
]
services.eka-ci.settings.github_apps.*.permissions.allowed_repos
Glob patterns of owner/repo strings that are permitted to use this
entry. Ignored when allow_all is true.
Type: list of string
Default:
[ ]
Example:
[
"myorg/*"
]
services.eka-ci.settings.gitlab_instances
List of GitLab instances the server integrates with. Each instance requires a domain and project access token. Supports both GitLab.com and self-hosted instances.
Type: list of (open submodule of (TOML value))
Default:
[ ]
services.eka-ci.settings.gitlab_instances.*.domain
GitLab instance domain (without protocol).
Type: string
Example:
"gitlab.com"
services.eka-ci.settings.gitlab_instances.*.token
GitLab project access token. Avoid setting this in Nix —
values here end up in the world-readable Nix store. Use
services.eka-ci.environmentFile to supply
GITLAB_TOKEN instead (for single instance) or configure
tokens via systemd credentials.
Type: null or string
Default:
null
services.eka-ci.settings.graph_lru_capacity
Capacity of the in-memory derivation-graph LRU cache, in nodes.
See docs/lru-cache-tuning.md for sizing guidance.
Type: positive integer, meaning >0
Default:
100000
services.eka-ci.settings.logs_dir
Directory where build logs are stored. When null the server
falls back to $XDG_DATA_HOME/ekaci/build-logs.
Type: null or absolute path
Default:
null
services.eka-ci.settings.merge_queue_require_approval
Require approval before building entries pulled from the GitHub merge queue.
Type: boolean
Default:
false
services.eka-ci.settings.oauth
OAuth settings for the (optional) web UI.
Type: open submodule of (TOML value)
Default:
{ }
services.eka-ci.settings.oauth.client_id
GitHub OAuth client ID. May also be supplied via the
GITHUB_OAUTH_CLIENT_ID environment variable (preferred — see
services.eka-ci.environmentFile).
Type: null or string
Default:
null
services.eka-ci.settings.oauth.client_secret
GitHub OAuth client secret. Avoid setting this in Nix — values
here end up in the world-readable Nix store. Use
services.eka-ci.environmentFile to supply
GITHUB_OAUTH_CLIENT_SECRET instead.
Type: null or string
Default:
null
services.eka-ci.settings.oauth.jwt_secret
JWT signing secret. Avoid setting this in Nix. Provide
JWT_SECRET via services.eka-ci.environmentFile. When
omitted entirely, the server generates an ephemeral 256-bit secret
on each start (sessions invalidate across restarts).
Type: null or string
Default:
null
services.eka-ci.settings.oauth.redirect_url
OAuth callback URL. Defaults to
http://{web.address}:{web.port}/github/auth/callback when unset.
Type: null or string
Default:
null
services.eka-ci.settings.require_approval
Require maintainer approval before building PRs from external contributors.
Type: boolean
Default:
false
services.eka-ci.settings.security
Security-related settings.
Type: open submodule of (TOML value)
Default:
{ }
services.eka-ci.settings.security.allow_insecure_webhooks
Allow the server to start without a webhook secret. Intended for local development only; never enable in production.
Type: boolean
Default:
false
services.eka-ci.settings.security.allow_private_cache_hosts
Allow cache destinations whose DNS resolves to private/loopback addresses. Disables built-in SSRF protection; only enable in trusted, isolated networks.
Type: boolean
Default:
false
services.eka-ci.settings.security.audit_hooks
Emit structured audit log records every time a hook runs.
Type: boolean
Default:
true
services.eka-ci.settings.security.max_hook_timeout_seconds
Maximum wall-clock time, in seconds, that any post-build hook is allowed to run.
Type: integer between 1 and 86400 (both inclusive)
Default:
300
services.eka-ci.settings.security.webhook_secret
GitHub webhook HMAC secret. Avoid setting this in Nix. Provide
GITHUB_WEBHOOK_SECRET via
services.eka-ci.environmentFile.
The server refuses to start if no webhook secret is available
unless allow_insecure_webhooks is true.
Type: null or string
Default:
null
services.eka-ci.settings.unix
Unix-domain-socket settings used by the CLI client.
Type: open submodule of (TOML value)
Default:
{ }
services.eka-ci.settings.unix.socket_path
Unix domain socket the CLI client connects to. When null the
server falls back to $XDG_RUNTIME_DIR/ekaci.socket, which under
this module resolves to /run/eka-ci/ekaci.socket.
Type: null or absolute path
Default:
null
services.eka-ci.settings.web
HTTP server settings.
Type: open submodule of (TOML value)
Default:
{ }
services.eka-ci.settings.web.address
IPv4 address the HTTP server binds to.
Type: string
Default:
"127.0.0.1"
services.eka-ci.settings.web.allowed_origins
CORS allow-list. Each entry must be a fully-qualified http:// or
https:// origin with no path, query, fragment, or * wildcard.
An empty list rejects all cross-origin requests.
Type: list of string
Default:
[ ]
Example:
[
"https://app.example.com"
]
services.eka-ci.settings.web.bundle_path
Optional path to a pre-built web UI bundle.
Type: null or absolute path
Default:
null
services.eka-ci.settings.web.port
TCP port the HTTP server binds to.
Type: 16 bit unsigned integer; between 0 and 65535 (both inclusive)
Default:
3030
services.eka-ci.user
User the service runs as when dynamicUser is false.
Ignored when dynamicUser = true.
Type: string
Default:
"eka-ci"
Server Configuration
The server is configured via a single TOML file, by default at
~/.config/ekaci/ekaci.toml. This page covers the most common settings; for credential
sources see GitHub App Setup and
Configuring Caches.
Minimal example
[[github_apps]]
id = "main"
credentials = { file = { path = "/etc/eka-ci/github-app.json" } }
Full example
# Web server
[web]
address = "127.0.0.1"
port = 3030
# State paths
db_path = "/var/lib/ekaci/sqlite.db"
logs_dir = "/var/log/ekaci"
# Build behaviour
build_no_output_timeout_seconds = 1200 # 20 minutes
graph_lru_capacity = 100000 # see lru-cache-tuning.md
require_approval = false # require approval for external PRs
# OAuth (optional, for the web UI)
[oauth]
client_id = "github-oauth-client-id"
client_secret = "github-oauth-client-secret"
redirect_url = "https://your-server.com/github/auth/callback"
jwt_secret = "your-jwt-secret"
# Security
[security]
max_hook_timeout_seconds = 300
audit_hooks = true
# GitHub App credentials
[[github_apps]]
id = "production"
credentials = { vault = {
address = "https://vault.example.com:8200",
secret_path = "eka-ci/github-app",
token_env = "VAULT_TOKEN"
} }
[github_apps.permissions]
allow_all = false
allowed_repos = ["myorg/*"]
# Binary caches
[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://bucket/path"
credentials = { aws-secrets-manager = {
secret_name = "eka-ci/s3-credentials",
region = "us-east-1"
} }
[caches.permissions]
allow_all = false
allowed_repos = ["myorg/production-*"]
allowed_branches = ["main"]
Key settings
[web]
The HTTP API and Prometheus /metrics endpoint bind to address:port. For production
deployments behind a reverse proxy, bind to 127.0.0.1 and let the proxy terminate TLS.
graph_lru_capacity
Capacity of the in-memory derivation graph cache. Larger repositories need a larger cache; see LRU Cache Tuning for sizing guidance.
build_no_output_timeout_seconds
A build is considered hung if it produces no output for this many seconds. The default of 20 minutes is appropriate for most Nixpkgs-style packages; bump it for repos with very slow fixed-output derivations.
require_approval
When true, builds for pull requests from external (non-collaborator) authors are queued
but not executed until a maintainer approves. The approval workflow is partially
implemented — see the project README for current status.
[security]
max_hook_timeout_seconds caps the wall-clock time of any post-build hook.
audit_hooks enables structured audit log records every time a hook runs.
Credentials
All credential blocks (GitHub Apps, caches, OAuth) use a tagged enum:
credentials = { env = { vars = ["..."] } }
credentials = { file = { path = "/etc/..." } }
credentials = { vault = { address = "...", secret_path = "...", token_env = "..." } }
credentials = { aws-secrets-manager = { secret_name = "...", region = "..." } }
credentials = { systemd = { credential_id = "..." } }
credentials = { instance-metadata = { provider = "aws" } }
credentials = { aws-profile = { profile = "..." } }
credentials = { github-app-key = { app_id = "main" } }
Each source is documented in GitHub App Setup and Configuring Caches.
Permissions
Both [[github_apps]] and [[caches]] accept a permissions block:
[caches.permissions]
allow_all = false
allowed_repos = ["myorg/*"]
allowed_branches = ["main", "release/*"]
Glob patterns use *-style matching. When allow_all = true, the other lists are ignored.
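The matching semantics can be illustrated with Python's `fnmatch`-style globbing. This is a sketch, not the server's actual matcher; the helper name and the assumption that an empty `allowed_branches` list means "any branch" are ours:

```python
from fnmatch import fnmatchcase

def cache_allowed(repo, branch, *, allow_all=False,
                  allowed_repos=(), allowed_branches=()):
    """Approximate the permissions check: allow_all short-circuits;
    otherwise both repo and branch must match some *-glob pattern.
    Assumption: an empty allowed_branches list permits any branch."""
    if allow_all:
        return True
    if not any(fnmatchcase(repo, p) for p in allowed_repos):
        return False
    if allowed_branches and not any(fnmatchcase(branch, p)
                                    for p in allowed_branches):
        return False
    return True

print(cache_allowed("myorg/app", "release/v1.0",
                    allowed_repos=["myorg/*"],
                    allowed_branches=["main", "release/*"]))  # True
print(cache_allowed("other/app", "main",
                    allowed_repos=["myorg/*"]))               # False
```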
Repository Configuration
Repositories opt in to Eka CI by adding a .eka-ci/config.json file. This file is
untrusted: it can reference caches, jobs, and checks defined on the server, but it can
never inject credentials, host paths, or arbitrary commands beyond what the server allows.
Schema
{
"jobs": {
"package-name": {
"file": "path/to/file.nix",
"attr_path": "optional.attr.path",
"allow_eval_failures": false,
"caches": ["cache-id-from-server-config"],
"size_check": {
"max_increase_percent": 10.0,
"base_branch": "main"
}
}
},
"checks": {
"check-name": {
"shell": "shell-derivation-attr",
"command": "command to run",
"allow_network": false,
"ro_bind": ["/path/to/readonly/bind"]
}
}
}
Jobs
A job describes a Nix expression to evaluate and the derivations to build from it.
| Field | Required | Description |
|---|---|---|
| `file` | yes | Path to a `.nix` file relative to the repository root. |
| `attr_path` | no | Optional sub-attribute path inside the file. |
| `allow_eval_failures` | no | If `true`, evaluation errors do not fail the check. |
| `caches` | no | List of cache IDs (defined server-side) to push successful builds to. |
| `size_check` | no | Configures output- and closure-size monitoring. |
Size checks
When size_check is set, Eka CI:
- Calculates output (NAR) and closure size for each successful build.
- Stores sizes in historical tables, keyed by commit and repository.
- Compares against the most recent successful build on `base_branch`.
- Logs warnings (and surfaces them in the change summary) when the increase exceeds `max_increase_percent`.
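The threshold comparison in the last two steps is plain percentage arithmetic. A hedged sketch (the helper name is hypothetical, not Eka CI's API):

```python
def size_increase_exceeds(base_bytes, head_bytes, max_increase_percent):
    """Return True when the head closure grew more than the allowed
    percentage over the most recent successful base-branch build."""
    if base_bytes == 0:  # no baseline: nothing to compare against
        return False
    increase = (head_bytes - base_bytes) / base_bytes * 100.0
    return increase > max_increase_percent

# 110 MB vs a 100 MB baseline is exactly +10.0%, not above a 10.0% threshold
print(size_increase_exceeds(100_000_000, 110_000_000, 10.0))  # False
print(size_increase_exceeds(100_000_000, 111_000_000, 10.0))  # True
```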
Checks
A check runs a sandboxed command in a shell derivation defined in the repository.
| Field | Required | Description |
|---|---|---|
| `shell` | yes | Attribute name of a shell derivation that provides the tools. |
| `command` | yes | The command line to run inside the sandbox. |
| `allow_network` | no | Default `false`. When `true`, the check is allowed network access. |
| `ro_bind` | no | Additional read-only bind mounts to expose to the sandbox. |
Checks are sandboxed via birdcage with no filesystem write access outside their working
directory and no network access by default. See
Architecture for details on the security model.
Cache references
The caches field on a job lists cache IDs — string identifiers from the server's
[[caches]] blocks. The repository never sees the underlying credentials, destinations, or
permissions.
If a job references a cache it is not allowed to push to (per the cache's
allowed_repos/allowed_branches), the push is silently skipped and a warning is logged.
The build itself still succeeds. See Configuring Caches.
Configuring Caches in EKA-CI
This guide explains how to configure binary caches for EKA-CI, allowing build outputs to be pushed to various cache backends.
Overview
EKA-CI uses a two-tier configuration model for security:
- Server Configuration (trusted): Defines available caches, credentials, and permissions
- Repository Configuration (untrusted): References caches by ID only
This separation ensures that repository contributors cannot inject arbitrary commands or access credentials directly.
Server Configuration
Cache definitions are stored in the server configuration file (typically ~/.config/ekaci/ekaci.toml or specified via --config-file).
Basic Structure
# Security settings for hook execution
[security]
max_hook_timeout_seconds = 300 # Maximum time for cache push operations
audit_hooks = true # Enable audit logging of all cache operations
# Cache definitions
[[caches]]
id = "production-s3"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
credentials = { env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] } }
[[caches]]
id = "public-cachix"
cache_type = "cachix"
destination = "my-cache-name"
credentials = { cachix-token = { env_var = "CACHIX_AUTH_TOKEN" } }
Cache Types
1. Nix Copy (S3/HTTP Binary Caches)
Uses nix copy to push derivations to S3-compatible storage or HTTP binary caches.
[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-west-2"
# Option 1: Environment variables
[caches.credentials]
env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] }
# Option 2: AWS profile
# [caches.credentials]
# aws-profile = { profile = "production" }
# Option 3: Credential file
# [caches.credentials]
# file = { path = "/etc/eka-ci/aws-credentials" }
Supported S3 destinations:
- `s3://bucket/path?region=REGION` - S3 with explicit region
- `s3://bucket/path?profile=PROFILE` - S3 using AWS profile
- `s3://bucket/path?endpoint=URL` - S3-compatible services (MinIO, etc.)
HTTP binary caches:
[[caches]]
id = "http-cache"
cache_type = "nix-copy"
destination = "https://cache.example.com"
credentials = { none = {} } # Public cache, no auth needed
2. Cachix
Uses Cachix for binary cache storage with built-in authentication.
[[caches]]
id = "my-cachix"
cache_type = "cachix"
destination = "my-cache-name" # Your Cachix cache name
[caches.credentials]
cachix-token = { env_var = "CACHIX_AUTH_TOKEN" }
Getting a Cachix token:
- Sign up at cachix.org
- Create a cache
- Generate an auth token
- Set the `CACHIX_AUTH_TOKEN` environment variable when running EKA-CI
3. Attic
Uses Attic for self-hosted binary caches.
[[caches]]
id = "attic-cache"
cache_type = "attic"
destination = "https://attic.example.com/my-cache"
[caches.credentials]
env = { vars = ["ATTIC_TOKEN"] }
Credential Sources
EKA-CI supports multiple credential sources, including secure secret management systems to avoid storing plain-text credentials.
HashiCorp Vault (Recommended for Production)
Retrieve credentials from HashiCorp Vault:
[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
[caches.credentials.vault]
address = "https://vault.example.com:8200"
secret_path = "secret/data/eka-ci/s3-cache"
token_env = "VAULT_TOKEN"  # Optional, defaults to VAULT_TOKEN
namespace = "prod"         # Optional, for Vault Enterprise
Vault secret format (KV v2):
{
"data": {
"AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
"AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
}
Benefits:
- Secrets never stored on disk in plain text
- Automatic secret rotation support
- Audit logging of secret access
- Fine-grained access control
AWS Secrets Manager
Retrieve credentials from AWS Secrets Manager:
[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
[caches.credentials.aws-secrets-manager]
secret_name = "eka-ci/s3-cache-credentials"
region = "us-east-1"  # Optional, defaults to AWS_REGION env var
Secret format (JSON):
{
"AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
"AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
Benefits:
- Native AWS integration
- Automatic encryption at rest
- IAM-based access control
- Secret rotation with Lambda
systemd Credentials (Linux Systems)
Use systemd's encrypted credentials feature:
[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
[caches.credentials]
systemd-credential = { name = "s3-cache-creds" }
Setup:
# Encrypt credential
echo -n "AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=wJal..." | \
systemd-creds encrypt --name=s3-cache-creds - \
/etc/credstore.encrypted/s3-cache-creds
# Service loads it automatically
systemctl restart eka-ci.service
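If you prefer to wire the credential into the unit explicitly rather than rely on automatic loading, a hedged systemd drop-in might look like this (directive usage depends on how your unit is packaged):

```ini
# /etc/systemd/system/eka-ci.service.d/credentials.conf
[Service]
LoadCredentialEncrypted=s3-cache-creds:/etc/credstore.encrypted/s3-cache-creds
```

The decrypted credential is then available to the service under `$CREDENTIALS_DIRECTORY/s3-cache-creds`.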
Benefits:
- Encrypted at rest with TPM2 or system key
- Integrated with systemd services
- No external dependencies
- OS-level security
Instance Metadata Service (Cloud VMs)
Use IAM roles/service accounts without explicit credentials:
[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
[caches.credentials]
instance-metadata = {}
Supported platforms:
- AWS EC2 with IAM roles
- Google Cloud with service accounts
- Azure VMs with managed identities
Benefits:
- No credentials to manage
- Automatic credential rotation
- Follows cloud best practices
- Reduced attack surface
Environment Variables
Read credentials from environment variables (simple but less secure):
[caches.credentials]
env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] }
Note: Environment variables are visible in /proc/<pid>/environ and process listings.
File-based Credentials
Read credentials from a file (ensure proper file permissions):
[caches.credentials]
file = { path = "/etc/eka-ci/cache-credentials" }
File format: Key-value pairs, one per line
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Security: Set file permissions to 600 (readable only by EKA-CI user):
chmod 600 /etc/eka-ci/cache-credentials
chown ekaci:ekaci /etc/eka-ci/cache-credentials
AWS Profile
Use credentials from ~/.aws/credentials:
[caches.credentials]
aws-profile = { profile = "production" }
AWS credentials file (~/.aws/credentials):
[production]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
region = us-west-2
Cachix Token
Specific to Cachix authentication:
[caches.credentials]
cachix-token = { env_var = "CACHIX_AUTH_TOKEN" }
No Authentication
For public caches that don't require authentication:
[caches.credentials]
none = {}
Credential Source Comparison
| Method | Security | Complexity | Rotation | Audit | Best For |
|---|---|---|---|---|---|
| HashiCorp Vault | ⭐⭐⭐⭐⭐ | Medium | Automatic | Yes | Enterprise production |
| AWS Secrets Manager | ⭐⭐⭐⭐⭐ | Low | Automatic | Yes | AWS environments |
| systemd Credentials | ⭐⭐⭐⭐ | Medium | Manual | Limited | Linux systemd systems |
| Instance Metadata | ⭐⭐⭐⭐⭐ | Low | Automatic | Yes | Cloud VMs |
| AWS Profile | ⭐⭐⭐ | Low | Manual | No | Development |
| Environment Variables | ⭐⭐ | Low | Manual | No | Development/testing |
| File-based | ⭐⭐ | Low | Manual | No | Simple deployments |
Cache Permissions
Control which repositories and branches can use each cache.
Allow All (Default)
[[caches]]
id = "public-cache"
# ... other config ...
[caches.permissions]
allow_all = true # Any repository can use this cache
Specific Repositories
[[caches]]
id = "org-cache"
# ... other config ...
[caches.permissions]
allow_all = false
allowed_repos = [
"myorg/repo1",
"myorg/repo2",
"anotherorg/special-repo"
]
Branch Restrictions
[[caches]]
id = "production-cache"
# ... other config ...
[caches.permissions]
allow_all = false
allowed_repos = ["myorg/myrepo"]
allowed_branches = [
"main", # Exact match
"release/*", # Prefix wildcard
"*" # Match all branches (if repo is allowed)
]
Branch pattern syntax:
- `main` - Exact match only
- `release/*` - Matches `release/v1.0`, `release/v2.0`, etc.
- `*/hotfix` - Matches any branch ending with `/hotfix`
- `*` - Matches all branches
Repository Configuration
In your repository's .eka-ci/config.json, reference caches by ID:
{
"jobs": {
"my-package": {
"file": "default.nix",
"caches": ["production-s3", "public-cachix"]
},
"another-package": {
"file": "package.nix",
"caches": ["production-s3"]
}
}
}
Security Note: Repository contributors can only reference cache IDs. They cannot:
- Define arbitrary commands
- Access credentials
- Push to caches they don't have permission for
- Create new caches
Complete Examples
Example 1: Public Open Source Project
# Server config: ~/.config/ekaci/ekaci.toml
[security]
max_hook_timeout_seconds = 300
audit_hooks = true
[[caches]]
id = "public-cachix"
cache_type = "cachix"
destination = "my-oss-project"
credentials = { cachix-token = { env_var = "CACHIX_AUTH_TOKEN" } }
permissions = { allow_all = true }
Repository config (`.eka-ci/config.json`):
{
"jobs": {
"stdenv": {
"file": "default.nix",
"caches": ["public-cachix"]
}
}
}
Example 2: Private Company Repository
# Server config: /etc/eka-ci/ekaci.toml
[security]
max_hook_timeout_seconds = 600
audit_hooks = true
[[caches]]
id = "dev-cache"
cache_type = "nix-copy"
destination = "s3://company-dev-cache/nix?region=us-east-1"
credentials = { aws-profile = { profile = "dev" } }
permissions = { allow_all = false, allowed_repos = ["company/*"] }
[[caches]]
id = "prod-cache"
cache_type = "nix-copy"
destination = "s3://company-prod-cache/nix?region=us-east-1"
credentials = { aws-profile = { profile = "production" } }
[caches.permissions]
allow_all = false
allowed_repos = ["company/backend", "company/frontend"]
allowed_branches = ["main", "release/*"]
Repository config (`.eka-ci/config.json`):
{
"jobs": {
"backend": {
"file": "backend.nix",
"caches": ["dev-cache", "prod-cache"]
}
}
}
Example 3: Production with HashiCorp Vault
Secure production setup using Vault for secret management:
# Server config: /etc/eka-ci/ekaci.toml
[security]
max_hook_timeout_seconds = 600
audit_hooks = true
[[caches]]
id = "prod-s3"
cache_type = "nix-copy"
destination = "s3://company-prod-cache/nix?region=us-east-1"
[caches.credentials.vault]
address = "https://vault.company.internal:8200"
secret_path = "secret/data/eka-ci/prod-s3"
namespace = "production"
[caches.permissions]
allow_all = false
allowed_repos = ["company/backend", "company/frontend"]
allowed_branches = ["main", "release/*"]
[[caches]]
id = "staging-s3"
cache_type = "nix-copy"
destination = "s3://company-staging-cache/nix?region=us-east-1"
[caches.credentials.vault]
address = "https://vault.company.internal:8200"
secret_path = "secret/data/eka-ci/staging-s3"
namespace = "production"
[caches.permissions]
allow_all = false
allowed_repos = ["company/*"]
allowed_branches = ["develop", "feature/*", "main"]
Vault setup:
# Store S3 credentials in Vault
vault kv put secret/eka-ci/prod-s3 \
AWS_ACCESS_KEY_ID="AKIA..." \
AWS_SECRET_ACCESS_KEY="wJal..."
vault kv put secret/eka-ci/staging-s3 \
AWS_ACCESS_KEY_ID="AKIA..." \
AWS_SECRET_ACCESS_KEY="wJal..."
# Grant EKA-CI service access
vault policy write eka-ci-policy - <<EOF
path "secret/data/eka-ci/*" {
capabilities = ["read"]
}
EOF
vault token create -policy=eka-ci-policy
Repository config:
{
"jobs": {
"backend": {
"file": "backend.nix",
"caches": ["staging-s3", "prod-s3"]
}
}
}
Example 4: Multi-Cache Strategy
Push to both a fast internal cache and a public Cachix:
[[caches]]
id = "internal-s3"
cache_type = "nix-copy"
destination = "s3://internal-cache/nix?region=us-west-2&endpoint=https://minio.internal"
credentials = { env = { vars = ["MINIO_ACCESS_KEY", "MINIO_SECRET_KEY"] } }
permissions = { allow_all = false, allowed_repos = ["myorg/*"] }
[[caches]]
id = "public-fallback"
cache_type = "cachix"
destination = "myorg-public"
credentials = { cachix-token = { env_var = "CACHIX_AUTH_TOKEN" } }
permissions = { allow_all = false, allowed_repos = ["myorg/*"] }
Repository config (`.eka-ci/config.json`):
{
"jobs": {
"my-app": {
"file": "default.nix",
"caches": ["internal-s3", "public-fallback"]
}
}
}
Example 5: Cloud VM with IAM Roles
AWS EC2 instance using IAM role (no credentials needed):
# Server config on EC2 instance
[security]
max_hook_timeout_seconds = 300
audit_hooks = true
[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://my-cache/nix?region=us-east-1"
credentials = { instance-metadata = {} } # Uses EC2 IAM role
permissions = { allow_all = true }
Required IAM role policy:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-cache/*",
"arn:aws:s3:::my-cache"
]
}]
}
Operational Considerations
Setting Environment Variables
When running EKA-CI as a systemd service:
# /etc/systemd/system/eka-ci.service
[Service]
Environment="AWS_ACCESS_KEY_ID=AKIA..."
Environment="AWS_SECRET_ACCESS_KEY=wJal..."
Environment="CACHIX_AUTH_TOKEN=eyJ..."
EnvironmentFile=/etc/eka-ci/secrets.env
Secrets Management
Recommended: Use secure credential sources (see Credential Sources section)
Production deployments should use one of:
- HashiCorp Vault - Enterprise secret management with rotation and audit
- AWS Secrets Manager - Native AWS secret storage
- systemd Credentials - Encrypted credentials with TPM2 support
- Instance Metadata - Cloud IAM roles (no credentials to manage)
For development/testing only:
Environment file:
# /etc/eka-ci/secrets.env
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
CACHIX_AUTH_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Warning: Plain-text environment files and environment variables should only be used for development. Production systems should use Vault, AWS Secrets Manager, systemd credentials, or instance metadata.
Monitoring and Auditing
When audit_hooks = true, all cache operations are logged:
[INFO] Sent hook task for drv /nix/store/abc-foo.drv (job: my-package)
[WARN] Permission denied for cache 'prod-cache' in myorg/myrepo: Branch develop is not allowed
[WARN] Cache ID 'nonexistent-cache' not found in server registry, skipping
Testing Cache Configuration
- Verify the server config loads:
  eka-ci-server --config-file ekaci.toml
  # Check logs for "Loading configuration file from..."
- Test permissions: create a test PR and check logs for permission warnings.
- Verify credentials manually:
  # For S3
  nix copy /nix/store/some-drv --to 's3://bucket/path?region=us-east-1'
  # For Cachix
  cachix push my-cache /nix/store/some-drv
Troubleshooting
Cache push fails silently
Check that:
- The cache ID in `.eka-ci/config.json` matches the server config
- The repository has permission to use the cache
- Credentials are valid and accessible
- Server logs show the hook execution
Permission denied
[WARN] Permission denied for cache 'prod-cache' in myorg/myrepo
Solutions:
- Add the repository to the `allowed_repos` list
- Check that the branch name matches an `allowed_branches` pattern
- Set `allow_all = true` if appropriate
Credentials not found
[ERROR] Failed to execute hook: Environment variable AWS_ACCESS_KEY_ID not set
Solutions:
- Ensure environment variables are set when starting the server
- Check the systemd service file for `Environment=` or `EnvironmentFile=`
- Verify file paths for file-based credentials
Timeout errors
[WARN] Hook execution timed out after 300 seconds
Solutions:
- Increase `max_hook_timeout_seconds` in the security config
- Check network connectivity to the cache destination
- Verify the cache backend is responsive
Security Best Practices
- Use minimal permissions: Only grant cache access to repositories that need it
- Separate dev/prod caches: Use branch restrictions to prevent dev builds in production caches
- Rotate credentials: Regularly rotate AWS keys and Cachix tokens
- Audit logs: Monitor `audit_hooks` output for unauthorized access attempts
- File permissions: Ensure credential files are readable only by the EKA-CI service user
- Environment isolation: Use systemd's `PrivateTmp`, `ProtectSystem`, etc. for additional security
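As a starting point for the environment-isolation item, a hedged systemd drop-in (the writable paths come from the example server config above; tighten or relax to match your deployment):

```ini
# /etc/systemd/system/eka-ci.service.d/hardening.conf
[Service]
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
NoNewPrivileges=yes
# Keep the state and log directories writable under ProtectSystem=strict
ReadWritePaths=/var/lib/ekaci /var/log/ekaci
```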
Migration from Arbitrary Hooks
If you previously used arbitrary post-build hooks, migrate to the secure cache reference system:
Before (insecure):
{
"jobs": {
"my-package": {
"file": "default.nix",
"post_build_hooks": [{
"name": "push-to-s3",
"command": ["nix", "copy", "--to", "s3://bucket/path"],
"env": {
"AWS_ACCESS_KEY_ID": "hardcoded-key",
"AWS_SECRET_ACCESS_KEY": "hardcoded-secret"
}
}]
}
}
}
After (secure):
Server config:
[[caches]]
id = "s3-cache"
cache_type = "nix-copy"
destination = "s3://bucket/path?region=us-east-1"
credentials = { env = { vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] } }
Repository config:
{
"jobs": {
"my-package": {
"file": "default.nix",
"caches": ["s3-cache"]
}
}
}
Rebuild Detection
When a pull request changes a Nix expression, Eka CI computes which derivations need to rebuild. This is a key input to both the change summary and the build queue.
How it works
For each opened or updated pull request, the server:
- Evaluates the base ref to produce a base jobset of derivations.
- Evaluates the head ref to produce a head jobset.
- Diffs the two jobsets, classifying each derivation as one of:
- Added — present at head, missing at base.
- Removed — present at base, missing at head.
- Rebuild — same attribute but a different `.drv` hash.
- For each rebuild target, walks the in-memory dependency graph to compute its blast radius — the count of transitive dependents that must also rebuild.
The resulting set of derivations is what gets enqueued for the platform-specific build queues.
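Conceptually, the classification step is a map diff. A simplified sketch, with plain dicts standing in for the server's jobset structures:

```python
def classify(base, head):
    """Diff two jobsets given as {attr_path: drv_hash} maps."""
    added   = sorted(set(head) - set(base))
    removed = sorted(set(base) - set(head))
    rebuild = sorted(a for a in set(base) & set(head) if base[a] != head[a])
    return {"added": added, "removed": removed, "rebuild": rebuild}

base = {"hello": "aaa", "curl": "bbb", "old-pkg": "ccc"}
head = {"hello": "aaa", "curl": "ddd", "new-pkg": "eee"}
print(classify(base, head))
# {'added': ['new-pkg'], 'removed': ['old-pkg'], 'rebuild': ['curl']}
```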
Configuration
Rebuild detection is controlled by a small number of settings in ekaci.toml:
[rebuild]
# Maximum number of derivations to rebuild before classifying a PR as
# "wide" and skipping per-package builds.
max_rebuild_count = 20000
# Skip rebuild evaluation entirely for PRs that touch any of these paths.
skip_paths = ["doc/**", "**/CHANGELOG.md"]
The exact set of available settings is evolving; consult the source of truth at
backend/server/src/config.rs. Repositories can additionally constrain rebuild detection
via change_set rules, allowing maintainers to mark certain files as "rebuild-only" or
"docs-only" without re-evaluating Nix.
Change sets
A change set is a per-repository declaration of which file globs map to which kind of change. They are evaluated cheaply from the Git diff before a full Nix evaluation runs. Typical uses:
- Marking `README.md` and `doc/**/*.md` as documentation-only.
- Marking `flake.lock` updates as triggering a full rebuild.
- Treating CI-only files (like `.github/**`) as no-op.
When a PR's diff is fully covered by change-set rules that imply "no rebuild", Eka CI can skip the build phase entirely and post a "no rebuilds expected" change summary.
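The "fully covered" decision reduces to checking every changed path against the no-rebuild globs. A hedged sketch — the rule shapes are illustrative, not the actual `change_set` schema, and Python's `fnmatch` lacks `**`, so plain `*` globs stand in:

```python
from fnmatch import fnmatchcase

NO_REBUILD_GLOBS = ["README.md", "doc/*", ".github/*"]  # illustrative rules

def can_skip_builds(changed_paths):
    """True only when every file in the PR diff matches a rule that
    implies 'no rebuild'; one uncovered path forces a full evaluation."""
    return all(
        any(fnmatchcase(path, glob) for glob in NO_REBUILD_GLOBS)
        for path in changed_paths
    )

print(can_skip_builds(["README.md", "doc/install.md"]))   # True
print(can_skip_builds(["doc/install.md", "flake.lock"]))  # False
```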
Metrics
Rebuild detection emits Prometheus metrics; see Monitoring & Metrics for the exact metric names. Useful series include rebuild counts per system, blast-radius histograms, and skip-due-to-change-set counters.
Related
- Change Summaries — the user-visible output of rebuild detection.
- LRU Cache Tuning — the cache that holds the dependency graph.
Change Summary Operational Runbook
Table of Contents
- Overview
- Per-Repo Configuration
- Metrics
- Endpoints
- GitHub Check Posting
- Truncation Strategy
- Cache
- Troubleshooting
- Alerts
Overview
The change-summary pipeline computes a per-PR view that combines:
- Package changes (A1): structured diff between head and base jobsets — Added / Removed / VersionBump / LicenseChange / MaintainerChange / RebuildOnly.
- Rebuild impact (A2): per-system rebuild counts plus per-package "blast radius" (count of transitive dependents) across the in-memory build graph.
Outputs:
- `GET /v1/commits/{sha}/package-changes` — JSON, structured diff only.
- `GET /v1/commits/{sha}/rebuild-impact` — JSON, impact only.
- `GET /v1/commits/{sha}/change-summary` — JSON, combined + pre-rendered markdown.
- `GET /v1/commits/{sha}/change-summary.md` — `text/markdown`, ready to paste.
- A single GitHub check run per PR head (`EkaCI: Change Summary`), idempotently created/patched on a 5-minute debounce after each jobset evaluation.
Per-Repo Configuration
Behaviour can be tuned per repository via .ekaci/config.json. Both blocks are optional; absent ⇒ engine defaults.
{
"package_change_summary": {
"enabled": true,
"max_packages_listed": 100,
"include_rebuild_only": false
},
"rebuild_impact": {
"enabled": true,
"max_top_blast_radius": 5,
"compute_full_blast_radius": false
}
}
package_change_summary
| Field | Default | Notes |
|---|---|---|
| `enabled` | true | Hides the package-changes section of the check when false. |
| `max_packages_listed` | 100 | Soft cap on table rows before the renderer collapses to counts-only. The web endpoint clamps user-supplied `max_packages_listed` query params to at most 10× this value. |
| `include_rebuild_only` | false | When true, `RebuildOnly` rows render alongside Added/Removed/Bumped. Counts are still surfaced in the rebuild-only summary line regardless. |
rebuild_impact
| Field | Default | Notes |
|---|---|---|
| `enabled` | true | Hides the blast-radius section of the check when false. |
| `max_top_blast_radius` | 5 | Number of top-rebuild packages reported. The web endpoint clamps user-supplied `max_top_blast_radius` query params to at most 10× this value. |
| `compute_full_blast_radius` | false | When true, walks the full transitive dependent set (expensive on large jobsets — use sparingly). The default mode reports per-seed direct rebuild counts. |
Schema is parsed by CIConfig in backend/server/src/ci/config.rs. Partial blocks are accepted; missing inner fields fall back to the defaults shown above.
Metrics
All metrics are exposed at GET /v1/metrics with the namespace eka_ci_.
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| `eka_ci_change_summary_total_duration_seconds` | Histogram | phase | Wall-clock per phase: classify, impact, render, end_to_end. |
| `eka_ci_change_summary_cache_hits_total` | Counter | — | RebuildImpactCache lookups served from SQLite. |
| `eka_ci_change_summary_cache_misses_total` | Counter | — | RebuildImpactCache lookups that fell through to a cold compute. |
| `eka_ci_change_summary_metadata_unavailable_total` | Counter | — | Calls where head-side pname/version/license/maintainers were entirely missing. Indicates the eval pipeline did not populate package metadata. |
| `eka_ci_change_summary_truncated_total` | Counter | level | Truncation events by drop level: columns (one of maintainers/license/rebuild-only dropped) or summary (table collapsed to counts-only). |
| `eka_ci_rebuild_impact_traversal_duration_seconds` | Histogram | system | Per-system blast-radius traversal duration. |
| `eka_ci_rebuild_impact_seeds_total` | Histogram | — | Per-call distribution of changed-drv seed count fed into the BFS. |
Useful queries
Cache hit ratio (target: > 0.8 in steady state for re-rendered PRs):
rate(eka_ci_change_summary_cache_hits_total[5m])
/
ignoring() (
rate(eka_ci_change_summary_cache_hits_total[5m])
+ rate(eka_ci_change_summary_cache_misses_total[5m])
)
End-to-end p95:
histogram_quantile(0.95,
sum by (le) (rate(eka_ci_change_summary_total_duration_seconds_bucket{phase="end_to_end"}[5m]))
)
Truncation rate (target: < 5% of renders):
sum(rate(eka_ci_change_summary_truncated_total[15m]))
/
sum(rate(eka_ci_change_summary_total_duration_seconds_count{phase="end_to_end"}[15m]))
Endpoints
| Path | Auth | Notes |
|---|---|---|
| `GET /v1/commits/{sha}/package-changes` | required | Returns full structured diff; never truncated by the orchestrator. Query: `base_sha`, `job`, `max_packages_listed`. |
| `GET /v1/commits/{sha}/rebuild-impact` | required | Read-through RebuildImpactCache. Query: `base_sha`, `job`, `max_top_blast_radius`. |
| `GET /v1/commits/{sha}/change-summary` | required | Combined; includes pre-rendered markdown. |
| `GET /v1/commits/{sha}/change-summary.md` | public | Returns the same markdown that posts to the GitHub check. Public per design §10.1 — the same data is visible on the PR check tab. |
`max_packages_listed` and `max_top_blast_radius` are clamped to at most 10× their defaults to keep payload sizes predictable.
GitHub Check Posting
- One check run per PR head SHA, titled `EkaCI: Change Summary`.
- Posted with `status=Completed`, `conclusion=Neutral` (informational; does not gate merge).
- A 5-minute debounce after the last `CreateJobSet` for a head SHA ensures all jobsets contribute to a single aggregated render.
- Idempotent: subsequent renders for the same head PATCH the same check run id.
- Defense-in-depth: a 65,500-byte sender-side cap protects against GitHub's 65,535-character `output.summary` ceiling. Hits append a _…truncated by sender safety net_ footer; this is rare in practice (the markdown renderer's 60,000-byte soft limit fires first).
Truncation Strategy
The renderer drops content in priority order until the markdown fits under 60,000 bytes:
- Drop maintainer rows from the package table.
- Drop license rows.
- Drop the rebuild-only count line.
- Collapse the entire change table to a counts-only summary.
Every step that fires increments eka_ci_change_summary_truncated_total (level=columns for steps 1-3, level=summary for step 4).
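The drop order can be pictured as a loop that strips one level at a time until the render fits. A simplified sketch — the section names and byte accounting are illustrative, not the renderer's internals:

```python
DROP_ORDER = ["maintainers", "license", "rebuild_only_line", "full_table"]

def truncate(sections, limit):
    """sections: {name: rendered_bytes}. Drop levels in priority order
    until the total fits under the limit.
    Returns (surviving sections, list of drop levels that fired)."""
    kept = dict(sections)
    fired = []
    for level in DROP_ORDER:
        if sum(kept.values()) <= limit:
            break
        if level in kept:
            kept.pop(level)
            fired.append(level)
    return kept, fired

sections = {"summary": 1_000, "maintainers": 30_000, "license": 20_000,
            "rebuild_only_line": 100, "full_table": 25_000}
# 76,100 bytes total: dropping maintainers (step 1) already fits 60,000
kept, fired = truncate(sections, 60_000)
print(fired)  # ['maintainers']
```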
Cache
The RebuildImpactCache SQLite table memoises rebuild-impact responses keyed by (head_sha, base_sha, job).
- Pruned on startup: rows older than 7 days are dropped (`DEFAULT_CACHE_TTL_DAYS`).
- Cache write failures are logged at WARN; the freshly-computed answer is still returned (just unmemoised).
- 404s (head jobset missing) are not cached.
To force recompute of a specific entry:
DELETE FROM RebuildImpactCache WHERE head_sha = ? AND base_sha = ? AND job = ?;
Troubleshooting
"No change summary check appearing on a PR"
- Check that the PR target jobset finished evaluating (`Job` rows for the head SHA exist).
- Confirm the GitHub App has `checks: write` permission for the repo.
- The 5-minute debounce means the check appears at least 5 minutes after the last jobset. Verify by waiting or by checking the `change_summary_pending` log line in the GitHub service.
- If a base SHA is missing from the PR (rare; happens on detached PR heads), the change-summary check is skipped — this is intentional.
"Rendered summary is truncated more than expected"
- Inspect `eka_ci_change_summary_truncated_total` rates. A spike usually correlates with a large fan-out PR (touches many packages).
- Per design §11.1, the per-repo `max_packages_listed` and `max_top_blast_radius` knobs can be raised, but GitHub's 65,535-character cap is the hard ceiling. For larger PRs, the `change-summary.md` endpoint always returns the full markdown.
"Cache hit ratio is low"
- Expected the first time a `(head, base, job)` triple is queried. Subsequent renders should hit.
- A persistent miss rate means re-evaluations are producing different `head_sha` values (e.g., force-pushes). This is normal for active PRs.
- If the miss rate is high without new commits, check for `RebuildImpactCache` write failures in the WARN log.
"Metadata unavailable counter is non-zero"
- Means the eval pipeline did not populate `pname`/`version`/`license`/`maintainers` for any drv on the head side. Check that the `nix-eval-jobs` invocation produced `meta` blocks.
- Affects display only — `Added`/`Removed` classification falls back to `RebuildOnly` rows.
Alerts
Suggested Prometheus alert rules:
- alert: ChangeSummaryEndToEndSlow
expr: |
histogram_quantile(0.95,
sum by (le) (rate(eka_ci_change_summary_total_duration_seconds_bucket{phase="end_to_end"}[5m]))
) > 10
for: 15m
annotations:
summary: "change-summary p95 > 10s"
- alert: ChangeSummaryCacheMissesElevated
expr: |
rate(eka_ci_change_summary_cache_misses_total[15m])
/ (rate(eka_ci_change_summary_cache_hits_total[15m])
+ rate(eka_ci_change_summary_cache_misses_total[15m])) > 0.5
for: 30m
annotations:
summary: "change-summary cache miss ratio > 50%"
- alert: ChangeSummaryTruncationSpike
expr: |
sum(rate(eka_ci_change_summary_truncated_total{level="summary"}[15m])) > 0
for: 30m
annotations:
summary: "change-summary collapsing tables to counts-only"
GitHub PR Comment Commands
eka-ci listens for comments on pull requests that mention the bot and dispatches supported commands. This document lists the commands that are currently recognized, the conditions under which they succeed, and the feedback users should expect.
Summary
| Command | Purpose |
|---|---|
| `@eka-ci merge` | Queue the PR to be merged once CI passes, using the repository's default merge method (falling back to squash). |
| `@eka-ci merge merge` | Same, with an explicit merge (merge-commit) method. |
| `@eka-ci merge squash` | Same, with an explicit squash method. |
| `@eka-ci merge rebase` | Same, with an explicit rebase method. |
| `@eka-ci merge cancel` | Withdraw a previously issued `@eka-ci merge` request. |
The bot mention is case-insensitive (@eka-ci, @Eka-CI, @EKA-CI all
work). The command verb and method are also case-insensitive.
Where commands are accepted
The comment must be on a pull request (comments on plain issues are ignored). Additionally:
- Only newly-created comments trigger commands; edits and deletions do not revoke or re-issue commands. If you want to cancel, post `@eka-ci merge cancel` as a new comment.
- Bot-authored comments are ignored (the `User.type` field on the comment author must not be `Bot`).
- The `@eka-ci` mention must be the first non-whitespace token of a line. Mentions embedded in prose (e.g. `cc @eka-ci please help`) are deliberately ignored to prevent accidental triggers.
- A single comment may span multiple lines; the first line that parses as a command wins. Other lines are treated as prose.
Parser behavior
- The bot handle match is case-insensitive (ASCII).
- The command verb (`merge`) must immediately follow the mention.
- Unknown verbs (e.g. `@eka-ci rebuild`) are silently ignored.
- Unknown methods (e.g. `@eka-ci merge squashh`) fall back to a bare `@eka-ci merge` rather than rejecting the whole comment. This permissive behavior means typos still queue a merge.
- Trailing tokens after the method are ignored: `@eka-ci merge squash please thanks` parses as a squash-merge request.
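The rules above can be sketched as a small parser (a simplified illustration; the real types and bot handle live in the eka-ci source):

```rust
#[derive(Debug, PartialEq)]
enum MergeMethod { Merge, Squash, Rebase }

#[derive(Debug, PartialEq)]
enum Command {
    Merge(Option<MergeMethod>),
    Cancel,
}

fn parse_comment(body: &str) -> Option<Command> {
    for line in body.lines() {
        let mut tokens = line.split_whitespace();
        // The mention must be the first non-whitespace token (case-insensitive).
        match tokens.next() {
            Some(t) if t.eq_ignore_ascii_case("@eka-ci") => {}
            _ => continue,
        }
        // The verb must immediately follow; unknown verbs are silently ignored.
        match tokens.next() {
            Some(v) if v.eq_ignore_ascii_case("merge") => {}
            _ => continue,
        }
        // Unknown methods fall back to a bare merge; trailing tokens are ignored.
        let cmd = match tokens.next().map(|m| m.to_ascii_lowercase()) {
            Some(m) if m == "cancel" => Command::Cancel,
            Some(m) if m == "merge" => Command::Merge(Some(MergeMethod::Merge)),
            Some(m) if m == "squash" => Command::Merge(Some(MergeMethod::Squash)),
            Some(m) if m == "rebase" => Command::Merge(Some(MergeMethod::Rebase)),
            _ => Command::Merge(None),
        };
        return Some(cmd); // first line that parses as a command wins
    }
    None
}
```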
@eka-ci merge [method]
Queues the PR for auto-merge once all CI gates pass.
Authorization
The comment author must satisfy at least one of:
- Repo permission of `write`, `maintain`, or `admin` on the repository the PR targets, OR
- Be a registered maintainer of every package whose source is changed by the PR (per the eka-ci maintainers table).
If neither condition holds:
- The bot reacts `-1` to the command comment.
- The bot posts a reply explaining that the command was denied.
- No merge request is recorded.
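A minimal sketch of the two-path rule (the signature is illustrative; the real check consults the GitHub permission API and the maintainers table):

```rust
/// True if the commenter may queue a merge: either sufficient repo
/// permission, or maintainer of every package the PR changes.
fn is_authorized(repo_permission: &str, maintains_all_changed_packages: bool) -> bool {
    matches!(repo_permission, "write" | "maintain" | "admin")
        || maintains_all_changed_packages
}
```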
Push-time (force-push) protection
To guard against commits landing between when a reviewer types the command and when the webhook is processed, the bot performs a best-effort timestamp check before accepting the request:
- It fetches the current head commit via the GitHub API and reads the `committer.date`.
- If that timestamp is more than 30 seconds after the `created_at` of the triggering comment, the bot refuses:
  - Reacts `-1` on the command comment.
  - Posts a reply naming the head commit and asking the user to review the new changes and re-issue the command.
The 30-second grace window absorbs clock skew between GitHub's event recorder and the commit-metadata service. If the API call fails or the committer date cannot be parsed, the check falls open — the bot proceeds and relies on the post-acceptance SHA-drift check (below) as a second line of defense.
Caveat: Because the signal is the commit's own `committer.date`, a force-push of a much older, cherry-picked commit will not trigger this check. The post-acceptance SHA-drift hook still catches that case.
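The timestamp comparison reduces to a sketch like this (unix-second timestamps; the real code reads `committer.date` and the comment's `created_at` from the GitHub API):

```rust
const GRACE_SECS: i64 = 30; // absorbs clock skew between GitHub services

/// True if the head commit appears to postdate the triggering comment by
/// more than the grace window. `None` (API failure or an unparsable date)
/// fails open, deferring to the post-acceptance SHA-drift check.
fn head_moved_after_comment(committer_ts: Option<i64>, comment_ts: i64) -> bool {
    match committer_ts {
        Some(t) => t > comment_ts + GRACE_SECS,
        None => false,
    }
}
```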
Request recording and acknowledgement
On acceptance, the bot:
- Records a pending comment-merge request pinned to the current head SHA, together with the requested merge method (if any), the requester's GitHub user id/login, and the comment id.
- Reacts `+1` on the command comment as a visible acknowledgement.
- Kicks the auto-merge evaluator immediately, so that if CI gates are already green the merge lands right away.
Merge method selection
When the auto-merger eventually runs, it selects the merge method in this order of preference:
1. The method explicitly given in the comment (`merge`/`squash`/`rebase`), if any.
2. The PR's stored merge-method preference (set via the UI), if any.
3. `squash` (the default fallback).
If the selected method is disabled in the repository's merge settings, the bot logs a warning and skips auto-merge. No comment is posted in that case — the requester is expected to re-issue with an allowed method.
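The selection order reduces to a simple fallback chain; a sketch (the allowed-methods slice stands in for the repository's merge settings):

```rust
/// Pick the merge method: comment override, then the stored PR
/// preference, then squash. Returns None (skip auto-merge) when the
/// chosen method is disabled in the repository's merge settings.
fn select_method<'a>(
    comment: Option<&'a str>,
    stored: Option<&'a str>,
    allowed: &[&str],
) -> Option<&'a str> {
    let method = comment.or(stored).unwrap_or("squash");
    allowed.contains(&method).then_some(method)
}
```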
Post-acceptance SHA-drift protection
After the comment-merge is recorded, the bot continues to monitor the
PR. If any new commit lands on the head branch before the merge
completes (PR Synchronize webhook), the request is cancelled:
- A `:confused:` reaction is added to the original command comment.
- A reply is posted naming the expected (pinned) and current head SHAs, and instructing the user to re-issue the command against the updated PR.
- The pending merge request is cleared from the database.
This is a hard guarantee: the merge bot will never land a commit that the requester did not explicitly target.
Gates the merge still has to pass
`@eka-ci merge` is not a force-merge. It opts the PR into the auto-merge evaluator, which still requires:
- The head commit's jobset has fully concluded with no failing new-or-changed jobs (`pr_head_build_succeeded`).
- The merge method selected is allowed by the repository settings.
- Any other CI gates configured on the commit are passing (these are enforced by GitHub's own branch-protection rules independently of eka-ci).
Note that the package-maintainer approval gate used for UI-triggered auto-merges is skipped for comment-driven merges, because the requester's authorization was already verified at command time.
@eka-ci merge cancel
Withdraws an outstanding comment-merge request.
Behavior when nothing is pending
If no @eka-ci merge is currently pending on the PR, the bot silently
no-ops. It does not react, does not post, and does not write to the
database. This is intentional: it denies unauthorized commenters any
signal about bot state.
Authorization
The comment author must satisfy at least one of:
- Be the original requester of the pending merge (identified by GitHub user id). This is a fast path that skips the permission API call.
- Have `write`, `maintain`, or `admin` on the repository.
- Be a maintainer of every changed package in the PR.
If none of these hold:
- The bot reacts `-1` on the cancel comment.
- The bot posts a reply explaining why the cancel was denied.
- The pending merge request remains in place.
This prevents random commenters from griefing pending maintainer merges.
On acceptance
- The pending merge request is cleared from the database.
- The bot reacts `+1` on the cancel comment.
- Any subsequent auto-merge evaluation proceeds as if no comment-merge had ever been issued (ambient auto-merge remains in effect if it was separately enabled via the UI).
Rate limiting
To protect the installation's shared GitHub API budget (5000 req/hr),
commands are rate-limited per (user_id, owner, repo) triple at
the webhook boundary:
- Minimum interval: 5 seconds between accepted commands from the same user on the same repository.
- Rejections are silent: no reaction, no comment, no DB write. Feedback would itself amplify the spam the limit is designed to contain.
- The rate-limit state is process-local and non-persistent; it resets on server restart. It protects against burst spam only; sustained abuse is left to GitHub's own abuse-detection systems.
If you legitimately need to correct a just-issued command (e.g. wrong
method), wait 5 seconds before re-issuing, or use
@eka-ci merge cancel followed by the new command.
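The limiter described above can be sketched with a process-local map (type and field names are illustrative; the real state lives at the webhook boundary):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Process-local, non-persistent rate limiter keyed by
/// (user_id, owner, repo); resets on restart, as described above.
struct CommandRateLimiter {
    min_interval: Duration,
    last_seen: HashMap<(u64, String, String), Instant>,
}

impl CommandRateLimiter {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_seen: HashMap::new() }
    }

    /// Returns true if the command should be processed; false means
    /// drop it silently (no reaction, no comment, no DB write).
    fn allow(&mut self, user_id: u64, owner: &str, repo: &str, now: Instant) -> bool {
        let key = (user_id, owner.to_string(), repo.to_string());
        if let Some(&prev) = self.last_seen.get(&key) {
            if now.duration_since(prev) < self.min_interval {
                return false;
            }
        }
        self.last_seen.insert(key, now);
        true
    }
}
```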
User-visible reactions
The bot uses the following reactions on the triggering comment as a compact status signal:
| Reaction | Meaning |
|---|---|
| `+1` | Command accepted (merge queued, or cancel recorded). |
| `-1` | Command denied (unauthorized, or refused due to push-time drift). |
| `rocket` | The requested merge succeeded (added after the PR merges). |
| `confused` | A previously accepted comment-merge was cancelled due to post-acceptance SHA drift. |
Examples
Queue a squash-merge:
Looks good to me!
@eka-ci merge squash
Queue the repository-default merge method:
@eka-ci merge
Withdraw a pending request:
Actually, hold off — I want to add another commit.
@eka-ci merge cancel
Multi-line comment where prose precedes the command:
LGTM after the last round of fixes.
@eka-ci merge rebase
Thanks for the reviews!
Things that are NOT supported
These are intentionally out of scope as of this writing:
- Commands from comment edits or deletions; only newly-created comments trigger anything.
- Commands embedded inside prose (`cc @eka-ci please merge`). The mention must be the first non-whitespace token on its line.
- Verbs other than `merge` (`@eka-ci rebuild`, `@eka-ci retry`, etc.); these are parsed and silently ignored, and may be added in future versions.
- Queueing multiple merge requests against the same PR; the most recent accepted request overwrites the previous one.
- Explicit SHA arguments (`@eka-ci merge <sha>`). The current head SHA is always captured implicitly at command time.
Post-Build Hooks Implementation
Overview
This document describes the implementation of Nix-style post-build hooks in eka-ci, allowing per-job configuration of cache push and other post-build operations.
Status: ✅ Production Ready - Cache push functionality is fully implemented and operational.
Architecture
Components
1. Hook Types (`backend/server/src/hooks/types.rs`)
   - `PostBuildHook`: configuration for individual hooks
   - `HookTask`: task sent to the executor
   - `HookContext`: build context passed to hooks
   - `HookResult`: result of hook execution
2. Hook Executor (`backend/server/src/hooks/executor.rs`)
   - Async service that processes hook tasks
   - Executes hook commands with environment variable substitution
   - Logs output to `{logs_dir}/{drv_hash}/hook-{name}.log`
3. Database Integration
   - Migration: `backend/server/sql/migrations/20260409_job_config.sql`
   - Stores job config JSON in `GitHubJobSets.config_json`
   - Tracks hook executions in the `HookExecution` table
4. Recorder Integration (`backend/server/src/scheduler/recorder.rs`)
   - Executes hooks after successful builds
   - Retrieves job config from the database
   - Sends hook tasks to the HookExecutor (non-blocking)
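A minimal sketch of what executing one hook amounts to (names and signature are illustrative; the real executor also streams output to the per-hook log file):

```rust
use std::collections::HashMap;
use std::process::Command;

/// Run one post-build hook with the Nix-compatible environment.
/// Returns whether the hook succeeded; failures never fail the build.
fn run_hook(
    cmd: &[String],
    extra_env: &HashMap<String, String>,
    drv_path: &str,
    out_paths: &[String],
) -> std::io::Result<bool> {
    let (program, args) = cmd.split_first().expect("hook command must be non-empty");
    let status = Command::new(program)
        .args(args)
        .env("DRV_PATH", drv_path)
        .env("OUT_PATHS", out_paths.join(" "))
        .envs(extra_env)
        .status()?;
    Ok(status.success())
}
```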
Automatic Cache Push (Recommended)
New in 2024-04-11: eka-ci now supports automatic cache push using server-side cache configuration with multi-source credential support.
Instead of configuring post-build hooks manually, you can use the built-in cache push system which provides:
- ✅ Secure credential management (Vault, AWS Secrets Manager, systemd, etc.)
- ✅ Permission controls (repository and branch restrictions)
- ✅ Automatic credential loading
- ✅ Support for multiple caches per job
Server-Side Cache Configuration
Configure caches in your server config (`~/.config/ekaci/ekaci.toml`):
[[caches]]
id = "production-s3"
cache_type = "nix-copy"
destination = "s3://my-bucket/nix-cache?region=us-east-1"
credentials = { vault = { address = "https://vault.example.com:8200", secret_path = "eka-ci/s3-credentials", token_env = "VAULT_TOKEN" } }
[caches.permissions]
allow_all = false
allowed_repos = ["myorg/*"]
allowed_branches = ["main", "release/*"]
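The permission fields above gate which builds may push; a sketch of the check, assuming simple trailing-`*` prefix patterns (the actual matching rules may differ):

```rust
/// Match a pattern like "myorg/*" or "release/*" against a value.
/// Only a trailing `*` wildcard is modeled here (an assumption).
fn pattern_matches(pattern: &str, value: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => value.starts_with(prefix),
        None => pattern == value,
    }
}

/// Sketch of the repo/branch permission check for a cache.
fn cache_allowed(
    allow_all: bool,
    allowed_repos: &[&str],
    allowed_branches: &[&str],
    repo: &str,
    branch: &str,
) -> bool {
    allow_all
        || (allowed_repos.iter().any(|p| pattern_matches(p, repo))
            && allowed_branches.iter().any(|p| pattern_matches(p, branch)))
}
```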
Repository Cache Reference
Reference caches by ID in your `.eka-ci/config.json`:
{
"jobs": {
"my-package": {
"file": "default.nix",
"caches": ["production-s3"]
}
}
}
See configure-caches.md for detailed cache configuration options.
Manual Post-Build Hooks (Advanced)
For custom post-build operations beyond cache push, you can configure manual hooks.
Job Configuration Format
{
"jobs": {
"my-package": {
"file": "default.nix",
"post_build_hooks": [
{
"name": "push-to-cache",
"command": ["nix", "copy", "--to", "s3://my-cache"],
"env": {
"AWS_PROFILE": "production"
}
}
],
"fod_post_build_hooks": [
{
"name": "push-fods-public",
"command": ["cachix", "push", "public-cache"]
}
]
}
}
}
Hook Fields
- `name`: Unique identifier for the hook (used in logging)
- `command`: Array of the command and its arguments
- `env`: (Optional) Additional environment variables
Hook Execution Behavior
- Regular hooks: Run for all successful builds
- FOD hooks: Run in addition to regular hooks for fixed-output derivations
- Additive: Both regular and FOD-specific hooks execute for FODs
- Async: Hooks run asynchronously and don't block build recording
- Failure handling: Hook failures are logged but don't fail the build
Environment Variables
Each hook receives:
Nix-Compatible Variables
- `DRV_PATH`: Path to the derivation file
- `OUT_PATHS`: Space-separated list of output store paths
Extended eka-ci Variables
- `EKA_JOB_NAME`: Name of the job from the config
- `EKA_IS_FOD`: `"true"` or `"false"`
- `EKA_SYSTEM`: Build system (e.g., `"x86_64-linux"`)
- `EKA_PNAME`: Package name (if available)
- `EKA_BUILD_LOG_PATH`: Path to the build log
- `EKA_COMMIT_SHA`: Git commit SHA
Custom Variables
Any additional variables defined in the hook's `env` field.
Example Use Cases
Push to S3 Binary Cache
{
"post_build_hooks": [
{
"name": "push-s3",
"command": ["nix", "copy", "--to", "s3://my-cache?region=us-west-2"],
"env": {
"AWS_PROFILE": "ci"
}
}
]
}
Push to Cachix
{
"post_build_hooks": [
{
"name": "push-cachix",
"command": ["cachix", "push", "mycache", "$OUT_PATHS"],
"env": {
"CACHIX_AUTH_TOKEN": "secret-token"
}
}
]
}
Different Caches for FODs
{
"post_build_hooks": [
{
"name": "push-private",
"command": ["nix", "copy", "--to", "s3://private-cache"]
}
],
"fod_post_build_hooks": [
{
"name": "push-public",
"command": ["nix", "copy", "--to", "s3://public-cache"]
}
]
}
Implementation Status
✅ Completed (Production Ready)
- Hook types and data structures
- Hook executor service with async processing
- Database schema for config storage and hook tracking
- Integration with RecorderService
- Environment variable setup (Nix-compatible + extended)
- FOD detection and additive hook execution
- Logging infrastructure
- HookExecutor initialized in SchedulerService
- Job config stored in database when creating jobsets
- Actual output paths queried from nix-store (implemented 2024-04-11)
- Automatic cache push with credential loading (implemented 2024-04-11)
- Support for all cache types (NixCopy, Cachix, Attic)
🚧 TODO (Future Enhancements)
- Query pname from DrvInfo for richer context (low priority)
- Implement actual log path lookup (low priority)
- Add metrics for hook execution
- Add tests for hook functionality
Testing
Testing Automatic Cache Push
To test the automatic cache push:
1. Configure a cache in the server config (`~/.config/ekaci/ekaci.toml`).
2. Reference the cache in the repository's `.eka-ci/config.json`: `{ "jobs": { "my-package": { "file": "default.nix", "caches": ["production-s3"] } } }`
3. Trigger a build by opening a PR.
4. Monitor logs for cache push execution: `journalctl -u eka-ci -f | grep -E "(cache|hook|nix copy)"`
5. Verify that artifacts appear in your cache (S3, Cachix, etc.).
Expected log output:
DEBUG eka_ci_server::scheduler::recorder: Loaded credentials for cache 'production-s3'
DEBUG eka_ci_server::scheduler::recorder: Created cache push hook for cache 'production-s3'
DEBUG eka_ci_server::scheduler::recorder: Found 1 output path(s) for drv
DEBUG eka_ci_server::hooks::executor: Executing hook: push-production-s3
INFO eka_ci_server::hooks::executor: Hook 'push-production-s3' completed successfully
Testing Manual Post-Build Hooks
To test manual hook implementation:
1. Create a `.eka-ci/config.json` with `post_build_hooks`.
2. Trigger a build.
3. Check logs in `{logs_dir}/{drv_hash}/hook-{name}.log`.
4. Verify that hook environment variables are set correctly.
5. Confirm FOD-specific hooks run for FODs.
Future Enhancements
- Conditional Execution: Allow hooks to specify conditions (e.g., only on main branch)
- Retry Logic: Implement retry with backoff for failed hooks
- Hook Templates: Define reusable hook templates
- Dependency Graph: Allow hooks to depend on other hooks
- Timeout Configuration: Per-hook timeout configuration
- Rate Limiting: Limit concurrent hook executions to prevent resource exhaustion
Migration Guide
From No Hooks to Post-Build Hooks
1. Run the database migration: `20260409_job_config.sql`
2. Update `.eka-ci/config.json` to include `post_build_hooks`.
3. Deploy the updated server with the HookExecutor initialized.
4. Monitor hook execution logs.
Nix post-build-hook Equivalents
| Nix Feature | eka-ci Equivalent |
|---|---|
| `post-build-hook` in `nix.conf` | `post_build_hooks` in job config |
| `$OUT_PATHS` env var | `$OUT_PATHS` env var |
| `$DRV_PATH` env var | `$DRV_PATH` env var |
| Global hook script | Per-job hook configuration |
| Synchronous execution | Asynchronous execution (non-blocking) |
Implementation Details
Cache Push Implementation (2024-04-11)
The automatic cache push feature was completed with the following additions to backend/server/src/scheduler/recorder.rs:
1. `build_cache_push_hook()`: async function that:
   - Loads credentials from the configured source (Vault, AWS SM, etc.)
   - Builds the appropriate command based on cache type (NixCopy, Cachix, Attic)
   - Returns a `PostBuildHook` with credentials in its environment
2. `get_drv_output_paths()`: queries actual output paths using `nix-store --query --outputs`
3. Updated `execute_hooks_for_drv()`: integration that:
   - Resolves cache IDs from the job config
   - Checks cache permissions (repo/branch restrictions)
   - Loads credentials and builds hooks
   - Queries output paths from nix-store
   - Sends hook tasks to the HookExecutor
How It Works
1. After a successful build, RecorderService calls `execute_hooks_for_drv()`.
2. The job config is retrieved from the database (it contains the cache IDs).
3. For each cache:
   - The cache config is looked up in the server registry
   - Permissions are checked
   - Credentials are loaded asynchronously
   - The hook command is built
4. Output paths are queried from nix-store.
5. A HookTask is sent to the HookExecutor with all hooks and credentials.
6. The HookExecutor runs each hook sequentially, logging output.
Security
- Credentials loaded fresh for each build
- Credentials passed through environment, never logged
- Permission checks before any cache access
- Separate credentials for each cache
- Async execution doesn't block builds
- Failures don't cascade (one bad cache doesn't affect others)
References
- Cache Configuration Guide - Detailed cache setup
- GitHub App Setup Guide - Credential sources
- Nix post-build-hook documentation
- Database schema: `backend/server/sql/migrations/20260409_job_config.sql`
- Hook executor: `backend/server/src/hooks/executor.rs`
- Cache push implementation: `backend/server/src/scheduler/recorder.rs` (lines 462-551, 630-644, 663-676)
LRU Cache Operational Runbook
Version: 1.0 Date: 2026-04-07 Status: Production Ready
Table of Contents
Quick Reference
Configuration
Environment Variable:
export EKA_CI_GRAPH_LRU_CAPACITY=100000
Config File (`~/.config/ekaci/ekaci.toml`):
graph_lru_capacity = 100000
Default: 100,000 nodes
Key Metrics
| Metric | Description | Healthy Range |
|---|---|---|
| `eka_ci_graph_cache_utilization` | Cache fullness (0.0-1.0) | 0.5 - 0.8 |
| `eka_ci_graph_cache_reloads_total` | Cache misses (counter) | < 100/day |
| `eka_ci_graph_pinned_nodes_total` | Protected nodes | 50 - 500 |
| `eka_ci_graph_nodes_total` | Total nodes | < capacity |
Log Messages
Normal Operation:
INFO Cache status: 45000/100000 nodes (45.0% utilized), 123 pinned
Warning (80% utilization):
WARN Cache utilization elevated (82.3%): Monitor for potential capacity issues
Critical (90% utilization):
WARN Cache utilization HIGH (93.1%): Consider increasing EKA_CI_GRAPH_LRU_CAPACITY (current: 100000)
Monitoring
Grafana Dashboard
Panel 1: Cache Utilization (Gauge)
eka_ci_graph_cache_utilization * 100
- Unit: Percent
- Thresholds:
- Green: < 70%
- Yellow: 70-85%
- Red: > 85%
Panel 2: Cache Size (Graph)
sum(eka_ci_graph_nodes_total)
- Unit: Nodes
- Show: Current, Max capacity
Panel 3: Cache Reload Rate (Graph)
rate(eka_ci_graph_cache_reloads_total[5m]) * 60
- Unit: Reloads/min
- Alert: > 10/min for 15 minutes
Panel 4: Reload Latency (Graph)
histogram_quantile(0.50, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m]))
histogram_quantile(0.90, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m]))
- Unit: Seconds
- Labels: p50, p90, p99
Panel 5: Pinned Nodes (Stat)
eka_ci_graph_pinned_nodes_total
- Unit: Nodes
- Description: Active builds
Panel 6: Eviction Candidates by Tier (Stacked Graph)
eka_ci_graph_eviction_candidates_total{tier="tier1_transitive_failure"}
eka_ci_graph_eviction_candidates_total{tier="tier2_completed_failure"}
eka_ci_graph_eviction_candidates_total{tier="tier3_completed_success"}
Key Performance Indicators (KPIs)
Healthy System:
- Utilization: 50-70%
- Reload rate: < 5/min
- Reload latency (p99): < 50ms
- Pinned nodes: 50-200
Concerning:
- Utilization: > 80%
- Reload rate: > 10/min
- Reload latency (p99): > 100ms
- Pinned nodes: > 1000
Critical:
- Utilization: > 90%
- Reload rate: > 50/min
- Reload latency (p99): > 500ms
- Cache thrashing
Capacity Tuning
Determining Optimal Capacity
Formula:
Optimal Capacity = (Peak Node Count × 1.5) + Buffer
Example:
- Peak node count: 60,000
- Optimal capacity: 60,000 × 1.5 = 90,000
- Add buffer: 90,000 + 10,000 = 100,000
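The worked example above, as a tiny helper (the 10,000-node buffer is the value used in the example, not a fixed constant):

```rust
/// Optimal capacity = peak × 1.5 + buffer, in integer arithmetic.
fn optimal_capacity(peak_node_count: u64, buffer: u64) -> u64 {
    peak_node_count * 3 / 2 + buffer
}
```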
Capacity Sizing Guide
| Workload | Node Count | Recommended Capacity | Memory Usage |
|---|---|---|---|
| Small | < 10k | 20,000 | ~22 MB |
| Medium | 10k - 50k | 75,000 | ~83 MB |
| Large | 50k - 100k | 150,000 | ~165 MB |
| Very Large | 100k - 200k | 300,000 | ~330 MB |
Increasing Capacity
When to increase:
- Utilization consistently > 80%
- Reload rate > 10/min
- Warnings in logs every 5 minutes
How to increase:
1. Calculate the new capacity: `New Capacity = Current Capacity × 1.5`
2. Set the environment variable: `export EKA_CI_GRAPH_LRU_CAPACITY=150000`
3. Restart the service: `systemctl restart eka-ci`
4. Monitor `eka_ci_graph_cache_utilization` for 1 hour.
5. Verify:
   - Utilization < 70%
   - Reload rate < 5/min
   - No warnings
Decreasing Capacity
When to decrease:
- Utilization consistently < 30%
- Memory usage high (> 200 MB)
- Zero cache reloads for 24+ hours
How to decrease:
1. Calculate the new capacity: `New Capacity = Peak Node Count × 1.3`
2. Set the environment variable: `export EKA_CI_GRAPH_LRU_CAPACITY=75000`
3. Restart the service: `systemctl restart eka-ci`
4. Monitor closely for 24 hours:
   - Watch the reload rate (it should stay < 10/min)
   - Watch utilization (it should settle at 50-70%)
Troubleshooting
Problem 1: High Utilization (> 90%)
Symptoms:
- Log warnings every 5 minutes
- Potential cache thrashing
- Slow build dispatch
Diagnosis:
# Check utilization
eka_ci_graph_cache_utilization
# Check growth rate
rate(sum(eka_ci_graph_nodes_total)[1h])
Solution:
1. Immediate: increase capacity by 50% (`export EKA_CI_GRAPH_LRU_CAPACITY=150000`, then `systemctl restart eka-ci`).
2. Long-term: calculate the proper capacity from your workload.
Prevention:
- Set alert for 85% utilization
- Review capacity quarterly
Problem 2: High Reload Rate (> 10/min)
Symptoms:
- Frequent cache misses
- Elevated database load
- Slow API responses
Diagnosis:
# Reload rate
rate(eka_ci_graph_cache_reloads_total[5m]) * 60
# Which nodes are being reloaded?
# Check logs for "Cache miss: reloading"
Possible Causes:
Cause 1: Capacity Too Small
- Utilization > 85%
- Solution: Increase capacity
Cause 2: Workload Pattern Changed
- Many terminal nodes evicted, then accessed again
- Solution: Increase tier age thresholds
Cause 3: Hot Path Not Protected
- `is_buildable()` nodes being evicted
- Solution: Ensure `touch_buildable_check()` is called
Problem 3: High Memory Usage
Symptoms:
- Process memory > 500 MB
- OOM risk
- Swap usage
Diagnosis:
# Memory estimate
eka_ci_graph_memory_bytes_estimate
# Utilization
eka_ci_graph_cache_utilization
Solutions:
If utilization < 50%:
- Cause: Capacity too large
- Fix: Decrease capacity to match peak workload
If utilization > 80%:
- Cause: Legitimate high usage
- Fix: Add more RAM or optimize elsewhere
Problem 4: Zero Reloads Despite Low Utilization
Symptoms:
- Utilization < 30%
- Zero cache reloads for days
- High memory usage
Diagnosis:
# Reload count
eka_ci_graph_cache_reloads_total
# Utilization
eka_ci_graph_cache_utilization
Cause: Capacity oversized
Solution:
- Decrease capacity to improve efficiency
- Free up memory for other services
Problem 5: Slow Reload Latency (p99 > 100ms)
Symptoms:
- High reload latency
- Slow API responses
- Database contention
Diagnosis:
# Reload latency
histogram_quantile(0.99, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m]))
# Reload rate
rate(eka_ci_graph_cache_reloads_total[5m])
Possible Causes:
Cause 1: High Reload Rate
- Too many concurrent reloads
- Database overwhelmed
- Solution: Increase capacity to reduce reload frequency
Cause 2: Database Slow
- Check database metrics
- Optimize queries
- Add indexes if needed
Cause 3: Large Nodes
- Nodes with many dependencies
- Solution: Optimize edge loading (future work)
Alerts
Prometheus Alert Rules
groups:
- name: lru_cache_alerts
rules:
# Critical: High utilization
- alert: LRUCacheUtilizationHigh
expr: eka_ci_graph_cache_utilization > 0.90
for: 15m
labels:
severity: warning
annotations:
summary: "LRU cache utilization is high ({{ $value | humanizePercentage }})"
description: "Cache is {{ $value | humanizePercentage }} full. Consider increasing capacity."
# Warning: Elevated utilization
- alert: LRUCacheUtilizationElevated
expr: eka_ci_graph_cache_utilization > 0.80
for: 1h
labels:
severity: info
annotations:
summary: "LRU cache utilization is elevated ({{ $value | humanizePercentage }})"
description: "Cache is {{ $value | humanizePercentage }} full. Monitor for growth."
# Critical: High reload rate
- alert: LRUCacheReloadRateHigh
expr: rate(eka_ci_graph_cache_reloads_total[5m]) * 60 > 10
for: 15m
labels:
severity: warning
annotations:
summary: "Cache reload rate is high ({{ $value }} reloads/min)"
description: "Frequent cache misses detected. Capacity may be too small."
# Warning: Slow reloads
- alert: LRUCacheReloadSlow
expr: histogram_quantile(0.99, rate(eka_ci_graph_cache_reload_duration_seconds_bucket[5m])) > 0.1
for: 15m
labels:
severity: info
annotations:
summary: "Cache reloads are slow (p99: {{ $value }}s)"
description: "Database may be under load or capacity is causing thrashing."
# Info: Many pinned nodes
- alert: LRUCacheManyPinnedNodes
expr: eka_ci_graph_pinned_nodes_total > 1000
for: 30m
labels:
severity: info
annotations:
summary: "Many nodes pinned ({{ $value }})"
description: "High number of active builds. This is normal during large builds."
Performance Optimization
Best Practices
1. Set capacity to 1.5× peak usage
   - Provides headroom for growth
   - Minimizes the reload rate
   - Optimal utilization: 60-70%

2. Call `touch_buildable_check()` after `is_buildable()`
   - Protects hot-path nodes
   - Prevents thrashing on active builds

   ```rust
   if graph_handle.is_buildable(&drv_id) {
       graph_handle.touch_buildable_check(&drv_id);
       // ... dispatch build ...
   }
   ```

3. Monitor utilization trends
   - Review every quarter
   - Adjust capacity as workload changes
   - Plan for growth

4. Avoid frequent restarts
   - The LRU cache warms up over time
   - Restarts cause a cold cache (100% reload rate initially)
   - Allow 1 hour for warmup
Capacity Planning
Formula for Growth:
Future Capacity = Current Peak × Growth Factor × Headroom
Where:
- Growth Factor = Expected growth (1.2 = 20% growth)
- Headroom = Safety margin (1.5 = 50% headroom)
Example:
- Current peak: 50,000 nodes
- Expected 20% growth: 50,000 × 1.2 = 60,000
- With 50% headroom: 60,000 × 1.5 = 90,000
Common Scenarios
Scenario 1: Large Build (200k drvs)
Expected Behavior:
- Utilization rises to 80-90%
- Pinned nodes: 500-2000 (active builds)
- Reload rate: 5-10/min (terminal nodes evicted)
- Warnings logged (normal)
Action: Monitor, no action needed unless reload rate > 20/min
Scenario 2: Idle System
Expected Behavior:
- Utilization: 5-10% (only completed builds)
- Pinned nodes: 0-5
- Reload rate: 0/min
- No warnings
Action: Consider decreasing capacity to save memory
Scenario 3: Continuous Integration
Expected Behavior:
- Utilization: 40-60% (steady state)
- Pinned nodes: 50-200 (concurrent builds)
- Reload rate: < 5/min
- No warnings
Action: Optimal state, no action needed
Maintenance
Quarterly Review
1. Check peak utilization (last 90 days): `max_over_time(eka_ci_graph_cache_utilization[90d])`
2. Check the reload rate: `avg_over_time(rate(eka_ci_graph_cache_reloads_total[1h])[90d:1h]) * 60`
3. Adjust capacity if needed:
   - If peak > 80%: increase by 50%
   - If peak < 40%: decrease by 25%
Version Upgrades
Before upgrade:
- Note current capacity setting
- Export metrics for comparison
After upgrade:
- Verify capacity setting persists
- Compare metrics (should be similar)
- Monitor for 24 hours
Emergency Procedures
Cache Thrashing (Reload Rate > 50/min)
Immediate Action:
1. Double the capacity: `export EKA_CI_GRAPH_LRU_CAPACITY=200000`, then `systemctl restart eka-ci`
2. Monitor for 15 minutes.
3. If still thrashing, double again.
Follow-up:
- Investigate root cause
- Review workload patterns
- Consider permanent capacity increase
Out of Memory
Immediate Action:
1. Restart the service (clears the cache): `systemctl restart eka-ci`
2. Reduce capacity by 50%: `export EKA_CI_GRAPH_LRU_CAPACITY=50000`, then `systemctl start eka-ci`
3. Monitor memory usage.
Follow-up:
- Identify memory leak (if any)
- Right-size capacity for available RAM
- Consider adding more RAM
Support
Logs to Collect
# Cache status logs (last hour)
journalctl -u eka-ci --since "1 hour ago" | grep "Cache status"
# Warnings (last 24 hours)
journalctl -u eka-ci --since "1 day ago" | grep -E "WARN|ERROR"
# Cache misses (last hour)
journalctl -u eka-ci --since "1 hour ago" | grep "Cache miss"
Metrics to Export
# Current state
curl http://localhost:8080/metrics | grep eka_ci_graph
# Or via Prometheus query
eka_ci_graph_cache_utilization
eka_ci_graph_cache_reloads_total
eka_ci_graph_nodes_total
Summary
Key Takeaways:
- Monitor utilization - Keep between 50-80%
- Watch reload rate - Should be < 5/min normally
- Tune capacity - 1.5× peak usage is optimal
- Set alerts - For 85% utilization and high reload rate
- Review quarterly - Adjust as workload changes
Healthy System Checklist:
- ✅ Utilization: 50-70%
- ✅ Reload rate: < 5/min
- ✅ No warnings in logs
- ✅ Pinned nodes: 50-500
- ✅ Reload latency (p99): < 50ms
Monitoring & Metrics
Eka CI exposes Prometheus metrics and structured logs. Together they cover build queue health, cache utilization, GitHub integration, and rebuild detection.
Prometheus metrics
Metrics are served at `/metrics` on the address configured in `[web]`. Common series:
| Metric | Type | Description |
|---|---|---|
| `eka_ci_build_queue_depth` | gauge | Pending builds per platform queue. |
| `eka_ci_build_duration_seconds` | histogram | End-to-end build wall time. |
| `eka_ci_build_outcome_total` | counter | Builds by outcome (success, failed, cancelled). |
| `eka_ci_graph_cache_hits_total` | counter | LRU cache hits for the dependency graph. |
| `eka_ci_graph_cache_misses_total` | counter | LRU cache misses. |
| `eka_ci_graph_cache_size` | gauge | Current number of nodes in the LRU cache. |
| `eka_ci_webhook_processing_seconds` | histogram | Webhook handler latency. |
| `eka_ci_rebuild_count` | histogram | Rebuilds detected per PR. |
| `eka_ci_change_summary_render_seconds` | histogram | Change-summary render time. |
For deeper guidance on the cache metrics specifically, see LRU Cache Tuning.
Useful queries
# Build queue depth, per platform
eka_ci_build_queue_depth
# Cache hit rate over 5 minutes
rate(eka_ci_graph_cache_hits_total[5m])
/ (rate(eka_ci_graph_cache_hits_total[5m])
+ rate(eka_ci_graph_cache_misses_total[5m]))
# 95th percentile webhook latency
histogram_quantile(0.95,
rate(eka_ci_webhook_processing_seconds_bucket[5m]))
Logging
Logs are emitted via the `tracing` crate as structured records. Verbosity is controlled through `RUST_LOG`:
# Set a global level
RUST_LOG=info eka-ci-server
# Per-module filters
RUST_LOG=eka_ci_server::scheduler=debug,eka_ci_server=info eka-ci-server
When run under systemd, view logs with:
```shell
journalctl -u eka-ci -f
```
Key log targets:
- `eka_ci_server::scheduler` — build scheduling and queue transitions.
- `eka_ci_server::webhooks` — incoming GitHub events.
- `eka_ci_server::graph` — dependency graph and LRU cache activity.
- `eka_ci_server::change_summary` — change-summary pipeline.
- `eka_ci_server::cache_push` — cache push results and post-build hooks.
Recommended alerts
A starting set of alerts for production:
- `eka_ci_build_queue_depth` is high for too long — pending work is not draining.
- Webhook 5xx rate non-zero — GitHub deliveries are being rejected.
- Cache hit rate < 0.6 sustained — LRU is undersized; see LRU Cache Tuning.
- Change-summary check stuck pending > 10 minutes — see Change Summaries.
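As a starting point, the first two alerts might be expressed as Prometheus alerting rules roughly like this (the threshold values and alert names are illustrative, not tuned recommendations):

```yaml
groups:
  - name: eka-ci
    rules:
      - alert: EkaCiQueueNotDraining
        # Queue depth sustained above an illustrative threshold.
        expr: eka_ci_build_queue_depth > 100
        for: 30m
        labels: { severity: warning }
      - alert: EkaCiCacheHitRateLow
        expr: |
          rate(eka_ci_graph_cache_hits_total[15m])
            / (rate(eka_ci_graph_cache_hits_total[15m])
             + rate(eka_ci_graph_cache_misses_total[15m])) < 0.6
        for: 30m
        labels: { severity: warning }
```

Tune the `for:` windows to your workload so transient spikes do not page anyone.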
The runbook pages for the LRU cache and change-summary pipeline include more specific threshold and remediation guidance.
Eka CI Backend Architecture
Overview
Eka CI is a Nix-based Continuous Integration server built in Rust using async/await with Tokio. The system follows a multi-service actor-based architecture where independent services communicate via message passing through Tokio's mpsc channels.
The backend is designed to efficiently build large Nix monorepos (like Nixpkgs) with intelligent scheduling, remote builder support, and comprehensive build state tracking.
Core Design Principles
- Service isolation: Each major component runs as an independent service with its own message queue
- Async-first: Built on Tokio for high concurrency and I/O efficiency
- Type-safe state machine: Build states are explicitly modeled with exhaustive pattern matching
- Dependency awareness: Tracks derivation dependency graphs for intelligent scheduling
- GitHub-native: First-class integration with GitHub Pull Requests and check runs
System Architecture
Services Overview
The system consists of eight core services initialized in `backend/server/src/services/mod.rs`:
- DbService - SQLite database operations with connection pooling
- GitService - Git repository cloning, fetching, and worktree management
- RepoReader - CI configuration parsing from `.ekaci/config.json`
- EvalService - Nix expression evaluation using `nix-eval-jobs`
- SchedulerService - Multi-tier build orchestration (composed of 3 sub-services)
- GitHubService - GitHub API integration (check runs, PR status updates via Octocrab)
- WebService - HTTP API and web interface (Axum)
- UnixService - Unix domain socket for CLI client communication
Service Communication Pattern
```
┌─────────────┐
│   GitHub    │
│   Webhook   │
└──────┬──────┘
       │
       ▼
┌─────────────┐    GitTask     ┌─────────────┐
│ WebService  │───────────────▶│ GitService  │
└─────────────┘                └──────┬──────┘
                                      │ RepoTask
                                      ▼
                               ┌─────────────┐
                               │ RepoReader  │
                               └──────┬──────┘
                                      │
                        ┌─────────────┴─────────────┐
                        │                           │
                    EvalTask                    CheckTask
                        │                           │
                        ▼                           ▼
                 ┌─────────────┐            ┌──────────────┐
                 │ EvalService │            │ChecksExecutor│
                 └──────┬──────┘            └──────┬───────┘
                        │                          │
                   IngressTask                CheckResult
                        │                          │
                        ▼                          │
               ┌──────────────────┐                │
               │  IngressService  │                │
               └────────┬─────────┘                │
                        │                          │
                   BuildRequest                    │
                        │                          │
                        ▼                          │
               ┌──────────────────┐                │
               │    BuildQueue    │                │
               └────────┬─────────┘                │
                        │                          │
         ┌──────────────┼──────────────┐           │
         │              │              │           │
       [FOD]         [Local]       [Remote]        │
         │              │              │           │
         └──────────────┴──────────────┘           │
                        │                          │
                        ▼                          │
               ┌──────────────────┐                │
               │  BuilderThread   │                │
               └────────┬─────────┘                │
                        │                          │
                    NixBuild                       │
                        │                          │
                        ▼                          │
               ┌──────────────────┐                │
               │ RecorderService  │◀───────────────┘
               └────────┬─────────┘
                        │
           ┌────────────┴────────────┐
           │                         │
    Update Database             GitHubTask
           │                         │
           ▼                         ▼
    ┌──────────┐            ┌──────────────┐
    │ DbService│            │GitHubService │
    └──────────┘            └──────────────┘
```
Data Model
Build State Machine
The `DrvBuildState` enum (`backend/server/src/ci/mod.rs:44`) defines a comprehensive state machine:
```
Queued
  │
  ├─▶ Buildable ─────▶ Building ─────▶ Completed(Success)
  │                       │
  │                       ├─▶ Completed(Failure)
  │                       │
  │                       └─▶ FailedRetry ─┐
  │                           (1st fail)   │
  │                                        │
  │◀───────────────────────────────────────┘
  │
  ├─▶ TransitiveFailure (dep failed, propagated)
  │
  ├─▶ Blocked (dep interrupted)
  │
  └─▶ Interrupted(kind)
        ├─ OutOfMemory
        ├─ Timeout
        ├─ Cancelled
        └─ ProcessDeath
```
State Guarantees:
- Queued: All drvs start here when discovered by evaluation
- Buildable: Scheduler guarantees all dependencies are successful
- FailedRetry: Automatic retry (one chance for transient failures)
- TransitiveFailure: Permanent block until upstream fixed
- Completed: Terminal state, never changes
- All transitions recorded in `DrvBuildEvent` with timestamps
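A minimal sketch of such a state machine in Rust, following the diagram above (the real `DrvBuildState` in `backend/server/src/ci/mod.rs` may differ in names and payloads):

```rust
#[derive(Debug, Clone, PartialEq)]
enum BuildOutcome { Success, Failure }

#[derive(Debug, Clone, PartialEq)]
enum InterruptKind { OutOfMemory, Timeout, Cancelled, ProcessDeath }

#[derive(Debug, Clone, PartialEq)]
enum DrvBuildState {
    Queued,
    Buildable,
    Building,
    FailedRetry,
    TransitiveFailure,
    Blocked,
    Interrupted(InterruptKind),
    Completed(BuildOutcome),
}

impl DrvBuildState {
    /// Terminal states never transition again.
    fn is_terminal(&self) -> bool {
        matches!(self, DrvBuildState::Completed(_))
    }
}

fn main() {
    assert!(DrvBuildState::Completed(BuildOutcome::Success).is_terminal());
    assert!(!DrvBuildState::Building.is_terminal());
    assert!(!DrvBuildState::Interrupted(InterruptKind::Timeout).is_terminal());
}
```

Modeling terminality as a method (rather than a boolean column) keeps exhaustive `match` checks on the scheduler's side honest when a new state is added.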
CI Workflow Processing
Configuration Format
CI configurations are stored in `.ekaci/config.json` at the repository root:
```json
{
  "jobs": {
    "job-name": {
      "file": "/path/to/nix/file.nix",
      "allow_eval_failures": true
    }
  }
}
```
Jobs - Nix-based builds evaluated by `nix-eval-jobs`
Checks - Sandboxed imperative commands (linters, formatters, tests)
Complete End-to-End Flow
1. GitHub Trigger
```
GitHub PR opened/updated → Webhook → WebService → GitTask::GitHubCheckout
```
2. Repository Checkout (`backend/server/src/services/git.rs`)

```
GitService:
- Clone base repo if it does not already exist
- Fetch PR branch
- Create git worktrees for base and head commits
- Send RepoTask::ReadGitHub to RepoReader
```

Git worktrees allow parallel access to different commits without checkout conflicts.
3. Configuration Parsing (`backend/server/src/services/repo_reader.rs`)

```
RepoReader:
- Read .ekaci/config.json
- Parse via CIConfig::from_str() (serde)
- For each job:
  - Check if already processed (db.has_jobset)
  - Create EvalJob with file path
  - Send EvalTask::GithubJobPR to EvalService
- For each check:
  - Send CheckTask to ChecksExecutor
```
4. Nix Evaluation (`backend/server/src/services/eval.rs`)

```
EvalService:
- Create "CI Configure Gate" GitHub check run
- Run: nix-eval-jobs --flake .#{job.file}
- Parse JSON stream of NixEvalDrv structs
- If evaluation fails and allow_failures=false:
  - Send GitHubTask::FailCIEvalJob
  - STOP (don't queue builds)
- Send GitHubTask::CreateJobSet to GitHubService
- deep_traverse() to discover all dependencies:
  - Run: nix-store --query --requisites {drv}
  - Fetch info: nix derivation show {drv}...
- Batch insert into database (150 drvs at a time)
- Insert dependency relationships (DrvRefs)
- Send IngressTask::EvalRequest for each drv
```
Optimization: LRU cache (5000 entries) avoids re-fetching drv info.
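The caching idea can be sketched with a deliberately simple LRU built from std collections (this O(n)-per-access toy illustrates the eviction policy only; the server presumably uses a proper O(1) implementation):

```rust
use std::collections::{HashMap, VecDeque};

/// A minimal LRU sketch keyed by drv path.
struct Lru<V> {
    capacity: usize,
    map: HashMap<String, V>,
    order: VecDeque<String>, // front = least recently used
}

impl<V> Lru<V> {
    fn new(capacity: usize) -> Self {
        Lru { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &str) -> Option<&V> {
        if self.map.contains_key(key) {
            // Move the key to the most-recently-used position.
            self.order.retain(|k| k != key);
            self.order.push_back(key.to_string());
        }
        self.map.get(key)
    }

    fn put(&mut self, key: String, value: V) {
        if self.map.insert(key.clone(), value).is_none()
            && self.map.len() > self.capacity
        {
            // Evict the least recently used entry.
            if let Some(old) = self.order.pop_front() {
                self.map.remove(&old);
            }
        }
        self.order.retain(|k| k != &key);
        self.order.push_back(key);
    }
}

fn main() {
    let mut cache = Lru::new(2);
    cache.put("a.drv".into(), 1);
    cache.put("b.drv".into(), 2);
    cache.get("a.drv");           // "a" is now most recently used
    cache.put("c.drv".into(), 3); // evicts "b"
    assert!(cache.get("b.drv").is_none());
    assert!(cache.get("a.drv").is_some());
}
```

The payoff for evaluation is that `nix derivation show` is only run once per drv path across overlapping dependency closures.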
5. Build Scheduling
IngressService (`backend/server/src/scheduler/ingress.rs`)

```
IngressService receives IngressTask::EvalRequest:
- Skip if drv in terminal state
- Query: is_drv_buildable()
  - Check all dependencies in DrvRefs
  - Ensure all are Completed(Success)
- If buildable:
  - Update state to Buildable
  - Send BuildRequest to BuildQueue
```
BuildQueue (`backend/server/src/scheduler/build/queue.rs`)

Routes builds to platform-specific queues based on `drv.system`:

- `x86_64-linux`
- `aarch64-linux`
- `x86_64-darwin`
- `aarch64-darwin`
PlatformQueue (`backend/server/src/scheduler/build/system_queue.rs`)

Intelligent routing based on build characteristics:

```
if drv.is_fod {
    → FOD Builder (dedicated for Fixed-Output Derivations)
} else if drv.prefer_local_build {
    → Local Builder
} else {
    → Remote Builder Pool or Local
}
```
Rationale: FOD builds (like fetchurl) are network-bound and benefit from dedicated capacity to avoid blocking compute-heavy builds.
6. Build Execution
BuilderThread (`backend/server/src/scheduler/build/builder_thread.rs`)

```
BuilderThread:
- Manages JoinSet of concurrent builds
- Respects max_jobs limit per builder
- Spawns NixBuild task for each build
```
NixBuild (`backend/server/src/scheduler/build/nix_build.rs`)
Core build executor:
```
1. Create log file: {logs_dir}/{drv_hash}/build.log
2. Spawn: nix-build {drv_path} --builders '{config}'
3. Stream stdout/stderr to log file in real-time
4. Monitor for timeout:
   - Reset timer on each line of output
   - Kill process if no_output_timeout_seconds exceeded
5. On completion:
   - Success: Attempt to fetch substituter logs via 'nix log'
   - Failure: Return BuildOutcome::Failure
   - Timeout: Return BuildOutcome::Timeout
```
Timeout Behavior: The timeout is output-based, not total time. A build can run indefinitely as long as it produces output.
7. Build Recording (`backend/server/src/scheduler/recorder.rs`)

```
RecorderService receives RecorderTask:
- Update Drv.build_state in database

On Success:
- Clear transitive failures for this drv
- Re-queue blocked downstream drvs
- Send IngressTask::CheckBuildable for each downstream drv

On Failure (first time):
- Update state to FailedRetry
- Re-queue immediately (same IngressTask)

On Failure (second time):
- Mark as permanent Completed(Failure)
- Propagate TransitiveFailure to all downstream drvs
- Send GitHubTask::UpdateBuildStatus

If all jobs in jobset concluded:
- Determine conclusion (success if no new/changed failures)
- Send GitHubTask::CompleteCIEvalJob
```
Retry Logic: Single automatic retry handles transient failures (network issues, temporary resource constraints).
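The retry decision reduces to a small pure function over the attempt history; this is an illustrative sketch with hypothetical names, not the server's actual types:

```rust
#[derive(Debug, PartialEq)]
enum NextState {
    FailedRetry,      // will be re-queued once
    CompletedFailure, // permanent; triggers transitive propagation
    CompletedSuccess,
}

/// Decide the next state after a build attempt finishes.
/// `prior_failures` counts failures recorded before this attempt.
fn after_attempt(success: bool, prior_failures: u32) -> NextState {
    if success {
        NextState::CompletedSuccess
    } else if prior_failures == 0 {
        // First failure: one automatic retry for transient errors.
        NextState::FailedRetry
    } else {
        // Second failure: permanent.
        NextState::CompletedFailure
    }
}

fn main() {
    assert_eq!(after_attempt(false, 0), NextState::FailedRetry);
    assert_eq!(after_attempt(false, 1), NextState::CompletedFailure);
    assert_eq!(after_attempt(true, 1), NextState::CompletedSuccess);
}
```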
8. GitHub Status Updates (`backend/server/src/services/github.rs`)

```
GitHubService:
- Use Octocrab (GitHub API client)
- Create check runs (lazily on failure to reduce noise)
- Update check run status:
  queued → in_progress → completed
- Set conclusion:
  success / failure / timed_out / cancelled
- Complete eval gate check runs when jobset finished
```
Lazy Check Run Creation: Only create check runs for new/changed jobs that fail. This reduces PR noise for large derivation sets where most builds are unchanged and successful.
Scheduler Architecture Deep Dive
The scheduler is the most complex component, consisting of 3 tiers:
Tier 1: IngressService
Responsibility: Determine build eligibility
```
// backend/server/src/scheduler/ingress.rs
IngressTask::EvalRequest { drv_path } → {
    if is_terminal_state(drv_path) {
        return; // Already built
    }
    if is_drv_buildable(drv_path) {
        update_state(drv_path, Buildable);
        send(BuildRequest { drv_path });
    }
}
```
Key query: `is_drv_buildable()` checks that all dependencies are `Completed(Success)`.
Tier 2: BuildQueue
Responsibility: Platform routing
```
// backend/server/src/scheduler/build/queue.rs
BuildRequest { drv } → {
    match drv.system {
        "x86_64-linux"   => x86_64_linux_queue,
        "aarch64-linux"  => aarch64_linux_queue,
        "x86_64-darwin"  => x86_64_darwin_queue,
        "aarch64-darwin" => aarch64_darwin_queue,
    }
}
```
Tier 3: PlatformQueue
Responsibility: Builder selection and capacity management
```rust
// backend/server/src/scheduler/build/system_queue.rs
struct PlatformQueue {
    fod_builder: BuilderHandle,          // Fixed-Output Derivations
    local_builder: BuilderHandle,        // Local builds
    remote_builders: Vec<BuilderHandle>, // Remote builder pool
}

impl PlatformQueue {
    fn route_build(&self, drv: &Drv) -> BuilderHandle {
        if drv.is_fod {
            self.fod_builder
        } else if drv.prefer_local_build {
            self.local_builder
        } else {
            // Round-robin or capacity-aware selection
            self.select_remote_builder()
        }
    }
}
```
Builder Types:
- FOD Builder: Dedicated capacity for network-bound builds (`fetchurl`, `fetchFromGitHub`)
- Local Builder: Runs `nix-build` on the CI server itself
- Remote Builder: SSH-based remote Nix builders (configured via `RemoteBuilder`)

Capacity Management: Each builder has a `max_jobs` limit. The `PlatformQueue` tracks active builds per builder and queues excess work.
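Capacity-aware selection can be sketched as a pure function over per-builder occupancy (an illustrative strategy, not necessarily the one the scheduler uses):

```rust
/// Each entry is (active builds, max_jobs) for one remote builder.
/// Returns the index of the builder with the most free slots,
/// or None if every builder is saturated.
fn select_builder(builders: &[(u32, u32)]) -> Option<usize> {
    builders
        .iter()
        .enumerate()
        .filter(|(_, (active, max))| active < max)
        .max_by_key(|(_, (active, max))| max - active)
        .map(|(i, _)| i)
}

fn main() {
    // Builder 1 has 15 free slots, builder 0 has 2, builder 2 is full.
    let builders = [(18, 20), (5, 20), (10, 10)];
    assert_eq!(select_builder(&builders), Some(1));
    assert_eq!(select_builder(&[(4, 4)]), None);
}
```

A production version would also weight by `speed_factor` and skip builders that recently failed a health check.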
Builder Health Checking
```rust
// backend/server/src/scheduler/build/builder.rs
impl Builder {
    async fn is_available(&self) -> bool {
        // Run: nix store ping --store {uri}
        Command::new("nix")
            .args(["store", "ping", "--store", &self.uri])
            .status()
            .await
            .map(|s| s.success())
            .unwrap_or(false)
    }
}
```
Builders are health-checked before use to avoid queueing builds to unavailable remotes.
Security Model
Sandboxing via `birdcage`:

- Filesystem isolation (only sees `/nix/store` and checkout directory)
- Network isolation (configurable per-check)
- No access to home directory or system files
- Prevents arbitrary file access outside checkout
Nix Package Provisioning:
```shell
nix-shell -p nixfmt statix --run 'env'
```
This captures `PATH` and the other environment variables with the requested packages available; the check command is then run in that environment inside the sandbox.
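The parsing half of that step can be sketched as a small pure function that turns `env` output into a map (the function name and the simplified handling of multi-line values are illustrative):

```rust
use std::collections::HashMap;

/// Parse the output of `env` into key/value pairs.
/// Lines without '=' (e.g. continuation lines of multi-line
/// values) are ignored in this simplified sketch.
fn parse_env(output: &str) -> HashMap<String, String> {
    output
        .lines()
        .filter_map(|line| {
            let (key, value) = line.split_once('=')?;
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

fn main() {
    let env = parse_env("PATH=/nix/store/abc/bin\nLANG=C.UTF-8\n");
    assert_eq!(env["PATH"], "/nix/store/abc/bin");
    assert_eq!(env.len(), 2);
}
```

The resulting map is what gets handed to the sandboxed child process as its environment.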
Use Cases
- Linters: `nixfmt --check`, `statix check`
- Tests: `pytest`, `cargo test` (without Nix wrapping)
- Formatters: `prettier --check`, `black --check`
- Custom scripts: Any command that doesn't require a full Nix build
Database Storage
GitHubCheckSets:

```
(sha, check_name, owner, repo_name) → check_id
```

CheckResult:

```
check_id → (success, exit_code, stdout, stderr, duration_ms, executed_at)
```

CheckRunInfo:

```
check_id → (check_run_id, check_run_node_id)
```
Similar to job-based builds, checks integrate with GitHub check runs for PR status reporting.
Build Features
Remote Builders
Configured via the `RemoteBuilder` struct:

```rust
struct RemoteBuilder {
    uri: String,            // ssh://user@host or nix-daemon:///
    platforms: Vec<String>, // ["x86_64-linux", "i686-linux"]
    max_jobs: u32,          // Concurrent build limit
    speed_factor: u32,      // Priority hint (higher = prefer)
}
```
Passed to `nix-build` via the `--builders` flag (the `-` is the unused SSH-identity field of the machine-spec format):

```shell
nix-build /nix/store/xxx.drv --builders 'ssh://builder1 x86_64-linux,i686-linux - 10 1'
```
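Rendering the struct into that machine-spec string is mechanical; a hedged sketch (the helper name is hypothetical, and the real machine-spec format has further optional fields such as supported features):

```rust
struct RemoteBuilder {
    uri: String,
    platforms: Vec<String>,
    max_jobs: u32,
    speed_factor: u32,
}

/// Render builders into the machine-spec string accepted by
/// `nix-build --builders`. The "-" placeholder stands for the
/// SSH identity file field, which this sketch leaves unset.
fn builders_flag(builders: &[RemoteBuilder]) -> String {
    builders
        .iter()
        .map(|b| {
            format!(
                "{} {} - {} {}",
                b.uri,
                b.platforms.join(","),
                b.max_jobs,
                b.speed_factor
            )
        })
        .collect::<Vec<_>>()
        .join(";")
}

fn main() {
    let b = RemoteBuilder {
        uri: "ssh://builder1".into(),
        platforms: vec!["x86_64-linux".into(), "i686-linux".into()],
        max_jobs: 10,
        speed_factor: 1,
    };
    assert_eq!(
        builders_flag(&[b]),
        "ssh://builder1 x86_64-linux,i686-linux - 10 1"
    );
}
```

Multiple builders are joined with `;`, which Nix accepts as the machine separator.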
Build Timeout Handling
Output-based timeout (not total duration):
```rust
// backend/server/src/scheduler/build/nix_build.rs
let no_output_timeout = Duration::from_secs(config.no_output_timeout_seconds);

loop {
    // Each line of output resets the timeout window.
    match tokio::time::timeout(no_output_timeout, stdout.next_line()).await {
        Ok(Ok(Some(line))) => log_file.write_all(line.as_bytes()).await?,
        Ok(Ok(None)) => break,              // EOF: the build process exited
        Ok(Err(e)) => return Err(e.into()), // I/O error reading output
        Err(_) => {
            // No output within the window: kill the hung build.
            process.kill().await?;
            return Ok(BuildOutcome::Timeout);
        }
    }
}
```
Rationale: Large builds (like LLVM) may run for hours but should timeout if they hang (no output).
Log Capture
Real-time streaming:
```rust
let log_path = format!("{}/{}/build.log", logs_dir, drv_hash);
let mut log_file = BufWriter::new(File::create(log_path).await?);

while let Some(line) = stdout.next_line().await? {
    log_file.write_all(line.as_bytes()).await?;
}
log_file.flush().await?;
```
Post-build log fetching:
```shell
nix log /nix/store/xxx.drv
```
For substituted builds (downloaded from a cache), `nix-build` may produce no output, so Eka CI attempts to fetch the logs from the binary cache instead.
Log serving: WebService exposes logs via the `/logs/{drv_hash}` endpoint.
Dependency Graph Tracking
Insertion (`backend/server/src/services/eval.rs:deep_traverse`):

```rust
// Query all dependencies
let deps = nix_store_query_requisites(drv_path).await?;

// Batch insert relationships
db.insert_drv_refs(deps).await?;
```
Query (`backend/server/src/scheduler/ingress.rs:is_drv_buildable`):

```sql
SELECT COUNT(*) FROM DrvRefs
WHERE referrer = ?
  AND reference NOT IN (
    SELECT drv_path FROM Drv
    WHERE build_state = 'Completed(Success)'
  )
```
If the count is non-zero, the drv has dependencies that have not yet succeeded, so it is not buildable.
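In memory, the same check amounts to set containment; a sketch with a hypothetical helper (the server performs this in SQL as shown above):

```rust
use std::collections::HashSet;

/// A drv is buildable when every dependency has completed successfully.
fn is_buildable(deps: &[&str], succeeded: &HashSet<&str>) -> bool {
    deps.iter().all(|d| succeeded.contains(d))
}

fn main() {
    let succeeded: HashSet<&str> = ["a.drv", "b.drv"].into_iter().collect();
    assert!(is_buildable(&["a.drv", "b.drv"], &succeeded));
    assert!(!is_buildable(&["a.drv", "c.drv"], &succeeded));
}
```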
Transitive Failure Propagation
When a drv fails permanently:
```rust
// backend/server/src/scheduler/recorder.rs
async fn propagate_transitive_failure(drv_path: &str) -> Result<()> {
    // Find all downstream drvs
    let downstream = db.query_downstream_drvs(drv_path).await?;
    for dep_drv in downstream {
        db.insert_transitive_failure(dep_drv, drv_path).await?;
        db.update_build_state(dep_drv, TransitiveFailure).await?;
    }
    Ok(())
}
```
Benefits:
- Prevents wasting resources building drvs that cannot succeed
- Clear attribution of why a build is blocked
- Can be cleared if upstream drv is fixed and rebuilt
GitHub Integration
OAuth Authentication
Flow (`backend/server/src/auth/`):
- User clicks "Login with GitHub"
- Redirect to GitHub OAuth authorize URL
- GitHub redirects back with code
- Exchange code for access token
- Fetch user info, create JWT session token
- Store in browser cookie
Authorization:
- Optional `require_approval` flag
- `ApprovedUsers` table tracks allowed GitHub usernames
- Non-approved users can view but not trigger builds
Check Runs
Lazy Creation Strategy:
Only create GitHub check runs for:
- New jobs (not in base commit)
- Changed jobs (drv_path differs between base and head)
- AND the job failed
Rationale: Large repos may have thousands of derivations. Creating check runs for every successful unchanged build clutters the PR.
Implementation (`backend/server/src/services/github.rs:update_build_status`):

```rust
async fn update_build_status(drv: &Drv, job: &Job) {
    // Only create check run if:
    // 1. Job is New or Changed
    // 2. Build failed
    if (job.difference_type == DifferenceType::New
        || job.difference_type == DifferenceType::Changed)
        && drv.build_state.is_failure()
    {
        let check_run = octocrab
            .checks(owner, repo)
            .create_check_run(job.job_name, sha)
            .status(Status::Completed)
            .conclusion(Conclusion::Failure)
            .send()
            .await?;
        db.insert_check_run_info(drv.drv_path, check_run.id).await?;
    }
}
```
Eval Gate Check Run
Purpose: Indicates whether CI configuration evaluation succeeded.
Created for: Every jobset (combination of commit + job_name)
Completion: When all drvs in the jobset reach terminal state, determine overall conclusion:
- Success: No new or changed drvs failed
- Failure: At least one new or changed drv failed
This provides a single aggregated status for the entire jobset.
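The conclusion rule reduces to a small predicate over per-job results; the types here are hypothetical, but the logic follows the two bullets above:

```rust
#[derive(PartialEq)]
enum Diff { New, Changed, Unchanged }

struct JobResult {
    diff: Diff,
    failed: bool,
}

/// Success unless a new or changed job failed; failures in
/// unchanged jobs are pre-existing and do not block the PR.
fn jobset_succeeded(jobs: &[JobResult]) -> bool {
    !jobs.iter().any(|j| j.failed && j.diff != Diff::Unchanged)
}

fn main() {
    let jobs = [
        JobResult { diff: Diff::Unchanged, failed: true }, // pre-existing failure
        JobResult { diff: Diff::New, failed: false },
    ];
    assert!(jobset_succeeded(&jobs));

    let jobs = [JobResult { diff: Diff::Changed, failed: true }];
    assert!(!jobset_succeeded(&jobs));
}
```

Ignoring unchanged failures is what keeps a long-broken leaf package from blocking every unrelated PR.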
Monitoring and Observability
Prometheus Metrics
Exposed endpoint: `/metrics`

Metrics (`backend/server/src/metrics.rs`):

```
# Build queue depth by platform
active_builds{platform="x86_64-linux"} 15
queued_builds{platform="x86_64-linux"} 42

# Process metrics (via process_collector)
process_cpu_seconds_total 1234.56
process_resident_memory_bytes 524288000
process_virtual_memory_bytes 2147483648
```
Grafana Dashboard (suggested):
- Build throughput (builds/minute)
- Queue depth trends
- Builder utilization
- Failure rate by job
Structured Logging
Libraries: `tracing`, `tracing-subscriber`
Log levels:
- `ERROR`: Build failures, service panics
- `WARN`: Retry attempts, transitive failures
- `INFO`: Build starts/completions, state transitions
- `DEBUG`: Service message passing, database queries
- `TRACE`: Detailed build output
Key spans:
```rust
#[tracing::instrument(skip(db))]
async fn build_drv(drv_path: &str, db: &DbService) -> BuildResult {
    // Automatically logs function entry/exit with timing
}
```
Build Logs
Storage: `{logs_dir}/{drv_hash}/build.log`
Rotation: Not implemented (logs accumulate)
Access:
- Web UI: View builds and their logs
- API: `GET /logs/{drv_hash}`
- CLI: `ekaci logs {drv_hash}`
Configuration
Server Configuration
Environment variables:
```shell
DATABASE_URL=sqlite:///var/lib/eka-ci/eka.db
LOGS_DIR=/var/lib/eka-ci/logs
GITHUB_APP_ID=123456
GITHUB_APP_PRIVATE_KEY=/path/to/key.pem
BIND_ADDR=0.0.0.0:8080
```
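Reading these with a fallback is straightforward; a sketch with a hypothetical helper (the server's actual config loading may differ):

```rust
use std::env;

/// Read a configuration value from the environment,
/// falling back to a default when the variable is unset.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn main() {
    // An intentionally unset variable falls through to the default.
    assert_eq!(env_or("EKA_CI_UNSET_EXAMPLE_VAR", "0.0.0.0:8080"), "0.0.0.0:8080");
}
```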
Builder Configuration
Local builder:

```
Builder::local(max_jobs: 40)
```
Remote builders:

```rust
RemoteBuilder {
    uri: "ssh://builder@10.0.0.5".to_string(),
    platforms: vec!["x86_64-linux".to_string()],
    max_jobs: 20,
    speed_factor: 1,
}
```
FOD builder (special local pool):

```
Builder::local(max_jobs: 10) // Separate capacity for FODs
```
Timeout Configuration
```
no_output_timeout_seconds: 3600 // 1 hour default
```
Graceful Shutdown
Signal handling (`backend/server/src/main.rs`):

```rust
let cancellation_token = CancellationToken::new();

tokio::signal::ctrl_c().await?;
cancellation_token.cancel();

// All services use run_until_cancelled()
tokio::select! {
    _ = service.run() => {}
    _ = cancellation_token.cancelled() => {}
}

// Wait for all services to drain
tokio::time::sleep(Duration::from_secs(5)).await;

// Close database pool
db.close().await;
```
Service shutdown behavior:
- Services drain message queues (process pending tasks)
- In-flight builds are not interrupted (complete naturally)
- Database writes are flushed before exit
Performance Characteristics
Concurrency
- Service isolation: Each service runs independently, maximizing CPU utilization
- Per-builder parallelism: Configurable `max_jobs` (default: 40 local, 10 FOD)
- Database pooling: SQLite connection pool with multiple readers
- Async I/O: Tokio runtime with work-stealing scheduler
Scalability
Horizontal scaling (not implemented):
- Services could be split across processes
- Message passing via network channels (e.g., Redis pub/sub)
- Distributed builder pool
Vertical scaling:
- Increase `max_jobs` for builders
- Add remote builders for more capacity
- Increase database connection pool size
Bottlenecks
- SQLite: Single-writer limitation for database updates
- Evaluation: `nix-eval-jobs` is CPU-bound (mitigated by caching)
- Local builder: Limited by machine resources
- Log I/O: Large builds produce megabytes of logs (mitigated by streaming)
Security Considerations
Checks Sandboxing
Threat model: Malicious `.ekaci/config.json` in an untrusted PR
Mitigations:
- Filesystem isolation (only sees checkout + `/nix/store`)
- Network isolation (disabled by default)
- No home directory access
- No access to the CI server's SSH keys, secrets, etc.
Limitations:
- Commands run as the CI server user (within sandbox)
- No resource limits (memory, CPU time) enforced
- Denial of service possible via infinite loops (timeouts are required)
Nix Build Isolation
Threat model: Malicious Nix expressions in untrusted PR
Mitigations:
- Nix sandbox (enabled by default)
  - Isolated `/tmp`
  - No network access (except FODs)
  - Limited filesystem view
- Separate user account for Nix builds (`nixbld` group)
Limitations:
- Sandbox escapes may exist in Nix itself
- Resource exhaustion possible (disk, memory)
- Recommendation: Run CI server in isolated environment (container, VM)
GitHub Authentication
Webhook validation:
- HMAC signature verification using GitHub App secret
- Prevents forged webhook events
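Signature comparison must not short-circuit on the first mismatching byte, or comparison time leaks where signatures diverge; the core check can be sketched with std only (real implementations should use a vetted constant-time primitive, e.g. the `hmac` crate's verify API):

```rust
/// Compare two byte slices without short-circuiting, so the time
/// taken does not reveal the position of the first mismatch.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; zero iff all equal.
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    assert!(constant_time_eq(b"sha256=abc", b"sha256=abc"));
    assert!(!constant_time_eq(b"sha256=abc", b"sha256=abd"));
    assert!(!constant_time_eq(b"short", b"longer"));
}
```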
API authentication:
- GitHub App installation tokens (short-lived)
- Scoped permissions (`checks:write`, `contents:read`)
Session management:
- JWT tokens with expiration
- `HttpOnly` cookies (XSS mitigation)
Future Enhancements
Potential Improvements
- Distributed architecture: Split services across machines for scale
- PostgreSQL support: Overcome SQLite concurrency limits
- Build caching: Integrate with Nix binary caches (Cachix, Attic)
- Resource limits: Enforce memory/CPU limits via cgroups
- Multi-tenancy: Support multiple organizations with isolation
- Build prioritization: Prioritize PRs from approved contributors
- Log compression: Gzip old logs to reduce storage
- Web UI improvements: Real-time build progress, failure analysis
- Metrics dashboards: Built-in Grafana/Prometheus integration
- Notification system: Email/Slack alerts for build failures
Known Limitations
- SQLite scaling: Single writer becomes bottleneck at high concurrency
- No build artifact storage: Relies on Nix store (garbage collected)
- Limited failure analysis: No automatic error categorization
- Manual builder management: No auto-scaling of remote builders
- No incremental evaluation: Re-evaluates entire job on every commit
References
Key Files
- Service initialization: `backend/server/src/services/mod.rs`
- CI configuration: `backend/server/src/ci/config.rs`
- Build state machine: `backend/server/src/ci/mod.rs`
- Scheduler tiers: `backend/server/src/scheduler/`
- Nix build execution: `backend/server/src/scheduler/build/nix_build.rs`
- Checks executor: `backend/server/src/checks/executor.rs`
- GitHub integration: `backend/server/src/services/github.rs`
- Database migrations: `backend/server/sql/migrations/`
External Dependencies
- Tokio: Async runtime
- Axum: Web framework
- SQLx: Async SQL with compile-time query checking
- Octocrab: GitHub API client
- Serde: JSON serialization
- Birdcage: Sandboxing via Linux namespaces
- nix-eval-jobs: Parallel Nix evaluation (external tool)
- Nix: Build execution and derivation management
Last Updated: 2026-02-13
Nix Version: 2.x compatible
Rust Edition: 2021