Bash Trap and Rollback Patterns for Safer Linux Deployments
If you have ever run a deployment script on a Linux server and ended up with a half-broken state, you already know the pain: services stop, symlinks point to nowhere, and logs get noisy while users see errors. The script “worked” until one command failed in the middle.
The core issue is not only failed commands. The real issue is lack of controlled failure handling. In production automation, failure is normal. What separates stable teams from chaotic ones is whether every script has a clear cleanup and rollback story.
In this guide, we’ll build a practical pattern using set -Eeuo pipefail, trap, and structured rollback actions. You can use it for app deployments, config changes, or routine maintenance jobs. The goal is simple: if one step fails, your server returns to a known-good state as automatically as possible.
Why this matters in real production
Most shell scripts are written for happy-path execution. But production is never happy-path only. You may hit:
- unavailable package mirrors,
- permission drift,
- missing env vars,
- systemd reload failures,
- full disk at the wrong time,
- or race conditions between concurrent jobs.
Without rollback, each failure becomes a manual incident. With rollback, failure becomes a controlled event.
A good rule: every script that changes runtime state should have at least one rollback path. Even a “small” change like swapping symlink targets should be reversible in seconds.
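To make that rule concrete, here is a minimal sketch of a reversible symlink switch. It runs in a temporary sandbox with hypothetical release names; the key habit is capturing the previous target before the swap, so rollback is a single idempotent command:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Sandbox stand-in for /srv/myapp -- paths here are hypothetical.
demo="$(mktemp -d)"
mkdir -p "$demo/releases/v1" "$demo/releases/v2"
ln -s "$demo/releases/v1" "$demo/current"

# Capture the previous target BEFORE touching anything.
prev_target="$(readlink "$demo/current")"

# Forward switch: one command.
ln -sfn "$demo/releases/v2" "$demo/current"

# Rollback: also one command, and safe to repeat.
ln -sfn "$prev_target" "$demo/current"

restored="$(readlink "$demo/current")"
echo "restored to: $restored"
```

Because ln -sfn replaces the link in place, both the switch and its undo take effect in seconds, with no intermediate broken state.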
Prerequisites
- Linux server with Bash 4+
- Basic understanding of systemctl, symlink-based release layout, and permissions
- Non-root deployment user with appropriate sudo rules
- A staging environment to test failure paths before production
Recommended baseline:
- shellcheck for linting
- central logs (journalctl + app logs)
- lockfile strategy to prevent concurrent deployments
Core safety baseline before trap/rollback
Start every deployment script with strict mode and predictable defaults.
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
umask 022
export LC_ALL=C
Why it matters:
- -e: stop on error (with caveats in subshells/conditionals)
- -E: make ERR traps fire inside functions, subshells, and command substitutions
- -u: fail on undefined variables
- pipefail: pipeline fails if any segment fails
- consistent IFS reduces word-splitting bugs
This baseline is mandatory, but still not enough. Strict mode can stop a script, yet it does not automatically clean up partial changes. That's where a trap plus a rollback stack comes in.
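A tiny self-contained demo of that gap (file names are hypothetical): strict mode aborts the child script mid-run, but only the EXIT trap guarantees the partial artifact is removed:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Path only -- the child script creates it and (via its trap) removes it.
workfile="$(mktemp -u)"

bash -eu -c '
  trap "rm -f \"$1\"" EXIT       # cleanup runs even though set -e aborts us
  echo "partial state" > "$1"
  false                          # simulated mid-script failure
' _ "$workfile" || true          # parent tolerates the expected failure

if [[ ! -e "$workfile" ]]; then
  echo "trap cleaned up after failure"
fi
```

Without the trap, the child would exit at the failing command and leave the partial file behind; with it, strict mode and cleanup work together.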
Step 1 — Build a rollback stack pattern
A clean pattern is to register undo actions as you go. If step 4 fails, rollback executes actions from step 3, 2, 1 (LIFO order).
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
ROLLBACK_ACTIONS=()
action_push() {
  local action="$1"
  ROLLBACK_ACTIONS+=("$action")
}

rollback() {
  local exit_code=$?
  echo "[ERROR] deployment failed, running rollback..."
  for (( i=${#ROLLBACK_ACTIONS[@]}-1; i>=0; i-- )); do
    echo "[ROLLBACK] ${ROLLBACK_ACTIONS[$i]}"
    eval "${ROLLBACK_ACTIONS[$i]}" || true
  done
  echo "[ERROR] rollback complete (exit=${exit_code})"
  exit "$exit_code"
}

cleanup() {
  rm -f /tmp/myapp.deploy.lock || true
}

trap rollback ERR
trap cleanup EXIT
This gives you two guarantees:
- On error (ERR), rollback actions run.
- On script exit (EXIT), temporary artifacts are cleaned.
Important: keep rollback actions idempotent. If rollback runs twice, it should not cause new breakage.
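As a small illustration (sandboxed, hypothetical paths): flag-tolerant commands like rm -f and ln -sfn make a rollback action safe to repeat, so running the whole stack twice is a no-op rather than a new incident:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

ROLLBACK_ACTIONS=()
action_push() { ROLLBACK_ACTIONS+=("$1"); }

demo="$(mktemp -d)"
mkdir -p "$demo/releases/old"

# Idempotent rollback actions: each is safe to run twice.
action_push "rm -f '$demo/deploy.lock'"                      # -f: no error if absent
action_push "ln -sfn '$demo/releases/old' '$demo/current'"   # -fn: replaces any existing link

# Running the whole stack twice must not introduce new errors.
for _ in 1 2; do
  for action in "${ROLLBACK_ACTIONS[@]}"; do
    eval "$action"
  done
done
echo "two rollback passes, no breakage"
```

By contrast, an action like mv current current.bak would fail (or clobber state) on the second pass, which is exactly what you want to avoid.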
Step 2 — Apply the pattern to a symlink-based deployment
A common release layout:
- /srv/myapp/releases/<timestamp> holds each release
- /srv/myapp/current symlink points to the active release
When switching release, always capture previous state first.
APP_DIR="/srv/myapp"
RELEASES_DIR="$APP_DIR/releases"
CURRENT_LINK="$APP_DIR/current"
NEW_RELEASE="$RELEASES_DIR/$(date +%Y%m%d%H%M%S)"
mkdir -p "$NEW_RELEASE"
action_push "rm -rf '$NEW_RELEASE'"
# Copy artifact
cp -a /tmp/artifact/. "$NEW_RELEASE/"
# Preserve previous symlink target
PREV_TARGET=""
if [[ -L "$CURRENT_LINK" ]]; then
  PREV_TARGET="$(readlink -f "$CURRENT_LINK")"
fi

if [[ -n "$PREV_TARGET" ]]; then
  # Register symlink restore and restart as ONE action, so the LIFO
  # rollback restores the link before (not after) restarting the service.
  action_push "ln -sfn '$PREV_TARGET' '$CURRENT_LINK' && systemctl restart myapp.service"
fi

# Activate new release
ln -sfn "$NEW_RELEASE" "$CURRENT_LINK"
systemctl daemon-reload
systemctl restart myapp.service
systemctl is-active --quiet myapp.service
If restart or health check fails, the symlink can be restored automatically. This saves you from SSH panic mode.
Step 3 — Add health checks and hard fail gates
Rollback only helps if failure is detected quickly. Add explicit checks after each risky step.
check_http() {
  local url="$1"
  local max_retry=10
  for _ in $(seq 1 "$max_retry"); do
    if curl -fsS "$url" >/dev/null; then
      return 0
    fi
    sleep 2
  done
  return 1
}
check_http "http://127.0.0.1:8080/healthz"
Typical hard fail gates:
- service active (systemctl is-active --quiet)
- local health endpoint returns 200
- migration marker exists
- expected config file present
- critical env var loaded
No assumptions. Validate everything you can before declaring success.
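One way to keep those gates explicit is a pair of small helper functions. This is a sketch; require_file and require_env are hypothetical names, and in the real script each gate would sit right after the risky step it validates:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical hard-fail gates: each passes silently or fails loudly.
require_file() {
  [[ -f "$1" ]] || { echo "[GATE] missing file: $1" >&2; return 1; }
}

require_env() {
  [[ -n "${!1:-}" ]] || { echo "[GATE] env var not set: $1" >&2; return 1; }
}

# Example usage:
tmpcfg="$(mktemp)"          # stand-in for an expected config file
require_file "$tmpcfg"
APP_ENV=production
require_env APP_ENV
echo "all gates passed"
rm -f "$tmpcfg"
```

Under set -e with the ERR trap installed, any gate that returns non-zero immediately triggers the rollback stack.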
Step 4 — Prevent concurrent runs with lockfile
Two deployment runs at once can destroy consistency. Add a lock guard.
LOCK_FILE="/tmp/myapp.deploy.lock"
exec 9>"$LOCK_FILE"
if ! flock -n 9; then
  echo "[ERROR] another deployment is running"
  exit 1
fi
# rollback-safe cleanup
action_push "rm -f '$LOCK_FILE'"
For system-level scheduling, you can combine this with systemd units/timers. If you still rely on cron jobs, review this comparison: systemd timer vs cron for Linux production automation.
Step 5 — Make rollback observable (logs and context)
If rollback runs silently, your team still loses time during incident review. Log key context:
- release ID,
- failed command,
- line number,
- current symlink target,
- service status snapshot.
on_err() {
  local ec=$?
  echo "[ERR] line=${BASH_LINENO[0]} cmd='${BASH_COMMAND}' code=$ec"
  systemctl status myapp.service --no-pager -l || true
  rollback
}
trap on_err ERR
You can also append structured logs to /var/log/myapp/deploy.log and feed them to monitoring.
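A minimal sketch of such a structured logger (RELEASE_ID, log_event, and the log path are assumptions, not part of the pattern above): one key=value line per event is easy to grep during an incident review and trivial to ship to monitoring:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical structured deploy logger.
RELEASE_ID="20240101120000"
LOG_FILE="$(mktemp)"              # in production: /var/log/myapp/deploy.log

log_event() {
  local level="$1" msg="$2"
  # One line per event: timestamp, level, release ID, message.
  printf '%s level=%s release=%s msg="%s"\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$RELEASE_ID" "$msg" >> "$LOG_FILE"
}

log_event INFO  "deploy started"
log_event ERROR "restart failed, rolling back"
cat "$LOG_FILE"
```

Calling log_event from on_err and rollback gives every failure a timestamped, release-scoped trail without any extra tooling.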
Common failure scenarios and practical fixes
1) Trap not firing as expected
Cause: error occurs in contexts where -e behaves differently (subshells, conditionals, pipelines).
Fix: keep commands explicit, avoid opaque chained one-liners, and prefer direct exit checks for critical commands.
if ! systemctl restart myapp.service; then
  echo "[ERROR] restart failed"
  exit 1
fi
2) Rollback restores files but service still broken
Cause: process state, cache, or migrations mismatch persisted.
Fix: rollback should include service restart/reload and optional cache reset hooks.
3) Permissions drift after deploy
Cause: artifact extracted with wrong ownership.
Fix: enforce ownership/permissions as explicit step and rollback counterpart where needed.
chown -R myapp:myapp "$NEW_RELEASE"
find "$NEW_RELEASE" -type d -exec chmod 755 {} \;
find "$NEW_RELEASE" -type f -exec chmod 644 {} \;
4) Database migration is not reversible
Cause: destructive migration without down plan.
Fix: split deployment into phases, use backward-compatible migrations, and gate irreversible steps behind explicit approval.
5) Partial cleanup leaves orphan releases
Cause: missing retention policy.
Fix: after successful deploy, keep last N releases and clean older ones in a separate safe task.
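A sketch of that retention task (sandboxed here; it assumes release directories are named with sortable timestamps, as in the layout above, and that KEEP is your chosen retention count):

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

RELEASES_DIR="$(mktemp -d)"       # stand-in for /srv/myapp/releases
KEEP=3

# Fake some timestamped releases for the demo.
for ts in 20240101000000 20240102000000 20240103000000 \
          20240104000000 20240105000000; do
  mkdir -p "$RELEASES_DIR/$ts"
done

# Newest-first lexical sort; everything after the first $KEEP goes away.
# Run this as a separate, non-critical task, never inside the deploy itself.
ls -1 "$RELEASES_DIR" | sort -r | tail -n "+$((KEEP + 1))" | while read -r old; do
  rm -rf "${RELEASES_DIR:?}/$old"
done

ls -1 "$RELEASES_DIR"
```

Keeping this out of the deploy script means a retention bug can never break a release, only delay cleanup.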
Production checklist (copy-paste for your runbook)
- Script uses set -Eeuo pipefail and deterministic IFS
- Lockfile/flock prevents concurrent runs
- Every state-changing step registers a rollback action
- Health checks are explicit and blocking
- Service status validated after restart/reload
- Rollback logs command + line + release ID
- Staging failure simulation has been tested
- Retention cleanup separated from critical deploy path
If your team is still improving shell quality, start with this baseline too: bash strict mode and safe automation checklist for linux servers and idempotent shell script: run it many times without leaving a mess.
FAQ
1) Is trap ERR enough for production deployment scripts?
Good start, but not enough alone. You also need strict mode, explicit health checks, lockfile control, and deterministic rollback actions for each risky step.
2) Should rollback always revert database changes too?
Not always. For many production systems, DB rollback is risky. Prefer backward-compatible migrations and phased rollouts, then only do DB rollback when thoroughly tested.
3) How many rollback actions should be stored?
Only actions that restore critical state. Avoid huge rollback logic for non-critical side effects. Keep it focused, predictable, and idempotent.
4) Can I use this pattern with CI/CD pipelines?
Yes. Wrap this script in your CI job, pass release artifact path and environment variables, then fail pipeline if any post-deploy check fails.
5) What is the first improvement if my current scripts are fragile?
Start by adding strict mode + trap + one reversible symlink switch. That single improvement usually cuts recovery time drastically.
Conclusion
A deployment script is not reliable because it works once. It is reliable because it fails safely, loudly, and reversibly. With trap plus rollback stack, health gates, and lockfile control, your Linux deployments become much more predictable under pressure.
Adopt the pattern incrementally: one service, one rollback path, one tested failure scenario at a time. Within a few release cycles, your operational confidence will rise significantly.