Bash Trap and Rollback Patterns for Safer Linux Deployments
If you have ever run a deployment script on a Linux server and ended up with a half-broken state, you already know the pain: services stop, symlinks point to nowhere, and logs get noisy while users see errors. The script “worked” until one command failed in the middle.
The core issue is not only failed commands. The real issue is lack of controlled failure handling. In production automation, failure is normal. What separates stable teams from chaotic ones is whether every script has a clear cleanup and rollback story.
In this guide, we’ll build a practical pattern using set -Eeuo pipefail, trap, and structured rollback actions. You can use it for app deployments, config changes, or routine maintenance jobs. The goal is simple: if one step fails, your server returns to a known-good state as automatically as possible.
Why this matters in real production
Most shell scripts are written for happy-path execution. But production is never happy-path only. You may hit:
- unavailable package mirrors,
- permission drift,
- missing env vars,
- systemd reload failures,
- full disk at the wrong time,
- or race conditions between concurrent jobs.
Without rollback, each failure becomes a manual incident. With rollback, failure becomes a controlled event.
A good rule: every script that changes runtime state should have at least one rollback path. Even a “small” change like swapping symlink targets should be reversible in seconds.
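To make that rule concrete, here is a minimal sketch of a reversible symlink switch. It runs in a temporary sandbox with hypothetical release names; the key habit is capturing the previous target before the swap, so rollback is a single idempotent command:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Sandbox stand-in for /srv/myapp -- paths here are hypothetical.
demo="$(mktemp -d)"
mkdir -p "$demo/releases/v1" "$demo/releases/v2"
ln -s "$demo/releases/v1" "$demo/current"

# Capture the previous target BEFORE touching anything.
prev_target="$(readlink "$demo/current")"

# Forward switch: one command.
ln -sfn "$demo/releases/v2" "$demo/current"

# Rollback: also one command, and safe to repeat.
ln -sfn "$prev_target" "$demo/current"

restored="$(readlink "$demo/current")"
echo "restored to: $restored"
```

Because ln -sfn replaces the link in place, both the switch and its undo take effect in seconds, with no intermediate broken state.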
Prerequisites
- Linux server with Bash 4+
- Basic understanding of systemctl, symlink-based release layout, and permissions
- Non-root deployment user with appropriate sudo rules
- A staging environment to test failure paths before production
Recommended baseline:
- shellcheck for linting
- central logs (journalctl + app logs)
- lockfile strategy to prevent concurrent deployments
Core safety baseline before trap/rollback
Start every deployment script with strict mode and predictable defaults.
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
umask 022
export LC_ALL=C
Why it matters:
- -e: stop on error (with caveats in subshells/conditionals)
- -E: make ERR traps fire inside functions, subshells, and command substitutions
- -u: fail on undefined variables
- pipefail: pipeline fails if any segment fails
- consistent IFS reduces word-splitting bugs
This baseline is mandatory, but still not enough. Strict mode can stop a script, yet it does not automatically clean up partial changes. That's where a trap plus a rollback stack comes in.
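A tiny self-contained demo of that gap (file names are hypothetical): strict mode aborts the child script mid-run, but only the EXIT trap guarantees the partial artifact is removed:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Path only -- the child script creates it and (via its trap) removes it.
workfile="$(mktemp -u)"

bash -eu -c '
  trap "rm -f \"$1\"" EXIT       # cleanup runs even though set -e aborts us
  echo "partial state" > "$1"
  false                          # simulated mid-script failure
' _ "$workfile" || true          # parent tolerates the expected failure

if [[ ! -e "$workfile" ]]; then
  echo "trap cleaned up after failure"
fi
```

Without the trap, the child would exit at the failing command and leave the partial file behind; with it, strict mode and cleanup work together.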
Step 1 — Build a rollback stack pattern
A clean pattern is to register undo actions as you go. If step 4 fails, rollback executes actions from step 3, 2, 1 (LIFO order).
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
ROLLBACK_ACTIONS=()
action_push() {
  local action="$1"
  ROLLBACK_ACTIONS+=("$action")
}

rollback() {
  local exit_code=$?
  echo "[ERROR] deployment failed, running rollback..."
  for (( i=${#ROLLBACK_ACTIONS[@]}-1; i>=0; i-- )); do
    echo "[ROLLBACK] ${ROLLBACK_ACTIONS[$i]}"
    eval "${ROLLBACK_ACTIONS[$i]}" || true
  done
  echo "[ERROR] rollback complete (exit=${exit_code})"
  exit "$exit_code"
}

cleanup() {
  rm -f /tmp/myapp.deploy.lock || true
}

trap rollback ERR
trap cleanup EXIT
This gives you two guarantees:
- On error (ERR), rollback actions run.
- On script exit (EXIT), temporary artifacts are cleaned.
Important: keep rollback actions idempotent. If rollback runs twice, it should not cause new breakage.
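As a small illustration (sandboxed, hypothetical paths): flag-tolerant commands like rm -f and ln -sfn make a rollback action safe to repeat, so running the whole stack twice is a no-op rather than a new incident:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

ROLLBACK_ACTIONS=()
action_push() { ROLLBACK_ACTIONS+=("$1"); }

demo="$(mktemp -d)"
mkdir -p "$demo/releases/old"

# Idempotent rollback actions: each is safe to run twice.
action_push "rm -f '$demo/deploy.lock'"                      # -f: no error if absent
action_push "ln -sfn '$demo/releases/old' '$demo/current'"   # -fn: replaces any existing link

# Running the whole stack twice must not introduce new errors.
for _ in 1 2; do
  for action in "${ROLLBACK_ACTIONS[@]}"; do
    eval "$action"
  done
done
echo "two rollback passes, no breakage"
```

By contrast, an action like mv current current.bak would fail (or clobber state) on the second pass, which is exactly what you want to avoid.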
Step 2 — Apply the pattern to a symlink-based deployment
A common release layout:
- /srv/myapp/releases/<timestamp> holds each release
- /srv/myapp/current symlink points to the active release
When switching release, always capture previous state first.
APP_DIR="/srv/myapp"
RELEASES_DIR="$APP_DIR/releases"
CURRENT_LINK="$APP_DIR/current"
NEW_RELEASE="$RELEASES_DIR/$(date +%Y%m%d%H%M%S)"
mkdir -p "$NEW_RELEASE"
action_push "rm -rf '$NEW_RELEASE'"
# Copy artifact
cp -a /tmp/artifact/. "$NEW_RELEASE/"
# Preserve previous symlink target
PREV_TARGET=""
if [[ -L "$CURRENT_LINK" ]]; then
  PREV_TARGET="$(readlink -f "$CURRENT_LINK")"
fi

if [[ -n "$PREV_TARGET" ]]; then
  # Register symlink restore and restart as ONE action, so the LIFO
  # rollback restores the link before (not after) restarting the service.
  action_push "ln -sfn '$PREV_TARGET' '$CURRENT_LINK' && systemctl restart myapp.service"
fi

# Activate new release
ln -sfn "$NEW_RELEASE" "$CURRENT_LINK"
systemctl daemon-reload
systemctl restart myapp.service
systemctl is-active --quiet myapp.service
If restart or health check fails, the symlink can be restored automatically. This saves you from SSH panic mode.
Step 3 — Add health checks and hard fail gates
Rollback only helps if failure is detected quickly. Add explicit checks after each risky step.
check_http() {
  local url="$1"
  local max_retry=10
  for _ in $(seq 1 "$max_retry"); do
    if curl -fsS "$url" >/dev/null; then
      return 0
    fi
    sleep 2
  done
  return 1
}
check_http "http://127.0.0.1:8080/healthz"
Typical hard fail gates:
- service active (systemctl is-active --quiet)
- local health endpoint returns 200
- migration marker exists
- expected config file present
- critical env var loaded
No assumptions. Validate everything you can before declaring success.
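One way to keep those gates explicit is a pair of small helper functions. This is a sketch; require_file and require_env are hypothetical names, and in the real script each gate would sit right after the risky step it validates:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical hard-fail gates: each passes silently or fails loudly.
require_file() {
  [[ -f "$1" ]] || { echo "[GATE] missing file: $1" >&2; return 1; }
}

require_env() {
  [[ -n "${!1:-}" ]] || { echo "[GATE] env var not set: $1" >&2; return 1; }
}

# Example usage:
tmpcfg="$(mktemp)"          # stand-in for an expected config file
require_file "$tmpcfg"
APP_ENV=production
require_env APP_ENV
echo "all gates passed"
rm -f "$tmpcfg"
```

Under set -e with the ERR trap installed, any gate that returns non-zero immediately triggers the rollback stack.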
Step 4 — Prevent concurrent runs with lockfile
Two deployment runs at once can destroy consistency. Add a lock guard.
LOCK_FILE="/tmp/myapp.deploy.lock"
exec 9>"$LOCK_FILE"
if ! flock -n 9; then
  echo "[ERROR] another deployment is running"
  exit 1
fi
# rollback-safe cleanup
action_push "rm -f '$LOCK_FILE'"
For system-level scheduling, you can combine this with systemd units/timers. If you still rely on cron jobs, review this comparison: systemd timer vs cron for Linux production automation.
Step 5 — Make rollback observable (logs and context)
If rollback runs silently, your team still loses time during incident review. Log key context:
- release ID,
- failed command,
- line number,
- current symlink target,
- service status snapshot.
on_err() {
  local ec=$?
  echo "[ERR] line=${BASH_LINENO[0]} cmd='${BASH_COMMAND}' code=$ec"
  systemctl status myapp.service --no-pager -l || true
  rollback
}
trap on_err ERR
You can also append structured logs to /var/log/myapp/deploy.log and feed them to monitoring.
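A minimal sketch of such a structured logger (RELEASE_ID, log_event, and the log path are assumptions, not part of the pattern above): one key=value line per event is easy to grep during an incident review and trivial to ship to monitoring:

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical structured deploy logger.
RELEASE_ID="20240101120000"
LOG_FILE="$(mktemp)"              # in production: /var/log/myapp/deploy.log

log_event() {
  local level="$1" msg="$2"
  # One line per event: timestamp, level, release ID, message.
  printf '%s level=%s release=%s msg="%s"\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$RELEASE_ID" "$msg" >> "$LOG_FILE"
}

log_event INFO  "deploy started"
log_event ERROR "restart failed, rolling back"
cat "$LOG_FILE"
```

Calling log_event from on_err and rollback gives every failure a timestamped, release-scoped trail without any extra tooling.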
Common failure scenarios and practical fixes
1) Trap not firing as expected
Cause: error occurs in contexts where -e behaves differently (subshells, conditionals, pipelines).
Fix: keep commands explicit, avoid opaque chained one-liners, and prefer direct exit checks for critical commands.
if ! systemctl restart myapp.service; then
  echo "[ERROR] restart failed"
  exit 1
fi
2) Rollback restores files but service still broken
Cause: process state, cache, or migrations mismatch persisted.
Fix: rollback should include service restart/reload and optional cache reset hooks.
3) Permissions drift after deploy
Cause: artifact extracted with wrong ownership.
Fix: enforce ownership/permissions as explicit step and rollback counterpart where needed.
chown -R myapp:myapp "$NEW_RELEASE"
find "$NEW_RELEASE" -type d -exec chmod 755 {} \;
find "$NEW_RELEASE" -type f -exec chmod 644 {} \;
4) Database migration is not reversible
Cause: destructive migration without down plan.
Fix: split deployment into phases, use backward-compatible migrations, and gate irreversible steps behind explicit approval.
5) Partial cleanup leaves orphan releases
Cause: missing retention policy.
Fix: after successful deploy, keep last N releases and clean older ones in a separate safe task.
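A sketch of that retention task (sandboxed here; it assumes release directories are named with sortable timestamps, as in the layout above, and that KEEP is your chosen retention count):

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

RELEASES_DIR="$(mktemp -d)"       # stand-in for /srv/myapp/releases
KEEP=3

# Fake some timestamped releases for the demo.
for ts in 20240101000000 20240102000000 20240103000000 \
          20240104000000 20240105000000; do
  mkdir -p "$RELEASES_DIR/$ts"
done

# Newest-first lexical sort; everything after the first $KEEP goes away.
# Run this as a separate, non-critical task, never inside the deploy itself.
ls -1 "$RELEASES_DIR" | sort -r | tail -n "+$((KEEP + 1))" | while read -r old; do
  rm -rf "${RELEASES_DIR:?}/$old"
done

ls -1 "$RELEASES_DIR"
```

Keeping this out of the deploy script means a retention bug can never break a release, only delay cleanup.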
Production checklist (copy-paste for your runbook)
- Script uses set -Eeuo pipefail and deterministic IFS
- Lockfile/flock prevents concurrent runs
- Every state-changing step registers a rollback action
- Health checks are explicit and blocking
- Service status validated after restart/reload
- Rollback logs command + line + release ID
- Staging failure simulation has been tested
- Retention cleanup separated from critical deploy path
If your team is still improving shell quality, start with this baseline too: bash strict mode and safe automation checklist for linux servers and idempotent shell script: run it many times without leaving a mess.
FAQ
1) Is trap ERR enough for production deployment scripts?
Good start, but not enough alone. You also need strict mode, explicit health checks, lockfile control, and deterministic rollback actions for each risky step.
2) Should rollback always revert database changes too?
Not always. For many production systems, DB rollback is risky. Prefer backward-compatible migrations and phased rollouts, then only do DB rollback when thoroughly tested.
3) How many rollback actions should be stored?
Only actions that restore critical state. Avoid huge rollback logic for non-critical side effects. Keep it focused, predictable, and idempotent.
4) Can I use this pattern with CI/CD pipelines?
Yes. Wrap this script in your CI job, pass release artifact path and environment variables, then fail pipeline if any post-deploy check fails.
5) What is the first improvement if my current scripts are fragile?
Start by adding strict mode + trap + one reversible symlink switch. That single improvement usually cuts recovery time drastically.
Conclusion
A deployment script is not reliable because it works once. It is reliable because it fails safely, loudly, and reversibly. With trap plus rollback stack, health gates, and lockfile control, your Linux deployments become much more predictable under pressure.
Adopt the pattern incrementally: one service, one rollback path, one tested failure scenario at a time. Within a few release cycles, your operational confidence will rise significantly.