Shell Script Code Review Checklist for Linux Teams: Prevent Production Incidents
Most Linux teams have at least one “small Bash script” that quietly became business-critical.
At first, it was just a helper for backups, deploys, or log cleanup. Then it got scheduled by cron or systemd timer. Then more people touched it. Then one day, a tiny change broke production.
This is exactly why you need a shell script code review checklist.
Not to make reviews bureaucratic. Not to slow down delivery. But to catch dangerous mistakes early—before they become 2 AM incidents.
In this guide, you’ll get a practical, production-oriented checklist your team can apply today. It is designed for real environments where scripts touch services, filesystems, secrets, and deployment flows.
Why code review for shell scripts is often ignored (and why that is risky)
Many teams review application code strictly, but shell scripts are treated as “ops glue.”
That mindset is expensive.
Shell scripts usually run with elevated privileges, execute destructive commands (rm, mv, chown, systemctl restart), and interact with production state directly. A single unquoted variable or missing guard can trigger data loss or downtime.
If your automation depends on Bash, your Bash scripting standards should be as rigorous as your backend code standards.
Quick rule: review by failure modes, not only by style
A good review should ask:
- What can fail?
- How does the script detect failure?
- How does it recover or stop safely?
- How observable is the failure?
This mindset aligns with reliability and security goals at the same time.
Shell Script Code Review Checklist (Production Edition)
Use this as your pull request template for shell-based automation.
1) Safety baseline is present
Minimum baseline:
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
What to check:
- set -Eeuo pipefail exists (or documented reason if not)
- script uses predictable IFS
- script does not rely on ambiguous shell behavior
Why it matters:
Without strict mode, scripts can continue after a partial failure and leave systems in an inconsistent state.
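A minimal sketch of the partial-failure problem. Here `false | true` stands in for any pipeline whose first stage fails, and the two function names are illustrative only:

```shell
#!/usr/bin/env bash
# Compare how a failing pipeline is reported with and without pipefail.
# Each case runs in a child bash so the options do not leak into the caller.

without_pipefail() {
  bash -c 'false | true; echo "exit=$?"'
}

with_pipefail() {
  bash -c 'set -o pipefail; false | true; echo "exit=$?"'
}

without_pipefail   # prints exit=0: the failure of `false` is hidden
with_pipefail      # prints exit=1: the failure is reported
```

With `set -e` also active, the second case would stop the script at the pipeline instead of carrying on.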
Related reading: Bash Strict Mode and Safe Automation Checklist for Linux Servers
2) Variables are quoted and validated
Check for unsafe expansions:
# risky
rm -rf $TARGET_DIR/*
# safer
rm -rf "${TARGET_DIR:?}"/*
Review points:
- variable expansion is quoted ("$var")
- required variables use fail-fast checks (${VAR:?message})
- defaults are explicit (VAR="${VAR:-default}")
This single item prevents many incidents in linux shell scripting.
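To see what quoting changes, the sketch below counts the arguments a command really receives. The path and the `RETENTION_DAYS` setting are made-up examples, and the result assumes the default IFS:

```shell
#!/usr/bin/env bash
# Count how many arguments a command actually receives.
count_args() { echo "$#"; }

backup_dir="my backup dir"   # hypothetical path containing spaces

# shellcheck disable=SC2086  # unquoted on purpose, to show word splitting
unquoted="$(count_args $backup_dir)"
quoted="$(count_args "$backup_dir")"
echo "unquoted=${unquoted} quoted=${quoted}"   # unquoted=3 quoted=1

# Explicit default: RETENTION_DAYS (a hypothetical setting) falls back to 7.
RETENTION_DAYS="${RETENTION_DAYS:-7}"
echo "retention=${RETENTION_DAYS}"
```

A command such as rm would see three separate paths in the unquoted case, which is how a harmless-looking variable ends up deleting the wrong thing.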
3) Destructive commands are guarded
Any operation with high blast radius needs explicit safeguards.
Commands to inspect carefully:
- rm -rf
- mv or cp over critical paths
- chmod -R, chown -R
- database CLI calls
- firewall/service restart operations
Ask reviewers to verify:
- target path is validated
- a dry-run option exists for risky operations
- critical commands have logging before/after execution
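One possible shape for these safeguards, as a sketch rather than a standard: `DRY_RUN`, `run`, `require_safe_dir`, and `purge_dir` are all hypothetical names, and the path rules (absolute, existing, not the root) are assumptions you would adapt to your own layout.

```shell
#!/usr/bin/env bash
# DRY_RUN is a hypothetical switch: 1 (default) only prints, 0 executes.
DRY_RUN="${DRY_RUN:-1}"

# Log the command to stderr, then run it unless dry-run is active.
run() {
  local mode="exec"
  if [[ "$DRY_RUN" == "1" ]]; then
    mode="dry-run"
  fi
  printf '[%s]' "$mode" >&2
  printf ' %q' "$@" >&2
  printf '\n' >&2
  if [[ "$mode" == "exec" ]]; then
    "$@"
  fi
}

# Accept only an absolute, existing directory that is not the filesystem root.
require_safe_dir() {
  local target="${1:-}"
  if [[ -z "$target" || "$target" != /* || "$target" == "/" || ! -d "$target" ]]; then
    echo "refusing unsafe target: '${target}'" >&2
    return 1
  fi
}

# Remove the contents of a directory, keeping the directory itself.
purge_dir() {
  local target="${1:-}"
  require_safe_dir "$target" || return 1
  run find "$target" -mindepth 1 -delete
}
```

Defaulting to dry-run means a forgotten flag prints the command instead of executing it, so the reviewer can read the exact arguments before anything is deleted.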
4) Idempotency is considered
Can this script run twice safely?
If not, is that clearly documented?
Typical idempotent patterns:
- check-before-create
- conditional updates
- lockfile to avoid overlap
- state markers
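Two of the patterns above, check-before-create and state markers, can be sketched as small helpers. The function names and the marker-file convention are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Check-before-create: append a line only if it is not already present.
ensure_line() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# State marker: run a command once, then record that it succeeded.
run_once() {
  local marker="$1"
  shift
  if [[ -e "$marker" ]]; then
    return 0
  fi
  "$@" && touch "$marker"
}
```

Calling `ensure_line` twice with the same arguments leaves one copy of the line, and `run_once` skips the command on every run after the first successful one.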
Reference: Idempotent Shell Script: Jalankan Berkali-kali Tanpa Berantakan
5) Error handling is explicit and useful
Look for trap and meaningful error logs:
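on_error() {
  # Capture the failing command's status before any other command overwrites it.
  local code=$?
  # Errors go to stderr so they do not mix with the script's normal output.
  echo "[ERROR] line=$1 cmd='$2' exit=$code" >&2
}
trap 'on_error "${LINENO}" "${BASH_COMMAND}"' ERR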
Review questions:
- does the error output include line/command context?
- is exit code preserved?
- does the script stop on critical failures?
For deployment scripts, reviewers should also check rollback behavior.
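A rollback check can be as small as the sketch below. `deploy_with_rollback` and its arguments are hypothetical. It tests the status explicitly instead of relying on an ERR trap, because Bash applies neither errexit nor ERR traps to commands evaluated as an `if` test or on the left side of `||`.

```shell
#!/usr/bin/env bash
# Hypothetical deploy step: replace a file, run a health check, restore on failure.
# Usage: deploy_with_rollback <file> <new content> <health-check command...>
deploy_with_rollback() {
  local current="$1" new_content="$2"
  shift 2
  local backup="${current}.bak" code=0

  cp -- "$current" "$backup" || return 1
  printf '%s\n' "$new_content" > "$current"

  # Capture the status explicitly so the original exit code is preserved.
  "$@" || code=$?
  if (( code == 0 )); then
    rm -f -- "$backup"
    return 0
  fi

  echo "[ERROR] health check failed (exit=${code}), restoring ${current}" >&2
  mv -- "$backup" "$current"
  return "$code"
}
```

A reviewer can exercise both paths by passing `true` and then `false` as the health-check command and confirming that the file content is restored in the second case.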
Related: Bash Trap and Rollback Patterns for Safer Linux Deployments
6) Logging is structured for debugging
A script without clear logs is hard to support in production.
Checklist:
- timestamps included
- key decision points logged
- stderr/stdout capture strategy defined
- sensitive values are masked
Example helper:
log() {
  printf '[%s] %s\n' "$(date '+%F %T')" "$*"
}
If your team runs scheduled jobs, timestamped logs make it much faster to reconstruct what happened, which shortens time to recovery (MTTR).
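The checklist items on levels, stderr, and masking could look like this. It is a sketch: the `mask` helper, the four-character rule, and the `API_TOKEN` value are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Leveled logger: timestamp plus level, written to stderr so that
# stdout stays free for data.
log() {
  local level="$1"
  shift
  printf '[%s] [%s] %s\n' "$(date '+%F %T')" "$level" "$*" >&2
}

# Show only the last four characters of a sensitive value.
mask() {
  local value="$1"
  if (( ${#value} <= 4 )); then
    printf '****\n'
  else
    printf '****%s\n' "${value: -4}"
  fi
}

API_TOKEN="${API_TOKEN:-example-token-1234}"   # hypothetical secret
log INFO "using token $(mask "$API_TOKEN")"
```

Keeping the last few characters is a trade-off: it lets an operator tell two credentials apart without exposing either one in the log.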
7) Concurrency control exists for scheduled jobs
If a script can be triggered by a timer, cron, or a webhook, guard against overlapping runs.
exec 9>/tmp/myjob.lock
flock -n 9 || { echo "job already running"; exit 1; }
Reviewers should ask: what happens if two executions run at the same time?
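To answer that question in a test, the same flock idea can be wrapped in a helper. This is a sketch that assumes the `flock` utility from util-linux is installed. `with_lock` and exit code 75 are arbitrary choices, not a convention the script must follow.

```shell
#!/usr/bin/env bash
# Run a command while holding an exclusive lock; refuse to start if the
# lock is already held. Exit code 75 is an arbitrary choice for "busy".
with_lock() {
  local lockfile="$1"
  shift
  (
    flock -n 9 || { echo "job already running" >&2; exit 75; }
    "$@"
  ) 9>"$lockfile"
}
```

The lock is released when the subshell exits, so a crash cannot leave a stale lock behind. Nesting the call, as in `with_lock "$lock" with_lock "$lock" true`, is a quick way to confirm that a second run is rejected.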
For timer strategy decisions, see: Systemd Timer vs Cron untuk Automasi Linux Production
8) Dependencies and environment are verified
Never assume binary availability or environment variables.
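# Stop early if a required command is not on PATH.
for cmd in curl jq awk; do
  command -v "$cmd" >/dev/null || { echo "missing: $cmd" >&2; exit 1; }
done
# Stop early if a required environment variable is unset or empty.
: "${APP_ENV:?APP_ENV is required}"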
Reviewers should verify that the script fails early with a clear message when dependencies are missing.
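A small variation, shown as a sketch with a hypothetical `require_cmds` helper, collects every missing command before failing so the operator fixes them in one pass:

```shell
#!/usr/bin/env bash
# Report every missing command at once instead of stopping at the first.
require_cmds() {
  local cmd missing=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing dependencies: ${missing[*]}" >&2
    return 1
  fi
}
```

A script would call it near the top, for example `require_cmds curl jq awk || exit 1`.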
9) Security review basics are covered
For secure shell scripting, verify:
- no hardcoded secrets
- temporary files are handled safely
- user input is sanitized
- no dangerous eval unless unavoidable and reviewed
- external downloads are verified when possible
For infrastructure-facing scripts, this is non-negotiable.
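Two of these checks, input validation and temporary file handling, are sketched below. The allowlist pattern and both function names are assumptions, so tighten the pattern to match your own naming rules.

```shell
#!/usr/bin/env bash
# Allowlist validation: accept only characters expected in a service name.
is_valid_service_name() {
  [[ "${1:-}" =~ ^[A-Za-z0-9_.@-]+$ ]]
}

# Run a command with a private temp directory that is always removed.
# The directory path is passed to the command as its last argument.
# The function body is a subshell, so the EXIT trap stays local to it.
with_tempdir() (
  tmp="$(mktemp -d)" || exit 1
  trap 'rm -rf -- "$tmp"' EXIT
  rc=0
  "$@" "$tmp" || rc=$?
  exit "$rc"
)
```

Validating against an allowlist rejects input such as `nginx; rm -rf /` before it reaches a command line, and `mktemp -d` avoids predictable file names in a shared directory.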
Security-related follow-up: UFW vs Fail2Ban vs SSH Hardening: Kombinasi Keamanan Server Linux
10) Portability and shell compatibility are explicit
Reviewers should confirm the target shell and runtime assumptions:
- script is Bash-specific? then use shebang #!/usr/bin/env bash
- uses GNU options? document Linux distro assumptions
- avoid relying on local machine behavior only
If portability is not required, state it clearly in comments or README.
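Runtime assumptions can also be enforced in the script itself. The sketch below uses a hypothetical `require_bash_major` helper, and the minimum version passed to it is only an example:

```shell
#!/usr/bin/env bash
# Fail early when the script is not running under a new enough Bash.
require_bash_major() {
  local min="$1"
  if [ -z "${BASH_VERSION:-}" ]; then
    echo "this script requires bash" >&2
    return 1
  fi
  if (( BASH_VERSINFO[0] < min )); then
    echo "bash >= ${min} required, found ${BASH_VERSION}" >&2
    return 1
  fi
}

require_bash_major 3
```

This turns a silent assumption into a clear error message on hosts that ship an older shell.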
11) Test strategy exists (even lightweight)
Not every shell script needs a full test suite, but every production script needs some test path.
Minimum acceptable:
- lint check via shellcheck
- test run in staging
- one failure-path simulation
Great baseline:
shellcheck ./script.sh
bash -n ./script.sh
The pull request should include evidence that these checks were run.
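A failure-path simulation does not need a test framework. In the sketch below, `backup_dir` is a hypothetical function under test and exit code 2 is an assumed convention for "source missing":

```shell
#!/usr/bin/env bash
# Hypothetical function under test: copy a directory, fail clearly if it is missing.
backup_dir() {
  local src="$1" dest="$2"
  if [[ ! -d "$src" ]]; then
    echo "source missing: ${src}" >&2
    return 2
  fi
  cp -R -- "$src" "$dest"
}

# Failure-path check: a missing source must return 2 and create nothing.
test_backup_missing_source() {
  local work rc=0 created=0
  work="$(mktemp -d)"
  backup_dir "$work/does-not-exist" "$work/out" 2>/dev/null || rc=$?
  if [[ -e "$work/out" ]]; then
    created=1
  fi
  rm -rf -- "$work"
  [[ "$rc" -eq 2 && "$created" -eq 0 ]]
}

if test_backup_missing_source; then echo "PASS"; else echo "FAIL"; fi
```

Pasting the output of a check like this into the pull request is one lightweight form of the evidence reviewers should ask for.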
12) Runbook and ownership are clear
A script is not production-ready if nobody knows how to operate it.
Reviewers should look for:
- who owns this script?
- where is the runbook?
- what are rollback steps?
- where are logs located?
This connects code quality with real incident response readiness.
Lightweight PR template you can copy
Use this template in your repo for shell-script pull requests:
- Strict mode enabled (set -Eeuo pipefail)
- Variables quoted and required vars validated
- Destructive commands guarded
- Idempotency behavior documented
- Error handling and trap are clear
- Logging includes timestamps and useful context
- Concurrency guard (flock/equivalent) considered
- Dependency checks implemented
- Security checks (secrets, input, temp files)
- shellcheck + bash -n passed
- Rollback/runbook notes included
This is practical enough for daily operations and strict enough to prevent common production failures.
Common review anti-patterns (avoid these)
“Looks fine, approved” without running lint
If reviewers skip shellcheck, easy mistakes survive.
Focusing only on formatting
Style matters, but resilience matters more. Ask failure-mode questions first.
Ignoring scheduler context
Scripts that run fine manually can still fail in cron/systemd due to PATH/env differences.
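One way to reproduce this before merging is to run the script with a stripped-down environment. This is a sketch: `run_like_cron` is a hypothetical helper, and the PATH value is an assumption, so check what your cron daemon or systemd unit actually sets.

```shell
#!/usr/bin/env bash
# Run a command with a minimal environment, roughly like a scheduler would.
run_like_cron() {
  env -i HOME="${HOME:-/}" PATH="/usr/bin:/bin" SHELL="/bin/sh" "$@"
}

# Variables exported in your interactive shell are gone inside the job:
export DEPLOY_ENV="staging"   # hypothetical variable
run_like_cron sh -c 'echo "PATH=$PATH DEPLOY_ENV=${DEPLOY_ENV:-<unset>}"'
# prints: PATH=/usr/bin:/bin DEPLOY_ENV=<unset>
```

If the script only works when launched from a login shell, this check fails on a workstation instead of at the next scheduled run.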
No post-incident feedback loop
After an incident, the checklist should evolve. A static checklist becomes stale quickly.
Final recommendation
If your team already uses shell for deployment and operations, introduce this checklist incrementally:
- Start with strict mode + quoting + lint.
- Add concurrency and dependency checks.
- Add rollback and incident-oriented logging.
- Make it part of your PR definition of done.
You don’t need “perfect scripts.” You need scripts that fail safely, are easy to debug, and are hard to misuse.
That’s exactly what a solid shell script code review checklist gives you.
FAQ
1) Is shell script code review necessary for small teams?
Yes. Small teams are usually more exposed to operational risk because one person often handles multiple roles. A checklist prevents avoidable mistakes and reduces incident stress.
2) Which checklist item has the biggest impact when you are starting out?
Enable strict mode (set -Eeuo pipefail) and enforce variable quoting. This combination catches many high-risk issues early.
3) Should every script implement rollback?
Not always full rollback, but every state-changing script should at least define safe stop behavior and recovery steps.
4) Is ShellCheck enough as a review gate?
ShellCheck is essential but not sufficient. You still need logic review, failure-mode review, and runtime context checks.
5) How often should we update the checklist?
Update it after incidents, major platform changes, or every quarter. Treat it as a living operational standard.