A healthcare logistics client we support runs a mixed fleet of about 45 Linux servers — CentOS and RHEL, mostly. Their infrastructure team had been managing those machines by hand for years. SSH in, make a change, move on. When we got involved during a quarterly review, three of their application servers were running an outdated version of httpd, two had the firewall service disabled, and nobody could explain why. The incident that followed — a two-hour service interruption during a shift change — traced back to one thing: no automation. Nobody had written any Ansible playbooks. Everything was manual, and manual processes drift.
This post walks through what we put in place to fix it. It is not an exhaustive Ansible guide — the Ansible documentation covers that territory better than I ever could. What this is: a practical walkthrough for writing your first YAML playbook, grounded in a real client scenario where it actually mattered.
The Incident
The initial alert came in on a Tuesday afternoon. One of their batch processing services failed to start after a routine OS patch window. The patch had been applied manually to 12 nodes in sequence, and somewhere around node 9, the engineer forgot to re-enable a service after the restart. The service stayed down. Nobody noticed for about 90 minutes because verifying service state meant SSHing into each box individually.
When we sat down for the post-mortem, the real issue surfaced fast. There was no single source of truth for server state. Nobody could confidently answer “what should be running on these nodes?” without logging into each one and checking. That is not a monitoring problem. That is an automation problem.
Root Cause Analysis
The honest root cause was straightforward: the team had been operating reactively for so long that proactive configuration management never got prioritized. Each engineer had their own approach. One used bash scripts stored in their home directory on the control server. Another kept notes in Confluence. When we audited the 45 nodes, we found four different versions of the same config file across machines that were supposed to be identical.
Configuration drift is one of those problems that feels manageable right up until it causes an outage. The fix is not complicated — but it does require committing to a tool and a workflow. We recommended Ansible, and the reasoning was simple: it is agentless. There is nothing to install on the managed nodes. It connects over SSH, runs what it needs to run, and disconnects. For this client, whose change control process made installing agents on production systems a multi-week approval cycle, that was the deciding factor.
Prerequisites Before You Write Anything
A few things you need before starting. You should be comfortable with basic Linux operations — file management, editing files, navigating the filesystem. You do not need to be a Linux subject matter expert, but you need to be functional. You also need to understand YAML syntax, which is genuinely one of the easier formats to pick up. And you need Ansible installed on your control node — the hardware requirements are minimal. The official documentation notes that a machine with 512 MB memory and a single vCPU is sufficient for the control node. We ran theirs on a small management VM that was already sitting underutilized in the environment.
One thing worth clarifying: Ansible is built on Python, but you do not need to know Python to write playbooks. I mention this because engineers sometimes see that and assume there is a programming prerequisite. There is not. YAML handles the playbook structure, and Ansible’s modules abstract everything underneath.
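For anyone who has not worked with YAML before, the full grammar you need for playbooks fits in a few lines: key-value pairs, nested mappings indicated by two-space indentation, and lists introduced with a leading dash. A minimal illustration — the keys here are arbitrary examples to show the syntax, not Ansible keywords:

```yaml
# A mapping: key-value pairs
server_name: app01
port: 8080

# A nested mapping: children are indented two spaces
settings:
  timeout: 30
  retries: 3

# A list: each item starts with a dash
packages:
  - httpd
  - mariadb-server
```

If you can read that, you can read a playbook. Everything else is Ansible-specific vocabulary layered on top of this structure.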
What an Ansible Playbook Actually Is
An Ansible playbook is a YAML file that defines what you want to happen on which machines. The file contains one or more plays. Each play targets a set of hosts from your inventory, and each play contains a list of tasks. Each task calls an Ansible module — a pre-built unit of automation that handles something specific: installing a package, managing a service, copying a file, modifying a config.
The playbook gets parsed on the control node and then executed against the target machines. Run it once and Ansible handles the SSH connections, task execution, and reporting. Run it again on machines already in the correct state and Ansible makes no changes. That is idempotency — playbooks are designed to reach a target state, not to blindly re-run commands. The goal is efficiency: only make changes when changes actually need to be made.
YAML: The One Thing You Have to Get Right
YAML is human-readable and relatively forgiving — except for one thing: indentation. YAML uses spaces, not tabs. This trips up almost everyone on their first playbook. A tab and four spaces might look identical on screen, but in YAML they are not the same, and the file will fail to parse with an error that looks cryptic until you know what to look for.
Use an editor with YAML support. Visual Studio Code, Vim with a YAML plugin, and Eclipse all work well. The syntax highlighting and indentation helpers catch errors before you run anything. This is one of those small setup decisions that saves an hour of debugging later. The Ansible project itself recommends this approach for anyone writing playbooks for the first time.
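Editor support aside, Ansible can validate a playbook's structure before anything runs. The --syntax-check flag parses the file and reports YAML and structural errors without connecting to a single host (site.yml here stands in for whatever your playbook file is named):

```shell
ansible-playbook --syntax-check site.yml
```

Pairing this with an editor that flags indentation issues catches the large majority of first-playbook mistakes before they cost any debugging time.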
Writing the First Playbook
Here is the playbook we used to bring the client’s intranet servers back to a known state. This covers the two tasks that caused the incident: ensuring httpd is installed, and ensuring the service is running and enabled on boot.
---
- name: Enable Intranet Services
  hosts: webservers
  become: true
  tasks:
    - name: Install httpd package
      yum:
        name: httpd
        state: present

    - name: Start and enable httpd service
      service:
        name: httpd
        state: started
        enabled: true
Walk through the structure. The --- at the top marks the start of a YAML document. The - name: line begins a play — this is a label for humans, not a technical requirement, but you should always include it for readability. hosts: webservers tells Ansible which group from your inventory to target. become: true means Ansible will escalate privileges (sudo) on the target machines.
Under tasks, each task has a name and a module call. The first task uses the yum module to ensure httpd is installed. state: present means “make sure this package exists” — if it is already installed, Ansible does nothing. The second task uses the service module to start httpd and mark it to start on boot. That is a complete, runnable playbook. The tasks are identified by their indentation level in the YAML structure, which is how Ansible understands what belongs to what.
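One modernization worth knowing about: since Ansible 2.10, modules are organized into collections and have fully qualified names, so yum is canonically ansible.builtin.yum and service is ansible.builtin.service. The short names still resolve, but the fully qualified form avoids ambiguity once third-party collections enter the environment. The first task rewritten in that style:

```yaml
    - name: Install httpd package
      ansible.builtin.yum:
        name: httpd
        state: present
```

Either form works in current Ansible releases; pick one convention and apply it consistently across the repository.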
Multiple Plays in One File
One concept that clicked quickly for the client’s team: you can stack multiple plays inside a single playbook file, each targeting a different host group. Here is a stripped-down example covering web nodes and database nodes in the same run:
---
- name: Configure Web Nodes
  hosts: webservers
  become: true
  tasks:
    - name: Install httpd
      yum:
        name: httpd
        state: present

- name: Configure Database Nodes
  hosts: dbservers
  become: true
  tasks:
    - name: Install mariadb-server
      yum:
        name: mariadb-server
        state: present
Same file, two plays, two host groups, two different jobs. Ansible runs them in sequence. If you need to manage your entire infrastructure topology from a single automation run, this is the structure you want. The plays and their tasks are all identified by indentation in the YAML, so consistent spacing across the file matters.
Running It
Once the playbook is written, you run it with:
ansible-playbook -i inventory.ini site.yml
The -i flag points to your inventory file, where your host groups and hostnames live. Ansible outputs each play and task as it executes: ok means no change was needed, changed means something was modified, failed means something went wrong. Before running a new playbook against production for the first time, add --check to do a dry run. It shows you what Ansible would do without actually doing it. We made this a standard step in the client’s runbook before any new playbook touched production nodes.
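For reference, a minimal inventory.ini matching the host groups used above might look like the following — the hostnames are placeholders, not the client's actual nodes:

```ini
[webservers]
web01.example.internal
web02.example.internal

[dbservers]
db01.example.internal
```

The bracketed group names are what the hosts: line in each play refers to.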
The Fix and What Changed
For the client, the immediate remediation took about four minutes. We ran the playbook across all 45 nodes, everything came back into compliance, and the service state mismatch that caused the two-hour incident would have been a 10-second fix if this had been in place beforehand.
The longer-term change was process. We helped them integrate playbook runs into their patch management workflow — every maintenance window now ends with an Ansible run that validates expected service state across the fleet. Playbooks live in a Git repository, which gives them version history and a review step before anything runs in production. If you are thinking about how this fits into a broader IT consulting and digital transformation effort, that is the right question to ask early. The tooling is straightforward. The process change around it is where the work actually is.
For teams managing Windows-heavy environments in parallel, some of the same automation discipline applies — we have covered related ground in automating network captures with NetEventSession and tuning concurrent operations with ThrottleLimit, which gives you a sense of how cross-platform automation thinking tends to develop at scale.
Lessons Learned
Idempotency is the feature you will appreciate most. Ansible playbooks do not re-apply changes that are already in place. That means you can run the same playbook nightly as a state verification mechanism. If everything is correct, nothing happens. If something drifted, it gets fixed automatically. That is a fundamentally different posture than reactive incident response.
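If you adopt the nightly-verification pattern, the scheduling piece can be as simple as a cron entry on the control node. A sketch, with paths that are illustrative rather than prescriptive:

```
# Run the fleet playbook every night at 02:00; drift gets corrected automatically
0 2 * * * ansible-playbook -i /opt/ansible/inventory.ini /opt/ansible/site.yml >> /var/log/ansible-nightly.log 2>&1
```

Skimming the log for changed counts each morning tells you how much drift the fleet accumulated since the last run, which is useful signal in its own right.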
Start with a narrow scope. The client wanted us to automate everything immediately. We pushed back. We started with five tasks across two host groups, got comfortable with the workflow, validated the behavior in a staging environment, then expanded. Writing playbooks for things you understand beats writing playbooks for things you are still figuring out.
YAML indentation errors will cost you time. Use an editor that shows indentation clearly. Test new playbooks with --check before running against production. No exceptions.
Ansible is not the right tool for every job. If your environment is heavily container-based, Kubernetes has its own configuration model that Ansible can complement but does not replace. Ansible works well for infrastructure configuration and service management on Linux nodes. That is where it shines. Be honest about the scope of what you are trying to solve before you start writing playbooks for everything.
If you are working through an automation project and want a second set of eyes on the approach, reach out directly. This is the kind of engagement we do regularly, and the first conversation is usually the most useful one.
The playbook above is a starting point, not a finished product. The barrier to writing your first working Ansible playbook is lower than most engineers expect. Get it running on a small host group. Understand what the output is telling you. Then build from there.