SPOG Linux Patch Prerequisites
The following example gives an overview of how Linux NFS patch prerequisites work.
This was initially implemented for the Columbia Business School GSB gsb6 stage web servers in INC1764380.
NOTE: Business school stage servers are named "test". Go figure.
Background
Patching schedules are identified by names such as '1_Fri_1000', and hosts are assigned to a schedule within the /patching/schedules/by_host NFS directory.
The name '1_Fri_1000' means "one week after Patch Tuesday week, on Friday at 10 am".
Here are the schedule files that assign these servers to the 1_Fri_1000 schedule:
    cbsw6testapp01 -> ../by_time/1_Fri_1000.txt
    cbsw6testapp02 -> ../by_time/1_Fri_1000.txt
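The schedule naming scheme can be illustrated with a small parser. This is only a sketch: the field layout (week offset, three-letter weekday, 24-hour HHMM) is inferred from the single example name '1_Fri_1000', and `PatchSchedule` and `parse_schedule` are hypothetical names, not part of the patching tooling.

```python
import re
from typing import NamedTuple


class PatchSchedule(NamedTuple):
    weeks_after: int  # weeks after Patch Tuesday week
    weekday: str      # three-letter day name, e.g. "Fri"
    hour: int         # 24-hour clock (assumed)
    minute: int


# Assumed layout: <weeks>_<Day>_<HHMM>, inferred from "1_Fri_1000".
_NAME = re.compile(r"^(\d+)_(Mon|Tue|Wed|Thu|Fri|Sat|Sun)_(\d{2})(\d{2})$")


def parse_schedule(name: str) -> PatchSchedule:
    """Split a schedule name such as '1_Fri_1000' into its fields."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized schedule name: {name!r}")
    weeks, day, hh, mm = match.groups()
    hour, minute = int(hh), int(mm)
    if hour > 23 or minute > 59:
        raise ValueError(f"invalid time in schedule name: {name!r}")
    return PatchSchedule(int(weeks), day, hour, minute)


print(parse_schedule("1_Fri_1000"))
```

The parser only decodes the name; it does not compute a calendar date, since the exact definition of "Patch Tuesday week" is not spelled out here.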
In conjunction with Business School web programmers, we have developed a prerequisite script which servers must pass before patching can begin.
The CBS cluster has multiple requirements, all encoded into the prereq check:
Step | Detail
---|---
1 | Identify the cluster name used by the F5
2 | Then, for each node of the cluster, run the web service checks (HTTP protocol, DB access, and LDAP access, as they appear in the check output under Failed Run Example)
3 | Finally, run an HTTPS check using the global cluster name discovered in step 1
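The order of these steps can be sketched as follows. This is not the real cbs_cluster_prereq script: the function name, the injected check callables, and the stop-at-first-failure behavior are assumptions (the last one inferred from the failed run shown later on this page). How the cluster name is obtained from the F5 is not described here, so the sketch takes it as a parameter.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]  # takes a host or cluster name, returns True on success


def cluster_prereq(cluster: str, nodes: List[str],
                   node_checks: Dict[str, Check],
                   cluster_check: Check) -> int:
    """Run per-node checks, then the cluster HTTPS check; return an exit code."""
    # Step 1: the cluster name is assumed to be already discovered.
    print(f"Cluster: {cluster}")
    # Step 2: every node must pass every web service check.
    for node in nodes:
        for label, check in node_checks.items():
            if not check(node):
                print(f"{node}: {label} denied")
                return 1  # assumed: stop at the first failure
            print(f"{node}: {label} okay")
    # Step 3: HTTPS check against the global cluster name.
    if not cluster_check(cluster):
        print("Cluster URL: HTTPS protocol denied")  # hypothetical wording
        return 1
    print("Cluster URL: HTTPS protocol okay")
    return 0


# Usage with stub checks standing in for the real HTTP, DB, and LDAP probes.
always_ok: Check = lambda name: True
checks = {"HTTP protocol": always_ok, "DB access": always_ok, "LDAP access": always_ok}
rc = cluster_prereq("stage6.gsb.columbia.edu",
                    ["cbsw6testapp01", "cbsw6testapp02"], checks, always_ok)
print("RC:", rc)
```

Passing the checks in as callables keeps the ordering logic separate from the probes themselves, which makes the flow easy to exercise without touching real servers.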
Configuration
Here are the files which define the patching prerequisites:
    /patching/prereqs/bin/cbsw6testapp02 -> /usr/local/bin/cbs_cluster_prereq
    /patching/prereqs/bin/cbsw6testapp01 -> /usr/local/bin/host_prereq+cbs_cluster
The above means that cbsw6testapp02 runs only the cbs_cluster_prereq, BUT cbsw6testapp01 has a host prerequisite PLUS the cluster prerequisite.
The host prereq conditions explain why: cbsw6testapp01 does not patch until AFTER cbsw6testapp02 is successfully patched:
    prereqs/host_prereqs/cbsw6testapp01 -> cbsw6testapp02
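A host prerequisite of this kind could be evaluated roughly as shown below. This is a sketch under assumptions: the function name `host_prereq_met` is hypothetical, the symlink target is taken to be the prerequisite host's name, and "patched successfully" is taken to mean that a file named after that host exists in the day's Success directory on NFS. The demo uses a temporary directory instead of the real /patching tree.

```python
import os
import tempfile


def host_prereq_met(host: str, host_prereqs_dir: str, success_dir: str) -> bool:
    """Return True if the host named by the symlink has a Success status file."""
    link = os.path.join(host_prereqs_dir, host)
    if not os.path.islink(link):
        return True  # assumption: no symlink means no host prerequisite
    prereq_host = os.path.basename(os.readlink(link))
    return os.path.exists(os.path.join(success_dir, prereq_host))


# Demo in a scratch directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as root:
    prereqs = os.path.join(root, "host_prereqs")
    success = os.path.join(root, "Success")
    os.makedirs(prereqs)
    os.makedirs(success)
    os.symlink("cbsw6testapp02", os.path.join(prereqs, "cbsw6testapp01"))

    before = host_prereq_met("cbsw6testapp01", prereqs, success)
    open(os.path.join(success, "cbsw6testapp02"), "w").close()  # app02 patched
    after = host_prereq_met("cbsw6testapp01", prereqs, success)
    unconstrained = host_prereq_met("cbsw6testapp02", prereqs, success)

print(before, after, unconstrained)
```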
Failed Run Example
NOTE: A fully successful run is not shown because the second node would simply patch the same way the first node does in this example.
Before patching on cbsw6testapp02, the cbs_cluster_prereq must pass with exit code 0.
Here is the output from that check:
    Cluster: stage6.gsb.columbia.edu
    cbsw6testapp01: HTTP protocol okay
    cbsw6testapp01: DB access okay
    cbsw6testapp01: LDAP access okay
    cbsw6testapp02: HTTP protocol okay
    cbsw6testapp02: DB access okay
    cbsw6testapp02: LDAP access okay
    Cluster URL: HTTPS protocol okay
After this, cbsw6testapp02 proceeded to patch, ending successfully:
    Running yum_patch in run mode
    [stuff deleted]
    Reboot 1607462321 Success (number is Unix timestamp meaning: Tue Dec 8 16:18:41 EST 2020)
    End 1607462367 Success (number is Unix timestamp meaning: Tue Dec 8 16:19:27 EST 2020)
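The Unix timestamps in this output can be converted to the readable form with a few lines of Python. This sketch uses a fixed UTC-5 offset labeled EST, which matches these December timestamps; it is an assumption that would be wrong for dates during daylight saving time. The function name `describe` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Eastern Standard Time (assumption: no daylight saving handling).
EST = timezone(timedelta(hours=-5), "EST")


def describe(ts: int) -> str:
    """Render a Unix timestamp in the style used in the patch log."""
    dt = datetime.fromtimestamp(ts, tz=EST)
    return f"{dt:%a %b} {dt.day} {dt:%H:%M:%S %Z %Y}"


print(describe(1607462321))  # Reboot line
print(describe(1607462367))  # End line
```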
After cbsw6testapp02 patched successfully, this cleared cbsw6testapp01 to patch.
cbsw6testapp01 is aware of this because a 'Success' status file is created on NFS.
But after cbsw6testapp02 patches successfully, the cbs_cluster_prereq MUST ALSO PASS before cbsw6testapp01 can patch.
Here are the results of that test:
    cbs_cluster_prereq cannot run successfully after prereq host finished patching!
    Cluster: stage6.gsb.columbia.edu
    cbsw6testapp01: HTTP protocol okay
    cbsw6testapp01: DB access okay
    cbsw6testapp01: LDAP access okay
    cbsw6testapp02: HTTP protocol okay
    cbsw6testapp02: DB access okay
    cbsw6testapp02: LDAP access denied
    /patching/prereqs/bin/cbsw6testapp01 RC: 1
    Cannot continue with patching. Please investigate.
    End 1607463002 Prerequisites (number is Unix timestamp meaning: Tue Dec 8 16:30:02 EST 2020)
Because the prerequisite failed, patching did not proceed on cbsw6testapp01.
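The gating behavior described here, where a nonzero exit code from the prerequisite blocks patching and is recorded as a status, might look roughly like the sketch below. It is not the actual patching driver: `gate_patching` is a hypothetical name, the status directory layout is simplified, and the sketch assumes that patching itself succeeds whenever it is allowed to run.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, List


def gate_patching(host: str, prereq_cmd: List[str],
                  patch: Callable[[], None], history_dir: Path) -> str:
    """Run the prerequisite; patch only if it exits 0. Record the final status."""
    result = subprocess.run(prereq_cmd)
    if result.returncode != 0:
        print(f"{prereq_cmd[0]} RC: {result.returncode}")
        print("Cannot continue with patching. Please investigate.")
        status = "Prerequisites"
    else:
        patch()  # assumed to succeed in this sketch
        status = "Success"
    status_dir = history_dir / status
    status_dir.mkdir(parents=True, exist_ok=True)
    (status_dir / host).touch()  # status file named after the host
    return status


# Demo: a stand-in prerequisite command that exits with code 1.
patched: List[str] = []
with tempfile.TemporaryDirectory() as root:
    failing_prereq = [sys.executable, "-c", "import sys; sys.exit(1)"]
    status = gate_patching("cbsw6testapp01", failing_prereq,
                           lambda: patched.append("cbsw6testapp01"), Path(root))
    recorded = (Path(root) / "Prerequisites" / "cbsw6testapp01").exists()

print(status, patched, recorded)
```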
Here are the status files reflecting these states:
    Dec 8 16:20 /patching/output/history/2020/20201208/Success/cbsw6testapp02
    Dec 8 16:30 /patching/output/history/2020/20201208/Prerequisites/cbsw6testapp01