Concepts
The HC tool keeps several types of log and state files:
- check_health.sh.log: log file of the main script itself. It provides a chronological record of all script runs.
- hc.log: log file containing formatted records of HC executions and results. The standard format is as follows:
<timestamp>|<hc plugin>|<hc result or STC>|<hc details>|<fail id>
For example:
2016-04-11 15:17:00|check_hpux_sg_package_status|0|'postfix:status=up' has correct value
2016-04-11 15:17:00|check_hpux_sg_package_status|0|'postfix:state=running' has correct value
2016-04-11 15:17:00|check_hpux_sg_package_status|0|'postfix:autorun=enabled' has correct value
2016-04-11 15:17:00|check_hpux_sg_package_status|0|'ovpm:status=up' has correct value
2016-04-11 15:17:00|check_hpux_sg_package_status|0|'ovpm:state=running' has correct value
2016-04-11 15:17:00|check_hpux_sg_package_status|0|'ovpm:autorun=enabled' has correct value
2016-04-11 15:20:01|check_hpux_ioscan|0|no problems detected by /usr/sbin/ioscan
2016-04-11 15:27:00|check_hpux_sg_cluster_status|0|'status=up' has correct value
2016-04-11 15:27:00|check_hpux_sg_cluster_status|0|'state=stable' has correct value
2016-04-11 15:42:01|check_hpux_ovpa_status|0|scopeux is running
2016-04-11 15:42:01|check_hpux_ovpa_status|0|midaemon is running
2016-04-11 15:42:01|check_hpux_ovpa_status|0|perfalarm is running
An HC result (or STC) of:
- <>0: indicates that the corresponding HC has failed (problems were detected).
- 0: indicates that the corresponding HC did not detect any issues.
- Event files: upon HC failure, a FAIL_ID is generated (a timestamp). This FAIL_ID is used to generate an event and the corresponding evidence of the event. Typically, the STDOUT/STDERR output gathered during the HC is saved into a separate event directory. For example:
# /var/opt/hc/events # cd 2016-04/20160417030000
# /var/opt/hc/events/2016-04/20160417030000 # ls -l
total 16
-rw-r--r-- 1 root sys 0 Apr 17 03:00 check_hpux_root_crontab.stderr.log
-rw-r--r-- 1 root sys 3247 Apr 17 03:00 check_hpux_root_crontab.stdout.log
In this example the FAIL_ID is 20160417030000, which can also be used to trace the event in hc.log:
# /var/opt/hc # grep "20160417030000" hc.log
2016-04-17 03:00:00|check_hpux_root_crontab|1|'/opt/ignite/bin/make_net_recovery -u -v -s igniteA' is not configured in cron|20160417030000
Events are organized in separate directories per year and month (e.g. 2016-04) to avoid cluttering a single directory.
- State files: some plugins may require state or intermediary files to retain information between checks. Such files are placed in the location pointed to by the STATE_DIR setting. This location is also used for the enable/disable feature of the HC plugins themselves. Never clean up such files unless you know what you are doing.
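The record format and the event directory layout described above can be combined to find failed checks and their evidence. The following is a minimal sketch, not part of the HC tool: the function names hc_failed_records and hc_event_dir are hypothetical, the details field is assumed not to contain a '|' character, and the FAIL_ID is assumed to have the form YYYYMMDDhhmmss as in the example.

```shell
# Hypothetical helpers, not shipped with the HC tool.

# Print "<hc plugin>|<fail id>" for every failed record (STC <> 0)
# in an hc.log-style file.
# Assumed record layout: <timestamp>|<hc plugin>|<STC>|<hc details>|<fail id>
hc_failed_records() {
    awk -F'|' 'NF >= 5 && $3 + 0 != 0 { print $2 "|" $5 }' "$1"
}

# Print the event directory for a FAIL_ID, given the events root.
# Assumes a FAIL_ID of the form YYYYMMDDhhmmss and one
# sub-directory per year and month (e.g. 2016-04).
hc_event_dir() {
    events_root=$1
    fail_id=$2
    year=$(printf '%s' "$fail_id" | cut -c1-4)
    month=$(printf '%s' "$fail_id" | cut -c5-6)
    printf '%s/%s-%s/%s\n' "$events_root" "$year" "$month" "$fail_id"
}
```

For instance, hc_event_dir /var/opt/hc/events 20160417030000 prints /var/opt/hc/events/2016-04/20160417030000, which matches the directory shown in the example above.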
Logging control
--no-log
Logging can be switched off ad hoc with the --no-log script option. This runs the health checker in a preview or dry-run mode: the actual health check(s) are executed but no results are logged.
log_healthy
This option can be enabled in two ways:
- command-line option --log-healthy
- plugin configuration parameter log_healthy (note that not all plugins support this option, see --list)
The --log-healthy option controls the logging (and display) of passed health checks (also known as healthy health checks). Most plugins will only log/show failed health checks (though this also depends on the plugin code).
You can combine the --log-healthy option with the --no-log command-line option to have messages displayed but not logged. --no-log always takes highest precedence.
Note that your HC log file may grow very quickly when passed health checks are also logged.
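The interaction between the two options can be summarized as a small decision function. This is only an illustrative model of the rules described in this section, not the tool's actual code: the function name hc_log_decision and its 0/1 arguments are hypothetical, and it assumes that failed checks are always displayed, which in practice depends on the plugin.

```shell
# Hypothetical model of the logging rules, not the HC tool's own code.
# Usage: hc_log_decision <stc> <no_log: 0|1> <log_healthy: 0|1>
# Prints "display=<yes|no> log=<yes|no>" for a single HC result.
hc_log_decision() {
    stc=$1
    no_log=$2
    log_healthy=$3
    display=yes
    log=yes
    # Passed checks (STC 0) are only shown and logged when
    # log_healthy is enabled.
    if [ "$stc" -eq 0 ] && [ "$log_healthy" -ne 1 ]; then
        display=no
        log=no
    fi
    # --no-log always takes highest precedence: nothing is logged.
    if [ "$no_log" -eq 1 ]; then
        log=no
    fi
    printf 'display=%s log=%s\n' "$display" "$log"
}
```

Under this model, a passed check with both --log-healthy and --no-log set (hc_log_decision 0 1 1) yields display=yes log=no, which corresponds to the "displayed but not logged" combination described above.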