Conventions
Keep the following conventions in mind when working on the HC tool source code:
1) HC plugins are implemented as KSH functions, each in a stand-alone file. These functions are loaded on demand (at run-time), so each file must have the same name as the function it contains (see man ksh: FPATH). Since we prefer all scripts to carry the .sh extension, a symbolic link must be made between the script name & function name.
For example:
matches to the file(s):
See also: check_health.sh --fix-symlinks
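The link between script name and function name can be sketched as follows. The plugin name check_example and the temporary directory are hypothetical. KSH would locate the function through FPATH; here the file is sourced explicitly through the symbolic link, so that the sketch also runs in shells without autoloading.

```shell
# Hypothetical plugin directory, for illustration only
_PLUGIN_DIR=$(mktemp -d)

# The plugin lives in a file that carries the .sh extension
cat > "${_PLUGIN_DIR}/check_example.sh" <<'EOF'
function check_example
{
    printf "check_example loaded via symlink\n"
    return 0
}
EOF

# The symbolic link carries the function name (no extension)
ln -s check_example.sh "${_PLUGIN_DIR}/check_example"

# KSH autoloading would search FPATH for a file called 'check_example';
# sourcing the link explicitly has the same effect in this sketch
. "${_PLUGIN_DIR}/check_example"
check_example
```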
2) Naming conventions for scripts: HC plugins should be named using the following pattern: check_<platform>_<customer>_<description>
For example:
check_aix_errpt
check_linux_fs_mounts
check_hpux_ovpa_status
The customer tag may be considered optional. If a plugin is cross-platform, then use ‘all’ as the platform indicator.
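A plugin name can be checked against this convention with a simple pattern match. This is only a sketch: the function name validate_plugin_name is hypothetical, and the platform list (aix, linux, hpux, all) is an assumption taken from the examples above, so the real tool may accept other platforms.

```shell
# Hypothetical helper: returns 0 when the name follows the naming pattern.
# Assumption: only the platforms from the examples above, plus 'all', are valid.
function validate_plugin_name
{
    typeset _NAME="$1"
    case "${_NAME}" in
        check_aix_?*|check_linux_?*|check_hpux_?*|check_all_?*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

for _CANDIDATE in check_aix_errpt check_linux_fs_mounts my_plugin
do
    if validate_plugin_name "${_CANDIDATE}"
    then
        printf "%s: ok\n" "${_CANDIDATE}"
    else
        printf "%s: does not follow the naming pattern\n" "${_CANDIDATE}"
    fi
done
```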
3) By default only one global namespace exists in the HC tool (main script & functions). However, variables that are limited to a function scope should be defined via typeset and be prefixed with an underscore (_). Though most KSH variants will not expose these variables outside the function, the underscore convention still allows them to be used safely alongside global variables of the same name (but without the underscore).
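A minimal sketch of this convention, using a hypothetical global variable HOST and a hypothetical plugin function check_example:

```shell
# Global variable, as the main script might define it (hypothetical name)
HOST="global-host"

function check_example
{
    # Function-scoped counterpart: declared via typeset, prefixed with '_'
    typeset _HOST="plugin-host"
    printf "inside : HOST=%s _HOST=%s\n" "${HOST}" "${_HOST}"
    return 0
}

check_example
# The global value is untouched; in shells that scope typeset to the
# function, _HOST is no longer set at this point
printf "outside: HOST=%s _HOST=%s\n" "${HOST}" "${_HOST:-<unset>}"
```

Even in a shell that would leak _HOST out of the function, the underscore keeps it from overwriting the global HOST.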
HC plugin code
Global variables
The following global variables may be handy to remember or use:
- $SCRIPT_NAME: self-explanatory
- $SCRIPT_DIR: self-explanatory
- $HOST_NAME: self-explanatory
- $OS_NAME: self-explanatory
- $LOG_DIR: location of the log & state files
- $STATE_DIR: location of the state files (= contains information that needs to be kept between consecutive runs of the same health checks)
- $STATE_PERM_DIR: location of the permanent state files
- $STATE_TEMP_DIR: location of the temporary state files
- $HC_STDOUT_LOG: location of the event log file for STDOUT for the HC plugin
- $HC_STDERR_LOG: location of the event log file for STDERR for the HC plugin
- $ARG_DEBUG: debug flag (0 is off)
- $ARG_DEBUG_LEVEL: debug level (0,1,2)
- $IS_PDKSH: flag for pdksh/mksh
Command-line options/parameters
The following are the command-line options/parameters of check_health.sh:
Header
Any HC plugin/function should start with the following lines:
For example:
The _CONFIG_FILE directive should only be specified if the HC plugin requires the use of a configuration file.
Reading values from the plugin configuration file
Use constructs with the data_get_lvalue_from_config() function if possible, e.g.:
Syntax of the data_get_lvalue_from_config() function:
Logging events
Feeding events back to the HC (main) script to allow logging, alerting or reporting should be done via the log_hc() function, e.g.:
Syntax of the log_hc() function:
Log healthy
Insert this into your plugin code (or copy/paste from an existing plugin):
When using a configuration file for your plugin
When NOT using a configuration file for your plugin
Incorporating fix/healing logic
The core idea of the Health Checker framework is to always work in read-only mode, i.e. limiting its functionality to only checking or verifying application/system status. However, at times it may be appropriate to add logic to plugin code that can also fix problems (auto-healing). For this reason the --no-fix command-line parameter, the global variable ARG_NO_FIX and the _HC_CAN_FIX plugin variable were added (as of release 20190629).
If your code contains auto-healing logic then set:
so that the --list option will show that the plugin has such ability. The value of ARG_NO_FIX can be used to control whether the script should actually do the healing or not (dry-run mode). Running with the --no-log option automatically also implies that ARG_NO_FIX is set.
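The dry-run behaviour can be sketched as below. This is not taken from an existing plugin: the function name _fix_example is hypothetical, and it assumes that ARG_NO_FIX is 0 when healing is allowed and non-zero when it is not.

```shell
# Assumption: 0 = healing allowed, non-zero = dry-run (no healing)
ARG_NO_FIX=${ARG_NO_FIX:-0}

# Hypothetical plugin-internal routine that heals a single target
function _fix_example
{
    typeset _TARGET="$1"

    if (( ARG_NO_FIX > 0 ))
    then
        # dry-run mode: only report what would have been done
        printf "dry-run: would fix %s\n" "${_TARGET}"
        return 0
    fi

    printf "fixing %s\n" "${_TARGET}"
    # the actual healing command for the target would go here
    return 0
}

ARG_NO_FIX=1
_fix_example "/some/target"
ARG_NO_FIX=0
_fix_example "/some/target"
```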
Show usage (and other internal plugin functions)
If you wish to write functions (or subroutines) that are specific to a certain plugin, then you can add them as separate functions in the same plugin file. The names of these functions should start with an underscore to indicate that they are plugin-internal routines (for example: _show_usage).
Do add a _show_usage function to each HC plugin with a short description of what the plugin does. Example:
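A minimal sketch of such a function, assuming a hypothetical plugin called check_example. The layout and wording of the usage text are illustrative only and not prescribed by the framework.

```shell
# Plugin-internal routine: the name starts with an underscore
function _show_usage
{
cat <<EOT
NAME    : check_example (hypothetical plugin)
PURPOSE : short description of what the plugin checks
EOT
return 0
}

_show_usage
```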
About exit/return codes
Some conventions:
- die(): custom function in HC framework
- exit(): refers to the standard shell function
- return(): refers to the standard shell function
The following rules apply:
- When the main script and the plugin code do not encounter a flagged (script) error OR failed health check: RC = 0
- When the main script encounters a flagged (script) error and consequently ends via die() or exit(): RC = 1 or value of $EXIT_CODE
- When the main script ends without (script) error AND the plugin code encounters a flagged (script) error and consequently ends via return(): RC = value of return()
- When both main script and plugin code end without flagged (script) error AND the plugin code encounters a failed health check: RC = 0
- When both main script and plugin code end without flagged (script) error AND the plugin code encounters a failed health check AND option --flip-rc is used:
  - when --with-rc == Count or not used: RC = count of STC>0 (max 255)
  - when --with-rc == Max or not used: RC = Max of all STC values (max 255)
  - when --with-rc == Sum or not used: RC = Sum of all STC values (max 255)
- When the option --no-log is used: RC = 0
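The three --with-rc calculations can be illustrated with a small sketch. It is not the framework's own code: the function name _rc_from_stc and its lower-case mode arguments are hypothetical, and the STC values are passed in as plain arguments.

```shell
# Hypothetical helper: derive an RC from a list of STC values
# usage: _rc_from_stc <count|max|sum> <stc> [<stc> ...]
function _rc_from_stc
{
    typeset _MODE="$1"
    shift
    typeset _RC=0
    typeset _STC

    for _STC in "$@"
    do
        case "${_MODE}" in
            count)  # number of STC values greater than zero
                if (( _STC > 0 )); then _RC=$(( _RC + 1 )); fi ;;
            max)    # highest STC value
                if (( _STC > _RC )); then _RC=${_STC}; fi ;;
            sum)    # total of all STC values
                _RC=$(( _RC + _STC )) ;;
        esac
    done

    # a shell return code cannot exceed 255
    if (( _RC > 255 )); then _RC=255; fi
    printf "%d\n" "${_RC}"
    return 0
}

_rc_from_stc count 0 1 2 0 3    # prints 3
_rc_from_stc max   0 1 2 0 3    # prints 3
_rc_from_stc sum   0 1 2 0 3    # prints 6
_rc_from_stc sum   200 100      # prints 255 (capped)
```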
Also note that:
- plugins should never use die() or exit() but always use log_hc() or return()
- die() should only be used in the main script and/or core plugins
- exit() should only be used in the main script
- do not rely on the RC when running the health checker with multiple plugins enabled. The RC will then only depend on the processing of the last called plugin. There is no good or graceful way to deal with RCs from multiple HC plugins.
Useful (global) auxiliary functions
You can look in the source code of the include_data.sh & include_os.sh scripts to find useful helper functions that can be used anywhere in the health checker code.