CopiaFacts provides the following means of detecting critical failures:
Several CopiaFacts programs update a control file (with the extension .OMA) at regular intervals, typically every 15 seconds. The supplied OMACHECK program can then check the timestamp of this file to ensure that it is recent: if the file is found to be 'out-of-date' by more than a specified period, OMACHECK initiates a notification or recovery action.
To maximize the effectiveness of OMACHECK, it is necessary that:
•you have multiple machine nodes that can access the CopiaFacts Application Data folders, and you run OMACHECK on the one most likely to remain operating in a failure situation. OMACHECK can only detect a failure if it is still running and can see the OMA files.
•your machines are fully time-synchronized.
•if backup or other external operations may prevent OMACHECK from seeing the OMA files, the OMACHECK 'out-of-date' tolerance must be set long enough to avoid false triggering.
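The out-of-date test can be illustrated with a short Python sketch. This is not OMACHECK's implementation. It assumes the check is a plain comparison of the OMA file's modification time against the current time, and that the last timestamp seen is kept while the file is temporarily invisible (for example during a backup), so that only the tolerance decides when to alert. The class name `OmaWatch` and its methods are hypothetical. Because the comparison mixes the clock that stamped the file with the clock of the machine running the check, it also shows why time synchronization matters.

```python
import os
import time


class OmaWatch:
    """Hypothetical sketch of an OMA staleness check (not OMACHECK itself)."""

    def __init__(self, path, tolerance_seconds):
        self.path = path
        self.tolerance = tolerance_seconds
        self.last_mtime = None  # last modification time successfully observed

    def is_out_of_date(self, now=None):
        """Return True if the file's last known update is older than the tolerance."""
        if now is None:
            now = time.time()
        try:
            self.last_mtime = os.path.getmtime(self.path)
        except OSError:
            # File temporarily invisible (e.g. during a backup): fall back on the
            # last timestamp seen, so only the tolerance decides when to alert.
            if self.last_mtime is None:
                return True  # never seen at all (assumption: treat as a failure)
        return (now - self.last_mtime) > self.tolerance
```

With updates typically every 15 seconds, a tolerance several times that interval allows for a few missed writes and small clock differences; a window in which backups hide the files needs a correspondingly larger value.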
The following CopiaFacts applications can write OMA files:
•COPIAFACTS writes an OMA file if the Operations Monitor Alert option is checked in Run-Time Options. This is not checked by default.
•FFEXTERN writes an OMA file if specified on the command line.
•JOBMON and JOBSUM write an OMA file if specified on the command line.
OMA files are always written in the FAXFACTS\LOG folder, unless this folder is overridden using $log_def.
For detection of failures in outbound operations, you can use the $kill_group command to specify an outcome code which, if repeated on a group of lines or threads, will either cause the CopiaFacts node to shut down, or take some other specified action. A shutdown is then detected because OMA file updating will cease.
As soon as a kill group is triggered, the lines in the line group are taken out of service, preventing further operations on these lines from commencing.
The configuration file variable KILL_GROUP_ACTION can specify the name of a file that initiates the action to be taken when a kill group is triggered:
•If the filename is that of a .FS file, it is copied into the TOSEND folder. This file can be used either to send an e-mail notification of the action, or to initiate a $worker_box action which starts an infobox script.
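The .FS case amounts to placing a copy of a prepared file in the TOSEND folder. A minimal Python sketch follows, assuming the copy keeps the original file name and handling only the .FS case; the function name `run_kill_group_action` and the folder paths are hypothetical, and the content of the .FS file itself is not shown. A False return corresponds to the action failing.

```python
import shutil
from pathlib import Path


def run_kill_group_action(action_file, tosend_dir):
    """Copy a .FS action file into the TOSEND folder; return True on success.

    Hypothetical sketch: keeping the original file name is an assumption.
    """
    src = Path(action_file)
    if src.suffix.upper() != ".FS" or not src.is_file():
        return False  # not a .FS file, or it does not exist: no action taken
    try:
        shutil.copy2(src, Path(tosend_dir) / src.name)
    except OSError:
        return False  # copy failed, e.g. the TOSEND folder is missing
    return True
```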
If there is no defined action, or the action fails, or if taking the lines out of service left no lines in the node operational, the node is shut down and OMA file updating will cease.
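The trigger-and-fallback sequence for a kill group can be sketched in Python. The real $kill_group syntax and matching rules are not shown: the threshold of consecutive repeats, resetting the count on any other outcome, running the action before deciding on shutdown, and all names below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class KillGroup:
    trigger_code: int  # outcome code being watched (hypothetical representation)
    threshold: int     # assumed: number of consecutive repeats that triggers
    lines: Set[int]    # line numbers belonging to the group
    count: int = 0
    triggered: bool = False

    def record(self, outcome: int) -> bool:
        """Record one outcome from any line in the group.

        Returns True only on the call that triggers the group.
        """
        if self.triggered:
            return False
        # Assumption: any other outcome breaks the run of repeats.
        self.count = self.count + 1 if outcome == self.trigger_code else 0
        if self.count >= self.threshold:
            self.triggered = True
            return True
        return False


def handle_trigger(group: KillGroup, in_service: Set[int],
                   action: Optional[Callable[[], bool]]) -> bool:
    """Take the group's lines out of service; return True if the node must shut down."""
    in_service -= group.lines  # updates the caller's set in place
    action_ok = False
    if action is not None:
        try:
            action_ok = bool(action())
        except Exception:
            action_ok = False  # an action that raises counts as a failed action
    # Shut down if there is no action, the action failed, or no lines remain.
    return action is None or not action_ok or not in_service
```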
It is neither possible nor sensible to list in advance every outcome code you might wish to use to trigger a kill group. These depend too heavily on the hardware combinations in use, the type of telephone line or IP port, and the connected telephone service or IP provider. In practice, a kill group is usually set up in reaction to a specific failure that has revealed the need to catch the next occurrence.
If the Quit on Logging Errors checkbox in Run-Time Options is checked, a failure by COPIAFACTS to write the daily DBF log file suppresses OMA file writes. This option is not checked by default. The shutdown will be detected because OMA file updating will cease.
Special Kill Flags
For special purposes, a number of critical errors can be set to suppress OMA file writes by adding a code to the Special Kill Flags field in Run-Time Options. These codes change from time to time and are supplied by Copia Support where appropriate. The shutdown will be detected because OMA file updating will cease.
If your infobox scripting detects a critical error which is non-recoverable and would require a restart, you can use $next_box to transfer to a pre-specified dummy box number which will initiate COPIAFACTS shutdown. The special code (which must be a number) is specified in the configuration command $shutdown_code. The shutdown will be detected because OMA file updating will cease.
The environment variables FFEXTERN_CONSECUTIVE_ERRORS, FFEXTERN_QUEUE_ERRORS and FFEXTERN_TIMEOUTS can be used to specify the corresponding counts after which updating of the FFEXTERN OMA file will be suppressed, so that OMACHECK is triggered.
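A minimal Python sketch of the idea: read the three limits from the environment and stop updating the OMA file once any count reaches its limit. Only the environment variable names come from this section. Whether FFEXTERN counts queue errors and timeouts cumulatively, what resets the consecutive-error count, and the names `read_limits` and `HeartbeatGate` are assumptions.

```python
import os

# Environment variable names as given in this section.
LIMIT_VARS = {
    "consecutive_errors": "FFEXTERN_CONSECUTIVE_ERRORS",
    "queue_errors": "FFEXTERN_QUEUE_ERRORS",
    "timeouts": "FFEXTERN_TIMEOUTS",
}


def read_limits(env=os.environ):
    """Return the positive integer limits that are set; unset or invalid ones are ignored."""
    limits = {}
    for key, var in LIMIT_VARS.items():
        value = (env.get(var) or "").strip()
        if value.isdigit() and int(value) > 0:
            limits[key] = int(value)
    return limits


class HeartbeatGate:
    """Hypothetical gate deciding whether the OMA file may still be updated."""

    def __init__(self, limits):
        self.limits = limits
        self.counts = {key: 0 for key in LIMIT_VARS}
        self.suppressed = False  # once set, OMA updates stop for good

    def record(self, kind):
        """Count one event of the given kind and latch suppression at its limit."""
        self.counts[kind] += 1
        limit = self.limits.get(kind)
        if limit is not None and self.counts[kind] >= limit:
            self.suppressed = True

    def record_success(self):
        # Assumption: a success only breaks a run of consecutive errors.
        self.counts["consecutive_errors"] = 0

    def should_write_oma(self):
        return not self.suppressed
```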
Worker Box Actions
Some sites use a worker box action to create an FS file that sends a daily e-mail, to either an automated or a human recipient, indicating that the system is running. The $worker_box command provides options to set the repeat interval. Failure detection then relies on noticing that the e-mail has not arrived.
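When the daily e-mail goes to an automated recipient, noticing a missing message can be as simple as looking for calendar days with no arrival. A minimal Python sketch, assuming a daily repeat interval and that the recipient can list the arrival times of the messages (how they are collected is not shown); `missing_days` is a hypothetical name.

```python
from datetime import date, datetime, timedelta


def missing_days(arrivals: list[datetime], start: date, end: date) -> list[date]:
    """Return each date from start to end (inclusive) on which no message arrived."""
    seen = {a.date() for a in arrivals}
    missing = []
    day = start
    while day <= end:
        if day not in seen:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```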
In addition, many network-monitoring utilities are available that can monitor running processes, files, or connections.