(DI-2311) Monitoring job was cancelled due to exceeding the maximum runtime threshold

If a monitoring background job (a job whose name matches the pattern “MON_<SYSTEMID>_<PROFILE>”) is shown as cancelled in transaction SM37, the collected KPI values and details are not reported as expected. The most common cause is that the job reached the maximum runtime threshold for monitoring background jobs (600 seconds by default). If a monitoring job runs longer than this threshold, the monitoring itself cancels the job automatically.

A monitoring background job is sometimes also called a collector job.

The following options are recommended for resolving failures of a specific monitoring job.

Increase the maximum runtime of the monitoring background jobs

The default maximum runtime of a monitoring background job is 600 seconds. You can increase this default to any suitable value (for example, 1800 seconds) by changing the configuration property with the technical name AGENT_MAX_RUNTIME, which is available in the list of all configuration properties. We highly recommend creating a new dependent configuration property that changes the value only for the monitored system ID and profile that cause the job failure. Otherwise, the change applies to all existing and scheduled monitoring jobs.
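The relationship between the default property and a dependent property can be pictured with a minimal Python sketch. It only models the lookup order described above (a system- and profile-specific value wins, otherwise the default applies). The dictionaries, the function name, and the example system ID "PRD" and profile "DEFAULT" are hypothetical and do not reflect how the product stores its configuration.

```python
# Hypothetical in-memory model of configuration properties (not the product's storage).
DEFAULT_PROPERTIES = {"AGENT_MAX_RUNTIME": 600}  # default threshold in seconds

# Dependent property: overrides the default only for one system ID and profile.
# "PRD" and "DEFAULT" are made-up example values.
DEPENDENT_PROPERTIES = {("AGENT_MAX_RUNTIME", "PRD", "DEFAULT"): 1800}


def resolve_property(name: str, system_id: str, profile: str) -> int:
    """Return the dependent value if one exists, otherwise the default value."""
    return DEPENDENT_PROPERTIES.get((name, system_id, profile), DEFAULT_PROPERTIES[name])


print(resolve_property("AGENT_MAX_RUNTIME", "PRD", "DEFAULT"))  # overridden value
print(resolve_property("AGENT_MAX_RUNTIME", "DEV", "DEFAULT"))  # default value
```

In this model, only the job for the overridden system and profile gets the longer runtime, and every other monitoring job keeps the 600-second default.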

The main disadvantage of this adjustment is that collected KPI values and details are reported only when the monitoring job finishes. If the job regularly runs long, data may be collected, for example, only every 20 minutes instead of every 5 minutes as originally scheduled.

Only one monitoring background job can run for a specific system and profile at a time. If a monitoring job is still running when the next one is due to start (that is, it runs longer than the scheduled monitoring period, for example 5 minutes), the next job finishes without collecting any data, because the previous run is still collecting. In this case, the error message “Collector job is already running for system &1 and profile &2!” also appears in the monitoring messages.
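The effect of a long runtime on the collection interval can be estimated with a short calculation. The following Python sketch is an illustration only. It assumes that a job finishing exactly at a scheduled start time counts as finished, and the function name and parameters are hypothetical.

```python
import math
from typing import Optional


def effective_collection_interval(
    period_s: int, runtime_s: int, max_runtime_s: int = 600
) -> Optional[int]:
    """Return the seconds between two runs that actually collect data.

    Returns None if the job exceeds the maximum runtime and is cancelled.
    """
    if runtime_s > max_runtime_s:
        return None  # the monitoring cancels the job, so nothing is reported
    # Scheduled starts that fall while the previous run is still active collect
    # no data, so the next collecting run starts at the first scheduled slot at
    # or after the end of the current run.
    slots = max(1, math.ceil(runtime_s / period_s))
    return slots * period_s


# A job scheduled every 5 minutes that needs about 18 minutes per run:
print(effective_collection_interval(300, 1100, max_runtime_s=1800))  # 1200 s = 20 minutes
print(effective_collection_interval(300, 1100))  # None: cancelled at the default threshold
```

With a 5-minute period and a runtime between 15 and 20 minutes, three scheduled starts in a row finish without data collection, which gives the 20-minute interval mentioned above.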

Exclude a long-running collector from the specific monitoring job

The most common cause of a long-running monitoring job is a single collector within the job that takes a long time to collect its data (for example, because it processes too much data or performs time-consuming database operations). In this case, it is better to exclude the problematic collector from the monitoring job so that the other predefined KPI values and details can still be collected. To exclude a long-running collector from a specific monitoring job, follow these steps:

  1. Display the content of the table /DVD/MON_COLRUN (for example, via the transaction SE16).

    1. (Optional) Specify the valid time range for the executed monitoring jobs.

    2. Define the System ID of the monitored system.

    3. Define the profile name of the monitoring job.

    4. (Optional) Specify the condition to display only records with an initial value for the field “END_TIME”.

  2. If you did not specify the END_TIME condition in sub-step 4 of step 1, use a filter to display only the unfinished collector(s).

    1. Unfinished collectors have an initial value in the field “END_TIME”.

  3. Get the list of unfinished collectors and check their assigned KPIs.

  4. Exclude the identified KPI names from the monitoring profile as described on the page “How to exclude default KPI(s) from default monitoring”.

  5. (Optional) If you still want to monitor the excluded KPIs, you can schedule monitoring of these KPIs with a custom monitoring profile and a longer period.

    1. In this scenario, we recommend increasing the maximum runtime of the background collector jobs for this custom monitoring profile.
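If the content of /DVD/MON_COLRUN has been exported (for example, to a spreadsheet or CSV file), the filtering in steps 1 to 3 can also be done offline. The following Python sketch shows the idea on a few hand-written rows. Only the field name END_TIME comes from the steps above. The column names SID, PROFILE, and COLLECTOR, the sample values, and the treatment of empty or all-zero values as "initial" are assumptions and have to be adapted to the real table structure.

```python
def is_initial(value) -> bool:
    """Treat None, an empty string, or an all-zero value as an initial END_TIME."""
    text = "" if value is None else str(value).replace(":", "").strip()
    return set(text) <= {"0"}  # also True for the empty string


def unfinished_collectors(rows, system_id, profile):
    """Return the collectors of one system and profile that have no end time yet."""
    return [
        row["COLLECTOR"]  # hypothetical column name
        for row in rows
        if row["SID"] == system_id  # hypothetical column name
        and row["PROFILE"] == profile  # hypothetical column name
        and is_initial(row.get("END_TIME"))
    ]


# Hand-written sample rows (not real data).
rows = [
    {"SID": "PRD", "PROFILE": "DEFAULT", "COLLECTOR": "COLLECTOR_A", "END_TIME": "101502"},
    {"SID": "PRD", "PROFILE": "DEFAULT", "COLLECTOR": "COLLECTOR_B", "END_TIME": "000000"},
    {"SID": "PRD", "PROFILE": "DEFAULT", "COLLECTOR": "COLLECTOR_C", "END_TIME": ""},
    {"SID": "DEV", "PROFILE": "DEFAULT", "COLLECTOR": "COLLECTOR_B", "END_TIME": "000000"},
]

print(unfinished_collectors(rows, "PRD", "DEFAULT"))  # ['COLLECTOR_B', 'COLLECTOR_C']
```

The resulting collector names are the candidates whose assigned KPIs should be checked and excluded in steps 3 and 4.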