(DI-1708) Monitor

The Datavard Insights™ Monitor is the central cockpit of Datavard Insights™. It displays all information stored on the central server and collected from satellite systems, and it reports events that occur during monitoring through messages.

In addition, the Datavard Insights™ Monitor is the control center for all operations on monitored systems, such as KPI collection and event behaviour.

To run the Monitor, use the transaction:

/DVD/MON

GUI description

The main GUI layout is divided into four basic parts.
The menu bar with function buttons is at the top of the screen.
On the left side are two tree lists:

The systems that are monitored.
The KPI groups in which all KPIs are collected.

On the upper right side is a block for time selection. It defines the time interval for which collected information is displayed and lets the user save specific time intervals.
The rest of the right side is a tab list consisting of the System info tab, the Collector jobs tab, the KPI charts, the KPI mixer, the Messages tab, the Details tab and the Events tab. All these parts and tabs are described in the following sections of this manual.

_{Figure 64: Main GUI layout}

Menu buttons

The main menu of the Monitor has action buttons with a basic functionality.

_{Figure 65: Main menu buttons}

These buttons are:

Refresh monitor (F5) – updates the information in the tree list of systems and their servers and in all tabs.

Add a new system for monitoring (Shift + F1) – displays a popup window for adding a new system through the defined RFC R/3 connection.

_{Figure 66}

Create new Collector job (F9) – creates a Collector job, assigns a profile to it and starts it to monitor a system. The process of starting the monitoring is described in the separate section "Start monitoring".

_{Figure 67}

Start report generator transaction (Ctrl + F5) - opens a new transaction that generates a summary report from monitoring based on a selection of KPIs, areas (system ID, server names, profiles) and a snapshot interval. The generator works with a predefined template and also allows you to use your own template (created in the W3 repository, transaction SMW0). The generated HTML output can be compressed into a ZIP file and stored on the application server or the frontend PC, or sent by e-mail.

Maintain RFC destinations (F6) - runs the transaction SM59 for maintaining RFC destinations in the system.

Job view (Shift + F7) – runs the transaction SM37 for displaying background jobs.

Distribution list (Shift + F8) – runs the transaction SO23 for maintaining the list of user e-mail addresses, SAP user names, private distribution lists and shared distribution lists used for notification of events and messages.

_{Figure 68}

System settings (Shift + F5) – maintenance of parameters for monitored systems, time intervals, groups of messages used in the messages tab, messages notifications. The maintenance is described in the separate section "System definitions maintenance".

Monitoring settings (Shift + F2) - covers all these cases:

KPI Definitions - maintenance of collectors and their setups, parameters and inputs, followed by maintenance of KPI definitions and their association with collectors, profiles, events and groups. The KPI groups, which are used to display the KPI group tree in the main GUI, are also maintained here.
KPI collectors and detail tables list
Profile definition - maintenance of profiles and the association of KPIs, events, and collector parameters, setups and inputs with the profile.
Event definition - maintenance of events, their association with KPIs and the notifications for these events.

The maintenance is described in the separate section "Monitoring settings maintenance".

In User settings, you can adjust the settings for the monitor behaviour and graph display on user's level.

System tree

The system tree displays a list of systems and their servers.
Each node in this tree represents a system, identified by its system ID, and the entries below a system node represent its servers, identified by server name.
For each system, there is additional information in three columns. The "Type" column shows the system type. The event column signals the occurrence of an event by displaying the appropriate icon (the meaning of the icons is explained in later sections). The "Last update" column shows when the system was last monitored.

_{Figure 69: Tree list of monitored systems without system groups}

The systems can be divided into system groups (folders); one system can be placed in multiple groups.

_{Figure 70: Tree list of monitored systems with system groups}

Each node of type system or server can be selected and checked. For the selected node (only one can be selected at a time), the relevant information is always displayed in the first tab, "System info".
For all checked nodes, the information for the selected KPIs, messages, events and details is displayed in the other tabs.
If a node of this tree is greyed out, it is inactive. This means that no information was collected for this system and server in the selected time interval. An inactive node can be selected but not checked.
Each system and server node has a status icon, which shows the current state of monitoring.
This icon can have these statuses:

- At least one Collector job is being executed for the system, which means that a background job for this Collector job is currently running to collect KPIs.

- At least one Collector job is due to be executed on the system. The background job for this Collector job is released and will be executed once a free work process is available.

- At least one Collector job failed in its last run on the system, so the background job of this Collector job has been aborted or cancelled.

- The system is monitored by at least one scheduled Collector job, and the previous execution of the Collector job was successful for this system.

- The system is not reachable through the RFC connection. This could be because, for example, the system is down or the RFC connection is no longer valid.
- The system is not monitored because no Collector job has been started for it.
- The server of the monitored system is excluded from monitoring; therefore, the accessibility of this server is not checked.

- The system/server is not monitored due to an existing monitoring restriction that is set up for a specific time range, system, server and profile.

Groups in System Pool

It is now possible to organize your systems into groups and sub-groups, based on your preferences.

Right clicking on a group enables you to select or de-select all systems associated with the particular group or select only those systems on which some alerts occur.

_{Figure 71: Context menu for system pool tree (select all, deselect all, select all in alert)}

KPI tree

In the KPI tree, the list of KPIs is divided into logical groups, and each group can contain sub-groups. The user can make a specific KPI available in one or more groups; this association is made in the KPI group maintenance, which is described in the separate section "KPI definitions maintenance".

The main group is called 'All KPI'. This group contains all base groups and all KPIs that are not associated with any group.

_{Figure 72: KPI tree (without / with checkboxes)}

Nodes in the KPI tree have either an active or an inactive status. An inactive KPI node means that no KPI values were measured in the selected time interval for the checked systems and servers. An inactive KPI group node means that all KPIs in this group and its sub-groups are inactive.
An active KPI node is displayed with a checkbox when the KPI mixer is used. Only checked KPIs are passed to the KPI mixer. More details about this functionality are given in the separate section "KPI mix tab".
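
The active/inactive rule for groups can be read as a simple recursive check. The following is only an illustrative sketch, not the product's implementation: the dictionary layout and the names `is_active`, `children` and `has_values` are hypothetical and were chosen for this example.

```python
def is_active(node):
    """Return True if a KPI tree node is active.

    Assumed (hypothetical) node layout:
      - a KPI node:   {"name": ..., "has_values": bool}
      - a group node: {"name": ..., "children": [nodes...]}
    A KPI is active when values were measured in the selected interval.
    A group is inactive only if every KPI in it and in its sub-groups is inactive.
    """
    if "children" in node:
        return any(is_active(child) for child in node["children"])
    return bool(node.get("has_values", False))


# Example tree: one group with an inactive KPI and a sub-group holding an active KPI.
tree = {
    "name": "All KPI",
    "children": [
        {"name": "KPI_A", "has_values": False},
        {"name": "Sub-group", "children": [{"name": "KPI_B", "has_values": True}]},
    ],
}
```

With this sketch, `is_active(tree)` is true because one KPI in a sub-group has measured values, while a group whose KPIs are all inactive evaluates to false.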

Tree Context Menu

The KPI tree provides two types of context menus when the user right-clicks a KPI tree node.

Folder context menu - provides functions to select or deselect all KPIs assigned to this folder and its sub-folders. It is mainly useful in the KPI Mix tab, where KPI nodes can be selected and deselected.
KPI context menu - provides a default set of functions that can be executed for each KPI. The function 'Detail table(s)' is only visible when a detail table exists for the selected KPI. All functions are explained in the next chapter.

KPI Context Menu

Suggest KPI

Suggest KPI opens an email communication where the user can send a new KPI definition suggestion to the support team or any other recipient.

KPI Info

KPI Info displays a pop-up window with basic information about selected KPI (e.g. KPI name, description, unit).

Find correlated KPI(s)

This menu item consists of a sub-menu listing the systems or servers selected in the System tree. The content of this sub-menu depends on whether the selected KPI is defined as a System or Server KPI: when the user selects a System KPI, only the selected systems from the System tree are available; otherwise, only the selected servers are available.

After the system or server is selected, the correlation functionality is executed. It calculates the correlation coefficients for all relevant KPIs that have the same collection period (e.g. 5 min) and at least two collected values in the checked time period. Correlation coefficients take values from the interval [-1, 1]. The closer the absolute value of the coefficient is to 1, the more strongly the KPI correlates. By default, only the twenty most strongly correlating KPIs with a coefficient greater than 0.7 are displayed. The list of all correlating KPIs found is displayed in the KPI Mix tab (when no correlating KPI is found, only the selected KPI is displayed in the KPI mixer). When this functionality is called from another tab, the user is automatically redirected to the KPI Mix tab.
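
To make the selection rule more concrete, here is a minimal sketch of such a ranking. It rests on several assumptions that this manual does not confirm: that the coefficient is the Pearson correlation coefficient, that the 0.7 limit is applied to the absolute value, and that series are compared only when they have the same number of values. The function names `pearson` and `find_correlated` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series (assumed metric)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation; this sketch treats it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def find_correlated(selected, candidates, threshold=0.7, limit=20):
    """Return up to `limit` (name, coefficient) pairs with |coefficient| > threshold,
    strongest correlation first. `candidates` maps KPI names to value series."""
    scored = []
    for name, values in candidates.items():
        # Skip series that cannot be compared point by point.
        if len(values) != len(selected) or len(values) < 2:
            continue
        r = pearson(selected, values)
        if abs(r) > threshold:
            scored.append((name, r))
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:limit]
```

For example, a series that rises in step with the selected KPI gets a coefficient close to 1, and a series that falls in step gets a coefficient close to -1. Both would be listed under the assumption that the absolute value is compared.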

Set threshold

Set threshold helps the user to create a new alert definition for a KPI. When this function is selected, a pop-up window with predefined values is displayed on the same screen. The user only has to set a new threshold value (when the predefined values don't need to be changed) and a new alert definition is created in customizing with the relevant KPI assignment. All alerts generated in this way follow the naming convention X<KPI_NAME(21)><NUMBER(10)> (e.g. XCPU_LOAD5_PMAX0000000046).
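
The naming convention can be illustrated with a short sketch. It assumes that (21) and (10) denote a maximum KPI name length of 21 characters and a number zero-padded to 10 digits, which matches the example above. The function `alert_name` is hypothetical and only mirrors the pattern; it is not part of the product.

```python
def alert_name(kpi_name: str, number: int) -> str:
    """Build a name of the form X<KPI_NAME(21)><NUMBER(10)>.

    Assumptions of this sketch: the KPI name is cut off after 21 characters
    and the number is left-padded with zeros to 10 digits.
    """
    if not 0 <= number < 10**10:
        raise ValueError("number must fit into 10 digits")
    return f"X{kpi_name[:21]}{number:010d}"


print(alert_name("CPU_LOAD5_PMAX", 46))  # XCPU_LOAD5_PMAX0000000046
```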

Report Incident

This item just helps users to jump directly into the Incident Management.

Detail table(s)

When a detail table exists for the selected KPI, its details can be displayed in a modal pop-up window. This means that the user can view the KPI detail data on the same screen and does not have to switch between tabs in the cockpit.

Time interval

In this section of the screen, the user can set the time interval that is applied to the information displayed in the tabs. The user can change the 'From' and 'To' dates or choose one of several options offered by the button, which sets the time interval automatically.
After typing an appropriate date and time, press Enter and the system refreshes all data for the entered interval.

_{Figure 73: Time interval definition}

When auto refresh is switched on, the charts are refreshed at the specified time interval.
The user can save predefined intervals to a list box on the right side by clicking the save button and providing a description. The description is filled in automatically with the first system marked in the system tree and the 'From' and 'To' values of the time interval.

_{Figure 74: Saving of selected time interval}

Alternatively, the user can choose the radio button option 'All checked systems in system pool tree'. In this case, the description is also filled in automatically. The text between <> must not be changed, because the system ID is placed there.

_{Figure 75: Saving of all checked systems in system pool tree}

If the user wants to load a saved interval, the interval will be available from a list box. For deleting, select the interval and click on the delete button.

_{Figure 76: Choosing the saved time interval}

Collected information that exceeds the retention time is not deleted if it was measured in a saved interval.
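
The interaction between retention time and saved intervals can be sketched as follows. This is an illustration only: the function `is_deleted` and its parameters are hypothetical, and the sketch assumes that a saved interval protects every measurement whose timestamp lies between its 'From' and 'To' values, inclusive.

```python
from datetime import datetime, timedelta


def is_deleted(measured_at, now, retention, saved_intervals):
    """Return True if a measurement would be removed by retention clean-up.

    measured_at     -- timestamp of the measurement
    now             -- current time
    retention       -- retention time as a timedelta
    saved_intervals -- list of (start, end) tuples of saved time intervals
    """
    if now - measured_at <= retention:
        return False  # still within the retention time
    # Older than the retention time: kept only if a saved interval covers it.
    return not any(start <= measured_at <= end for start, end in saved_intervals)


now = datetime(2024, 1, 31)
saved = [(datetime(2024, 1, 1), datetime(2024, 1, 2))]
retention = timedelta(days=7)
```

With these example values, a measurement from 1 January at noon is kept because it lies in the saved interval, while a measurement from 5 January is older than seven days, is not covered by any saved interval, and would be deleted.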

System info tab

In this tab, the user can find information about all monitored systems:

Basic information is in "System data".
The hardware configuration of the host machine is in "Host data".
Information about the database is in "Database data".
The connection to the monitored system is in "Monitor data".

_{Figure 77: Detailed system information with list of servers}

At the bottom of this tab, there is a list of servers. This list provides an overview of servers, their names, if it is an application server or a database server, whether there is a host name given, the number of the CPU(s), RAM size, as well as locality and description. In the column 'Don't mon.', if there is a flag, then the server is excluded from monitoring.

Collector jobs tab

The Collector jobs tab displays all created Collector jobs for monitored systems and provides information regarding their statuses. From this tab, Collector job runs can be controlled and monitored.
The list of Collector jobs displays the following information:

SID – the system ID for which the Collector job has been created.
Profile – the customizing used by the Collector job, i.e. which KPIs are checked.
Job status – the status of the background job of the Collector job. The background job can have these statuses:
- Not running – no job is scheduled.
- Scheduled – a scheduled run has been created for the Collector job.
- Waiting – the job is waiting for a free work process.
- Running - the Collector job is currently running.
Period value / Period type – how often the Collector job runs.
Button – if a Collector job is scheduled, the user can click the stop button, which stops and cancels the scheduled job.
Button – if a Collector job is not running, the user can click the start button to run it.
Execution – whether the Collector job is executed immediately or scheduled for a specific date and time.
Start date/Start time – the date and time of the initial run of the Collector job.
Retention time – how long data collected by the Collector job is stored. Data older than the retention time is deleted automatically, with the exception of data collected in saved time intervals.
Background Job Name/button – the name of the background job of the Collector job. Clicking this button runs the transaction SM37 and displays this job.
Job last status/Last status icon – the status of the last background job run. It can have these statuses:
- Finished - the last background job of the Collector job finished successfully.
- Aborted - the last background job of the Collector job was terminated by an error or cancelled by the user.
- Not running - the background job of the Collector job has not been started before.
Delay[s] – the delay of the background job in seconds while it waits for a free work process.
Run date/Run time – the date and time at which the background job of the Collector job started.
Duration – how long the background job of the Collector job has been running.

_{Figure 78: Collector jobs tab}

Last delay – the delay of the last background job in seconds while it was waiting for a free work process.
Last run date/Last run time – the date and time at which the last background job started.
Last duration - how long the last background job of the Collector job ran.
Remaining time – the time in seconds until the next run of the Collector job.
Next run date/Next run time – the date and time of the next Collector job run.
Create date/Create time – the date and time at which the background job for the Collector job was created.
Creator name – the user who started or stopped the background job for the Collector job.

If the user double-clicks a Collector job record in the list, all its information is displayed in detail in the bottom section of the Collector jobs tab.
The top bar contains the menu of the Collector job list with all standard buttons for working with an ALV list, which allow the user to sort, filter and so on. In addition, a refresh button is available together with the time of the last refresh. This button retrieves the current status of all background jobs.
To modify or delete a Collector job, right-click the Collector job record and choose the required option from the context menu.

_{Figure 79: Context menu on selected Collector job}

Deleting a Collector job removes it from the list and also removes the data collected by this Collector job. Modifying a Collector job opens the Collector job manager for the current Collector job, where its definition can be changed. The process of starting the monitoring is described in the separate section "Start monitoring".

Only a Collector job that is not running can be deleted! Deleting a Collector job removes all collected data with the exception of data collected in saved time intervals.

KPI tab

All collected KPIs of the monitored systems are represented as charts within this tab. In these charts, you can directly see the behaviour of a monitored KPI over time.
When the tab is chosen for the first time, all KPI data needed to display the charts is loaded. Loading takes some time, and its progress is displayed in the status bar in the bottom left corner of the screen. KPI data is loaded only once for each KPI chart, so a KPI chart is displayed faster the second time because the system uses the cached data.

_{Figure 80: Progress of loading data}

Each chart describes one measured KPI in a selected 'time interval' for systems and/or servers that have been checked in the 'system tree'. The chart has the KPI's description at the top, X-axis represents a timeline and the Y-axis represents measured values. The unit of value is displayed after the KPI description at the top in brackets (next to the chart title). A legend is displayed by clicking on a grey square on left side in each chart. The legend describes the coloured series, which represents the checked system(s)/server(s).

By right-clicking on the KPI chart area, the KPI context menu is displayed. The user can use all available functions which were described in the KPI Context Menu chapter.

_{Figure 81: KPI charts}

The measured KPI can be defined for a system or for a server as follows:

If only one system/server is checked, then one line will be displayed to represent the KPI.

_{Figure 82: System's KPI in a chart}

If KPIs of multiple systems/servers are displayed in one chart, then each individual system or server will be represented by a separate coloured line. In the legend, you can see which coloured series is associated with which system or server.

_{Figure 83: Server's KPIs in a chart}

The number of charts per page can be selected by choosing a predefined layout. Possible grid layouts are 1x1, 1x2, 2x2, 3x3 and 4x4. The grid between charts is also resizable by dragging the mouse.

_{Figure 84: Grid layout choosing}

Depending on the number of KPIs, the charts are spread over several pages. The user can switch between pages using the page navigation buttons in the top left corner of the chart area in the KPI tab.
The KPIs displayed in the grid layout are determined by the KPI group or specific KPI selected in the KPI tree. Each group consists of KPIs and other sub-groups. The correct page is set automatically.
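
As a rough illustration of how grid layouts and pages relate, the sketch below computes the number of pages and the page that holds a given chart. It assumes that charts fill the pages in order, one grid cell per chart. The functions `page_count` and `page_of` are hypothetical and do not describe the product's internal logic.

```python
from math import ceil


def page_count(kpi_count, rows, cols):
    """Number of pages needed to show `kpi_count` charts in a rows x cols grid."""
    return ceil(kpi_count / (rows * cols))


def page_of(kpi_index, rows, cols):
    """1-based page that holds the chart at 0-based position `kpi_index`."""
    return kpi_index // (rows * cols) + 1


# Ten KPI charts in a 2x2 layout need three pages;
# the chart at position 4 (the fifth chart) is on page 2.
print(page_count(10, 2, 2), page_of(4, 2, 2))  # 3 2
```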

_{Figure 85: Displaying KPI charts in grid layout by selecting KPI in KPI tree}

KPI mix tab

By using the KPI mix tab, the user can view the behaviour of several KPIs together and graphically compare dependencies between measured KPIs, for example the dependence between the CPU load and work processes.
When the KPI mix tab is selected, the user can check an active KPI in the KPI tree to display it (note that a system/server must also be selected in the system tree). Once a KPI is checked, it is added to the chart of mixed KPIs and assigned a new coloured series.
When a KPI or system/server is unchecked, the corresponding series is removed from the chart. This allows the user to create and customize a comparison chart based on individual needs. All KPI series are plotted in the same manner as in the KPI tab charts, as specified by the KPI definition.

_{Figure 86: Checked KPI in KPI mixer}

You can find associations of coloured series to KPIs in the legend at the bottom of the chart. These colours of series are different from the previous KPI tab.

Messages tab

The Messages tab displays the status messages created by Datavard Insights, for example during the collection and processing of KPIs from a monitored system. Each message consists of a text with information about when it occurred, on which system or server it occurred, and the type of message (in the ID column). Three standard types of messages exist:

- OK / successful messages
- Information / alert messages
- Error messages

The messages displayed are dependent on the systems/servers that are checked in the system tree.
Towards the top of the Messages tab, there is a list box in which the user can filter the message group. By default, all messages are displayed and in addition there are the standard ALV operations such as sorting, filtering, aggregating etc.

_{Figure 87: Monitor Messages}

Details tab

This tab contains additional information collected from the monitored systems. Information of the same kind is categorized and placed into one table. Clicking a specific table displays all information from that table in an ALV list. With this ALV list, it is possible to perform all standard ALV operations such as sorting, filtering and aggregating.

_{Figure 88: Detailed information from monitoring}

Events tab

The Events tab displays messages raised during monitoring and collection of KPIs when an event occurs for the values of one or more specific KPIs. An event message is raised when a KPI's critical value reaches or exceeds the threshold limit set by the user. Listed below are some examples of such events:

Massive short dumps within a short period of time
System downtime or unavailability
System stops/restarts
Atypical peaks or drops in system performance
Start/end of SLO operations (conversions or migrations)
Users working in the system during conversions and many others

Such an event remains in a state of occurrence as long as the KPI values are outside the threshold limit. If an event persists for a long time (the period is customizable), alerts keep informing the user that the event is still occurring, which allows the user to track and monitor the event's progress.
The configuration and maintenance of events for KPIs and related purposes are described in the separate section "Event definitions maintenance". Below is an example of an event occurrence in Datavard Insights:

_{Figure 89: Process of event occurrence}

During the process of event occurrence (X-axis: Value and Y-axis: Duration), the KPI line starts below the threshold, then begins to rise and crosses the threshold. This is when the process of detecting events starts:

When the KPI value reaches and crosses the threshold, the progress turns grey. This indicates that the Duration has started for Datavard Insights and that the KPI value can still return below the threshold within a certain timeframe.
Once the KPI has remained above the threshold for the allocated time, the progress turns red, an event is started and an alert is raised.
The event continues even if the KPI value falls below the threshold (grey area), because the value must remain below the threshold for a certain period. If the progress stays below the threshold for long enough, the duration stops. In this example, however, the progress rises above the threshold again and so the event continues.
As the event is still continuing and an event period has elapsed, the system raises a new alert to inform the user that the event is still occurring.
When the second event period elapses, another (third) alert can be raised, depending on how the user has configured the event. This time the progress has dropped below the threshold, but it is grey because it has not yet remained below the threshold for the required timeframe.
Once the line has remained below the threshold for long enough, the progress changes to green and the event duration is finished. The system raises a green alert and informs the user that the event has ended.
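
The steps above can be sketched as a small state machine over a series of KPI samples. This is a simplified illustration under stated assumptions, not the actual Datavard Insights logic: the function `detect_alerts`, the parameters `start_delay`, `end_delay` and `period`, and the alert labels "start", "repeat" and "end" are all hypothetical, and the sketch assumes that the KPI is sampled at discrete points in time.

```python
def detect_alerts(samples, threshold, start_delay, end_delay, period):
    """Return a list of (time, kind) alerts for a KPI series.

    samples     -- (time, value) pairs in ascending time order, numeric times
    threshold   -- a value at or above this counts as crossing the threshold
    start_delay -- how long the value must stay above before the event starts
    end_delay   -- how long the value must stay below before the event ends
    period      -- event period after which a still-running event alerts again
    """
    alerts = []
    active = False      # True while an event is in progress ("red")
    above_since = None  # start of the current stretch above the threshold
    below_since = None  # start of the current stretch below, during an event
    last_alert = None   # time of the most recent start/repeat alert
    for t, value in samples:
        above = value >= threshold
        if not active:
            if above:
                if above_since is None:
                    above_since = t  # "grey": may still return below in time
                if t - above_since >= start_delay:
                    active = True
                    last_alert = t
                    below_since = None
                    alerts.append((t, "start"))
            else:
                above_since = None
        else:
            if above:
                below_since = None  # back above the threshold: event continues
            else:
                if below_since is None:
                    below_since = t  # "grey": not below for long enough yet
                if t - below_since >= end_delay:
                    active = False
                    above_since = None
                    alerts.append((t, "end"))  # "green" alert
                    continue
            if t - last_alert >= period:
                last_alert = t
                alerts.append((t, "repeat"))
    return alerts


values = [50, 90, 90, 90, 70, 90, 90, 90, 70, 70, 70, 50, 50]
samples = list(enumerate(values))
print(detect_alerts(samples, threshold=80, start_delay=2, end_delay=2, period=3))
# [(3, 'start'), (6, 'repeat'), (9, 'repeat'), (10, 'end')]
```

In this example run, the short dip at time 4 does not end the event, two further alerts are raised after each elapsed period, and the event ends only after the value has stayed below the threshold for the full `end_delay`.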

Through an event occurrence, the process is recorded and stored within the events tab. There, you can find an ALV list of occurred events. Each line of the list represents one event. It has information about the system and the server (if the KPI of the event is server-specific). There is also an icon that informs about event statuses as follows:

- Success event
- Warning event
- Issues event
- Error event

In the next column, there is a count of occurred event alerts. There is always the number one if an event occurs for the first time. The next columns indicate when an event has started and when it has ended (if it has ended). The last column is an event's description.

_{Figure 90: Occurred events}

With this ALV list of events, it is possible to perform all standard ALV operations such as sorting, filtering and aggregating.
For a more detailed view of an event, double-click its line in the ALV list and the detailed output is displayed.

_{Figure 91: Detail output}