Monitoring ServiceNav

Introduction

Goal: to implement monitoring of the ServiceNav solution itself.

This information is for administrators of the ServiceNav solution.

It describes the ServiceNav checks that must be deployed on the hosts running the solution to ensure that they are operating correctly.

For security reasons, certain checks can only be made within the ServiceNav VPN.

This document describes the checks to be implemented for each host monitored by the solution. For each host there is a description of:

  • Which host monitors it
  • On which network (e.g. VPN, Internet)
  • Service template name
  • Service check parameters
  • Service check frequency
  • Business impact of the check
  • Whether a notification policy is mandatory (the nature of such a policy is the responsibility of the solution administrator)

Requirements

Certain checks are deployed on each ServiceNav Box, because each ServiceNav Box runs them against itself. In addition, one ServiceNav Box must monitor all the other components of the ServiceNav solution.

This specific ServiceNav Box will be referred to in this document as the monitoring appliance.

To ensure that the monitoring appliance is operational, it must be monitored via cross-checking with another ServiceNav Box.

This second ServiceNav Box is called a watchdog in this document.

The ServiceNav solution can therefore only be fully monitored if its infrastructure includes at least two ServiceNav Boxes.

It is recommended that you place the watchdog in a system, network and physical infrastructure that is completely separate from that of the ServiceNav solution. The watchdog can then monitor this infrastructure from the Internet, including the services that the infrastructure provides.
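The cross-checking rule above can be expressed as a simple coverage test. The Python sketch below is illustrative only. The component names, the `monitored_by` mapping and the `find_blind_spots` function are hypothetical and are not part of ServiceNav. The function reports any component that only monitors itself or is monitored by an unknown box. It also reports any ServiceNav Box that no other box watches.

```python
# Hypothetical sketch: none of these names come from ServiceNav itself.


def find_blind_spots(boxes: set[str], monitored_by: dict[str, str]) -> list[str]:
    """Return the coverage problems found; an empty list means no blind spot.

    boxes: names of the ServiceNav Boxes (e.g. monitoring appliance, watchdog).
    monitored_by: component name -> name of the box that runs its remote checks.
    """
    problems = []
    if len(boxes) < 2:
        problems.append("at least two ServiceNav Boxes are required")
    for component, monitor in monitored_by.items():
        if monitor == component:
            problems.append(f"{component} only monitors itself")
        elif monitor not in boxes:
            problems.append(f"{component} is monitored by unknown box {monitor}")
    # Every box, including the monitoring appliance, must be watched by another box.
    for box in boxes:
        if box not in monitored_by:
            problems.append(f"{box} is not monitored by any other box")
    return sorted(problems)


topology = {
    "web-site": "monitoring-appliance",
    "business-intelligence": "monitoring-appliance",
    "monitoring-appliance": "watchdog",
    "watchdog": "monitoring-appliance",
}
print(find_blind_spots({"monitoring-appliance", "watchdog"}, topology))  # []
```

Suppose the watchdog is removed from the set of boxes but the same topology is kept. The function then reports two problems: a second box is required, and the monitoring appliance is watched by an unknown box.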

The following diagram describes a typical implementation:

ServiceNav Box

The service checks below need to be implemented for all ServiceNav Boxes.

From | Template | Parameters | Check frequency (mins) | Mandatory notification
Create/identify a host with IP address: 127.0.0.1 for the “coservit” community
Itself | CPU | Template defaults | Template default | -
Itself | LIN-DiskIO | Template defaults | Template default | -
Itself | LIN-Diskspace | Template defaults | Template default | Y
Itself | LIN-Network_traffic | Template defaults | Template default | -
Itself | LIN-RAM | Template defaults | Template default | -
Itself | LIN-Swap | Template defaults | Template default | -
Itself | check_vsb_remote_health | Template defaults | Template default | Y
Create/identify a host with IP address = that of the ServiceNav VPN network for the “coservit” community
Monitoring Appliance | Lin-Process-CPU | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 70; critical threshold: 80 | 5 | Y
Monitoring Appliance | Lin-Process-RAM | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 70; critical threshold: 80 | 10 | Y
Monitoring Appliance | Lin-Process-SWAP | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 10; critical threshold: 20 | 15 | Y
Monitoring Appliance | Lin-Process-Nb-Byname | process: apache2; alert threshold: 20; critical threshold: 80 | 5 | Y
Monitoring Appliance | Lin-Process-Nb-Byname | process: nagios; alert threshold: 50; critical threshold: 100 | 5 | Y
Monitoring Appliance | VSBox-IsAlive | Name of the box defined on the web site | 15 | Y
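Each process check above compares a measured value with an alert threshold and a critical threshold. The minimal Python sketch below shows one plausible way to turn such a pair into a check state. It follows the convention common to Nagios-style plugins: a value strictly above a threshold triggers that state, and the states map to exit codes 0, 1 and 2. This is an assumption, because the exact comparison performed by the ServiceNav templates is not documented here. The `evaluate` function is hypothetical.

```python
from enum import Enum


class State(Enum):
    # Nagios-style plugin exit codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2


def evaluate(value: float, alert: float, critical: float) -> State:
    """Map a measured value to a state (assumption: 'above threshold' triggers)."""
    if alert > critical:
        raise ValueError("alert threshold must not exceed critical threshold")
    if value > critical:
        return State.CRITICAL
    if value > alert:
        return State.WARNING
    return State.OK


# Thresholds from the Lin-Process-CPU row above: alert 70, critical 80.
for cpu in (45.0, 75.0, 92.5):
    print(cpu, evaluate(cpu, alert=70, critical=80).name)  # OK, WARNING, CRITICAL
```

The same function applies unchanged to the RAM, SWAP and process-count rows, since they only differ in their threshold values.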

ServiceNav Box (SNM type)

Implement the checks defined in the section above.

In addition, define the following service check:

From | Template | Parameters | Check frequency (mins) | Mandatory notification
Create/identify a host with IP address = that of the ServiceNav VPN network for the “coservit” community
Monitoring Appliance | TCP-Port | Port: 5667 | 5 | Y

If the monitoring appliance also has a shared ServiceNav Box role, this service check must be deployed on the watchdog instead.
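Port 5667 is the default port of the NSCA daemon, and nsca appears in the process lists above. This check therefore presumably verifies that the box can still accept incoming check results. The Python sketch below illustrates a TCP-Port style check: it opens a TCP connection and times it. It assumes that the alert and critical thresholds are connection times in seconds. That convention comes from the standard Nagios `check_tcp` plugin, not from the ServiceNav template. The function name and the host name in the final comment are placeholders.

```python
import socket
import time
from typing import Optional, Tuple


def check_tcp_port(host: str, port: int, alert: float = 2.0, critical: float = 4.0,
                   timeout: float = 10.0) -> Tuple[str, Optional[float]]:
    """Return (state, connection time in seconds, or None if the port is down).

    Assumption: alert/critical are connection-time thresholds in seconds.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError:
        # Connection refused, host unreachable or timed out.
        return "CRITICAL", None
    if elapsed > critical:
        return "CRITICAL", elapsed
    if elapsed > alert:
        return "WARNING", elapsed
    return "OK", elapsed


# Example with a placeholder host name: check_tcp_port("snm-box.example.net", 5667)
```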

ServiceNav Box – monitoring appliance role

The service checks below need to be implemented:

From | Template | Parameters | Check frequency (mins) | Mandatory notification
Create/identify a host with IP address: 127.0.0.1 for the “coservit” community
Itself | CPU | Template defaults | Template default | -
Itself | LIN-DiskIO | Template defaults | Template default | -
Itself | LIN-Diskspace | Template defaults | Template default | Y
Itself | LIN-Network_traffic | Template defaults | Template default | -
Itself | LIN-RAM | Template defaults | Template default | -
Itself | LIN-Swap | Template defaults | Template default | -
Itself | check_vsb_remote_health | Template defaults | Template default | Y
Create/identify a host with IP address = that of the Viadéis™ VPN Services network for the “coservit” community
Watchdog | Lin-Process-CPU | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 70; critical threshold: 80 | 5 | Y
Watchdog | Lin-Process-RAM | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 70; critical threshold: 80 | 10 | Y
Watchdog | Lin-Process-SWAP | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 10; critical threshold: 20 | 15 | Y
Watchdog | Lin-Process-Nb-Byname | process: apache2; alert threshold: 20; critical threshold: 80 | 5 | Y
Watchdog | Lin-Process-Nb-Byname | process: nagios; alert threshold: 50; critical threshold: 100 | 5 | Y
Watchdog | VSBox-IsAlive | Name of the box defined on the web site | 15 | Y
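Lin-Process-Nb-Byname counts the processes running under a given name (apache2 and nagios in the tables above). The Python sketch below shows the idea on Linux by reading `/proc/<pid>/comm`. It is an illustration under stated assumptions, not the ServiceNav template's implementation, and the function names are hypothetical. Note that on Linux the name in `comm` is limited to 15 characters, so a longer name such as remoteOperationBox would appear truncated. A real check might match against `/proc/<pid>/cmdline` instead.

```python
from pathlib import Path
from typing import Iterable, Iterator


def running_process_names(proc_root: str = "/proc") -> Iterator[str]:
    """Yield the name of every process listed under /proc (Linux only)."""
    root = Path(proc_root)
    if not root.is_dir():
        return
    for comm in root.glob("[0-9]*/comm"):
        try:
            yield comm.read_text().strip()
        except OSError:
            # The process exited between listing and reading.
            continue


def count_by_name(names: Iterable[str], target: str) -> int:
    """Count how many process names equal `target` exactly."""
    return sum(1 for name in names if name == target)


def nb_byname_state(count: int, alert: int, critical: int) -> str:
    # Assumption: more processes than a threshold triggers that state.
    if count > critical:
        return "CRITICAL"
    if count > alert:
        return "WARNING"
    return "OK"


# Thresholds from the apache2 row above: alert 20, critical 80.
apache2 = count_by_name(running_process_names(), "apache2")
print("apache2", apache2, nb_byname_state(apache2, alert=20, critical=80))
```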

ServiceNav Box – watchdog role

The service checks below need to be implemented:

From | Template | Parameters | Check frequency (mins) | Mandatory notification
Create/identify a host with IP address: 127.0.0.1 for the “coservit” community
Itself | CPU | Template defaults | Template default | -
Itself | LIN-DiskIO | Template defaults | Template default | -
Itself | LIN-Diskspace | Template defaults | Template default | Y
Itself | LIN-Network_traffic | Template defaults | Template default | -
Itself | LIN-RAM | Template defaults | Template default | -
Itself | LIN-Swap | Template defaults | Template default | -
Itself | check_vsb_remote_health | Template defaults | Template default | Y
Create/identify a host with IP address = that of the ServiceNav VPN network for the “coservit” community
Monitoring Appliance | Lin-Process-CPU | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 70; critical threshold: 80 | 5 | Y
Monitoring Appliance | Lin-Process-RAM | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 70; critical threshold: 80 | 10 | Y
Monitoring Appliance | Lin-Process-SWAP | process: nsca,apache2,nagios,remoteOperationBox; alert threshold: 10; critical threshold: 20 | 15 | Y
Monitoring Appliance | Lin-Process-Nb-Byname | process: apache2; alert threshold: 20; critical threshold: 80 | 5 | Y
Monitoring Appliance | Lin-Process-Nb-Byname | process: nagios; alert threshold: 50; critical threshold: 100 | 5 | Y
Watchdog | VSBox-IsAlive | Name of the box defined on the web site | 15 | Y
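The monitoring appliance and watchdog tables mirror each other. The process checks run over the VPN are identical, and only the From column changes. The Python sketch below captures that symmetry by generating the rows from the name of the watching box. The `Check` type and the `vpn_process_checks` function are hypothetical. VSBox-IsAlive is deliberately left out, because in the watchdog-role table its From column does not follow the same pattern.

```python
from typing import NamedTuple

# Process list shared by every ServiceNav Box, copied from the tables above.
BOX_PROCESSES = "nsca,apache2,nagios,remoteOperationBox"


class Check(NamedTuple):
    monitored_from: str
    template: str
    parameters: str
    frequency_mins: int
    mandatory_notify: bool


def vpn_process_checks(watcher: str) -> list[Check]:
    """Return the VPN process checks for one box, all run from `watcher`."""
    rows = [
        ("Lin-Process-CPU", f"process: {BOX_PROCESSES}; alert threshold: 70; critical threshold: 80", 5),
        ("Lin-Process-RAM", f"process: {BOX_PROCESSES}; alert threshold: 70; critical threshold: 80", 10),
        ("Lin-Process-SWAP", f"process: {BOX_PROCESSES}; alert threshold: 10; critical threshold: 20", 15),
        ("Lin-Process-Nb-Byname", "process: apache2; alert threshold: 20; critical threshold: 80", 5),
        ("Lin-Process-Nb-Byname", "process: nagios; alert threshold: 50; critical threshold: 100", 5),
    ]
    # Every one of these rows is flagged "Y" (mandatory notification) in the tables.
    return [Check(watcher, template, params, freq, True) for template, params, freq in rows]


# Each box is watched by the other one.
for box, watcher in (("Monitoring Appliance", "Watchdog"), ("Watchdog", "Monitoring Appliance")):
    for check in vpn_process_checks(watcher):
        print(box, "|", check.monitored_from, "|", check.template, "|", check.frequency_mins)
```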


ServiceNav – Web Site

The service checks below need to be implemented:

From | Template | Parameters | Check frequency (mins) | Mandatory notification
Create/identify a host with IP address: that of the ServiceNav VPN network or local IP for the “coservit” community
Monitoring Appliance | CPU | Template defaults | Template default | -
Monitoring Appliance | LIN-DiskIO | Template defaults | Template default | -
Monitoring Appliance | LIN-Diskspace | Template defaults | Template default | Y
Monitoring Appliance | LIN-Network_traffic | Template defaults | Template default | -
Monitoring Appliance | LIN-RAM | Template defaults | Template default | -
Monitoring Appliance | LIN-Swap | Template defaults | Template default | -
Monitoring Appliance | Lin-Process-CPU | process: rsync,(sshd),cron,exim4,openvpn,snmpd; alert threshold: 10; critical threshold: 20 | 5 | Y
Monitoring Appliance | Lin-Process-CPU | process: mysqld,apache2; alert threshold: 70; critical threshold: 90 | 5 | Y
Monitoring Appliance | Lin-Process-CPU | process: ODS_PerfData,ODS_StatusData,VS_UpdateBoxProvider,VSB_Initialisation,VS_CommandProcessing,VS_ExternalSynchro,VS_ITDiscovery,VS_ITInventory; alert threshold: 50; critical threshold: 70 | 5 | Y
Monitoring Appliance | Lin-Process-CPU | process: ndo2db; alert threshold: 70; critical threshold: 90 | 5 | Y
Monitoring Appliance | Lin-Process-CPU | process: beam.smp,epmd,inet_gethost; alert threshold: 15; critical threshold: 50 | 5 | Y
Monitoring Appliance | Lin-Process-RAM | process: rsync,(sshd),cron,exim4,openvpn,snmpd; alert threshold: 5; critical threshold: 10 | 10 | Y
Monitoring Appliance | Lin-Process-RAM | process: ODS_PerfData,ODS_StatusData,VS_UpdateBoxProvider,VSB_Initialisation,VS_CommandProcessing,VS_ExternalSynchro,VS_ITDiscovery,VS_ITInventory; alert threshold: 50; critical threshold: 70 | 10 | Y
Monitoring Appliance | Lin-Process-RAM | process: apache2; alert threshold: 5; critical threshold: 10 | 5 | Y
Monitoring Appliance | Lin-Process-RAM | process: mysqld; alert threshold: 50; critical threshold: 70 | 5 | Y
Monitoring Appliance | Lin-Process-RAM | process: ndo2db; alert threshold: 50; critical threshold: 70 | 10 | Y
Monitoring Appliance | Lin-Process-RAM | process: beam.smp,epmd,inet_gethost; alert threshold: 10; critical threshold: 20 | 10 | Y
Monitoring Appliance | Lin-Process-Swap | process: rsync,(sshd),cron,exim4,openvpn,snmpd; alert threshold: 5; critical threshold: 10 | 15 | Y
Monitoring Appliance | Lin-Process-Swap | process: mysqld,apache2; alert threshold: 10; critical threshold: 20 | 15 | Y
Monitoring Appliance | Lin-Process-Swap | process: ODS_PerfData,ODS_StatusData,VS_UpdateBoxProvider,VSB_Initialisation,VS_CommandProcessing,VS_ExternalSynchro,VS_ITDiscovery,VS_ITInventory; alert threshold: 50; critical threshold: 70 | 15 | Y
Monitoring Appliance | Lin-Process-Swap | process: ndo2db; alert threshold: 10; critical threshold: 20 | 10 | Y
Monitoring Appliance | Lin-Process-Swap | process: beam.smp,epmd,inet_gethost; alert threshold: 5; critical threshold: 10 | 15 | Y
Monitoring Appliance | Lin-Process-NB | process: apache2; alert threshold: 70; critical threshold: 100 | 5 | Y
Monitoring Appliance | Lin-Process-NB | process: ndo2db; alert threshold: 1000; critical threshold: 1500 | 5 | Y
Monitoring Appliance | TCP-Port | Port: 80; alert threshold: 2; critical threshold: 4 | 1 | Y
Monitoring Appliance | TCP-Port | Port: 443; alert threshold: 2; critical threshold: 4 | 1 | Y
Monitoring Appliance | TCP-Port | Port: 9465; alert threshold: 2; critical threshold: 4 | 1 | Y
Monitoring Appliance | check_vsp_process_health | User name: supervision; Password: supervision; Vhost: %2f; Process: ODS_PerfData; “messages ready” thresholds: 10:50; “messages processing” thresholds: 2:5; “IDLE time” thresholds: 300:900 | 10 | Y
Monitoring Appliance | check_vsp_process_health | User name: supervision; Password: supervision; Vhost: %2f; Process: ODS_StatusData; “messages ready” thresholds: 10:50; “messages processing” thresholds: 2:5; “IDLE time” thresholds: 300:900 | 10 | Y
Monitoring Appliance | check_vsp_process_health | User name: supervision; Password: supervision; Vhost: %2f; Process: VSB_Initialisation; “messages ready” thresholds: 2:5; “messages processing” thresholds: 2:5; “IDLE time” thresholds: 300:900 | 10 | Y
Monitoring Appliance | check_vsp_process_health | User name: supervision; Password: supervision; Vhost: %2f; Process: VS_CommandProcessing; “messages ready” thresholds: 2:5; “messages processing” thresholds: 2:5; “IDLE time” thresholds: 300:900 | 10 | Y
Monitoring Appliance | check_vsp_process_health | User name: supervision; Password: supervision; Vhost: %2f; Process: VS_ITDiscovery; “messages ready” thresholds: 10:20; “messages processing” thresholds: 2:5; “IDLE time” thresholds: 300:900 | 10 | Y
Monitoring Appliance | check_vsp_process_health | User name: supervision; Password: supervision; Vhost: %2f; Process: VS_ITInventory; “messages ready” thresholds: 10:20; “messages processing” thresholds: 2:5; “IDLE time” thresholds: 300:900 | 10 | Y
Monitoring Appliance | check_vsp_process_health | User name: supervision; Password: supervision; Vhost: %2f; Process: VS_ExternalSynchro; “messages ready” thresholds: 10:20; “messages processing” thresholds: 2:5; “IDLE time” thresholds: 300:900 | 10 | Y
Monitoring Appliance | Check_Aliveness_RabbitMQ | Template defaults | 2 | Y
Create/identify a host with IP address = public IP address for the “coservit” community
Watchdog | TCP-Port | Port: 80; alert threshold: 2; critical threshold: 4 | 1 | Y
Watchdog | TCP-Port | Port: 443; alert threshold: 2; critical threshold: 4 | 1 | Y
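The check_vsp_process_health thresholds are written as number pairs such as 10:50. The Vhost value %2f is simply "/" URL-encoded, which is the default RabbitMQ virtual host. The Python sketch below assumes that in each pair the first number is the alert threshold and the second is the critical threshold, and that exceeding a threshold triggers that state. The table does not state this. In the Nagios plugin range syntax, the same string 10:50 would instead mean "alert when outside 10 to 50", so confirm the template's reading before relying on either. The function names are hypothetical.

```python
from typing import Tuple


def parse_pair(spec: str) -> Tuple[float, float]:
    """Split an "alert:critical" string such as "10:50" into two numbers (assumed order)."""
    alert_text, critical_text = spec.split(":")
    alert, critical = float(alert_text), float(critical_text)
    if alert > critical:
        raise ValueError(f"alert threshold above critical threshold in {spec!r}")
    return alert, critical


def state_for(value: float, spec: str) -> str:
    """Assumption: a value above a threshold triggers that state."""
    alert, critical = parse_pair(spec)
    if value > critical:
        return "CRITICAL"
    if value > alert:
        return "WARNING"
    return "OK"


# "messages ready" for ODS_PerfData (10:50) and "IDLE time" (300:900).
print(state_for(8, "10:50"), state_for(30, "10:50"), state_for(60, "10:50"))  # OK WARNING CRITICAL
print(state_for(600, "300:900"))  # WARNING
```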

ServiceNav – Business Intelligence

The service checks below need to be implemented:

From | Template | Parameters | Check frequency (mins) | Mandatory notification
Create/identify a host with IP address: that of the ServiceNav VPN network or local IP for the “coservit” community
Monitoring Appliance | CPU | Template defaults | Template default | -
Monitoring Appliance | LIN-DiskIO | Template defaults | Template default | -
Monitoring Appliance | LIN-Diskspace | Template defaults | Template default | Y
Monitoring Appliance | LIN-Network_traffic | Template defaults | Template default | -
Monitoring Appliance | LIN-RAM | Template defaults | Template default | -
Monitoring Appliance | LIN-Swap | Template defaults | Template default | -
Monitoring Appliance | Lin-Process-CPU | process: rsync,(sshd),cron,exim4,openvpn,snmpd; alert threshold: 10; critical threshold: 20 | 5 | Y
Monitoring Appliance | Lin-Process-CPU | process: mysqld; alert threshold: 70; critical threshold: 90 | 5 | Y
Monitoring Appliance | Lin-Process-CPU | process: java; alert threshold: 70; critical threshold: 90; timeperiod: 00.00 – 08.00 | 5 | -
Monitoring Appliance | Lin-Process-RAM | process: rsync,(sshd),cron,exim4,openvpn,snmpd; alert threshold: 5; critical threshold: 10 | 10 | Y
Monitoring Appliance | Lin-Process-RAM | process: mysqld; alert threshold: 50; critical threshold: 70 | 10 | Y
Monitoring Appliance | Lin-Process-RAM | process: java; alert threshold: 50; critical threshold: 70; timeperiod: 00.00 – 08.00 | 5 | -
Monitoring Appliance | Lin-Process-Swap | process: rsync,(sshd),cron,exim4,openvpn,snmpd; alert threshold: 5; critical threshold: 10 | 15 | Y
Monitoring Appliance | Lin-Process-Swap | process: mysqld; alert threshold: 10; critical threshold: 20 | 15 | Y
Monitoring Appliance | Lin-Process-Swap | process: java; alert threshold: 10; critical threshold: 20; timeperiod: 00.00 – 08.00 | 5 | -
Monitoring Appliance | VS_VBI_check_dw_vs_param: kpi_performance_scheduler | User name: supervision; Password: <see server configuration>; Name of process: kpi_performance_scheduler_status; Expected status(es): SUCCESS; Validity duration of the status: 1440 | 720 | Y
Monitoring Appliance | VS_VBI_check_dw_vs_param: kpi_status_scheduler | User name: supervision; Password: <see server configuration>; Name of process: kpi_status_scheduler_status; Expected status(es): SUCCESS; Validity duration of the status: 1440 | 720 | Y
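Two ideas drive the Business Intelligence checks. First, the VS_VBI_check_dw_vs_param checks expect a scheduler's last status to be SUCCESS and to be no older than the validity duration. Second, the java process checks only apply during the 00.00 – 08.00 time period. The Python sketch below illustrates both under stated assumptions:

- the validity duration of 1440 is taken to be in minutes (one day);
- a stale or unexpected status is treated as CRITICAL;
- the end of the time period is exclusive.

Under the first assumption, a check frequency of 720 minutes means each status is checked twice within its validity window. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta


def status_state(status: str, recorded_at: datetime, now: datetime,
                 expected: tuple = ("SUCCESS",), validity_mins: int = 1440) -> str:
    """Assumption: a stale or unexpected status is reported as CRITICAL."""
    if now - recorded_at > timedelta(minutes=validity_mins):
        return "CRITICAL"  # older than the validity duration
    if status not in expected:
        return "CRITICAL"
    return "OK"


def in_timeperiod(moment: datetime, start: time = time(0, 0), end: time = time(8, 0)) -> bool:
    """True if `moment` falls in [start, end); the exclusive end is an assumption."""
    return start <= moment.time() < end


now = datetime(2024, 1, 2, 6, 30)
print(status_state("SUCCESS", datetime(2024, 1, 1, 23, 0), now))  # OK
print(status_state("SUCCESS", datetime(2023, 12, 31, 6, 0), now))  # CRITICAL
print(in_timeperiod(now))  # True
```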


UK ServiceNav Product Development Manager. My priority is to meet the particular requirements of all English-speaking markets where ServiceNav is sold. I have over 20 years' experience in the IT monitoring field, covering a wide variety of products and technologies.