Overview
Monitoring is one of the most important aspects of maintaining the health of RadiantOne. It is highly recommended that you monitor the RadiantOne service on a regular basis. The RadiantOne service can be monitored from Environment Operations Center (SaaS deployments only) and from the Classic and Server Control Panels (SaaS and Self-managed deployments), as well as with third-party tools.
Switching to Classic Control Panel
To switch to Classic Control Panel, use the menu options for the logged-in user in the upper right.
Accessing Server Control Panel
Once logged into the Classic Control Panel, click the Server Control Panel link in the upper right to access the Server Control Panel.
Monitoring in Environment Operations Center
- Only SaaS deployments have access to Environment Operations Center. Self-managed deployments of RadiantOne do not.
In Environment Operations Center, navigate to the Monitoring section, then select the environment, application, dashboard, RadiantOne node to monitor, and time range.
For more details, see Monitoring and Reporting.
Monitoring in Classic Control Panel
The key services to monitor are RadiantOne and ZooKeeper.
Expert Mode
Some settings in the Classic Control Panel are accessible only in Expert Mode. To switch to Expert Mode, click the Logged in as, (username) drop-down menu and select Expert Mode.
Dashboard Tab
On the Dashboard Tab of the Classic Control Panel, the Overview section displays information about each RadiantOne cluster node. From here, you can see the status of all nodes. The cluster’s RadiantOne leader node is noted with a yellow-colored triangle beside the server name.
For each node, the Overview section displays the status of:
- Current CPU usage of the machine hosting RadiantOne.
- Current RadiantOne service memory usage.
- Current disk usage of the drive on the machine hosting RadiantOne.
- Disk latency
- Up time – how long the RadiantOne service has been running on the node.
- Version of RadiantOne installed on the node
Replication Monitoring Tab
RadiantOne Directory stores across multiple sites/environments support multi-master replication. This type of replication is referred to as inter-cluster replication. The state of inter-cluster replication can be monitored from the Classic Control Panel > Replication Monitoring Tab.
The Replication Monitoring tab is relevant if inter-cluster replication is deployed.
Central Journal Replication
The default, recommended replication model for RadiantOne Directory stores is based on a publish-and-subscribe methodology. When writes occur on a site, the leader node publishes the changes into a central journal. The leader nodes on all other sites pick up the changes from the central journal and update their local stores. These changes are then automatically replicated out to follower/follower-only nodes within the cluster.
If inter-cluster replication is enabled, the clusters that are participating in replication can be viewed in the Central Journal Replication section. The topology depicts the connectivity between the clusters and the cluster housing the replication journal. If a red line is visible, this indicates a connection problem between a cluster and the replication journal.
An example is shown below.
Column Name | Definition |
---|---|
HDAP Store | The root naming context that stores the entries to be replicated across all clusters. |
Cluster Name | The cluster name, as defined during the RadiantOne installation. |
Changes Subscribed (Applied) | The number of changes replicated to this cluster. |
Changes Subscribed (Pending) | The number of changes waiting to be replicated to this cluster. |
Changes Published | This value is a total of the number of applied and pending changes. |
More than one store per cluster can participate in inter-cluster replication. For each store, the table in the Central Journal Replication section lists the clusters involved in replication. Then, for each cluster, the table shows:
- The number of changes subscribed to, broken down into changes that have been applied and changes that are pending.
- The number of changes published into the replication journal.
Push Mode Replication
To address a very small subset of use cases, namely where a global load balancer directs client traffic across data centers/sites and the inter-cluster replication architecture might be too slow, you can enable an additional, more real-time replication mode in which changes are pushed directly to the intended targets. For example, an update made by a client to one data center might not be replicated to other data centers in time for the client to immediately read the change if the read request is sent to a different data center than the update was. This is generally not an ideal load distribution policy when working with distributed systems. Load balancing is best deployed across multiple nodes within the same cluster on the same site/data center.
In any event, to address scenarios like this, a push replication mode can be used to send the changes directly to the intended targets. The targets must be other RadiantOne servers defined as LDAP data sources. LDAP data sources can be defined in Control Panel > Setup > Data Catalog > Data Sources.
If push mode replication is enabled, the clusters that are participating in replication can be viewed in the table in the Push Mode Replication section. The table lists, for each store, the clusters involved in replication. The source cluster, the target cluster, and the connectivity status between them are shown.
Synchronization Tab
From the Classic Control Panel > Synchronization tab, you can select a topology and monitor the activities of the capture, transform and apply processes associated with each pipeline.
All topologies are listed on the left. Select a topology to view the sync pipelines. For each running pipeline, a list of entries processed by the Capture, Transform and Apply components is shown. For the Transform component, you see a number queued (messages in the queue waiting to be processed) and a number processed (entries transformed).
From the Synchronization tab, you can stop the synchronization flows with Stop. Clicking Stop pauses the synchronization for all pipelines associated with the topology. Click Start to start synchronization for all pipelines. To resume synchronization for a specific pipeline, click Configure next to the pipeline, select the Apply component, and click Start.
Click Configure next to a pipeline to access the queue monitoring, alert settings, and logs associated with the synchronization. In the Queue section, you can view the number of messages processed, the current queue size, and some processing statistics (rate per sec and peak processing times). You can also manage messages that failed to be applied by either deleting them or manually resending them.
Persistent Cache Refresh (PCache Monitoring tab)
From the Classic Control Panel > PCache Monitoring tab, you can select a real-time persistent cache refresh configuration and monitor the activities of the capture and apply processes.
Monitoring from the Server Control Panel
The items that can be monitored from the Server Control Panel are described in this section.
Click the "Server Control Panel" link at the top-right of the Classic Control Panel to launch the Server Control Panel.
CPU, Memory, Disk Space, Disk Latency and Connection Usage
From the Server Control Panel > Dashboard Tab, you can monitor the CPU, disk space, and disk latency on the machine hosting RadiantOne, as well as the memory and connection usage of the RadiantOne service.
To use this feature, enable the cluster monitor at Classic Control Panel > Settings > Logs > Clustermonitor.
RadiantOne Directory Status
Store status (including number of entries, index size, revision, and search and write operations) can be monitored from the Server Control Panel > Usage & Activity tab > Universal Directory Status section.
To filter the stores displayed, click the gear icon and then click Select Filters. Select the stores to display and click OK. Click OK again to exit the settings.
Data Source Status
The status of the RadiantOne service (data sources named vds and vdsha) and any backend can be monitored from the Server Control Panel > Usage & Activity Tab > Data Source Status section.
The status values are on, off, offline, and unavailable. The following table describes each status.
Status | Description |
---|---|
On | RadiantOne can connect to the data source. |
Off | The connection test to the data source failed. |
Offline | The data source’s active property is set to false. |
Unavailable | No classname property is defined for the data source. |
If a data source has failover servers configured, the URL in the message indicates which server the status applies to.
Network Latency
The latency between RadiantOne nodes can be monitored from the Server Control Panel > Usage & Activity tab > Network Latency section.
Logs and Error Messages to Monitor
RadiantOne Directory Service
The most important logs to monitor for RadiantOne are vds_server.log and vds_server_access.log. The log files can be viewed and downloaded from Server Control Panel > Log Viewer. For SaaS deployments, you can view logs from the Environment Operations Center > Environments > EnvironmentName > ApplicationName > LOGS.
You can monitor these logs for the following errors.
vds_server_access.log
- SearchResult {resultCode=53
- VDS_Server is shutting-down [Connection to ZooKeeper lost.]
- !!! Server Busy -- Maxmium TCP/IP connection limit(1000) reached
vds_server.log
The following errors indicate the server is shutting down or unable to accept more connections.
- Server is shutting down: VDS stop command invoked.
- Error in accepting an incoming connection: java.io.IOException: Too many open files
The following errors can indicate that users are connecting to the server and attempting operations they are not authorized to perform.
- ??? Error in binding: Failed in passing security check point: IP address where the ROOT_USER connected from is not listed.
- SearchResult {resultCode=50
The following errors indicate memory problems.
- java.lang.OutOfMemoryError: unable to create new native thread
- java.lang.OutOfMemoryError: GC overhead limit exceeded
- Root exception is java.net.SocketException: Too many open files
The following errors indicate problems with ZooKeeper.
- Server is shutting down: VDS has been configured to stop if the connection to ZK is not writable any more.
- VDS_Server is shutting-down [Connection to ZooKeeper lost.]
- ConnectionStateListener: ZK connection is not writable any more.
- ZooKeeper session has been lost. [vds_server]
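For third-party or scripted monitoring, the messages listed above can be matched as plain substrings. The following is a minimal sketch, not a RadiantOne tool: the category names and the `classify` and `scan` helpers are hypothetical, and the substrings are copied from the lists above. The label for result code 53 reflects the standard LDAP meaning of that code (unwillingToPerform).

```python
from typing import Iterable, Iterator, Optional, Tuple

# Substrings copied from the messages listed above.
# The category names are illustrative labels, not RadiantOne terminology.
PATTERNS = {
    "shutdown_or_connection_limit": (
        "Server is shutting down: VDS stop command invoked.",
        "Error in accepting an incoming connection",
        "Server Busy",
    ),
    "authorization": (
        "Error in binding: Failed in passing security check point",
        "SearchResult {resultCode=50",
    ),
    "memory": (
        "java.lang.OutOfMemoryError",
        "Root exception is java.net.SocketException: Too many open files",
    ),
    "zookeeper": (
        "VDS has been configured to stop if the connection to ZK",
        "Connection to ZooKeeper lost",
        "ZK connection is not writable any more",
        "ZooKeeper session has been lost",
    ),
    # LDAP result code 53 is unwillingToPerform.
    "unwilling_to_perform": ("SearchResult {resultCode=53",),
}


def classify(line: str) -> Optional[str]:
    """Return the first category whose substring occurs in the line, else None."""
    for category, needles in PATTERNS.items():
        if any(needle in line for needle in needles):
            return category
    return None


def scan(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, category, text) for every line that matches a pattern."""
    for number, line in enumerate(lines, start=1):
        category = classify(line)
        if category is not None:
            yield number, category, line.rstrip("\n")
```

Because `scan` accepts any iterable of strings, it can be fed an open file object for a downloaded copy of vds_server.log or vds_server_access.log; where those files live on disk depends on the deployment.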
Periodic Persistent Cache Refresh Threshold Failures
To monitor periodic persistent cache refresh failures caused by thresholds being reached, monitor periodiccache.log for the keywords “Threshold reached”.
For example, assume a 50% Add Threshold has been configured and during the next refresh cycle, there are 85% more entries in the backend than there are in the persistent cache image. This persistent cache refresh is aborted, and the following message can be found in the periodiccache.log.
2021-04-20T11:44:59,487 ERROR - [Storage Periodic Refresh - o=sql] Refresh failed: java.lang.Exception: Threshold reached with 85% difference for adds.
In the example above, a periodic persistent cache refresh is configured on the o=sql naming context.
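A script that watches periodiccache.log can extract the naming context, percentage, and operation from such a message. The sketch below assumes every threshold failure follows the layout of the single example line above. The regular expression, the `ThresholdFailure` type, and the `parse_threshold_failure` function are hypothetical and may need adjusting for other message variants.

```python
import re
from typing import NamedTuple, Optional


class ThresholdFailure(NamedTuple):
    timestamp: str
    naming_context: str
    percent: int
    operation: str


# Assumed layout, inferred from the one example line in this section:
# <timestamp> ERROR - [Storage Periodic Refresh - <naming context>] Refresh failed: ...
#   Threshold reached with <N>% difference for <operation>.
_THRESHOLD_RE = re.compile(
    r"^(?P<ts>.+?)\s+ERROR\s+-\s+"
    r"\[Storage Periodic Refresh - (?P<ctx>[^\]]+)\]\s+"
    r"Refresh failed: .*?Threshold reached with (?P<pct>\d+)% "
    r"difference for (?P<op>\w+)"
)


def parse_threshold_failure(line: str) -> Optional[ThresholdFailure]:
    """Parse one log line; return None if it is not a threshold failure."""
    match = _THRESHOLD_RE.search(line)
    if match is None:
        return None
    return ThresholdFailure(
        timestamp=match.group("ts"),
        naming_context=match.group("ctx"),
        percent=int(match.group("pct")),
        operation=match.group("op"),
    )
```

Applied to the example message, this returns the naming context `o=sql`, the percentage `85`, and the operation `adds`, which is enough to route an alert to the owner of the affected cache.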
ZooKeeper Logs and Error Messages to Monitor
The most important log to monitor for ZooKeeper is zookeeper.log. This log file can be viewed and downloaded from Server Control Panel > Log Viewer. The following are critical error messages to monitor in this log:
- ERROR [ConnectionStateManager-0:ZooManager@?] - Connection lost. (local zookeeper node no longer able to contact peers in ensemble)
- Non-optimial configuration, consider an odd number of servers.
- Could not add appenderMBean for [null]
- Unexpected exception causing shutdown while sock still open
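To see how often each of these messages occurs, for example when deciding whether a lost connection is a one-off or a recurring problem, the matches can be tallied. This is a minimal sketch: the key names and the `count_critical` function are hypothetical, and the first pattern assumes that the `?` in the message above stands for a value that varies between log lines.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns derived from the critical messages listed above.
# The dictionary keys are illustrative labels only.
ZK_CRITICAL = {
    "connection_lost": re.compile(
        r"ERROR \[ConnectionStateManager-\d+:ZooManager@[^\]]*\] - Connection lost"
    ),
    # "Non-optimial" is spelled as it appears in the message above.
    "non_optimal_ensemble": re.compile(r"Non-optimial configuration"),
    "appender_mbean": re.compile(r"Could not add appenderMBean"),
    "unexpected_shutdown": re.compile(
        r"Unexpected exception causing shutdown while sock still open"
    ),
}


def count_critical(lines: Iterable[str]) -> Counter:
    """Count how many lines match each critical ZooKeeper message."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in ZK_CRITICAL.items():
            if pattern.search(line):
                counts[name] += 1
                break  # count each line at most once
    return counts
```

A `Counter` returns zero for messages that never appeared, so callers can compare any key against a threshold without checking for its presence first.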
Alerts
For SaaS deployments, alerts can be managed from the Environment Operations Center. For details on configuring notification channel integrations for alerts, see: Alerts
There are also default standard alerts that can be managed from the Classic Control Panel > Settings > Monitoring > Standard Alerts. These alerts are also supported for Self-managed deployments. For details see: Alerts Configured in Classic Control Panel