Overview

Monitoring is one of the most important aspects of maintaining the health of RadiantOne, so monitor the RadiantOne service on a regular basis. The service can be monitored from the Environment Operations Center (SaaS deployments only), from the Classic and Server Control Panels (SaaS and self-managed deployments), and with third-party tools.

Switching to Classic Control Panel

To switch to Classic Control Panel, use the menu options for the logged in user in the upper right.

Classic Control Panel

Accessing Server Control Panel

Once logged into the Classic Control Panel, click the Server Control Panel link in the upper right to access the Server Control Panel.

Server Control Panel


Monitoring in Environment Operations Center

In Environment Operations Center, navigate to the Monitoring section, then select the environment, application, dashboard, RadiantOne node to monitor, and time range.

EOC Monitoring

For more details, see Monitoring and Reporting.


Monitoring in Classic Control Panel

The key services to monitor are RadiantOne and ZooKeeper.

Expert Mode

Some settings in the Classic Control Panel are accessible only in Expert Mode. To switch to Expert Mode, click the "Logged in as, (username)" drop-down menu and select Expert Mode.

An image showing expert mode

Dashboard Tab

On the Dashboard Tab of the Classic Control Panel, the Overview section displays information about each RadiantOne cluster node, including the status of all nodes. The cluster’s RadiantOne leader node is marked with a yellow triangle beside the server name.

For each node, the Overview section displays:

  • Current CPU usage of the machine hosting RadiantOne.
  • Current RadiantOne service memory usage.
  • Current disk usage of the drive on the machine hosting RadiantOne.
  • Disk latency.
  • Up time (how long the RadiantOne service has been running on the node).
  • Version of RadiantOne installed on the node.

An image showing monitoring two RadiantOne cluster nodes

Replication Monitoring Tab

RadiantOne Directory stores across multiple sites/environments support multi-master replication. This type of replication is referred to as inter-cluster replication. The state of inter-cluster replication can be monitored from the Classic Control Panel > Replication Monitoring Tab.

The Replication Monitoring tab is relevant if inter-cluster replication is deployed.

Central Journal Replication

The default, recommended replication model for RadiantOne Directory stores is based on a publish-and-subscribe methodology. When writes occur on a site, the leader node publishes the changes into a central journal. The leader nodes on all other sites pick up the changes from the central journal and update their local stores. These changes are then automatically replicated out to follower/follower-only nodes within the cluster.

If inter-cluster replication is enabled, the clusters that are participating in replication can be viewed in the Central Journal Replication section. The topology depicts the connectivity between the clusters and the cluster housing the replication journal. If a red line is visible, this indicates a connection problem between a cluster and the replication journal.

An example is shown below.

An image showing central journal replication

Column Name | Definition
HDAP Store | The root naming context that stores the entries to be replicated across all clusters.
Cluster Name | The cluster name, as defined during the RadiantOne installation.
Changes Subscribed (Applied) | The number of changes replicated to this cluster.
Changes Subscribed (Pending) | The number of changes waiting to be replicated to this cluster.
Changes Published | This value is a total of the number of applied and pending changes.

More than one store per cluster can participate in inter-cluster replication. For each store, the table in the Central Journal Replication section lists the clusters involved in replication. Then, for each cluster, the table shows:

  • The number of changes subscribed to, broken down into changes that have been applied and changes that are pending.
  • The number of changes published into the replication journal.

Push Mode Replication

Push mode replication addresses a very small subset of use cases, namely those where a global load balancer directs client traffic across data centers/sites and the inter-cluster replication architecture might be too slow. For example, an update made by a client to one data center might not be replicated to other data centers in time for the client to immediately read the change if the read request is sent to a different data center than the update. For these cases, you can enable an additional, more real-time replication mode in which changes are pushed directly to the intended targets. Note that distributing load across data centers this way is generally not an ideal policy for distributed systems. Load balancing is best deployed across multiple nodes within the same cluster on the same site/data center.

In any event, to address scenarios like this, a push replication mode can be used to send the changes directly to the intended targets. The targets must be other RadiantOne servers defined as LDAP data sources. LDAP data sources can be defined in Control Panel > Setup > Data Catalog > Data Sources.

If push mode replication is enabled, the clusters that are participating in replication can be viewed in the table in the Push Mode Replication section. The table lists, for each store, the clusters involved in replication. The source cluster, the target cluster, and the connectivity status between them are shown.

Synchronization Tab

From the Classic Control Panel > Synchronization tab, you can select a topology and monitor the activities of the capture, transform and apply processes associated with each pipeline.

Sync Monitoring

All topologies are listed on the left. Select a topology to view the sync pipelines. For each running pipeline, a list of entries processed by the Capture, Transform and Apply components is shown. For the Transform component, you see a number queued (messages in the queue waiting to be processed) and a number processed (entries transformed).

From the Synchronization tab, you can stop the synchronization flows by clicking Stop, which pauses synchronization for all pipelines associated with the topology. Click Start to start synchronization for all pipelines. To resume synchronization for a specific pipeline, click CONFIGURE next to the pipeline, select the Apply component, and click Start.

Start Sync for a Specific Pipeline

Click Configure next to a pipeline to access the queue monitoring, alert settings, and logs associated with the synchronization. In the Queue section, you can view the number of messages processed, the current queue size, and some processing statistics (rate per sec and peak processing times). You can also manage messages that failed to be applied by either deleting them or manually resending them.

Queue Monitoring

Persistent Cache Refresh (PCache Monitoring tab)

From the Classic Control Panel > PCache Monitoring tab, you can select a real-time persistent cache refresh configuration and monitor the activities of the capture and apply processes.


Monitoring from the Server Control Panel

The items that can be monitored from the Server Control Panel are described in this section.

Click the "Server Control Panel" link at the top-right of the Classic Control Panel to launch the Server Control Panel.

CPU, Memory, Disk Space, Disk Latency and Connection Usage

From the Server Control Panel > Dashboard Tab, you can monitor the CPU, disk space, and disk latency of the machine hosting RadiantOne, as well as the memory and connection usage of the RadiantOne service.

An image showing monitoring resources from the server control panel

RadiantOne Directory Status

Store status (including number of entries, index size, revision, and search and write operations) can be monitored from the Server Control Panel > Usage & Activity tab > Universal Directory Status section.

To filter the stores displayed, click the gear icon and then click Select Filters. Select the stores to display and click OK. Click OK again to exit the settings.

RadiantOne Directory Stores

Data Source Status

The status of the RadiantOne service (data sources named vds and vdsha) and any backend can be monitored from the Server Control Panel > Usage & Activity Tab > Data Source Status section.

Data Source Status

The status values are on, off, offline, and unavailable. The following table describes each status.

Status | Description
On | RadiantOne can connect to the data source.
Off | The connection test to the data source failed.
Offline | The data source’s active property is set to false.
Unavailable | No classname property is defined for the data source.

If a data source has failover servers configured, the URL in the message indicates which server the status applies to.
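
The status table above can be read as a small decision function. The following Python sketch is only an illustration of that reading: the function name, its parameters, and the order in which the conditions are checked are assumptions, not RadiantOne's actual implementation.

```python
def data_source_status(active, classname, connection_ok):
    """Map data source properties to a status label from the table above.

    Hypothetical helper: the precedence of the checks is an assumption.
    """
    if not classname:
        # No classname property is defined for the data source.
        return "Unavailable"
    if not active:
        # The data source's active property is set to false.
        return "Offline"
    # Otherwise the status reflects the outcome of the connection test.
    return "On" if connection_ok else "Off"


print(data_source_status(active=True, classname="example", connection_ok=False))
```

A monitoring script built on this reading would typically alert on Off, and treat Offline and Unavailable as configuration states rather than outages.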

Network Latency

The latency between RadiantOne nodes can be monitored from the Server Control Panel > Usage & Activity tab > Network Latency section.

An image showing monitoring network latency


Logs and Error Messages to Monitor

RadiantOne Directory Service

The most important logs to monitor for RadiantOne are vds_server.log and vds_server_access.log. The log files can be viewed and downloaded from Server Control Panel > Log Viewer. For SaaS deployments, you can view logs from the Environment Operations Center > Environments > EnvironmentName > ApplicationName > LOGS.

You can monitor these logs for the following errors.

vds_server_access.log

  • SearchResult {resultCode=53
  • VDS_Server is shutting-down [Connection to ZooKeeper lost.]
  • !!! Server Busy -- Maxmium TCP/IP connection limit(1000) reached
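
As one way to watch for the result-code entries above, the following Python sketch counts how often each result code appears in a set of access log lines. It matches only the "SearchResult {resultCode=" fragment shown in the list. The rest of the access log line format is not described here, so the sample lines are made up for illustration. In standard LDAP, result code 53 means "unwilling to perform".

```python
import re
from collections import Counter

# Matches the "SearchResult {resultCode=<n>" fragment listed above.
RESULT_CODE = re.compile(r"SearchResult \{resultCode=(\d+)")


def count_result_codes(lines):
    """Count how often each result code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = RESULT_CODE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines; only the matched fragment comes from the list above.
sample = [
    "... SearchResult {resultCode=0 ...",
    "... SearchResult {resultCode=53 ...",
    "... SearchResult {resultCode=53 ...",
]
counts = count_result_codes(sample)
print(counts[53])  # number of sample lines reporting resultCode=53
```

In practice, the lines would come from reading vds_server_access.log, and a rising count for code 53 would be the signal to investigate.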

vds_server.log

The following errors indicate the server is shutting down or unable to accept more connections.

  • Server is shutting down: VDS stop command invoked.
  • Error in accepting an incoming connection: java.io.IOException: Too many open files

The following errors can indicate that users are connecting to the server and attempting operations they are not authorized to perform.

  • ??? Error in binding: Failed in passing security check point: IP address where the ROOT_USER connected from is not listed.

  • SearchResult {resultCode=50

The following errors indicate memory problems.

  • java.lang.OutOfMemoryError: unable to create new native thread
  • java.lang.OutOfMemoryError: GC overhead limit exceeded
  • Root exception is java.net.SocketException: Too many open files

The following errors indicate problems with ZooKeeper.

  • Server is shutting down: VDS has been configured to stop if the connection to ZK is not writable any more.

  • VDS_Server is shutting-down [Connection to ZooKeeper lost.]
  • ConnectionStateListener: ZK connection is not writable any more.
  • ZooKeeper session has been lost. [vds_server]
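
The vds_server.log messages above fall into four groups, which lends itself to a simple substring-based classifier. The Python sketch below is a minimal illustration under stated assumptions: the category names and function names are invented for this example, and the log is assumed to be readable as a text file at a path you supply. The substrings themselves are copied from the lists above.

```python
# Substrings copied from the vds_server.log lists above, grouped by the
# category the text assigns to them. Category names are hypothetical.
PATTERNS = {
    "shutdown_or_connections": [
        "Server is shutting down: VDS stop command invoked.",
        "Error in accepting an incoming connection: java.io.IOException: Too many open files",
    ],
    "authorization": [
        "Error in binding: Failed in passing security check point",
        "SearchResult {resultCode=50",
    ],
    "memory": [
        "java.lang.OutOfMemoryError: unable to create new native thread",
        "java.lang.OutOfMemoryError: GC overhead limit exceeded",
        "Root exception is java.net.SocketException: Too many open files",
    ],
    "zookeeper": [
        "VDS has been configured to stop if the connection to ZK is not writable any more",
        "VDS_Server is shutting-down [Connection to ZooKeeper lost.]",
        "ZK connection is not writable any more",
        "ZooKeeper session has been lost.",
    ],
}


def classify(line):
    """Return the first category whose substring occurs in the line, else None."""
    for category, needles in PATTERNS.items():
        if any(needle in line for needle in needles):
            return category
    return None


def scan(path):
    """Yield (line_number, category, text) for each matching line in a log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            category = classify(line)
            if category:
                yield number, category, line.rstrip("\n")


print(classify("java.lang.OutOfMemoryError: GC overhead limit exceeded"))
```

Plain substring matching is used instead of regular expressions because the messages contain characters such as brackets and braces that would otherwise need escaping.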

Periodic Persistent Cache Refresh Threshold Failures

To detect periodic persistent cache refresh failures caused by a threshold being reached, search periodiccache.log for the keywords “Threshold reached”.

For example, assume a 50% Add Threshold has been configured and during the next refresh cycle, there are 85% more entries in the backend than there are in the persistent cache image. This persistent cache refresh is aborted, and the following message can be found in the periodiccache.log.

2021-04-20T11:44:59,487 ERROR - [Storage Periodic Refresh - o=sql] Refresh failed: java.lang.Exception: Threshold reached with 85% difference for adds.

In the example above, a periodic persistent cache refresh is configured on the o=sql naming context.
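
To pull the naming context, percentage, and operation out of a message like the one above, a regular expression is usually enough. The Python sketch below is based only on that single example message. The function name and the returned field names are assumptions, and other threshold messages may be worded differently.

```python
import re

# Pattern derived from the single example message shown above.
THRESHOLD_RE = re.compile(
    r"\[Storage Periodic Refresh - (?P<context>[^\]]+)\] Refresh failed: .*"
    r"Threshold reached with (?P<percent>\d+(?:\.\d+)?)% difference for (?P<operation>\w+)"
)


def parse_threshold_failure(line):
    """Return details of a threshold failure, or None if the line does not match."""
    match = THRESHOLD_RE.search(line)
    if match is None:
        return None
    return {
        "naming_context": match.group("context"),
        "percent": float(match.group("percent")),
        "operation": match.group("operation"),
    }


# Message portion of the example above (timestamp omitted).
line = (
    "ERROR - [Storage Periodic Refresh - o=sql] Refresh failed: "
    "java.lang.Exception: Threshold reached with 85% difference for adds."
)
print(parse_threshold_failure(line))
```

For the example message, the parsed naming context is o=sql, the percentage is 85, and the operation is adds.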

ZooKeeper Logs and Error Messages to Monitor

The most important log to monitor for ZooKeeper is zookeeper.log. This log file can be viewed and downloaded from Server Control Panel > Log Viewer. The following are critical error messages to monitor in this log:

  • ERROR [ConnectionStateManager-0:ZooManager@?] - Connection lost. (local zookeeper node no longer able to contact peers in ensemble)
  • Non-optimial configuration, consider an odd number of servers.
  • Could not add appenderMBean for [null]
  • Unexpected exception causing shutdown while sock still open

Alerts

For SaaS deployments, alerts can be managed from the Environment Operations Center > Admin > ALERTS tab.

For details, see Alerts.
