Monitoring is one of the most important aspects of maintaining the health of RadiantOne. It is highly recommended that you monitor the RadiantOne components on a regular basis using the methods and utilities discussed in this guide.
The RadiantOne components can be monitored from the Main and Server Control Panels, from command line scripts, and with third-party tools.
The key services to monitor are RadiantOne and ZooKeeper. There are default monitoring and alerts for these services. For RadiantOne, see RadiantOne Availability. For ZooKeeper, see ZooKeeper Write Failure.
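In addition to the built-in monitoring, a basic liveness probe for both services can be scripted. The sketch below is an illustration, not part of the product: it sends ZooKeeper's standard four-letter `ruok` command (a healthy server replies `imok`) and checks that the RadiantOne LDAP listener accepts TCP connections. The hostname is a placeholder, and the ports shown (2181 for ZooKeeper, 2389 for LDAP) are common defaults that may differ in your installation.

```python
import socket

def zk_ruok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send ZooKeeper's four-letter 'ruok' command; a healthy server answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ruok")
            return s.recv(16) == b"imok"
    except OSError:
        return False

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Check that a TCP port (e.g. the RadiantOne LDAP listener) accepts connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    host = "vds-node1.example.com"  # hypothetical node name - replace with yours
    print("ZooKeeper ok:", zk_ruok(host))
    print("LDAP port ok:", port_open(host, 2389))
```

A probe like this can be scheduled from an external monitoring system as a complement to the default alerts described in this guide.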
Some settings in the Main Control Panel are accessible only in Expert Mode. To switch to Expert Mode, click the Logged in as, (username) drop-down menu and select Expert Mode.
The Main Control Panel saves the last mode (Expert or Standard) it was in when you log out and returns to this mode automatically when you log back in. The mode is saved on a per-role basis.
Monitoring from the Main Control Panel
The Main Control Panel provides a view of RadiantOne nodes deployed. When RadiantOne is deployed in a cluster, the state of services running on all nodes is visible in the Overview section of the Dashboard tab. In the Internode Health section you can see the connectivity across all nodes.
The Replication Monitoring tab is relevant if inter-cluster replication is deployed.
On the Dashboard Tab of the Main Control Panel, the Overview section displays information about each cluster node. From here, you can see the status of all nodes. The cluster’s RadiantOne leader node is noted with a yellow-colored triangle beside the server name.
For each node, the Overview section displays the status of:
- Current CPU usage of the machine hosting RadiantOne.
- Current RadiantOne service memory usage.
- Current disk usage of the drive on the machine hosting RadiantOne.
- Disk latency.
- Up time – how long the RadiantOne service has been running on the node.
- Version of RadiantOne installed on the node.
Figure 1.1 : Example Monitoring of a RadiantOne Cluster Containing Two Nodes
The Internode Health section displays a topology of all nodes in the cluster and information about the connectivity between the nodes. If you hover the mouse pointer over a node, the connectivity checks performed from that node are highlighted and more details are shown. By default, this includes the availability of the LDAP and LDAPS ports for the RadiantOne service, and the ability to read and write to ZooKeeper on the target node. To toggle information about ZooKeeper or LDAP connectivity, check/uncheck the corresponding box in the upper-left corner of the Internode Health section.
Figure 1.2 : Internode Health Monitoring
A green checkmark means connectivity on the RadiantOne LDAP and/or LDAPS ports is fine and ZooKeeper can be read from and written to.
A red X means there is no connectivity on the LDAP and/or LDAPS ports for RadiantOne and/or ZooKeeper cannot be read from or written to.
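The port checks behind the green checkmark / red X view can be approximated externally. The sketch below is a hypothetical illustration: node names are placeholders, and the LDAP/LDAPS port numbers depend on your installation. It probes each node's listeners over TCP and builds a pass/fail matrix per node.

```python
import socket

NODES = ["node1.example.com", "node2.example.com"]  # hypothetical cluster nodes
PORTS = {"LDAP": 2389, "LDAPS": 2636}               # adjust to your install's ports

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def health_matrix(nodes, ports):
    """Return {node: {port_name: bool}}; True plays the role of the green checkmark."""
    return {n: {name: reachable(n, p) for name, p in ports.items()} for n in nodes}

if __name__ == "__main__":
    for node, status in health_matrix(NODES, PORTS).items():
        line = "  ".join(f"{name}:{'OK' if ok else 'FAIL'}" for name, ok in status.items())
        print(f"{node}: {line}")
```

Note that a TCP connection check only confirms the listener is reachable; the control panel's view also verifies ZooKeeper read/write, which a simple socket probe does not cover.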
Replication Monitoring Tab
RadiantOne Directory stores across multiple sites/data centers support multi-master replication. This type of replication is referred to as inter-cluster replication. The state of inter-cluster replication can be monitored from the Replication Monitoring Tab.
Central Journal Replication
The default, recommended replication model for RadiantOne Universal Directory stores is based on a publish-and-subscribe methodology. When writes occur on a site, the leader node publishes the changes into a central journal. The leader nodes on all other sites pick up the changes from the central journal and update their local stores. These changes are then automatically replicated out to follower/follower-only nodes within the cluster. For more details on inter-cluster replication, please see the RadiantOne Deployment and Tuning Guide.
If inter-cluster replication is enabled, the clusters that are participating in replication can be viewed in the Central Journal Replication section. The topology depicts the connectivity between the clusters and the cluster housing the replication journal. If a red line is visible, this indicates a connection problem between a cluster and the replication journal.
An example is shown below.
- Naming Context – the root naming context that stores the entries to be replicated across all clusters.
- Cluster Name – the cluster name, as defined during the RadiantOne installation.
- Changes Subscribed (Applied) – the number of changes replicated to this cluster.
- Changes Subscribed (Pending) – the number of changes waiting to be replicated to this cluster.
- Changes Subscribed (Total) – the total of the applied and pending changes.
More than one store per cluster can participate in inter-cluster replication. The table in the Central Journal Replication section lists, for each store, the clusters involved in replication. Then, for each cluster, the table shows:
- The number of changes subscribed to, broken down into changes that have been applied and changes that are pending.
- The number of changes published into the replication journal.
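Because the total shown in the table is the sum of applied and pending changes, the pending backlog per cluster can be derived and alerted on from those counters. A minimal sketch with made-up counts (the threshold is an assumption, not a product default):

```python
def pending_backlog(total_subscribed: int, applied: int) -> int:
    """Pending = total subscribed - applied, per the Central Journal Replication counts."""
    return total_subscribed - applied

def backlog_alert(total_subscribed: int, applied: int, threshold: int = 1000) -> bool:
    """True when the replication backlog for a cluster exceeds the threshold."""
    return pending_backlog(total_subscribed, applied) > threshold

# Hypothetical counts as read from the Replication Monitoring tab:
print(pending_backlog(15000, 14990))  # → 10
print(backlog_alert(15000, 12000))    # → True (3000 pending > 1000)
```

A steadily growing pending count is usually the first sign that a site has lost connectivity to the central journal.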
Push Mode Replication
To address a small subset of use cases, namely where a global load balancer directs client traffic across data centers/sites and the inter-cluster replication architecture might be too slow, you have the option to enable an additional, more real-time replication mode where changes are pushed directly to the intended targets. For example, an update made by a client to one data center might not be replicated to other data centers in time for the client to immediately read the change, if the read request is sent to a different data center than the update was. This is generally not an ideal load distribution policy when working with distributed systems; load balancing is best deployed across multiple nodes within the same cluster on the same site/data center.
In any event, to address scenarios like this, a push replication mode can be used to send the changes directly to the intended targets. The targets must be other RadiantOne servers defined as LDAP data sources. For more details on Push Mode Replication, please see the RadiantOne Deployment and Tuning Guide.
If push mode replication is enabled, the clusters that are participating in replication can be viewed in the table in the Push Mode Replication section. The table lists, for each store, the clusters involved in replication. The source cluster, target cluster and connectivity status between them is shown.
Global Sync Tab
From the Main Control Panel -> Global Sync tab, you can select a topology and monitor the activities of the capture, transform and apply processes associated with each pipeline.
Figure 1.35 : Global Sync Monitoring
All topologies are listed on the left. Select a topology to view the sync pipelines. For each running pipeline, the number of entries processed by the Capture, Transform and Apply components is shown. For the Transform component, you see a number queued (messages in the queue waiting to be processed) and a number processed (entries transformed).
From the Global Sync tab, you can stop the synchronization flows with Stop. Clicking Stop pauses synchronization for all pipelines associated with the topology. Click Start to start synchronization for all pipelines. To resume synchronization for a specific pipeline, click Configure next to the pipeline, select the Apply component, and click Start.
Figure 1.36 : Resume Synchronization for a Specific Pipeline
Click Configure next to a pipeline to access the queue monitoring, alert settings, and logs associated with the synchronization. In the Queue section, you can view the number of messages processed, the current queue size, and some processing statistics (rate per sec and peak processing times). You can also manage messages that failed to be applied by either deleting them or manually resending them.
Figure 1.37 : Queue Monitoring – Resending Failed Messages
Persistent Cache Refresh (PCache Monitoring tab)
From the Main Control Panel > PCache Monitoring tab, you can select a real-time persistent cache refresh configuration and monitor the activities of the capture and apply processes.
Figure 1.38 : Persistent Cache Refresh Monitoring
Monitoring from the Server Control Panels
The items that can be monitored from the Server Control Panels are described in this section.
Click the "Server Control Panel" link at the top of the Main Control Panel to launch the Server Control Panel.
CPU, Memory, Disk Space, Disk Latency and Connection Usage
From the Server Control Panel > Dashboard Tab, you can monitor the CPU, disk space and latency on the machine hosting RadiantOne, and the RadiantOne service memory and connections.
To use this feature, enable the cluster monitor at Main Control Panel > Settings > Logs > Clustermonitor.
Figure 1.3 : Monitoring Resources from the Server Control Panel
Alerts can be configured for memory, connections, disk space and disk latency from the Main Control Panel > Settings tab > Monitoring > Standard Alerts.
To manage alerts:
- In the Main Control Panel go to Settings Tab > Monitoring section > Standard Alerts sub-section.
- Define the memory threshold in the Memory section.
- Define the connection threshold in the Connections section.
- Define the disk space and disk latency thresholds in the Disk Alerts section.
Figure 1.4: Configuring Standard Alerts from the Main Control Panel
- File alerts are enabled by default. If SMTP settings are configured in the Monitoring > Email Alert Settings section, you can also use the Email Alert output.
- Click Save.
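As an illustration of the kind of condition the standard disk-space alert evaluates, the sketch below checks drive usage against a percentage threshold using only the Python standard library. The path and the 90% threshold are assumptions for the example; the real alerts are configured in the Control Panel as described above.

```python
import shutil

def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the drive in use, analogous to the Disk Alerts check."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def disk_alert(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """True when disk usage has crossed the alert threshold."""
    return disk_usage_percent(path) >= threshold_percent

if __name__ == "__main__":
    print(f"Disk usage: {disk_usage_percent('/'):.1f}%")
    print("Alert:", disk_alert("/", 90.0))
```

An external check like this can back up the built-in alerts when you want the result pushed into a separate monitoring system.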