Kafka Brokers

Introduction

This article covers the configuration fields of the Kafka Broker JMX data source and lists the metrics collected from it. For details about the other Kafka data sources, please see our Kafka Overview article.

 

Requirements

Before you continue, JMX must be enabled on each Kafka broker. It is turned off by default.

Add the line "export JMX_PORT=9999" to the JMX settings section of the $KAFKA_HOME/bin/kafka-run-class.sh file, where "9999" is the port you wish to use, then restart the Kafka broker.
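After the restart, it can be useful to confirm that the broker is actually listening on the JMX port. The following is a minimal sketch, not part of the OpsClarity agent: the function name port_is_open is hypothetical, and the check only verifies that a TCP connection can be opened, not that a JMX handshake succeeds.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that something is listening on the port; it does not
    verify that the listener speaks JMX.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, port_is_open("localhost", 9999) should return True once the broker has restarted with JMX_PORT set to 9999. The same check works for the broker port (9092 by default).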

 

Configure Kafka Brokers Data Sources

Open up the configuration modal for Kafka, and click on the Configure Data Sources tab.  In the Add a data source drop down menu [1], select Kafka Broker JMX.  Click on Edit [2] to the right to expose the fields that need to be configured [3]:

  

These are the field names (along with their default values) and a detailed description of what should go in each field under the Kafka Broker JMX data source:

Field Name Default Value Description
JMX URL service:jmx:rmi:///jndi/rmi://localhost:PORT/jmxrmi

When JMX is enabled on port 9999 (for example), OpsClarity recommends the following string:

service:jmx:rmi:///jndi/rmi://_HOST:9999_:9999/jmxrmi

When the agent sees this configuration, it automatically determines which address to use to connect on the specified port. This is especially helpful when the listening service binds to a local IP address instead of localhost.

JAVAHOME empty In most cases, the OpsClarity agent detects the Java home directory automatically, but you can provide it here.
User Name empty If authentication is enabled, provide a username here.
Password empty If authentication is enabled, provide a password here. 
Classpath empty Enter your Java classpath here.
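To illustrate how the default JMX URL is put together, here is a small sketch that fills in the host and port of the default value shown in the table. The helper name build_jmx_url is hypothetical, and the sketch produces only the plain host:port form, not the agent-specific recommended string.

```python
# Template taken from the default value of the JMX URL field.
JMX_URL_TEMPLATE = "service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


def build_jmx_url(host: str = "localhost", port: int = 9999) -> str:
    """Build a JMX service URL for the given host and JMX port."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return JMX_URL_TEMPLATE.format(host=host, port=port)
```

With the port from the Requirements section, build_jmx_url("localhost", 9999) yields service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi.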

  

Kafka Broker Metrics

These performance metrics are collected from each broker:  

OpsClarity Metric Name Source Metric Name
under_replicated_partitions kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
offline_partitions_count kafka.controller:type=KafkaController,name=OfflinePartitionsCount
active_controller_count kafka.controller:type=KafkaController,name=ActiveControllerCount
messages_in_per_sec kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
bytes_in_per_sec kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
bytes_out_per_sec kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
produce_requests_per_sec kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce
fetchconsumer_requests_per_sec kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer
fetchfollower_requests_per_sec kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower
log_flush_rate_and_time_ms kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
leader_election_rate_and_time_ms kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
unclean_leader_elections_per_sec kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
partition_count kafka.server:type=ReplicaManager,name=PartitionCount
leader_count kafka.server:type=ReplicaManager,name=LeaderCount
isr_shrinks_per_sec kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
isr_expands_per_sec kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
replica_max_lag kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica
produce_requests_total_time_ms kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce
fetchconsumer_requests_total_time_ms kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer
fetchfollower_requests_total_time_ms kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower
producer_request_purgatory_size kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize
fetch_request_purgatory_size kafka.server:type=FetchRequestPurgatory,name=PurgatorySize
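Each source metric name in the table is a JMX object name: a domain, a colon, then comma-separated key=value properties. The sketch below splits such a name into its parts, which can help when looking a metric up in a JMX browser. The function parse_object_name is hypothetical and deliberately simplified; it does not handle quoted values or wildcard patterns.

```python
from typing import Dict, Tuple


def parse_object_name(name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'domain:key=value,key=value' into (domain, properties)."""
    domain, sep, props = name.partition(":")
    if not sep or not domain or not props:
        raise ValueError(f"not a JMX object name: {name!r}")

    properties: Dict[str, str] = {}
    for pair in props.split(","):
        key, eq, value = pair.partition("=")
        if not eq or not key:
            raise ValueError(f"malformed property {pair!r} in {name!r}")
        properties[key.strip()] = value.strip()
    return domain, properties
```

For the produce_requests_per_sec row, this returns the domain kafka.network and the properties type=RequestMetrics, name=RequestsPerSec, and request=Produce.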

  

Kafka Aggregate Metrics

These aggregated metrics are created by OpsClarity for monitors and graphs:

Aggregate Metric Name Aggregate Function Base Metric Name
active_controller_count.svc.sum SUM active_controller_count
bytes_in_per_sec.svc.sum SUM bytes_in_per_sec
bytes_out_per_sec.svc.sum SUM bytes_out_per_sec
messages_in.svc.avg AVG messages_in
offline_partitions_count.svc.sum SUM offline_partitions_count
under_replicated_partitions.svc.sum SUM under_replicated_partitions
heap_memory_usage_average AVG used_heap_memory
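As a rough illustration of what the SUM and AVG aggregate functions mean, the sketch below combines one base metric value per broker into a single service-level value. It is an assumption about the arithmetic only, not OpsClarity's implementation; the function name aggregate and the broker names used in the example are hypothetical.

```python
from statistics import mean
from typing import Dict

# Aggregate functions named in the table above.
AGGREGATES = {"SUM": sum, "AVG": mean}


def aggregate(per_broker_values: Dict[str, float], function: str) -> float:
    """Combine one value per broker into a single service-level value."""
    values = list(per_broker_values.values())
    if not values:
        raise ValueError("no broker values to aggregate")
    return AGGREGATES[function.upper()](values)
```

For instance, in a healthy three-broker cluster where only one broker reports active_controller_count = 1 and the others report 0, aggregate({"broker-1": 1, "broker-2": 0, "broker-3": 0}, "SUM") gives 1, which corresponds to active_controller_count.svc.sum.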

 

Kafka Default Monitors

The following monitors are applied by default to every Kafka broker cluster discovered by OpsClarity.

Port
Monitors the availability of the Kafka broker port on each host. The default port is 9092.
 
Metric
The following conditions are monitored using Metric Monitors:
 
1. If under_replicated_partitions is greater than 0 for 5 minutes, the monitor health status turns critical. This condition means that some partitions are not fully replicated across the cluster, so the cluster cannot guarantee zero data loss if an outage occurs at this point.
 
2. If offline_partitions_count is greater than 0 for 5 minutes, the monitor health status turns critical. This condition means that some partitions have no active leader and are therefore unavailable for both reads and writes.
 
3. If active_controller_count.svc.sum is greater than 1 for 5 minutes, the monitor health status turns critical. This condition means that multiple brokers in the cluster consider themselves the active controller, which can make the cluster inconsistent as reads and writes are served. In the normal condition, exactly one broker considers itself the active controller.
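The three conditions above share the same shape: a metric stays above a threshold for 5 minutes. A minimal sketch of one way to evaluate such a rule is shown below. How OpsClarity evaluates the window internally is not documented here, so the function exceeds_for, its sample format, and the requirement that samples cover the whole window are all assumptions.

```python
from typing import List, Tuple


def exceeds_for(
    samples: List[Tuple[float, float]],
    threshold: float,
    window_seconds: float = 300,
) -> bool:
    """Return True if every sample in the trailing window is above threshold.

    samples is a list of (timestamp_seconds, value) pairs sorted by time.
    The window ends at the newest sample. If the samples do not reach back
    far enough to cover the whole window, the result is False.
    """
    if not samples:
        return False

    window_start = samples[-1][0] - window_seconds
    if samples[0][0] > window_start:
        # Not enough history to say the condition held for the full window.
        return False

    recent = [value for ts, value in samples if ts >= window_start]
    return all(value > threshold for value in recent)
```

With one sample per minute, exceeds_for(samples, threshold=0) models the first two conditions, and exceeds_for(samples, threshold=1) models the active controller condition.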
 
 
 
OpsClarity Statistical
 
Unexpected behavior in the following metrics is monitored using OpsClarity Statistical Monitors.
 
1. used_heap_memory on each broker
2. GenericJMX.total_time_in_ms.kafka_jvm.gc.collection_time.PS_MarkSweep.value on each broker
3. messages_in.svc.avg for the overall service cluster
 
 

 

If you have any questions or comments about this article, feel free to contact us at support@opsclarity.com.
