Spark Yarn Driver

Introduction

This article covers how to configure the Spark Yarn Driver data source and describes the metrics it collects.

 

Requirements

 

Spark Yarn Driver Data Sources
 
Open up the configuration modal for Spark and click on the Configure Data Sources tab. In the Add a data source drop-down menu [1], select Spark Yarn Driver. Click Edit [2] to the right to expose the fields that need to be configured [3]:
 
[Image: SparkYarnDriverConfig.png]
 
The plugin requires the following information:
 
Field Name                  Default Value         Description
Resource Manager Host       _HOST:8088_
Port                        8088
Metrics URL Path            /api/v1/applications
Resource Manager Apps Path  /ws/v1/cluster/apps
Run as Unix user            nobody
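As a rough sketch of how these fields fit together, the plugin combines the Resource Manager host and port with the two paths above to build the REST endpoints it polls. The helper names below are illustrative, not part of the plugin:

```python
# Illustrative sketch only: how the configured fields combine into the two
# REST endpoints the plugin polls. Function names are hypothetical.

def resource_manager_apps_url(host: str, port: int,
                              apps_path: str = "/ws/v1/cluster/apps") -> str:
    # YARN ResourceManager endpoint that lists cluster applications.
    return f"http://{host}:{port}{apps_path}"

def application_metrics_url(host: str, port: int,
                            metrics_path: str = "/api/v1/applications") -> str:
    # Spark REST endpoint served by the application's web UI.
    return f"http://{host}:{port}{metrics_path}"
```

With the defaults from the table, `resource_manager_apps_url("myhost", 8088)` yields `http://myhost:8088/ws/v1/cluster/apps`.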

 

Spark Driver Metrics

 

Metric Name          Units    Metric Description
active_tasks         Count    Number of active tasks
completed_tasks      Count    Number of completed tasks
disk_used            Bytes    Disk space used
failed_tasks         Count    Number of failed tasks
memory_max           Bytes    Maximum memory set for this process
memory_used          Bytes    Memory used
rdd_blocks           Blocks   Number of RDD blocks
total_duration       Seconds  Total duration for which this process has run
total_input_bytes    Bytes    Total input bytes
total_shuffle_read   Bytes    Shuffle read size
total_shuffle_write  Bytes    Shuffle write size
total_tasks          Count    Total number of tasks

Spark Executor Metrics

Metric Name          Units    Metric Description
active_tasks         Count    Number of active tasks
completed_tasks      Count    Number of completed tasks
disk_used            Bytes    Disk space used
failed_tasks         Count    Number of failed tasks
memory_max           Bytes    Maximum memory set for this process
memory_used          Bytes    Memory used
rdd_blocks           Blocks   Number of RDD blocks
total_duration       Seconds  Total duration for which this process has run
total_input_bytes    Bytes    Total input bytes
total_shuffle_read   Bytes    Shuffle read size
total_shuffle_write  Bytes    Shuffle write size
total_tasks          Count    Total number of tasks
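The driver and executor metrics above line up with the per-executor fields returned by the Spark REST API (`/api/v1/applications/<app-id>/executors`, where the driver appears as an entry with id `driver`). A minimal sketch of that mapping, assuming `executor` is one decoded JSON entry from that endpoint; the mapping itself is illustrative, not the plugin's actual code:

```python
# Sketch: map one executor entry from the Spark REST API
# (/api/v1/applications/<app-id>/executors) to the metric names above.
# The camelCase field names are Spark's; the mapping is illustrative.

EXECUTOR_FIELDS = {
    "active_tasks": "activeTasks",
    "completed_tasks": "completedTasks",
    "disk_used": "diskUsed",
    "failed_tasks": "failedTasks",
    "memory_max": "maxMemory",
    "memory_used": "memoryUsed",
    "rdd_blocks": "rddBlocks",
    "total_input_bytes": "totalInputBytes",
    "total_shuffle_read": "totalShuffleRead",
    "total_shuffle_write": "totalShuffleWrite",
    "total_tasks": "totalTasks",
}

def executor_metrics(executor: dict) -> dict:
    metrics = {name: executor[field] for name, field in EXECUTOR_FIELDS.items()}
    # The REST API reports totalDuration in milliseconds, while the table
    # above lists total_duration in Seconds, so convert.
    metrics["total_duration"] = executor["totalDuration"] / 1000.0
    return metrics
```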

 

Spark Jobs Metrics

Metric Name       Units  Metric Description
jobs_count        Count  Number of jobs
active_stages     Count  Number of active stages for the given jobId
active_tasks      Count  Number of active tasks for the given jobId
completed_stages  Count  Number of completed stages for the given jobId
complete_tasks    Count  Number of completed tasks for the given jobId
failed_stages     Count  Number of failed stages for the given jobId
failed_tasks      Count  Number of failed tasks for the given jobId
skipped_stages    Count  Number of skipped stages for the given jobId
skipped_tasks     Count  Number of skipped tasks for the given jobId
num_tasks         Count  Number of tasks for the given jobId
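The per-job counters above match the fields the Spark REST API returns from `/api/v1/applications/<app-id>/jobs`, and `jobs_count` is simply the number of entries in that response. A hedged sketch of the mapping (illustrative, not the plugin's actual code):

```python
# Sketch: map one job entry from the Spark REST API
# (/api/v1/applications/<app-id>/jobs) to the per-jobId metric names above.

JOB_FIELDS = {
    "active_stages": "numActiveStages",
    "active_tasks": "numActiveTasks",
    "completed_stages": "numCompletedStages",
    "complete_tasks": "numCompletedTasks",
    "failed_stages": "numFailedStages",
    "failed_tasks": "numFailedTasks",
    "skipped_stages": "numSkippedStages",
    "skipped_tasks": "numSkippedTasks",
    "num_tasks": "numTasks",
}

def job_metrics(job: dict) -> dict:
    return {name: job[field] for name, field in JOB_FIELDS.items()}

def jobs_count(jobs: list) -> int:
    # jobs_count is the number of job entries returned by the endpoint.
    return len(jobs)
```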

 

Spark RDD Metrics

Metric Name        Units  Metric Description
rdd_disk_used      Bytes  Disk space used by the RDD
rdd_mem_used       Bytes  Memory used by the RDD
cached_partitions  Count  Number of cached partitions in a given RDD
num_partitions     Count  Number of partitions in a given RDD
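These values correspond to the fields returned by the Spark storage endpoint (`/api/v1/applications/<app-id>/storage/rdd`). A minimal sketch of the mapping, assuming `rdd` is one decoded JSON entry from that endpoint:

```python
# Sketch: map one RDD entry from the Spark REST API
# (/api/v1/applications/<app-id>/storage/rdd) to the metric names above.
# Illustrative only; not the plugin's actual code.

def rdd_metrics(rdd: dict) -> dict:
    return {
        "rdd_disk_used": rdd["diskUsed"],
        "rdd_mem_used": rdd["memoryUsed"],
        "cached_partitions": rdd["numCachedPartitions"],
        "num_partitions": rdd["numPartitions"],
    }
```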

 


 
