object ClusterUtil
Value Members
- def getDefaultNumExecutorCores(spark: SparkSession, log: Logger, master: Option[String] = None): Int
Get the default number of cores per executor from the SparkSession (required) or from the master string (optional).
- spark
The current Spark session. If the master parameter is not set, the master from this session is used.
- master
This parameter exists for unit testing. If set, the function returns the value derived from it; if not set, the master from the SparkSession is used.
- returns
The number of default cores per executor, based on the master.
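A minimal usage sketch, assuming ClusterUtil is imported from its package and that the log parameter accepts an slf4j Logger (both are assumptions, not confirmed by this page):

  import org.apache.spark.sql.SparkSession
  import org.slf4j.LoggerFactory

  val spark = SparkSession.builder().master("local[4]").appName("ClusterUtilDemo").getOrCreate()
  val log = LoggerFactory.getLogger("ClusterUtilDemo")

  // Derive the core count from the active session's master.
  val cores = ClusterUtil.getDefaultNumExecutorCores(spark, log)
  // Override the master explicitly, as a unit test would.
  val testCores = ClusterUtil.getDefaultNumExecutorCores(spark, log, Some("local[8]"))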
- def getDriverHost(spark: SparkSession): String
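For example, reusing the session from the sketch above:

  // Host of the machine running the driver.
  val driverHost: String = ClusterUtil.getDriverHost(spark)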
- def getExecutors(spark: SparkSession): Array[(Int, String)]
Returns a list of executor IDs and hosts.
- spark
The current spark session.
- returns
List of executors as an array of (id, host) pairs.
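A sketch of iterating over the result, with the same assumed setup as above:

  val executors: Array[(Int, String)] = ClusterUtil.getExecutors(spark)
  executors.foreach { case (id, host) =>
    log.info(s"executor $id runs on host $host")
  }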
- def getHostToIP(hostname: String): String
- def getJVMCPUs(spark: SparkSession): Int
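These two members are undocumented here; from their signatures alone, a plausible usage sketch is:

  // Resolve a hostname to its IP address (behavior assumed from the signature).
  val ip: String = ClusterUtil.getHostToIP("localhost")
  // Number of CPUs visible to the JVM (assumed from the signature).
  val jvmCpus: Int = ClusterUtil.getJVMCPUs(spark)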
- def getNumExecutorTasks(spark: SparkSession, numTasksPerExec: Int, log: Logger): Int
Returns the number of executors multiplied by the number of tasks per executor.
- spark
The current spark session.
- numTasksPerExec
The number of tasks per executor.
- returns
The number of executors multiplied by the number of tasks per executor.
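A sketch; the value 4 below is an arbitrary tasks-per-executor figure, which in practice could come from getNumTasksPerExecutor, described further down:

  // Total task slots across the cluster: executors * tasks per executor.
  val totalTasks: Int = ClusterUtil.getNumExecutorTasks(spark, 4, log)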
- def getNumRowsPerPartition(df: DataFrame, labelCol: Column): Array[Long]
Get the number of rows per partition of a DataFrame. Note that this executes a full distributed Spark query.
- df
The dataframe.
- returns
The number of rows per partition (where partitionId is the array index).
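A sketch with a toy DataFrame; the column name "label" is arbitrary:

  import org.apache.spark.sql.functions.col

  val df = spark.range(1000).toDF("label")
  val rowsPerPartition: Array[Long] = ClusterUtil.getNumRowsPerPartition(df, col("label"))
  rowsPerPartition.zipWithIndex.foreach { case (rows, partitionId) =>
    log.info(s"partition $partitionId holds $rows rows")
  }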
- def getNumTasksPerExecutor(spark: SparkSession, log: Logger): Int
Get the number of tasks per executor from a dummy dataset. Note: all executors are assumed to have the same number of cores, and this is more reliable than reading the value from the configuration.
- spark
The current spark session.
- log
The Logger.
- returns
The number of tasks per executor.
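Usage sketch, with the same assumed setup as above:

  val tasksPerExec: Int = ClusterUtil.getNumTasksPerExecutor(spark, log)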
- def getTaskCpus(sparkContext: SparkContext, log: Logger): Int
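This member takes a SparkContext rather than a SparkSession; a usage sketch:

  // CPUs allotted per task by the scheduler (behavior assumed from the signature).
  val taskCpus: Int = ClusterUtil.getTaskCpus(spark.sparkContext, log)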