object ClusterUtil

Linear Supertypes: AnyRef, Any

Value Members

  1. def getDefaultNumExecutorCores(spark: SparkSession, log: Logger, master: Option[String] = None): Int

    Get the default number of cores per executor from the SparkSession (required) or from the master string (optional).

    spark

    The current Spark session. If the master parameter is not set, the master from the Spark session is used.

    master

    This parameter exists for unit testing. If set, the function computes the result from it; if not set, the master from the SparkSession is used.

    returns

    The default number of cores per executor, based on the master.
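
    Example (a minimal sketch; the package import for ClusterUtil and the concrete Logger type are not shown on this page, so both are assumptions here):

      import org.apache.spark.sql.SparkSession
      import org.slf4j.LoggerFactory  // assumption: the signature's Logger is org.slf4j.Logger

      val spark = SparkSession.builder().getOrCreate()
      val log = LoggerFactory.getLogger("ClusterUtilExample")

      // Use the master recorded in the current session:
      val coresPerExecutor = ClusterUtil.getDefaultNumExecutorCores(spark, log)

      // Or override the master explicitly (intended for unit tests):
      val localCores = ClusterUtil.getDefaultNumExecutorCores(spark, log, Some("local[4]"))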

  2. def getDriverHost(spark: SparkSession): String
  3. def getExecutors(spark: SparkSession): Array[(Int, String)]

    Returns the list of (executor id, host) pairs.

    spark

    The current spark session.

    returns

    The executors as an array of (id, host) pairs.
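
    Example (a sketch, reusing the spark value from the example above):

      val executors: Array[(Int, String)] = ClusterUtil.getExecutors(spark)
      executors.foreach { case (id, host) =>
        println(s"executor $id is running on $host")
      }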

  4. def getHostToIP(hostname: String): String
  5. def getJVMCPUs(spark: SparkSession): Int
  6. def getNumExecutorTasks(spark: SparkSession, numTasksPerExec: Int, log: Logger): Int

    Returns the number of executors multiplied by the number of tasks per executor.

    spark

    The current spark session.

    numTasksPerExec

    The number of tasks per executor.

    returns

    The number of executors multiplied by the number of tasks per executor.
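
    Example (a sketch; pairs naturally with getNumTasksPerExecutor below):

      // Total task slots across the cluster, given the per-executor task count:
      val tasksPerExec = ClusterUtil.getNumTasksPerExecutor(spark, log)
      val totalTasks = ClusterUtil.getNumExecutorTasks(spark, tasksPerExec, log)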

  7. def getNumRowsPerPartition(df: DataFrame, labelCol: Column): Array[Long]

    Get the number of rows in each partition of a dataframe. Note that this executes a full distributed Spark query.

    df

    The dataframe.

    returns

    The number of rows per partition (where partitionId is the array index).
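
    Example (a sketch; df stands for an existing DataFrame, and "label" is a hypothetical column name, since this page does not document how labelCol is used):

      import org.apache.spark.sql.functions.col

      val rowsPerPartition: Array[Long] = ClusterUtil.getNumRowsPerPartition(df, col("label"))
      rowsPerPartition.zipWithIndex.foreach { case (rows, partitionId) =>
        println(s"partition $partitionId holds $rows rows")
      }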

  8. def getNumTasksPerExecutor(spark: SparkSession, log: Logger): Int

    Get the number of tasks per executor by running a query over a dummy dataset. Note: all executors are assumed to have the same number of cores, and measuring this way is more reliable than reading the value from the Spark conf.

    spark

    The current spark session.

    log

    The Logger.

    returns

    The number of tasks per executor.
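
    Example (a sketch comparing the measured value against the conf entry it is meant to be more reliable than):

      val measured = ClusterUtil.getNumTasksPerExecutor(spark, log)
      val fromConf = spark.conf.getOption("spark.executor.cores")
      log.info(s"measured tasks per executor: $measured; spark.executor.cores: ${fromConf.getOrElse("unset")}")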

  9. def getTaskCpus(sparkContext: SparkContext, log: Logger): Int
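
    Example (the remaining members are undocumented above; this usage sketch is based only on their signatures):

      val driverHost: String = ClusterUtil.getDriverHost(spark)
      val driverIp: String = ClusterUtil.getHostToIP(driverHost)
      val jvmCpus: Int = ClusterUtil.getJVMCPUs(spark)
      val taskCpus: Int = ClusterUtil.getTaskCpus(spark.sparkContext, log)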