object NetworkManager extends Serializable

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Value Members

  1. def create(numTasks: Int, spark: SparkSession, driverListenPort: Int, timeout: Double, useBarrierExecutionMode: Boolean): NetworkManager

    Create a NetworkManager, which encapsulates all network operations. This method opens a socket communication channel on the driver and then initializes the network manager itself. The NetworkManager object starts a thread that waits for the host:port from each executor and then sends the information back to the executors.

    numTasks

    The total number of training tasks to wait for.

    spark

    The Spark session.

    driverListenPort

    The port on which the driver listens.

    timeout

    The timeout (in seconds).

    useBarrierExecutionMode

    Whether to use barrier mode.

    returns

    The NetworkManager.
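
The rendezvous described for create can be sketched with plain java.net sockets. This is a minimal illustration, not the library's implementation: the object DriverRendezvousSketch, the helper collectAndBroadcast, and the one-line "host:port" wire format are all assumptions made for the example, and the collection runs on the calling thread rather than on a background thread.

```scala
import java.io.{BufferedReader, InputStreamReader, PrintWriter}
import java.net.{ServerSocket, Socket}
import java.util.concurrent.Executors
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object DriverRendezvousSketch {
  /** Hypothetical helper: accept `numTasks` connections, read one "host:port"
    * line from each, then send the comma-joined list back to every worker. */
  def collectAndBroadcast(server: ServerSocket, numTasks: Int, timeoutSeconds: Double): String = {
    // The timeout applies to each accept() call, not to the whole collection.
    server.setSoTimeout((timeoutSeconds * 1000).toInt)
    val sockets = ArrayBuffer.empty[Socket]
    val entries = ArrayBuffer.empty[String]
    try {
      while (sockets.length < numTasks) {
        val s = server.accept()
        sockets += s
        entries += new BufferedReader(new InputStreamReader(s.getInputStream)).readLine()
      }
      val all = entries.mkString(",")
      // Every worker receives the same node list.
      sockets.foreach(s => new PrintWriter(s.getOutputStream, true).println(all))
      all
    } finally {
      sockets.foreach(_.close())
    }
  }

  /** Runs the driver and three simulated workers in one JVM. */
  def demo(): (String, Seq[String]) = {
    val pool = Executors.newFixedThreadPool(3)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val server = new ServerSocket(0) // port 0: let the OS pick a free port
    try {
      val port = server.getLocalPort
      val workers = (0 until 3).map { i =>
        Future {
          val s = new Socket("127.0.0.1", port)
          try {
            // Report this worker's (made-up) listen address, then wait for the full list.
            new PrintWriter(s.getOutputStream, true).println(s"127.0.0.1:${12400 + i}")
            new BufferedReader(new InputStreamReader(s.getInputStream)).readLine()
          } finally s.close()
        }
      }
      val driverView = collectAndBroadcast(server, numTasks = 3, timeoutSeconds = 10.0)
      val workerViews = Await.result(Future.sequence(workers), 10.seconds)
      (driverView, workerViews)
    } finally {
      server.close()
      pool.shutdown()
    }
  }
}
```

Calling DriverRendezvousSketch.demo() returns the driver's node list together with the list each simulated worker received. The order of entries depends on which worker connects first.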

  2. def getGlobalNetworkInfo(ctx: TrainingContext, log: Logger, taskId: Long, partitionId: Int, shouldExecuteTraining: Boolean, measures: TaskInstrumentationMeasures): NetworkTopologyInfo

    Retrieve the network nodes and current port information from the driver.

    Establish local socket connection.

    Note: Ideally the socket connections would be started in the C layer. Starting them here opens us up to race conditions if other applications open sockets on the cluster, but this should not usually be a problem.

    ctx

    Information about the current training session.

    log

    The Logger.

    taskId

    The task id.

    partitionId

    The partition id.

    shouldExecuteTraining

    Whether this task should be a part of the training network.

    measures

    Instrumentation for perf measurements.

    returns

    Information about the network topology.
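
The race condition mentioned in the note for getGlobalNetworkInfo comes from reserving a port in the JVM and releasing it before the native layer binds it. A rough sketch of the reservation step follows; the object LocalPortSketch, the helper bindFirstFree, and the scan-upward strategy are illustrative assumptions, not the library's code.

```scala
import java.io.IOException
import java.net.ServerSocket

object LocalPortSketch {
  /** Hypothetical helper: try ports basePort, basePort + 1, ... and return a
    * socket bound to the first free one. The open socket holds the port. */
  def bindFirstFree(basePort: Int, maxAttempts: Int): Option[ServerSocket] =
    (basePort until basePort + maxAttempts).iterator
      .map { p =>
        try Some(new ServerSocket(p))
        catch { case _: IOException => None } // port in use: try the next one
      }
      .collectFirst { case Some(s) => s }
}
```

The returned socket keeps the port reserved while its number is reported to the driver. Once the socket is closed so that native code can bind the same port, another process could claim it first, which is the window the note refers to.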

  3. def getMainWorkerPort(nodes: String, log: Logger): Int

    Gets the port of the main node, which will return the LightGBM Booster. Used to minimize network communication overhead in the reduce step.

    returns

    The main node's port number.
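
As an illustration of getMainWorkerPort, the sketch below extracts a port from a node list. It assumes that nodes is a comma-separated list of host:port entries and that the first entry is the main node; neither assumption is confirmed by this page, and MainWorkerPortSketch.mainWorkerPort is a hypothetical stand-in that omits the Logger parameter.

```scala
object MainWorkerPortSketch {
  /** Assumes `nodes` looks like "host1:port1,host2:port2" and that the first
    * entry is the main node. Both are assumptions made for this example. */
  def mainWorkerPort(nodes: String): Int = {
    val first = nodes.split(",").headOption.map(_.trim).filter(_.nonEmpty)
      .getOrElse(throw new IllegalArgumentException(s"Empty node list: '$nodes'"))
    first.split(":") match {
      case Array(_, port) => port.toInt
      case _ => throw new IllegalArgumentException(s"Expected host:port but got '$first'")
    }
  }
}
```

For example, MainWorkerPortSketch.mainWorkerPort("10.0.0.1:12400,10.0.0.2:12401") evaluates to 12400 under these assumptions.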

  4. def initLightGBMNetwork(ctx: PartitionTaskContext, log: Logger, retry: Int = LightGBMConstants.NetworkRetries, delay: Long = LightGBMConstants.InitialDelay): Unit
  5. def parseWorkerMessage(message: String): TaskMessageInfo
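
The retry and delay parameters of initLightGBMNetwork suggest a retry loop around network initialization. A generic sketch of such a loop is shown here; RetrySketch.retryWithDelay is a hypothetical helper, and doubling the delay after each failure is an assumption, since this page does not say how the delay evolves.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetrySketch {
  /** Runs `op`; on a non-fatal failure, sleeps `delayMs` and tries again,
    * up to `retry` additional attempts. Rethrows the last error when exhausted. */
  @tailrec
  def retryWithDelay[T](retry: Int, delayMs: Long)(op: () => T): T =
    Try(op()) match {
      case Success(v) => v
      case Failure(e) if retry <= 0 => throw e
      case Failure(_) =>
        Thread.sleep(delayMs)
        // Doubling the delay is an assumption made for this sketch.
        retryWithDelay(retry - 1, delayMs * 2)(op)
    }
}
```

A caller would wrap the operation that may fail transiently, for example while other workers are still opening their ports: RetrySketch.retryWithDelay(retry = 3, delayMs = 1000L)(() => connect()), where connect is whatever initialization call applies.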