object NetworkManager extends Serializable
- Alphabetic
- By Inheritance
- NetworkManager
- Serializable
- Serializable
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Value Members
-
def
create(numTasks: Int, spark: SparkSession, driverListenPort: Int, timeout: Double, useBarrierExecutionMode: Boolean): NetworkManager
Create a NetworkManager, which will encapsulate all network operations.
Create a NetworkManager, which will encapsulate all network operations. This method will opens a socket communications channel on the driver, and then initialize the network manager itself. The NetworkManager object will start a thread that waits for the host:port from the executors, and then sends back the information to the executors.
- numTasks
The total number of training tasks to wait for.
- spark
The Spark session.
- driverListenPort
The port to listen for the driver on.
- timeout
The timeout (in seconds).
- useBarrierExecutionMode
Whether to use barrier mode.
- returns
The NetworkTopology.
-
def
getGlobalNetworkInfo(ctx: TrainingContext, log: Logger, taskId: Long, partitionId: Int, shouldExecuteTraining: Boolean, measures: TaskInstrumentationMeasures): NetworkTopologyInfo
Retrieve the network nodes and current port information from the driver.
Retrieve the network nodes and current port information from the driver.
Establish local socket connection.
Note: Ideally we would start the socket connections in the C layer, this opens us up for race conditions in case other applications open sockets on cluster, but usually this should not be a problem
- ctx
Information about the current training session.
- log
The Logger.
- taskId
The task id.
- partitionId
The partition id.
- shouldExecuteTraining
Whether this task should be a part of the training network.
- measures
Instrumentation for perf measurements.
- returns
Information about the network topology.
-
def
getMainWorkerPort(nodes: String, log: Logger): Int
Gets the main node's port that will return the LightGBM Booster.
Gets the main node's port that will return the LightGBM Booster. Used to minimize network communication overhead in reduce step.
- returns
The main node's port number.
- def initLightGBMNetwork(ctx: PartitionTaskContext, log: Logger, retry: Int = LightGBMConstants.NetworkRetries, delay: Long = LightGBMConstants.InitialDelay): Unit
- def parseWorkerMessage(message: String): TaskMessageInfo