case class SampledData(numRows: Int, numCols: Int) extends Product with Serializable
SampledData: Encapsulates the sampled data needed to initialize a LightGBM dataset.

LightGBM expects sampled data to be an array of vectors, where each feature column has a sparse representation of its non-zero values (i.e. an index vector and a data vector). It also needs an array with one entry per feature giving that feature's element count, so it knows how long each column is.

Since we create sampled data as a self-contained set containing ONLY the sampled rows, the indexes are trivial (0 until #elements). We don't need to maintain the original raw indexes: LightGBM only uses this data to estimate distributions and does not care about raw row indexes.

This class keeps all the indexing in sync, so callers can just push rows of data into it and retrieve the resulting pointers at the end.

Note: the sample row count is not expected to exceed the maximum Int value, so we index with Ints.
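The column-wise layout described above can be sketched in plain Scala. This is only an illustration under stated assumptions: `ColumnwiseSample`, `columnValues`, `columnIndices`, and `ColumnwiseSampleDemo` are hypothetical names, ordinary `ArrayBuffer`s stand in for the native SWIG arrays, and the bookkeeping shown is a guess at the general idea rather than the class's actual implementation.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical pure-Scala stand-in for the native buffers; not the library's code.
final class ColumnwiseSample(val numRows: Int, val numCols: Int) {
  // One growable buffer of non-zero values per feature column.
  private val values = Array.fill(numCols)(ArrayBuffer.empty[Double])
  // The sample-local row index of each stored value, kept in step with `values`.
  private val indices = Array.fill(numCols)(ArrayBuffer.empty[Int])

  // Push one dense row; `index` is the row's position within the sample.
  def pushRow(row: Array[Double], index: Int): Unit = {
    require(row.length == numCols, s"expected $numCols features, got ${row.length}")
    require(index >= 0 && index < numRows, s"row index $index out of range")
    var col = 0
    while (col < numCols) {
      if (row(col) != 0.0) { // only non-zero entries are stored
        values(col) += row(col)
        indices(col) += index
      }
      col += 1
    }
  }

  def columnValues(col: Int): Array[Double] = values(col).toArray
  def columnIndices(col: Int): Array[Int] = indices(col).toArray

  // Number of stored elements per feature column.
  def rowCounts: Array[Int] = values.map(_.length)
}

object ColumnwiseSampleDemo {
  // Build a three-row, three-feature sample; the middle feature is always zero.
  def build(): ColumnwiseSample = {
    val rows = Array(
      Array(1.0, 0.0, 3.0),
      Array(0.0, 0.0, 4.0),
      Array(5.0, 0.0, 0.0)
    )
    val sample = new ColumnwiseSample(rows.length, 3)
    rows.zipWithIndex.foreach { case (row, i) => sample.pushRow(row, i) }
    sample
  }
}
```

In this sketch the per-column counts come out as 2, 0, 2: the all-zero middle feature stores nothing, which is why a separate count per column is needed to know where each column ends.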
Value Members
- def delete(): Unit
- def getRowCounts: SWIGTYPE_p_int
- def getSampleData: SWIGTYPE_p_p_double
- def getSampleIndices: SWIGTYPE_p_p_int
- val numCols: Int
- val numRows: Int
- def pushRow(rowData: SparseVector, index: Int): Unit
- def pushRow(rowData: Array[Double], index: Int): Unit
- def pushRow(rowData: DenseVector, index: Int): Unit
- def pushRow(rowData: Row, index: Int, featureColName: String): Unit
- val rowCounts: IntSwigArray
- val sampleData: DoublePointerSwigArray
- val sampleIndexes: IntPointerSwigArray