object DatasetUtils
Type Members
- case class CardinalityTriplet[T](groupCounts: List[Int], currentValue: T, currentCount: Int) extends Product with Serializable
Value Members
- def countCardinality[T](input: Seq[T]): Array[Int]
- def getArrayType(rowsIter: Iterator[Row], matrixType: String, featuresColumn: String): (Iterator[Row], Boolean)
Determines whether to use dense or sparse data, based on the configuration and/or data sampling.
- rowsIter
Iterator of rows.
- matrixType
Matrix type as configured by the user.
- featuresColumn
The name of the features column.
- returns
A reconstructed iterator over the same original rows, and a flag indicating whether the matrix should be sparse or dense.
- def getRowAsDoubleArray(row: Row, columnParams: ColumnParams): Array[Double]
- def sampleRowsForArrayType(rowsIter: Iterator[Row], featuresColumn: String): (Iterator[Row], Boolean)
Samples the first several rows to determine whether to construct a sparse or dense matrix in LightGBM native code.
- rowsIter
Iterator of rows.
- featuresColumn
The name of the features column.
- returns
A reconstructed iterator over the same original rows, and a flag indicating whether the matrix should be sparse or dense.
- def validateGroupColumn(col: String, schema: StructType): Unit
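The "sample, then reconstruct the iterator" behavior described for sampleRowsForArrayType can be illustrated with a minimal, Spark-free sketch. Everything here is an assumption made for illustration and is not the library's implementation: the object SampleSketch, the method sampleAndRestore, the sampleSize parameter, the isSparse predicate, and the majority-vote decision rule are all hypothetical. The sketch only shows how a prefix of an iterator can be inspected without losing any rows.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of DatasetUtils.
object SampleSketch {
  /** Peeks at up to `sampleSize` elements, then returns an iterator that
    * still yields every original element, plus a sparse/dense decision.
    * Assumption: the decision is a simple majority vote over the sample.
    */
  def sampleAndRestore[T](rows: Iterator[T], sampleSize: Int)
                         (isSparse: T => Boolean): (Iterator[T], Boolean) = {
    // Pull the sampled prefix out of the iterator one element at a time,
    // so the remainder of `rows` stays lazy and positioned right after it.
    val sampled = ArrayBuffer.empty[T]
    while (sampled.size < sampleSize && rows.hasNext) {
      sampled += rows.next()
    }

    // Majority vote (assumed rule): sparse only if sparse samples outnumber dense ones.
    val sparseCount = sampled.count(isSparse)
    val useSparse = sparseCount > sampled.size - sparseCount

    // Re-attach the buffered prefix in front of the untouched remainder.
    (sampled.iterator ++ rows, useSparse)
  }
}
```

A caller would pass the row iterator together with a predicate that inspects the features column, then continue with the returned iterator instead of the original one, since the original has already been advanced past the sampled rows.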