package stages
Type Members
- class Cacher extends Transformer with Wrappable with DefaultParamsWritable with BasicLogging
- class ClassBalancer extends Estimator[ClassBalancerModel] with DefaultParamsWritable with HasInputCol with HasOutputCol with Wrappable with BasicLogging
An estimator that calculates the weights for balancing a dataset. For example, if the negative class is half the size of the positive class, the weights will be 2 for rows with negative classes and 1 for rows with positive classes. These weights can be used in weighted classifiers and regressors to correct for heavily skewed datasets. The inputCol should contain the class labels, and the outputCol will contain the corresponding weights.
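The weighting rule in the ClassBalancer description can be illustrated in plain Scala, without Spark. This is a minimal sketch, not the library's implementation: it assumes each label's weight is the size of the largest class divided by the size of that label's class, which reproduces the 2-versus-1 example above. The names `ClassBalancerSketch` and `classWeights` are hypothetical.

```scala
object ClassBalancerSketch {
  // Hypothetical helper (not part of the library): maps each label to
  // (count of the largest class) / (count of this label's class).
  def classWeights[L](labels: Seq[L]): Map[L, Double] =
    if (labels.isEmpty) Map.empty[L, Double]
    else {
      // Count how many rows carry each label.
      val counts: Map[L, Int] =
        labels.groupBy(identity).map { case (label, rows) => label -> rows.size }
      val maxCount = counts.values.max.toDouble
      counts.map { case (label, n) => label -> maxCount / n }
    }

  def main(args: Array[String]): Unit = {
    // Two negative rows (0) and four positive rows (1):
    // the negative class is half the size of the positive class.
    val labels = Seq(0, 0, 1, 1, 1, 1)
    val weights = classWeights(labels)
    // Pair every row's label with its weight, as an output column would.
    labels.map(l => (l, weights(l))).foreach(println)
  }
}
```

With the labels in `main`, the negative rows receive weight 2.0 and the positive rows weight 1.0, matching the description.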
- class ClassBalancerModel extends Model[ClassBalancerModel] with ComplexParamsWritable with Wrappable with HasInputCol with HasOutputCol with BasicLogging
- class Consolidator[T] extends AnyRef
- class DropColumns extends Transformer with Wrappable with DefaultParamsWritable with BasicLogging
DropColumns takes a dataframe and a list of columns to drop as input and returns a dataframe comprised of only those columns not listed in the input list.
- class DynamicBufferedBatcher[T] extends Iterator[List[T]]
- class DynamicMiniBatchTransformer extends Transformer with MiniBatchBase with BasicLogging
- class EnsembleByKey extends Transformer with Wrappable with DefaultParamsWritable with BasicLogging
- class Explode extends Transformer with HasInputCol with HasOutputCol with Wrappable with DefaultParamsWritable with BasicLogging
- class FixedBatcher[T] extends Iterator[List[T]]
- class FixedBufferedBatcher[T] extends Iterator[List[T]]
- class FixedMiniBatchTransformer extends Transformer with MiniBatchBase with HasBatchSize with BasicLogging
- class FlattenBatch extends Transformer with Wrappable with DefaultParamsWritable with BasicLogging
- trait HasBatchSize extends Params
- trait HasMiniBatcher extends Params
- class Lambda extends Transformer with Wrappable with ComplexParamsWritable with BasicLogging
- trait MiniBatchBase extends Transformer with DefaultParamsWritable with Wrappable with BasicLogging
- class MultiColumnAdapter extends Estimator[PipelineModel] with Wrappable with ComplexParamsWritable with BasicLogging
The MultiColumnAdapter takes a unary pipeline stage and a list of input/output column pairs, and applies the pipeline stage to each input column after it has been fit.
- class PartitionConsolidator extends Transformer with ConcurrencyParams with HasInputCol with HasOutputCol with ComplexParamsWritable with BasicLogging
- class RenameColumn extends Transformer with Wrappable with DefaultParamsWritable with HasInputCol with HasOutputCol with BasicLogging
RenameColumn takes a dataframe with an input and an output column name and returns a dataframe comprised of the original columns with the input column renamed as the output column name.
- class Repartition extends Transformer with Wrappable with DefaultParamsWritable with BasicLogging
Partitions the dataset into n partitions.
- class SelectColumns extends Transformer with Wrappable with DefaultParamsWritable with BasicLogging
SelectColumns takes a dataframe and a list of columns to select as input and returns a dataframe comprised of only those columns listed in the input list. The columns to be selected are given as a list of column names.
- class StratifiedRepartition extends Transformer with Wrappable with DefaultParamsWritable with HasLabelCol with HasSeed with BasicLogging
StratifiedRepartition repartitions the DataFrame such that each label is present in each partition. This may be necessary in some cases, such as LightGBM multiclass classification, where at least one instance of each label must be present on each partition.
- class SummarizeData extends Transformer with SummarizeDataParams with BasicLogging
Compute summary statistics for the dataset. The following statistics are computed:
  - counts
  - basic
  - sample
  - percentiles
  - errorThreshold: error threshold for quantiles
- trait SummarizeDataParams extends Wrappable with DefaultParamsWritable
- class TextPreprocessor extends Transformer with HasInputCol with HasOutputCol with Wrappable with ComplexParamsWritable with BasicLogging
TextPreprocessor takes a dataframe and a dictionary that maps (text -> replacement text), scans each cell in the input col and replaces all substring matches with the corresponding value. Priority is given to longer keys and from left to right.
- class TimeIntervalBatcher[T] extends Iterator[List[T]]
- class TimeIntervalMiniBatchTransformer extends Transformer with MiniBatchBase with BasicLogging
- class Timer extends Estimator[TimerModel] with TimerParams with ComplexParamsWritable with BasicLogging
- class TimerModel extends Model[TimerModel] with TimerParams with ComplexParamsWritable with BasicLogging
- trait TimerParams extends Wrappable
- class Trie extends Serializable
- class UDFTransformer extends Transformer with Wrappable with ComplexParamsWritable with HasInputCol with HasInputCols with HasOutputCol with BasicLogging
UDFTransformer takes an input column, an output column, and a UserDefinedFunction, and returns a dataframe comprised of the original columns plus the output column, which holds the result of the UDF applied to the input column.
- class UnicodeNormalize extends Transformer with HasInputCol with HasOutputCol with Wrappable with ComplexParamsWritable with BasicLogging
UnicodeNormalize takes a dataframe and normalizes the unicode representation.
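To make "normalizes the unicode representation" concrete, here is a minimal plain-Scala sketch of what Unicode normalization does to a single string, using the JDK's `java.text.Normalizer`. It is an illustration only and does not show how UnicodeNormalize is implemented or configured: the helper name `normalize`, the default form (NFKD), and the optional lower-casing are all assumptions made for this example.

```scala
import java.text.Normalizer
import java.util.Locale

object UnicodeNormalizeSketch {
  // Hypothetical helper: the default form and the lower-casing flag are
  // assumptions for illustration, not documented library behavior.
  def normalize(
      text: String,
      form: Normalizer.Form = Normalizer.Form.NFKD,
      lower: Boolean = true): String = {
    val normalized = Normalizer.normalize(text, form)
    if (lower) normalized.toLowerCase(Locale.ROOT) else normalized
  }

  def main(args: Array[String]): Unit = {
    val composed = "\u00e9"    // "é" as a single code point
    val decomposed = "e\u0301" // "e" followed by a combining acute accent
    // The two strings render identically but differ code point by code point.
    println(composed == decomposed)
    // After normalization to the same form they compare equal.
    println(normalize(composed) == normalize(decomposed))
  }
}
```

Normalizing both sides to one form before comparing or joining on text is the usual reason to apply a stage like this to a column.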
Value Members
- object Cacher extends DefaultParamsReadable[Cacher] with Serializable
- object ClassBalancer extends DefaultParamsReadable[ClassBalancer] with Serializable
- object ClassBalancerModel extends ComplexParamsReadable[ClassBalancerModel] with Serializable
- object DropColumns extends DefaultParamsReadable[DropColumns] with Serializable
- object DynamicMiniBatchTransformer extends DefaultParamsReadable[DynamicMiniBatchTransformer] with Serializable
- object EnsembleByKey extends DefaultParamsReadable[EnsembleByKey] with Serializable
- object Explode extends DefaultParamsReadable[Explode] with Serializable
- object FixedMiniBatchTransformer extends DefaultParamsReadable[FixedMiniBatchTransformer] with Serializable
- object FlattenBatch extends DefaultParamsReadable[FlattenBatch] with Serializable
- object Lambda extends ComplexParamsReadable[Lambda] with Serializable
- object MultiColumnAdapter extends ComplexParamsReadable[MultiColumnAdapter] with Serializable
- object PartitionConsolidator extends DefaultParamsReadable[PartitionConsolidator] with Serializable
- object RenameColumn extends DefaultParamsReadable[RenameColumn] with Serializable
- object Repartition extends DefaultParamsReadable[Repartition] with Serializable
- object SPConstants
Constants for StratifiedRepartition.
- object SelectColumns extends DefaultParamsReadable[SelectColumns] with Serializable
- object StratifiedRepartition extends DefaultParamsReadable[DropColumns] with Serializable
- object SummarizeData extends DefaultParamsReadable[SummarizeData] with Serializable
- object TextPreprocessor extends ComplexParamsReadable[TextPreprocessor] with Serializable
- object TimeIntervalMiniBatchTransformer extends DefaultParamsReadable[TimeIntervalMiniBatchTransformer] with Serializable
- object Timer extends ComplexParamsReadable[Timer] with Serializable
- object TimerModel extends ComplexParamsReadable[TimerModel] with Serializable
- object Trie extends Serializable
- object UDFTransformer extends ComplexParamsReadable[UDFTransformer] with Serializable
- object UnicodeNormalize extends ComplexParamsReadable[UnicodeNormalize] with Serializable
- object udfs
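
The TextPreprocessor description in Type Members says that priority is given to longer keys and that matching proceeds from left to right. The following plain-Scala sketch shows one possible reading of that rule on a single string: scan from the left, and at each position replace the longest key that matches there. It is not the library's implementation, and the names `TextReplaceSketch` and `replaceAll` are hypothetical.

```scala
object TextReplaceSketch {
  // Hypothetical helper: scans `text` left to right and, at each position,
  // replaces the longest matching key with its mapped value.
  def replaceAll(text: String, replacements: Map[String, String]): String = {
    // Longer keys are tried first; empty keys are ignored to guarantee progress.
    val keys = replacements.keys.filter(_.nonEmpty).toSeq.sortBy(k => -k.length)
    val out = new StringBuilder
    var i = 0
    while (i < text.length) {
      keys.find(k => text.startsWith(k, i)) match {
        case Some(k) =>
          out.append(replacements(k)) // emit the replacement text
          i += k.length               // skip past the matched key
        case None =>
          out.append(text.charAt(i))  // no key matches here, copy the character
          i += 1
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    val mapping = Map("new" -> "old", "new york" -> "NYC")
    // "new york" wins over "new" at position 0 because it is the longer key.
    println(replaceAll("new york is new", mapping))
  }
}
```

A linear scan over sorted keys keeps the sketch short; a prefix tree over the keys would serve the same purpose when the dictionary is large.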