stages

Type Members

class Cacher extends Transformer with Wrappable with DefaultParamsWritable
class ClassBalancer extends Estimator[ClassBalancerModel] with DefaultParamsWritable with HasInputCol with HasOutputCol

An estimator that calculates the weights for balancing a dataset. For example, if the negative class is half the size of the positive class, the weights will be 2 for rows with negative classes and 1 for rows with positive classes. These weights can be used in weighted classifiers and regressors to correct for heavily skewed datasets. The inputCol should contain the class labels, and the outputCol will contain the corresponding weights.
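
The weighting rule in the example above can be sketched in plain Scala. This is only an illustration under an assumption: it takes each weight to be the size of the largest class divided by the size of the row's own class, which reproduces the 2-versus-1 example but is not confirmed to be the library's exact formula. The names `ClassWeightSketch` and `classWeights` are hypothetical.

```scala
object ClassWeightSketch {
  // Assumed rule: weight(label) = (size of the largest class) / (size of this class).
  // Requires a non-empty sequence of labels.
  def classWeights[A](labels: Seq[A]): Map[A, Double] = {
    require(labels.nonEmpty, "labels must not be empty")
    // Count how many rows carry each label.
    val counts = labels.groupBy(identity).map { case (label, rows) => label -> rows.size }
    val largest = counts.values.max.toDouble
    counts.map { case (label, n) => label -> largest / n }
  }

  def main(args: Array[String]): Unit = {
    // Four positive rows (1) and two negative rows (0).
    val labels = Seq(1, 1, 1, 1, 0, 0)
    val weights = classWeights(labels)
    // Pair every row's label with its weight, as an output column would.
    labels.foreach(l => println(s"label=$l weight=${weights(l)}"))
  }
}
```

With four positive and two negative rows, the negative rows receive twice the weight of the positive rows, matching the description.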
class ClassBalancerModel extends Model[ClassBalancerModel] with ConstructorWritable[ClassBalancerModel]
class DropColumns extends Transformer with Wrappable with DefaultParamsWritable

DropColumns takes a dataframe and a list of columns to drop as input and returns a dataframe comprised of only those columns not listed in the input list.
class DynamicBufferedBatcher[T] extends Iterator[List[T]]
class DynamicMiniBatchTransformer extends Transformer with MiniBatchBase
class EnsembleByKey extends Transformer with Wrappable with DefaultParamsWritable
class Explode extends Transformer with HasInputCol with HasOutputCol with Wrappable with DefaultParamsWritable
class FixedBatcher[T] extends Iterator[List[T]]
class FixedBufferedBatcher[T] extends Iterator[List[T]]
class FixedMiniBatchTransformer extends Transformer with MiniBatchBase with HasBatchSize
class FlattenBatch extends Transformer with Wrappable with DefaultParamsWritable
trait HasBatchSize extends Params
trait HasMiniBatcher extends Params
class Lambda extends Transformer with Wrappable with ComplexParamsWritable
trait MiniBatchBase extends Transformer with DefaultParamsWritable with Wrappable
class MultiColumnAdapter extends Estimator[PipelineModel] with Wrappable with ComplexParamsWritable

The MultiColumnAdapter takes a unary pipeline stage and a list of input/output column pairs. After being fit, it applies the pipeline stage to each input column and stores the result in the paired output column.
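
The idea of fitting one unary stage per column pair can be sketched without Spark. Everything below is a hypothetical stand-in: `UnaryStage`, `adapt`, `Indexer`, and the `Map`-based row model are illustration names, not the library's API.

```scala
object MultiColumnAdapterSketch {
  type Row = Map[String, String]

  // Hypothetical stand-in for a unary stage: fitting on one column's values
  // yields a function that transforms a single value of that column.
  trait UnaryStage {
    def fit(values: Seq[String]): String => String
  }

  // Fit the stage once per (input, output) pair, then apply each fitted
  // function to its input column and store the result in its output column.
  def adapt(stage: UnaryStage, pairs: Seq[(String, String)], rows: Seq[Row]): Seq[Row] = {
    val fitted = pairs.map { case (in, out) => (in, out, stage.fit(rows.map(r => r(in)))) }
    rows.map { row =>
      fitted.foldLeft(row) { case (acc, (in, out, f)) => acc + (out -> f(row(in))) }
    }
  }

  // Example stage: replace each distinct value with its position in sorted order.
  object Indexer extends UnaryStage {
    def fit(values: Seq[String]): String => String = {
      val index = values.distinct.sorted.zipWithIndex.toMap
      v => index(v).toString
    }
  }
}
```

Because the stage is fit separately for each pair, each input column gets its own fitted state (here, its own index) rather than sharing one across columns.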
class RenameColumn extends Transformer with Wrappable with DefaultParamsWritable with HasInputCol with HasOutputCol

RenameColumn takes a dataframe with an input and an output column name and returns a dataframe comprised of the original columns with the input column renamed as the output column name.
class Repartition extends Transformer with Wrappable with DefaultParamsWritable

Repartitions the dataset into n partitions.
class SelectColumns extends Transformer with Wrappable with DefaultParamsWritable

SelectColumns takes a dataframe and a list of columns to select as input and returns a dataframe comprised of only those columns listed in the input list.
The columns to be selected are given as a list of column names.
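
SelectColumns and DropColumns are complements of each other: one keeps the listed columns, the other keeps everything else. A minimal sketch on a hypothetical in-memory `Table` (not a Spark DataFrame, and not the library's implementation) makes the relationship explicit.

```scala
object ColumnSelectionSketch {
  // Hypothetical in-memory table: ordered column names plus rows keyed by column name.
  final case class Table(columns: Vector[String], rows: Vector[Map[String, Any]])

  // Keep only the listed columns, in the order they are listed.
  def select(t: Table, cols: Seq[String]): Table = {
    val missing = cols.filterNot(c => t.columns.contains(c))
    require(missing.isEmpty, s"unknown columns: ${missing.mkString(", ")}")
    Table(cols.toVector, t.rows.map(r => cols.map(c => c -> r(c)).toMap))
  }

  // Keep every column that is not listed, preserving the original order.
  def drop(t: Table, cols: Seq[String]): Table =
    select(t, t.columns.filterNot(c => cols.contains(c)))
}
```

In this sketch `drop` is defined in terms of `select`, so dropping a set of columns is the same as selecting the remaining ones.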
class StratifiedRepartition extends Transformer with Wrappable with DefaultParamsWritable with HasLabelCol with HasSeed

StratifiedRepartition repartitions the DataFrame so that every label appears in every partition. This can be necessary in some cases, such as LightGBM multiclass classification, which requires at least one instance of each label to be present on each partition.
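
The guarantee described above (every label on every partition) can be illustrated with a small in-memory sketch. It is an assumption-laden toy, not the library's algorithm: rows of each label are dealt round-robin, and a label with fewer rows than partitions is reused so that no partition misses it. `StratifiedPartitionSketch` and `stratify` are hypothetical names.

```scala
object StratifiedPartitionSketch {
  // Deal the rows of each label round-robin across the partitions. A label with
  // fewer rows than partitions is cycled (a row is reused) so that every
  // partition still receives at least one instance of it.
  def stratify[A, L](rows: Seq[A], numPartitions: Int)(label: A => L): Vector[Vector[A]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val byLabel = rows.groupBy(label).values.toVector
    Vector.tabulate(numPartitions) { p =>
      byLabel.flatMap { group =>
        // Rows of this label whose index falls on partition p.
        val own = group.indices.filter(i => i % numPartitions == p).map(i => group(i))
        if (own.nonEmpty) own else Vector(group(p % group.size))
      }
    }
  }
}
```

For example, `stratify(Seq(("a", 0), ("b", 0), ("c", 0), ("d", 1)), 2)(_._2)` yields two partitions that both contain labels 0 and 1; the single row with label 1 appears in both, which is the price of the guarantee in this sketch.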
class SummarizeData extends Transformer with SummarizeDataParams

Compute summary statistics for the dataset. The following statistics are computed:
- counts
- basic
- sample
- percentiles
- errorThreshold - error threshold for quantiles
trait SummarizeDataParams extends Wrappable with DefaultParamsWritable
class TextPreprocessor extends Transformer with HasInputCol with HasOutputCol with Wrappable with ComplexParamsWritable

TextPreprocessor takes a dataframe and a dictionary that maps (text -> replacement text), scans each cell in the input column, and replaces every substring match with the corresponding value. Longer keys take priority, and matching proceeds from left to right.
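
The matching rule (longest key first, scanning left to right) can be sketched on a single string. This is a simplified illustration that tries keys in order of decreasing length at each position; the package also contains a Trie class, which suggests the actual implementation works differently. `TextReplaceSketch` and `replaceAll` are hypothetical names.

```scala
object TextReplaceSketch {
  def replaceAll(text: String, mapping: Map[String, String]): String = {
    // Try longer keys first so that a key wins over its own prefixes.
    val keys = mapping.keys.filter(_.nonEmpty).toSeq.sortBy(k => -k.length)
    val out = new StringBuilder
    var i = 0
    // Scan left to right; replaced text is never re-scanned.
    while (i < text.length) {
      keys.find(k => text.startsWith(k, i)) match {
        case Some(k) =>
          out ++= mapping(k)
          i += k.length
        case None =>
          out += text.charAt(i)
          i += 1
      }
    }
    out.toString
  }
}
```

Because the output is never re-scanned, a mapping that swaps two words (for instance "happy" to "sad" and "sad" to "happy") swaps them once rather than looping.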
class TimeIntervalBatcher[T] extends Iterator[List[T]]
class TimeIntervalMiniBatchTransformer extends Transformer with MiniBatchBase
class Timer extends Estimator[TimerModel] with TimerParams with ComplexParamsWritable
class TimerModel extends Model[TimerModel] with TimerParams with ConstructorWritable[TimerModel]
trait TimerParams extends Wrappable
class Trie extends Serializable
class UDFTransformer extends Transformer with Wrappable with ComplexParamsWritable with HasInputCol with HasInputCols with HasOutputCol

UDFTransformer takes an input column, an output column, and a UserDefinedFunction, and returns a dataframe comprised of the original columns plus the output column, which holds the result of the udf applied to the input column.

Annotations
@InternalWrapper()
class UnicodeNormalize extends Transformer with HasInputCol with HasOutputCol with Wrappable with ComplexParamsWritable

UnicodeNormalize takes a dataframe and normalizes the Unicode representation.
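
For readers unfamiliar with Unicode normalization, the per-string operation can be shown with the JDK's `java.text.Normalizer`. Whether the transformer uses this class, and which normalization form it defaults to, is not stated here; the wrapper object `UnicodeNormalizeSketch` and its NFKC default are assumptions for illustration.

```scala
import java.text.Normalizer

object UnicodeNormalizeSketch {
  // Normalize one string to the given form (NFC, NFD, NFKC, or NFKD).
  // The NFKC default is an arbitrary choice for this sketch.
  def normalize(s: String, form: Normalizer.Form = Normalizer.Form.NFKC): String =
    Normalizer.normalize(s, form)

  def main(args: Array[String]): Unit = {
    // "e" followed by a combining acute accent composes to a single code point under NFC.
    val composed = normalize("e\u0301", Normalizer.Form.NFC)
    println(composed.length)
    // The "fi" ligature (U+FB01) expands to the two letters "fi" under NFKC.
    println(normalize("\uFB01"))
  }
}
```

Normalizing text this way makes visually identical strings compare equal, which matters before tokenization or dictionary lookups.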

Value Members

object Cacher extends DefaultParamsReadable[Cacher] with Serializable
object ClassBalancer extends DefaultParamsReadable[ClassBalancer] with Serializable
object ClassBalancerModel extends ConstructorReadable[ClassBalancerModel] with Serializable
object DropColumns extends DefaultParamsReadable[DropColumns] with Serializable
object DynamicMiniBatchTransformer extends DefaultParamsReadable[DynamicMiniBatchTransformer] with Serializable
object EnsembleByKey extends DefaultParamsReadable[EnsembleByKey] with Serializable
object Explode extends DefaultParamsReadable[Explode] with Serializable
object FixedMiniBatchTransformer extends DefaultParamsReadable[FixedMiniBatchTransformer] with Serializable
object FlattenBatch extends DefaultParamsReadable[FlattenBatch] with Serializable
object Lambda extends ComplexParamsReadable[Lambda] with Serializable
object MultiColumnAdapter extends ComplexParamsReadable[MultiColumnAdapter] with Serializable
object RenameColumn extends DefaultParamsReadable[RenameColumn] with Serializable
object Repartition extends DefaultParamsReadable[Repartition] with Serializable
object SPConstants

Constants for StratifiedRepartition.
object SelectColumns extends DefaultParamsReadable[SelectColumns] with Serializable
object StratifiedRepartition extends DefaultParamsReadable[DropColumns] with Serializable
object SummarizeData extends DefaultParamsReadable[SummarizeData] with Serializable
object TextPreprocessor extends ComplexParamsReadable[TextPreprocessor] with Serializable
object TimeIntervalMiniBatchTransformer extends DefaultParamsReadable[TimeIntervalMiniBatchTransformer] with Serializable
object Timer extends ComplexParamsReadable[Timer] with Serializable
object TimerModel extends ConstructorReadable[TimerModel] with Serializable
object Trie extends Serializable
object UDFTransformer extends ComplexParamsReadable[UDFTransformer] with Serializable
object UnicodeNormalize extends ComplexParamsReadable[UnicodeNormalize] with Serializable
object udfs
