Class

com.microsoft.ml.spark.vw.featurizer

StringSplitFeaturizer

Related Doc: package featurizer

class StringSplitFeaturizer extends Featurizer

Featurizes a string column by splitting it into tokens and emitting them in VW's native sparse structure: (hash(s(0)):value, hash(s(1)):value, ...), where s(i) is the i-th token.
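The split-then-hash idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `StringSplitSketch` and `splitAndHash` are hypothetical names, `scala.util.hashing.MurmurHash3` stands in for the VW-compatible hasher, the token pattern `(?U)\w+` is assumed, and every token is assumed to get the value 1.0.

```scala
import scala.util.hashing.MurmurHash3

object StringSplitSketch {
  // Assumed token pattern: runs of word characters, with (?U) making \w
  // Unicode-aware. The pattern used by the real class may differ.
  private val tokenPattern = "(?U)\\w+".r

  /** Split `text` into tokens and map each token to (maskedHash, 1.0).
    * MurmurHash3.stringHash is only a stand-in for the VW hasher, and
    * prefixing the token with `columnName` is an assumption. */
  def splitAndHash(text: String,
                   columnName: String,
                   namespaceHash: Int,
                   mask: Int): List[(Int, Double)] =
    tokenPattern.findAllIn(text).toList.map { token =>
      val rawHash = MurmurHash3.stringHash(columnName + token, namespaceHash)
      (rawHash & mask, 1.0) // masking keeps the index inside the feature space
    }
}
```

For instance, `StringSplitSketch.splitAndHash("the quick fox", "text", 0, (1 << 18) - 1)` returns three pairs, one per token, and a token that repeats maps to the same index each time.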

Linear Supertypes
Featurizer, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new StringSplitFeaturizer(fieldIdx: Int, columnName: String, namespaceHash: Int, mask: Int)

    fieldIdx

    input field index.

    columnName

    used as feature name prefix.

    namespaceHash

    pre-hashed namespace.

    mask

    bit mask applied to final hash.
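The page only says that `mask` is applied to the final hash. A common convention, assumed here rather than confirmed by this page, is a mask of the form 2^b - 1 that keeps the lowest b bits of the hash. `MaskSketch`, `maskFor` and `applyMask` are hypothetical names used for illustration.

```scala
object MaskSketch {
  /** Mask that keeps the lowest `numBits` bits (assumed convention).
    * Restricted to 1..30 so the result stays a positive signed Int. */
  def maskFor(numBits: Int): Int = {
    require(numBits >= 1 && numBits <= 30, "numBits must be in [1, 30]")
    (1 << numBits) - 1
  }

  /** Apply the mask to a raw 32-bit hash, which may be negative. */
  def applyMask(rawHash: Int, mask: Int): Int = rawHash & mask
}
```

With 18 bits, `MaskSketch.maskFor(18)` is 262143, and any masked hash falls in the range 0 to 262143 regardless of the sign of the raw hash.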

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. val columnName: String

    used as feature name prefix.

    Definition Classes
    StringSplitFeaturizer → Featurizer
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def featurize(row: Row, indices: ArrayBuilder[Int], values: ArrayBuilder[Double]): Unit

    Featurize a single row.

    row

    input row.

    indices

    output indices.

    values

    output values.

    Definition Classes
    StringSplitFeaturizer → Featurizer
    Note

    This interface isn't very Scala-esque, but it avoids a lot of allocation. Also, due to SparseVector limitations, 64-bit indices are not supported (i.e. indices are signed 32-bit ints).

  10. val fieldIdx: Int

    input field index.

    Definition Classes
    StringSplitFeaturizer → Featurizer
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. lazy val hasher: VowpalWabbitMurmurWithPrefix

    Initialize hasher that already pre-hashes the column prefix.

    Attributes
    protected
    Definition Classes
    Featurizer
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. val mask: Int

    bit mask applied to final hash.

  17. val namespaceHash: Int

    pre-hashed namespace.

  18. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. val nonWhiteSpaces: Regex

    (?U) makes \w Unicode-aware (see https://stackoverflow.com/questions/4304928/unicode-equivalents-for-w-and-b-in-java-regular-expressions). We could follow scikit-learn's CountVectorizer (https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html), but that strips single-character words.

    TODO: expose as user configurable parameter

  20. final def notify(): Unit

    Definition Classes
    AnyRef
  21. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  22. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  23. def toString(): String

    Definition Classes
    AnyRef → Any
  24. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
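The note on featurize explains that appending into caller-supplied builders avoids allocation. The sketch below illustrates that calling pattern only: one pair of builders is shared across several fields, and arrays are materialized once at the end. `BuilderSketch`, `appendTokens` and `featurizeAll` are hypothetical names, whitespace splitting is an assumed tokenization, and `String.hashCode` is a placeholder for the real hasher.

```scala
import scala.collection.mutable

object BuilderSketch {
  // Assumed tokenization: runs of non-whitespace characters.
  private val tokens = "\\S+".r

  /** Append (index, value) entries for one field into caller-owned builders,
    * instead of returning a new collection per call. */
  def appendTokens(text: String,
                   mask: Int,
                   indices: mutable.ArrayBuilder[Int],
                   values: mutable.ArrayBuilder[Double]): Unit =
    tokens.findAllIn(text).foreach { token =>
      indices += (token.hashCode & mask) // placeholder hash, signed 32-bit index
      values += 1.0
    }

  /** Run the appender over several fields with a single pair of builders. */
  def featurizeAll(fields: Seq[String], mask: Int): (Array[Int], Array[Double]) = {
    val indices = new mutable.ArrayBuilder.ofInt
    val values = new mutable.ArrayBuilder.ofDouble
    fields.foreach(field => appendTokens(field, mask, indices, values))
    (indices.result(), values.result())
  }
}
```

The two resulting arrays are parallel: entry i of the index array pairs with entry i of the value array, which is the shape a sparse vector constructor with Int indices typically expects.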
