class TextFeaturizer extends Estimator[PipelineModel] with TextFeaturizerParams with HasInputCol with HasOutputCol with BasicLogging

Featurize text.

Linear Supertypes
BasicLogging, HasOutputCol, HasInputCol, TextFeaturizerParams, DefaultParamsWritable, MLWritable, Wrappable, DotnetWrappable, RWrappable, PythonWrappable, BaseWrappable, Estimator[PipelineModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new TextFeaturizer()
  2. new TextFeaturizer(uid: String)

    uid

    The id of the module
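
A minimal end-to-end sketch of constructing and fitting the estimator, using only the constructors and setters listed on this page. The import path `com.microsoft.azure.synapse.ml.featurize.text` is an assumption (this page does not state the package), as are the column names, the sample sentences, and the chosen parameter values. It needs Spark and the library on the classpath.

```scala
import org.apache.spark.sql.SparkSession
// Assumed package; adjust to wherever TextFeaturizer lives in your build.
import com.microsoft.azure.synapse.ml.featurize.text.TextFeaturizer

object TextFeaturizerExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TextFeaturizerExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with a single string column.
    val docs = Seq(
      "Hi I heard about Spark",
      "I wish Java could use case classes",
      "Logistic regression models are neat"
    ).toDF("text")

    // Each setter returns this.type, so calls can be chained.
    val featurizer = new TextFeaturizer()
      .setInputCol("text")
      .setOutputCol("features")
      .setUseStopWordsRemover(true)
      .setUseIDF(true)
      .setMinDocFreq(1)
      .setNumFeatures(1 << 10)

    // fit returns a PipelineModel, which is then applied like any transformer.
    val model = featurizer.fit(docs)
    model.transform(docs).select("text", "features").show(truncate = false)

    spark.stop()
  }
}
```

The no-argument constructor generates a uid; pass a uid explicitly with `new TextFeaturizer(uid)` when a stable identifier is needed.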

Value Members

  1. val binary: BooleanParam

    All nonzero word counts are set to 1 when set to true.

    Definition Classes
    TextFeaturizerParams
  2. val caseSensitiveStopWords: BooleanParam

    Indicates whether a case sensitive comparison is performed on stop words.

    Definition Classes
    TextFeaturizerParams
  3. final def clear(param: Param[_]): TextFeaturizer.this.type
    Definition Classes
    Params
  4. def copy(extra: ParamMap): TextFeaturizer.this.type
    Definition Classes
    TextFeaturizer → Estimator → PipelineStage → Params
  5. val defaultStopWordLanguage: Param[String]

    Specify the language to use for stop word removal. Use the custom setting when using the stopWords input.

    Definition Classes
    TextFeaturizerParams
  6. def dotnetAdditionalMethods: String
    Definition Classes
    DotnetWrappable
  7. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  8. def explainParams(): String
    Definition Classes
    Params
  9. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  10. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  11. def fit(dataset: Dataset[_]): PipelineModel
    Definition Classes
    TextFeaturizer → Estimator
  12. def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[PipelineModel]
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  13. def fit(dataset: Dataset[_], paramMap: ParamMap): PipelineModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  14. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): PipelineModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  15. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  16. final def getBinary: Boolean

    Definition Classes
    TextFeaturizerParams
  17. final def getCaseSensitiveStopWords: Boolean

    Definition Classes
    TextFeaturizerParams
  18. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  19. final def getDefaultStopWordLanguage: String

    Definition Classes
    TextFeaturizerParams
  20. def getInputCol: String

    Definition Classes
    HasInputCol
  21. final def getMinDocFreq: Int

    Definition Classes
    TextFeaturizerParams
  22. final def getMinTokenLength: Int

    Definition Classes
    TextFeaturizerParams
  23. final def getNGramLength: Int

    Definition Classes
    TextFeaturizerParams
  24. final def getNumFeatures: Int

    Definition Classes
    TextFeaturizerParams
  25. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  26. def getOutputCol: String

    Definition Classes
    HasOutputCol
  27. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  28. def getParamInfo(p: Param[_]): ParamInfo[_]
    Definition Classes
    BaseWrappable
  29. final def getStopWords: String

    Definition Classes
    TextFeaturizerParams
  30. final def getToLowercase: Boolean

    Definition Classes
    TextFeaturizerParams
  31. final def getTokenizerGaps: Boolean

    Definition Classes
    TextFeaturizerParams
  32. final def getTokenizerPattern: String

    Definition Classes
    TextFeaturizerParams
  33. final def getUseIDF: Boolean

    Definition Classes
    TextFeaturizerParams
  34. final def getUseNGram: Boolean

    Definition Classes
    TextFeaturizerParams
  35. final def getUseStopWordsRemover: Boolean

    Definition Classes
    TextFeaturizerParams
  36. final def getUseTokenizer: Boolean

    Definition Classes
    TextFeaturizerParams
  37. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  38. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  39. val inputCol: Param[String]

    The name of the input column

    Definition Classes
    HasInputCol
  40. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  41. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  42. def logClass(): Unit
    Definition Classes
    BasicLogging
  43. def logFit[T](f: ⇒ T): T
    Definition Classes
    BasicLogging
  44. def logPredict[T](f: ⇒ T): T
    Definition Classes
    BasicLogging
  45. def logTrain[T](f: ⇒ T): T
    Definition Classes
    BasicLogging
  46. def logTransform[T](f: ⇒ T): T
    Definition Classes
    BasicLogging
  47. def logVerb[T](verb: String, f: ⇒ T): T
    Definition Classes
    BasicLogging
  48. def makeDotnetFile(conf: CodegenConfig): Unit
    Definition Classes
    DotnetWrappable
  49. def makePyFile(conf: CodegenConfig): Unit
    Definition Classes
    PythonWrappable
  50. def makeRFile(conf: CodegenConfig): Unit
    Definition Classes
    RWrappable
  51. val minDocFreq: IntParam

    Minimum number of documents in which a term should appear.

    Definition Classes
    TextFeaturizerParams
  52. val minTokenLength: IntParam

    Minimum token length; must be 0 or greater.

    Definition Classes
    TextFeaturizerParams
  53. val nGramLength: IntParam

    The size of the n-grams

    Definition Classes
    TextFeaturizerParams
  54. val numFeatures: IntParam

    Set the number of features to hash each document to

    Definition Classes
    TextFeaturizerParams
  55. val outputCol: Param[String]

    The name of the output column

    Definition Classes
    HasOutputCol
  56. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  57. def pyAdditionalMethods: String
    Definition Classes
    PythonWrappable
  58. def pyInitFunc(): String
    Definition Classes
    PythonWrappable
  59. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  60. final def set[T](param: Param[T], value: T): TextFeaturizer.this.type
    Definition Classes
    Params
  61. def setBinary(value: Boolean): TextFeaturizer.this.type

  62. def setCaseSensitiveStopWords(value: Boolean): TextFeaturizer.this.type

  63. def setDefaultStopWordLanguage(value: String): TextFeaturizer.this.type

  64. def setInputCol(value: String): TextFeaturizer.this.type

    Definition Classes
    HasInputCol
  65. def setMinDocFreq(value: Int): TextFeaturizer.this.type

  66. def setMinTokenLength(value: Int): TextFeaturizer.this.type

  67. def setNGramLength(value: Int): TextFeaturizer.this.type

  68. def setNumFeatures(value: Int): TextFeaturizer.this.type

  69. def setOutputCol(value: String): TextFeaturizer.this.type

    Definition Classes
    HasOutputCol
  70. def setStopWords(value: String): TextFeaturizer.this.type

  71. def setToLowercase(value: Boolean): TextFeaturizer.this.type

  72. def setTokenizerGaps(value: Boolean): TextFeaturizer.this.type

  73. def setTokenizerPattern(value: String): TextFeaturizer.this.type

  74. def setUseIDF(value: Boolean): TextFeaturizer.this.type

  75. def setUseNGram(value: Boolean): TextFeaturizer.this.type

  76. def setUseStopWordsRemover(value: Boolean): TextFeaturizer.this.type

  77. def setUseTokenizer(value: Boolean): TextFeaturizer.this.type
  78. val stopWords: Param[String]

    The words to be filtered out. This is a comma-separated list of words, encoded as a single string. For example, "a, the, and".

    Definition Classes
    TextFeaturizerParams
  79. val toLowercase: BooleanParam

    Indicates whether to convert all characters to lowercase before tokenizing.

    Definition Classes
    TextFeaturizerParams
  80. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  81. val tokenizerGaps: BooleanParam

    Indicates whether the regex splits on gaps (true) or matches tokens (false)

    Definition Classes
    TextFeaturizerParams
  82. val tokenizerPattern: Param[String]

    Regex pattern used to match delimiters if tokenizerGaps is true, or to match tokens if it is false

    Definition Classes
    TextFeaturizerParams
  83. def transformSchema(schema: StructType): StructType
    Definition Classes
    TextFeaturizer → PipelineStage
  84. val uid: String
    Definition Classes
    TextFeaturizer → BasicLogging → Identifiable
  85. val useIDF: BooleanParam

    Scale the term frequencies by IDF when set to true

    Definition Classes
    TextFeaturizerParams
  86. val useNGram: BooleanParam

    Enumerate n-grams when set to true

    Definition Classes
    TextFeaturizerParams
  87. val useStopWordsRemover: BooleanParam

    Indicates whether to remove stop words from tokenized data.

    Definition Classes
    TextFeaturizerParams
  88. val useTokenizer: BooleanParam

    Tokenize the input when set to true

    Definition Classes
    TextFeaturizerParams
  89. val ver: String
    Definition Classes
    BasicLogging
  90. def write: MLWriter
    Definition Classes
    DefaultParamsWritable → MLWritable
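
The difference between the two tokenizerGaps modes can be illustrated in plain Scala, without Spark. This is only a sketch of the semantics described for tokenizerGaps and tokenizerPattern, not the library's implementation: the helper names `splitOnGaps` and `matchTokens` are hypothetical, and the sketch ignores toLowercase and minTokenLength.

```scala
object TokenizerModes {
  // tokenizerGaps = true: the pattern describes delimiters, so split on it.
  def splitOnGaps(text: String, pattern: String): List[String] =
    pattern.r.split(text).toList.filter(_.nonEmpty)

  // tokenizerGaps = false: the pattern describes tokens, so collect every match.
  def matchTokens(text: String, pattern: String): List[String] =
    pattern.r.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    val text = "spark-ml is  fun"
    // Splitting on runs of whitespace keeps "spark-ml" as one token.
    println(splitOnGaps(text, "\\s+")) // List(spark-ml, is, fun)
    // Matching word characters breaks "spark-ml" at the hyphen.
    println(matchTokens(text, "\\w+")) // List(spark, ml, is, fun)
  }
}
```

The same pattern gives very different tokens in the two modes, so tokenizerPattern and tokenizerGaps should be set together.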