com.microsoft.ml.spark.featurize.text
All nonzero word counts are set to 1 when set to true
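The effect of the binary setting can be sketched in plain Python. This is only an illustration under assumptions, not the module's implementation; `term_counts` is a hypothetical helper name.

```python
from collections import Counter

def term_counts(tokens, binary=False):
    """Count tokens; with binary=True every term that occurs is recorded as 1."""
    counts = Counter(tokens)
    if binary:
        # Presence/absence only: any term seen at least once maps to 1.
        return {term: 1 for term in counts}
    return dict(counts)
```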
Indicates whether a case sensitive comparison is performed on stop words.
Specify the language to use for stop word removal. Use the custom setting when using the stopWords input.
The name of the input column
Minimum number of documents in which a term should appear.
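A minimal Python sketch of a minimum document frequency check, assuming each document is already a list of tokens. `terms_meeting_min_doc_freq` is a hypothetical name, and the sketch only identifies qualifying terms; what the module does with terms below the threshold is not stated here.

```python
from collections import Counter

def terms_meeting_min_doc_freq(docs, min_doc_freq):
    """Return the set of terms that appear in at least min_doc_freq documents."""
    doc_freq = Counter()
    for tokens in docs:
        # set() so a term repeated inside one document is counted once.
        doc_freq.update(set(tokens))
    return {term for term, df in doc_freq.items() if df >= min_doc_freq}
```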
Minimum token length; must be 0 or greater.
The size of the Ngrams
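To illustrate what the n-gram size controls, here is a short Python sketch. Joining tokens with a single space is an assumption made for the example, and `ngrams` is a hypothetical helper.

```python
def ngrams(tokens, n):
    """Return all contiguous runs of n tokens, each joined by a space."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A window of size n slides over the token list one position at a time.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Inputs shorter than `n` yield an empty list in this sketch.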
Set the number of features to hash each document to
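The role of the number of features can be shown with a small feature hashing sketch in Python. It uses MD5 from the standard library purely to get a deterministic hash; that choice is an assumption for the example and is not the hash function used by the module. `hash_features` is a hypothetical name.

```python
import hashlib

def hash_features(tokens, num_features):
    """Map tokens into a fixed-length count vector by hashing each token."""
    vector = [0] * num_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Reduce the hash to a bucket index in [0, num_features).
        index = int.from_bytes(digest[:8], "big") % num_features
        vector[index] += 1
    return vector
```

A smaller number of features saves memory but makes collisions (different terms sharing a bucket) more likely.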
The name of the output column
The words to be filtered out. This is a comma-separated list of words, encoded as a single string. For example, "a, the, and".
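A hedged Python sketch of how such a comma-separated string could be decoded and applied, including the case-sensitivity switch. Trimming whitespace around each word is an assumption, and both function names are hypothetical.

```python
def parse_stop_words(encoded):
    """Split a string such as "a, the, and" into a list of words."""
    return [word.strip() for word in encoded.split(",") if word.strip()]

def remove_stop_words(tokens, encoded, case_sensitive=False):
    """Drop tokens that match a stop word, optionally ignoring case."""
    stop_words = parse_stop_words(encoded)
    if case_sensitive:
        stop_set = set(stop_words)
        return [t for t in tokens if t not in stop_set]
    # Case-insensitive: compare lowercased forms on both sides.
    stop_set = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop_set]
```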
Indicates whether to convert all characters to lowercase before tokenizing.
Indicates whether the regex splits on gaps (true) or matches tokens (false)
Regex pattern used to match delimiters if gaps (true) or tokens (false)
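The difference between splitting on gaps and matching tokens can be sketched with Python's `re` module. This is an illustration only: `regex_tokenize` is a hypothetical helper, the pattern is assumed to contain no capturing groups, and lowercasing is applied before the pattern is matched.

```python
import re

def regex_tokenize(text, pattern, gaps=True, to_lowercase=True, min_token_length=0):
    """Tokenize text with a regex that describes either delimiters or tokens."""
    if to_lowercase:
        text = text.lower()
    if gaps:
        # The pattern matches the separators between tokens.
        pieces = re.split(pattern, text)
    else:
        # The pattern matches the tokens themselves.
        pieces = re.findall(pattern, text)
    return [p for p in pieces if len(p) >= min_token_length]
```

For example, `r"\s+"` with `gaps=True` and `r"\w+"` with `gaps=False` both break a sentence into words, but only the second discards punctuation.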
The id of the module
Scale the Term Frequencies by IDF when set to true
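A minimal Python sketch of scaling term frequencies by IDF. It assumes the smoothed form log((m + 1) / (df + 1)), where m is the number of documents and df the number of documents containing the term; whether the module uses exactly this formula is an assumption, and `tf_idf` is a hypothetical name.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf * idf} mapping per tokenized document."""
    m = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {t: math.log((m + 1) / (df + 1)) for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in docs
    ]
```

Under this formula a term that occurs in every document gets weight 0, so only distinguishing terms contribute.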
Enumerate N-grams when set to true
Indicates whether to remove stop words from tokenized data.
Tokenize the input when set to true
Featurize text.