SynapseML 0.10.0
TextFeaturizer implements TextFeaturizer
Public Member Functions
TextFeaturizer ()
Creates a TextFeaturizer without any parameters.
TextFeaturizer (string uid)
Creates a TextFeaturizer with a UID that is used to give the TextFeaturizer a unique ID.
TextFeaturizer | SetBinary (bool value)
Sets value for binary
TextFeaturizer | SetCaseSensitiveStopWords (bool value)
Sets value for caseSensitiveStopWords
TextFeaturizer | SetDefaultStopWordLanguage (string value)
Sets value for defaultStopWordLanguage
TextFeaturizer | SetInputCol (string value)
Sets value for inputCol
TextFeaturizer | SetMinDocFreq (int value)
Sets value for minDocFreq
TextFeaturizer | SetMinTokenLength (int value)
Sets value for minTokenLength
TextFeaturizer | SetNGramLength (int value)
Sets value for nGramLength
TextFeaturizer | SetNumFeatures (int value)
Sets value for numFeatures
TextFeaturizer | SetOutputCol (string value)
Sets value for outputCol
TextFeaturizer | SetStopWords (string value)
Sets value for stopWords
TextFeaturizer | SetToLowercase (bool value)
Sets value for toLowercase
TextFeaturizer | SetTokenizerGaps (bool value)
Sets value for tokenizerGaps
TextFeaturizer | SetTokenizerPattern (string value)
Sets value for tokenizerPattern
TextFeaturizer | SetUseIDF (bool value)
Sets value for useIDF
TextFeaturizer | SetUseNGram (bool value)
Sets value for useNGram
TextFeaturizer | SetUseStopWordsRemover (bool value)
Sets value for useStopWordsRemover
TextFeaturizer | SetUseTokenizer (bool value)
Sets value for useTokenizer
bool | GetBinary ()
Gets binary value
bool | GetCaseSensitiveStopWords ()
Gets caseSensitiveStopWords value
string | GetDefaultStopWordLanguage ()
Gets defaultStopWordLanguage value
string | GetInputCol ()
Gets inputCol value
int | GetMinDocFreq ()
Gets minDocFreq value
int | GetMinTokenLength ()
Gets minTokenLength value
int | GetNGramLength ()
Gets nGramLength value
int | GetNumFeatures ()
Gets numFeatures value
string | GetOutputCol ()
Gets outputCol value
string | GetStopWords ()
Gets stopWords value
bool | GetToLowercase ()
Gets toLowercase value
bool | GetTokenizerGaps ()
Gets tokenizerGaps value
string | GetTokenizerPattern ()
Gets tokenizerPattern value
bool | GetUseIDF ()
Gets useIDF value
bool | GetUseNGram ()
Gets useNGram value
bool | GetUseStopWordsRemover ()
Gets useStopWordsRemover value
bool | GetUseTokenizer ()
Gets useTokenizer value
override PipelineModel | Fit (DataFrame dataset)
Fits a model to the input data.
void | Save (string path)
Saves the object so that it can be loaded later using Load. Note that these objects can be shared with Scala by Loading or Saving in Scala.
JavaMLWriter | Write ()
JavaMLReader< TextFeaturizer > | Read ()
Get the corresponding JavaMLReader instance.
Static Public Member Functions
static TextFeaturizer | Load (string path)
Loads the TextFeaturizer that was previously saved using Save(string).
Synapse.ML.Featurize.Text.TextFeaturizer.TextFeaturizer | ( | ) | inline
Creates a TextFeaturizer without any parameters.
Synapse.ML.Featurize.Text.TextFeaturizer.TextFeaturizer | ( | string | uid | ) | inline
Creates a TextFeaturizer with a UID that is used to give the TextFeaturizer a unique ID.
uid | An immutable unique ID for the object and its derivatives. |
override PipelineModel Synapse.ML.Featurize.Text.TextFeaturizer.Fit | ( | DataFrame | dataset | ) |
Fits a model to the input data.
dataset | The DataFrame to fit the model to. |
bool Synapse.ML.Featurize.Text.TextFeaturizer.GetBinary | ( | ) |
Gets binary value
bool Synapse.ML.Featurize.Text.TextFeaturizer.GetCaseSensitiveStopWords | ( | ) |
Gets caseSensitiveStopWords value
string Synapse.ML.Featurize.Text.TextFeaturizer.GetDefaultStopWordLanguage | ( | ) |
Gets defaultStopWordLanguage value
string Synapse.ML.Featurize.Text.TextFeaturizer.GetInputCol | ( | ) |
Gets inputCol value
int Synapse.ML.Featurize.Text.TextFeaturizer.GetMinDocFreq | ( | ) |
Gets minDocFreq value
int Synapse.ML.Featurize.Text.TextFeaturizer.GetMinTokenLength | ( | ) |
Gets minTokenLength value
int Synapse.ML.Featurize.Text.TextFeaturizer.GetNGramLength | ( | ) |
Gets nGramLength value
int Synapse.ML.Featurize.Text.TextFeaturizer.GetNumFeatures | ( | ) |
Gets numFeatures value
string Synapse.ML.Featurize.Text.TextFeaturizer.GetOutputCol | ( | ) |
Gets outputCol value
string Synapse.ML.Featurize.Text.TextFeaturizer.GetStopWords | ( | ) |
Gets stopWords value
bool Synapse.ML.Featurize.Text.TextFeaturizer.GetTokenizerGaps | ( | ) |
Gets tokenizerGaps value
string Synapse.ML.Featurize.Text.TextFeaturizer.GetTokenizerPattern | ( | ) |
Gets tokenizerPattern value
bool Synapse.ML.Featurize.Text.TextFeaturizer.GetToLowercase | ( | ) |
Gets toLowercase value
bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseIDF | ( | ) |
Gets useIDF value
bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseNGram | ( | ) |
Gets useNGram value
bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseStopWordsRemover | ( | ) |
Gets useStopWordsRemover value
bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseTokenizer | ( | ) |
Gets useTokenizer value
static TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.Load | ( | string | path | ) |
Loads the TextFeaturizer that was previously saved using Save(string).
path | The path the previous TextFeaturizer was saved to |
JavaMLReader<TextFeaturizer> Synapse.ML.Featurize.Text.TextFeaturizer.Read | ( | ) |
Get the corresponding JavaMLReader instance.
void Synapse.ML.Featurize.Text.TextFeaturizer.Save | ( | string | path | ) |
Saves the object so that it can be loaded later using Load. Note that these objects can be shared with Scala by Loading or Saving in Scala.
path | The path to save the object to |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetBinary | ( | bool | value | ) |
Sets value for binary
value | If true, all nonzero word counts are set to 1 |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetCaseSensitiveStopWords | ( | bool | value | ) |
Sets value for caseSensitiveStopWords
value | Whether to do a case sensitive comparison over the stop words |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetDefaultStopWordLanguage | ( | string | value | ) |
Sets value for defaultStopWordLanguage
value | Which language to use for the stop word remover. Set this to custom to use the stopWords input. |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetInputCol | ( | string | value | ) |
Sets value for inputCol
value | The name of the input column |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetMinDocFreq | ( | int | value | ) |
Sets value for minDocFreq
value | The minimum number of documents in which a term should appear. |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetMinTokenLength | ( | int | value | ) |
Sets value for minTokenLength
value | Minimum token length, >= 0. |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetNGramLength | ( | int | value | ) |
Sets value for nGramLength
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetNumFeatures | ( | int | value | ) |
Sets value for numFeatures
value | Set the number of features to hash each document to |
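The numFeatures and binary parameters both describe a hashed term-count vector. The following is a minimal Python sketch of that idea, not the library's implementation: the helper name `hash_term_counts` is hypothetical, and `zlib.crc32` is a stand-in hash chosen only because it is deterministic, so the bucket indices it produces should not be expected to match the ones the library computes.

```python
import zlib
from collections import Counter


def hash_term_counts(tokens, num_features, binary=False):
    """Map tokens to a sparse {bucket_index: count} dict (hashing trick sketch)."""
    counts = Counter()
    for token in tokens:
        # Stand-in hash (an assumption); the real hash function may differ.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] += 1
    if binary:
        # binary=True: every nonzero count collapses to 1.
        return {index: 1 for index in counts}
    return dict(counts)


tokens = ["spark", "ml", "spark", "text"]
print(hash_term_counts(tokens, num_features=16))
print(hash_term_counts(tokens, num_features=16, binary=True))
```

A smaller numFeatures value gives a more compact vector but makes it more likely that distinct terms share a bucket.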
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetOutputCol | ( | string | value | ) |
Sets value for outputCol
value | The name of the output column |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetStopWords | ( | string | value | ) |
Sets value for stopWords
value | The words to be filtered out. |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetTokenizerGaps | ( | bool | value | ) |
Sets value for tokenizerGaps
value | Indicates whether regex splits on gaps (true) or matches tokens (false). |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetTokenizerPattern | ( | string | value | ) |
Sets value for tokenizerPattern
value | Regex pattern used to match delimiters if gaps is true or tokens if gaps is false. |
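The difference between the two tokenizerGaps modes is easiest to see side by side. The sketch below is plain Python and only illustrates the documented behaviour; the function `regex_tokenize`, its keyword names, and the default minimum length of 1 are assumptions made for the example, and the pattern is assumed to contain no capturing groups.

```python
import re


def regex_tokenize(text, pattern, gaps=True, to_lowercase=True, min_token_length=1):
    """Tokenize text with a regex, either splitting on gaps or matching tokens."""
    if to_lowercase:
        # Lowercasing happens before tokenizing, as the toLowercase doc states.
        text = text.lower()
    if gaps:
        # gaps=True: the pattern describes the delimiters between tokens.
        tokens = re.split(pattern, text)
    else:
        # gaps=False: the pattern describes the tokens themselves.
        tokens = re.findall(pattern, text)
    # Drop tokens shorter than the minimum length (also removes empty strings).
    return [t for t in tokens if len(t) >= min_token_length]


print(regex_tokenize("Hello, World!", r"\W+", gaps=True))   # ['hello', 'world']
print(regex_tokenize("Hello, World!", r"\w+", gaps=False))  # ['hello', 'world']
```

Both calls yield the same tokens here because `\W+` (delimiters) and `\w+` (tokens) are complementary patterns.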
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetToLowercase | ( | bool | value | ) |
Sets value for toLowercase
value | Indicates whether to convert all characters to lowercase before tokenizing. |
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseIDF | ( | bool | value | ) |
Sets value for useIDF
value | Whether to scale the Term Frequencies by IDF |
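As a rough illustration of how useIDF and minDocFreq interact, the Python sketch below scales term frequencies by an inverse document frequency weight. The smoothed formula log((m + 1) / (df + 1)) is the one documented for Spark MLlib's IDF; that TextFeaturizer applies exactly this formula is an assumption, and the helper names `idf_weights` and `scale_by_idf` are hypothetical.

```python
import math


def idf_weights(doc_term_counts, min_doc_freq=1):
    """Compute an IDF weight per term from a list of {term: count} dicts."""
    num_docs = len(doc_term_counts)
    doc_freq = {}
    for counts in doc_term_counts:
        for term, count in counts.items():
            if count > 0:
                doc_freq[term] = doc_freq.get(term, 0) + 1
    # Terms seen in fewer than min_doc_freq documents get weight 0.
    return {
        term: math.log((num_docs + 1) / (df + 1)) if df >= min_doc_freq else 0.0
        for term, df in doc_freq.items()
    }


def scale_by_idf(counts, weights):
    """Multiply each term frequency by its IDF weight."""
    return {term: tf * weights.get(term, 0.0) for term, tf in counts.items()}


docs = [{"spark": 1, "text": 2}, {"spark": 3}]
weights = idf_weights(docs)
print(scale_by_idf(docs[0], weights))
```

In this example "spark" appears in every document, so its weight is log(3/3) = 0 and it contributes nothing after scaling.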
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseNGram | ( | bool | value | ) |
Sets value for useNGram
value | Whether to enumerate N grams |
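For readers unfamiliar with the term, enumerating n-grams means sliding a window of nGramLength tokens over the token sequence. A minimal Python sketch follows; the function name `ngrams` and the choice to join tokens with a single space are assumptions made for the example.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields no n-grams.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams(["the", "quick", "brown", "fox"], 2))
# ['the quick', 'quick brown', 'brown fox']
```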
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseStopWordsRemover | ( | bool | value | ) |
Sets value for useStopWordsRemover
value | Whether to remove stop words from tokenized data |
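The effect of caseSensitiveStopWords on stop word removal can be sketched in a few lines of Python. This is an illustration only: the helper `remove_stop_words` is hypothetical, and it takes the stop words as a Python list, whereas the format of the string accepted by SetStopWords is not described in this reference.

```python
def remove_stop_words(tokens, stop_words, case_sensitive=False):
    """Filter out tokens that appear in the stop word list."""
    if case_sensitive:
        stop_set = set(stop_words)
        return [t for t in tokens if t not in stop_set]
    # Case-insensitive: compare lowercased tokens against lowercased stop words.
    stop_set = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop_set]


tokens = ["The", "cat", "saw", "the", "dog"]
print(remove_stop_words(tokens, ["the"]))                       # ['cat', 'saw', 'dog']
print(remove_stop_words(tokens, ["the"], case_sensitive=True))  # ['The', 'cat', 'saw', 'dog']
```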
TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseTokenizer | ( | bool | value | ) |
Sets value for useTokenizer
value | Whether to tokenize the input |