SynapseML 0.10.0
TextFeaturizer implements TextFeaturizer


Public Member Functions | |
| TextFeaturizer () | |
| Creates a TextFeaturizer without any parameters. | |
| TextFeaturizer (string uid) | |
| Creates a TextFeaturizer with a UID that is used to give the TextFeaturizer a unique ID. | |
| TextFeaturizer | SetBinary (bool value) |
| Sets value for binary | |
| TextFeaturizer | SetCaseSensitiveStopWords (bool value) |
| Sets value for caseSensitiveStopWords | |
| TextFeaturizer | SetDefaultStopWordLanguage (string value) |
| Sets value for defaultStopWordLanguage | |
| TextFeaturizer | SetInputCol (string value) |
| Sets value for inputCol | |
| TextFeaturizer | SetMinDocFreq (int value) |
| Sets value for minDocFreq | |
| TextFeaturizer | SetMinTokenLength (int value) |
| Sets value for minTokenLength | |
| TextFeaturizer | SetNGramLength (int value) |
| Sets value for nGramLength | |
| TextFeaturizer | SetNumFeatures (int value) |
| Sets value for numFeatures | |
| TextFeaturizer | SetOutputCol (string value) |
| Sets value for outputCol | |
| TextFeaturizer | SetStopWords (string value) |
| Sets value for stopWords | |
| TextFeaturizer | SetToLowercase (bool value) |
| Sets value for toLowercase | |
| TextFeaturizer | SetTokenizerGaps (bool value) |
| Sets value for tokenizerGaps | |
| TextFeaturizer | SetTokenizerPattern (string value) |
| Sets value for tokenizerPattern | |
| TextFeaturizer | SetUseIDF (bool value) |
| Sets value for useIDF | |
| TextFeaturizer | SetUseNGram (bool value) |
| Sets value for useNGram | |
| TextFeaturizer | SetUseStopWordsRemover (bool value) |
| Sets value for useStopWordsRemover | |
| TextFeaturizer | SetUseTokenizer (bool value) |
| Sets value for useTokenizer | |
| bool | GetBinary () |
| Gets binary value | |
| bool | GetCaseSensitiveStopWords () |
| Gets caseSensitiveStopWords value | |
| string | GetDefaultStopWordLanguage () |
| Gets defaultStopWordLanguage value | |
| string | GetInputCol () |
| Gets inputCol value | |
| int | GetMinDocFreq () |
| Gets minDocFreq value | |
| int | GetMinTokenLength () |
| Gets minTokenLength value | |
| int | GetNGramLength () |
| Gets nGramLength value | |
| int | GetNumFeatures () |
| Gets numFeatures value | |
| string | GetOutputCol () |
| Gets outputCol value | |
| string | GetStopWords () |
| Gets stopWords value | |
| bool | GetToLowercase () |
| Gets toLowercase value | |
| bool | GetTokenizerGaps () |
| Gets tokenizerGaps value | |
| string | GetTokenizerPattern () |
| Gets tokenizerPattern value | |
| bool | GetUseIDF () |
| Gets useIDF value | |
| bool | GetUseNGram () |
| Gets useNGram value | |
| bool | GetUseStopWordsRemover () |
| Gets useStopWordsRemover value | |
| bool | GetUseTokenizer () |
| Gets useTokenizer value | |
| override PipelineModel | Fit (DataFrame dataset) |
| Fits a model to the input data. | |
| void | Save (string path) |
| Saves the object so that it can be loaded later using Load. Note that these objects can be shared with Scala by Loading or Saving in Scala. | |
| JavaMLWriter | Write () |
| |
| JavaMLReader< TextFeaturizer > | Read () |
| Get the corresponding JavaMLReader instance. | |
Static Public Member Functions | |
| static TextFeaturizer | Load (string path) |
| Loads the TextFeaturizer that was previously saved using Save(string). | |
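| Synapse.ML.Featurize.Text.TextFeaturizer.TextFeaturizer | ( | ) | inline |
Creates a TextFeaturizer without any parameters.
| Synapse.ML.Featurize.Text.TextFeaturizer.TextFeaturizer | ( | string | uid | ) | inline |
Creates a TextFeaturizer with a UID that is used to give the TextFeaturizer a unique ID.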
| uid | An immutable unique ID for the object and its derivatives. |
| override PipelineModel Synapse.ML.Featurize.Text.TextFeaturizer.Fit | ( | DataFrame | dataset | ) |
Fits a model to the input data.
| dataset | The DataFrame to fit the model to. |
| bool Synapse.ML.Featurize.Text.TextFeaturizer.GetBinary | ( | ) |
Gets binary value
| bool Synapse.ML.Featurize.Text.TextFeaturizer.GetCaseSensitiveStopWords | ( | ) |
Gets caseSensitiveStopWords value
| string Synapse.ML.Featurize.Text.TextFeaturizer.GetDefaultStopWordLanguage | ( | ) |
Gets defaultStopWordLanguage value
| string Synapse.ML.Featurize.Text.TextFeaturizer.GetInputCol | ( | ) |
Gets inputCol value
| int Synapse.ML.Featurize.Text.TextFeaturizer.GetMinDocFreq | ( | ) |
Gets minDocFreq value
| int Synapse.ML.Featurize.Text.TextFeaturizer.GetMinTokenLength | ( | ) |
Gets minTokenLength value
| int Synapse.ML.Featurize.Text.TextFeaturizer.GetNGramLength | ( | ) |
Gets nGramLength value
| int Synapse.ML.Featurize.Text.TextFeaturizer.GetNumFeatures | ( | ) |
Gets numFeatures value
| string Synapse.ML.Featurize.Text.TextFeaturizer.GetOutputCol | ( | ) |
Gets outputCol value
| string Synapse.ML.Featurize.Text.TextFeaturizer.GetStopWords | ( | ) |
Gets stopWords value
| bool Synapse.ML.Featurize.Text.TextFeaturizer.GetTokenizerGaps | ( | ) |
Gets tokenizerGaps value
| string Synapse.ML.Featurize.Text.TextFeaturizer.GetTokenizerPattern | ( | ) |
Gets tokenizerPattern value
| bool Synapse.ML.Featurize.Text.TextFeaturizer.GetToLowercase | ( | ) |
Gets toLowercase value
| bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseIDF | ( | ) |
Gets useIDF value
| bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseNGram | ( | ) |
Gets useNGram value
| bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseStopWordsRemover | ( | ) |
Gets useStopWordsRemover value
| bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseTokenizer | ( | ) |
Gets useTokenizer value
| static TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.Load | ( | string | path | ) |
Loads the TextFeaturizer that was previously saved using Save(string).
| path | The path the previous TextFeaturizer was saved to |
| JavaMLReader<TextFeaturizer> Synapse.ML.Featurize.Text.TextFeaturizer.Read | ( | ) |
Get the corresponding JavaMLReader instance.
| void Synapse.ML.Featurize.Text.TextFeaturizer.Save | ( | string | path | ) |
Saves the object so that it can be loaded later using Load. Note that these objects can be shared with Scala by Loading or Saving in Scala.
| path | The path to save the object to |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetBinary | ( | bool | value | ) |
Sets value for binary
| value | If true, all nonzero word counts are set to 1 |
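The effect of the binary flag can be pictured with a small, library-independent Python sketch. The helper `term_counts` is hypothetical and is not part of SynapseML; it only shows nonzero counts being clamped to 1.

```python
from collections import Counter

def term_counts(tokens, binary=False):
    """Count each token; with binary=True, clamp every nonzero count to 1."""
    counts = Counter(tokens)
    if binary:
        # Only terms that occur at least once appear in `counts`,
        # so every entry here is a nonzero count.
        return {term: 1 for term in counts}
    return dict(counts)

tokens = ["spark", "ml", "spark", "text", "spark"]
raw_counts = term_counts(tokens)
binary_counts = term_counts(tokens, binary=True)
```

With binary counts a term's presence matters but its repetition does not, which may suit short documents where repeated words carry little extra signal.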
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetCaseSensitiveStopWords | ( | bool | value | ) |
Sets value for caseSensitiveStopWords
| value | Whether to do a case sensitive comparison over the stop words |
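As an illustration of what a case-sensitive versus case-insensitive stop word comparison means, the following Python sketch uses a hypothetical helper, `remove_stop_words`. It is not the library's implementation, only a model of the comparison.

```python
def remove_stop_words(tokens, stop_words, case_sensitive=False):
    """Drop tokens that match a stop word, optionally ignoring case."""
    if case_sensitive:
        blocked = set(stop_words)
        return [t for t in tokens if t not in blocked]
    # Case-insensitive: compare lowercased forms on both sides.
    blocked = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in blocked]

tokens = ["The", "cat", "the"]
insensitive = remove_stop_words(tokens, ["the"])
sensitive = remove_stop_words(tokens, ["the"], case_sensitive=True)
```

In the case-sensitive run, "The" survives because it does not match "the" exactly.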
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetDefaultStopWordLanguage | ( | string | value | ) |
Sets value for defaultStopWordLanguage
| value | Which language to use for the stop word remover; set this to custom to use the stopWords input |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetInputCol | ( | string | value | ) |
Sets value for inputCol
| value | The name of the input column |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetMinDocFreq | ( | int | value | ) |
Sets value for minDocFreq
| value | The minimum number of documents in which a term should appear. |
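A minimal Python sketch of a document-frequency threshold follows. The function names are hypothetical, and how the featurizer treats terms below the threshold (dropped or zero-weighted) is not stated on this page, so the sketch only identifies which terms qualify.

```python
from collections import Counter

def document_frequencies(docs):
    """Return, for each term, the number of documents that contain it."""
    doc_freq = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document.
        doc_freq.update(set(tokens))
    return doc_freq

def terms_meeting_min_doc_freq(docs, min_doc_freq):
    """Return the terms that appear in at least min_doc_freq documents."""
    doc_freq = document_frequencies(docs)
    return {term for term, n in doc_freq.items() if n >= min_doc_freq}

docs = [["spark", "ml"], ["spark", "text", "spark"], ["ml", "spark"]]
kept = terms_meeting_min_doc_freq(docs, min_doc_freq=2)
```

Here "text" appears in only one document, so it falls below a threshold of 2.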
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetMinTokenLength | ( | int | value | ) |
Sets value for minTokenLength
| value | Minimum token length, >= 0. |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetNGramLength | ( | int | value | ) |
Sets value for nGramLength
| value | The length of the n-grams to generate |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetNumFeatures | ( | int | value | ) |
Sets value for numFeatures
| value | Set the number of features to hash each document to |
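Hashing a document to a fixed number of features can be sketched as below. This is an illustration only: the MD5-based `bucket_index` is a stand-in chosen for determinism and is an assumption, not the hash function the library uses, so bucket positions will differ from real output.

```python
import hashlib

def bucket_index(term, num_features):
    """Map a term to a bucket in [0, num_features) with a stable hash."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_features

def hash_document(tokens, num_features):
    """Build a fixed-length count vector by hashing each token to a bucket."""
    vector = [0] * num_features
    for token in tokens:
        vector[bucket_index(token, num_features)] += 1
    return vector

vector = hash_document(["spark", "ml", "spark"], num_features=16)
```

A larger numFeatures value reduces the chance that two different terms collide in one bucket, at the cost of a longer vector.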
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetOutputCol | ( | string | value | ) |
Sets value for outputCol
| value | The name of the output column |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetStopWords | ( | string | value | ) |
Sets value for stopWords
| value | The words to be filtered out. |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetTokenizerGaps | ( | bool | value | ) |
Sets value for tokenizerGaps
| value | Indicates whether regex splits on gaps (true) or matches tokens (false). |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetTokenizerPattern | ( | string | value | ) |
Sets value for tokenizerPattern
| value | Regex pattern used to match delimiters if gaps is true or tokens if gaps is false. |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetToLowercase | ( | bool | value | ) |
Sets value for toLowercase
| value | Indicates whether to convert all characters to lowercase before tokenizing. |
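The interplay of tokenizerPattern, tokenizerGaps, toLowercase, and minTokenLength can be modeled with Python's `re` module. The helper `regex_tokenize` is hypothetical, and dropping empty strings is a simplification of this sketch rather than documented library behavior.

```python
import re

def regex_tokenize(text, pattern=r"\s+", gaps=True,
                   to_lowercase=True, min_token_length=0):
    """Tokenize text by splitting on a pattern or by matching tokens."""
    if to_lowercase:
        # Lowercasing happens before tokenizing.
        text = text.lower()
    if gaps:
        # The pattern describes the delimiters between tokens.
        tokens = re.split(pattern, text)
    else:
        # The pattern describes the tokens themselves.
        tokens = re.findall(pattern, text)
    # Empty strings from splitting are dropped here for simplicity.
    return [t for t in tokens if t and len(t) >= min_token_length]

split_on_gaps = regex_tokenize("Hello, World!", r"\s+", gaps=True)
match_tokens = regex_tokenize("Hello, World!", r"\w+", gaps=False)
long_tokens = regex_tokenize("a bb ccc", min_token_length=2)
```

Splitting on whitespace keeps punctuation attached to tokens, while matching `\w+` strips it, so the choice of gaps mode changes the vocabulary.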
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseIDF | ( | bool | value | ) |
Sets value for useIDF
| value | Whether to scale the Term Frequencies by IDF |
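Scaling term frequencies by IDF can be sketched as follows. The smoothed formula log((N + 1) / (df + 1)) is the one documented for Spark ML's IDF; that TextFeaturizer applies exactly this formula is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Compute a smoothed inverse document frequency for each term."""
    num_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return {term: math.log((num_docs + 1) / (df + 1))
            for term, df in doc_freq.items()}

def tf_idf(tokens, idf):
    """Scale raw term counts of one document by the IDF weights."""
    term_freq = Counter(tokens)
    return {term: count * idf.get(term, 0.0)
            for term, count in term_freq.items()}

docs = [["spark", "ml"], ["spark", "text"], ["spark"]]
idf = idf_weights(docs)
scaled = tf_idf(docs[0], idf)
```

A term that occurs in every document ("spark" here) receives a weight of zero, so it no longer dominates the feature vector.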
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseNGram | ( | bool | value | ) |
Sets value for useNGram
| value | Whether to enumerate n-grams |
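N-gram enumeration over a token sequence can be illustrated with a short Python sketch. The helper `ngrams` is hypothetical, and the assumption that only n-grams of exactly the configured length are produced (joined by a single space) is not confirmed by this page.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams(["text", "featurizer", "example"], 2)
too_short = ngrams(["text"], 2)
```

An input with fewer than n tokens yields no n-grams at all.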
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseStopWordsRemover | ( | bool | value | ) |
Sets value for useStopWordsRemover
| value | Whether to remove stop words from tokenized data |
| TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseTokenizer | ( | bool | value | ) |
Sets value for useTokenizer
| value | Whether to tokenize the input |