SynapseML 0.11.0
Synapse.ML.Featurize.Text.TextFeaturizer Class Reference

TextFeaturizer implements TextFeaturizer


Public Member Functions

 TextFeaturizer ()
 Creates a TextFeaturizer without any parameters.

 TextFeaturizer (string uid)
 Creates a TextFeaturizer with a UID that is used to give the TextFeaturizer a unique ID.
 
TextFeaturizer SetBinary (bool value)
 Sets value for binary

TextFeaturizer SetCaseSensitiveStopWords (bool value)
 Sets value for caseSensitiveStopWords

TextFeaturizer SetDefaultStopWordLanguage (string value)
 Sets value for defaultStopWordLanguage

TextFeaturizer SetInputCol (string value)
 Sets value for inputCol

TextFeaturizer SetMinDocFreq (int value)
 Sets value for minDocFreq

TextFeaturizer SetMinTokenLength (int value)
 Sets value for minTokenLength

TextFeaturizer SetNGramLength (int value)
 Sets value for nGramLength

TextFeaturizer SetNumFeatures (int value)
 Sets value for numFeatures

TextFeaturizer SetOutputCol (string value)
 Sets value for outputCol

TextFeaturizer SetStopWords (string value)
 Sets value for stopWords

TextFeaturizer SetToLowercase (bool value)
 Sets value for toLowercase

TextFeaturizer SetTokenizerGaps (bool value)
 Sets value for tokenizerGaps

TextFeaturizer SetTokenizerPattern (string value)
 Sets value for tokenizerPattern

TextFeaturizer SetUseIDF (bool value)
 Sets value for useIDF

TextFeaturizer SetUseNGram (bool value)
 Sets value for useNGram

TextFeaturizer SetUseStopWordsRemover (bool value)
 Sets value for useStopWordsRemover

TextFeaturizer SetUseTokenizer (bool value)
 Sets value for useTokenizer
 
bool GetBinary ()
 Gets binary value

bool GetCaseSensitiveStopWords ()
 Gets caseSensitiveStopWords value

string GetDefaultStopWordLanguage ()
 Gets defaultStopWordLanguage value

string GetInputCol ()
 Gets inputCol value

int GetMinDocFreq ()
 Gets minDocFreq value

int GetMinTokenLength ()
 Gets minTokenLength value

int GetNGramLength ()
 Gets nGramLength value

int GetNumFeatures ()
 Gets numFeatures value

string GetOutputCol ()
 Gets outputCol value

string GetStopWords ()
 Gets stopWords value

bool GetToLowercase ()
 Gets toLowercase value

bool GetTokenizerGaps ()
 Gets tokenizerGaps value

string GetTokenizerPattern ()
 Gets tokenizerPattern value

bool GetUseIDF ()
 Gets useIDF value

bool GetUseNGram ()
 Gets useNGram value

bool GetUseStopWordsRemover ()
 Gets useStopWordsRemover value

bool GetUseTokenizer ()
 Gets useTokenizer value
 
override PipelineModel Fit (DataFrame dataset)
 Fits a model to the input data.

void Save (string path)
 Saves the object so that it can be loaded later using Load. Note that these objects can be shared with Scala by Loading or Saving in Scala.

JavaMLWriter Write ()
 Returns a JavaMLWriter instance for this ML instance.

JavaMLReader<TextFeaturizer> Read ()
 Get the corresponding JavaMLReader instance.
 

Static Public Member Functions

static TextFeaturizer Load (string path)
 Loads the TextFeaturizer that was previously saved using Save(string).
 

Detailed Description

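TextFeaturizer implements TextFeaturizer, an estimator whose Fit method returns a PipelineModel. Its parameters control tokenization, stop-word removal, n-gram enumeration, feature hashing, and IDF scaling.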

Constructor & Destructor Documentation

◆ TextFeaturizer() [1/2]

Synapse.ML.Featurize.Text.TextFeaturizer.TextFeaturizer ( )
inline

Creates a TextFeaturizer without any parameters.

◆ TextFeaturizer() [2/2]

Synapse.ML.Featurize.Text.TextFeaturizer.TextFeaturizer ( string  uid)
inline

Creates a TextFeaturizer with a UID that is used to give the TextFeaturizer a unique ID.

Parameters
uid: An immutable unique ID for the object and its derivatives.
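
Because every setter listed on this page returns a TextFeaturizer, a freshly constructed instance can be configured with chained calls. The following is a minimal sketch, not an official sample. It assumes the application runs under a .NET for Apache Spark backend with the SynapseML packages available; the column names "text" and "features" and the value 4096 are hypothetical.

```csharp
using System;
using Microsoft.Spark.Sql;
using Synapse.ML.Featurize.Text;

class ConfigureTextFeaturizer
{
    // Builds a configured featurizer using only members documented on this page.
    public static TextFeaturizer Build()
    {
        return new TextFeaturizer()
            .SetInputCol("text")        // hypothetical input column name
            .SetOutputCol("features")   // hypothetical output column name
            .SetUseTokenizer(true)
            .SetUseStopWordsRemover(true)
            .SetUseNGram(false)
            .SetNumFeatures(4096)       // hypothetical value
            .SetUseIDF(true);
    }

    static void Main()
    {
        // Assumption: a Spark session is needed before JVM-backed objects are created.
        SparkSession spark = SparkSession.Builder()
            .AppName("configure-text-featurizer")
            .GetOrCreate();

        TextFeaturizer featurizer = Build();

        // Read two of the values back through the matching getters.
        Console.WriteLine(featurizer.GetInputCol());
        Console.WriteLine(featurizer.GetNumFeatures());

        spark.Stop();
    }
}
```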

Member Function Documentation

◆ Fit()

override PipelineModel Synapse.ML.Featurize.Text.TextFeaturizer.Fit ( DataFrame  dataset)

Fits a model to the input data.

Parameters
dataset: The DataFrame to fit the model to.
Returns
PipelineModel
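
A call to Fit might look like the sketch below. It assumes a running .NET for Apache Spark backend with SynapseML available; the two sample sentences and the column names are made up for illustration. The result is held in a `var` so that the sketch does not have to assume which namespace declares PipelineModel, and nothing is done with the model beyond printing its type, since this page does not document PipelineModel's members.

```csharp
using System;
using Microsoft.Spark.Sql;
using Synapse.ML.Featurize.Text;

class FitTextFeaturizer
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder()
            .AppName("fit-text-featurizer")
            .GetOrCreate();

        // Tiny inline dataset with a single string column named "text" (hypothetical).
        DataFrame docs = spark.Sql(
            "SELECT 'the quick brown fox' AS text " +
            "UNION ALL SELECT 'the lazy dog' AS text");

        TextFeaturizer featurizer = new TextFeaturizer()
            .SetInputCol("text")
            .SetOutputCol("features");

        // Fit returns a PipelineModel built from the configured stages.
        var model = featurizer.Fit(docs);
        Console.WriteLine(model.GetType().FullName);

        spark.Stop();
    }
}
```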

◆ GetBinary()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetBinary ( )

Gets binary value

Returns
binary: If true, all nonzero word counts are set to 1

◆ GetCaseSensitiveStopWords()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetCaseSensitiveStopWords ( )

Gets caseSensitiveStopWords value

Returns
caseSensitiveStopWords: Whether to do a case sensitive comparison over the stop words

◆ GetDefaultStopWordLanguage()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetDefaultStopWordLanguage ( )

Gets defaultStopWordLanguage value

Returns
defaultStopWordLanguage: Which language to use for the stop word remover, set this to custom to use the stopWords input
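
The effect of caseSensitiveStopWords can be illustrated without Spark. The helper below is a hypothetical stand-in written for this page, not the class's own implementation: it filters a token array against a stop list using either an ordinal or a case-insensitive comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StopWordSketch
{
    // Removes every token found in stopWords; caseSensitive selects the comparison.
    public static string[] RemoveStopWords(string[] tokens, string[] stopWords, bool caseSensitive)
    {
        StringComparer comparer = caseSensitive
            ? StringComparer.Ordinal
            : StringComparer.OrdinalIgnoreCase;
        var stopSet = new HashSet<string>(stopWords, comparer);
        return tokens.Where(t => !stopSet.Contains(t)).ToArray();
    }

    static void Main()
    {
        string[] tokens = { "The", "cat", "the" };
        string[] stop = { "the" };

        // Case-sensitive: only the exact match "the" is removed.
        Console.WriteLine(string.Join("|", RemoveStopWords(tokens, stop, true)));   // The|cat
        // Case-insensitive: "The" and "the" are both removed.
        Console.WriteLine(string.Join("|", RemoveStopWords(tokens, stop, false)));  // cat
    }
}
```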

◆ GetInputCol()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetInputCol ( )

Gets inputCol value

Returns
inputCol: The name of the input column

◆ GetMinDocFreq()

int Synapse.ML.Featurize.Text.TextFeaturizer.GetMinDocFreq ( )

Gets minDocFreq value

Returns
minDocFreq: The minimum number of documents in which a term should appear.

◆ GetMinTokenLength()

int Synapse.ML.Featurize.Text.TextFeaturizer.GetMinTokenLength ( )

Gets minTokenLength value

Returns
minTokenLength: Minimum token length, >= 0.

◆ GetNGramLength()

int Synapse.ML.Featurize.Text.TextFeaturizer.GetNGramLength ( )

Gets nGramLength value

Returns
nGramLength: The size of the Ngrams
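
As an illustration of what an n-gram size means, the sketch below enumerates space-joined runs of n consecutive tokens. It is a standalone approximation under the assumption that n-grams are formed this way; the helper name is hypothetical and the code does not call into SynapseML.

```csharp
using System;
using System.Linq;

static class NGramSketch
{
    // Returns every run of n consecutive tokens, joined by a single space.
    // Inputs shorter than n produce an empty result.
    public static string[] NGrams(string[] tokens, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        int count = Math.Max(0, tokens.Length - n + 1);
        return Enumerable.Range(0, count)
            .Select(i => string.Join(" ", tokens.Skip(i).Take(n)))
            .ToArray();
    }

    static void Main()
    {
        string[] tokens = { "the", "quick", "brown", "fox" };
        Console.WriteLine(string.Join("|", NGrams(tokens, 2)));  // the quick|quick brown|brown fox
    }
}
```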

◆ GetNumFeatures()

int Synapse.ML.Featurize.Text.TextFeaturizer.GetNumFeatures ( )

Gets numFeatures value

Returns
numFeatures: Set the number of features to hash each document to
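
Hashing a document to a fixed number of features can be sketched as follows: each token is mapped to one of numFeatures buckets and the bucket's count is incremented, or clamped to 1 when binary is true. The FNV-1a hash used here is only a deterministic stand-in chosen for the example; the hash function the library actually uses is not stated on this page, so bucket indices will differ from real output.

```csharp
using System;
using System.Text;

static class HashingSketch
{
    // FNV-1a over the UTF-8 bytes of the token (stand-in hash, an assumption).
    static uint Fnv1a(string s)
    {
        unchecked
        {
            uint h = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(s))
            {
                h ^= b;
                h *= 16777619;
            }
            return h;
        }
    }

    // Builds a term-count vector of length numFeatures.
    // With binary = true, any bucket that is hit holds 1 regardless of how often.
    public static int[] HashCounts(string[] tokens, int numFeatures, bool binary)
    {
        var counts = new int[numFeatures];
        foreach (string token in tokens)
        {
            int bucket = (int)(Fnv1a(token) % (uint)numFeatures);
            counts[bucket] = binary ? 1 : counts[bucket] + 1;
        }
        return counts;
    }

    static void Main()
    {
        string[] tokens = { "spark", "ml", "spark" };
        Console.WriteLine(string.Join(",", HashCounts(tokens, 8, false)));
        Console.WriteLine(string.Join(",", HashCounts(tokens, 8, true)));
    }
}
```

A smaller numFeatures makes collisions between distinct tokens more likely, which is the usual trade-off of this technique.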

◆ GetOutputCol()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetOutputCol ( )

Gets outputCol value

Returns
outputCol: The name of the output column

◆ GetStopWords()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetStopWords ( )

Gets stopWords value

Returns
stopWords: The words to be filtered out.

◆ GetTokenizerGaps()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetTokenizerGaps ( )

Gets tokenizerGaps value

Returns
tokenizerGaps: Indicates whether regex splits on gaps (true) or matches tokens (false).

◆ GetTokenizerPattern()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetTokenizerPattern ( )

Gets tokenizerPattern value

Returns
tokenizerPattern: Regex pattern used to match delimiters if gaps is true or tokens if gaps is false.

◆ GetToLowercase()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetToLowercase ( )

Gets toLowercase value

Returns
toLowercase: Indicates whether to convert all characters to lowercase before tokenizing.
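
The interaction of tokenizerGaps, tokenizerPattern, toLowercase and minTokenLength can be shown with a small standalone helper. This is a sketch using .NET regular expressions, whose syntax can differ in places from the JVM regex engine the real tokenizer runs on; the helper name and the order of the steps are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexTokenizerSketch
{
    public static string[] Tokenize(
        string text, string pattern, bool gaps, bool toLowercase, int minTokenLength)
    {
        // Lowercasing happens before tokenizing, as the toLowercase description states.
        string input = toLowercase ? text.ToLowerInvariant() : text;

        // gaps = true: the pattern matches delimiters, so split on it.
        // gaps = false: the pattern matches the tokens themselves.
        IEnumerable<string> raw = gaps
            ? Regex.Split(input, pattern)
            : Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value);

        // Drop tokens shorter than the configured minimum.
        return raw.Where(t => t.Length >= minTokenLength).ToArray();
    }

    static void Main()
    {
        string text = "Big data, FAST";
        // Split on whitespace: punctuation stays attached to its token.
        Console.WriteLine(string.Join("|", Tokenize(text, @"\s+", true, true, 0)));   // big|data,|fast
        // Match word characters and keep tokens of length 4 or more.
        Console.WriteLine(string.Join("|", Tokenize(text, @"\w+", false, true, 4)));  // data|fast
    }
}
```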

◆ GetUseIDF()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseIDF ( )

Gets useIDF value

Returns
useIDF: Whether to scale the Term Frequencies by IDF
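
For orientation, IDF scaling down-weights terms that occur in many documents. The sketch below uses the smoothed form idf(t) = ln((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) the number of documents containing term t, and assigns weight 0 to terms seen in fewer than minDocFreq documents. That this is the exact formula applied by the class is an assumption; it is the one used by Spark's IDF stage.

```csharp
using System;
using System.Linq;

static class IdfSketch
{
    // docs: one term-count vector per document, all of the same length.
    public static double[] Idf(int[][] docs, int minDocFreq)
    {
        int m = docs.Length;
        if (m == 0) return new double[0];

        int numTerms = docs[0].Length;
        var idf = new double[numTerms];
        for (int j = 0; j < numTerms; j++)
        {
            // Document frequency: how many documents contain term j at least once.
            int df = docs.Count(d => d[j] > 0);
            idf[j] = df >= minDocFreq ? Math.Log((m + 1.0) / (df + 1.0)) : 0.0;
        }
        return idf;
    }

    static void Main()
    {
        // Term 0 occurs in both documents, term 1 in only one.
        int[][] docs = { new[] { 1, 0 }, new[] { 1, 1 } };
        double[] weights = Idf(docs, 0);
        Console.WriteLine(string.Join(", ", weights));
    }
}
```

A term that appears in every document gets weight ln(1) = 0, so multiplying term frequencies by these weights removes it from the feature vector.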

◆ GetUseNGram()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseNGram ( )

Gets useNGram value

Returns
useNGram: Whether to enumerate N grams

◆ GetUseStopWordsRemover()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseStopWordsRemover ( )

Gets useStopWordsRemover value

Returns
useStopWordsRemover: Whether to remove stop words from tokenized data

◆ GetUseTokenizer()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseTokenizer ( )

Gets useTokenizer value

Returns
useTokenizer: Whether to tokenize the input

◆ Load()

static TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.Load ( string  path)
static

Loads the TextFeaturizer that was previously saved using Save(string).

Parameters
path: The path the previous TextFeaturizer was saved to
Returns
New TextFeaturizer object, loaded from path.

◆ Read()

JavaMLReader<TextFeaturizer> Synapse.ML.Featurize.Text.TextFeaturizer.Read ( )

Get the corresponding JavaMLReader instance.

Returns
a JavaMLReader<TextFeaturizer> instance for this ML instance.

◆ Save()

void Synapse.ML.Featurize.Text.TextFeaturizer.Save ( string  path)

Saves the object so that it can be loaded later using Load. Note that these objects can be shared with Scala by Loading or Saving in Scala.

Parameters
path: The path to save the object to
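
A Save and Load round trip might look like the sketch below. It assumes a running .NET for Apache Spark backend with SynapseML available and a writable temporary directory that Spark resolves as a local path; the directory name and the value 2048 are hypothetical. A fresh path is generated on each run on the assumption that saving to an existing location fails.

```csharp
using System;
using System.IO;
using Microsoft.Spark.Sql;
using Synapse.ML.Featurize.Text;

class SaveLoadTextFeaturizer
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder()
            .AppName("save-load-text-featurizer")
            .GetOrCreate();

        TextFeaturizer featurizer = new TextFeaturizer()
            .SetInputCol("text")       // hypothetical column name
            .SetOutputCol("features")  // hypothetical column name
            .SetNumFeatures(2048);     // hypothetical value

        // Hypothetical target directory, unique per run.
        string path = Path.Combine(Path.GetTempPath(), "text-featurizer-" + Guid.NewGuid());

        featurizer.Save(path);
        TextFeaturizer restored = TextFeaturizer.Load(path);

        // The restored object should report the same parameter values.
        Console.WriteLine(restored.GetNumFeatures());

        spark.Stop();
    }
}
```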

◆ SetBinary()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetBinary ( bool  value)

Sets value for binary

Parameters
value: If true, all nonzero word counts are set to 1
Returns
New TextFeaturizer object

◆ SetCaseSensitiveStopWords()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetCaseSensitiveStopWords ( bool  value)

Sets value for caseSensitiveStopWords

Parameters
value: Whether to do a case sensitive comparison over the stop words
Returns
New TextFeaturizer object

◆ SetDefaultStopWordLanguage()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetDefaultStopWordLanguage ( string  value)

Sets value for defaultStopWordLanguage

Parameters
value: Which language to use for the stop word remover, set this to custom to use the stopWords input
Returns
New TextFeaturizer object

◆ SetInputCol()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetInputCol ( string  value)

Sets value for inputCol

Parameters
value: The name of the input column
Returns
New TextFeaturizer object

◆ SetMinDocFreq()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetMinDocFreq ( int  value)

Sets value for minDocFreq

Parameters
value: The minimum number of documents in which a term should appear.
Returns
New TextFeaturizer object

◆ SetMinTokenLength()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetMinTokenLength ( int  value)

Sets value for minTokenLength

Parameters
value: Minimum token length, >= 0.
Returns
New TextFeaturizer object

◆ SetNGramLength()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetNGramLength ( int  value)

Sets value for nGramLength

Parameters
value: The size of the Ngrams
Returns
New TextFeaturizer object

◆ SetNumFeatures()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetNumFeatures ( int  value)

Sets value for numFeatures

Parameters
value: Set the number of features to hash each document to
Returns
New TextFeaturizer object

◆ SetOutputCol()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetOutputCol ( string  value)

Sets value for outputCol

Parameters
value: The name of the output column
Returns
New TextFeaturizer object

◆ SetStopWords()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetStopWords ( string  value)

Sets value for stopWords

Parameters
value: The words to be filtered out.
Returns
New TextFeaturizer object

◆ SetTokenizerGaps()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetTokenizerGaps ( bool  value)

Sets value for tokenizerGaps

Parameters
value: Indicates whether regex splits on gaps (true) or matches tokens (false).
Returns
New TextFeaturizer object

◆ SetTokenizerPattern()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetTokenizerPattern ( string  value)

Sets value for tokenizerPattern

Parameters
value: Regex pattern used to match delimiters if gaps is true or tokens if gaps is false.
Returns
New TextFeaturizer object

◆ SetToLowercase()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetToLowercase ( bool  value)

Sets value for toLowercase

Parameters
value: Indicates whether to convert all characters to lowercase before tokenizing.
Returns
New TextFeaturizer object

◆ SetUseIDF()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseIDF ( bool  value)

Sets value for useIDF

Parameters
value: Whether to scale the Term Frequencies by IDF
Returns
New TextFeaturizer object

◆ SetUseNGram()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseNGram ( bool  value)

Sets value for useNGram

Parameters
value: Whether to enumerate N grams
Returns
New TextFeaturizer object

◆ SetUseStopWordsRemover()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseStopWordsRemover ( bool  value)

Sets value for useStopWordsRemover

Parameters
value: Whether to remove stop words from tokenized data
Returns
New TextFeaturizer object

◆ SetUseTokenizer()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseTokenizer ( bool  value)

Sets value for useTokenizer

Parameters
value: Whether to tokenize the input
Returns
New TextFeaturizer object
