TextFeaturizer implements TextFeaturizer

Public Member Functions
	TextFeaturizer ()
	Creates a TextFeaturizer without any parameters.

	TextFeaturizer (string uid)
	Creates a TextFeaturizer with a UID that is used to give the TextFeaturizer a unique ID.

TextFeaturizer	SetBinary (bool value)
	Sets value for binary

TextFeaturizer	SetCaseSensitiveStopWords (bool value)
	Sets value for caseSensitiveStopWords

TextFeaturizer	SetDefaultStopWordLanguage (string value)
	Sets value for defaultStopWordLanguage

TextFeaturizer	SetInputCol (string value)
	Sets value for inputCol

TextFeaturizer	SetMinDocFreq (int value)
	Sets value for minDocFreq

TextFeaturizer	SetMinTokenLength (int value)
	Sets value for minTokenLength

TextFeaturizer	SetNGramLength (int value)
	Sets value for nGramLength

TextFeaturizer	SetNumFeatures (int value)
	Sets value for numFeatures

TextFeaturizer	SetOutputCol (string value)
	Sets value for outputCol

TextFeaturizer	SetStopWords (string value)
	Sets value for stopWords

TextFeaturizer	SetToLowercase (bool value)
	Sets value for toLowercase

TextFeaturizer	SetTokenizerGaps (bool value)
	Sets value for tokenizerGaps

TextFeaturizer	SetTokenizerPattern (string value)
	Sets value for tokenizerPattern

TextFeaturizer	SetUseIDF (bool value)
	Sets value for useIDF

TextFeaturizer	SetUseNGram (bool value)
	Sets value for useNGram

TextFeaturizer	SetUseStopWordsRemover (bool value)
	Sets value for useStopWordsRemover

TextFeaturizer	SetUseTokenizer (bool value)
	Sets value for useTokenizer

bool	GetBinary ()
	Gets binary value

bool	GetCaseSensitiveStopWords ()
	Gets caseSensitiveStopWords value

string	GetDefaultStopWordLanguage ()
	Gets defaultStopWordLanguage value

string	GetInputCol ()
	Gets inputCol value

int	GetMinDocFreq ()
	Gets minDocFreq value

int	GetMinTokenLength ()
	Gets minTokenLength value

int	GetNGramLength ()
	Gets nGramLength value

int	GetNumFeatures ()
	Gets numFeatures value

string	GetOutputCol ()
	Gets outputCol value

string	GetStopWords ()
	Gets stopWords value

bool	GetToLowercase ()
	Gets toLowercase value

bool	GetTokenizerGaps ()
	Gets tokenizerGaps value

string	GetTokenizerPattern ()
	Gets tokenizerPattern value

bool	GetUseIDF ()
	Gets useIDF value

bool	GetUseNGram ()
	Gets useNGram value

bool	GetUseStopWordsRemover ()
	Gets useStopWordsRemover value

bool	GetUseTokenizer ()
	Gets useTokenizer value

override PipelineModel	Fit (DataFrame dataset)
	Fits a model to the input data.

void	Save (string path)
	Saves the object so that it can be loaded later using Load. Note that these objects can be shared with Scala by Loading or Saving in Scala.

JavaMLWriter	Write ()
	Returns a JavaMLWriter instance for this ML instance.

JavaMLReader< TextFeaturizer >	Read ()
	Get the corresponding JavaMLReader instance.

Static Public Member Functions
static TextFeaturizer	Load (string path)
	Loads the TextFeaturizer that was previously saved using Save(string).

Detailed Description

TextFeaturizer implements TextFeaturizer

Constructor & Destructor Documentation

◆ TextFeaturizer() [1/2]

Synapse.ML.Featurize.Text.TextFeaturizer.TextFeaturizer ( )

inline

Creates a TextFeaturizer without any parameters.

◆ TextFeaturizer() [2/2]

Synapse.ML.Featurize.Text.TextFeaturizer.TextFeaturizer ( string uid )

inline

Creates a TextFeaturizer with a UID that is used to give the TextFeaturizer a unique ID.

Parameters

uid	An immutable unique ID for the object and its derivatives.

Member Function Documentation

◆ Fit()

override PipelineModel Synapse.ML.Featurize.Text.TextFeaturizer.Fit ( DataFrame dataset )

Fits a model to the input data.

Parameters

dataset The DataFrame to fit the model to.

Returns: PipelineModel

◆ GetBinary()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetBinary ( )

Gets binary value

Returns: binary: If true, all nonzero word counts are set to 1

◆ GetCaseSensitiveStopWords()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetCaseSensitiveStopWords ( )

Gets caseSensitiveStopWords value

Returns: caseSensitiveStopWords: Whether to do a case sensitive comparison over the stop words

◆ GetDefaultStopWordLanguage()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetDefaultStopWordLanguage ( )

Gets defaultStopWordLanguage value

Returns: defaultStopWordLanguage: Which language to use for the stop word remover. Set this to custom to use the stopWords input.

◆ GetInputCol()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetInputCol ( )

Gets inputCol value

Returns: inputCol: The name of the input column

◆ GetMinDocFreq()

int Synapse.ML.Featurize.Text.TextFeaturizer.GetMinDocFreq ( )

Gets minDocFreq value

Returns: minDocFreq: The minimum number of documents in which a term should appear.

◆ GetMinTokenLength()

int Synapse.ML.Featurize.Text.TextFeaturizer.GetMinTokenLength ( )

Gets minTokenLength value

Returns: minTokenLength: Minimum token length, >= 0.

◆ GetNGramLength()

int Synapse.ML.Featurize.Text.TextFeaturizer.GetNGramLength ( )

Gets nGramLength value

Returns: nGramLength: The size of the Ngrams
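As a rough illustration of what nGramLength controls, the sketch below enumerates sliding windows of n consecutive tokens joined by a single space. NGramSketch and NGrams are hypothetical names used only for illustration, and the space-joined output format is an assumption modelled on Spark ML's NGram transformer; the source does not state how TextFeaturizer builds its n-grams internally.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NGramSketch
{
    // Hypothetical helper, not part of the TextFeaturizer API.
    // Returns every run of n consecutive tokens, joined by a space.
    // If there are fewer than n tokens, the result is empty.
    public static List<string> NGrams(IReadOnlyList<string> tokens, int n)
    {
        var result = new List<string>();
        for (int i = 0; i + n <= tokens.Count; i++)
        {
            result.Add(string.Join(" ", tokens.Skip(i).Take(n)));
        }
        return result;
    }
}
```

For example, calling NGramSketch.NGrams(new[] { "the", "quick", "brown", "fox" }, 2) yields "the quick", "quick brown" and "brown fox".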

◆ GetNumFeatures()

int Synapse.ML.Featurize.Text.TextFeaturizer.GetNumFeatures ( )

Gets numFeatures value

Returns: numFeatures: The number of features to hash each document to
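The interplay between numFeatures and binary can be sketched with a plain hashing-trick implementation: each term is hashed into one of numFeatures buckets, and the bucket either accumulates a count or, in binary mode, is set to 1. HashingSketch, HashTerms and the FNV-1a hash are stand-ins chosen for illustration. Spark ML's HashingTF uses MurmurHash3, so the bucket indices produced here will not match real output.

```csharp
using System.Collections.Generic;
using System.Text;

public static class HashingSketch
{
    // 32-bit FNV-1a hash: an illustrative stand-in, not the hash Spark uses.
    private static uint Fnv1a(string term)
    {
        unchecked
        {
            uint hash = 2166136261u;
            foreach (byte b in Encoding.UTF8.GetBytes(term))
            {
                hash ^= b;
                hash *= 16777619u;
            }
            return hash;
        }
    }

    // Hypothetical helper, not part of the TextFeaturizer API.
    // Maps terms to a dense vector of length numFeatures.
    public static double[] HashTerms(IEnumerable<string> terms, int numFeatures, bool binary)
    {
        var vector = new double[numFeatures];
        foreach (string term in terms)
        {
            int index = (int)(Fnv1a(term) % (uint)numFeatures);
            // Binary mode records presence only; otherwise count occurrences.
            vector[index] = binary ? 1.0 : vector[index] + 1.0;
        }
        return vector;
    }
}
```

A smaller numFeatures makes it more likely that distinct terms share a bucket, which is the usual trade-off between vector size and collisions in feature hashing.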

◆ GetOutputCol()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetOutputCol ( )

Gets outputCol value

Returns: outputCol: The name of the output column

◆ GetStopWords()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetStopWords ( )

Gets stopWords value

Returns: stopWords: The words to be filtered out.
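The effect of caseSensitiveStopWords on stop word removal can be illustrated with a small filter. StopWordSketch and RemoveStopWords are hypothetical names. The source does not specify how the single stopWords string encodes multiple words, so this sketch takes an already split collection, and it uses ordinal comparison as a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordSketch
{
    // Hypothetical helper, not part of the TextFeaturizer API.
    // Drops every token that appears in stopWords, comparing with or
    // without regard to case depending on caseSensitive.
    public static string[] RemoveStopWords(
        IEnumerable<string> tokens,
        IEnumerable<string> stopWords,
        bool caseSensitive)
    {
        StringComparer comparer = caseSensitive
            ? StringComparer.Ordinal
            : StringComparer.OrdinalIgnoreCase;
        var stopSet = new HashSet<string>(stopWords, comparer);
        return tokens.Where(t => !stopSet.Contains(t)).ToArray();
    }
}
```

With tokens "The", "cat", "the" and the single stop word "the", a case-insensitive comparison keeps only "cat", while a case-sensitive one keeps "The" and "cat".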

◆ GetTokenizerGaps()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetTokenizerGaps ( )

Gets tokenizerGaps value

Returns: tokenizerGaps: Indicates whether regex splits on gaps (true) or matches tokens (false).

◆ GetTokenizerPattern()

string Synapse.ML.Featurize.Text.TextFeaturizer.GetTokenizerPattern ( )

Gets tokenizerPattern value

Returns: tokenizerPattern: Regex pattern used to match delimiters if gaps is true or tokens if gaps is false.
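The difference between the two tokenizerGaps modes is easiest to see with plain .NET regular expressions: with gaps set to true the pattern describes the delimiters to split on, and with gaps set to false it describes the tokens themselves. TokenizerSketch and Tokenize are hypothetical names, and the ordering of the lowercase and minimum-length steps is an assumption based on the parameter descriptions above, not a statement about the library's internals.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizerSketch
{
    // Hypothetical helper, not part of the TextFeaturizer API.
    public static string[] Tokenize(
        string text,
        string pattern,
        bool gaps,
        bool toLowercase = true,
        int minTokenLength = 0)
    {
        // Lowercase before tokenizing, as the toLowercase description says.
        string input = toLowercase ? text.ToLowerInvariant() : text;

        // gaps == true: the pattern matches delimiters, so split on it.
        // gaps == false: the pattern matches tokens, so collect the matches.
        IEnumerable<string> tokens = gaps
            ? Regex.Split(input, pattern)
            : Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value);

        // Drop tokens shorter than minTokenLength.
        return tokens.Where(t => t.Length >= minTokenLength).ToArray();
    }
}
```

For the input "Hello, World", splitting on the gap pattern \s+ gives "hello," and "world" (the comma stays attached), whereas matching the token pattern \w+ gives "hello" and "world".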

◆ GetToLowercase()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetToLowercase ( )

Gets toLowercase value

Returns: toLowercase: Indicates whether to convert all characters to lowercase before tokenizing.

◆ GetUseIDF()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseIDF ( )

Gets useIDF value

Returns: useIDF: Whether to scale the Term Frequencies by IDF
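To make the useIDF and minDocFreq parameters more concrete, the sketch below computes an inverse document frequency weight using the smoothed formula documented for Spark ML's IDF estimator, log((m + 1) / (df + 1)), where m is the number of documents and df the number of documents containing the term. Whether TextFeaturizer delegates to that estimator is an assumption here, and IdfSketch and Idf are hypothetical names.

```csharp
using System;

public static class IdfSketch
{
    // Hypothetical helper, not part of the TextFeaturizer API.
    // Terms seen in fewer than minDocFreq documents get a weight of 0,
    // which removes them from the scaled feature vector.
    public static double Idf(long numDocs, long docFreq, int minDocFreq = 0)
    {
        if (docFreq < minDocFreq)
        {
            return 0.0;
        }
        return Math.Log((numDocs + 1.0) / (docFreq + 1.0));
    }
}
```

Under this formula a term that appears in every document receives a weight of 0, and rarer terms receive larger weights; the term frequency in each bucket is then multiplied by the weight.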

◆ GetUseNGram()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseNGram ( )

Gets useNGram value

Returns: useNGram: Whether to enumerate N grams

◆ GetUseStopWordsRemover()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseStopWordsRemover ( )

Gets useStopWordsRemover value

Returns: useStopWordsRemover: Whether to remove stop words from tokenized data

◆ GetUseTokenizer()

bool Synapse.ML.Featurize.Text.TextFeaturizer.GetUseTokenizer ( )

Gets useTokenizer value

Returns: useTokenizer: Whether to tokenize the input

◆ Load()

static TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.Load ( string path )

static

Loads the TextFeaturizer that was previously saved using Save(string).

Parameters

path	The path the previous TextFeaturizer was saved to

Returns: New TextFeaturizer object, loaded from path.

◆ Read()

JavaMLReader<TextFeaturizer> Synapse.ML.Featurize.Text.TextFeaturizer.Read ( )

Get the corresponding JavaMLReader instance.

Returns: a JavaMLReader<TextFeaturizer> instance for this ML instance.

◆ Save()

void Synapse.ML.Featurize.Text.TextFeaturizer.Save ( string path )

Saves the object so that it can be loaded later using Load. These objects can also be shared with Scala by loading or saving them in Scala.

Parameters

path	The path to save the object to

◆ SetBinary()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetBinary ( bool value )

Sets value for binary

Parameters

value If true, all nonzero word counts are set to 1

Returns: New TextFeaturizer object

◆ SetCaseSensitiveStopWords()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetCaseSensitiveStopWords ( bool value )

Sets value for caseSensitiveStopWords

Parameters

value Whether to do a case sensitive comparison over the stop words

Returns: New TextFeaturizer object

◆ SetDefaultStopWordLanguage()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetDefaultStopWordLanguage ( string value )

Sets value for defaultStopWordLanguage

Parameters

value Which language to use for the stop word remover. Set this to custom to use the stopWords input.

Returns: New TextFeaturizer object

◆ SetInputCol()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetInputCol ( string value )

Sets value for inputCol

Parameters

value The name of the input column

Returns: New TextFeaturizer object

◆ SetMinDocFreq()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetMinDocFreq ( int value )

Sets value for minDocFreq

Parameters

value The minimum number of documents in which a term should appear.

Returns: New TextFeaturizer object

◆ SetMinTokenLength()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetMinTokenLength ( int value )

Sets value for minTokenLength

Parameters

value Minimum token length, >= 0.

Returns: New TextFeaturizer object

◆ SetNGramLength()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetNGramLength ( int value )

Sets value for nGramLength

Parameters

value The size of the Ngrams

Returns: New TextFeaturizer object

◆ SetNumFeatures()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetNumFeatures ( int value )

Sets value for numFeatures

Parameters

value The number of features to hash each document to

Returns: New TextFeaturizer object

◆ SetOutputCol()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetOutputCol ( string value )

Sets value for outputCol

Parameters

value The name of the output column

Returns: New TextFeaturizer object

◆ SetStopWords()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetStopWords ( string value )

Sets value for stopWords

Parameters

value The words to be filtered out.

Returns: New TextFeaturizer object

◆ SetTokenizerGaps()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetTokenizerGaps ( bool value )

Sets value for tokenizerGaps

Parameters

value Indicates whether regex splits on gaps (true) or matches tokens (false).

Returns: New TextFeaturizer object

◆ SetTokenizerPattern()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetTokenizerPattern ( string value )

Sets value for tokenizerPattern

Parameters

value Regex pattern used to match delimiters if gaps is true or tokens if gaps is false.

Returns: New TextFeaturizer object

◆ SetToLowercase()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetToLowercase ( bool value )

Sets value for toLowercase

Parameters

value Indicates whether to convert all characters to lowercase before tokenizing.

Returns: New TextFeaturizer object

◆ SetUseIDF()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseIDF ( bool value )

Sets value for useIDF

Parameters

value Whether to scale the Term Frequencies by IDF

Returns: New TextFeaturizer object

◆ SetUseNGram()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseNGram ( bool value )

Sets value for useNGram

Parameters

value Whether to enumerate N grams

Returns: New TextFeaturizer object

◆ SetUseStopWordsRemover()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseStopWordsRemover ( bool value )

Sets value for useStopWordsRemover

Parameters

value Whether to remove stop words from tokenized data

Returns: New TextFeaturizer object

◆ SetUseTokenizer()

TextFeaturizer Synapse.ML.Featurize.Text.TextFeaturizer.SetUseTokenizer ( bool value )

Sets value for useTokenizer

Parameters

value Whether to tokenize the input

Returns: New TextFeaturizer object

The documentation for this class was generated from the following file:

synapse/ml/featurize/text/TextFeaturizer.cs
