mmlspark.cognitive package

Submodules

mmlspark.cognitive.AddDocuments module

class mmlspark.cognitive.AddDocuments.AddDocuments(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, indexName=None, outputCol=None, serviceName=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • actionCol (str) – Name of the column that holds the action to apply to each document (default: @search.action). You can combine actions, such as an upload and a delete, in the same batch. Supported actions:

      • upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. All fields are replaced in the update case.

      • merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of ‘tags’ is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].

      • mergeOrUpload: Behaves like merge if a document with the given key already exists in the index, and like upload with a new document otherwise.

      • delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.

  • batchSize (int) – The max size of the buffer (default: 100)

  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • indexName (str) – name of the search index to write documents to

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • serviceName (str) – name of the search service that hosts the index

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
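
The four action values described for actionCol can be illustrated with a toy in-memory index. This is only a sketch of the documented semantics: `apply_action`, the `id` key field, and the dict-based index are hypothetical names chosen for illustration, and nothing here calls the search service or mmlspark.

```python
def apply_action(index, doc, key_field="id", action_col="@search.action"):
    """Apply one document's action to a dict-based toy index (key -> fields)."""
    action = doc[action_col]
    key = doc[key_field]
    # Everything except the action marker is treated as document content.
    fields = {k: v for k, v in doc.items() if k != action_col}

    if action == "upload":
        # Insert, or replace every field of an existing document.
        index[key] = fields
    elif action == "merge":
        # Merge fails when the document does not exist.
        if key not in index:
            raise KeyError(f"merge failed: no document with key {key!r}")
        # Specified fields overwrite existing ones, collections included.
        index[key].update(fields)
    elif action == "mergeOrUpload":
        if key in index:
            index[key].update(fields)
        else:
            index[key] = fields
    elif action == "delete":
        # Only the key matters; any other field is ignored.
        # Silently skipping a missing key is a choice of this toy model.
        index.pop(key, None)
    else:
        raise ValueError(f"unknown action: {action!r}")
    return index


index = {}
apply_action(index, {"@search.action": "upload", "id": "1",
                     "tags": ["budget"], "name": "Inn"})
apply_action(index, {"@search.action": "merge", "id": "1",
                     "tags": ["economy", "pool"]})
print(index["1"]["tags"])   # ['economy', 'pool']
```

The merge replaces the whole ‘tags’ collection rather than appending to it, while the untouched ‘name’ field survives.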

getActionCol()[source]
Returns

Name of the column that holds the action to apply to each document (default: @search.action). You can combine actions, such as an upload and a delete, in the same batch. Supported actions:

  • upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. All fields are replaced in the update case.

  • merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of ‘tags’ is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].

  • mergeOrUpload: Behaves like merge if a document with the given key already exists in the index, and like upload with a new document otherwise.

  • delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.

Return type

str

getBatchSize()[source]
Returns

The max size of the buffer (default: 100)

Return type

int

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getIndexName()[source]
Returns

name of the search index to write documents to

Return type

str

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getServiceName()[source]
Returns

name of the search service that hosts the index

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setActionCol(value)[source]
Parameters

actionCol (str) – Name of the column that holds the action to apply to each document (default: @search.action). You can combine actions, such as an upload and a delete, in the same batch. Supported actions:

  • upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. All fields are replaced in the update case.

  • merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of ‘tags’ is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].

  • mergeOrUpload: Behaves like merge if a document with the given key already exists in the index, and like upload with a new document otherwise.

  • delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.

setBatchSize(value)[source]
Parameters

batchSize (int) – The max size of the buffer (default: 100)

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setIndexName(value)[source]
Parameters

indexName (str) – name of the search index to write documents to

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, indexName=None, outputCol=None, serviceName=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • actionCol (str) – Name of the column that holds the action to apply to each document (default: @search.action). You can combine actions, such as an upload and a delete, in the same batch. Supported actions:

      • upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. All fields are replaced in the update case.

      • merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of ‘tags’ is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].

      • mergeOrUpload: Behaves like merge if a document with the given key already exists in the index, and like upload with a new document otherwise.

      • delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.

  • batchSize (int) – The max size of the buffer (default: 100)

  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • indexName (str) – name of the search index to write documents to

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • serviceName (str) – name of the search service that hosts the index

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
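
To make the role of batchSize concrete, the sketch below splits documents into request bodies of at most batchSize entries. It assumes that batchSize caps the number of documents sent per request and that each body wraps the documents in a top-level `value` array, as the Azure Search REST API does; `make_batches` is a hypothetical helper, not part of mmlspark.

```python
import json


def make_batches(docs, batch_size=100):
    """Yield JSON request bodies holding at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(docs), batch_size):
        # Each document already carries its own "@search.action" field.
        yield json.dumps({"value": docs[start:start + batch_size]})


docs = [{"@search.action": "upload", "id": str(i)} for i in range(250)]
bodies = list(make_batches(docs, batch_size=100))
print([len(json.loads(b)["value"]) for b in bodies])   # [100, 100, 50]
```

Because the action travels with each document, one batch can mix uploads, merges, and deletes.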

setServiceName(value)[source]
Parameters

serviceName (str) – name of the search service that hosts the index

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.AnalyzeImage module

class mmlspark.cognitive.AnalyzeImage.AnalyzeImage(concurrency=1, concurrentTimeout=100.0, details=None, errorCol=None, handler=None, imageBytes=None, imageUrl=None, language=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None, visualFeatures=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • details (object) – what domain-specific details to return (for example, Celebrities or Landmarks)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – the language of the response (en if none given) (default: ServiceParamData(None,Some(en)))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

  • visualFeatures (object) – what visual feature types to return
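
As a rough picture of how visualFeatures, details, and language reach the service, the sketch below builds an analyze request URL. It assumes the service accepts visualFeatures and details as comma-separated query parameters, as the Computer Vision REST API does; `analyze_url` and the `example.invalid` base URL are placeholders, and the transformer's real request construction is not shown in this reference.

```python
from urllib.parse import urlencode


def analyze_url(base_url, visual_features=None, details=None, language="en"):
    """Build a request URL; list-valued options become comma-separated strings."""
    params = {}
    if visual_features:
        params["visualFeatures"] = ",".join(visual_features)
    if details:
        params["details"] = ",".join(details)
    # Mirrors the documented default of "en" when no language is given.
    params["language"] = language
    # safe="," keeps the commas readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=",")


url = analyze_url("https://example.invalid/vision/analyze",
                  visual_features=["Categories", "Tags"],
                  details=["Landmarks"])
print(url)
# https://example.invalid/vision/analyze?visualFeatures=Categories,Tags&details=Landmarks&language=en
```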

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getDetails()[source]
Returns

what domain-specific details to return (for example, Celebrities or Landmarks)

Return type

object

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

object

getImageUrl()[source]
Returns

the url of the image to use

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language of the response (en if none given) (default: ServiceParamData(None,Some(en)))

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

getVisualFeatures()[source]
Returns

what visual feature types to return

Return type

object

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setDetails(value)[source]
Parameters

details (object) – what domain-specific details to return (for example, Celebrities or Landmarks)

setDetailsCol(value)[source]
Parameters

details (object) – what domain-specific details to return (for example, Celebrities or Landmarks)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setImageBytes(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setLanguage(value)[source]
Parameters

language (object) – the language of the response (en if none given) (default: ServiceParamData(None,Some(en)))

setLanguageCol(value)[source]
Parameters

language (object) – the language of the response (en if none given) (default: ServiceParamData(None,Some(en)))

setLocation(value)[source]

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, details=None, errorCol=None, handler=None, imageBytes=None, imageUrl=None, language=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None, visualFeatures=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • details (object) – what domain-specific details to return (for example, Celebrities or Landmarks)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – the language of the response (en if none given) (default: ServiceParamData(None,Some(en)))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

  • visualFeatures (object) – what visual feature types to return

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

setVisualFeatures(value)[source]
Parameters

visualFeatures (object) – what visual feature types to return

setVisualFeaturesCol(value)[source]
Parameters

visualFeatures (object) – what visual feature types to return
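
Several parameters of this class come as a pair of setters, such as setLanguage and setLanguageCol. The toy model below shows one plausible reading of that pairing: the plain setter fixes a single value for every row, the Col setter names a column that is read per row, and the documented default applies when neither is set. This is an assumption drawn from the method names; `resolve_param` is hypothetical and is not the library's resolution logic.

```python
def resolve_param(row, static_value=None, column=None, default=None):
    """Pick the value a service parameter would take for one row (toy model)."""
    if column is not None:
        # Col-style setter: read the value from the named column of this row.
        return row[column]
    if static_value is not None:
        # Plain setter: the same value for every row.
        return static_value
    # Nothing set: fall back to the default, e.g. "en" for language.
    return default


rows = [
    {"url": "https://example.invalid/a.jpg", "lang": "zh"},
    {"url": "https://example.invalid/b.jpg", "lang": "en"},
]
print([resolve_param(r, column="lang", default="en") for r in rows])       # ['zh', 'en']
print([resolve_param(r, static_value="ja", default="en") for r in rows])   # ['ja', 'ja']
print([resolve_param(r, default="en") for r in rows])                      # ['en', 'en']
```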

mmlspark.cognitive.AzureSearchWriter module

mmlspark.cognitive.AzureSearchWriter.streamToAzureSearch(df, **options)[source]
mmlspark.cognitive.AzureSearchWriter.writeToAzureSearch(df, **options)[source]

mmlspark.cognitive.BingImageSearch module

class mmlspark.cognitive.BingImageSearch.BingImageSearch(aspect=None, color=None, concurrency=1, concurrentTimeout=100.0, count=None, errorCol=None, freshness=None, handler=None, height=None, imageContent=None, imageType=None, license=None, maxFileSize=None, maxHeight=None, maxWidth=None, minFileSize=None, minHeight=None, minWidth=None, mkt=None, offset=None, outputCol=None, q=None, size=None, subscriptionKey=None, timeout=60.0, url='https://api.cognitive.microsoft.com/bing/v7.0/images/search', width=None)[source]

Bases: mmlspark.cognitive._BingImageSearch._BingImageSearch

static downloadFromUrls(pathCol, bytesCol, concurrency, timeout)[source]
static getUrlTransformer(imageCol, urlCol)[source]
setMarket(value)[source]
setMarketCol(value)[source]
setQuery(value)[source]
setQueryCol(value)[source]
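
The constructor's q, mkt, count, and offset arguments suggest a paged search request. The sketch below builds one URL per page against the default url from the signature above. Mapping those arguments directly to query-string parameters is an assumption based on the Bing Image Search REST API; `search_urls` is a hypothetical helper and sends no requests.

```python
from urllib.parse import urlencode

BING_IMAGES_URL = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"


def search_urls(query, total, count=10, mkt=None, url=BING_IMAGES_URL):
    """Yield one request URL per page of `count` results covering `total` results."""
    for offset in range(0, total, count):
        params = {"q": query, "count": count, "offset": offset}
        if mkt:
            params["mkt"] = mkt
        yield url + "?" + urlencode(params)


pages = list(search_urls("snow leopard", total=25, count=10, mkt="en-US"))
print(len(pages))   # 3
print(pages[0])
# https://api.cognitive.microsoft.com/bing/v7.0/images/search?q=snow+leopard&count=10&offset=0&mkt=en-US
```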

mmlspark.cognitive.DescribeImage module

class mmlspark.cognitive.DescribeImage.DescribeImage(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, imageBytes=None, imageUrl=None, language=None, maxCandidates=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – Language of image description (default: ServiceParamData(None,Some(en)))

  • maxCandidates (object) – Maximum candidate descriptions to return (default: ServiceParamData(None,Some(1)))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
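
The concurrency and concurrentTimeout parameters bound how many service calls are in flight and how long to wait on each pending result. The sketch below mimics that idea with Python's standard thread pool, returning a (result, error) pair per input in the spirit of outputCol and errorCol. It is an illustration only: `call_all` and `fake_service` are hypothetical, and the transformer itself runs on the JVM.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_all(fn, inputs, concurrency=1, concurrent_timeout=100.0):
    """Run fn over inputs with at most `concurrency` calls in flight.

    Returns one (result, error) pair per input, in input order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(fn, x) for x in inputs]
        for fut in futures:
            try:
                # Wait at most concurrent_timeout seconds for this future.
                results.append((fut.result(timeout=concurrent_timeout), None))
            except FutureTimeout:
                results.append((None, "timed out"))
            except Exception as exc:
                # A failed call fills the error slot instead of aborting the run.
                results.append((None, str(exc)))
    return results


def fake_service(x):
    """Stand-in for a remote call: doubles the input, rejects negatives."""
    if x < 0:
        raise ValueError("negative input")
    return x * 2


print(call_all(fake_service, [1, -1, 3], concurrency=2))
# [(2, None), (None, 'negative input'), (6, None)]
```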

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

object

getImageUrl()[source]
Returns

the url of the image to use

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Language of image description (default: ServiceParamData(None,Some(en)))

Return type

object

getMaxCandidates()[source]
Returns

Maximum candidate descriptions to return (default: ServiceParamData(None,Some(1)))

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setImageBytes(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setLanguage(value)[source]
Parameters

language (object) – Language of image description (default: ServiceParamData(None,Some(en)))

setLanguageCol(value)[source]
Parameters

language (object) – Language of image description (default: ServiceParamData(None,Some(en)))

setLocation(value)[source]

setMaxCandidates(value)[source]
Parameters

maxCandidates (object) – Maximum candidate descriptions to return (default: ServiceParamData(None,Some(1)))

setMaxCandidatesCol(value)[source]
Parameters

maxCandidates (object) – Maximum candidate descriptions to return (default: ServiceParamData(None,Some(1)))

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, imageBytes=None, imageUrl=None, language=None, maxCandidates=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – Language of image description (default: ServiceParamData(None,Some(en)))

  • maxCandidates (object) – Maximum candidate descriptions to return (default: ServiceParamData(None,Some(1)))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.DetectAnomalies module

class mmlspark.cognitive.DetectAnomalies.DetectAnomalies(concurrency=1, concurrentTimeout=100.0, customInterval=None, errorCol=None, granularity=None, handler=None, maxAnomalyRatio=None, outputCol=None, period=None, sensitivity=None, series=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • customInterval (object) – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • granularity (object) – Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and will return an error message.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
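
The ordering rule for series and the granularity/customInterval pairing can be checked before any request is sent. The sketch below validates a series and assembles a request body for a 5-minute series. The `timestamp` and `value` field names follow the Anomaly Detector REST API and are an assumption here; `build_request` is a hypothetical helper, not part of mmlspark.

```python
from datetime import datetime


def build_request(points, granularity, custom_interval=None):
    """Validate and package (ISO-8601 timestamp, value) pairs as a request body."""
    stamps = [datetime.fromisoformat(ts) for ts, _ in points]
    for earlier, later in zip(stamps, stamps[1:]):
        # Reject unsorted input and duplicate timestamps up front.
        if later <= earlier:
            raise ValueError(
                "series must be in ascending order with no duplicate timestamps")
    body = {
        "series": [{"timestamp": ts, "value": v} for ts, v in points],
        "granularity": granularity,
    }
    if custom_interval is not None:
        body["customInterval"] = custom_interval
    return body


# One point every 5 minutes: granularity=minutely, customInterval=5.
points = [
    ("2019-01-01T00:00:00", 1.0),
    ("2019-01-01T00:05:00", 2.0),
    ("2019-01-01T00:10:00", 1.5),
]
body = build_request(points, granularity="minutely", custom_interval=5)
print(body["granularity"], body["customInterval"], len(body["series"]))
# minutely 5 3
```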

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getCustomInterval()[source]
Returns

Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

Return type

object

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getGranularity()[source]
Returns

Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

Return type

object

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

object

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

Return type

object

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and will return an error message.

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setCustomInterval(value)[source]
Parameters

customInterval (object) – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval (object) – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setGranularity(value)[source]
Parameters

granularity (object) – Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity (object) – Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLocation(value)[source]

setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, customInterval=None, errorCol=None, granularity=None, handler=None, maxAnomalyRatio=None, outputCol=None, period=None, sensitivity=None, series=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • customInterval (object) – Sets a non-standard time interval. For example, if the series has one point every 5 minutes, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • granularity (object) – Must be one of yearly, monthly, weekly, daily, hourly, or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin value, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and will return an error message.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
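
The ordering requirement on series described above can be checked locally before any request is sent. The following is a minimal sketch in plain Python, assuming each point is a dict with an ISO 8601 `timestamp` string and a numeric `value`; the helper name `check_series` and the point layout are illustrative assumptions, not part of mmlspark.

```python
from datetime import datetime


def check_series(points):
    """Return a list of problems found in a time series.

    `points` is assumed to be a list of dicts with an ISO 8601
    'timestamp' string and a numeric 'value' (hypothetical layout).
    An empty list means the timestamps are strictly ascending.
    """
    problems = []
    stamps = [datetime.fromisoformat(p["timestamp"]) for p in points]
    # Compare each timestamp with its predecessor.
    for i in range(1, len(stamps)):
        if stamps[i] == stamps[i - 1]:
            problems.append(f"duplicate timestamp at index {i}")
        elif stamps[i] < stamps[i - 1]:
            problems.append(f"timestamp at index {i} is out of order")
    return problems
```

A series that passes this check is strictly ascending, so it contains no duplicate timestamps either.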

setPeriod(value)[source]
Parameters

period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin will be, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin will be, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.DetectFace module

class mmlspark.cognitive.DetectFace.DetectFace(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, imageUrl=None, outputCol=None, returnFaceAttributes=None, returnFaceId=None, returnFaceLandmarks=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageUrl (object) – the url of the image to use

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • returnFaceAttributes (object) – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

  • returnFaceId (object) – Return faceIds of the detected faces or not. The default value is true.

  • returnFaceLandmarks (object) – Return face landmarks of the detected faces or not. The default value is false.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
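
The parameters above can be combined roughly as follows. This is a sketch under stated assumptions rather than a confirmed usage pattern: the attribute names come from the returnFaceAttributes description, while the helper names, the column names `url` and `faces`, and passing a Python list to setReturnFaceAttributes are assumptions. `build_detect_face` needs mmlspark and a running Spark session, so the import is kept inside the function.

```python
# Attribute names taken from the returnFaceAttributes description above.
SUPPORTED_FACE_ATTRIBUTES = {
    "age", "gender", "headPose", "smile", "facialHair", "glasses",
    "emotion", "hair", "makeup", "occlusion", "accessories", "blur",
    "exposure", "noise",
}


def check_face_attributes(requested):
    """Return the requested attributes, raising on unsupported names."""
    unknown = sorted(set(requested) - SUPPORTED_FACE_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unsupported face attributes: {unknown}")
    return list(requested)


def build_detect_face(subscription_key, url, image_col="url"):
    """Configure a DetectFace transformer (hypothetical helper).

    Requires mmlspark and Spark. The column names and the list-valued
    setReturnFaceAttributes argument are assumptions.
    """
    # Imported lazily so check_face_attributes works without mmlspark.
    from mmlspark.cognitive.DetectFace import DetectFace

    detector = DetectFace()
    detector.setSubscriptionKey(subscription_key)
    detector.setUrl(url)
    detector.setImageUrlCol(image_col)  # column holding the image URLs
    detector.setOutputCol("faces")
    detector.setReturnFaceLandmarks(False)
    detector.setReturnFaceAttributes(check_face_attributes(["age", "emotion"]))
    return detector
```

Because face attribute analysis adds computational and time cost, requesting only the attributes that are needed keeps each call cheaper.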

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getImageUrl()[source]
Returns

the url of the image to use

Return type

object

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getReturnFaceAttributes()[source]
Returns

Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

Return type

object

getReturnFaceId()[source]
Returns

Return faceIds of the detected faces or not. The default value is true.

Return type

object

getReturnFaceLandmarks()[source]
Returns

Return face landmarks of the detected faces or not. The default value is false.

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setImageUrl(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, imageUrl=None, outputCol=None, returnFaceAttributes=None, returnFaceId=None, returnFaceLandmarks=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageUrl (object) – the url of the image to use

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • returnFaceAttributes (object) – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

  • returnFaceId (object) – Return faceIds of the detected faces or not. The default value is true.

  • returnFaceLandmarks (object) – Return face landmarks of the detected faces or not. The default value is false.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setReturnFaceAttributes(value)[source]
Parameters

returnFaceAttributes (object) – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

setReturnFaceAttributesCol(value)[source]
Parameters

returnFaceAttributes (object) – Analyze and return one or more specified face attributes. Supported face attributes include: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. Face attribute analysis has additional computational and time cost.

setReturnFaceId(value)[source]
Parameters

returnFaceId (object) – Return faceIds of the detected faces or not. The default value is true.

setReturnFaceIdCol(value)[source]
Parameters

returnFaceId (object) – Return faceIds of the detected faces or not. The default value is true.

setReturnFaceLandmarks(value)[source]
Parameters

returnFaceLandmarks (object) – Return face landmarks of the detected faces or not. The default value is false.

setReturnFaceLandmarksCol(value)[source]
Parameters

returnFaceLandmarks (object) – Return face landmarks of the detected faces or not. The default value is false.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.DetectLastAnomaly module

class mmlspark.cognitive.DetectLastAnomaly.DetectLastAnomaly(concurrency=1, concurrentTimeout=100.0, customInterval=None, errorCol=None, granularity=None, handler=None, maxAnomalyRatio=None, outputCol=None, period=None, sensitivity=None, series=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • customInterval (object) – Custom interval, used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
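
The relationship between granularity and customInterval described above can be illustrated with a small helper. This is a sketch in plain Python, not part of mmlspark: the name `interval_to_params` is hypothetical, only the fixed-length units are handled (monthly and yearly have no fixed duration), and choosing the largest unit that divides the step evenly is a design choice of this example rather than a documented rule of the service.

```python
from datetime import timedelta

# Fixed-length units only, largest first. 'monthly' and 'yearly' are
# left out because they have no fixed duration.
_UNITS = [
    ("weekly", timedelta(weeks=1)),
    ("daily", timedelta(days=1)),
    ("hourly", timedelta(hours=1)),
    ("minutely", timedelta(minutes=1)),
]


def interval_to_params(step):
    """Map a sampling step to a (granularity, customInterval) pair.

    customInterval is None when the step is exactly one unit.
    """
    if step < timedelta(minutes=1):
        raise ValueError("step must be at least one minute")
    for name, unit in _UNITS:
        # Pick the largest unit that divides the step evenly.
        if step % unit == timedelta(0):
            multiple = step // unit
            return name, (None if multiple == 1 else multiple)
    raise ValueError("step is not a whole number of minutes")
```

For the 5-minute series mentioned in the customInterval description, `interval_to_params(timedelta(minutes=5))` yields `("minutely", 5)`.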

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getCustomInterval()[source]
Returns

Custom interval, used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

Return type

object

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

object

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

object

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin will be, which means fewer anomalies will be accepted.

Return type

object

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setCustomInterval(value)[source]
Parameters

customInterval (object) – Custom interval, used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval (object) – Custom interval, used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setGranularity(value)[source]
Parameters

granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, customInterval=None, errorCol=None, granularity=None, handler=None, maxAnomalyRatio=None, outputCol=None, period=None, sensitivity=None, series=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • customInterval (object) – Custom interval, used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setPeriod(value)[source]
Parameters

period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin will be, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin will be, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSeriesCol(value)[source]
Parameters

series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains duplicate timestamps, the API will not work and an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.EntityDetector module

class mmlspark.cognitive.EntityDetector.EntityDetector(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
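
As a rough analogy for what the concurrency and concurrentTimeout parameters above describe, the following plain-Python sketch runs a bounded number of calls in parallel and waits a limited time for each result. It is not the library's implementation; the names `call_all` and `send` are hypothetical, and `send` stands in for whatever performs a single service request.

```python
from concurrent.futures import ThreadPoolExecutor


def call_all(requests, send, concurrency=1, concurrent_timeout=100.0):
    """Apply `send` to each request with bounded parallelism.

    At most `concurrency` calls run at once, and each result is awaited
    for at most `concurrent_timeout` seconds.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, r) for r in requests]
        # result() raises concurrent.futures.TimeoutError on expiry.
        return [f.result(timeout=concurrent_timeout) for f in futures]
```

Results come back in the order of the input requests, regardless of which call finishes first.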

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getText()[source]
Returns

the text in the request body (default: ServiceParamData(Some(Right(text)),None))

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLanguage(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLanguageCol(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setText(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTextCol(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.FindSimilarFace module

class mmlspark.cognitive.FindSimilarFace.FindSimilarFace(concurrency=1, concurrentTimeout=100.0, errorCol=None, faceId=None, faceIds=None, faceListId=None, handler=None, largeFaceListId=None, maxNumOfCandidatesReturned=None, mode=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • faceId (object) – faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

  • faceIds (object) – An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • faceListId (object) – An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • largeFaceListId (object) – An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • maxNumOfCandidatesReturned (object) – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

  • mode (object) – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
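
The constraints stated in the parameter descriptions above (at most one candidate source, at most 1000 faceIds, a candidate count in [1, 1000], and two allowed modes) can be checked before a request is built. The following is a minimal sketch in plain Python; the function name `check_find_similar_args` and its argument names are hypothetical and not part of mmlspark.

```python
def check_find_similar_args(face_list_id=None, large_face_list_id=None,
                            face_ids=None, max_candidates=20,
                            mode="matchPerson"):
    """Validate arguments against the limits documented above.

    Returns True when all checks pass, raises ValueError otherwise.
    """
    sources = [s for s in (face_list_id, large_face_list_id, face_ids)
               if s is not None]
    # The three candidate sources must not be provided at the same time.
    if len(sources) > 1:
        raise ValueError("provide only one of faceListId, "
                         "largeFaceListId or faceIds")
    if face_ids is not None and len(face_ids) > 1000:
        raise ValueError("faceIds is limited to 1000 entries")
    if not 1 <= max_candidates <= 1000:
        raise ValueError("maxNumOfCandidatesReturned must be in [1, 1000]")
    if mode not in ("matchPerson", "matchFace"):
        raise ValueError("mode must be 'matchPerson' or 'matchFace'")
    return True
```

The defaults mirror the documented ones: 20 candidates and the matchPerson mode.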

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getFaceId()[source]
Returns

faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

Return type

object

getFaceIds()[source]
Returns

An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

object

getFaceListId()[source]
Returns

An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

object

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLargeFaceListId()[source]
Returns

An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

Return type

object

getMaxNumOfCandidatesReturned()[source]
Returns

Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

Return type

object

getMode()[source]
Returns

Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setFaceId(value)[source]
Parameters

faceId (object) – faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

setFaceIdCol(value)[source]
Parameters

faceId (object) – faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

setFaceIds(value)[source]
Parameters

faceIds (object) – An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceIdsCol(value)[source]
Parameters

faceIds (object) – An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceListId(value)[source]
Parameters

faceListId (object) – An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setFaceListIdCol(value)[source]
Parameters

faceListId (object) – An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLargeFaceListId(value)[source]
Parameters

largeFaceListId (object) – An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setLargeFaceListIdCol(value)[source]
Parameters

largeFaceListId (object) – An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

setLocation(value)[source]

setMaxNumOfCandidatesReturned(value)[source]
Parameters

maxNumOfCandidatesReturned (object) – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

setMaxNumOfCandidatesReturnedCol(value)[source]
Parameters

maxNumOfCandidatesReturned (object) – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

setMode(value)[source]
Parameters

mode (object) – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

setModeCol(value)[source]
Parameters

mode (object) – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, faceId=None, faceIds=None, faceListId=None, handler=None, largeFaceListId=None, maxNumOfCandidatesReturned=None, mode=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • faceId (object) – faceId of the query face. User needs to call FaceDetect first to get a valid faceId. Note that this faceId is not persisted and will expire 24 hours after the detection call.

  • faceIds (object) – An array of candidate faceIds. All of them are created by FaceDetect and the faceIds will expire 24 hours after the detection call. The number of faceIds is limited to 1000. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • faceListId (object) – An existing user-specified unique candidate face list, created in FaceList - Create. Face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • largeFaceListId (object) – An existing user-specified unique candidate large face list, created in LargeFaceList - Create. Large face list contains a set of persistedFaceIds which are persisted and will never expire. Parameter faceListId, largeFaceListId and faceIds should not be provided at the same time.

  • maxNumOfCandidatesReturned (object) – Optional parameter. The number of top similar faces returned. The valid range is [1, 1000]. It defaults to 20.

  • mode (object) – Optional parameter. Similar face searching mode. It can be ‘matchPerson’ or ‘matchFace’. It defaults to ‘matchPerson’.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
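
The constraints listed above (at most one of faceIds, faceListId and largeFaceListId; at most 1000 faceIds; maxNumOfCandidatesReturned in [1, 1000]; mode either 'matchPerson' or 'matchFace') can be checked before any request is sent. The following is a minimal sketch in plain Python. The function name check_find_similar_args and its snake_case argument names are hypothetical and are not part of mmlspark.

```python
from typing import Optional, Sequence

# The two search modes named in the parameter description.
VALID_MODES = ("matchPerson", "matchFace")


def check_find_similar_args(
    face_ids: Optional[Sequence[str]] = None,
    face_list_id: Optional[str] = None,
    large_face_list_id: Optional[str] = None,
    max_candidates: int = 20,
    mode: str = "matchPerson",
) -> None:
    """Raise ValueError if the arguments break a documented constraint.

    Hypothetical helper: it mirrors the parameter descriptions only and
    does not call the service.
    """
    # faceIds, faceListId and largeFaceListId should not be combined.
    provided = [t for t in (face_ids, face_list_id, large_face_list_id)
                if t is not None]
    if len(provided) > 1:
        raise ValueError(
            "provide only one of faceIds, faceListId, largeFaceListId")
    # The number of candidate faceIds is limited to 1000.
    if face_ids is not None and len(face_ids) > 1000:
        raise ValueError("faceIds is limited to 1000 entries")
    # The valid range for maxNumOfCandidatesReturned is [1, 1000].
    if not 1 <= max_candidates <= 1000:
        raise ValueError("maxNumOfCandidatesReturned must be in [1, 1000]")
    if mode not in VALID_MODES:
        raise ValueError("mode must be 'matchPerson' or 'matchFace'")
```

A caller could run this check on the driver before passing the same values to setParams, so that invalid combinations fail early instead of surfacing in the error column.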

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.GenerateThumbnails module

class mmlspark.cognitive.GenerateThumbnails.GenerateThumbnails(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, height=None, imageBytes=None, imageUrl=None, outputCol=None, smartCropping=None, subscriptionKey=None, timeout=60.0, url=None, width=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • height (object) – the desired height of the image

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • smartCropping (object) – whether to intelligently crop the image

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

  • width (object) – the desired width of the image
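
The concurrency and concurrentTimeout parameters describe a bounded pool of in-flight calls whose results are awaited with a time limit. The sketch below illustrates that pattern with the Python standard library only. It is not how mmlspark implements these parameters; call_concurrently and its arguments are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_concurrently(
    fn: Callable[[T], R],
    items: Sequence[T],
    concurrency: int = 1,
    concurrent_timeout: float = 100.0,
) -> List[R]:
    """Apply fn to every item with at most `concurrency` calls in flight.

    Each result is awaited for at most `concurrent_timeout` seconds; a
    slower call raises concurrent.futures.TimeoutError.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Submit everything up front; the pool size caps parallelism.
        futures = [pool.submit(fn, item) for item in items]
        # Collect results in submission order.
        return [f.result(timeout=concurrent_timeout) for f in futures]
```

With concurrency=1 the calls run one after another, which matches the documented default.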

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getHeight()[source]
Returns

the desired height of the image

Return type

object

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

object

getImageUrl()[source]
Returns

the url of the image to use

Return type

object

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSmartCropping()[source]
Returns

whether to intelligently crop the image

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

getWidth()[source]
Returns

the desired width of the image

Return type

object

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setHeight(value)[source]
Parameters

height (object) – the desired height of the image

setHeightCol(value)[source]
Parameters

height (object) – the desired height of the image

setImageBytes(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setLocation(value)[source]

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, height=None, imageBytes=None, imageUrl=None, outputCol=None, smartCropping=None, subscriptionKey=None, timeout=60.0, url=None, width=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • height (object) – the desired height of the image

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • smartCropping (object) – whether to intelligently crop the image

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

  • width (object) – the desired width of the image

setSmartCropping(value)[source]
Parameters

smartCropping (object) – whether to intelligently crop the image

setSmartCroppingCol(value)[source]
Parameters

smartCropping (object) – whether to intelligently crop the image

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

setWidth(value)[source]
Parameters

width (object) – the desired width of the image

setWidthCol(value)[source]
Parameters

width (object) – the desired width of the image

mmlspark.cognitive.GroupFaces module

class mmlspark.cognitive.GroupFaces.GroupFaces(concurrency=1, concurrentTimeout=100.0, errorCol=None, faceIds=None, handler=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • faceIds (object) – Array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getFaceIds()[source]
Returns

Array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

Return type

object

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setFaceIds(value)[source]
Parameters

faceIds (object) – Array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

setFaceIdsCol(value)[source]
Parameters

faceIds (object) – Array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLocation(value)[source]

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, faceIds=None, handler=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • faceIds (object) – Array of candidate faceIds created by Face - Detect. The maximum is 1000 faces.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.IdentifyFaces module

class mmlspark.cognitive.IdentifyFaces.IdentifyFaces(concurrency=1, concurrentTimeout=100.0, confidenceThreshold=None, errorCol=None, faceIds=None, handler=None, largePersonGroupId=None, maxNumOfCandidatesReturned=None, outputCol=None, personGroupId=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • confidenceThreshold (object) – Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee that this threshold value will work on other data or after algorithm updates.

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • faceIds (object) – Array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is in the range [1, 10].

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • largePersonGroupId (object) – largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • maxNumOfCandidatesReturned (object) – The range of maxNumOfCandidatesReturned is between 1 and 100 (default is 10).

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • personGroupId (object) – personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
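
Because each request accepts between 1 and 10 faceIds and each face is identified independently, a longer list of query faces can be split into batches of at most 10. The sketch below shows one way to do that in plain Python, together with a check of the documented ranges for confidenceThreshold and maxNumOfCandidatesReturned. Both function names are hypothetical and are not part of mmlspark.

```python
from typing import Iterator, List, Optional, Sequence

MAX_FACES_PER_CALL = 10  # documented upper bound on faceIds per request


def batch_face_ids(face_ids: Sequence[str],
                   size: int = MAX_FACES_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` faceIds."""
    if not 1 <= size <= MAX_FACES_PER_CALL:
        raise ValueError("size must be in [1, 10]")
    for start in range(0, len(face_ids), size):
        yield list(face_ids[start:start + size])


def check_identify_options(
    confidence_threshold: Optional[float] = None,
    max_candidates: int = 10,
    person_group_id: Optional[str] = None,
    large_person_group_id: Optional[str] = None,
) -> None:
    """Raise ValueError if an option is outside its documented range."""
    if (confidence_threshold is not None
            and not 0.0 <= confidence_threshold <= 1.0):
        raise ValueError("confidenceThreshold must be in [0, 1]")
    if not 1 <= max_candidates <= 100:
        raise ValueError("maxNumOfCandidatesReturned must be in [1, 100]")
    # personGroupId and largePersonGroupId should not be combined.
    if person_group_id is not None and large_person_group_id is not None:
        raise ValueError(
            "provide personGroupId or largePersonGroupId, not both")
```

Each batch could then be placed in its own row of the input DataFrame, so that every row respects the per-request limit.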

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getConfidenceThreshold()[source]
Returns

Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee that this threshold value will work on other data or after algorithm updates.

Return type

object

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getFaceIds()[source]
Returns

Array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is in the range [1, 10].

Return type

object

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLargePersonGroupId()[source]
Returns

largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

Return type

object

getMaxNumOfCandidatesReturned()[source]
Returns

The range of maxNumOfCandidatesReturned is between 1 and 100 (default is 10).

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getPersonGroupId()[source]
Returns

personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setConfidenceThreshold(value)[source]
Parameters

confidenceThreshold (object) – Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee that this threshold value will work on other data or after algorithm updates.

setConfidenceThresholdCol(value)[source]
Parameters

confidenceThreshold (object) – Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee that this threshold value will work on other data or after algorithm updates.

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setFaceIds(value)[source]
Parameters

faceIds (object) – Array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is in the range [1, 10].

setFaceIdsCol(value)[source]
Parameters

faceIds (object) – Array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is in the range [1, 10].

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLargePersonGroupId(value)[source]
Parameters

largePersonGroupId (object) – largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setLargePersonGroupIdCol(value)[source]
Parameters

largePersonGroupId (object) – largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setLocation(value)[source]

setMaxNumOfCandidatesReturned(value)[source]
Parameters

maxNumOfCandidatesReturned (object) – The range of maxNumOfCandidatesReturned is between 1 and 100 (default is 10).

setMaxNumOfCandidatesReturnedCol(value)[source]
Parameters

maxNumOfCandidatesReturned (object) – The range of maxNumOfCandidatesReturned is between 1 and 100 (default is 10).

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, confidenceThreshold=None, errorCol=None, faceIds=None, handler=None, largePersonGroupId=None, maxNumOfCandidatesReturned=None, outputCol=None, personGroupId=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • confidenceThreshold (object) – Optional parameter. Customized identification confidence threshold, in the range of [0, 1]. Advanced users can tweak this value to override the default internal threshold for better precision on their scenario data. Note there is no guarantee that this threshold value will work on other data or after algorithm updates.

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • faceIds (object) – Array of query face faceIds, created by Face - Detect. Each face is identified independently. The valid number of faceIds is in the range [1, 10].

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • largePersonGroupId (object) – largePersonGroupId of the target large person group, created by LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • maxNumOfCandidatesReturned (object) – The range of maxNumOfCandidatesReturned is between 1 and 100 (default is 10).

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • personGroupId (object) – personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setPersonGroupId(value)[source]
Parameters

personGroupId (object) – personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setPersonGroupIdCol(value)[source]
Parameters

personGroupId (object) – personGroupId of the target person group, created by PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.KeyPhraseExtractor module

class mmlspark.cognitive.KeyPhraseExtractor.KeyPhraseExtractor(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
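
The defaults shown for text and language, ServiceParamData(Some(Right(text)),None) and ServiceParamData(None,Some(List(en))), suggest that a service parameter holds either a constant applied to every row or the name of a column to read per row. Read that way, text defaults to a column named text and language defaults to the constant en, and the paired setters (for example setText and setTextCol) choose between the two forms. This reading is an assumption, not something the reference states. The sketch below models the idea in plain Python; the ServiceParam class is hypothetical and is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ServiceParam:
    """Either a constant shared by all rows or the name of a column.

    Hypothetical model of the value-or-column idea, for illustration only.
    """
    value: Any = None
    column: Optional[str] = None

    def resolve(self, row: Mapping[str, Any]) -> Any:
        # A column name takes precedence: read the per-row value.
        if self.column is not None:
            return row[self.column]
        # Otherwise every row receives the same constant.
        return self.value


# Assumed defaults, as read from the rendered ServiceParamData strings.
text_param = ServiceParam(column="text")
language_param = ServiceParam(value="en")
```

For a row such as {"text": "hello"}, text_param.resolve(row) reads the row's own text while language_param.resolve(row) returns the shared constant.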

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getText()[source]
Returns

the text in the request body (default: ServiceParamData(Some(Right(text)),None))

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLanguage(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLanguageCol(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLocation(value)[source]

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setText(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTextCol(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.LanguageDetector module

class mmlspark.cognitive.LanguageDetector.LanguageDetector(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
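A minimal sketch of configuring a text stage such as LanguageDetector through the setters documented in this section. The helper name configure_text_stage and the default column names are hypothetical. The stage is passed in rather than constructed, so the snippet does not assume an installed mmlspark or a running Spark session.

```python
def configure_text_stage(stage, subscription_key, url,
                         text_col="text", output_col="language"):
    """Apply the common text-service settings to an mmlspark stage.

    `stage` is expected to expose the setters listed in this section,
    for example an instance of mmlspark.cognitive.LanguageDetector.
    """
    stage.setSubscriptionKey(subscription_key)  # one static API key for all rows
    stage.setUrl(url)                           # endpoint of the service
    stage.setTextCol(text_col)                  # take the request text from a column
    stage.setOutputCol(output_col)              # column that receives the response
    return stage
```

With mmlspark available, this would presumably be called as configure_text_stage(LanguageDetector(), key, url) before calling transform on a DataFrame that has the text column.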

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getText()[source]
Returns

the text in the request body (default: ServiceParamData(Some(Right(text)),None))

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLanguage(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLanguageCol(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLocation(value)[source]

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setText(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTextCol(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.NER module

class mmlspark.cognitive.NER.NER(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getText()[source]
Returns

the text in the request body (default: ServiceParamData(Some(Right(text)),None))

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLanguage(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLanguageCol(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLocation(value)[source]

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
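Because setParams is documented as keyword only, one way to prepare a call is to collect the values in a dictionary first. The sketch below covers only the plainly typed parameters (int, double, str). How the object-typed service parameters behave when passed to setParams is not stated here, so they are left to their dedicated setters. The helper name plain_params and its default column names are hypothetical.

```python
def plain_params(url, output_col="entities", error_col="errors",
                 concurrency=1, concurrent_timeout=100.0, timeout=60.0):
    """Collect the int, double and str parameters as keyword arguments.

    The keys match the parameter names listed for setParams; the numeric
    defaults repeat the documented defaults.
    """
    return {
        "url": url,
        "outputCol": output_col,
        "errorCol": error_col,
        "concurrency": concurrency,
        "concurrentTimeout": concurrent_timeout,
        "timeout": timeout,
    }
```

A stage could then be configured with something like NER().setParams(**plain_params(url)), assuming mmlspark is installed.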

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setText(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTextCol(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.OCR module

class mmlspark.cognitive.OCR.OCR(concurrency=1, concurrentTimeout=100.0, detectOrientation=None, errorCol=None, handler=None, imageBytes=None, imageUrl=None, language=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • detectOrientation (object) – whether to detect image orientation prior to processing

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – the language to use

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
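The image can be supplied either as a URL or as a bytestream. The sketch below picks one of the two column setters documented in this section. Requiring exactly one source is an assumption of this sketch, not something the parameter list states, and the helper name set_image_source is hypothetical.

```python
def set_image_source(stage, url_col=None, bytes_col=None):
    """Point an image stage (for example OCR) at one input column.

    Assumption: exactly one of the two sources should be configured.
    """
    if (url_col is None) == (bytes_col is None):
        raise ValueError("provide exactly one of url_col or bytes_col")
    if url_col is not None:
        stage.setImageUrlCol(url_col)      # column holding image URLs
    else:
        stage.setImageBytesCol(bytes_col)  # column holding raw image bytes
    return stage
```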

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getDetectOrientation()[source]
Returns

whether to detect image orientation prior to processing

Return type

object

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

object

getImageUrl()[source]
Returns

the url of the image to use

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language to use

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setDetectOrientation(value)[source]
Parameters

detectOrientation (object) – whether to detect image orientation prior to processing

setDetectOrientationCol(value)[source]
Parameters

detectOrientation (object) – whether to detect image orientation prior to processing

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setImageBytes(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setLanguage(value)[source]
Parameters

language (object) – the language to use

setLanguageCol(value)[source]
Parameters

language (object) – the language to use

setLocation(value)[source]

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, detectOrientation=None, errorCol=None, handler=None, imageBytes=None, imageUrl=None, language=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • detectOrientation (object) – whether to detect image orientation prior to processing

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – the language to use

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.RecognizeDomainSpecificContent module

class mmlspark.cognitive.RecognizeDomainSpecificContent.RecognizeDomainSpecificContent(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, imageBytes=None, imageUrl=None, model=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • model (object) – the domain specific model: celebrities, landmarks

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
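The model parameter names two domain-specific models. A small guard such as the following can catch a misspelled model name before any request is sent. The helper name check_domain_model is hypothetical, and the list of names is taken from the parameter description above; the service itself may accept other values.

```python
# Model names as listed in the parameter description.
DOMAIN_MODELS = ("celebrities", "landmarks")


def check_domain_model(model):
    """Return `model` unchanged if it is one of the documented names."""
    if model not in DOMAIN_MODELS:
        raise ValueError(
            "unknown model %r, expected one of %s" % (model, ", ".join(DOMAIN_MODELS))
        )
    return model
```

The checked value could then be passed to setModel, for example stage.setModel(check_domain_model("landmarks")).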

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

object

getImageUrl()[source]
Returns

the url of the image to use

Return type

object

static getJavaPackage()[source]

Returns package name String.

getModel()[source]
Returns

the domain specific model: celebrities, landmarks

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setImageBytes(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setLocation(value)[source]

setModel(value)[source]
Parameters

model (object) – the domain specific model: celebrities, landmarks

setModelCol(value)[source]
Parameters

model (object) – the domain specific model: celebrities, landmarks

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, imageBytes=None, imageUrl=None, model=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • model (object) – the domain specific model: celebrities, landmarks

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.RecognizeText module

class mmlspark.cognitive.RecognizeText.RecognizeText(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=100.0, errorCol=None, imageBytes=None, imageUrl=None, maxPollingRetries=1000, mode=None, outputCol=None, pollingDelay=300, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • backoffs (list) – array of backoffs to use in the handler (default: [100, 500, 1000])

  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • maxPollingRetries (int) – number of times to poll (default: 1000)

  • mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • pollingDelay (int) – number of milliseconds to wait between polling (default: 300)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
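The two polling parameters together bound how long a single recognition request may be polled. A rough upper bound can be worked out as below, assuming each poll waits the full pollingDelay and ignoring the time taken by the requests themselves. The function name is hypothetical and the exact retry behavior of the library is not described here.

```python
def max_polling_wait_seconds(max_polling_retries=1000, polling_delay_ms=300):
    """Upper bound on time spent waiting between polls, in seconds.

    Assumes every retry sleeps for the full polling delay.
    """
    return max_polling_retries * polling_delay_ms / 1000.0


# With the documented defaults: 1000 polls * 300 ms = 300 seconds.
default_wait = max_polling_wait_seconds()
```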

getBackoffs()[source]
Returns

array of backoffs to use in the handler (default: [100, 500, 1000])

Return type

list

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

object

getImageUrl()[source]
Returns

the url of the image to use

Return type

object

static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns

number of times to poll (default: 1000)

Return type

int

getMode()[source]
Returns

If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getPollingDelay()[source]
Returns

number of milliseconds to wait between polling (default: 300)

Return type

int

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters

backoffs (list) – array of backoffs to use in the handler (default: [100, 500, 1000])

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setImageBytes(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setLocation(value)[source]

setMaxPollingRetries(value)[source]
Parameters

maxPollingRetries (int) – number of times to poll (default: 1000)

setMode(value)[source]
Parameters

mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

setModeCol(value)[source]
Parameters

mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=100.0, errorCol=None, imageBytes=None, imageUrl=None, maxPollingRetries=1000, mode=None, outputCol=None, pollingDelay=300, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • backoffs (list) – array of backoffs to use in the handler (default: [100, 500, 1000])

  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • maxPollingRetries (int) – number of times to poll (default: 1000)

  • mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • pollingDelay (int) – number of milliseconds to wait between polling (default: 300)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setPollingDelay(value)[source]
Parameters

pollingDelay (int) – number of milliseconds to wait between polling (default: 300)

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.SimpleDetectAnomalies module

class mmlspark.cognitive.SimpleDetectAnomalies.SimpleDetectAnomalies(concurrency=1, concurrentTimeout=100.0, customInterval=None, errorCol=None, granularity=None, groupbyCol=None, handler=None, maxAnomalyRatio=None, outputCol=None, period=None, sensitivity=None, series=None, subscriptionKey=None, timeout=60.0, timestampCol='timestamp', url=None, valueCol='value')[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • customInterval (object) – Custom interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • groupbyCol (str) – column that groups the series

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work; in that case, an error message is returned.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • timestampCol (str) – column representing the time of the series (default: timestamp)

  • url (str) – Url of the service

  • valueCol (str) – column representing the value of the series (default: value)
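The series description requires points sorted by ascending timestamp with no duplicated timestamps. A minimal pre-flight check, run locally before calling the service, might look like the following. The function name check_series and the (timestamp, value) tuple layout are assumptions made for illustration; any mutually comparable timestamps work.

```python
def check_series(points):
    """Verify that timestamps are strictly increasing.

    `points` is a list of (timestamp, value) pairs. Raises ValueError on
    the first duplicated or out-of-order timestamp.
    """
    for (prev, _), (cur, _) in zip(points, points[1:]):
        if cur == prev:
            raise ValueError("duplicated timestamp: %r" % (cur,))
        if cur < prev:
            raise ValueError("timestamps not ascending at %r" % (cur,))
    return True
```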

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getCustomInterval()[source]
Returns

Custom interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

Return type

object

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getGranularity()[source]
Returns

Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

Return type

object

getGroupbyCol()[source]
Returns

column that groups the series

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getMaxAnomalyRatio()[source]
Returns

Optional argument, advanced model parameter, max anomaly ratio in a time series.

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getPeriod()[source]
Returns

Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

Return type

object

getSensitivity()[source]
Returns

Optional argument, advanced model parameter, between 0 and 99. The lower the value, the larger the margin, which means fewer anomalies will be accepted.

Return type

object

getSeries()[source]
Returns

Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or contains a duplicated timestamp, the API will not work; in that case, an error message is returned.

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getTimestampCol()[source]
Returns

column representing the time of the series (default: timestamp)

Return type

str

getUrl()[source]
Returns

Url of the service

Return type

str

getValueCol()[source]
Returns

column representing the value of the series (default: value)

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setCustomInterval(value)[source]
Parameters

customInterval (object) – Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setCustomIntervalCol(value)[source]
Parameters

customInterval (object) – Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setGranularity(value)[source]
Parameters

granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGranularityCol(value)[source]
Parameters

granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

setGroupbyCol(value)[source]
Parameters

groupbyCol (str) – column that groups the series

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLocation(value)[source]
setMaxAnomalyRatio(value)[source]
Parameters

maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setMaxAnomalyRatioCol(value)[source]
Parameters

maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, customInterval=None, errorCol=None, granularity=None, groupbyCol=None, handler=None, maxAnomalyRatio=None, outputCol=None, period=None, sensitivity=None, series=None, subscriptionKey=None, timeout=60.0, timestampCol='timestamp', url=None, valueCol='value')[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • customInterval (object) – Custom Interval is used to set a non-standard time interval. For example, if the series has a 5-minute interval, the request can be set as granularity=minutely, customInterval=5.

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • granularity (object) – Can only be one of yearly, monthly, weekly, daily, hourly or minutely. Granularity is used to verify whether the input series is valid.

  • groupbyCol (str) – column that groups the series

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • maxAnomalyRatio (object) – Optional argument, advanced model parameter, max anomaly ratio in a time series.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

  • sensitivity (object) – Optional argument, advanced model parameter, between 0-99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

  • series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is a duplicated timestamp, the API will not work. In such a case, an error message will be returned.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • timestampCol (str) – column representing the time of the series (default: timestamp)

  • url (str) – Url of the service

  • valueCol (str) – column representing the value of the series (default: value)

setPeriod(value)[source]
Parameters

period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setPeriodCol(value)[source]
Parameters

period (object) – Optional argument, periodic value of a time series. If the value is null or not present, the API will determine the period automatically.

setSensitivity(value)[source]
Parameters

sensitivity (object) – Optional argument, advanced model parameter, between 0-99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSensitivityCol(value)[source]
Parameters

sensitivity (object) – Optional argument, advanced model parameter, between 0-99. The lower the value, the larger the margin value will be, which means fewer anomalies will be accepted.

setSeries(value)[source]
Parameters

series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is a duplicated timestamp, the API will not work. In such a case, an error message will be returned.

setSeriesCol(value)[source]
Parameters

series (object) – Time series data points. Points should be sorted by timestamp in ascending order to match the anomaly detection result. If the data is not sorted correctly or there is a duplicated timestamp, the API will not work. In such a case, an error message will be returned.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setTimestampCol(value)[source]
Parameters

timestampCol (str) – column representing the time of the series (default: timestamp)

setUrl(value)[source]
Parameters

url (str) – Url of the service

setValueCol(value)[source]
Parameters

valueCol (str) – column representing the value of the series (default: value)

mmlspark.cognitive.SpeechToText module

class mmlspark.cognitive.SpeechToText.SpeechToText(audioData=None, concurrency=1, concurrentTimeout=100.0, errorCol=None, format=None, handler=None, language=None, outputCol=None, profanity=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • audioData (object) – The data sent to the service must be a .wav file

  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks, removed, which removes all profanity from the result, or raw, which includes the profanity in the result. The default setting is masked.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
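
The parameters above might be wired together as in the following sketch. It uses only the setters documented for this class and calls them one per statement. The helper names, the column names "transcript" and "stt_error", and the language tag "en-US" are assumptions made for illustration, and the mmlspark import is deferred so that the option check can run without a Spark installation.

```python
ACCEPTED_FORMATS = {"simple", "detailed"}
ACCEPTED_PROFANITY = {"masked", "removed", "raw"}


def speech_options(language, fmt="simple", profanity="masked"):
    """Validate format and profanity against the accepted values listed above."""
    if fmt not in ACCEPTED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ACCEPTED_FORMATS)}")
    if profanity not in ACCEPTED_PROFANITY:
        raise ValueError(f"profanity must be one of {sorted(ACCEPTED_PROFANITY)}")
    return {"language": language, "format": fmt, "profanity": profanity}


def build_speech_to_text(subscription_key, url, audio_col, options):
    """Configure a SpeechToText transformer (requires mmlspark and Spark)."""
    from mmlspark.cognitive.SpeechToText import SpeechToText

    stt = SpeechToText()
    stt.setSubscriptionKey(subscription_key)
    stt.setUrl(url)
    stt.setAudioDataCol(audio_col)         # column holding .wav data
    stt.setLanguage(options["language"])
    stt.setFormat(options["format"])
    stt.setProfanity(options["profanity"])
    stt.setOutputCol("transcript")         # hypothetical column name
    stt.setErrorCol("stt_error")           # hypothetical column name
    return stt


options = speech_options("en-US")          # "en-US" is an assumed language tag
```

Validating the option strings locally avoids a round trip to the service for a value that the parameter descriptions already rule out.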

getAudioData()[source]
Returns

The data sent to the service must be a .wav file

Return type

object

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getFormat()[source]
Returns

Specifies the result format. Accepted values are simple and detailed. Default is simple.

Return type

object

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

Identifies the spoken language that is being recognized.

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getProfanity()[source]
Returns

Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks, removed, which removes all profanity from the result, or raw, which includes the profanity in the result. The default setting is masked.

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setAudioData(value)[source]
Parameters

audioData (object) – The data sent to the service must be a .wav file

setAudioDataCol(value)[source]
Parameters

audioData (object) – The data sent to the service must be a .wav file

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setFormat(value)[source]
Parameters

format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setFormatCol(value)[source]
Parameters

format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLanguage(value)[source]
Parameters

language (object) – Identifies the spoken language that is being recognized.

setLanguageCol(value)[source]
Parameters

language (object) – Identifies the spoken language that is being recognized.

setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(audioData=None, concurrency=1, concurrentTimeout=100.0, errorCol=None, format=None, handler=None, language=None, outputCol=None, profanity=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • audioData (object) – The data sent to the service must be a .wav file

  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • format (object) – Specifies the result format. Accepted values are simple and detailed. Default is simple.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – Identifies the spoken language that is being recognized.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks, removed, which removes all profanity from the result, or raw, which includes the profanity in the result. The default setting is masked.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setProfanity(value)[source]
Parameters

profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks, removed, which removes all profanity from the result, or raw, which includes the profanity in the result. The default setting is masked.

setProfanityCol(value)[source]
Parameters

profanity (object) – Specifies how to handle profanity in recognition results. Accepted values are masked, which replaces profanity with asterisks, removed, which removes all profanity from the result, or raw, which includes the profanity in the result. The default setting is masked.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.TagImage module

class mmlspark.cognitive.TagImage.TagImage(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, imageBytes=None, imageUrl=None, language=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – The desired language for output generation. (default: ServiceParamData(None,Some(en)))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
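
An image can be supplied either as a URL or as a bytestream, and each has its own column setter. The sketch below shows one way to pick between setImageUrlCol and setImageBytesCol based on the columns a DataFrame carries. The helper names and the column names "url", "bytes", and "tags" are hypothetical, and the mmlspark import is deferred so the selection logic can be exercised on its own.

```python
def choose_image_setter(columns, url_col="url", bytes_col="bytes"):
    """Pick the column setter that matches the available input column."""
    if url_col in columns:
        return ("setImageUrlCol", url_col)
    if bytes_col in columns:
        return ("setImageBytesCol", bytes_col)
    raise ValueError(f"expected a '{url_col}' or '{bytes_col}' column")


def build_tag_image(subscription_key, url, columns):
    """Configure a TagImage transformer (requires mmlspark and Spark)."""
    from mmlspark.cognitive.TagImage import TagImage

    tagger = TagImage()
    tagger.setSubscriptionKey(subscription_key)
    tagger.setUrl(url)
    setter_name, column = choose_image_setter(columns)
    getattr(tagger, setter_name)(column)   # call the matching column setter
    tagger.setOutputCol("tags")            # hypothetical column name
    return tagger


from_urls = choose_image_setter(["id", "url"])
from_bytes = choose_image_setter(["id", "bytes"])
```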

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

getImageBytes()[source]
Returns

bytestream of the image to use

Return type

object

getImageUrl()[source]
Returns

the url of the image to use

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

The desired language for output generation. (default: ServiceParamData(None,Some(en)))

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setImageBytes(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageBytesCol(value)[source]
Parameters

imageBytes (object) – bytestream of the image to use

setImageUrl(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setImageUrlCol(value)[source]
Parameters

imageUrl (object) – the url of the image to use

setLanguage(value)[source]
Parameters

language (object) – The desired language for output generation. (default: ServiceParamData(None,Some(en)))

setLanguageCol(value)[source]
Parameters

language (object) – The desired language for output generation. (default: ServiceParamData(None,Some(en)))

setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, imageBytes=None, imageUrl=None, language=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • imageBytes (object) – bytestream of the image to use

  • imageUrl (object) – the url of the image to use

  • language (object) – The desired language for output generation. (default: ServiceParamData(None,Some(en)))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.TextSentiment module

class mmlspark.cognitive.TextSentiment.TextSentiment(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLanguage()[source]
Returns

the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getText()[source]
Returns

the text in the request body (default: ServiceParamData(Some(Right(text)),None))

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLanguage(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLanguageCol(value)[source]
Parameters

language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, language=None, outputCol=None, subscriptionKey=None, text=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • language (object) – the language code of the text (optional for some services) (default: ServiceParamData(None,Some(List(en))))

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • subscriptionKey (object) – the API key to use

  • text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setText(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTextCol(value)[source]
Parameters

text (object) – the text in the request body (default: ServiceParamData(Some(Right(text)),None))

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service

mmlspark.cognitive.VerifyFaces module

class mmlspark.cognitive.VerifyFaces.VerifyFaces(concurrency=1, concurrentTimeout=100.0, errorCol=None, faceId=None, faceId1=None, faceId2=None, handler=None, largePersonGroupId=None, outputCol=None, personGroupId=None, personId=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • faceId (object) – faceId of the face, comes from Face - Detect.

  • faceId1 (object) – faceId of one face, comes from Face - Detect.

  • faceId2 (object) – faceId of another face, comes from Face - Detect.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • largePersonGroupId (object) – Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • personGroupId (object) – Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • personId (object) – Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
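
The rule that personGroupId and largePersonGroupId should not be provided at the same time can be checked before a request is built. The sketch below is a hypothetical local check, not part of mmlspark. It also assumes, based on the parameter descriptions, two usage patterns: comparing two faces (faceId1 and faceId2) or comparing one face against a person (faceId, personId, and one of the two group IDs).

```python
def verify_mode(face_id=None, face_id1=None, face_id2=None,
                person_group_id=None, large_person_group_id=None,
                person_id=None):
    """Classify an argument combination, rejecting conflicting group IDs."""
    if person_group_id is not None and large_person_group_id is not None:
        raise ValueError("provide personGroupId or largePersonGroupId, not both")
    if face_id1 is not None and face_id2 is not None:
        return "face-to-face"
    has_group = person_group_id is not None or large_person_group_id is not None
    if face_id is not None and person_id is not None and has_group:
        return "face-to-person"
    raise ValueError("incomplete argument combination")


# The ID strings here are placeholders, not real service identifiers.
pair = verify_mode(face_id1="face-a", face_id2="face-b")
person = verify_mode(face_id="face-a", person_id="person-1",
                     person_group_id="group-1")
```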

getConcurrency()[source]
Returns

max number of concurrent calls (default: 1)

Return type

int

getConcurrentTimeout()[source]
Returns

max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

Return type

double

getErrorCol()[source]
Returns

column to hold http errors (default: [self.uid]_error)

Return type

str

getFaceId()[source]
Returns

faceId of the face, comes from Face - Detect.

Return type

object

getFaceId1()[source]
Returns

faceId of one face, comes from Face - Detect.

Return type

object

getFaceId2()[source]
Returns

faceId of another face, comes from Face - Detect.

Return type

object

getHandler()[source]
Returns

Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

Return type

object

static getJavaPackage()[source]

Returns package name String.

getLargePersonGroupId()[source]
Returns

Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

Return type

object

getOutputCol()[source]
Returns

The name of the output column (default: [self.uid]_output)

Return type

str

getPersonGroupId()[source]
Returns

Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

Return type

object

getPersonId()[source]
Returns

Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

Return type

object

getSubscriptionKey()[source]
Returns

the API key to use

Return type

object

getTimeout()[source]
Returns

number of seconds to wait before closing the connection (default: 60.0)

Return type

double

getUrl()[source]
Returns

Url of the service

Return type

str

classmethod read()[source]

Returns an MLReader instance for this class.

setConcurrency(value)[source]
Parameters

concurrency (int) – max number of concurrent calls (default: 1)

setConcurrentTimeout(value)[source]
Parameters

concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

setErrorCol(value)[source]
Parameters

errorCol (str) – column to hold http errors (default: [self.uid]_error)

setFaceId(value)[source]
Parameters

faceId (object) – faceId of the face, comes from Face - Detect.

setFaceId1(value)[source]
Parameters

faceId1 (object) – faceId of one face, comes from Face - Detect.

setFaceId1Col(value)[source]
Parameters

faceId1 (object) – faceId of one face, comes from Face - Detect.

setFaceId2(value)[source]
Parameters

faceId2 (object) – faceId of another face, comes from Face - Detect.

setFaceId2Col(value)[source]
Parameters

faceId2 (object) – faceId of another face, comes from Face - Detect.

setFaceIdCol(value)[source]
Parameters

faceId (object) – faceId of the face, comes from Face - Detect.

setHandler(value)[source]
Parameters

handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

setLargePersonGroupId(value)[source]
Parameters

largePersonGroupId (object) – Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setLargePersonGroupIdCol(value)[source]
Parameters

largePersonGroupId (object) – Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setLocation(value)[source]
setOutputCol(value)[source]
Parameters

outputCol (str) – The name of the output column (default: [self.uid]_output)

setParams(concurrency=1, concurrentTimeout=100.0, errorCol=None, faceId=None, faceId1=None, faceId2=None, handler=None, largePersonGroupId=None, outputCol=None, personGroupId=None, personId=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters
  • concurrency (int) – max number of concurrent calls (default: 1)

  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)

  • errorCol (str) – column to hold http errors (default: [self.uid]_error)

  • faceId (object) – faceId of the face, comes from Face - Detect.

  • faceId1 (object) – faceId of one face, comes from Face - Detect.

  • faceId2 (object) – faceId of another face, comes from Face - Detect.

  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))

  • largePersonGroupId (object) – Using existing largePersonGroupId and personId for fast adding a specified person. largePersonGroupId is created in LargePersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • outputCol (str) – The name of the output column (default: [self.uid]_output)

  • personGroupId (object) – Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

  • personId (object) – Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

  • subscriptionKey (object) – the API key to use

  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

  • url (str) – Url of the service
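The parameter list above states that personGroupId and largePersonGroupId should not be provided at the same time. A minimal sketch of checking that constraint before calling setParams follows. It is plain Python and does not import mmlspark; `check_group_params` is a hypothetical helper introduced here for illustration, not part of the library.

```python
from typing import Optional


def check_group_params(person_group_id: Optional[str] = None,
                       large_person_group_id: Optional[str] = None) -> dict:
    """Build the group keyword arguments, rejecting the case where both are set.

    Hypothetical helper: the keys mirror the setParams keyword names
    documented above, but the function itself is not an mmlspark API.
    """
    if person_group_id is not None and large_person_group_id is not None:
        raise ValueError(
            "personGroupId and largePersonGroupId should not be "
            "provided at the same time"
        )
    kwargs = {}
    if person_group_id is not None:
        kwargs["personGroupId"] = person_group_id
    if large_person_group_id is not None:
        kwargs["largePersonGroupId"] = large_person_group_id
    return kwargs


# Only one of the two identifiers is supplied, so this passes the check.
group_kwargs = check_group_params(person_group_id="my-group")
```

Because setParams takes keyword-only parameters, the resulting dictionary could then be unpacked into the call, for example `setParams(personId=..., **group_kwargs)`, assuming a transformer instance is at hand.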

setPersonGroupId(value)[source]
Parameters

personGroupId (object) – Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setPersonGroupIdCol(value)[source]
Parameters

personGroupId (object) – Using existing personGroupId and personId for fast loading a specified person. personGroupId is created in PersonGroup - Create. Parameter personGroupId and largePersonGroupId should not be provided at the same time.

setPersonId(value)[source]
Parameters

personId (object) – Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

setPersonIdCol(value)[source]
Parameters

personId (object) – Specify a certain person in a person group or a large person group. personId is created in PersonGroup Person - Create or LargePersonGroup Person - Create.

setSubscriptionKey(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters

subscriptionKey (object) – the API key to use

setTimeout(value)[source]
Parameters

timeout (double) – number of seconds to wait before closing the connection (default: 60.0)

setUrl(value)[source]
Parameters

url (str) – Url of the service
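Several setters above come in pairs, such as setPersonId and setPersonIdCol. This page gives both the same description; the sketch below assumes the usual mmlspark convention that the plain setter fixes one value for every row while the Col variant names a DataFrame column to read per row. It is plain Python with dictionaries standing in for rows, and `ServiceParamSketch` is a hypothetical class, not the library's implementation.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ServiceParamSketch:
    """Hypothetical stand-in for a parameter that is constant or column-backed."""
    value: Optional[Any] = None   # what a plain setter such as setPersonId would store
    column: Optional[str] = None  # what a Col setter such as setPersonIdCol would store

    def resolve(self, row: Mapping[str, Any]) -> Any:
        # A column name, when present, is looked up in the current row;
        # otherwise the same constant is returned for every row.
        if self.column is not None:
            return row[self.column]
        return self.value


# Dictionaries stand in for DataFrame rows in this sketch.
rows = [{"pid": "person-a"}, {"pid": "person-b"}]

constant = ServiceParamSketch(value="person-x")
per_row = ServiceParamSketch(column="pid")

constant_ids = [constant.resolve(r) for r in rows]
per_row_ids = [per_row.resolve(r) for r in rows]
```

Under this assumption, the constant form suits values shared by the whole DataFrame, such as a subscription key, and the column form suits values that differ per record, such as a personId.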

Module contents

MicrosoftML is a library of Python classes that interface with the Microsoft Scala APIs, using Apache Spark to create distributed machine learning models.

MicrosoftML simplifies training and scoring classifiers and regressors, and facilitates the creation of models using the CNTK library, images, and text.