synapse.ml.services.aifoundry package

Submodules

synapse.ml.services.aifoundry.AIFoundryChatCompletion module

class synapse.ml.services.aifoundry.AIFoundryChatCompletion.AIFoundryChatCompletion(java_obj=None, AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, apiVersion=None, apiVersionCol=None, bestOf=None, bestOfCol=None, cacheLevel=None, cacheLevelCol=None, concurrency=1, concurrentTimeout=None, customHeaders=None, customHeadersCol=None, customUrlRoot=None, deploymentName=None, deploymentNameCol=None, echo=None, echoCol=None, errorCol='OpenAIChatCompletion_e4d076dd9e16_error', frequencyPenalty=None, frequencyPenaltyCol=None, handler=None, logProbs=None, logProbsCol=None, maxTokens=None, maxTokensCol=None, messagesCol=None, model=None, modelCol=None, n=None, nCol=None, outputCol='OpenAIChatCompletion_e4d076dd9e16_output', presencePenalty=None, presencePenaltyCol=None, responseFormat=None, responseFormatCol=None, seed=None, seedCol=None, stop=None, stopCol=None, subscriptionKey=None, subscriptionKeyCol=None, telemHeaders=None, telemHeadersCol=None, temperature=None, temperatureCol=None, timeout=360.0, topP=None, topPCol=None, url=None, user=None, userCol=None)[source]

Bases: ComplexParamsMixin, JavaMLReadable, JavaMLWritable, JavaTransformer

Parameters:
  • AADToken (object) – AAD Token used for authentication

  • CustomAuthHeader (object) – A Custom Value for Authorization Header

  • apiVersion (object) – version of the api

  • bestOf (object) – How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.

  • cacheLevel (object) – can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache

  • concurrency (int) – max number of concurrent calls

  • concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1

  • customHeaders (object) – Map of Custom Header Key-Value Tuples.

  • customUrlRoot (str) – The custom URL root for the service. This will not append OpenAI specific model path completions (i.e. /chat/completions) to the URL.

  • deploymentName (object) – The name of the deployment

  • echo (object) – Echo back the prompt in addition to the completion

  • errorCol (str) – column to hold http errors

  • frequencyPenalty (object) – How much to penalize new tokens based on whether they appear in the text so far. Increases the likelihood of the model to talk about new topics.

  • handler (object) – Which strategy to use when handling requests

  • logProbs (object) – Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. So for example, if logprobs is 10, the API will return a list of the 10 most likely tokens. If logprobs is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.

  • maxTokens (object) – The maximum number of tokens to generate. Has minimum of 0.

  • messagesCol (str) – The column of messages to generate chat completions for, in the chat format. This column should have type Array(Struct(role: String, content: String)); see the usage sketch after this parameter list.

  • model (object) – The name of the model

  • n (object) – How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.

  • outputCol (str) – The name of the output column

  • presencePenalty (object) – How much to penalize new tokens based on their existing frequency in the text so far. Decreases the likelihood of the model to repeat the same line verbatim. Has minimum of -2 and maximum of 2.

  • responseFormat (object) – Response format for the completion. Can be ‘json_object’ or ‘text’.

  • seed (object) – If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.

  • stop (object) – A sequence which indicates the end of the current document.

  • subscriptionKey (object) – the API key to use

  • telemHeaders (object) – Map of Custom Header Key-Value Tuples.

  • temperature (object) – What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend using this or top_p but not both. Minimum of 0 and maximum of 2 allowed.

  • timeout (float) – number of seconds to wait before closing the connection

  • topP (object) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10 percent probability mass are considered. We generally recommend using this or temperature but not both. Minimum of 0 and maximum of 1 allowed.

  • url (str) – Url of the service

  • user (object) – The ID of the end-user, for use in tracking and rate-limiting.
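
A minimal usage sketch follows. The endpoint URL, API key, deployment name, and the column names "messages", "completion", and "errors" are placeholders chosen for illustration, not values from this documentation; substitute the settings for your own Azure AI Foundry resource, and set apiVersion if your endpoint requires a specific one.

    from pyspark.sql import SparkSession
    from pyspark.sql import types as T
    from synapse.ml.services.aifoundry.AIFoundryChatCompletion import AIFoundryChatCompletion

    spark = SparkSession.builder.getOrCreate()

    # One row per conversation; each conversation is an array of (role, content)
    # structs, matching the Array(Struct(role: String, content: String)) type
    # that messagesCol expects.
    messages_schema = T.StructType([
        T.StructField(
            "messages",
            T.ArrayType(T.StructType([
                T.StructField("role", T.StringType()),
                T.StructField("content", T.StringType()),
            ])),
        )
    ])

    df = spark.createDataFrame(
        [
            ([("system", "You are a helpful assistant."),
              ("user", "Summarize Apache Spark in one sentence.")],),
        ],
        messages_schema,
    )

    chat = AIFoundryChatCompletion(
        subscriptionKey="<your-api-key>",        # or setAADToken(...) for AAD auth
        url="https://<your-resource-endpoint>/", # placeholder service endpoint
        deploymentName="<your-deployment>",      # placeholder deployment name
        messagesCol="messages",
        outputCol="completion",
        errorCol="errors",
        temperature=0.2,
    )

    results = chat.transform(df)
    results.select("completion", "errors").show(truncate=False)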

AADToken = Param(parent='undefined', name='AADToken', doc='ServiceParam: AAD Token used for authentication')
CustomAuthHeader = Param(parent='undefined', name='CustomAuthHeader', doc='ServiceParam: A Custom Value for Authorization Header')
apiVersion = Param(parent='undefined', name='apiVersion', doc='ServiceParam: version of the api')
bestOf = Param(parent='undefined', name='bestOf', doc='ServiceParam: How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.')
cacheLevel = Param(parent='undefined', name='cacheLevel', doc='ServiceParam: can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache')
concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number seconds to wait on futures if concurrency >= 1')
customHeaders = Param(parent='undefined', name='customHeaders', doc='ServiceParam: Map of Custom Header Key-Value Tuples.')
customUrlRoot = Param(parent='undefined', name='customUrlRoot', doc='The custom URL root for the service. This will not append OpenAI specific model path completions (i.e. /chat/completions) to the URL.')
deploymentName = Param(parent='undefined', name='deploymentName', doc='ServiceParam: The name of the deployment')
echo = Param(parent='undefined', name='echo', doc='ServiceParam: Echo back the prompt in addition to the completion')
errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
frequencyPenalty = Param(parent='undefined', name='frequencyPenalty', doc='ServiceParam: How much to penalize new tokens based on whether they appear in the text so far. Increases the likelihood of the model to talk about new topics.')
getAADToken()[source]
Returns:

AAD Token used for authentication

Return type:

AADToken

getApiVersion()[source]
Returns:

version of the api

Return type:

apiVersion

getBestOf()[source]
Returns:

How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.

Return type:

bestOf

getCacheLevel()[source]
Returns:

can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache

Return type:

cacheLevel

getConcurrency()[source]
Returns:

max number of concurrent calls

Return type:

concurrency

getConcurrentTimeout()[source]
Returns:

max number of seconds to wait on futures if concurrency >= 1

Return type:

concurrentTimeout

getCustomAuthHeader()[source]
Returns:

A Custom Value for Authorization Header

Return type:

CustomAuthHeader

getCustomHeaders()[source]
Returns:

Map of Custom Header Key-Value Tuples.

Return type:

customHeaders

getCustomUrlRoot()[source]
Returns:

The custom URL root for the service. This will not append OpenAI specific model path completions (i.e. /chat/completions) to the URL.

Return type:

customUrlRoot

getDeploymentName()[source]
Returns:

The name of the deployment

Return type:

deploymentName

getEcho()[source]
Returns:

Echo back the prompt in addition to the completion

Return type:

echo

getErrorCol()[source]
Returns:

column to hold http errors

Return type:

errorCol

getFrequencyPenalty()[source]
Returns:

How much to penalize new tokens based on whether they appear in the text so far. Increases the likelihood of the model to talk about new topics.

Return type:

frequencyPenalty

getHandler()[source]
Returns:

Which strategy to use when handling requests

Return type:

handler

static getJavaPackage()[source]

Returns package name String.

getLogProbs()[source]
Returns:

Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. So for example, if logprobs is 10, the API will return a list of the 10 most likely tokens. If logprobs is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.

Return type:

logProbs

getMaxTokens()[source]
Returns:

The maximum number of tokens to generate. Has minimum of 0.

Return type:

maxTokens

getMessagesCol()[source]
Returns:

The column of messages to generate chat completions for, in the chat format. This column should have type Array(Struct(role: String, content: String)).

Return type:

messagesCol

getModel()[source]
Returns:

The name of the model

Return type:

model

getN()[source]
Returns:

How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.

Return type:

n

getOutputCol()[source]
Returns:

The name of the output column

Return type:

outputCol

getPresencePenalty()[source]
Returns:

How much to penalize new tokens based on their existing frequency in the text so far. Decreases the likelihood of the model to repeat the same line verbatim. Has minimum of -2 and maximum of 2.

Return type:

presencePenalty

getResponseFormat()[source]
Returns:

Response format for the completion. Can be ‘json_object’ or ‘text’.

Return type:

responseFormat

getSeed()[source]
Returns:

If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.

Return type:

seed

getStop()[source]
Returns:

A sequence which indicates the end of the current document.

Return type:

stop

getSubscriptionKey()[source]
Returns:

the API key to use

Return type:

subscriptionKey

getTelemHeaders()[source]
Returns:

Map of Custom Header Key-Value Tuples.

Return type:

telemHeaders

getTemperature()[source]
Returns:

What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend using this or top_p but not both. Minimum of 0 and maximum of 2 allowed.

Return type:

temperature

getTimeout()[source]
Returns:

number of seconds to wait before closing the connection

Return type:

timeout

getTopP()[source]
Returns:

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10 percent probability mass are considered. We generally recommend using this or temperature but not both. Minimum of 0 and maximum of 1 allowed.

Return type:

topP

getUrl()[source]
Returns:

Url of the service

Return type:

url

getUser()[source]
Returns:

The ID of the end-user, for use in tracking and rate-limiting.

Return type:

user

handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
logProbs = Param(parent='undefined', name='logProbs', doc='ServiceParam: Include the log probabilities on the `logprobs` most likely tokens, as well the chosen tokens. So for example, if `logprobs` is 10, the API will return a list of the 10 most likely tokens. If `logprobs` is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.')
maxTokens = Param(parent='undefined', name='maxTokens', doc='ServiceParam: The maximum number of tokens to generate. Has minimum of 0.')
messagesCol = Param(parent='undefined', name='messagesCol', doc='The column messages to generate chat completions for, in the chat format. This column should have type Array(Struct(role: String, content: String)).')
model = Param(parent='undefined', name='model', doc='ServiceParam: The name of the model')
n = Param(parent='undefined', name='n', doc='ServiceParam: How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.')
outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
presencePenalty = Param(parent='undefined', name='presencePenalty', doc='ServiceParam: How much to penalize new tokens based on their existing frequency in the text so far. Decreases the likelihood of the model to repeat the same line verbatim. Has minimum of -2 and maximum of 2.')
classmethod read()[source]

Returns an MLReader instance for this class.
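
Because the class mixes in JavaMLReadable and JavaMLWritable (see the base classes above), a configured transformer can be persisted with the standard Spark ML writer and restored with the reader. A minimal sketch; the path is a placeholder:

    # Persist a configured transformer and load it back later.
    chat.write().overwrite().save("/tmp/aifoundry_chat_completion")
    restored = AIFoundryChatCompletion.load("/tmp/aifoundry_chat_completion")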

responseFormat = Param(parent='undefined', name='responseFormat', doc="ServiceParam: Response format for the completion. Can be 'json_object' or 'text'.")
seed = Param(parent='undefined', name='seed', doc='ServiceParam: If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.')
setAADToken(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setAADTokenCol(value)[source]
Parameters:

AADToken – AAD Token used for authentication

setApiVersion(value)[source]
Parameters:

apiVersion – version of the api

setApiVersionCol(value)[source]
Parameters:

apiVersion – version of the api

setBestOf(value)[source]
Parameters:

bestOf – How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.

setBestOfCol(value)[source]
Parameters:

bestOf – How many generations to create server side, and display only the best. Will not stream intermediate progress if best_of > 1. Has maximum value of 128.

setCacheLevel(value)[source]
Parameters:

cacheLevel – can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache

setCacheLevelCol(value)[source]
Parameters:

cacheLevel – can be used to disable any server-side caching, 0=no cache, 1=prompt prefix enabled, 2=full cache

setConcurrency(value)[source]
Parameters:

concurrency – max number of concurrent calls

setConcurrentTimeout(value)[source]
Parameters:

concurrentTimeout – max number of seconds to wait on futures if concurrency >= 1

setCustomAuthHeader(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomAuthHeaderCol(value)[source]
Parameters:

CustomAuthHeader – A Custom Value for Authorization Header

setCustomHeaders(value)[source]
Parameters:

customHeaders – Map of Custom Header Key-Value Tuples.

setCustomHeadersCol(value)[source]
Parameters:

customHeaders – Map of Custom Header Key-Value Tuples.

setCustomServiceName(value)[source]
setCustomUrlRoot(value)[source]
Parameters:

customUrlRoot – The custom URL root for the service. This will not append OpenAI specific model path completions (i.e. /chat/completions) to the URL.

setDefaultInternalEndpoint(value)[source]
setDeploymentName(value)[source]
Parameters:

deploymentName – The name of the deployment

setDeploymentNameCol(value)[source]
Parameters:

deploymentName – The name of the deployment

setEcho(value)[source]
Parameters:

echo – Echo back the prompt in addition to the completion

setEchoCol(value)[source]
Parameters:

echo – Echo back the prompt in addition to the completion

setEndpoint(value)[source]
setErrorCol(value)[source]
Parameters:

errorCol – column to hold http errors

setFrequencyPenalty(value)[source]
Parameters:

frequencyPenalty – How much to penalize new tokens based on whether they appear in the text so far. Increases the likelihood of the model to talk about new topics.

setFrequencyPenaltyCol(value)[source]
Parameters:

frequencyPenalty – How much to penalize new tokens based on whether they appear in the text so far. Increases the likelihood of the model to talk about new topics.

setHandler(value)[source]
Parameters:

handler – Which strategy to use when handling requests

setLogProbs(value)[source]
Parameters:

logProbs – Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. So for example, if logprobs is 10, the API will return a list of the 10 most likely tokens. If logprobs is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.

setLogProbsCol(value)[source]
Parameters:

logProbs – Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. So for example, if logprobs is 10, the API will return a list of the 10 most likely tokens. If logprobs is 0, only the chosen tokens will have logprobs returned. Minimum of 0 and maximum of 100 allowed.

setMaxTokens(value)[source]
Parameters:

maxTokens – The maximum number of tokens to generate. Has minimum of 0.

setMaxTokensCol(value)[source]
Parameters:

maxTokens – The maximum number of tokens to generate. Has minimum of 0.

setMessagesCol(value)[source]
Parameters:

messagesCol – The column of messages to generate chat completions for, in the chat format. This column should have type Array(Struct(role: String, content: String)).

setModel(value)[source]
Parameters:

model – The name of the model

setModelCol(value)[source]
Parameters:

model – The name of the model

setN(value)[source]
Parameters:

n – How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.

setNCol(value)[source]
Parameters:

n – How many snippets to generate for each prompt. Minimum of 1 and maximum of 128 allowed.

setOutputCol(value)[source]
Parameters:

outputCol – The name of the output column

setParams(AADToken=None, AADTokenCol=None, CustomAuthHeader=None, CustomAuthHeaderCol=None, apiVersion=None, apiVersionCol=None, bestOf=None, bestOfCol=None, cacheLevel=None, cacheLevelCol=None, concurrency=1, concurrentTimeout=None, customHeaders=None, customHeadersCol=None, customUrlRoot=None, deploymentName=None, deploymentNameCol=None, echo=None, echoCol=None, errorCol='OpenAIChatCompletion_e4d076dd9e16_error', frequencyPenalty=None, frequencyPenaltyCol=None, handler=None, logProbs=None, logProbsCol=None, maxTokens=None, maxTokensCol=None, messagesCol=None, model=None, modelCol=None, n=None, nCol=None, outputCol='OpenAIChatCompletion_e4d076dd9e16_output', presencePenalty=None, presencePenaltyCol=None, responseFormat=None, responseFormatCol=None, seed=None, seedCol=None, stop=None, stopCol=None, subscriptionKey=None, subscriptionKeyCol=None, telemHeaders=None, telemHeadersCol=None, temperature=None, temperatureCol=None, timeout=360.0, topP=None, topPCol=None, url=None, user=None, userCol=None)[source]

Set the (keyword only) parameters
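
setParams accepts the same keyword-only arguments as the constructor. The same configuration can also be built with the individual setters documented on this page, which in SynapseML typically return the transformer itself so the calls can be chained. A sketch with placeholder values:

    chat = AIFoundryChatCompletion()
    chat.setParams(
        subscriptionKey="<your-api-key>",
        deploymentName="<your-deployment>",
        messagesCol="messages",
        outputCol="completion",
    )

    # Equivalent fluent style using the individual setters (placeholder values).
    chat = (
        AIFoundryChatCompletion()
        .setSubscriptionKey("<your-api-key>")
        .setDeploymentName("<your-deployment>")
        .setMessagesCol("messages")
        .setOutputCol("completion")
    )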

setPresencePenalty(value)[source]
Parameters:

presencePenalty – How much to penalize new tokens based on their existing frequency in the text so far. Decreases the likelihood of the model to repeat the same line verbatim. Has minimum of -2 and maximum of 2.

setPresencePenaltyCol(value)[source]
Parameters:

presencePenalty – How much to penalize new tokens based on their existing frequency in the text so far. Decreases the likelihood of the model to repeat the same line verbatim. Has minimum of -2 and maximum of 2.

setResponseFormat(value)[source]
Parameters:

responseFormat – Response format for the completion. Can be ‘json_object’ or ‘text’.

setResponseFormatCol(value)[source]
Parameters:

responseFormat – Response format for the completion. Can be ‘json_object’ or ‘text’.
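
Per the responseFormat description above, the value can be 'json_object' or 'text'. When requesting JSON mode from an OpenAI-style chat API, the messages themselves generally also need to ask for JSON output. A hedged sketch with placeholder column names:

    # Ask the service for JSON-mode output; the prompt should also request JSON,
    # e.g. a system message such as:
    # "Reply only with a JSON object containing the keys 'title' and 'summary'."
    chat_json = (
        AIFoundryChatCompletion()
        .setResponseFormat("json_object")
        .setMessagesCol("messages")
        .setOutputCol("completion")
    )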

setSeed(value)[source]
Parameters:

seed – If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.

setSeedCol(value)[source]
Parameters:

seed – If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.

setStop(value)[source]
Parameters:

stop – A sequence which indicates the end of the current document.

setStopCol(value)[source]
Parameters:

stop – A sequence which indicates the end of the current document.

setSubscriptionKey(value)[source]
Parameters:

subscriptionKey – the API key to use

setSubscriptionKeyCol(value)[source]
Parameters:

subscriptionKey – the API key to use

setTelemHeaders(value)[source]
Parameters:

telemHeaders – Map of Custom Header Key-Value Tuples.

setTelemHeadersCol(value)[source]
Parameters:

telemHeaders – Map of Custom Header Key-Value Tuples.

setTemperature(value)[source]
Parameters:

temperature – What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend using this or top_p but not both. Minimum of 0 and maximum of 2 allowed.

setTemperatureCol(value)[source]
Parameters:

temperature – What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend using this or top_p but not both. Minimum of 0 and maximum of 2 allowed.

setTimeout(value)[source]
Parameters:

timeout – number of seconds to wait before closing the connection

setTopP(value)[source]
Parameters:

topP – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10 percent probability mass are considered. We generally recommend using this or temperature but not both. Minimum of 0 and maximum of 1 allowed.

setTopPCol(value)[source]
Parameters:

topP – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10 percent probability mass are considered. We generally recommend using this or temperature but not both. Minimum of 0 and maximum of 1 allowed.

setUrl(value)[source]
Parameters:

url – Url of the service

setUser(value)[source]
Parameters:

user – The ID of the end-user, for use in tracking and rate-limiting.

setUserCol(value)[source]
Parameters:

user – The ID of the end-user, for use in tracking and rate-limiting.

stop = Param(parent='undefined', name='stop', doc='ServiceParam: A sequence which indicates the end of the current document.')
subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
telemHeaders = Param(parent='undefined', name='telemHeaders', doc='ServiceParam: Map of Custom Header Key-Value Tuples.')
temperature = Param(parent='undefined', name='temperature', doc='ServiceParam: What sampling temperature to use. Higher values means the model will take more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend using this or `top_p` but not both. Minimum of 0 and maximum of 2 allowed.')
timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
topP = Param(parent='undefined', name='topP', doc='ServiceParam: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10 percent probability mass are considered. We generally recommend using this or `temperature` but not both. Minimum of 0 and maximum of 1 allowed.')
url = Param(parent='undefined', name='url', doc='Url of the service')
user = Param(parent='undefined', name='user', doc='ServiceParam: The ID of the end-user, for use in tracking and rate-limiting.')

Module contents

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.

SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond latency web services, backed by your Spark cluster.

SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.
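
If SynapseML is not preinstalled on the cluster, one common way to pull it in is via spark.jars.packages when building the session. The coordinates below follow the usual com.microsoft.azure:synapseml_2.12 pattern, but the version is illustrative; choose the release that matches your Spark and Scala versions.

    from pyspark.sql import SparkSession

    # The version below is illustrative; pick the SynapseML release for your cluster.
    spark = (
        SparkSession.builder
        .appName("synapseml-aifoundry")
        .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.0")
        .getOrCreate()
    )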