synapse.ml.geospatial package
Submodules
synapse.ml.geospatial.AddressGeocoder module
- class synapse.ml.geospatial.AddressGeocoder.AddressGeocoder(java_obj=None, address=None, addressCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AddressGeocoder_cb71b3973257_error', initialPollingDelay=300, maxPollingRetries=1000, outputCol='AddressGeocoder_cb71b3973257_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url='https://atlas.microsoft.com/search/address/batch/json')
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
address (object) – the address to geocode
backoffs (list) – array of backoffs to use in the handler
concurrency (int) – max number of concurrent calls
concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1
errorCol (str) – column to hold HTTP errors
initialPollingDelay (int) – number of milliseconds to wait before first poll for result
maxPollingRetries (int) – number of times to poll
outputCol (str) – The name of the output column
pollingDelay (int) – number of milliseconds to wait between polling
subscriptionKey (object) – the API key to use
suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column
timeout (float) – number of seconds to wait before closing the connection
url (str) – URL of the service
- address = Param(parent='undefined', name='address', doc='ServiceParam: the address to geocode')
- backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
- concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
- concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
- errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
- getConcurrentTimeout()
- Returns
max number of seconds to wait on futures if concurrency >= 1
- Return type
float
- getInitialPollingDelay()
- Returns
number of milliseconds to wait before first poll for result
- Return type
int
- getPollingDelay()
- Returns
number of milliseconds to wait between polling
- Return type
int
- getSuppressMaxRetriesExceededException()
- Returns
true if the maximum retries exception is suppressed and reported in the error column
- Return type
bool
- getTimeout()
- Returns
number of seconds to wait before closing the connection
- Return type
float
- initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
- maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
- pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
- setConcurrentTimeout(value)
- Parameters
value (float) – max number of seconds to wait on futures if concurrency >= 1
- setInitialPollingDelay(value)
- Parameters
value (int) – number of milliseconds to wait before first poll for result
- setParams(address=None, addressCol=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='AddressGeocoder_cb71b3973257_error', initialPollingDelay=300, maxPollingRetries=1000, outputCol='AddressGeocoder_cb71b3973257_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url='https://atlas.microsoft.com/search/address/batch/json')
Set the (keyword only) parameters
- setPollingDelay(value)
- Parameters
value (int) – number of milliseconds to wait between polling
- setSuppressMaxRetriesExceededException(value)
- Parameters
value (bool) – set to true to suppress the maximum retries exception and report it in the error column
- setTimeout(value)
- Parameters
value (float) – number of seconds to wait before closing the connection
- subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
- suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set to true to suppress the maximum retries exception and report it in the error column')
- timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
- url = Param(parent='undefined', name='url', doc='URL of the service')
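As a rough illustration of how the AddressGeocoder parameters fit together, the sketch below configures the transformer using only keyword arguments that appear in the documented constructor signature. The column names (address, geocoded, geocode_error), the helper build_address_geocoder, and the placeholder key are assumptions made for this example. The import is deferred so the configuration can be inspected without pyspark or a running Spark session.

```python
# Keyword arguments taken from the documented AddressGeocoder signature.
# The column names and values are illustrative assumptions.
ADDRESS_GEOCODER_KWARGS = {
    "addressCol": "address",      # hypothetical input column of address strings
    "outputCol": "geocoded",      # hypothetical output column name
    "errorCol": "geocode_error",  # hypothetical error column name
    "concurrency": 2,
    "timeout": 60.0,
    # Report exhausted polling retries in errorCol instead of raising.
    "suppressMaxRetriesExceededException": True,
}


def build_address_geocoder(subscription_key, **overrides):
    """Return a configured AddressGeocoder (needs pyspark and synapseml)."""
    # Deferred import: only resolved when the helper is actually called.
    from synapse.ml.geospatial.AddressGeocoder import AddressGeocoder

    kwargs = {**ADDRESS_GEOCODER_KWARGS, **overrides}
    return AddressGeocoder(subscriptionKey=subscription_key, **kwargs)


# Usage, inside an active Spark session where `df` has an `address` column:
#   geocoder = build_address_geocoder("<your-subscription-key>")
#   result = geocoder.transform(df)
```

Keeping the settings in a plain dictionary makes it easy to override a single value per job, for example build_address_geocoder(key, concurrency=4).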
synapse.ml.geospatial.CheckPointInPolygon module
- class synapse.ml.geospatial.CheckPointInPolygon.CheckPointInPolygon(java_obj=None, concurrency=1, concurrentTimeout=None, errorCol='CheckPointInPolygon_37e2b39e23e2_error', handler=None, latitude=None, latitudeCol=None, longitude=None, longitudeCol=None, outputCol='CheckPointInPolygon_37e2b39e23e2_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url='https://atlas.microsoft.com/', userDataIdentifier=None, userDataIdentifierCol=None)
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
concurrency (int) – max number of concurrent calls
concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1
errorCol (str) – column to hold HTTP errors
handler (object) – which strategy to use when handling requests
latitude (object) – the latitude of location
longitude (object) – the longitude of location
outputCol (str) – The name of the output column
subscriptionKey (object) – the API key to use
timeout (float) – number of seconds to wait before closing the connection
url (str) – URL of the service
userDataIdentifier (object) – the identifier for the user uploaded data
- concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
- concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
- errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
- getConcurrentTimeout()
- Returns
max number of seconds to wait on futures if concurrency >= 1
- Return type
float
- getTimeout()
- Returns
number of seconds to wait before closing the connection
- Return type
float
- getUserDataIdentifier()
- Returns
the identifier for the user uploaded data
- Return type
object
- handler = Param(parent='undefined', name='handler', doc='Which strategy to use when handling requests')
- latitude = Param(parent='undefined', name='latitude', doc='ServiceParam: the latitude of location')
- longitude = Param(parent='undefined', name='longitude', doc='ServiceParam: the longitude of location')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
- setConcurrentTimeout(value)
- Parameters
value (float) – max number of seconds to wait on futures if concurrency >= 1
- setParams(concurrency=1, concurrentTimeout=None, errorCol='CheckPointInPolygon_37e2b39e23e2_error', handler=None, latitude=None, latitudeCol=None, longitude=None, longitudeCol=None, outputCol='CheckPointInPolygon_37e2b39e23e2_output', subscriptionKey=None, subscriptionKeyCol=None, timeout=60.0, url='https://atlas.microsoft.com/', userDataIdentifier=None, userDataIdentifierCol=None)
Set the (keyword only) parameters
- setTimeout(value)
- Parameters
value (float) – number of seconds to wait before closing the connection
- setUserDataIdentifier(value)
- Parameters
value (object) – the identifier for the user uploaded data
- setUserDataIdentifierCol(value)
- Parameters
value (str) – name of the column that holds the identifier for the user uploaded data
- subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
- timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
- url = Param(parent='undefined', name='url', doc='URL of the service')
- userDataIdentifier = Param(parent='undefined', name='userDataIdentifier', doc='ServiceParam: the identifier for the user uploaded data')
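The following sketch shows one way the CheckPointInPolygon parameters might be combined: coordinates are read from two columns, and a single identifier for previously uploaded polygon data is applied to every row. The column names (lat, lon, in_polygon, in_polygon_error) and both helper functions are assumptions for illustration. The range check is not part of the library; it is a local guard that filters out obviously invalid coordinates before any request is made.

```python
def is_valid_point(latitude, longitude):
    """Return True when the coordinates lie within valid WGS84 ranges."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


def build_point_in_polygon_check(subscription_key, user_data_id):
    """Return a configured CheckPointInPolygon (needs pyspark and synapseml)."""
    # Deferred import: only resolved when the helper is actually called.
    from synapse.ml.geospatial.CheckPointInPolygon import CheckPointInPolygon

    return CheckPointInPolygon(
        subscriptionKey=subscription_key,
        latitudeCol="lat",                # hypothetical latitude column
        longitudeCol="lon",               # hypothetical longitude column
        userDataIdentifier=user_data_id,  # one identifier shared by all rows
        outputCol="in_polygon",           # hypothetical output column
        errorCol="in_polygon_error",      # hypothetical error column
    )


# Usage, inside an active Spark session where `df` has `lat` and `lon` columns:
#   check = build_point_in_polygon_check("<your-subscription-key>", "<your-udid>")
#   result = check.transform(df)
```

If each row should be tested against a different uploaded data set, the userDataIdentifierCol keyword from the signature can be used in place of userDataIdentifier.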
synapse.ml.geospatial.ReverseAddressGeocoder module
- class synapse.ml.geospatial.ReverseAddressGeocoder.ReverseAddressGeocoder(java_obj=None, backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='ReverseAddressGeocoder_30b0337e17e1_error', initialPollingDelay=300, latitude=None, latitudeCol=None, longitude=None, longitudeCol=None, maxPollingRetries=1000, outputCol='ReverseAddressGeocoder_30b0337e17e1_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url='https://atlas.microsoft.com/search/address/reverse/batch/json')
Bases: synapse.ml.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
- Parameters
backoffs (list) – array of backoffs to use in the handler
concurrency (int) – max number of concurrent calls
concurrentTimeout (float) – max number of seconds to wait on futures if concurrency >= 1
errorCol (str) – column to hold HTTP errors
initialPollingDelay (int) – number of milliseconds to wait before first poll for result
latitude (object) – the latitude of location
longitude (object) – the longitude of location
maxPollingRetries (int) – number of times to poll
outputCol (str) – The name of the output column
pollingDelay (int) – number of milliseconds to wait between polling
subscriptionKey (object) – the API key to use
suppressMaxRetriesExceededException (bool) – set to true to suppress the maximum retries exception and report it in the error column
timeout (float) – number of seconds to wait before closing the connection
url (str) – URL of the service
- backoffs = Param(parent='undefined', name='backoffs', doc='array of backoffs to use in the handler')
- concurrency = Param(parent='undefined', name='concurrency', doc='max number of concurrent calls')
- concurrentTimeout = Param(parent='undefined', name='concurrentTimeout', doc='max number of seconds to wait on futures if concurrency >= 1')
- errorCol = Param(parent='undefined', name='errorCol', doc='column to hold http errors')
- getConcurrentTimeout()
- Returns
max number of seconds to wait on futures if concurrency >= 1
- Return type
float
- getInitialPollingDelay()
- Returns
number of milliseconds to wait before first poll for result
- Return type
int
- getPollingDelay()
- Returns
number of milliseconds to wait between polling
- Return type
int
- getSuppressMaxRetriesExceededException()
- Returns
true if the maximum retries exception is suppressed and reported in the error column
- Return type
bool
- getTimeout()
- Returns
number of seconds to wait before closing the connection
- Return type
float
- initialPollingDelay = Param(parent='undefined', name='initialPollingDelay', doc='number of milliseconds to wait before first poll for result')
- latitude = Param(parent='undefined', name='latitude', doc='ServiceParam: the latitude of location')
- longitude = Param(parent='undefined', name='longitude', doc='ServiceParam: the longitude of location')
- maxPollingRetries = Param(parent='undefined', name='maxPollingRetries', doc='number of times to poll')
- outputCol = Param(parent='undefined', name='outputCol', doc='The name of the output column')
- pollingDelay = Param(parent='undefined', name='pollingDelay', doc='number of milliseconds to wait between polling')
- setConcurrentTimeout(value)
- Parameters
value (float) – max number of seconds to wait on futures if concurrency >= 1
- setInitialPollingDelay(value)
- Parameters
value (int) – number of milliseconds to wait before first poll for result
- setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=None, errorCol='ReverseAddressGeocoder_30b0337e17e1_error', initialPollingDelay=300, latitude=None, latitudeCol=None, longitude=None, longitudeCol=None, maxPollingRetries=1000, outputCol='ReverseAddressGeocoder_30b0337e17e1_output', pollingDelay=300, subscriptionKey=None, subscriptionKeyCol=None, suppressMaxRetriesExceededException=False, timeout=60.0, url='https://atlas.microsoft.com/search/address/reverse/batch/json')
Set the (keyword only) parameters
- setPollingDelay(value)
- Parameters
value (int) – number of milliseconds to wait between polling
- setSuppressMaxRetriesExceededException(value)
- Parameters
value (bool) – set to true to suppress the maximum retries exception and report it in the error column
- setTimeout(value)
- Parameters
value (float) – number of seconds to wait before closing the connection
- subscriptionKey = Param(parent='undefined', name='subscriptionKey', doc='ServiceParam: the API key to use')
- suppressMaxRetriesExceededException = Param(parent='undefined', name='suppressMaxRetriesExceededException', doc='set to true to suppress the maximum retries exception and report it in the error column')
- timeout = Param(parent='undefined', name='timeout', doc='number of seconds to wait before closing the connection')
- url = Param(parent='undefined', name='url', doc='URL of the service')
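The three polling parameters (initialPollingDelay, pollingDelay, maxPollingRetries) jointly bound how long a batch request may be polled. The sketch below estimates that bound under a simplifying assumption: one initial wait followed by up to maxPollingRetries polls spaced pollingDelay apart. This schedule is an assumption for illustration, not a description of the library's internal logic, and it ignores request latency and the backoffs list. The helper names and column names are hypothetical.

```python
def max_polling_wait_ms(initial_polling_delay=300, polling_delay=300,
                        max_polling_retries=1000):
    """Estimate the worst-case polling time in milliseconds.

    Assumes one initial wait, then `max_polling_retries` polls spaced
    `polling_delay` apart. Defaults mirror the documented signature.
    """
    return initial_polling_delay + max_polling_retries * polling_delay


# With the documented defaults the estimate is about five minutes.
print(max_polling_wait_ms() / 1000.0)  # 300.3


def build_reverse_geocoder(subscription_key):
    """Return a ReverseAddressGeocoder with a tighter polling budget."""
    # Deferred import: only resolved when the helper is actually called.
    from synapse.ml.geospatial.ReverseAddressGeocoder import ReverseAddressGeocoder

    return ReverseAddressGeocoder(
        subscriptionKey=subscription_key,
        latitudeCol="lat",       # hypothetical latitude column
        longitudeCol="lon",      # hypothetical longitude column
        outputCol="address",     # hypothetical output column
        pollingDelay=500,
        maxPollingRetries=100,   # roughly 50 seconds under the assumption above
    )
```

Under the same assumption, the tighter settings give max_polling_wait_ms(300, 500, 100), or 50300 milliseconds. Pairing a smaller budget with suppressMaxRetriesExceededException=True would surface exhausted retries in the error column rather than failing the job.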
Module contents
SynapseML is an ecosystem of tools that extends the distributed computing framework Apache Spark in several new directions. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.
SynapseML also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In this vein, SynapseML provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.
SynapseML requires Scala 2.12, Spark 3.0+, and Python 3.6+.