mmlspark.automl package¶
Submodules¶
mmlspark.automl.BestModel module¶
- class mmlspark.automl.BestModel.BestModel(java_obj=None, allModelMetrics=None, bestModel=None, bestModelMetrics=None, rocCurve=None, scoredDataset=None)[source]¶
Bases: mmlspark.automl._BestModel._BestModel
mmlspark.automl.FindBestModel module¶
- class mmlspark.automl.FindBestModel.FindBestModel(java_obj=None, evaluationMetric='accuracy', models=None)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
- evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')¶
- models = Param(parent='undefined', name='models', doc='List of models to be evaluated')¶
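FindBestModel evaluates a list of candidate models against a chosen metric and keeps the best one. The following is an illustrative plain-Python sketch of that selection logic, not mmlspark's implementation; the `find_best_model` helper and the `evaluate` callback are hypothetical names introduced here for clarity.

```python
# Illustrative sketch of FindBestModel's selection logic (hypothetical
# helper, not the mmlspark API): score each candidate model on a
# held-out dataset and keep the one with the best metric value.

def find_best_model(models, evaluate, metric_name="accuracy",
                    higher_is_better=True):
    """Return (best_model, best_score) among `models`.

    `evaluate(model)` is assumed to return a dict of metric values,
    e.g. {"accuracy": 0.91, "AUC": 0.95}.
    """
    best_model, best_score = None, None
    for model in models:
        score = evaluate(model)[metric_name]
        if (best_score is None
                or (higher_is_better and score > best_score)
                or (not higher_is_better and score < best_score)):
            best_model, best_score = model, score
    return best_model, best_score

# Toy usage: "models" are names, evaluation results are precomputed.
results = {"lr": {"accuracy": 0.84},
           "gbt": {"accuracy": 0.91},
           "rf": {"accuracy": 0.88}}
best, score = find_best_model(results, lambda m: results[m])
print(best, score)  # gbt 0.91
```

In the real estimator, the candidates come from the `models` param and the metric from `evaluationMetric` (default `'accuracy'`).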
mmlspark.automl.HyperparamBuilder module¶
- class mmlspark.automl.HyperparamBuilder.DiscreteHyperParam(values, seed=0)[source]¶
Bases: object
Specifies a discrete list of values.
- class mmlspark.automl.HyperparamBuilder.GridSpace(paramValues)[source]¶
Bases: object
Specifies a predetermined grid of values to search through.
- class mmlspark.automl.HyperparamBuilder.HyperparamBuilder[source]¶
Bases: object
Specifies the search space for hyperparameters.
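A grid of discrete hyperparameter values expands into the Cartesian product of all value lists. The sketch below is a self-contained plain-Python analogue of that expansion (the `expand_grid` helper is a hypothetical name, not part of the mmlspark API).

```python
# Illustrative sketch (not mmlspark's implementation) of how a grid of
# discrete hyperparameter values expands into candidate configurations.
from itertools import product

def expand_grid(param_values):
    """Yield one dict per point in the Cartesian product of the grid.

    `param_values` maps a hyperparameter name to the discrete list of
    values to try, e.g. {"numTrees": [10, 50], "maxDepth": [3, 5]}.
    """
    names = list(param_values)
    for combo in product(*(param_values[n] for n in names)):
        yield dict(zip(names, combo))

grid = {"numTrees": [10, 50], "maxDepth": [3, 5]}
configs = list(expand_grid(grid))
print(len(configs))  # 4 configurations
```

This is why grid search cost grows multiplicatively: two parameters with two values each already yield four models to train and evaluate.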
mmlspark.automl.TuneHyperparameters module¶
- class mmlspark.automl.TuneHyperparameters.TuneHyperparameters(java_obj=None, evaluationMetric=None, models=None, numFolds=None, numRuns=None, parallelism=None, paramSpace=None, seed=0)[source]¶
Bases: mmlspark.core.schema.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
- Parameters
evaluationMetric (object) – Metric to evaluate models with
models (object) – Estimators to run
numFolds (int) – Number of folds
numRuns (int) – Termination criteria for randomized search
parallelism (int) – The number of models to run in parallel
paramSpace (object) – Parameter space for generating hyperparameters
seed (long) – Random number generator seed
- evaluationMetric = Param(parent='undefined', name='evaluationMetric', doc='Metric to evaluate models with')¶
- getParamSpace()[source]¶
- Returns
Parameter space for generating hyperparameters
- Return type
paramSpace
- models = Param(parent='undefined', name='models', doc='Estimators to run')¶
- numFolds = Param(parent='undefined', name='numFolds', doc='Number of folds')¶
- numRuns = Param(parent='undefined', name='numRuns', doc='Termination criteria for randomized search')¶
- parallelism = Param(parent='undefined', name='parallelism', doc='The number of models to run in parallel')¶
- paramSpace = Param(parent='undefined', name='paramSpace', doc='Parameter space for generating hyperparameters')¶
- seed = Param(parent='undefined', name='seed', doc='Random number generator seed')¶
- setParamSpace(value)[source]¶
- Parameters
paramSpace – Parameter space for generating hyperparameters
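The params above describe randomized search: `seed` fixes the random draws and `numRuns` is the termination criterion (the number of sampled configurations). A minimal plain-Python analogue of that sampling, assuming a space of discrete value lists (the `random_search` helper is hypothetical, not the mmlspark API):

```python
# Illustrative sketch of randomized hyperparameter search: `seed` makes
# the draws reproducible, `num_runs` bounds how many configurations are
# tried. Not the mmlspark implementation.
import random

def random_search(param_space, num_runs, seed=0):
    """Sample `num_runs` configurations from discrete value lists."""
    rng = random.Random(seed)
    return [{name: rng.choice(values)
             for name, values in param_space.items()}
            for _ in range(num_runs)]

space = {"learningRate": [0.01, 0.1, 0.3], "numLeaves": [15, 31, 63]}
samples = random_search(space, num_runs=5, seed=42)
print(len(samples))  # 5
```

Because the generator is seeded, rerunning with the same `seed` reproduces the same candidate configurations, which is what makes tuning runs repeatable.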
mmlspark.automl.TuneHyperparametersModel module¶
Module contents¶
MMLSpark is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. MMLSpark adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources.
MMLSpark also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. In the same vein, MMLSpark provides easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project enables high-throughput, sub-millisecond-latency web services backed by your Spark cluster.
MMLSpark requires Scala 2.11, Spark 2.4+, and Python 3.5+.