Microsoft Machine Learning for Apache Spark

A Fault-Tolerant, Elastic, and RESTful Machine Learning Framework

Announcing v1.0-rc

Vowpal Wabbit on Spark

Fast, Sparse, and Scalable Text Analytics

Try an Example

Quality and Build Refactor

New Azure Pipelines build with code coverage, CI/CD, and an organized package structure.

See Release Notes

LightGBM Ranking

Barrier Execution Mode, performance improvements, increased parameter coverage

Learn More

Anomaly Detection and Speech To Text

New Cognitive Services on Spark

Read the Docs

Featured Project:

Generative Adversarial Art

Explore the mind of a GAN trained on the Metropolitan Museum of Art's collected works. Then, find your creation in the MET's collection with reverse image search.

Explore our Features:

The Cognitive Services on Spark

Leverage the Microsoft Cognitive Services at unprecedented scales in your existing SparkML pipelines.

Read the Paper

Stress Free Serving

Spark is well known for its ability to switch between batch and streaming workloads by modifying a single line. We push this concept even further and enable distributed web services with the same API as batch and streaming workloads.

Learn More
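
To make the "single line" claim concrete, here is a minimal PySpark sketch of the batch/streaming switch in plain Spark. It does not show MMLSpark's serving API; the function name, the JSON format, and the path in the usage note are illustrative assumptions.

```python
def open_json_source(spark, path, schema, streaming=False):
    """Return a DataFrame over the JSON files at `path`.

    `spark` is an existing SparkSession and `schema` a StructType.
    The function name and the JSON format are illustrative choices.
    """
    # This is the one line that differs between the two modes:
    # `spark.read` is the batch reader, `spark.readStream` the streaming one.
    reader = spark.readStream if streaming else spark.read
    # Everything after the reader is identical for batch and streaming.
    return reader.format("json").schema(schema).load(path)
```

Calling `open_json_source(spark, "/data/events", schema)` would give a batch DataFrame, and adding `streaming=True` would give a streaming one over the same (hypothetical) directory.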

Lightning Fast Gradient Boosting

MMLSpark adds GPU-enabled gradient-boosted machines from the popular framework LightGBM. Users can mix and match frameworks in a single distributed environment and API.

Try an Example

Fast and Sparse Text Analytics

Vowpal Wabbit on Spark enables new classes of workloads in scalable and performant text analytics.

Try an Example

Distributed Microservices

MMLSpark provides powerful and idiomatic tools to communicate with any HTTP endpoint service using Spark. Users can now use Spark as an elastic microservice orchestrator.

Learn More

Large Scale Model Interpretability

Understand any image classifier with a distributed implementation of Local Interpretable Model Agnostic Explanations (LIME).

Try an Example

Scalable Deep Learning

MMLSpark integrates the distributed computing framework Apache Spark with the flexible deep learning framework CNTK, enabling deep learning at unprecedented scales.

Read the Paper

Broad Language Support

MMLSpark's API spans Scala, Python, Java, and R so you can integrate with any ecosystem.

Try our PySpark Examples


MMLSpark can be conveniently installed on existing Spark clusters via the --packages option. For example:
spark-shell --packages

pyspark --packages
The same option works in other Spark contexts too. For example, you can use MMLSpark in AZTK by adding it to the .aztk/spark-default.conf file.

Step 1: Create a Databricks account

If you already have a Databricks account, skip to Step 2. If not, you can create a free account on Azure.

Step 2: Install MMLSpark

To install MMLSpark on the Databricks cloud, create a new library from Maven coordinates in your workspace. For the coordinates use:

Next, ensure this library is attached to your cluster (or all clusters). Finally, ensure that your Spark cluster has Spark 2.3 and Scala 2.11. You can use MMLSpark in both your Scala and PySpark notebooks.
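
The version requirement in Step 2 can be checked before attaching the library. The following is a minimal sketch, not part of MMLSpark: the helper names are hypothetical, and it assumes the requirement means an exact major.minor match on Spark 2.3 and Scala 2.11.

```python
# Versions named in Step 2. Treating them as exact major.minor
# requirements is an assumption of this sketch.
REQUIRED_SPARK = (2, 3)
REQUIRED_SCALA = (2, 11)


def major_minor(version):
    """Parse a version string such as '2.3.1' into the tuple (2, 3)."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def cluster_is_compatible(spark_version, scala_version):
    """Return True if both versions match the required major.minor pair."""
    return (major_minor(spark_version) == REQUIRED_SPARK
            and major_minor(scala_version) == REQUIRED_SCALA)
```

In a PySpark notebook, `spark.version` returns the Spark version string to pass as the first argument. The Scala version is typically shown alongside the runtime in the cluster configuration.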

Step 3: Load our Examples (Optional)

To load our examples, right-click in your workspace, click "import", and use the following URL:
The easiest way to evaluate MMLSpark is via our pre-built Docker container. To do so, run the following command:
docker run -it -p 8888:8888
Please read our Docker EULA for usage rights.
To try out MMLSpark on a Python (or Conda) installation, first install PySpark via pip with pip install pyspark. Next, use --packages or add the package at runtime to get the Scala sources:
import pyspark
# The package coordinates belong in the empty string below.
spark = pyspark.sql.SparkSession.builder.appName("MyApp")\
    .config("spark.jars.packages", "")\
    .getOrCreate()
import mmlspark
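
The runtime approach above can be wrapped in a small helper. This is a sketch under stated assumptions: `package_conf` and `build_session` are hypothetical names, and the coordinate in the usage note is a placeholder rather than MMLSpark's real coordinate. The two settings it uses, `spark.jars.packages` and `spark.jars.repositories`, are standard Spark configuration keys that take comma-separated values.

```python
def package_conf(coordinates, repositories=()):
    """Build the Spark settings that pull Maven packages at session start-up.

    `coordinates` are 'group:artifact:version' strings; `repositories`
    are optional extra Maven repository URLs.
    """
    coords = [c for c in coordinates if c]
    if not coords:
        raise ValueError("at least one Maven coordinate is required")
    conf = {"spark.jars.packages": ",".join(coords)}
    if repositories:
        conf["spark.jars.repositories"] = ",".join(repositories)
    return conf


def build_session(app_name, conf):
    """Create (or reuse) a SparkSession with the given settings applied."""
    # Imported lazily so package_conf can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

For example, `build_session("MyApp", package_conf(["com.example:some-lib_2.11:0.1.0"]))` would start a session with that placeholder package on the classpath. Packages are resolved when the session is first created, so the settings have no effect on a session that is already running.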
If you are building a Spark application in Scala, add the following lines to your build.sbt:
resolvers += "MMLSpark Repo" at ""
libraryDependencies += "" %% "mmlspark" % "1.0.0-rc1"

Unsupervised Currency Detection

Spark + AI Summit Keynote 2019

We use Bing on Spark, CNTK on Spark, Spark Serving, and ML Ops to help those with visual impairments work with currency.

Watch Now

Unsupervised Fire Safety

Spark + AI Summit Europe Keynote 2018

We use Bing on Spark, CNTK on Spark, and Spark Serving to create an automated fire detection service for gas station safety. We then deploy this to an FPGA-accelerated camera for Shell Industries.

Watch Now

Predictive Maintenance with UAVs

Spark + AI Summit 2018

We use CNTK on Spark to distribute a Faster R-CNN object detection network and deploy it as a web service with MMLSpark Serving for use on Unmanned Aerial Vehicles (UAVs).

Watch Now

Automated Snow Leopard Detection

We have partnered with the Snow Leopard Trust to create an intelligent snow leopard identification system. This project helped eliminate thousands of hours of searching through photos.

Real-time Intelligent Analytics

Microsoft Connect Keynote 2017

We use CNTK on Spark and deep transfer learning to create a real-time geospatial application for conservation biology in 5 minutes.

Watch Now