Microsoft Machine Learning for Apache Spark

A Fault-Tolerant, Elastic, and RESTful Machine Learning Framework

Scalable Deep Learning

MMLSpark integrates the distributed computing framework Apache Spark with the flexible deep learning framework CNTK, enabling deep learning at unprecedented scales.

Read the Paper

Stress Free Serving

Spark is well known for its ability to switch between batch and streaming workloads by modifying a single line. We push this concept further and enable distributed web services with the same API as batch and streaming workloads.
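As a sketch of the idea: the batch-to-streaming switch is a one-line change in standard Spark, and MMLSpark Serving extends the same DataFrame API to HTTP services. The serving source/sink names in the commented lines below are assumptions for illustration, not a guaranteed API; the rest is standard PySpark.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ServingSketch").getOrCreate()

# Batch: read a static directory of JSON records.
batch_df = spark.read.json("data/requests")

# Streaming: the well-known one-line change (streaming file sources
# need an explicit schema, borrowed here from the batch read).
stream_df = spark.readStream.schema(batch_df.schema).json("data/requests")

# Serving (hypothetical MMLSpark Serving names, shown for shape only):
# the same DataFrame API, but rows arrive as HTTP requests and results
# are sent back as HTTP replies.
# serve_df = spark.readStream.server().address("localhost", 8888, "my_api").load()
# serve_df.writeStream.server().replyTo("my_api").queryName("serving").start()
```

The point of the sketch is that only the source and sink change; any transformations defined between them are reused verbatim across the three modes.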

Learn More

Distributed Microservices

MMLSpark provides powerful and idiomatic tools to communicate with any HTTP endpoint service using Spark. Users can now use Spark as an elastic microservice orchestrator.

Learn More

Lightning Fast Gradient Boosting

MMLSpark adds GPU-enabled gradient-boosted machines from the popular framework LightGBM. Users can mix and match frameworks in a single distributed environment and API.
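A minimal sketch of the mix-and-match idea: a LightGBM learner dropped into an otherwise standard Spark ML pipeline. The flat `from mmlspark import ...` namespace is how the 0.13-era Python API is understood to work; treat it and the parameter names as assumptions and check the MMLSpark docs for your version.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from mmlspark import LightGBMClassifier  # shipped with Azure:mmlspark:0.13 (an assumption)

spark = SparkSession.builder.appName("LightGBMSketch").getOrCreate()

# A tiny toy dataset so the sketch is self-contained.
train_df = spark.createDataFrame(
    [(1.0, 2.0, 0), (2.0, 1.0, 1), (3.0, 4.0, 0), (4.0, 3.0, 1)],
    ["f1", "f2", "label"])

# Standard Spark ML featurization feeding a LightGBM stage.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lgbm = LightGBMClassifier(labelCol="label", featuresCol="features")

model = Pipeline(stages=[assembler, lgbm]).fit(train_df)
model.transform(train_df).select("f1", "f2", "prediction").show()
```

Because the LightGBM stage is an ordinary Spark ML Estimator, it composes with cross-validation, persistence, and any other Spark ML tooling.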

Try an Example

Install

MMLSpark can be conveniently installed on existing Spark clusters via the --packages option. For example:
spark-shell --packages Azure:mmlspark:0.13
pyspark --packages Azure:mmlspark:0.13
spark-submit --packages Azure:mmlspark:0.13 MyApp.jar
These options work in other Spark contexts too; for example, you can use MMLSpark in AZTK by adding it to the .aztk/spark-default.conf file.
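For AZTK, that amounts to adding a packages line to the cluster's Spark configuration file, roughly as follows (a sketch; check your AZTK version's documentation for the exact file name and key):

```
spark.jars.packages    Azure:mmlspark:0.13
```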

Step 1: Create a Databricks account

If you already have a Databricks account, please skip to step 2. If not, you can make a free account on Azure.

Step 2: Install MMLSpark

To install MMLSpark on the Databricks cloud, create a new library from Maven coordinates in your workspace. For the coordinates, use Azure:mmlspark:0.13. Next, ensure this library is attached to your cluster (or all clusters). Finally, ensure that your Spark cluster runs Spark 2.3 and Scala 2.11. You can then use MMLSpark in both your Scala and PySpark notebooks.

Step 3: Load our Examples (Optional)

To load our examples, right-click in your workspace, click "Import", and use the following URL:
https://mmlspark.blob.core.windows.net/dbcs/MMLSpark%20Examples%20v0.13.dbc
To try out MMLSpark on a Python (or Conda) installation, first install PySpark via pip with pip install pyspark. Next, use --packages or add the package at runtime to get the Scala sources:
import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "Azure:mmlspark:0.13") \
    .getOrCreate()
import mmlspark
If you are building a Spark application in Scala, add the following lines to your build.sbt:
resolvers += "MMLSpark Repo" at "https://mmlspark.azureedge.net/maven"
libraryDependencies += "com.microsoft.ml.spark" %% "mmlspark" % "0.13"

Predictive Maintenance with UAVs

Spark + AI Summit 2018

We use CNTK on Spark to distribute a Faster R-CNN object detection network and deploy it as a web service with MMLSpark Serving for use on Unmanned Aerial Vehicles (UAVs).

Watch Now

Automated Snow Leopard Detection

We have partnered with the Snow Leopard Trust to create an intelligent snow leopard identification system. This project helped eliminate thousands of hours of searching through photos.

Real-time Intelligent Analytics

Microsoft Connect Keynote 2017

We use CNTK on Spark and deep transfer learning to create a real-time geospatial application for conservation biology in 5 minutes.

Watch Now