Microsoft Machine Learning for Apache Spark

A Fault-Tolerant, Elastic, and RESTful Machine Learning Framework

What's new in MMLSpark v0.14:

Microsoft Cognitive Services on Apache Spark

Bing Image Search

Rich, queryable access to the visual world at unprecedented scales.

Try an Example

Computer Vision

Stream or serve terabytes of data through state-of-the-art OCR engines and image classifiers.

Try an Example

Text Analytics

Understand the meaning behind your database of free-form text.

Try an Example

Face Recognition

Identify faces in thousands of images with just a handful of lines.

Try an Example

Distributed Model Interpretability

Understand any image classifier with a distributed implementation of Local Interpretable Model-Agnostic Explanations (LIME).

Try an Example

Sub-Millisecond Serving

Serve any Spark computation with sub-millisecond latency in Python, Scala, or your language of choice.

Learn More

Spark AI Summit Demo

To try the Spark Summit Europe demo for yourself, you can start from our snow leopard recognition demo and change queries to address your custom detection needs.

Check out the Notebook

Explore our Features:

Scalable Deep Learning

MMLSpark integrates the distributed computing framework Apache Spark with the flexible deep learning framework CNTK, enabling deep learning at unprecedented scales.

Read the Paper

Stress Free Serving

Spark is well known for its ability to switch between batch and streaming workloads by modifying a single line. We push this concept even further and enable distributed web services with the same API as batch and streaming workloads.

Learn More
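The single-line switch between batch and streaming mentioned above can be sketched in plain PySpark. This is a minimal illustration, not MMLSpark's serving API: the function name load_events and its parameters are hypothetical, and the Spark session is assumed to be passed in by the caller.

```python
def load_events(spark, path, schema, streaming=False):
    """Load JSON records from `path` as a batch or a streaming DataFrame.

    `spark` is assumed to be an existing SparkSession. The function name
    and parameters are hypothetical, chosen only for this illustration.
    """
    # The only line that differs between the two modes:
    # `spark.read` returns a DataFrameReader (batch),
    # `spark.readStream` returns a DataStreamReader (streaming).
    reader = spark.readStream if streaming else spark.read
    # Both readers expose the same format/schema/load chain.
    return reader.format("json").schema(schema).load(path)
```

With a real session, load_events(spark, path, schema) would return a batch DataFrame and load_events(spark, path, schema, streaming=True) a streaming one. An explicit schema is passed in both cases because streaming file sources require one by default.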

Distributed Microservices

MMLSpark provides powerful and idiomatic tools to communicate with any HTTP endpoint service using Spark. Users can now use Spark as an elastic microservice orchestrator.

Learn More

Lightning Fast Gradient Boosting

MMLSpark adds GPU-enabled gradient boosted machines from the popular framework LightGBM. Users can mix and match frameworks in a single distributed environment and API.

Try an Example

Install

MMLSpark can be conveniently installed on existing Spark clusters via the --packages option. For example:
spark-shell --packages Azure:mmlspark:0.14
pyspark --packages Azure:mmlspark:0.14
spark-submit --packages Azure:mmlspark:0.14 MyApp.jar
This can be used in other Spark contexts too. For example, you can use MMLSpark in AZTK by adding it to the .aztk/spark-defaults.conf file.
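The value passed to --packages is a Maven coordinate of the form group:artifact:version. The sketch below shows how the same coordinate feeds both the launcher commands above and a Spark defaults file entry. The helper names are hypothetical; the only Spark-specific assumption is the standard spark.jars.packages property, which is the configuration equivalent of --packages.

```python
from typing import List, NamedTuple


class MavenCoordinate(NamedTuple):
    """A group:artifact:version triple, as accepted by --packages."""
    group: str
    artifact: str
    version: str

    def __str__(self) -> str:
        return f"{self.group}:{self.artifact}:{self.version}"


def parse_coordinate(text: str) -> MavenCoordinate:
    """Split 'group:artifact:version' and reject malformed input."""
    parts = text.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected group:artifact:version, got {text!r}")
    return MavenCoordinate(*parts)


def launcher_args(launcher: str, coordinate: MavenCoordinate, *app_args: str) -> List[str]:
    """Build an argument list such as ['pyspark', '--packages', '<coordinate>']."""
    return [launcher, "--packages", str(coordinate), *app_args]


def defaults_conf_line(coordinate: MavenCoordinate) -> str:
    """Build the whitespace-separated 'key value' line for a Spark defaults file."""
    return f"spark.jars.packages {coordinate}"


# The coordinate used throughout this page.
MMLSPARK = parse_coordinate("Azure:mmlspark:0.14")
```

For instance, launcher_args("spark-submit", MMLSPARK, "MyApp.jar") yields an argument list suitable for subprocess.run, and defaults_conf_line(MMLSPARK) yields the line to append to the defaults file.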

Step 1: Create a Databricks account

If you already have a Databricks account, please skip to Step 2. If not, you can create a free account on Azure.

Step 2: Install MMLSpark

To install MMLSpark on the Databricks cloud, create a new library from Maven coordinates in your workspace. For the coordinates use: Azure:mmlspark:0.14. Next, ensure this library is attached to your cluster (or all clusters). Finally, ensure that your Spark cluster has Spark 2.3 and Scala 2.11. You can use MMLSpark in both your Scala and PySpark notebooks.
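The version requirement in this step can be checked from a notebook before attaching the library. The following is a minimal sketch; the helper is_supported is hypothetical and simply compares major.minor version strings against Spark 2.3 and Scala 2.11.

```python
def is_supported(spark_version: str, scala_version: str) -> bool:
    """Return True for Spark 2.3.x running on Scala 2.11.x.

    Hypothetical helper for illustration; it is not part of MMLSpark.
    """
    def major_minor(version: str):
        # Keep only the first two dot-separated components, e.g. "2.3.1" -> ("2", "3").
        return tuple(version.split(".")[:2])

    return (major_minor(spark_version) == ("2", "3")
            and major_minor(scala_version) == ("2", "11"))


# In a PySpark notebook the Spark version is available as `spark.version`.
# One commonly used (but private, so treat it as an assumption) way to get the
# Scala version is spark.sparkContext._jvm.scala.util.Properties.versionNumberString().
```

Comparing parsed components rather than string prefixes avoids accepting a version such as 2.30.0 as a match for 2.3.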

Step 3: Load our Examples (Optional)

To load our examples, right-click in your workspace, click "import", and use the following URL:
https://mmlspark.blob.core.windows.net/dbcs/MMLSpark%20Examples%20v0.14.dbc
The easiest way to evaluate MMLSpark is via our pre-built Docker container. To do so, run the following command:
docker run -it -p 8888:8888 -e ACCEPT_EULA=yes microsoft/mmlspark
Navigate to http://localhost:8888/ in your web browser to run the sample notebooks. To read the EULA for using the docker image, run:
docker run -it -p 8888:8888 microsoft/mmlspark eula
To try out MMLSpark on a Python (or Conda) installation, first install PySpark via pip with pip install pyspark. Next, use --packages or add the package at runtime to get the Scala sources:
import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "Azure:mmlspark:0.14") \
    .getOrCreate()
import mmlspark
If you are building a Spark application in Scala, add the following lines to your build.sbt:
resolvers += "MMLSpark Repo" at "https://mmlspark.azureedge.net/maven"
libraryDependencies += "com.microsoft.ml.spark" %% "mmlspark" % "0.14"

Unsupervised Fire Safety

Spark + AI Summit Europe Keynote 2018

We use Bing on Spark, CNTK on Spark, and Spark Serving to create an automated fire detection service for gas station safety. We then deploy this to an FPGA-accelerated camera for Shell Industries.

Watch Now

Predictive Maintenance with UAVs

Spark + AI Summit 2018

We use CNTK on Spark to distribute a Faster R-CNN object detection network and deploy it as a web service with MMLSpark Serving for use on Unmanned Aerial Vehicles (UAVs).

Watch Now

Automated Snow Leopard Detection

We have partnered with the Snow Leopard Trust to create an intelligent snow leopard identification system. This project helped eliminate thousands of hours of searching through photos.

Real-time Intelligent Analytics

Microsoft Connect Keynote 2017

We use CNTK on Spark and deep transfer learning to create a real-time geospatial application for conservation biology in 5 minutes.

Watch Now