MMLSpark: Unifying Machine Learning Ecosystems at Massive Scales

  • ,
  • Sudarshan Raghunathan
  • Ilya Matiach
  • Andrew Schonhoffer
  • Anand Raman
  • Eli Barzilay
  • Karthik Rajendran
  • Dalitso Banda
  • Casey Jisoo Hong
  • Manon Knoertzer
  • Ben Brodsky
  • Minsoo Thigpen
  • Janhavi Suresh Mahajan
  • Courtney Cochrane
  • Abhiram Eswaran
  • Ari Green

arXiv

Presentation (ppt)

We introduce Microsoft Machine Learning for Apache Spark (MMLSpark), an ecosystem of enhancements that expands the Apache Spark distributed computing library to tackle problems in Deep Learning, Micro-Service Orchestration, Gradient Boosting, Model Interpretability, and other areas of modern computation. Furthermore, we present a novel system called Spark Serving that allows users to run any Apache Spark program as a distributed, sub-millisecond latency web service backed by their existing Spark cluster. All MMLSpark contributions share the same API, which enables simple composition across frameworks and usage across batch, streaming, and RESTful web serving scenarios on static, elastic, or serverless clusters. We showcase MMLSpark by creating a method for deep object detection that learns without human-labeled data, and we demonstrate its effectiveness for snow leopard conservation.

Publication Downloads

Vowpal Wabbit

June 28, 2019

Vowpal Wabbit is a machine learning system that pushes the frontier of machine learning with techniques such as online learning, hashing, allreduce, reductions, learning2search, and active and interactive learning. It places particular focus on reinforcement learning: several contextual bandit algorithms are implemented, and the system's online nature lends itself well to the problem. Vowpal Wabbit is a destination for implementing and maturing state-of-the-art algorithms with performance in mind.
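
To make the "online" and "hashing" techniques mentioned above concrete, here is a minimal Python sketch of an online logistic-regression learner that indexes its weights through a hash function. It is an illustration of the general idea only, not Vowpal Wabbit's implementation or API: the class name `OnlineLogisticRegression`, the helper `hash_features`, the table size, the learning rate, and the use of CRC32 as the hash are all assumptions chosen for brevity.

```python
import math
import zlib

# Illustrative table size (an assumption): 2**18 weight slots.
NUM_BITS = 18
NUM_SLOTS = 1 << NUM_BITS


def hash_features(tokens):
    """Map string features to slots in a fixed-size table (the hashing trick)."""
    return [zlib.crc32(tok.encode("utf-8")) & (NUM_SLOTS - 1) for tok in tokens]


class OnlineLogisticRegression:
    """Hypothetical learner that updates its weights one example at a time."""

    def __init__(self, learning_rate=0.5):
        self.weights = [0.0] * NUM_SLOTS
        self.learning_rate = learning_rate

    def predict(self, tokens):
        # Sum the weights of the hashed features and squash with a sigmoid.
        score = sum(self.weights[i] for i in hash_features(tokens))
        return 1.0 / (1.0 + math.exp(-score))

    def learn(self, tokens, label):
        # One stochastic gradient step on a single example; label is 0 or 1.
        error = self.predict(tokens) - label
        for i in hash_features(tokens):
            self.weights[i] -= self.learning_rate * error


# Examples arrive as a stream and are never stored by the model.
model = OnlineLogisticRegression()
stream = [
    (["color=red", "shape=round"], 1),
    (["color=blue", "shape=square"], 0),
] * 50
for tokens, label in stream:
    model.learn(tokens, label)
```

Because features are addressed by hash rather than through a dictionary, memory use stays fixed no matter how many distinct feature names appear in the stream, at the cost of occasional collisions between features.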