Carlos Guestrin - Large-Scale Machine Learning: From Algorithms to the Cloud with GraphLab

Published on Jan 7, 2013

Carlos Guestrin at University of Washington - 10/2/12

Abstract:
Today, machine learning (ML) methods play a central role in industry
and science. The growth of the Web and improvements in sensor data
collection technology have been rapidly increasing the magnitude and
complexity of the ML tasks we must solve. This growth is driving the
need for scalable, parallel ML algorithms that can handle "Big Data."
In this talk, I'll first present some recent advances in large-scale
algorithms for tackling such huge problems.

Unfortunately, implementing efficient parallel ML algorithms is
challenging. Existing high-level parallel abstractions such as
MapReduce and Pregel are insufficiently expressive to achieve the
desired performance, while low-level tools such as MPI are difficult
to use, leaving ML experts repeatedly solving the same design
challenges.
In this talk, I will also describe the GraphLab framework, which
naturally expresses asynchronous, dynamic graph computations that are
key for state-of-the-art ML algorithms. When these algorithms are
expressed in our higher-level abstraction, GraphLab effectively
addresses many of the underlying parallelism challenges, including
data distribution, optimized communication, and sequential
consistency, a guarantee that is surprisingly important for many ML
algorithms. On a variety of large-scale tasks, GraphLab provides
20-100x performance improvements over Hadoop. In recent months,
GraphLab has received thousands of downloads and is being actively
used by a number of startups, companies, research labs, and
universities.
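To give a flavor of the programming model the talk describes, the
following is a minimal Python sketch of asynchronous, dynamic graph
computation in the GraphLab style: a vertex update function is applied
to vertices pulled from a dynamic scheduler, and a vertex reschedules
its neighbors only when its value changes. This is an illustrative
toy (here running PageRank on a three-node graph), not GraphLab's
actual API; all names and the graph data are invented for the example.

```python
from collections import deque

# Toy directed graph: vertex -> list of out-neighbors (hypothetical data).
graph = {0: [1, 2], 1: [2], 2: [0]}
in_nbrs = {v: [u for u in graph if v in graph[u]] for v in graph}

rank = {v: 1.0 for v in graph}
DAMPING, TOL = 0.85, 1e-6

# Dynamic scheduler: only vertices whose value may still change are queued,
# mimicking GraphLab's dynamic scheduling of update functions.
queue = deque(graph)
queued = set(graph)

while queue:
    v = queue.popleft()
    queued.discard(v)
    # Gather from in-neighbors and apply the PageRank update rule.
    new = (1 - DAMPING) + DAMPING * sum(
        rank[u] / len(graph[u]) for u in in_nbrs[v])
    if abs(new - rank[v]) > TOL:
        rank[v] = new
        # Scatter: reschedule out-neighbors, since their values may change.
        for w in graph[v]:
            if w not in queued:
                queue.append(w)
                queued.add(w)
```

Because vertices read their neighbors' most recent values as soon as
they are written, the computation is asynchronous; the worklist makes
it dynamic, converging without the fixed global supersteps of a
MapReduce- or Pregel-style system.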

This talk represents joint work with Yucheng Low, Joey Gonzalez, Aapo
Kyrola, Jay Gu, Danny Bickson, and Joseph Bradley.
 