Baharan Mirzasoleiman

PhD. Candidate

Submodularity in Machine Learning and its application for Summarizing Big Data

  Computer Engineering Department, Floor 1, Class 102
  Wednesday, 30 December 2015
  15:10 - 16:10


Submodularity is a property of set functions with deep theoretical and practical consequences. Submodular functions exhibit a natural diminishing returns property: the marginal benefit of any given element decreases as we select more and more elements. Submodular maximization generalizes many well-known problems, including maximum weighted matching, maximum coverage, and finds numerous applications in machine learning and social networks: viral marketing, information gathering, document summarization, and active learning. Although maximizing a submodular function is NP-hard in general, a simple greedy algorithm produces solutions competitive with the optimal (intractable) solution. While the greedy algorithm can easily be applied if the data fits in main memory, it is impractical for data residing on disk, or arriving/changing over time at a fast pace. In this talk, I will first give an introduction to submodularity and its applications in machine learning.

I’ll then discuss in more details one of the important applications of submodular maximization in machine learning and information retrieval, i.e. summarizing massive data. A systematic way for data summarization is to turn the problem into selecting a subset of data elements optimizing a utility function that quantifies “representativeness” of the selected set. Often-times, these objective functions satisfy submodularity. Hence, such problems can be reduced to maximizing a submodular set function subject to cardinality or other feasibility constraints; and dealing with big data means we have to solve this problem at scale. I discuss some of the recently developed distributed and streaming techniques for submodular optimization. I briefly overview the theoretical results, and the effectiveness of these techniques on several real-world applications on millions of data points.


She is a third year Ph.D. student in the Learning and Adaptive Systems Group at ETH Zurich. I’m interested in designing and building large-scale machine learning and data mining algorithms. In particular, I’m working on large-scale summarization by selecting a representative subset out of a massive data set using techniques from Submodular Optimization. I was recently awarded the Google Anita Borg Memorial Scholarship. Before that, I completed my B.Sc. in computer engineering at University of Tehran and my M.Sc. in computer engineering from Sharif University of Technology where I was working on influence and information propagation through social networks.