#### Alireza Rezaei

###### PhD candidate

### Modeling Diversity in Machine Learning Using Determinantal Point Processes

*Saturday, 28 December 2019*

*16:00 - 19:00*

#### Syllabus

A wide variety of machine learning tasks can be cast as instances of the “subset selection” problem. Given a large dataset of items, the goal of subset selection is to find a small subset that represents the original data well. In particular, the selected subset is expected to preserve the diversity of the data. As an example, consider a search task: given a large number of images or documents, we want to select a subset that is both relevant to a user query and diverse. Diversity is important because many queries have multiple meanings and aspects, e.g. “apple” can refer to a company or a fruit.

Another application is product recommendation, where retailers with a large inventory need to pick a small subset of their products that is likely to attract customers. To this end, the selected subset should not only contain highly rated products but also include diverse items.

In this workshop, we consider different notions of diversity and see how they can be used to formulate mathematically the problem of choosing a diverse subset of items. In particular, we study determinantal point processes (DPPs), a family of probabilistic models that have recently gained a lot of attention for modeling diversity. We begin with their applications and then discuss algorithms for several fundamental tasks related to DPPs, focusing on the problem of sampling from DPPs.

In the final part, I introduce strongly Rayleigh measures and discuss their basic properties. This family of distributions is in fact a generalization of DPPs that is well studied in the mathematics community, and understanding its properties can help to gain more insight into DPPs.
