# Abstracts

### Invited Talks

#### Causal and Statistical Inference with Social Network Data: Massive Challenges and Meager Progress

*Elizabeth Ogburn*

Interest in and availability of social network data have led to increasing attempts to make causal and statistical inferences using data collected from subjects linked by social network ties. But inference about all kinds of estimands, from simple sample means to complicated causal peer effects, is challenging when only a single network of non-independent observations is available. There is a dearth of principled methods for dealing with the dependence that such observations can manifest. We demonstrate the dangerously anticonservative inference that can result from a failure to account for network dependence, explain why results on spatial-temporal dependence are not immediately applicable to this new setting, and describe a few different avenues towards valid statistical and causal inference using social network data.
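
As a generic illustration of why ignoring dependence is anticonservative (a textbook identity, not the speaker's own derivation), consider observations $Y_1, \dots, Y_n$ that share a common variance $\sigma^2$, an assumption made here only for simplicity. The variance of the sample mean is then

$$
\operatorname{Var}(\bar{Y}) = \frac{\sigma^2}{n} + \frac{1}{n^2} \sum_{i \neq j} \operatorname{Cov}(Y_i, Y_j).
$$

The usual i.i.d. standard error keeps only the first term. When network ties induce predominantly positive covariances, that term understates the true variance, so the resulting confidence intervals are too narrow.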

#### Causal Reasoning for Events in Continuous Time: A Decision-Theoretic Approach

*Vanessa Didelez*

This talk will be concerned with causal reasoning in the context of continuous-time (point) processes. It will be shown that various notions familiar from, e.g., (causal) DAGs can be generalised to this case, such as interventions and graphical criteria for identifiability, as well as inverse probability weighting. The relevant graphs, however, are not DAGs; they are local independence graphs, which are directed graphs allowing cycles. Using these with survival outcomes and thinking "causally" also allows interesting insights into the notion of independent censoring. The theory will be illustrated with original data from a Norwegian screening project for cervical cancer; the aim is to compare two types of HPV tests which can be used in the screening. Local independence graphs and appropriate weighting procedures turn out to be useful for the analysis of these data.

### Contributed Talks and Posters

#### Learning the Structure of Causal Models with Relational and Temporal Dependence

*Katerina Marazopoulou, Marc Maier and David Jensen*

Many real-world domains are inherently relational and temporal: they consist of heterogeneous entities that interact with each other over time. Effective reasoning about causality in such domains requires representations that explicitly model relational and temporal dependence. In this work, we provide a formalization of temporal relational models. We define temporal extensions to abstract ground graphs, a lifted representation that abstracts paths of dependence over all possible ground graphs. Temporal abstract ground graphs enable a sound and complete method for answering d-separation queries on temporal relational models. These methods provide the foundation for a constraint-based algorithm, TRCD, that learns causal models from temporal relational data. We provide experimental evidence that demonstrates the need to explicitly represent time when inferring causal dependence. We also demonstrate the expressive gain of TRCD compared to earlier algorithms that do not explicitly represent time.

#### Query-Answer Causality in Databases: Abductive Diagnosis and View-Updates

*Babak Salimi and Leopoldo Bertossi*

Causality has recently been introduced in databases to model, characterize, and possibly compute causes for query results (answers). Connections between query causality and both consistency-based diagnosis and database repairs (with respect to integrity constraint violations) have been established in the literature. In this work we establish connections between query causality and abductive diagnosis and the view-update problem. The unveiled relationships allow us to obtain new complexity results for query causality (the main focus of our work) and also for the two other areas.

#### Causal Interpretation Rules for Encoding and Decoding Models in Neuroimaging

*Sebastian Weichwald, Timm Meyer, Ozan Özdenizci, Bernhard Schölkopf, Tonio Ball and Moritz Grosse-Wentrup*

How neural activity gives rise to cognition is arguably one of the most interesting questions in neuroimaging. While causal terminology is often introduced in the interpretation of neuroimaging data, causal inference frameworks are rarely explicitly employed.

In our recent work we cast widely used analysis methods in a causal framework in order to foster its acceptance in the neuroimaging community. In particular we focus on typical analyses in which variables' relevance in encoding and decoding models (also known as generative or discriminative models) with a dependent stimulus/response variable is interpreted. By linking the concept of relevant variables to marginal/conditional independence properties we demonstrate that (a) identifying relevant variables is indeed a first step towards causal inference; (b) combining encoding and decoding models can yield further insights into the causal structure, which cannot be gleaned from either model alone. We demonstrate the empirical relevance of our findings on EEG data recorded during a visuomotor learning task.

The rigorous theoretical framework of causal inference allows us to expound the assumptional underpinnings and limitations of common (intuitive) analyses in this field. Furthermore, it sheds light on problems covered in recent neuroimaging literature such as confounds in multivariate pattern analysis or interpretation of linear encoding and decoding models.

#### Inference of Cause and Effect with Unsupervised Inverse Regression

*Eleni Sgouritsa, Dominik Janzing, Philipp Hennig and Bernhard Schölkopf*

We address the problem of causal discovery in the two-variable case, given a sample from their joint distribution. Since $X \to Y$ and $Y \to X$ are Markov equivalent, conditional-independence-based methods [Spirtes et al., 2000, Pearl, 2009] cannot recover the causal graph. Alternative methods introduce asymmetries between cause and effect by restricting the function class (e.g., [Hoyer et al., 2009]).

The proposed causal discovery method, CURE, is based on the principle of independence of causal mechanisms [Janzing and Schölkopf, 2010]. For the case of only two variables, it states that the marginal distribution of the cause, say $P(X)$, and the conditional of the effect given the cause, $P(Y \mid X)$, are "independent", in the sense that they do not contain information about each other (informally, $P(X)$ "independent of" $P(Y \mid X)$). This independence can be violated in the backward direction: the distribution of the effect $P(Y)$ and the conditional $P(X \mid Y)$ may contain information about each other because each of them inherits properties from both $P(X)$ and $P(Y \mid X)$, hence introducing an asymmetry between cause and effect. For deterministic causal relations ($Y = f(X)$), all the information about the conditional $P(Y \mid X)$ is contained in the function $f$, so independence boils down to $P(X)$ "independent of" $f$. Previous work formalizes the independence principle by specifying what is meant by independence. For deterministic non-linear relations, Janzing et al. [2012] and Daniusis et al. [2010] define independence as uncorrelatedness between $\log f'$ and the density of $P(X)$, both viewed as random variables. For non-deterministic relations, it is not obvious how to explicitly formalize independence between $P(X)$ and $P(Y \mid X)$. Instead, we propose an implicit notion of independence, namely that $p_{Y|X}$ cannot be estimated based on $p_X$ (lower case denotes density). However, it may be possible to estimate $p_{X|Y}$ based on the density of the effect, $p_Y$.
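
To make the deterministic case concrete, the following is a minimal sketch of a slope-based score in the spirit of the earlier uncorrelatedness formulation. It is not CURE itself. It assumes noise-free, invertible data, rescales both variables to [0, 1], and estimates the average log-slope in each direction; the function names `rescale`, `slope_score` and `infer_direction` are hypothetical and chosen only for this illustration.

```python
import math


def rescale(values):
    """Affinely map values onto [0, 1] (uniform reference measure)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def slope_score(cause, effect):
    """Average log |slope| of effect against cause, via finite differences."""
    pairs = sorted(zip(rescale(cause), rescale(effect)))
    logs = []
    for (c0, e0), (c1, e1) in zip(pairs, pairs[1:]):
        dc, de = c1 - c0, e1 - e0
        if dc > 0 and de != 0:  # skip ties, where the log-slope is undefined
            logs.append(math.log(abs(de) / dc))
    return sum(logs) / len(logs)


def infer_direction(x, y):
    """Prefer the direction with the smaller average log-slope."""
    return "X -> Y" if slope_score(x, y) < slope_score(y, x) else "Y -> X"


# Toy data (an assumption for illustration): X on a uniform grid, Y = X**3.
x = [i / 1000 for i in range(1001)]
y = [v ** 3 for v in x]
direction = infer_direction(x, y)  # "X -> Y" for this toy data
```

The score is negative in the direction where the density of the cause and the slope of the function are unrelated, and positive in the reverse direction, which is the asymmetry the paragraph above describes. Real data with noise would need a different treatment, which is the gap CURE targets.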

In practice, we are given empirical data $\mathbf{x} \in \mathbb{R}^N$, $\mathbf{y} \in \mathbb{R}^N$ from $P(X, Y)$ and estimate $p_{X|Y}$ based on $\mathbf{y}$ (intentionally hiding $\mathbf{x}$). The relationship between the observed $\mathbf{y}$ and the latent $\mathbf{x}_u \in \mathbb{R}^N$ is modeled by a Gaussian process (GP): $p(\mathbf{y} \mid \mathbf{x}_u, \theta) = \mathcal{N}(\mathbf{y}; 0, K_{\mathbf{x}_u, \mathbf{x}_u} + \sigma_n^2 I_N)$ (this can alternatively be seen as a single-output GP-LVM). Then, the required conditional $p_{X|Y}$ is estimated as $\hat{p}^{\mathbf{y}}_{X_u|Y} : (x_u, y^*) \mapsto p(x_u \mid y^*, \mathbf{y})$, with $p(x_u \mid y^*, \mathbf{y})$ estimated by marginalizing out the latent $\mathbf{x}_u$ and $\theta$ (the GP hyperparameters).

CURE infers the causal direction by applying the procedure above twice: once to estimate $p_{X|Y}$ based only on $\mathbf{y}$ and once to estimate $p_{Y|X}$ based only on $\mathbf{x}$. If the first estimate is better, $X \to Y$ is inferred; otherwise, $Y \to X$. CURE was evaluated on synthetic and real data and often outperformed existing methods. On the downside, its computational cost is comparatively high. This work was recently published at AISTATS 2015 [Sgouritsa et al., 2015].

#### Exploiting Causality for Efficient Monitoring in POMDPs

*Stefano V. Albrecht and Subramanian Ramamoorthy*

POMDPs are a useful model for decision making in systems with uncertain states. One of the core tasks in a POMDP is the monitoring task, in which the belief state (i.e. the probability distribution over system states) is updated based on incomplete and noisy observations. This can be a hard problem in complex real-world systems due to the often very large state space. In this article, we explore the idea of accelerating the monitoring task by automatically exploiting causality in the system. We consider a specific type of causal relation, called passivity, which pertains to how system variables cause changes in other variables. Specifically, a system variable is called passive if it changes its value only if it is directly acted upon, or if at least one of the variables that directly affect it (i.e. its parent variables) changes its value. This property can be readily determined from the conditional probability table of the system variable. We present a novel monitoring method, called Passivity-based Monitoring (PM), which maintains a factored belief state representation and exploits passivity to perform selective updates over the factored beliefs. PM produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PM is faster than two standard monitoring methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a real-world system such as a multi-robot warehouse, and how PM can exploit this to accelerate the monitoring task.
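
The remark that passivity can be read off the conditional probability table can be sketched as follows. This is a simplified reading under stated assumptions, not the authors' definition or code: the action is held fixed (one table per action), and the table format, keyed by the variable's previous value and its parents' previous and current values, is hypothetical, as are the names `is_passive` and `build_cpt`.

```python
from itertools import product


def is_passive(cpt, tol=1e-12):
    """Check passivity from a CPT for one fixed action.

    cpt maps (own_prev, parents_prev, parents_now) -> {new_value: probability}.
    The variable counts as passive if, whenever no parent changed, it keeps
    its previous value with probability 1.
    """
    for (own_prev, parents_prev, parents_now), dist in cpt.items():
        if parents_prev == parents_now:
            if abs(dist.get(own_prev, 0.0) - 1.0) > tol:
                return False
    return True


def build_cpt(flip_when_idle):
    """Binary variable with one binary parent (toy example).

    It flips with probability 0.9 when the parent changed, and with
    probability `flip_when_idle` when the parent stayed the same.
    """
    cpt = {}
    for own, p_prev, p_now in product((0, 1), repeat=3):
        p_flip = flip_when_idle if p_prev == p_now else 0.9
        cpt[(own, (p_prev,), (p_now,))] = {own: 1.0 - p_flip, 1 - own: p_flip}
    return cpt


passive_cpt = build_cpt(flip_when_idle=0.0)  # never changes on its own
noisy_cpt = build_cpt(flip_when_idle=0.1)    # sometimes changes spontaneously

passive_result = is_passive(passive_cpt)
noisy_result = is_passive(noisy_cpt)
```

In this toy setup `passive_result` is true and `noisy_result` is false. A monitoring method that knows a variable is passive can skip its belief update whenever its parents are believed unchanged, which is the kind of selective update the abstract describes.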

#### An Empirical Study of the Simplest Causal Prediction Algorithm

*Jerome Cremers and Joris Mooij*

We study the simplest causal prediction algorithm that uses only conditional independences in purely observational data. A specific pattern of only four conditional independence relations amongst a quadruple of random variables already implies that one of these variables causes another without any confounding. As a consequence, it is possible to predict what would happen under an intervention on that variable without actually performing the intervention. Although the method is asymptotically consistent and works well in settings with only a few (latent) variables, we find that its prediction accuracy can be worse than simple noncausal baselines when many (latent) variables are present. We also find that the accuracy can sometimes be improved by adding more conditional independence tests, but even then the method need not outperform the baselines. More generally, our findings illustrate that high accuracy of individual conditional independence tests is no guarantee for high accuracy of a combination of such tests. Also, they illustrate the severity of the faithfulness assumption in practice.

#### Visual Causal Feature Learning

*Krzysztof Chalupka, Pietro Perona and Frederick Eberhardt*

We provide a rigorous definition of the visual cause of a behavior that is broadly applicable to the visually driven behavior in humans, animals, neurons, robots and other perceiving systems. Our framework generalizes standard accounts of causal learning to settings in which the causal variables need to be constructed from micro-variables. We prove the Causal Coarsening Theorem, which allows us to gain causal knowledge from observational data with minimal experimental effort. The theorem provides a connection to standard inference techniques in machine learning that identify features of an image that correlate with, but may not cause, the target behavior. Finally, we propose an active learning scheme to learn a manipulator function that performs optimal manipulations on the image to automatically identify the visual cause of a target behavior. We illustrate our inference and learning algorithms in experiments based on both synthetic and real data.

#### Lifted Representation of Relational Causal Models Revisited: Implications for Reasoning and Structure Learning

*Sanghack Lee and Vasant Honavar*

Maier et al. (2010) introduced the relational causal model (RCM) for representing and inferring causal relationships in relational data. A lifted representation, called the abstract ground graph (AGG), plays a central role in reasoning with and learning RCMs. The correctness of the algorithm proposed by Maier et al. (2013a) for learning an RCM from data relies on the soundness and completeness of AGG for relational d-separation to reduce the learning of an RCM to the learning of an AGG. We revisit the definition of AGG and show that AGG, as defined in Maier et al. (2013b), does not correctly abstract all ground graphs. We revise the definition of AGG to ensure that it correctly abstracts all ground graphs. We further show that the AGG representation is not complete for relational d-separation, that is, there can exist conditional independence relations in an RCM that are not entailed by AGG. A careful examination of the relationship between the lack of completeness of AGG for relational d-separation and faithfulness conditions suggests that weaker notions of completeness, namely adjacency faithfulness and orientation faithfulness between an RCM and its AGG, can be used to learn an RCM from data.

#### Robust reconstruction of causal graphical models based on conditional 2-point and 3-point information

*Séverine Affeldt and Hervé Isambert*

We report a novel network reconstruction method, which combines constraint-based and Bayesian frameworks to reliably reconstruct graphical models despite inherent sampling noise in finite observational datasets. The approach is based on an information theory result tracing back the existence of colliders in graphical models to negative conditional 3-point information between observed variables. In turn, this provides a confident assessment of structural independencies in causal graphs, based on the ranking of their most likely contributing nodes with (significantly) positive conditional 3-point information. Starting from a complete undirected graph, dispensable edges are progressively pruned by iteratively "taking off" the most likely positive conditional 3-point information from the 2-point (mutual) information between each pair of nodes. The resulting network skeleton is then partially directed by orienting and propagating edge directions, based on the sign and magnitude of the conditional 3-point information of unshielded triples. This "3off2" network reconstruction approach is shown to outperform both constraint-based and Bayesian inference methods on a range of benchmark networks.
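
The link between colliders and the sign of 3-point information can be checked on small discrete examples. The sketch below is not the 3off2 algorithm. It assumes the convention I(X;Y;Z) = I(X;Y) - I(X;Y|Z), uses the unconditional version only, and works from exact toy joint distributions rather than finite samples; all function names are hypothetical.

```python
import math
from collections import defaultdict


def marginal(joint, keep):
    """Marginalize a joint {outcome_tuple: prob} onto the indices in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_info(joint, a, b, cond=()):
    """I(a; b | cond) in bits, from joint entropies over index tuples."""
    def h(idx):
        return entropy(marginal(joint, idx))
    return h(a + cond) + h(b + cond) - h(a + b + cond) - h(cond)


def three_point_info(joint):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z) for variables at indices 0, 1, 2."""
    return mutual_info(joint, (0,), (1,)) - mutual_info(joint, (0,), (1,), (2,))


# Collider X -> Z <- Y: X, Y independent fair coins, Z = X XOR Y.
collider = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

# Common cause X <- Z -> Y: Z is a fair coin copied into X and Y.
fork = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

collider_i3 = three_point_info(collider)  # -1 bit
fork_i3 = three_point_info(fork)          # +1 bit
```

In the collider, conditioning on Z creates dependence between X and Y, so the 3-point information is negative. In the fork, conditioning on Z removes the dependence, so it is positive and can be "taken off" the 2-point information.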

#### An Algorithm to Compute the Likelihood Ratio Test Statistic of the Sharp Null Hypothesis for Compliers

*Wen Wei Loh and Thomas S. Richardson*

In a randomized experiment with noncompliance, scientific interest is often in testing whether the treatment exposure X has an effect on the final outcome Y. We have proposed a finite-population significance test of the sharp null hypothesis that X has no effect on Y, within the principal stratum of compliers, using a generalized likelihood ratio test.

As both the null and alternative hypotheses are composite hypotheses (each comprising a different set of distributions), computing the value of the generalized likelihood ratio test statistic requires two maximizations: one where we assume that the sharp null hypothesis holds, and another without making such an assumption.

In our work, we have assumed that there are no Always Takers, such that the nuisance parameter is a bivariate parameter describing the total number of Never Takers with observed outcomes y = 0 and y = 1. Extending the approach to the more general case in which there are also Always Takers would require a nuisance parameter of higher dimension that describes the total number of Always Takers with observed outcomes y = 0 and y = 1 as well. This increases the size of the nuisance parameter space and the computational effort needed to find the likelihood ratio test statistic. We present a new algorithm that extends this approach by solving the corresponding integer programs in the general case where there are Always Takers. The procedure for the finite-population significance test may be illustrated using a toy example from our UAI Causal Inference Workshop 2013.

#### Segregated Graphs and Marginals of Chain Graph Models

*Ilya Shpitser*

Bayesian networks are a popular representation of asymmetric (for example causal) relationships between random variables. Markov random fields (MRFs) are a complementary model of symmetric relationships used in computer vision, spatial modeling, and social and gene expression networks. A chain graph model under the Lauritzen-Wermuth-Frydenberg interpretation (hereafter a chain graph model) generalizes both Bayesian networks and MRFs, and can represent asymmetric and symmetric relationships together.

As in other graphical models, the set of marginals from distributions in a chain graph model induced by the presence of hidden variables forms a complex model. One recent approach to the study of marginal graphical models is to consider a well-behaved supermodel. Such a supermodel of marginals of Bayesian networks, defined only by conditional independences, and termed the ordinary Markov model, was studied at length in (Evans and Richardson, 2014).

In this paper, we show that special mixed graphs, which we call segregated graphs, can be associated, via a Markov property, with supermodels of marginals of chain graphs defined only by conditional independences. Special features of segregated graphs imply the existence of a very natural factorization for these supermodels, and imply that many existing results on the chain graph model and the ordinary Markov model carry over. Our results suggest that segregated graphs define an analogue of the ordinary Markov model for marginals of chain graph models.

#### Recovering from Selection Bias using Marginal Structure in Discrete Models

*Robin J. Evans and Vanessa Didelez*

This paper considers the problem of inferring a discrete joint distribution from a sample subject to selection. Abstractly, we want to identify a distribution p(x, w) from its conditional p(x | w). We introduce new assumptions on the marginal model for p(x), under which generic identification is possible. These assumptions are quite general and can easily be tested; they do not require precise background knowledge of p(x) or p(w), such as proportions estimated from previous studies. We particularly consider conditional independence constraints, which often arise from graphical and causal models, although other constraints can also be used. We show that generic identifiability of causal effects is possible in a much wider class of causal models than had previously been known.

#### Advances in Integrative Causal Analysis

*Ioannis Tsamardinos*

Scientific practice typically involves studying a system over a series of studies and data collection, each time trying to unravel a different aspect. In each study, the scientist may take measurements under different experimental conditions and measure different sets of quantities (variables). The result is a collection of heterogeneous data sets coming from different distributions. Even so, these are generated by the same causal mechanism. The general idea in Integrative Causal Analysis (INCA) is to identify the set of causal models that simultaneously fit (are consistent with) all sources of data and prior knowledge and to reason with this set of models. Integrative Causal Analysis allows more discoveries than are possible by independent analysis of datasets. In this talk, we'll present advances in this direction that lead to algorithms that can handle more types of heterogeneity, and aim at increasing efficiency or robustness of discoveries. Specifically, we'll present (a) general INCA algorithms for causal discovery from heterogeneous data, (b) algorithms that convert the results of tests to posterior probabilities and allow conflict resolution and identification of the confidence network regions, (c) proof-of-concept applications and massive evaluation on real data of the main concepts, (d) extensions that can deal with prior causal knowledge, and (e) extensions that handle case-control data.