
x-*
  SOFTWARE FOR "Latent Composite Likelihood Learning for the Structured
  Canonical Correlation Model", COMPANION TO THE UAI 2012 PAPER OF SAME NAME
*-x

This collection of MATLAB files provides the code used in the experiments. None
of it is fully debugged, and no effort was made to make it computationally
efficient. Its main purpose is to document how the results of the paper were
generated.

I cannot provide the NHS data, but the code allows you to create your own
synthetic experiments (the synthetic data used in the paper is also provided).
Use the scripts in the "app" directory. Directory "data" already contains the 20
synthetic model/data files used in the paper. You can recreate them using
the "generate_simulations.m" file from the "app" directory. You can then call
"test_simulations" to fit the data using the three methods discussed in the
paper. This might take a day or two depending on your setup. Finally,
"summarize_results.m" should generate tables and plots to summarize the
generated log files.

The next three sections explain, very briefly, how to set up the necessary
paths; give an overview of the main files that might be useful for those who
plan to play with the original code; and, finally, provide contact information.

1. SETUP

File "setpath.m" sets things up. That's the first one you need to run before
anything else.

You need to have Kevin Murphy's Bayes Net Toolbox installed:

> https://code.google.com/p/bnt/

The three main variables you have to change are "PATH_MAIN", which is the main
path of the source directory, and the hopefully self-explanatory "PATH_BN" and
"PATH_KPM", which refer to Murphy's toolbox. The values in the provided
"setpath.m" file refer to my own directories as examples.
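
Before running "setpath.m", it can help to confirm that the three directories
actually exist. The following is only a sketch: the directory values are
hypothetical placeholders (not the ones shipped in "setpath.m"), and the check
itself is not part of the original code.

```matlab
% Sanity check for the three path variables described above. The directory
% values are hypothetical placeholders; replace them with your own locations.
PATH_MAIN = fullfile(pwd, 'sccm_code');        % hypothetical: root of this code
PATH_BN   = fullfile(pwd, 'bnt', 'BNT');       % hypothetical: Bayes net toolbox
PATH_KPM  = fullfile(pwd, 'bnt', 'KPMtools');  % hypothetical: Murphy's KPM tools

names = {'PATH_MAIN', 'PATH_BN', 'PATH_KPM'};
dirs  = {PATH_MAIN, PATH_BN, PATH_KPM};
missing = {};                                  % names of variables with no folder
for i = 1:numel(dirs)
    if exist(dirs{i}, 'dir') ~= 7              % exist returns 7 for a folder
        missing{end + 1} = names{i};
        fprintf('%s does not exist: %s\n', names{i}, dirs{i});
    end
end
if isempty(missing)
    fprintf('All three directories exist.\n');
end
```

Any name reported as missing should be corrected in "setpath.m" before going
further.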

I'm afraid MATLAB's Statistics Toolbox is used in a few places. You might get
away with replacing those calls here and there with something like 'stixbox'
(http://www.maths.lth.se/matstat/stixbox/). The Optimization Toolbox is
necessary, though.
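
A quick way to see which of the two toolboxes are available is MATLAB's
"license" function. This is a minimal sketch, not part of the original code;
note that it reports whether a license exists, which is not quite the same as
the toolbox being installed.

```matlab
% Check license availability for the two toolboxes mentioned above.
% license('test', feature) returns 1 if a license exists and 0 otherwise.
has_stats = license('test', 'Statistics_Toolbox');
has_optim = license('test', 'Optimization_Toolbox');

if ~has_optim
    fprintf('Optimization Toolbox not available: the code will not run.\n');
end
if ~has_stats
    fprintf('Statistics Toolbox not available: some calls need replacements.\n');
end
if has_optim && has_stats
    fprintf('Both toolboxes are available.\n');
end
```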

2. SOME DETAILS OF THE FILE STRUCTURE

/app: this directory mainly contains examples of scripts to generate, test and
      evaluate models using synthetic data. Hopefully they can be used as
      templates for other applications

- generate_simulations: generates synthetic data. This data is already available
  in the 'data' folder. Change sample size 'N' accordingly, as well as other
  parameters such as 'signal_range' etc. In the paper, we used a single batch of
  20 datasets of 10000 points each, and used the first 1000 points, the first
  5000 points and all 10000 points in three different comparisons
- summarize_results: generates summaries of a batch of synthetic experiments
- synthetic_mcdn: generates MCDN data and a synthetic model
- test_simulations: runs a batch of experiments using the synthetic model files
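
The three nested sample sizes mentioned under generate_simulations can be
sketched as below. This is an illustration only: the matrix name "Y", the
number of variables "p", the random placeholder data and the one-point-per-row
layout are all assumptions, not the format of the provided data files.

```matlab
% Take the first 1000, first 5000 and all 10000 points of one dataset.
N_full = 10000;                    % points per dataset, as in the paper
p = 5;                             % hypothetical number of observed variables
Y = randn(N_full, p);              % placeholder data, one point per row

sample_sizes = [1000, 5000, N_full];
subsets = cell(1, numel(sample_sizes));
for k = 1:numel(sample_sizes)
    n = sample_sizes(k);
    subsets{k} = Y(1:n, :);        % first n rows, so smaller sets are nested
    fprintf('n = %d gives a %d x %d matrix\n', n, ...
            size(subsets{k}, 1), size(subsets{k}, 2));
end
```

Because each subset is a prefix of the full dataset, the three comparisons
differ only in how much of the same data is used.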

/data: this contains a sample of synthetic experiments, plus a set of points
       used by the simple numerical integration procedure, and a file with
       the seed number used to initialize the random number generator in a
       few places

/model:

- embed_cllik: performs embedding of latent points based on pairwise composite
  likelihood (as used in the NHS experiments described in the paper)
- learn_structure_mcdn_final: implements Algorithms 1 and 2 of the paper
- learn_structure_mcdn_simple: implements Algorithm 0 of the paper

/util:

- build_x_weight_library: generates latent variable assignments and all possible
  weights for a range of possible correlations, essentially the content of the
  file "X_files.mat"
  
3. CONTACT

You can contact me at ricardo@stats.ucl.ac.uk. I'll do my best to get back to
you.

  Ricardo Silva, London, August 2012
