GLCsim {Rglimclim}    R Documentation

Simulation of multisite, multivariate daily time series

Description

This routine is used to simulate data from models of class GLC.modeldef, typically created via calls to GLCfit. The routine can generate univariate or multivariate sequences: the multivariate case is handled by consecutively simulating from a set of linked models in which there are no circular dependencies. Imputation (i.e. simulation conditioned on all available data values) can be performed, as well as unconditional simulation.

Usage

GLCsim(modeldefs, siteinfo, start, end, nsims, impute.until = end, 
       output = c("daily", "monthly"), which.regions = 0, 
       which.daily = 1:nsims, daily.start = start, daily.end = end, 
       data.file, external.files, simdir, file.prefix, missval = -99.99)

Arguments

modeldefs

Either an object defining a model for a single variable (for univariate simulation), or a list of such objects, each defining a model for a different variable (multivariate simulation). For most variables, these individual model objects will be of class GLC.modeldef. However, the routine also allows for "precipitation-like" variables, for which separate GLC.modeldef objects are required for occurrence (the probability of a non-zero value, i.e. of a "wet" day in the case of precipitation) and intensity (the value if non-zero). A precipitation model must itself be specified as a list with named components Occurrence and Intensity, which are GLC.modeldef objects for logistic and gamma GLMs respectively.
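The distinction between an ordinary model and a precipitation-like Occurrence/Intensity pair can be illustrated with a small classifier. This is a minimal sketch, not part of Rglimclim: the helper model_kind is hypothetical, and the mock objects in the usage note only carry the class attribute "GLC.modeldef", whereas real ones would come from GLCfit.

```r
# Hypothetical helper: classify one element of a modeldefs list.
# Returns "single" for a GLC.modeldef object, "precipitation" for a list
# with exactly the named components Occurrence and Intensity (both
# GLC.modeldef objects), and stops with an error otherwise.
model_kind <- function(m) {
  if (inherits(m, "GLC.modeldef")) return("single")
  if (is.list(m) && setequal(names(m), c("Occurrence", "Intensity")) &&
      all(vapply(m, inherits, logical(1), what = "GLC.modeldef"))) {
    return("precipitation")
  }
  stop("not a GLC.modeldef object or an Occurrence/Intensity pair")
}
```

For example, with mock objects occ and int created via structure(list(), class = "GLC.modeldef"), a multivariate specification might look like list(Rainfall = list(Occurrence = occ, Intensity = int), Temperature = occ), and sapply(..., model_kind) would classify the two elements as "precipitation" and "single".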

siteinfo

A siteinfo object containing information about the sites to be used in the simulation. This should give all of the attribute values for each site that are required by the models in modeldefs. It can be generated by a call to make.siteinfo. If no value is provided then the routine will use the siteinfo object provided to the GLCfit routine during model fitting, after checking that this object (and hence the site definitions) is the same for each of the variables defined in modeldefs.

start

Start date for simulation, in format YYYYMM where YYYY is the year and MM is the month (so to start in March 1984, the value should be given as 198403). The first day simulated is the first of the month.

end

End date for simulation, in the same YYYYMM format. The last day simulated is the last day of the month.
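The rule that simulation runs from the first day of the start month to the last day of the end month can be checked with base R date arithmetic. The helper yyyymm_range below is hypothetical and not part of the package; it simply converts two YYYYMM integers into the first and last simulated dates.

```r
# Hypothetical helper: convert YYYYMM start/end values into the first and
# last simulated days (first of the start month, last of the end month).
yyyymm_range <- function(start, end) {
  first_day <- function(x) {
    as.Date(sprintf("%04d-%02d-01", x %/% 100, x %% 100))
  }
  first <- first_day(start)
  # The last day of 'end' is the day before the first day of the next month
  last <- seq(first_day(end), by = "month", length.out = 2)[2] - 1
  list(first = first, last = last)
}
```

For instance, yyyymm_range(198403, 198502) spans 1984-03-01 to 1985-02-28, i.e. 365 simulated days; taking the month length from the calendar means leap years are handled automatically.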

nsims

Number of simulations to perform.

impute.until

A date, in the form YYYYMM. Simulated values up to and including the last day of this month will be conditioned on all available observations in the data file (see "Details" section below for more on this). The default value is end, so that all simulated values will be conditioned on all available observations.

output

Chooses whether to produce daily output files, monthly output files or both (the default).

which.regions

A vector of integers containing the codes of regions for which monthly summaries should be produced if monthly outputs have been requested. Note that region 0 is the entire area. For more on region definitions, see define.regions and make.siteinfo.

which.daily

Vector of simulation numbers for which to produce daily output if this has been requested. Defaults to all simulations.

daily.start

Start date for daily output, in the form YYYYMM. Default is start (see above).

daily.end

End date for daily output, in the form YYYYMM. Default is end.

data.file

Name of data file from which to take data for initialisation and imputation. If this is not supplied, the routine will take data from the file that was used for fitting the model(s) in modeldefs (after checking that this file is the same for all elements of modeldefs).

external.files

Character vector of length 3, giving names of files from which to take "external" covariate data (yearly, monthly and daily) to drive the simulations (see the help to GLCfit for more on the use and structure of these files). If not supplied, the routine will take the file names from the model objects in modeldefs. By contrast with data.file, here there is no requirement that the external files should be the same for all elements of modeldefs.

simdir

Name of directory in which to store the output files. This will be interpreted as a pathname relative to the current working directory (see getwd). The routine removes any trailing directory separators "/" and "\" so that, for example, simdir="TestSim" and simdir="TestSim/" have the same effect.
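The trailing-separator rule can be mimicked with a single regular expression. This is an illustrative sketch with a hypothetical helper name (strip_trailing_sep); it is not the package's internal code.

```r
# Hypothetical helper: remove any trailing "/" or "\" characters from a
# directory name, so that "TestSim", "TestSim/" and "TestSim\" all map to
# the same directory.
strip_trailing_sep <- function(path) {
  sub("[/\\\\]+$", "", path)
}
```

With this, strip_trailing_sep("TestSim/") and strip_trailing_sep("TestSim") both give "TestSim", matching the equivalence described above.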

file.prefix

Output files are named in a structured way as, for example, AshdownSim_Daily_Sim0001.dat. Here, AshdownSim is a user-specified character string defined via the file.prefix argument, and the remainder of the filename is generated automatically. By default, file.prefix is the same as simdir with any leading dots and slashes removed.
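The default prefix and the daily file name can be reconstructed from the single example given above. This is a sketch under stated assumptions: the helpers default_prefix and daily_file_name are hypothetical, the four-digit zero padding is inferred from "Sim0001", "slashes" is assumed to cover both "/" and "\", and the naming of monthly files is not covered because it is not documented here.

```r
# Hypothetical helper: default file.prefix, i.e. simdir with any leading
# dots and slashes removed (both "/" and "\" assumed).
default_prefix <- function(simdir) {
  sub("^[./\\\\]+", "", simdir)
}

# Hypothetical helper: daily output file name for simulation number 'sim',
# assuming four-digit zero padding as in "AshdownSim_Daily_Sim0001.dat".
daily_file_name <- function(prefix, sim) {
  sprintf("%s_Daily_Sim%04d.dat", prefix, sim)
}
```

For example, daily_file_name(default_prefix("./AshdownSim"), 1) reproduces "AshdownSim_Daily_Sim0001.dat", and file.path(simdir, ...) would give its location relative to the working directory.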

missval

The value representing missing observations in the data file. The default of -99.99 is the same as that used in GLCfit.

Details

This routine is designed to be used with one or more models that have been fitted to a single univariate or multivariate dataset using GLCfit. The result of a call to GLCfit is a GLC.modeldef object which stores the name of the file containing the data used for model fitting in component $filenames$Data, and also stores the names of the variables in that file in component $var.names. If the argument data.file is not supplied, the simulation routine expects to find this data file in the current working directory; data from the file will be used to initialise simulations, and also for conditioning purposes when imputing missing values (see below for more on both of these points). All of the models defined in modeldefs must reference the same data file; failure to do this will lead to an error.
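The requirement that all models reference the same data file can be expressed as a check on the $filenames$Data components. The sketch below is hypothetical (data_files and check_same_data_file are not package functions); it assumes modeldefs is a list whose elements are either GLC.modeldef objects or Occurrence/Intensity pairs, and the mock objects in the usage note only imitate the $filenames$Data component.

```r
# Hypothetical helper: collect the data file name stored in each model,
# descending into Occurrence/Intensity pairs for precipitation-like variables.
data_files <- function(modeldefs) {
  files <- lapply(modeldefs, function(m) {
    if (inherits(m, "GLC.modeldef")) {
      m$filenames$Data
    } else {
      vapply(m, function(part) part$filenames$Data, character(1))
    }
  })
  unlist(files, use.names = FALSE)
}

# Stop with an error unless every model references the same data file
check_same_data_file <- function(modeldefs) {
  files <- unique(data_files(modeldefs))
  if (length(files) != 1) {
    stop("models reference different data files: ",
         paste(files, collapse = ", "))
  }
  files
}
```

For example, with mk <- function(f) structure(list(filenames = list(Data = f)), class = "GLC.modeldef"), the call check_same_data_file(list(Rain = list(Occurrence = mk("Ashdown.dat"), Intensity = mk("Ashdown.dat")), Temp = mk("Ashdown.dat"))) returns "Ashdown.dat", while mixing in mk("Other.dat") raises an error.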

The routine does not attempt to store the results of simulations internally within memory (see "Value" below for details of what is stored); instead, it writes ASCII files to the directory specified by the simdir argument. Control over these files is provided by the output, which.regions, which.daily, daily.start, daily.end and file.prefix arguments.

The data file may contain variables that the user does not need to simulate. In this case, the output files will still contain columns corresponding to the non-simulated variables; thus each daily output file has exactly the same format as the original data file (see the data.file argument to GLCfit for details of this format).

The format of monthly output files is as follows:

Most realistic models will contain lagged values of one or more variables as covariates. To initialise a simulation, therefore, values of these variables are needed for an appropriate number of days before the first day of the simulation. The routine takes these values from the data file if they are present; if not, it uses the overall mean value of each variable, as computed from the cases used to fit the model and stored in the model definition objects. This overall mean may not be particularly realistic: for example, in a region with an annual temperature range of 20 degrees, a simulation initialised in midwinter with the overall mean temperature is likely to start around 10 degrees too warm. In most practical applications, the effects of such initial-condition errors are likely to be short-lived. Nonetheless, it is worth inspecting plots of the simulation results (see the plot method) to check that the period of interest is not affected by the initial conditions. In some situations it may help to start the simulations a few months before the period of interest; the argument daily.start can then be used to prevent these "start-up" values from being written to the daily output files.
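Choosing a start date a few months before the period of interest requires month arithmetic on YYYYMM values. The helper shift_yyyymm below is a hypothetical sketch, not a package function: it shifts a YYYYMM integer by k months, handling year boundaries by counting months from year 0.

```r
# Hypothetical helper: shift a YYYYMM date by k months (k may be negative).
shift_yyyymm <- function(x, k) {
  year <- x %/% 100
  month <- x %% 100
  idx <- year * 12 + (month - 1) + k  # months elapsed since January of year 0
  (idx %/% 12) * 100 + (idx %% 12) + 1
}
```

For a period of interest beginning in January 1984, one might use start = shift_yyyymm(198401, -3), which gives 198310, together with daily.start = 198401 so that the three start-up months are not written to the daily files. Similarly, shift_yyyymm(start, -1) gives a month preceding start, which is the kind of value described in "Details" for switching off imputation via impute.until.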

Multivariate simulation cannot be carried out if there are direct or indirect circular dependencies between the variables. An example of a direct circular dependency would occur if the model for variable A included simultaneous (i.e. zero-lag) values of B as covariates and vice versa. An indirect circular dependency would arise if A depended on B, B depended on C and C depended on A. The routine checks for circular dependencies and terminates with an error message if any are found. Note that mutual dependence at lags greater than zero (e.g. A depends on the previous day's value of B, and B depends on the previous day's value of A) is not a problem.
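One standard way to find a valid order in which to simulate the variables, or to detect a circular dependency, is a topological sort of the zero-lag dependency graph. The sketch below uses Kahn's algorithm and is not necessarily how GLCsim performs its check. The helper simulation_order is hypothetical, and its input format (a named list giving, for each variable, the variables it depends on at lag zero) is an assumption; lagged dependencies are simply left out because they do not constrain the order.

```r
# Hypothetical helper: deps[[v]] lists the variables on which v depends at
# lag zero. Returns an order in which the variables can be simulated, or
# stops if a direct or indirect circular dependency exists.
simulation_order <- function(deps) {
  vars <- names(deps)
  remaining <- vars
  sim_order <- character(0)
  while (length(remaining) > 0) {
    # A variable is ready once all its zero-lag covariates have been simulated
    is_ready <- vapply(remaining, function(v) {
      all(intersect(deps[[v]], vars) %in% sim_order)
    }, logical(1))
    if (!any(is_ready)) {
      stop("circular dependency among: ", paste(remaining, collapse = ", "))
    }
    sim_order <- c(sim_order, remaining[is_ready])
    remaining <- remaining[!is_ready]
  }
  sim_order
}
```

With deps <- list(A = "B", B = "C", C = character(0)), the function returns c("C", "B", "A"); with list(A = "B", B = "C", C = "A"), it stops with an error, mirroring the indirect circular dependency described above.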

As well as simulating sequences from the fitted models, the routine can (and will, unless explicitly prevented from doing so) perform random imputations of missing values in the data files: this is done by simulating, for each day, from the distribution of the missing data values conditioned both upon the covariates in the models (including lagged values that were either observed or have already been imputed) and upon the non-missing observations for that day. This provides a means of quantifying uncertainties in quantities of interest due to missing observations. To prevent the routine from carrying out any imputation (i.e. to ensure that the simulations run freely and are not conditioned upon any observations except during initialisation), set the argument impute.until to a date preceding start.
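The idea of drawing missing values from their distribution conditional on the same day's non-missing observations can be illustrated in the simplest possible setting. This sketch deliberately substitutes a multivariate normal distribution for the package's GLM-based models, so it shows only the general principle, not the algorithm described in the package manual; the helper impute_gaussian and its arguments mu (mean vector) and Sigma (covariance matrix) are hypothetical.

```r
# Illustration only (Gaussian stand-in, not the Rglimclim algorithm):
# draw the NA components of x from their conditional multivariate normal
# distribution given the observed components of x.
impute_gaussian <- function(x, mu, Sigma) {
  m <- is.na(x)
  if (!any(m)) return(x)  # nothing to impute
  o <- !m
  if (!any(o)) {
    # Everything missing: draw from the unconditional distribution
    return(mu + drop(t(chol(Sigma)) %*% rnorm(length(mu))))
  }
  S_oo_inv <- solve(Sigma[o, o, drop = FALSE])
  S_mo <- Sigma[m, o, drop = FALSE]
  # Conditional mean and covariance of the missing given the observed part
  cond_mean <- mu[m] + drop(S_mo %*% S_oo_inv %*% (x[o] - mu[o]))
  cond_cov <- Sigma[m, m, drop = FALSE] - S_mo %*% S_oo_inv %*% t(S_mo)
  x[m] <- cond_mean + drop(t(chol(cond_cov)) %*% rnorm(sum(m)))
  x
}
```

With mu = c(0, 0), unit variances and correlation 0.8, repeated calls to impute_gaussian(c(1, NA), mu, Sigma) leave the observed value at 1 and produce imputed values with mean close to 0.8 and variance close to 0.36. Repeating such draws gives a spread of plausible completed datasets, which is how imputation conveys the uncertainty due to missing observations.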

For more details on the algorithms used in the simulation and imputation routines, see the Appendices of the PDF package manual.

Value

The routine returns a list object of class GLCsim, for which print and plot methods are available (see the object class documentation). Use the names function to list the components. Many of them simply duplicate arguments to the routine as called, although there are a few extra ones; most of these are self-explanatory. The component RNGstate stores the state of the R random number generator on entry. This is a list containing two named elements, RNGkind and seed: RNGkind is the result of a call to RNGkind on entry, and seed is the value of .Random.seed on entry. This can be used to reset the random number generator to the state that was used to produce a particular simulation (see "Note" below), although in most cases this is more conveniently achieved using a call to set.seed.

Note

Daily simulation files can be large, so sometimes it may be necessary to delete them after use if storage space is limited. In this case, the simulations can always be recreated by resetting the random number generator and calling GLCsim again with exactly the same arguments as stored in the resulting object. In general, the recommended way to reset the random number generator is using a call to set.seed immediately before the call to GLCsim. For completeness however, the RNGstate component of a GLCsim object stores the values of both RNGkind and .Random.seed on entry.
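Restoring a generator state of the form stored in RNGstate (a list with elements RNGkind and seed) can be done in base R. The helpers save_rng_state and restore_rng_state are hypothetical sketches, not package functions. Note the order of operations in the restore step: calling RNGkind with a kind re-seeds the generator, so .Random.seed must be assigned afterwards.

```r
# Hypothetical helper: capture the generator state in the same shape as the
# RNGstate component (RNGkind and seed).
save_rng_state <- function() {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)  # force the generator to initialise .Random.seed
  }
  list(RNGkind = RNGkind(),
       seed = get(".Random.seed", envir = globalenv()))
}

# Hypothetical helper: restore a saved state. RNGkind() re-seeds when the
# kind is set, so .Random.seed is written back only after that call.
restore_rng_state <- function(state) {
  do.call(RNGkind, as.list(state$RNGkind))
  assign(".Random.seed", state$seed, envir = globalenv())
  invisible(NULL)
}
```

For a GLCsim object sims, restore_rng_state(sims$RNGstate) followed by a call to GLCsim with the stored arguments should, under these assumptions, regenerate the same simulations; set.seed(n) before both calls achieves the same thing more simply.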

Author(s)

Richard Chandler (r.chandler@ucl.ac.uk)

References

Yang, C., Chandler, R.E., Isham, V. and Wheater, H.S. (2005). Spatial-temporal rainfall simulation using Generalized Linear Models. Water Resources Research 41, doi:10.1029/2004WR003739.

See Also

GLCfit for information on Rglimclim model objects; also documentation for GLCsim class methods.


[Package Rglimclim version 1.3-6 Index]