This tutorial is in many ways several notches above the previous tutorials and samples you may have seen. It is inspired by a 'real-world' need. Do not be concerned if the concepts and jargon of the scientific domain are unknown to you.
In this tutorial we demonstrate a use case where we want to use the functionalities of R to assess a time series model behavior and performance, with a crude form of Monte-carlo analysis.
You will find the main R script for this tutorial in $r2clr\doc\TutorialEnvModel.r in the source code.
The tutorial relies on a stand-alone library, a simplified version of a C# environmental modelling framework. You will need to compile this library: $r2clr\doc\src\EnvModellingSample\EnvModellingSample.csproj
You will need to be able to install third party packages in R, from CRAN.
Open an R console.
We initialise the rClr package as we did in the previous tutorials. Once the CLR 'bridge' is initialised, we can load the assembly that contains our simplified environmental model, using clrLoadAssembly.
The time series used in this tutorial is in an RData file, as a 'zoo' time series. It comprises to climatic drivers, rainfall and potential evapotranspiration, and the observed runoff. All units are in mm/day, something perhaps surprising for the reader.
The data is derived
from real observations of a rather wet and seasonal catchment in Australia. Note that this is however synthetic data, slightly scrambled, and the metadata is deliberately not included.
The library includes a facade, the SimulationFactory, to create the model engine. The model structure returned is the AWBM (Australian Water Balance Model) (Boughton, W., and R. Mein, 1996). Note that this model implementation
must not be used for purposes other than the present tutorial
The simulation 'variable' is an R object with an external pointer. The function clrReflect returns all the public properties and methods of the underlying CLR object.
At the time of writing, there is not a way yet to show the signature of the methods (i.e., what parameters they take for their execution). So you often need to have access to the source code (C# etc.), in practice, to use it in rClr. The following figure shows
the methods we will use to access the model simulation from R.
Following the R code, you'll see the call to Set the time span of the simulation required to cast numbers with
for start and end indices of the simulation. This is because if you just pass numeric (the default when you type a number in R), then rClr looks for a method SetTimeSpan that takes two double precision floating points ('double').
Once the time span has been set, and the rainfall and PET input time series given, we are ready to roll. The following call will populate the output series of interest, with 20 years of daily data.
We can see visually that there is a strong correlation between rainfall and runoff events, as expected.
Even better for a default output, the superposition of calculated and observed runoff is very good. This is actually not rare for wet, tropical catchments.
A measure of goodness of fit between the calculated and observed runoff series is the
Let's sample the model parameters, ('BFI','KSurf','KBase','C1','C2','C3') to see the NSE score for each. The uniform random sampling used is the most basic form of Monte-Carlo analysis, in a way.
The following loop from 1 to n=1000 runs the model 1000 times with different parameters, we retrieve the 1000 time series of calculated runoff, and get an NSE score for each. We get a couple of summary data frames for further visualisation purposes.
Note that a total runtime more than 2 minutes is somewhat slow but far from totally disappointing. The .NET library 'EnvModellingSample' is as simplified and has a lot of software reflection: this can be overcome in 'real' modelling frameworks.
Also, the runtime may be substantially contributed to on the R side (TODO: profile).
Note: The rest of this tutorial does not illustrate rClr as such.
Using so called dotty plots graphs, one can see the upper enveloppe of the projected response surface
. Seems small and large KSurf are preferable.
Parallel plot. Maybe some patterns noticeable, but overcrowded. Interesting visualisation challenge.
Something appears if limiting to parameter sets with NSE scores above 0.57. Notably, some pattern in BFI, KSurf and KBase suggest parameter correlation.