Announcing the F# R Type Provider

August 1, 2012 at 2:52 pm | Posted in opensource | 24 Comments

Here at BlueMountain we like to perform statistical analysis of data.  The stats package R is great for doing that.  We also like to use the data retrieval and processing capabilities of F#. F#’s interactive environment lends itself pretty well to data exploration, and we can also easily access our existing .NET-based libraries.  Once we are done, we can build and release production-supportable applications.

Nothing on the .NET platform competes with R for statistical functionality, so we set about bridging the gap between F# and R.  F# 3.0 provides a nice innovative mechanism for doing this, through Type Providers.

We have released an open-source RProvider on GitHub.  Here’s an example of how to use it:

// Pull in stock prices for some tickers then compute returns
let data = [
    for ticker in [ "MSFT"; "AAPL"; "VXX"; "SPX"; "GLD" ] ->
        ticker, getStockPrices ticker 255 |> R.log |> R.diff ]

// Construct an R data.frame then plot pairs of returns
let df = R.data_frame(namedParams data)
R.pairs(df)

Any of the calls above that begin with “R.” are actually evaluated inside the R engine.

This produces a lovely pair plot of the returns.

While we intend to continue to enhance the provider to meet our needs, we really hope others will do the same.  If you use F# and work in the statistical/econometrics space, please try it out.  If you use R and are looking for a robust environment in which to develop applications, also try it (and F#) out.  If you have ideas for improvements, please feel free to share them with us.  And if you develop enhancements/fixes, please submit a pull request!

The RProvider is built on the RDotNet project, which handles all the gnarly interop with unmanaged data structures used by R.DLL.  The Type Provider provides an easy-to-use layer on top of that to use R from F#.  Many thanks go to the RDotNet author, Kosei.
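
For readers less familiar with R: the R.log |> R.diff step in the example above turns each price series into log returns, since R's diff takes the difference between consecutive elements. The sketch below reproduces that arithmetic in plain F#, without RProvider. The logReturns helper and the sample prices are made up for illustration and are not part of the provider.

```fsharp
// Plain F# sketch of what `R.log |> R.diff` computes for one price series.
// `logReturns` is a hypothetical helper name, not an RProvider function.
let logReturns (prices: float list) : float list =
    prices
    |> List.map log                                  // natural log, like R's log()
    |> List.pairwise                                 // consecutive pairs (p[t], p[t+1])
    |> List.map (fun (prev, next) -> next - prev)    // lagged difference, like R's diff()

// Made-up prices, for illustration only
let sample = [ 100.0; 101.0; 99.5; 102.0 ]

// A series of n prices yields n - 1 log returns
printfn "%A" (logReturns sample)
```

In the real example this work happens inside the R engine; the F# version is only meant to make the transformation explicit.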

24 Comments »


  1. I’m working on a similar thing for IronPython (on a smaller scale). However, my main problem was running into the aggressive R GC, and losing objects left and right – causing AccessViolationExceptions. I could not find any Protect() calls in your code. How do you fix these objects?

  2. We use RDotNet (rdotnet.codeplex.com) to handle the interop with R.DLL. This uses a type called SymbolicExpression, which is a subclass of SafeHandle. So the SEXPs get unprotected when the instance of SymbolicExpression is finalized. I would suggest you use R.Net too.

  3. I actually use the same thing. However if I’m constructing something large (say a document corpus) in .Net, R itself starts garbage collecting the older items, before I can finish filling in the CharacterVector. No, calling .Protect() on the vector itself does not solve the issue (but I do have a workaround).

    Anyways, my code is here: http://rdotnet.codeplex.com/SourceControl/network/forks/sukru/pythondynamic . I will also look into yours for ideas.

    • Can you put together a simple test case in C#, just using RDotNet, that reproduces your problem? Make it as minimal as possible. Maybe throw in some explicit R GCs to make it happen more quickly. Send it to us at opensource at bluemountaincapital dot com. I’m trying to put together tests for failing cases so we can experiment with fixes.

      • I sent the sample code to your email.

  4. [...] at least for the kind of data analysis I do. I'm surprised that more aren't using F#, but this (Announcing the F# R Type Provider BlueMountain Capital Tech Blog) is a good start. Of course, I'm still waiting for the rest of the world to realize that LISP is [...]

  5. How to access complex objects?

    I just tried the example, truly great intellisense support!

    I have one question about accessing R objects. It is easy to access ints/reals, vectors, etc. However, the objects returned by library calls are usually S3 or S4 objects. How can these be accessed?

    My idea is to expose the REngine object, which is inside an internal class, RInteropInternal. As the engine supports SetSymbol, GetSymbol and Evaluate(), we can let R do the things that are simple in R, e.g. the expressions x[1:10, ], fit$coef, fit@beta, etc.

    Maybe RProvider already has a solution for what I want?

    Thanks!

    Yin

    • Most of the functionality of S3/S4 objects is exposed as “methods”, which are called using generic functions in R. For example, print() is a generic function that calls the print method on the object. This is accessible as R.print(). As you point out, we don’t have a way of getting at the components/slots in the underlying data structures. I expected that using R.“$“(object, “component”) would work for this, but it doesn’t seem to.

      The more natural way of accessing components would be to define an operator ?, which is the F# dynamic operator. So if one had a dataframe with a column A, one could just call df?A to access A.

      Obviously it is possible to expose the underlying engine and use eval, but I would rather not. The point of the provider is to expose R functionality in an F#-friendly manner, and that would not be F# friendly. So we should try to find solutions that map into F# constructs wherever possible.

      • Thanks for the ? point. I will define some operator to reduce the syntax.

        Now, I use .AsList() to transform an S3 object to a GenericList and use .["name"] to access the value. e.g.

        let fit = R.glmnet(x, y).AsList() // do a regression

        flt.GetAttribute("lambda") // get fit$lambda out

        I haven’t tested S4 objects yet.

      • sorry. flt.GetAttribute(“lambda”) should be fit.["lambda"] …

      • Cool. BTW AsList() does no transformation, as such, on the R side. You are basically doing an F#-side type-cast from SymbolicExpression to GenericList. The S3 class is a list. Defining the ? operator would make the syntax slightly nicer (fit?lambda instead of fit.AsList().["lambda"]), but the mechanism seems to work for what you need, right?

  6. Yes. fit.AsList().["lambda"] works fine for S3 objects. This trick does not work for S4 objects. I think I need to dig into the source code of RDotNet to get a good solution. But it is not a problem in RDotNet itself as one can always use engine.Evaluate(“xxx”) to get simple objects inside an S4 object.

    • Right. AsList will only work when the representation is a list. Can you give examples of where you need to access the raw slots of an S4 object?

      • Many of the R libraries I use rely on S3 objects. But there are a few, say the fGarch package, that use S4 objects. The following R code is a typical example where I want slots from an S4 object:

        fit = garchFit(timeSeries)
        fit@h.t

        I know that some of the slots have corresponding functions in the package to extract them out as simple R objects. But it would be nice to have a general way to extract them.

  7. [...] of the projects they have built to support their efforts, like Type Providers for Excel and the R statistical programming language. F# for Trading from [...]

  8. [...] F# comes with Type Providers for SQL, WSDL and OData, but you can also create your own or use one created by the community. A example may be the Type Provider for R. [...]

  9. If #r @"bin/Debug/RProvider.dll" throws the exception “The type initializer for ‘.$Interop’ threw an exception”, then you need to re-install R as Administrator rather than as the current user.
    (http://www.broadcastbabble.com/debugging-f-type-providers-and-other-ways-to-waste-hours-of-your-life/)

  10. [...] We like to say "F# loves R", because we can use R packages from F#, through an R type provider for F#.  [...]

  11. [...] Announcing the F# R Type Provider - Released this past August (2012), Blue Mountain has made available an F# R Type Provider (available from GitHub here), allowing for the very powerful analytic combinations that a functional programming manipulation of R’s resources provides. [...]

  13. […] aim is to enable similar interoperability as that achieved by the F# Type Provider for R, which allows access to thousands of R statistical packages directly from F#. This project is also […]

  14. […] access databases and getting types from the schema for free, but when Howard Mansell released the R Language Type Provider everything changed. We now had a way to build slick typed APIs on top of almost any other […]

  15. Hello, thank you for the great provider! Are there any plans to support types in the parameters to R calls? For now I see that most parameters in R calls are of type ‘obj’, e.g. in the call “R.acf” in your example code. This doesn’t seem convenient, and it undermines the FP idea of minimal debugging. What do you think? Thank you, Pavel.

    • Glad you like it!

      Providing static types is something we thought about, but it is rather hard. We cannot discover static type information from R, because it is dynamically typed. And because it is dynamically typed, R functions often allow a variety of different types for individual inputs.

      We could take an approach similar to , allowing a separate type definition file that mirrors someone’s understanding of the static type of a dynamically typed module. But it’s a significant engineering effort, and someone would have to build and maintain all those type information files.
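
The F# dynamic operator (?) suggested in the replies to comment 5 can be illustrated without RProvider. This is a minimal sketch, assuming a plain string-keyed dictionary as a stand-in for an R list. The names fit, lambda and a0 and their values are invented for illustration, and nothing here is RProvider or RDotNet API.

```fsharp
open System.Collections.Generic

// Define the F# dynamic lookup operator for a string-keyed dictionary.
// The dictionary is only a stand-in for an R list (an assumption for this sketch).
let (?) (components: IDictionary<string, 'T>) (name: string) : 'T =
    components.[name]

// Hypothetical "fit" object with made-up component values
let fit = dict [ "lambda", [ 0.5; 0.1; 0.01 ]; "a0", [ 1.2 ] ]

// The compiler rewrites fit?lambda into (?) fit "lambda"
let lambda = fit?lambda
printfn "%A" lambda
```

The design point is that fit?lambda reads like a member access while remaining an ordinary function call, which is why it was proposed as the F#-friendly alternative to exposing the engine and evaluating R strings.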

