Discussion About Object-Oriented Design At Carl Boettiger’s Blog, Plus Notes On “Broken” Tutorials
A blessing and a curse of the R programming interface is that methods development occurs at a community scale. Progress tends to happen piecemeal, as developers make small advances within a massive, highly integrated, but often chaotically organized community. The great thing about R is that so many people are doing so many things in a modular framework that nearly any sort of analysis is possible. The drawback is that any time there's a flaw or a fundamental change in a given module (e.g., an R package or a particular function), it can have widespread and unanticipated effects on every package that depends on it.
Over at his website/blog, Carl Boettiger points out that this problem is exacerbated by the fact that object-oriented design is rare in R. As a result, changes to the underlying mechanics of a function tend to change the structure of its output, which spells trouble for anything that depends on it. Carl does a nice job explaining how object-oriented design can prevent such problems.
The context of Carl's post is that the authors of the popular "geiger" R package have just released a major update that makes changes to many functions, function names, and outputs. According to Carl, most of these changes are great, but the rub is that several of them cause failures in packages, scripts, or tutorials that depend on "geiger". These include Carl's excellent "pmc" package, Travis Ingram's "surface" package, and, perhaps most relevant to the Bodega crowd, several of our R phylogenetics tutorials! I've gotten several emails over the past week or so mentioning that these tutorials no longer work because of the "geiger" changes.
If you're interested in this sort of thing, check out Carl's post, as well as the lengthy discussion it has generated in his comments section. As for our tutorials, I'll try to post new versions of the ones I wrote for continuous traits shortly. Most of the updates will be straightforward (thanks to Jeremy Brown for highlighting the necessary changes when he first wrote me about it), but those involving "pmc" will likely have to wait until Carl releases an update that's compatible with the new "geiger". In the meantime, of course, it should be possible to run the tutorials using older versions of "geiger".
Luke, thanks for highlighting Carl's blog post and the changes to Geiger. I agree with Carl's ideas about standardizing certain programming practices to make interoperable code more stable, but I also empathize with the commenters on his post who point out the lack of incentive for doing so. In any case, hopefully it won't be too much trouble to get Carl's pmc package and the various tutorials and packages that use geiger back up and running soon.
Thanks, Jeremy. I agree; folks there raise very good points. These sorts of practices can be a bit altruistic, in that they come at a cost and don't usually result in any formal recognition. Presumably, though, if you start early, they become second nature and aren't a big deal (see the recent post by Mike May). Of course, this isn't something I myself have done (or, frankly, an issue I've been terribly aware of until recently)...
Thanks for pointing out these changes, Luke. These types of changes have been going on for as long as R has been used for phylogenetics. It might help if the developers who are making these changes gave users some justification for why the changes are being made. I remember back when Paradis's ape package used negative numbers to denote internal nodes. He changed them to positive integers shortly after publishing a book that taught users how to take advantage of ape, and this caused some confusion for students in the applied phylogenetics workshop. I think the change made sense, but it sure was a nuisance at the time.
It couldn't have been said better: "It might help if the developers who are making these changes give users some justification for why the changes are being made". It may not be the case here, but too many times changes are made regardless of whether it's the right time for them, and without firm reasons.