Scrap Your Reprinter

Back in 2013, Andrew Rice and I were doing some initial groundwork on how to build tools to help scientists write better code (e.g., with the help of refactoring tools and verification tools). We talked to a lot of scientists who wrote Fortran almost exclusively, so we started creating infrastructure for building tools to work on Fortran. This was the kernel of the CamFort project, for which we got an EPSRC grant in 2015 (which is ongoing). The CamFort tool now has a couple of fairly well-developed specification/verification features, and a few refactoring features. Early on, I started building everything in Haskell using the brilliant uniplate library, based on the Scrap Your Boilerplate [1] work. This helped us to get the tool off the ground quickly by utilising the power of datatype generic programming.

Fairly quickly we hit upon an interesting problem with building refactoring tools: how do you output source code for a refactored AST whilst preserving all the original comments and white space? It is not enough just to pretty print the AST, unless your AST contains all the comments and layout information. Building a parser to capture all this information is extremely hard, and we use a parser generator, which limits flexibility (but is really useful for a large grammar). Another approach is to output patch/edit information for the original source code, calculated from the AST.

In the end, I came up with a datatype generic algorithm which I call the reprinter. The reprinter takes the original source code and an updated AST (which contains location information) and maps them into a new piece of source code. Here is an illustration which I’ll briefly explain:


Some source text (arithmetic code in prefix notation here) is parsed into an AST. The AST contains the “spans” of each syntactic fragment: the start position and end position in the original source code (for simplicity in this illustration, just the column number is represented). Some transformation/refactoring is applied next. In this case, the transformation rewrites redundant additions of 0, which happens in the node coming from source locations 10 to 16. The refactored node is marked in red. The reprinting then runs, stitching together the original source code with the updated source tree. A pretty printer is used to generate code for any new nodes, but all the original source text for the other nodes is preserved.

The cool thing about this algorithm is that it is datatype generic: it works for any datatype, with some modest side conditions about storing source spans. The implementation uses the Scrap Your Zipper [2] library to do a context-dependent generic traversal of a datatype. In essence, the algorithm is similar to what you might do if you were to spit out edit information from an AST and then apply it to a piece of source text. But the algorithm does this generically, and in a single simultaneous pass over the AST and the input source text.
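To make the stitching step concrete, here is a minimal sketch of the idea in plain Haskell. It is not the CamFort reprinter: it is deliberately not datatype generic, it is hard-wired to one tiny prefix-notation AST, and it uses neither uniplate nor Scrap Your Zipper. Every name in it (`Span`, `Expr`, `markAt`, `simplify`, `reprint`, `exampleSrc`, `exampleAst`) is made up for this illustration, and I assume spans are 0-based columns with an exclusive end.

```haskell
-- A deliberately non-generic sketch of the reprinting idea.
-- All names below are hypothetical, chosen for this illustration only.

-- | Columns in the original source: 0-based start, exclusive end (assumed).
data Span = Span Int Int
  deriving (Show, Eq)

-- | Prefix-notation arithmetic. The Bool records whether a node was
-- produced by a refactoring (True) or came straight from the parser (False).
data Expr
  = Num  Bool Span Int
  | Plus Bool Span Expr Expr
  deriving (Show, Eq)

refactored :: Expr -> Bool
refactored (Num  r _ _)   = r
refactored (Plus r _ _ _) = r

spanOf :: Expr -> Span
spanOf (Num  _ s _)   = s
spanOf (Plus _ s _ _) = s

-- | Mark a node as refactored and make it occupy the span of the node
-- it replaces in the original source.
markAt :: Span -> Expr -> Expr
markAt s (Num  _ _ n)   = Num  True s n
markAt s (Plus _ _ a b) = Plus True s a b

-- | The example refactoring: remove redundant additions of 0.
simplify :: Expr -> Expr
simplify e@(Num _ _ _) = e
simplify (Plus r s a b) =
  case (simplify a, simplify b) of
    (Num _ _ 0, b') -> markAt s b'
    (a', Num _ _ 0) -> markAt s a'
    (a', b')        -> Plus r s a' b'

-- | Pretty printer, used only for refactored nodes.
pretty :: Expr -> String
pretty (Num _ _ n)    = show n
pretty (Plus _ _ a b) = "(+ " ++ pretty a ++ " " ++ pretty b ++ ")"

-- | Walk the AST left to right with a cursor into the source text.
-- Untouched nodes emit nothing themselves; a refactored node emits the
-- original text from the cursor up to its start, then its pretty-printed
-- form, and moves the cursor to its end. Leftover source is appended last.
reprint :: String -> Expr -> String
reprint src ast = out ++ drop end src
  where
    (out, end) = go 0 ast

    go :: Int -> Expr -> (String, Int)
    go cur e
      | refactored e =
          let Span s t = spanOf e
          in  (slice cur s ++ pretty e, t)
      | otherwise =
          case e of
            Num _ _ _ -> ("", cur)
            Plus _ _ a b ->
              let (o1, c1) = go cur a
                  (o2, c2) = go c1 b
              in  (o1 ++ o2, c2)

    slice from to = take (to - from) (drop from src)

-- | Source text with irregular spacing, and the AST a parser might build.
exampleSrc :: String
exampleSrc = "(+   1 (+ 0 2)  )"

exampleAst :: Expr
exampleAst =
  Plus False (Span 0 17)
    (Num False (Span 5 6) 1)
    (Plus False (Span 7 14)
      (Num False (Span 10 11) 0)
      (Num False (Span 12 13) 2))
```

Loading this into GHCi and evaluating `reprint exampleSrc (simplify exampleAst)` gives `"(+   1 2  )"`: the inner `(+ 0 2)` is replaced by the pretty-printed `2`, while the odd spacing around it is carried over from the original text. Reprinting the unmodified `exampleAst` returns `exampleSrc` unchanged. The generic version avoids writing `go` by hand for every constructor, which is exactly where the zipper-based traversal earns its keep.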

I’ve always thought it was a cute and useful algorithm, which combined some cool techniques from functional programming. As with all the “Scrap Your X” libraries, it saves huge amounts of time and messing around, especially when your AST representation keeps changing (which it did/does for us). The algorithm is really useful anywhere you need to update human-written source code in a layout-preserving way: for example, in IDEs and refactoring tools, but also in interactive theorem provers and program synthesis tools, where you need to synthesise source text into some existing human-written code. This is one of the ways it is used in CamFort, where specifications are synthesised from code analysis data and then inserted as comments into user code.

This summer, I was fortunate enough to have the resources to hire several interns. One of the interns, Harry Clarke, worked with me (amongst other things) to tidy up the code for the reprinter, add some better interfaces, make it usable as a library for others, and write it all up. He presented the work at IFL 2017 and the pre-proceedings version of the paper is available. We are working on a post-proceedings version for December, so any comments would be gratefully appreciated.

[1] Lämmel, Ralf, and Simon Peyton Jones. “Scrap your boilerplate: a practical design pattern for generic programming.” ACM SIGPLAN Notices, Vol. 38, No. 3. ACM, 2003.

[2] Adams, Michael D. “Scrap your zippers: a generic zipper for heterogeneous types.” Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming. ACM, 2010.