A little while ago we<a
an interface to Selenium, a web testing framework.
Since then we’ve learned that Selenium is simply too slow to
use in our work-flow. Hence we started on a faster
reimplementation of the Selenium Remote Control.
The code I spent most time on was the HTTP parser. It is structured as a state machine, so the initial version used the classic FP pattern ofmutually tail recursive functions. The code for parsing an HTTP request looked something like this:
(define (parse-request) (define request-line #f) (define headers #f) (define (parse-request-line) (set! request-line (read-request-line)) (parse-headers)) (define (parse-headers) (let ([line (read-line)]) (if (end-of-input line) (begin (set! headers (reverse headers)) (do-something)) (begin (set! headers (cons line headers)) (parse-headers))))) (parse-request-line))
The code for parsing an HTTP response was very similar:
(define (parse-response) (define response-line #f) (define headers #f) (define (parse-response-line) (set! response-line (read-response-line)) (parse-headers)) (define (parse-headers) (let ([line (read-line)]) (if (end-of-input line) (begin (set! headers (reverse headers)) (do-something-different)) (begin (set! headers (cons line headers)) (parse-headers))))) (parse-response-line))
The real code was several screens long. I wanted to make it
simpler by changing to a functional style, and reusing
common code between the request and response parsing
functions. Converting to functional style is simple:
(define (parse-request) (define (parse-request-line) (define request-line (read-request-line)) (parse-headers request-line)) (define (parse-headers request-line) (define headers (let loop ([line (read-line)]) (if (end-of-input line) null (cons line (loop (read-line)))))) (do-something request-line headers)) (parse-request-line))
Reusing common code is not simple. The finite state machine
pattern doesn’t abstract the next state. For example
do-something whereas the otherwise
identical version in
I solved this by refactoring the code into<a
style, leading to code that looks like the following:
;; shared between parse-request and parse-response (define (parse-headers request-line k) (define headers (let loop ([line (read-line)]) (if (end-of-input line) null (cons line (loop (read-line)))))) (k request-line headers)) (define (parse-request) (define (parse-request-line k) (define request-line (read-request-line)) (k request-line)) (parse-request-line (cut parse-headers <> do-something)))
Note that I’ve used <a
as a short-cut for
I’ve got code reuse but the code itself isn’t nice. The
arguments lists were quite a bit longer in the real code and
most of the time arguments are just passed from function to
function without being used (I’ve seen these sort of
arguments called “tramp data”). I also find
that CPSed code can be quite difficult to read — you
have to construct the control flow graph in your head and
then look at the application site to fill in all the
One way to get rid of tramp data is to use
parameters, and this something we talk about in our experience report. However that solution isn’t appropriate here. It forces me to stick with CPS so I can set the parameters in the dynamic extent of the succeeding code, and it extends the lifetime of the values beyond what is strictly necessary. This could be an issue if storing, say, a large request body in a parameter.
Notice that the different versions of
parse-request only differed in which function
they called with the value they computed. If I separate out
the computation of that value, and the decision of which
function to call I can get code reuse without CPS, and I
don’t have long argument lists! This is what my final solution looks like:
;; shared between parse-request and parse-response (define (parse-headers request-line) (let loop ([line (read-line)]) (if (end-of-input line) null (cons line (loop (read-line)))))) (define (parse-request) (define (parse-request-line) (read-request-line)) (do-something (parse-request-line) (parse-headers))) (define (parse-response) (define (parse-response-line) (read-response-line)) (do-something-different (parse-response-line) (parse-headers)))
It’s short and sweet, and easy to understand.
So let’s recap what I did:
- I started with the mutually tail-recursive FSM pattern (that’s a mouthful!)
- I refactored into continuation-passing style.
- I separated computation and control, and refactored back to direct style.
So three refactoring (direct style to CPS, separating
computation and control, and CPS to direct style), two of
which are particular to functional languages, and one
pattern. I could do with a better name than
“separating computation and control”. If you’re
aware of some prior work or can think of a better name do
let me know.
Although they use different terminology, the programming
language theory and software engineering communities have
explored a lot of the same ground from different
perspectives. Program transformations are pretty much the
same thing as refactorings, though the former are often
presented in the context of compiler optimisations.<a
Pearls are very similar to design patterns.
If you’re a student of software engineering in functional
languages it is necessary to familiarise yourself with this
literature. This can be difficult. There are no books
summarising this literature, as you’ll find for OO
languages, and the papers are often terse and are not
focused on software engineering. This means they can be
difficult to read if you don’t have a background in
programming languages, and you have to read between the
lines a bit.