weblogging at the beach

« A Scheme Case Study | Main | S3 Doesn't Count the Pennies (Yet) »

August 06, 2007

Refactoring Functional Programs

A little while ago we released an interface to Selenium, a web testing framework. Since then we've learned that Selenium is simply too slow to use in our work-flow. Hence we started on a faster reimplementation of the Selenium Remote Control.

A key part of this system is a proxy server, which is necessary to get around the same origin security restriction in Javascript. I've just finished a large refactoring of the proxy code, and I think the experience is interesting enough to warrant a blog post. While there is a large literature on refactoring object-oriented programs, there is rather less on refactoring functional programs, and what there is tends to concentrate on program transformation tools (long a FP strength) at the expense of collecting useful FP refactorings. This post is a small contribution to redressing the balance.

The code I spent most time on was the HTTP parser. It is structured as a state machine, so the initial version used the classic FP pattern of mutually tail recursive functions. The code for parsing an HTTP request looked something like this:

(define (parse-request)
  (define request-line #f)
  (define headers #f)

  (define (parse-request-line)
    (set! request-line (read-request-line))
    (parse-headers))

  (define (parse-headers)
    (let ([line (read-line)])
      (if (end-of-input line)
          (begin (set! headers (reverse headers))
                 (do-something))  
          (begin (set! headers (cons line headers))
                 (parse-headers)))))

  (parse-request-line))

The code for parsing an HTTP response was very similar:

(define (parse-response)
  (define response-line #f)
  (define headers #f)

  (define (parse-response-line)
    (set! response-line (read-response-line))
    (parse-headers))

  (define (parse-headers)
    (let ([line (read-line)])
      (if (end-of-input line)
          (begin (set! headers (reverse headers))
                 (do-something-different))  
          (begin (set! headers (cons line headers))
                 (parse-headers)))))

  (parse-response-line))

The real code was several screens long. I wanted to make it simpler by changing to a functional style, and reusing common code between the request and response parsing functions. Converting to functional style is simple:

(define (parse-request)
  (define (parse-request-line)
    (define request-line (read-request-line))
    (parse-headers request-line))

  (define (parse-headers request-line)
    (define headers
      (let loop ([line (read-line)])
        (if (end-of-input line)
            null
	    (cons line (loop (read-line))))))
     (do-something request-line headers))

  (parse-request-line))

Reusing common code is not simple. The finite state machine pattern doesn't abstract the next state. For example parse-headers in parse-request always calls do-something whereas the otherwise identical version in parse-response calls do-something-different.

I solved this by refactoring the code into continuation-passing style, leading to code that looks like the following:

;; shared between parse-request and parse-response
(define (parse-headers request-line k)
  (define headers
    (let loop ([line (read-line)])
      (if (end-of-input line)
          null
	  (cons line (loop (read-line))))))
  (k request-line headers))

(define (parse-request)
  (define (parse-request-line k)
    (define request-line (read-request-line))
    (k request-line))

  (parse-request-line
    (cut parse-headers <> do-something)))
Note that I've used cut as a short-cut for lambda.

I've got code reuse but the code itself isn't nice. The arguments lists were quite a bit longer in the real code and most of the time arguments are just passed from function to function without being used (I've seen these sort of arguments called “tramp data”). I also find that CPSed code can be quite difficult to read — you have to construct the control flow graph in your head and then look at the application site to fill in all the continuations. Ugh.

One way to get rid of tramp data is to use parameters, and this something we talk about in our experience report. However that solution isn't appropriate here. It forces me to stick with CPS so I can set the parameters in the dynamic extent of the succeeding code, and it extends the lifetime of the values beyond what is strictly necessary. This could be an issue if storing, say, a large request body in a parameter.

Notice that the different versions of parse-request only differed in which function they called with the value they computed. If I separate out the computation of that value, and the decision of which function to call I can get code reuse without CPS, and I don't have long argument lists! This is what my final solution looks like:

;; shared between parse-request and parse-response
(define (parse-headers request-line)
  (let loop ([line (read-line)])
    (if (end-of-input line)
        null
	(cons line (loop (read-line))))))

(define (parse-request)
  (define (parse-request-line)
    (read-request-line))

  (do-something (parse-request-line)
                (parse-headers)))

(define (parse-response)
  (define (parse-response-line)
    (read-response-line))

  (do-something-different (parse-response-line)
                          (parse-headers)))

It's short and sweet, and easy to understand.

So let's recap what I did:

So three refactoring (direct style to CPS, separating computation and control, and CPS to direct style), two of which are particular to functional languages, and one pattern. I could do with a better name than “separating computation and control”. If you're aware of some prior work or can think of a better name do let me know.

Although they use different terminology, the programming language theory and software engineering communities have explored a lot of the same ground from different perspectives. Program transformations are pretty much the same thing as refactorings, though the former are often presented in the context of compiler optimisations. Functional Pearls are very similar to design patterns.

If you're a student of software engineering in functional languages it is necessary to familiarise yourself with this literature. This can be difficult. There are no books summarising this literature, as you'll find for OO languages, and the papers are often terse and are not focused on software engineering. This means they can be difficult to read if you don't have a background in programming languages, and you have to read between the lines a bit.

Posted by Noel at August 6, 2007 01:23 PM

Trackback Pings

TrackBack URL for this entry:
http://www.untyped.com/mt/mt-tb.cgi/119

Comments

In your final solution there are a lot of k arguments that are unused, just fyi.

I think the basic design problem in your first version is that each inner function does too much. parse-response-line does not only parse the response line, it also parses the headers! So a more accurate name for it would be parse-response-line-and-then-parse-headers, and this should immediately make you want to split that function into two.

The next design problem is that parse-response-line and parse-headers deliver their results through side-effects to shared variables. This is a consequence of the first design flaw.

And once the inner functions return their results normally you will notice that they can be lifted outside their containing functions, and that they can share implementation.

That implementation can be functional or imperative (it won't matter to the outside world anymore -- the interface will be the same).

There is nothing specifically functional about this advice. It works the same in the imperative world too. Try to minimize the responsibilities of each function/procedure/module if possible.

Posted by: Magnus Jonsson at August 7, 2007 04:46 AM

Thanks Magnus. I removed the extra ks

It's interesting that you choose a different route to the final solution. I wanted to illustrate that functional code can be systematically refactored, to show there are refactorings and patterns specific to functional code, and mention where one might learn about them. I think these points stand regardless of what you consider the flaws in the initial code.

Posted by: Noel at August 7, 2007 03:53 PM