Wednesday, February 20, 2013

Syntax checking in Scala String Interpolators

In the second post of this series I showed how you can write your own string interpolators for Scala 2.10. The examples were designed to work regardless of the content of the processed string. In many other cases, we care what the string looks like and its form affects what we want to do. Often we want to complain if the content of the string is not acceptable. Options are a run-time error or a compile-time one. The latter case requires us to extend the compiler, which I will do using 2.10’s experimental macro features.

The complete code for all examples in this series will be made available in the accompanying BitBucket project.

Octal number literals

Suppose that we want to use the notation o"177" to stand for an integer literal whose value is specified in octal (base 8). Our interpolator will have to perform a numeric conversion or complain if the string literal is not a legal octal number. (For simplicity, we ignore interpolated expressions in this example.)

We can define a simple octal number interpolator as follows.

implicit class OctalContext (val sc : StringContext) {

  def o () : Int = {
    val orig = sc.parts.mkString
    val OctalNum = "[0-7]+".r
    orig match {
      case OctalNum () =>
        Integer.parseInt (orig, 8)
      case _ =>
        sys.error ("Can only contain 0-7 characters")
    }
  }

}

We access the string literal using the parts method of the StringContext which returns a list of all of the context string parts from the literal. In this case there will only be one since we are not supporting interpolated expressions.

Once we have the string, a regular expression pattern checks whether the literal is in the correct form or not. If the form is ok, we convert the string and return its integer value. Otherwise, we throw an error to complain about the format violation.

println (o"177")
println (o"49")
127
java.lang.RuntimeException: Can only contain 0-7 characters
        at scala.sys.package$.error(package.scala:27)
        at Octal$OctalContext.o(Octal.scala:12)
        at Octal$.main(Octal.scala:20)

Compile-time checking

Run-time errors are fine in some situations, but many of us would like to get stronger guarantees about our code. In particular, we’d like the compiler to complain if we try to write an octal number literal but get the format wrong.

One way to achieve this kind of checking is to extend the compiler with a macro. Macros are a new experimental feature in Scala 2.10 and are planned to be fully-supported in 2.11. In a nutshell, a macro is given access to the compiler’s abstract syntax tree (AST) for a method call. It can return a new AST that the compiler will use instead of the original call. Thus, the macro can replace the call that a user writes with any legal expression.

(Strictly speaking, the macros we use here are def macros because they implement the bodies of def constructs. Other macro styles in development will be able to replace types and other Scala constructs.)

Another alternative to get this kind of checking is a full-blown compiler plugin. A plugin can be more powerful than a macro, because it has full access to the whole compilation unit. However, writing a plugin is harder since much of the plumbing and book-keeping must be implemented. In contrast, the macro system takes care of many of the details of hooking into the compilation process, accessing the abstract syntax tree, and so on. We can focus on the actual replacement that we want to construct.

In our experience Scala 2.10 macros are pretty reliable, but you should be aware that they are experimental so it might be dangerous to rely on them. It’s also easy to get yourself or the compiler in a tangle when writing a macro since you are essentially working with the compiler’s internal representations.

An octal number literal macro

Our plan is to replace the octal number interpolator we wrote above with one that is implemented by a macro. The advantage is that the macro will execute at compile time so it will be able to issue compile-time errors if we get the string literal wrong. In the non-error case the macro will be able to perform the number conversion, thereby saving the program from having to perform it at run-time.

First, we write the o method, but instead of giving the full implementation, we use the macro keyword to indicate that this method is a macro that is implemented by the OctalImpl.oImpl method. This simple notation suffices to get the compiler to call the macro at compile-time and use its result.

implicit class OctalContext (val sc : StringContext) {

  def o () : Int =
    macro OctalImpl.oImpl

}

An import of scala.language.experimental.macros or the corresponding command-line option will be necessary to enable the macros feature. It is also necessary to ensure that the macro and its uses are not in the same compilation unit, since the compiler needs to have access to the compiled macro implementation when it compiles the uses.

The macro signature

The signature of a macro is closely related to that of the method that it implements. The signature of the oImpl method that implements o is as follows.

def oImpl (c : Context) () : c.Expr[Int]

The first parameter list contains a Context argument that can be used by the macro to find out about the context in which it has been called. The second parameter list contains one argument for each argument of the method. The arguments passed here are the Scala AST representations of the argument expressions used in the original method call. In our case, the o method has no parameters so the second parameter list of the macro is empty. Finally, the return value of the macro is a Scala AST that represents the expression which we want to use to replace the macro call. The return type is path-dependent on the context and says that the value must be an expression whose type is the type of the original method. Thus, we have a return type of c.Expr[Int] here because o returns an Int.

The context and the trees

The context gives us access to many wonderful things. For example, the context provides position information for the macro call and a way to issue error messages.

The context’s universe gives us the Scala AST node definitions which we will need to query and construct trees. To save space in the code, we introduce a local alias u for the universe.

import c.{universe => u}

and import everything

import u._

Importantly for the octal number literal macro, the context gives us access to the AST on which the method call in question has been made. This tree is returned by c.prefix.tree. The call o"177" desugars to

OctalContext (StringContext ("177")).o ()

after the implicit conversion has been applied. The prefix tree represents the bit without the method call.

OctalContext (StringContext ("177"))

A common development approach is to print out the trees to see what the compiler is giving you. The u.show and u.showRaw methods are particularly useful for this purpose. For example, u.show (c.prefix.tree) returns the following in the macro we are defining for the call o"177" (modulo some formatting). This tree has fully-qualified names and explicitly calls the apply method to construct the string context, but otherwise is the same as above.

OctalMacros.OctalContext (scala.StringContext.apply ("177"))

We can see the actual tree nodes used in the representation of this expression in the compiler by printing u.showRaw (c.prefix.tree) (reformatted to make the structure clear).

Apply (
  Select (Ident (OctalMacros), newTermName ("OctalC")),
  List (
    Apply (
      Select (Select (Ident (scala), scala.StringContext), newTermName ("apply")),
      List (
        Literal (Constant ("177"))))))

Thus, we can see that inside the compiler the prefix will be represented by an AST containing Apply nodes where methods are applied and a literal constant node containing the string "177".

Getting at the literal string

Armed with our knowledge about the structure of the prefix tree, we can easily pattern match the literal string out of the tree and call it orig.

val Apply (
      _,
      List (
        Apply (
           _,
          List (
            Literal (Constant (orig : String)))))) =
              c.prefix.tree

The layout of this pattern parallels the layout of the output of showRaw above.

Check, convert or complain

Now that we have the string literal in our grasp, the macro can proceed to check its format. The code is similar to the code we use earlier in our run-time version.

val OctalNum = "[0-7]+".r

orig match {

  case OctalNum () =>
    c.Expr[Int] (Literal (Constant (Integer.parseInt (orig, 8))))

  case _ =>
    c.error (c.enclosingPosition, "Must only contain 0-7 characters")
    c.Expr[Int] (Literal (Constant (0)))

}

The differences are in what is returned. If the format is ok, we perform the conversion to get the integer value. However, instead of returning that number, we return an AST that represents an expression that evaluates to that number. That expression is

Literal (Constant (Integer.parseInt (orig, 8))))

In other words, a literal integer constant containing the value we want. The c.Expr[Int] constructor combines the expression tree with its type representation.

In the error case, we call the c.error method to report a compile-time error. c.enclosingPosition gives us the source code position of the macro call. c.error is a Unit method, so we need a dummy return value to satisfy the return type.

Our macro implementation is complete. In summary, the macro is given the AST that represents the call. We delve into the AST to access the string literal from the call. We process that literal to check its format. If the format is ok, we convert it to a decimal value and return an AST that represents that value. If the format is not ok, we trigger a compile-time error that says so.

Using the macro-based interpolator

The macro can be used (in a different compilation unit) in the same way as our previous non-macro implementation. This abstraction is nice since you can switch the implementation to a macro without requiring users to rewrite their code. Of course, they must recompile it.

def main (args : Array[String]) {
  println (o"177")
}
127

If we try a literal that is not a valid octal number, we get the expected compile-time error.

OctalWithMacro.scala:8: error: Must only contain 0-7 characters
    println (o"49")
             ^

Why use a string interpolator?

Some readers may be wondering why we bother with a string interpolation for this example. All of the work is being done by the macro and the interpolation is not really contributing very much. In this case that is true, although I quite like the o"177" syntax, compared to something like o ("177") which we would have to use with a pure macro implementation.

String interpolations come much more into their own as an interface to macros when the string format is more complex and the processed strings can contain interpolated expressions. Having a standard and concise syntax to indicate where the expressions are to be placed is a big advantage. Otherwise, every macro writer needs to invent their own convention to pass the pieces to the macro.

What’s next?

The octal number example is a simple case of a very general problem. The format of a processed string can be quite tightly defined, perhaps by a formal syntax. We would like to be informed by the compiler if we err in the syntax of a string. Later posts will revisit this issue.

First though, in the next post we flip the problem around. Instead of using processed strings to construct values, we will use them to deconstruct values via pattern matching. We will see that the processed string syntax leads to quite concise pattern matching of complex structures.

No comments: