In the second post of this series I showed how you can write your own string interpolators for Scala 2.10. The examples were designed to work regardless of the content of the processed string. In many other cases, we care what the string looks like and its form affects what we want to do. Often we want to complain if the content of the string is not acceptable. Options are a run-time error or a compile-time one. The latter case requires us to extend the compiler, which I will do using 2.10’s experimental macro features.
The complete code for all examples in this series will be made available in the accompanying BitBucket project.
Octal number literals
Suppose that we want to use the notation o"177" to stand for an
integer literal whose value is specified in octal (base 8).
Our interpolator will have to perform a numeric conversion or complain
if the string literal is not a legal octal number.
(For simplicity, we ignore interpolated expressions in this example.)
We can define a simple octal number interpolator as follows.
implicit class OctalContext (val sc : StringContext) { def o () : Int = { val orig = sc.parts.mkString val OctalNum = "[0-7]+".r orig match { case OctalNum () => Integer.parseInt (orig, 8) case _ => sys.error ("Can only contain 0-7 characters") } } }
We access the string literal using the parts method of the
StringContext which returns a list of all of the context string
parts from the literal.
In this case there will only be one since we are not supporting
interpolated expressions.
Once we have the string, a regular expression pattern checks whether the literal is in the correct form or not. If the form is ok, we convert the string and return its integer value. Otherwise, we throw an error to complain about the format violation.
println (o"177") println (o"49")
127 java.lang.RuntimeException: Can only contain 0-7 characters at scala.sys.package$.error(package.scala:27) at Octal$OctalContext.o(Octal.scala:12) at Octal$.main(Octal.scala:20)
Compile-time checking
Run-time errors are fine in some situations, but many of us would like to get stronger guarantees about our code. In particular, we’d like the compiler to complain if we try to write an octal number literal but get the format wrong.
One way to achieve this kind of checking is to extend the compiler with a macro. Macros are a new experimental feature in Scala 2.10 and are planned to be fully-supported in 2.11. In a nutshell, a macro is given access to the compiler’s abstract syntax tree (AST) for a method call. It can return a new AST that the compiler will use instead of the original call. Thus, the macro can replace the call that a user writes with any legal expression.
(Strictly speaking, the macros we use here are def macros because
they implement the bodies of def constructs.
Other macro styles in development will be able to replace types and
other Scala constructs.)
Another alternative to get this kind of checking is a full-blown compiler plugin. A plugin can be more powerful than a macro, because it has full access to the whole compilation unit. However, writing a plugin is harder since much of the plumbing and book-keeping must be implemented. In contrast, the macro system takes care of many of the details of hooking into the compilation process, accessing the abstract syntax tree, and so on. We can focus on the actual replacement that we want to construct.
In our experience Scala 2.10 macros are pretty reliable, but you should be aware that they are experimental so it might be dangerous to rely on them. It’s also easy to get yourself or the compiler in a tangle when writing a macro since you are essentially working with the compiler’s internal representations.
An octal number literal macro
Our plan is to replace the octal number interpolator we wrote above with one that is implemented by a macro. The advantage is that the macro will execute at compile time so it will be able to issue compile-time errors if we get the string literal wrong. In the non-error case the macro will be able to perform the number conversion, thereby saving the program from having to perform it at run-time.
First, we write the o method, but instead of giving the full
implementation, we use the macro keyword to indicate that this
method is a macro that is implemented by the OctalImpl.oImpl method.
This simple notation suffices to get the compiler to call the macro at
compile-time and use its result.
implicit class OctalContext (val sc : StringContext) { def o () : Int = macro OctalImpl.oImpl }
An import of scala.language.experimental.macros or the corresponding
command-line option will be necessary to enable the macros feature.
It is also necessary to ensure that the macro and its uses are not in
the same compilation unit, since the compiler needs to have access to
the compiled macro implementation when it compiles the uses.
The macro signature
The signature of a macro is closely related to that of the method that
it implements.
The signature of the oImpl method that implements o is as follows.
def oImpl (c : Context) () : c.Expr[Int]
The first parameter list contains a Context argument that can be
used by the macro to find out about the context in which it has been
called.
The second parameter list contains one argument for each argument of
the method.
The arguments passed here are the Scala AST representations of the
argument expressions used in the original method call.
In our case, the o method has no parameters so the second parameter
list of the macro is empty.
Finally, the return value of the macro is a Scala AST that represents
the expression which we want to use to replace the macro call.
The return type is path-dependent on the context and says that the
value must be an expression whose type is the type of the original
method.
Thus, we have a return type of c.Expr[Int] here because o returns
an Int.
The context and the trees
The context gives us access to many wonderful things. For example, the context provides position information for the macro call and a way to issue error messages.
The context’s universe gives us the Scala AST node definitions which
we will need to query and construct trees.
To save space in the code, we introduce a local alias u for the
universe.
import c.{universe => u}
and import everything
import u._
Importantly for the octal number literal macro, the context gives us
access to the AST on which the method call in question has been made.
This tree is returned by c.prefix.tree.
The call o"177" desugars to
OctalContext (StringContext ("177")).o ()
after the implicit conversion has been applied. The prefix tree represents the bit without the method call.
OctalContext (StringContext ("177"))
A common development approach is to print out the trees to see what
the compiler is giving you.
The u.show and u.showRaw methods are particularly useful for this
purpose.
For example, u.show (c.prefix.tree) returns the following in the
macro we are defining for the call o"177" (modulo some formatting).
This tree has fully-qualified names and explicitly calls the apply
method to construct the string context, but otherwise is the same as
above.
OctalMacros.OctalContext (scala.StringContext.apply ("177"))
We can see the actual tree nodes used in the representation of this
expression in the compiler by printing u.showRaw (c.prefix.tree)
(reformatted to make the structure clear).
Apply ( Select (Ident (OctalMacros), newTermName ("OctalC")), List ( Apply ( Select (Select (Ident (scala), scala.StringContext), newTermName ("apply")), List ( Literal (Constant ("177"))))))
Thus, we can see that inside the compiler the prefix will be
represented by an AST containing Apply nodes where methods are
applied and a literal constant node containing the string "177".
Getting at the literal string
Armed with our knowledge about the structure of the prefix tree, we
can easily pattern match the literal string out of the tree and call
it orig.
val Apply ( _, List ( Apply ( _, List ( Literal (Constant (orig : String)))))) = c.prefix.tree
The layout of this pattern parallels the layout of the output of
showRaw above.
Check, convert or complain
Now that we have the string literal in our grasp, the macro can proceed to check its format. The code is similar to the code we use earlier in our run-time version.
val OctalNum = "[0-7]+".r orig match { case OctalNum () => c.Expr[Int] (Literal (Constant (Integer.parseInt (orig, 8)))) case _ => c.error (c.enclosingPosition, "Must only contain 0-7 characters") c.Expr[Int] (Literal (Constant (0))) }
The differences are in what is returned. If the format is ok, we perform the conversion to get the integer value. However, instead of returning that number, we return an AST that represents an expression that evaluates to that number. That expression is
Literal (Constant (Integer.parseInt (orig, 8))))
In other words, a literal integer constant containing the value we
want.
The c.Expr[Int] constructor combines the expression tree with its
type representation.
In the error case, we call the c.error method to report a compile-time
error.
c.enclosingPosition gives us the source code position of the
macro call.
c.error is a Unit method, so we need a dummy return value to
satisfy the return type.
Our macro implementation is complete. In summary, the macro is given the AST that represents the call. We delve into the AST to access the string literal from the call. We process that literal to check its format. If the format is ok, we convert it to a decimal value and return an AST that represents that value. If the format is not ok, we trigger a compile-time error that says so.
Using the macro-based interpolator
The macro can be used (in a different compilation unit) in the same way as our previous non-macro implementation. This abstraction is nice since you can switch the implementation to a macro without requiring users to rewrite their code. Of course, they must recompile it.
def main (args : Array[String]) { println (o"177") }
127
If we try a literal that is not a valid octal number, we get the expected compile-time error.
OctalWithMacro.scala:8: error: Must only contain 0-7 characters println (o"49") ^
Why use a string interpolator?
Some readers may be wondering why we bother with a string
interpolation for this example.
All of the work is being done by the macro and the interpolation is
not really contributing very much.
In this case that is true, although I quite like the o"177" syntax,
compared to something like o ("177") which we would have to use with
a pure macro implementation.
String interpolations come much more into their own as an interface to macros when the string format is more complex and the processed strings can contain interpolated expressions. Having a standard and concise syntax to indicate where the expressions are to be placed is a big advantage. Otherwise, every macro writer needs to invent their own convention to pass the pieces to the macro.
What’s next?
The octal number example is a simple case of a very general problem. The format of a processed string can be quite tightly defined, perhaps by a formal syntax. We would like to be informed by the compiler if we err in the syntax of a string. Later posts will revisit this issue.
First though, in the next post we flip the problem around. Instead of using processed strings to construct values, we will use them to deconstruct values via pattern matching. We will see that the processed string syntax leads to quite concise pattern matching of complex structures.















No comments:
Post a Comment