Friday, May 3, 2013

Scala worksheet for Sublime Text 3

My latest weekend hacking project is a worksheet plugin for Scala in Sublime Text 3. It was inspired by the Scala IDE for Eclipse worksheet plugin.

The idea of the Sublime Text plugin is that you enter Scala REPL commands into a file, then hit a keystroke to send the commands to the REPL. The plugin will display the results in another editor view and align them with the input so you can easily see which result was produced by which command. It’s a nice way to try some things out without having to re-enter or edit things in the REPL itself.

The project page has more details. Please let me know how it goes if you try it out.

Wednesday, February 27, 2013

Pattern Matching with String Interpolation

So far in this series we’ve seen how to use string interpolation in Scala 2.10 to construct values of various types. Optionally, we can perform run-time or compile-time checking to make sure that the processed strings are sensible.

In this post we consider how to use string interpolation with pattern matching to deconstruct values. Instead of interpolating expression values, we will interpolate arbitrary patterns and thereby be able to extend pattern matching in nice ways.

The complete code for all examples in this series will be made available in the accompanying BitBucket project.

A more general version of s

As it stands, the library interpolator s can only be used to construct string values.

println (s"One plus one is ${1 + 1}")
One plus one is 2

Suppose that we want to use s to pattern match as well. In other words, we want to get exact matches for the constant strings and to recursively match interpolated sub-patterns.

Here’s an example of what we want to achieve. Our new interpolator is called mys. Construction works just as for s.

println (mys"One plus one is ${1 + 1}")

We also want to use mys in a pattern-matching context; e.g., in a case of a match expression. In this example, we match the string "The sky is blue" against the pattern "The $thing is $colour".

val msg = "The sky is blue" match {
            case mys"The $thing is $colour" =>
              mys"A $colour thing is $thing"
            case _ =>
              "no match"
          }
println (msg)

The constant strings "The ", " is " and an empty string at the end match exactly. The interpolated patterns thing and colour match the characters between the constant strings. In this case the interpolated patterns are variable patterns so the effect is to bind those variables to the strings that are matched in those positions. The bound variables are used in the body of the case to construct the result.

One plus one is 2
A blue thing is sky

Construction and deconstruction together

Our first step toward building mys is to consider how we can make it both construct and deconstruct. In our earlier examples, the function that performed the interpolation was a method of the context implicit class. Clearly we can’t make a single method both construct and deconstruct and keep the same interface. But we can use an object to provide both directions. We rely on the fact that an object with an apply method can be used as a method.

The overall structure of MySContext is as follows.

implicit class MySContext (val sc : StringContext) {

  object mys {

    def apply (args : Any*) : String =
      sc.s (args : _*)

  }

}

Instead of a mys method, we now have a mys object. The mys.apply method implements the construction direction by calling s. We can add other methods to mys as necessary. In particular, we will add an unapplySeq method to make mys into an extractor object that can be called in a pattern matching situation.
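The apply trick is independent of string interpolation, so it may help to see it in isolation first. The following is only a small sketch; the names ApplyDemo, Greeter and greet are invented for illustration and do not appear in the project code.

```scala
object ApplyDemo {

  // Hypothetical class, used only to illustrate the apply convention.
  class Greeter (name : String) {

    // greet is an object, not a method, but it can be called like one.
    object greet {
      def apply (greeting : String) : String =
        s"$greeting, $name!"
    }

  }

  val g = new Greeter ("world")

  // The compiler rewrites g.greet ("Hello") to g.greet.apply ("Hello").
  val sugared  = g.greet ("Hello")
  val explicit = g.greet.apply ("Hello")

}
```

Both values are the same string, which is why callers cannot tell whether greet is a method or an object with an apply method.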

Desugaring pattern matching

Before we finish writing MySContext, we need to understand how a processed string desugars when it is used in a pattern context. The general form is the same as before, except that the interpolated pieces marked with dollars are patterns now, not expressions.

id"text0${pat1}text1 ... ${patn}textn"

This general form is translated by the compiler into

StringContext ("text0", "text1", ... , "textn").id (pat1, ... , patn)

The assumption is that id is an extractor object. If you are hazy on how extractor objects work you can find a quick overview in the Tour of Scala.

The unapply (or more usually, unapplySeq) method of id is responsible for doing the matching and returning values that are to be recursively matched against pat1 to patn.

Making mys into an extractor object

Now that we know how an interpolated pattern desugars, we need to add an unapplySeq method to the mys object. We will use unapplySeq instead of unapply because we want to be able to return an arbitrary number of results when a match is found. We can’t predict how many nested patterns there will be.

Our method for implementing the matching process is to construct a regular expression and to rely on regular expression matching to do the hard work. The constant strings are left unchanged, while each interpolated pattern is represented by (.+). The parentheses define a group so when we run the match we will be given the value that matched this sub-regexp.
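To see what that regular expression looks like, here is a rough sketch that builds it by hand for the pattern mys"The $thing is $colour". The object name RegexSketch and the hard-coded list of parts are assumptions made for the example; in the real interpolator the parts come from the StringContext.

```scala
object RegexSketch {

  // The constant parts that the compiler would pass for
  // the pattern mys"The $thing is $colour".
  val parts = Seq ("The ", " is ", "")

  // Each gap between two parts becomes a capturing group.
  val regexText = parts.mkString ("(.+)")

  // Regex.unapplySeq succeeds only if the whole input matches,
  // and returns the text captured by each group.
  val groups  = regexText.r.unapplySeq ("The sky is blue")
  val noMatch = regexText.r.unapplySeq ("Nothing to see")

}
```

The captured groups are exactly the values that will be matched against the nested patterns.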

Here is the full version of mys with unapplySeq.

implicit class MySContext (val sc : StringContext) {

  object mys {

    def apply (args : Any*) : String =
      sc.s (args : _*)

    def unapplySeq (s : String) : Option[Seq[String]] = {
      val regexp = sc.parts.mkString ("(.+)").r
      regexp.unapplySeq (s)
    }

  }

}

Many readers will observe that this implementation is ripe for an injection attack. unapplySeq does not deal with a situation where the constant strings contain regular expression notation. You might like to extend the code to escape that notation before trying the match.
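One possible way to do that escaping is to pass each constant part through Regex.quote from the Scala library before joining the parts. The sketch below is an assumption about how you might go about it, not the code from the project; QuotedMys, QuotedContext, qmys and version are names invented here.

```scala
import scala.util.matching.Regex

object QuotedMys {

  // Hypothetical variant of MySContext that quotes the constant parts.
  implicit class QuotedContext (val sc : StringContext) {

    object qmys {

      def apply (args : Any*) : String =
        sc.s (args : _*)

      def unapplySeq (s : String) : Option[Seq[String]] = {
        // Regex.quote wraps a part in \Q...\E so that it matches literally.
        val regexp = sc.parts.map (Regex.quote).mkString ("(.+)").r
        regexp.unapplySeq (s)
      }

    }

  }

  // The dot and the parentheses in this pattern are ordinary characters.
  def version (s : String) : Option[String] =
    s match {
      case qmys"v. ($n)" => Some (n)
      case _             => None
    }

}
```

With this variant, version ("v. (10)") succeeds, while version ("vx (10)") fails because the dot no longer stands for an arbitrary character.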

Using more complex sub-patterns

Let’s look at a more complex example of using this interpolator where sub-patterns are used to constrain what can match.

Suppose that we want to match strings like "val age = 34". Furthermore, we want to make sure that the value name is a valid identifier and that the assigned value is a number. We assume that identifiers are any string of letters and digits that begins with a letter, and that numbers are a non-empty sequence of digits.

The method matchValDef uses an interpolation to match the overall structure of the strings we want to accept. Sub-patterns defined using regular expressions further constrain the non-constant parts. Since the interpolated parts are just patterns, we can use any valid Scala pattern matching syntax. matchValDef returns a pair of the matched non-constant parts if the match succeeds, or None if it doesn’t.

def matchValDef (s : String) : Option[(String,String)] = {

  val Ident = """(\p{Alpha}\p{Alnum}*)""".r
  val Value = """(\d+)""".r

  s match {
    case mys"val ${Ident (ident)} = ${Value (value)}" =>
      Some ((ident, value))
    case _ =>
      None
  }

}

Now we can use matchValDef to pull different candidate strings apart.

println (matchValDef ("val age = 34"))
println (matchValDef ("val sum88 = 12345"))
println (matchValDef ("val age = bob"))
println (matchValDef ("val 99 = 34"))
println (matchValDef ("age = 34"))
println (matchValDef ("val age 34"))

We could easily convert matchValDef into an extractor that could be used directly in a case.
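As a rough illustration of that conversion, the sketch below wraps the same match in an extractor object. The names ValDefExtractor, ValDef and describe are hypothetical, and the interpolator is repeated inside the object only so that the example stands alone.

```scala
object ValDefExtractor {

  // The mys interpolator, repeated so that the sketch is self-contained.
  implicit class MySContext (val sc : StringContext) {
    object mys {
      def apply (args : Any*) : String =
        sc.s (args : _*)
      def unapplySeq (s : String) : Option[Seq[String]] =
        sc.parts.mkString ("(.+)").r.unapplySeq (s)
    }
  }

  val Ident = """(\p{Alpha}\p{Alnum}*)""".r
  val Value = """(\d+)""".r

  // Hypothetical extractor object that wraps the logic of matchValDef.
  object ValDef {
    def unapply (s : String) : Option[(String, String)] =
      s match {
        case mys"val ${Ident (ident)} = ${Value (value)}" =>
          Some ((ident, value))
        case _ =>
          None
      }
  }

  // ValDef can now be used directly in a case.
  def describe (s : String) : String =
    s match {
      case ValDef (name, value) => s"$name is bound to $value"
      case _                    => "not a value definition"
    }

}
```

Callers of describe never see the interpolator or the regular expressions; they only see the ValDef pattern.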

What’s next?

The ability to extend pattern matching using extractors is very powerful. Connecting extractors to string interpolation as we’ve seen in this post makes that power more natural to use since the fixed parts are written explicitly.

Our examples have been confined to deconstructing strings. We can also build deconstructing interpolators that take non-strings as input and return non-string values. In the following posts in this series, we will show examples where the values being matched and returned are abstract syntax trees that represent more complex structures. Construction and deconstruction will use a formal language syntax but the values being manipulated will be trees. This capability is extremely useful if we want to write programs that manipulate structured data.

Wednesday, February 20, 2013

Syntax checking in Scala String Interpolators

In the second post of this series I showed how you can write your own string interpolators for Scala 2.10. The examples were designed to work regardless of the content of the processed string. In many other cases, we care what the string looks like and its form affects what we want to do. Often we want to complain if the content of the string is not acceptable. Options are a run-time error or a compile-time one. The latter case requires us to extend the compiler, which I will do using 2.10’s experimental macro features.

The complete code for all examples in this series will be made available in the accompanying BitBucket project.

Octal number literals

Suppose that we want to use the notation o"177" to stand for an integer literal whose value is specified in octal (base 8). Our interpolator will have to perform a numeric conversion or complain if the string literal is not a legal octal number. (For simplicity, we ignore interpolated expressions in this example.)

We can define a simple octal number interpolator as follows.

implicit class OctalContext (val sc : StringContext) {

  def o () : Int = {
    val orig = sc.parts.mkString
    val OctalNum = "[0-7]+".r
    orig match {
      case OctalNum () =>
        Integer.parseInt (orig, 8)
      case _ =>
        sys.error ("Can only contain 0-7 characters")
    }
  }

}

We access the string literal using the parts method of the StringContext which returns a list of all of the context string parts from the literal. In this case there will only be one since we are not supporting interpolated expressions.

Once we have the string, a regular expression pattern checks whether the literal is in the correct form or not. If the form is ok, we convert the string and return its integer value. Otherwise, we throw an error to complain about the format violation.

println (o"177")
println (o"49")
java.lang.RuntimeException: Can only contain 0-7 characters
        at scala.sys.package$.error(package.scala:27)
        at Octal$OctalContext.o(Octal.scala:12)
        at Octal$.main(Octal.scala:20)
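If you want to observe the failure without the program stopping, one option is to wrap the call in scala.util.Try. This is just a sketch under the assumption that the literal is read with sc.parts.mkString; the object name OctalRuntime and the values good and bad are invented for the example.

```scala
import scala.util.Try

object OctalRuntime {

  // Run-time checked octal interpolator, as described above.
  implicit class OctalContext (val sc : StringContext) {
    def o () : Int = {
      val orig = sc.parts.mkString
      val OctalNum = "[0-7]+".r
      orig match {
        case OctalNum () => Integer.parseInt (orig, 8)
        case _           => sys.error ("Can only contain 0-7 characters")
      }
    }
  }

  // Try captures the RuntimeException thrown by sys.error.
  val good = Try (o"177")
  val bad  = Try (o"49")

}
```

The first value is a Success containing 127 and the second is a Failure, which confirms that the check only happens when the code runs.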

Compile-time checking

Run-time errors are fine in some situations, but many of us would like to get stronger guarantees about our code. In particular, we’d like the compiler to complain if we try to write an octal number literal but get the format wrong.

One way to achieve this kind of checking is to extend the compiler with a macro. Macros are a new experimental feature in Scala 2.10 and are planned to be fully-supported in 2.11. In a nutshell, a macro is given access to the compiler’s abstract syntax tree (AST) for a method call. It can return a new AST that the compiler will use instead of the original call. Thus, the macro can replace the call that a user writes with any legal expression.

(Strictly speaking, the macros we use here are def macros because they implement the bodies of def constructs. Other macro styles in development will be able to replace types and other Scala constructs.)

Another alternative to get this kind of checking is a full-blown compiler plugin. A plugin can be more powerful than a macro, because it has full access to the whole compilation unit. However, writing a plugin is harder since much of the plumbing and book-keeping must be implemented. In contrast, the macro system takes care of many of the details of hooking into the compilation process, accessing the abstract syntax tree, and so on. We can focus on the actual replacement that we want to construct.

In our experience Scala 2.10 macros are pretty reliable, but you should be aware that they are experimental so it might be dangerous to rely on them. It’s also easy to get yourself or the compiler in a tangle when writing a macro since you are essentially working with the compiler’s internal representations.

An octal number literal macro

Our plan is to replace the octal number interpolator we wrote above with one that is implemented by a macro. The advantage is that the macro will execute at compile time so it will be able to issue compile-time errors if we get the string literal wrong. In the non-error case the macro will be able to perform the number conversion, thereby saving the program from having to perform it at run-time.

First, we write the o method, but instead of giving the full implementation, we use the macro keyword to indicate that this method is a macro that is implemented by the OctalImpl.oImpl method. This simple notation suffices to get the compiler to call the macro at compile-time and use its result.

implicit class OctalContext (val sc : StringContext) {

  def o () : Int =
    macro OctalImpl.oImpl

}

An import of scala.language.experimental.macros or the corresponding command-line option will be necessary to enable the macros feature. It is also necessary to ensure that the macro and its uses are not in the same compilation unit, since the compiler needs to have access to the compiled macro implementation when it compiles the uses.

The macro signature

The signature of a macro is closely related to that of the method that it implements. The signature of the oImpl method that implements o is as follows.

def oImpl (c : Context) () : c.Expr[Int]

The first parameter list contains a Context argument that can be used by the macro to find out about the context in which it has been called. The second parameter list contains one argument for each argument of the method. The arguments passed here are the Scala AST representations of the argument expressions used in the original method call. In our case, the o method has no parameters so the second parameter list of the macro is empty. Finally, the return value of the macro is a Scala AST that represents the expression which we want to use to replace the macro call. The return type is path-dependent on the context and says that the value must be an expression whose type is the type of the original method. Thus, we have a return type of c.Expr[Int] here because o returns an Int.

The context and the trees

The context gives us access to many wonderful things. For example, the context provides position information for the macro call and a way to issue error messages.

The context’s universe gives us the Scala AST node definitions which we will need to query and construct trees. To save space in the code, we introduce a local alias u for the universe.

import c.{universe => u}

and import everything

import u._

Importantly for the octal number literal macro, the context gives us access to the AST on which the method call in question has been made. This tree is returned by c.prefix.tree. The call o"177" desugars to

OctalContext (StringContext ("177")).o ()

after the implicit conversion has been applied. The prefix tree represents the bit without the method call.

OctalContext (StringContext ("177"))

A common development approach is to print out the trees to see what the compiler is giving you. The u.show and u.showRaw methods are particularly useful for this purpose. For example, u.show (c.prefix.tree) returns the following in the macro we are defining for the call o"177" (modulo some formatting). This tree has fully-qualified names and explicitly calls the apply method to construct the string context, but otherwise is the same as above.

OctalMacros.OctalContext (scala.StringContext.apply ("177"))

We can see the actual tree nodes used in the representation of this expression in the compiler by printing u.showRaw (c.prefix.tree) (reformatted to make the structure clear).

Apply (
  Select (Ident (OctalMacros), newTermName ("OctalContext")),
  List (
    Apply (
      Select (Select (Ident (scala), scala.StringContext), newTermName ("apply")),
      List (
        Literal (Constant ("177"))))))

Thus, we can see that inside the compiler the prefix will be represented by an AST containing Apply nodes where methods are applied and a literal constant node containing the string "177".

Getting at the literal string

Armed with our knowledge about the structure of the prefix tree, we can easily pattern match the literal string out of the tree and call it orig.

val Apply (_,
      List (
        Apply (_,
          List (
            Literal (Constant (orig : String)))))) =
  c.prefix.tree

The layout of this pattern parallels the layout of the output of showRaw above.

Check, convert or complain

Now that we have the string literal in our grasp, the macro can proceed to check its format. The code is similar to the code we used earlier in our run-time version.

val OctalNum = "[0-7]+".r

orig match {

  case OctalNum () =>
    c.Expr[Int] (Literal (Constant (Integer.parseInt (orig, 8))))

  case _ =>
    c.error (c.enclosingPosition, "Must only contain 0-7 characters")
    c.Expr[Int] (Literal (Constant (0)))

}

The differences are in what is returned. If the format is ok, we perform the conversion to get the integer value. However, instead of returning that number, we return an AST that represents an expression that evaluates to that number. That expression is

Literal (Constant (Integer.parseInt (orig, 8)))

In other words, a literal integer constant containing the value we want. The c.Expr[Int] constructor combines the expression tree with its type representation.

In the error case, we call the c.error method to report a compile-time error. c.enclosingPosition gives us the source code position of the macro call. c.error is a Unit method, so we need a dummy return value to satisfy the return type.

Our macro implementation is complete. In summary, the macro is given the AST that represents the call. We delve into the AST to access the string literal from the call. We process that literal to check its format. If the format is ok, we convert it to an integer value and return an AST that represents that value. If the format is not ok, we trigger a compile-time error that says so.

Using the macro-based interpolator

The macro can be used (in a different compilation unit) in the same way as our previous non-macro implementation. This abstraction is nice since you can switch the implementation to a macro without requiring users to rewrite their code. Of course, they must recompile it.

def main (args : Array[String]) {
  println (o"177")
}

If we try a literal that is not a valid octal number, we get the expected compile-time error.

OctalWithMacro.scala:8: error: Must only contain 0-7 characters
    println (o"49")

Why use a string interpolator?

Some readers may be wondering why we bother with a string interpolation for this example. All of the work is being done by the macro and the interpolation is not really contributing very much. In this case that is true, although I quite like the o"177" syntax, compared to something like o ("177") which we would have to use with a pure macro implementation.

String interpolations come much more into their own as an interface to macros when the string format is more complex and the processed strings can contain interpolated expressions. Having a standard and concise syntax to indicate where the expressions are to be placed is a big advantage. Otherwise, every macro writer needs to invent their own convention to pass the pieces to the macro.

What’s next?

The octal number example is a simple case of a very general problem. The format of a processed string can be quite tightly defined, perhaps by a formal syntax. We would like to be informed by the compiler if we err in the syntax of a string. Later posts will revisit this issue.

First though, in the next post we flip the problem around. Instead of using processed strings to construct values, we will use them to deconstruct values via pattern matching. We will see that the processed string syntax leads to quite concise pattern matching of complex structures.

Sunday, February 17, 2013

Writing your own Scala 2.10 String Interpolators

In the first post of this series I showed how Scala 2.10’s new processed string syntax allows us to interpolate expression values into literal strings. We saw the s, raw and f interpolators that are provided by the Scala library. Now we will see how to write your own interpolators.

The complete code for all examples in this series will be made available in the accompanying BitBucket project.

Syntactic sugar for processed strings

The first step to writing your own interpolators is to understand how the Scala compiler interprets the new processed string syntax. A processed string has this general form:

id"text0${expr1}text1 ... ${exprn}textn"

where id is an identifier, the text pieces are constant string fragments, and the expr pieces are arbitrary expressions. The general form of processed string is translated into an expression of the following form.

StringContext ("text0", "text1", ... , "textn").id (expr1, ... , exprn)

The constant parts of the string literal are extracted and passed to the constructor of the Scala library’s StringContext class. The id method of the StringContext object is called and the interpolated expressions are passed as arguments.

What does s do?

More concretely, the expression

s"You are ${age / 10} decades old, $name!"

is really

StringContext ("You are ", " decades old, ", "!").s (age / 10, name)

The StringContext.s method takes the constant parts, interprets any escape sequences they contain, and interleaves them with the values of the expression arguments. The value returned is equivalent to

"You are " + (age / 10) + " decades old, " + (name) + "!"

which is probably what we would have written prior to Scala 2.10.
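You can check this equivalence for yourself. The following sketch evaluates all three forms; the object name DesugarS and the particular values of age and name are assumptions chosen for the example.

```scala
object DesugarS {

  // Example values, chosen only for illustration.
  val age  = 34
  val name = "Bob"

  // The processed string as we would normally write it.
  val sugared = s"You are ${age / 10} decades old, $name!"

  // The call that the compiler generates from it.
  val desugared =
    StringContext ("You are ", " decades old, ", "!").s (age / 10, name)

  // The pre-2.10 way of building the same string.
  val concatenated =
    "You are " + (age / 10) + " decades old, " + (name) + "!"

}
```

All three values are the same string.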

Adding methods to classes using an implicit conversion

It should be clear by now that to write your own interpolator, all you need is to add a method to the StringContext class. Of course, you can’t actually modify StringContext, but you can use an implicit conversion to achieve a similar effect.

Prior to 2.10 we would implement an implicit conversion as follows. Suppose we want to add a method sayhello to values of type Int. We can declare a new class MyInt that has the sayhello method and an implicit conversion that wraps Int values in an instance of the MyInt class.

class MyInt (val i : Int) {
  def sayhello = s"Hello there: $i"
}

implicit def IntToMyInt (i : Int) : MyInt =
  new MyInt (i)

Now we can use the sayhello method on an Int

println (42.sayhello)

which is treated by the compiler as

println (new MyInt (42).sayhello)

so we get the expected output:

Hello there: 42

Implicit classes

In 2.10 we can write the same conversion in a shorter way using another new feature: implicit classes.

implicit class MyInt (i : Int) {
  def sayhello = s"Hello there: $i"
}

println (42.sayhello)
Hello there: 42

The implicit modifier on the class MyInt causes the compiler to synthesise the IntToMyInt conversion that we wrote earlier.

In practice, you would probably also want to make MyInt a value class (another new feature in 2.10) to avoid the overhead of the object wrapping. We will avoid value classes here to keep the examples simple.
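For the curious, a value class version might look something like the sketch below. The enclosing object ValueClassSketch is an assumption of the example; it is there because a value class has to be declared at the top level or inside a statically accessible object.

```scala
object ValueClassSketch {

  // Extending AnyVal makes MyInt a value class. The single
  // constructor parameter must then be declared as a val.
  implicit class MyInt (val i : Int) extends AnyVal {
    def sayhello : String = s"Hello there: $i"
  }

  // Used exactly as before; only the representation changes.
  val greeting = 42.sayhello

}
```

The calling code is unchanged, so this is an optimisation you can add later without affecting users.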

Writing our own interpolator

Armed with implicit conversions, it is easy to write a new interpolator. As our first example, let’s write one that performs exactly like s but returns the reverse of the result string.

implicit class ReverseContext (val sc : StringContext) {
  def rev (args : Any*) : String = {
    val orig = sc.s (args : _*)
    orig.reverse
  }
}

The ReverseContext class wraps a StringContext and adds the rev method. rev passes all of its expression arguments to the s method of the StringContext and then returns the reverse of the result.

val msg = "Hello world!"
println (rev"Backwards version of $msg")
!dlrow olleH fo noisrev sdrawkcaB

Constructing values that are not strings

Even though processed strings start out as a form of string literal, the value that they stand for does not have to be a string. This observation leads to much of the power of processed strings.

For example, the following code is a simple interpolator that first applies s and then counts the number of space characters in the result. The count is returned in a Count object.

case class Count (num : Int)

implicit class SpaceCountC (val sc : StringContext) {
  def nspaces (args : Any*) : Count = {
    val orig = sc.s (args : _*)
    Count (orig.count (_.isSpaceChar))
  }
}

val msg1 = "Hello world!"
val msg2 = s"a b $msg1 c d"
println (nspaces"$msg1")
println (nspaces"$msg2")

What’s next?

These examples have been chosen to be simple to make the mechanism clear. In the next post, we consider a more realistic example: octal number literals. As well as requiring a (slightly) more complex interpolator, octal numbers prompt us to think about the form of the literal that we can accept and what to do if the literal’s form is not legal.

Thursday, February 14, 2013

String Interpolation in Scala 2.10

I spoke at a recent ScalaSyd meeting about string interpolation in the Scala 2.10 release. You can find the slides here, but since there is little in the way of explanation in them, I’ll be blogging here to explain the examples.

I begin in this post with the basics of string interpolation via processed strings in Scala 2.10. Later posts show how to write your own interpolators that implement custom value construction as well as pattern matching. The final posts show how macros can be used to get compile-time guarantees about the content of your processed strings.

The complete code for all examples in this series will be made available in the accompanying BitBucket project.

The story so far

Scala from before 2.10 supports basic string literals delimited by a single pair of double-quote characters, which I will call the single-quoted form. Single-quoted literals can contain escape sequences but not newlines.

"A string on one line"

A triple-quoted form is also available in which non-Unicode escape sequences are not interpreted and newlines can occur.

"""A long string with only Unicode escapes and

possibly newlines in it"""

Commonly used techniques for constructing strings include using the plus operator to concatenate the pieces.

"The " + animal1 + " jumped over the " + animal2

Fans of printf-style string formatting can use the format method to build their strings.

"The %s jumped over the %s".format (animal1, animal2)

Processed strings

Scala 2.10 extended the syntax by adding processed strings that allow expression values to be interpolated (inserted) into the middle of a string literal.

Interpolation is requested by prefixing a string literal with an identifier. There must be nothing between the identifier and the literal. The examples I show use the single-quoted form of string literal, but triple-quoted strings can also be processed in an analogous fashion.

Processed strings can include arbitrary expressions marked by dollar signs. The way in which the dollar-marked expressions are processed depends on the details of the interpolator.

For example, the Scala library provides an interpolator called s that can be used as follows.

val answer = 42
println (s"answer is $answer, dollar is $$")

val animal1 = "fox"
val animal2 = "dog"
println (s"The $animal1 jumped over the $animal2")

println (s"One plus one is ${1 + 1}")

println (s"The inserted expressions are blocks ${
  val x = "!"
  x * 3
}")
answer is 42, dollar is $
The fox jumped over the dog
One plus one is 2
The inserted expressions are blocks !!!

The s interpolator produces a string value by concatenating the constant parts of the string literal (after interpreting escape sequences) with the values of the embedded expressions. A dollar sign is obtained in the output by including two consecutive dollar signs in the literal. If the expression marked by a dollar sign is more than a single identifier it must be enclosed in braces. The final example shows that the expressions are actually block expressions so they can contain local declarations.

The Scala library also contains an interpolator called raw that behaves just like s except that it doesn’t interpret escape sequences in the constant parts of the string literal.
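A quick way to see the difference is to process the same literal with both interpolators. This is a small sketch; the object name RawVsS and the literal are assumptions made for the example.

```scala
object RawVsS {

  // s interprets \n as a newline character.
  val viaS = s"one\ntwo"

  // raw leaves the backslash and the n as two separate characters.
  val viaRaw = raw"one\ntwo"

}
```

The s version contains a real newline, so it is one character shorter than the raw version.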

It is important to realise that the expressions embedded in a processed string are checked in the usual way by the Scala compiler. Errors will be reported at compile time.

Formatted interpolation

The string interpolation equivalent of the format method is provided by the Scala library’s f interpolator.

val pi = 3.14159
println (f"pi ($pi) = $pi%1.3f")

val msg = "G'day!"
println (f"msg.length   = ${msg.length}%5d")
pi (3.14159) = 3.142
msg.length   =     6

The difference between s/raw and f is that in an f string the embedded expressions can be followed by format specifiers beginning with a percent sign. If no format specifier is given for an expression then it defaults to %s so the string value of the expression is used.

An interesting aspect of f is that the interpolation process checks compatibility between the embedded expressions and the format specifiers at compile time. For example, if msg is a string, the following will not compile since a string cannot be formatted as a floating-point value.

f"msg can't be formatted as $msg%1.3f"

This kind of checking goes above and beyond the normal checking that the compiler will do. In the example, the compiler will normally ensure that msg is in scope of its use in the string. The extra checking to make sure that msg is compatible with %1.3f is performed by the interpolation process. Since we want the extra checking to be performed at compile time, it is implemented by a macro that augments the compiler’s capabilities. I’ll show some examples of using macros for this kind of checking later in this series.

What’s next?

As you might expect, the interpolators s, raw and f are not particularly special. It’s easy to write your own and in the next post I’ll show you how.