ANTLR for Ruby

Action Blocks

updated Sunday, August 04, 2013 at 09:59PM EDT

ANTLR grammars are composed of a few types of optional configuration blocks, several optional named action blocks, and a series of grammar rules. Rules, in turn, are composed of alternatives, subrules, token references, rule references, and action blocks.

Action blocks are snippets of Ruby code enclosed in curly braces; they allow a developer to execute arbitrary code at various points of the recognition process. This article covers how to reference rule components within action blocks, the syntactic restrictions developers should be aware of when writing code blocks, and the named actions available for use with this ANTLR target.

Action Block Syntax

ANTLR is designed to allow developers to plug in implementations for other languages. The tool parses grammar files without knowledge of the syntax used in code blocks; it generally does a good job of extracting source code without knowing how to tokenize the target language, as mainstream programming languages usually share a set of standard lexical conventions. However, ANTLR wasn’t designed with Ruby’s complicated syntax in mind, so you may occasionally stumble across a frustrating or confusing clash between ANTLR syntax and Ruby syntax.

How ANTLR Extracts Source Code Blocks

When ANTLR lexes the grammar source file and hits an open brace {, it skips over every character until it finds what appears to be the matching closing brace }. The action lexer assumes:

  1. block comments look like /* ... */
  2. single-line comments look like // ... (and not # ..., as they do in Ruby)
  3. strings are enclosed by single or double quotes: 'blah blah' or "blah"
  4. an open brace { will be closed by }
  5. a backslash \ escapes the value of the next character outside of quoted strings
  6. any other characters are skipped over until it reaches the matching outer brace }
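
The assumptions above can be modelled in a few lines of Ruby. This is only a rough sketch for building intuition, not ANTLR's actual lexer: the method names find_action_end and skip_string are made up for illustration, and the handling of backslashes inside quoted strings is an assumption, since the list above only describes escapes outside of strings.

```ruby
# Simplified model of the brace matching described above (hypothetical
# helper, not part of ANTLR). Returns the index of the closing brace that
# matches the open brace at +start+, or nil if the block never closes.
def find_action_end(source, start = 0)
  raise ArgumentError, 'expected { at start' unless source[start] == '{'

  depth = 0
  i = start
  while i < source.length
    ch  = source[i]
    two = source[i, 2]
    if ch == '\\'                  # rule 5: backslash escapes the next character
      i += 2
    elsif two == '/*'              # rule 1: block comment
      close = source.index('*/', i + 2)
      return nil unless close
      i = close + 2
    elsif two == '//'              # rule 2: single-line comment
      close = source.index("\n", i)
      return nil unless close
      i = close + 1
    elsif ch == '"' || ch == "'"   # rule 3: quoted string
      i = skip_string(source, i)
      return nil unless i
    elsif ch == '{'                # rule 4: nested brace opens
      depth += 1
      i += 1
    elsif ch == '}'                # rule 4: brace closes
      depth -= 1
      return i if depth.zero?
      i += 1
    else                           # rule 6: skip anything else
      i += 1
    end
  end
  nil
end

# Returns the index just past the closing quote, or nil if unterminated.
# Assumption: a backslash inside a string also escapes the next character.
def skip_string(source, i)
  quote = source[i]
  i += 1
  while i < source.length
    case source[i]
    when '\\' then i += 2
    when quote then return i + 1
    else i += 1
    end
  end
  nil
end
```

Under this model, a closing brace inside a Ruby # comment ends the block early, because # is not recognized as a comment marker, and an apostrophe in a Ruby comment (as in "don't") opens a string that never closes. Both are the kind of clash this section warns about.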

Additionally, action blocks would not be especially powerful if they were not able to reference contextual information, such as tokens, tree nodes, or parameters, among other objects. Thus, there are two syntactic “enhancements” to blocks, both of which clash with Ruby’s syntax.

Global Variables vs. Property/Parameter References

ANTLR interprets names prefixed with $ as references to rule properties, labels, arguments, return values, ANTLR scopes, or tokens. Ruby developers understand names prefixed with $ as global variables, such as $LOAD_PATH. Thus, references to global variables must be escaped in code blocks: $LOAD_PATH must be written as \$LOAD_PATH, or the ANTLR tool will treat it as a reference to something that does not exist within the grammar specification.

grammar DollarSignUsage;

wrong
  : ID { $global_name = $ID.text; }
  ;

correct
  : ID { \$global_name = $ID.text; }
  ;

Template Literals

ANTLR features a template output mode, which can be useful for many language translation tasks. It also introduces the idea of “template literals” into action blocks, which are used to create anonymous template objects or to reference specific named templates. Refer to the article on Template Mode for more information on templates and template literals. However, a developer should be aware that these constructs are introduced with a percent character %. Thus, anytime a % is used in code, even inside of a quoted string, it must be escaped if it is not part of an actual template literal.

grammar PercentSignUsage;

wrong
  : ID { puts( "id = %p" % $ID ) }
  ;

correct
  : ID { puts( "id = \%p" \% $ID ) }
  ;
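
For reference, here is a plain Ruby sketch of what the corrected action is meant to compute once the backslashes are gone, next to a formulation that contains no percent sign at all. StubToken is a hypothetical stand-in for the matched ID token, and the example formats the token's text rather than the token itself to keep the output simple. Whether an interpolated string passes through the action lexer untouched is not covered in this article, so treat the second form as an idea to test rather than a guaranteed workaround.

```ruby
# Hypothetical stand-in for the matched ID token; a real action would
# receive a token object from the parser instead.
StubToken = Struct.new(:text)
id = StubToken.new('counter')

# The format-string form used in the grammar example (unescaped here,
# because this is ordinary Ruby rather than an action block).
with_percent = 'id = %p' % [id.text]

# An equivalent form with no percent sign: %p formats its argument
# with #inspect, so interpolating the inspected value gives the same text.
without_percent = "id = #{id.text.inspect}"

puts with_percent      # prints: id = "counter"
puts without_percent   # prints: id = "counter"
```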

Referencing Labels, Rule Values, Tokens, and Yada Yada Yada

As outlined in ANTLR’s primary documentation, action blocks can access a number of different grammar entities.

Token References

declr:  var=ID '=' INT ';'  { @variables[ $var.text ] = $INT.text.to_i };
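
To make the effect of that action concrete, the sketch below replays it in plain Ruby. Everything here is illustrative: StubToken and DeclarationRecorder are hypothetical names, and the code the ANTLR tool actually generates will look different. The point is only that $var.text and $INT.text read the text of the labelled ID token and of the INT token, and that the action stores the integer value under the variable's name.

```ruby
# Hypothetical stand-in for the token objects supplied by the parser.
StubToken = Struct.new(:type, :text)

# Hypothetical host object playing the role of the generated parser.
class DeclarationRecorder
  attr_reader :variables

  def initialize
    @variables = {}
  end

  # Mirrors the action body: $var.text becomes var.text and
  # $INT.text becomes int.text.
  def record(var, int)
    @variables[var.text] = int.text.to_i
  end
end

recorder = DeclarationRecorder.new
# Roughly what matching the input "width = 42;" would feed the action.
recorder.record(StubToken.new(:ID, 'width'), StubToken.new(:INT, '42'))
p recorder.variables
```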

Rule Value References

Property        Type               Description
$rule.text      String             The full text over which the parser advanced during the rule’s execution
$rule.start     ANTLR3::Token      The first token covered within the span of the rule’s execution
$rule.stop      ANTLR3::Token      The last token covered within the span of the rule’s execution
$rule.tree      ANTLR3::AST::Tree  The tree node constructed during the rule’s execution
$rule.template  ANTLR3::Template   The template value constructed during the rule’s execution