ANTLR for Ruby

Getting Started

updated Sunday, August 04, 2013 at 09:59PM EDT



The simplest way to install the antlr3 Ruby project is via RubyGems.

Installing via RubyGems
~> gem install antlr3
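To confirm from Ruby that RubyGems can see the installed gem, you could use a minimal sketch like the one below. It queries the installed gem specifications. The helper name installed_antlr3_versions is made up for illustration. Gem::Specification.find_all_by_name is part of RubyGems itself.

```ruby
require 'rubygems'

# Hypothetical helper: returns the versions of the antlr3 gem visible to
# RubyGems, or an empty array when the gem is not installed.
def installed_antlr3_versions
  Gem::Specification.find_all_by_name('antlr3').map { |spec| spec.version.to_s }
end

versions = installed_antlr3_versions
if versions.empty?
  puts 'antlr3 gem not found; install it with: gem install antlr3'
else
  puts "antlr3 gem installed: #{versions.join(', ')}"
end
```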

Source Archive

Source zip-file distributions are available for download at RubyForge. Be aware, however, that the zip distribution does not currently include a setup/installation script, and the antlr4ruby tool must be able to locate the file java/antlr-full-3.x.x.jar in the project’s base directory.
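The zip distribution has no installer, so a quick check that the jar is in the expected location can save some confusion. The following is a minimal sketch. find_antlr_jar is a hypothetical helper, and the sketch assumes the jar's name follows the antlr-full-3.x.x.jar pattern given above.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: looks for java/antlr-full-3.*.jar under a project's
# base directory, where antlr4ruby is said to expect it.
# Returns the first matching path (sorted), or nil when no jar is present.
def find_antlr_jar(base_dir)
  pattern = File.join(base_dir, 'java', 'antlr-full-3.*.jar')
  Dir.glob(pattern).sort.first
end

# Usage: simulate an unpacked zip distribution in a temporary directory.
Dir.mktmpdir do |base|
  FileUtils.mkdir_p(File.join(base, 'java'))
  # The version number in this file name is made up for the example.
  FileUtils.touch(File.join(base, 'java', 'antlr-full-3.2.1.jar'))
  puts find_antlr_jar(base) || 'jar not found'
end
```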


The full source code and development history can be cloned from my GitHub repository. Note that the repository is much larger and has many more dependencies than the source packages described above. If you are primarily interested in using this package to generate recognition code, you are probably better off installing the gem or a source package as described above. The gem and zip distributions contain everything needed to use the Ruby target without any external dependencies.

Running ANTLR

Included in the package is the program antlr4ruby. It is a thin wrapper around the ANTLR tool command, so it is invoked exactly like the ANTLR tool. Refer to the list of ANTLR tool command-line options for more information. For example, to generate recognition code for the input grammar Whatevs.g, run:

Generating Ruby Code From a Sample ANTLR Grammar, Whatevs.g
~> antlr4ruby Whatevs.g
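Because antlr4ruby takes the same arguments as the ANTLR tool, it is easy to call from a Ruby build script. This is a minimal sketch. antlr4ruby_command is a hypothetical helper, and it assumes the ANTLR 3 tool's -o (output directory) option is passed through unchanged by the wrapper.

```ruby
# Hypothetical helper: builds the argument list for running antlr4ruby on a
# grammar, optionally adding the ANTLR 3 "-o <dir>" output-directory option.
def antlr4ruby_command(grammar, output_dir: nil)
  cmd = ['antlr4ruby']
  cmd += ['-o', output_dir] if output_dir
  cmd << grammar
end

# Usage (requires antlr4ruby on the PATH):
#   system(*antlr4ruby_command('Whatevs.g', output_dir: 'lib')) or
#     abort 'antlr4ruby failed'
p antlr4ruby_command('Whatevs.g')
p antlr4ruby_command('Whatevs.g', output_dir: 'lib')
```

The helper returns an argument array rather than a single string. Passing an array to Kernel#system runs the program directly without going through a shell, so grammar paths containing spaces do not need quoting.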

Grammar File Options

To generate Ruby code, you must specify a top-level options block that sets the language option to Ruby:

Ruby-Targeted Grammars Require the language Option
grammar AnyRubyGrammarYouWrite;
options { language = Ruby; }

If you do not set this option, ANTLR assumes language = Java; and generates Java source code instead.
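Forgetting the option does not produce an error. ANTLR just generates Java. A build script might therefore check for it before calling antlr4ruby. This is a minimal sketch with a hypothetical ruby_target? helper. It uses a regular expression, so it is only a heuristic. It does not skip comments, and it cannot tell a top-level options block from a rule-level one.

```ruby
# Hypothetical pre-flight check: does the grammar source contain an options
# block that sets language = Ruby; ? Heuristic regex scan, not a real parse.
RUBY_TARGET = /options\s*\{[^}]*\blanguage\s*=\s*Ruby\s*;/

def ruby_target?(grammar_source)
  RUBY_TARGET.match?(grammar_source)
end

sample = <<~GRAMMAR
  grammar AnyRubyGrammarYouWrite;
  options { language = Ruby; }
GRAMMAR

puts ruby_target?(sample)                  # prints true
puts ruby_target?("grammar NoOptions;\n")  # prints false
```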

Quickly Testing ANTLR Output

Every recognizer file generated by ANTLR has a built-in test script that can be used to quickly try out the recognizer. The test script runs only when the file is executed directly, not when it is loaded by other Ruby files. For example, suppose you are developing a grammar, C.g, for recognizing the C language. Once the grammar is filled out to the point where ANTLR can compile it without errors, run

~> antlr4ruby -fo C.g

Now, if you have some sample C code in a file named sample.c and would like to see how your lexer tokenizes it, you do not have to build a driver by hand. Simply run:

Built-In Test Driver: Lexer Files
~> ruby CLexer.rb sample.c
--> T__23           "typedef"       @ line 1   col 0
#   WS              " "             @ line 1   col 7   (hidden)
--> T__39           "unsigned"      @ line 1   col 8
#   WS              " "             @ line 1   col 16  (hidden)
--> T__34           "int"           @ line 1   col 17
#   WS              " "             @ line 1   col 20  (hidden)
--> IDENTIFIER      "size_t"        @ line 1   col 21
--> T__24           ";"             @ line 1   col 27
#   WS              "\n"            @ line 1   col 28  (hidden)
#   LINE_COMMAND    "# 325 \"stddef.h\" 3 4\n" @ line 2   col 0   (hidden)
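The driver's output is easy to scrape if you want to post-process it, for example to drop hidden-channel tokens or tally token types. The sketch below assumes the exact layout shown in the sample: a marker, the token type, the quoted text, "@ line N col M", and an optional "(hidden)". parse_driver_output and Token are hypothetical names, and the real driver's format may differ between versions.

```ruby
# Hypothetical parser for the lexer test driver's output, assuming the layout
# shown in the sample above:
#   --> TYPE  "text"  @ line N  col M            (default-channel token)
#   #   TYPE  "text"  @ line N  col M  (hidden)  (hidden-channel token)
Token = Struct.new(:type, :text, :line, :col, :hidden)

LINE_RE = /\A(?:-->|#)\s+(\S+)\s+(".*")\s+@ line (\d+)\s+col (\d+)(\s+\(hidden\))?\s*\z/

def parse_driver_output(output)
  output.each_line.map do |line|
    m = LINE_RE.match(line.chomp) or next  # skip lines that are not tokens
    # The text column is kept exactly as printed, quotes and escapes included.
    Token.new(m[1], m[2], m[3].to_i, m[4].to_i, !m[5].nil?)
  end.compact
end

sample = <<~'OUT'
  --> T__23           "typedef"       @ line 1   col 0
  #   WS              " "             @ line 1   col 7   (hidden)
  --> IDENTIFIER      "size_t"        @ line 1   col 21
OUT

visible = parse_driver_output(sample).reject(&:hidden)
visible.each { |t| puts "#{t.type} #{t.text} (#{t.line}:#{t.col})" }
```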

Parsers and tree parsers have similar built-in test drivers. Run the file with the --help switch for more information about its arguments and available options. See the Built-In Drivers section for more on using the built-in quick test script feature.
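The run-directly-only behavior of the test script described above matches a common Ruby idiom. The following is a generic illustration of that idiom, not code copied from ANTLR's generated files, which may implement it differently. WordSplitter and word_splitter.rb are hypothetical names.

```ruby
# A tiny stand-in for a generated recognizer file (word_splitter.rb is a
# hypothetical name). The module is the "library" part other files load.
module WordSplitter
  def self.tokenize(text)
    text.scan(/\S+/)
  end
end

# Test-driver part: __FILE__ equals $PROGRAM_NAME only when this file is the
# script Ruby was started with (ruby word_splitter.rb sample.txt). When
# another file requires or loads it, this block is skipped.
if __FILE__ == $PROGRAM_NAME
  ARGV.each do |path|
    WordSplitter.tokenize(File.read(path)).each { |word| puts word.inspect }
  end
end
```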