Matlab grammar

I described in a previous post how I was trying to parse Matlab code. I’ve given up on this endeavour because it is way too much work. We had designated a Matlab compiler as a last resort to make everything stand-alone; instead, we will require Matlab in our project to speed up development.

I will however provide my incomplete and broken grammar, as promised. I hope somebody can use this later on as we have dropped the compiler approach completely.

On a side note, we figured out why nobody has a complete grammar for Matlab: it’s too darn difficult. Even if you manage to somehow describe the syntax, implementing it (type checking and internal Matlab functions, for example) will be a heck of a lot of work. But it can be done, that much is obvious by now.


// Simple grammar for interpreting Matlab files for the MDDP project.
// Written by Berend Dekens
// Note: this part of the project is abandoned and will not be completed. The grammar mostly works in ANTLR except
// for some dodgy errors. If you find this useful and/or manage to fix the parser errors, please let me know so I can
// fix the problems.
// Known limitations:
// - No function calls without parentheses
//   Matlab allows function calls in the form of 'function_name arguments'. This is annoying and thus not allowed.
// - No function calls (period)
//   Currently we do not allow function calls at all. Implementing this means supporting a large portion of basic
//   Matlab functions and support for declaring new functions across files. This is beyond the scope of this project.
// - No characters or strings in variables
//   Our application is matrices and vectors (integers). Boolean logic is included for the sake of logic blocks and loops.
grammar simplematlab;
options {

// Grammar rules below, start the grammar with a list of statements
statementList
    : statement (lineSep+ statementList? )?
    ;

lineSep
    : ';' | '\n' | ','
    ;

statement
    : 'if' parExpression statementList ( lineSep 'elseif' parExpression statementList)* ('else' statementList)? lineSep 'end'
    | 'for' Identifier '=' (Identifier | integerLiteral) ':' (Identifier | integerLiteral) (':' (Identifier | integerLiteral))? statementList 'end'
    | parExpression
    ;

// Expressions
parExpression
    : '(' expression ')'
    | expression
    ;

expression
    : conditionalOrExpression (assignmentOperator expression)?
    | '(' conditionalOrExpression (assignmentOperator expression)? ')'
    ;

assignmentOperator
    : '='
    | '+='
    | '-='
    | '*='
    | '/='
    ;

conditionalOrExpression
    : conditionalAndExpression ( '||' conditionalAndExpression )*
    ;

conditionalAndExpression
    : equalityExpression ( '&&' equalityExpression )*
    ;

equalityExpression
    : relationalExpression ( ('==' | '!=') relationalExpression )*
    ;

relationalExpression
    : additiveExpression ( relationalOp additiveExpression )*
    ;

relationalOp
    : '<='
    | '>='
    | '<'
    | '>'
    ;

additiveExpression
    : multiplicativeExpression ( ('+' | '-') multiplicativeExpression )*
    ;

multiplicativeExpression
    : unaryExpressionNotPlusMinus ( ( '*' | '/' ) unaryExpressionNotPlusMinus )*
    ;

unaryExpressionNotPlusMinus
    : '~' unaryExpressionNotPlusMinus
    | '!' unaryExpressionNotPlusMinus
    | primary ('++'|'--')?
    ;

primary
    : parExpression
    | literal
    | Identifier (identifierSuffix)?
    ;

literal
    : integerLiteral
    | booleanLiteral
    | 'null'
    ;

integerLiteral
    : Digit+
    ;

booleanLiteral
    : 'true'
    | 'false'
    ;

identifierSuffix
    : ('[' ']')+ '.' 'class'
    | ('[' expression ']')+ // can also be matched by selector, but do here
    | arguments
    ;

expressionList
    : expression (',' expression)*
    ;

arguments
    : '(' expressionList? ')'
    ;

Identifier
    : Letter (Letter | Digit)*
    ;

Letter : 'a'..'z' | 'A'..'Z';
Digit : '0'..'9';
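
The expression part of the grammar above is layered: '||' binds loosest, then '&&', equality, the relational operators, '+' and '-', and finally '*' and '/', with the unary '~' and '!' below those. As a rough illustration of what that layering means, here is a minimal hand-written sketch in Python. It is not generated from the grammar and not part of the MDDP project; the names `tokenize`, `Parser` and `parse` are made up for this example, and it ignores assignment, '++'/'--' and identifier suffixes. Booleans such as 'true' simply come back as identifiers.

```python
import re

# Integers, identifiers, then two-character operators before one-character ones.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z][A-Za-z0-9]*|\|\||&&|==|!=|<=|>=|[-+*/<>()~!])")


def tokenize(text):
    """Turn an expression string into a list of token strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    # Binary operator layers, loosest binding first, in the same order as the grammar.
    LEVELS = [("||",), ("&&",), ("==", "!="), ("<=", ">=", "<", ">"), ("+", "-"), ("*", "/")]

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def binary(self, level=0):
        # Each layer parses the tighter layer first, then loops over its own operators.
        if level == len(self.LEVELS):
            return self.unary()
        node = self.binary(level + 1)
        while self.peek() in self.LEVELS[level]:
            op = self.take()
            node = (op, node, self.binary(level + 1))
        return node

    def unary(self):
        if self.peek() in ("~", "!"):
            op = self.take()
            return (op, self.unary())
        return self.primary()

    def primary(self):
        token = self.take()
        if token == "(":
            node = self.binary()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or not re.fullmatch(r"\d+|[A-Za-z][A-Za-z0-9]*", token):
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token) if token.isdigit() else token


def parse(text):
    """Parse one expression into nested tuples of the form (operator, operands...)."""
    parser = Parser(tokenize(text))
    tree = parser.binary()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Calling `parse("1 + 2 * 3")` nests the multiplication inside the addition, which is exactly what the order of the rules in the grammar is meant to achieve.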



I Hate Grammar

I hate grammar. I learned the English language when I was 9 years old (as it is not my native language) while trying to read the manual of QuickBasic under DOS. By the time we actually got English lessons in school I was years ahead of most people in my class and even up to the end of high school I never had to read the books we used in school. Instead I read a lot of English books, which gave me enough of a sense for the language to steer clear of any official grammar.

But I am drifting here. The problem in computer land is that we use grammar. Every programming language uses a grammar to allow a human to tell the computer what it should do.

Right now I am working on a project where we want to compile Matlab code into C or C++, feed it to GCC and finally upload it into a Virtex IV FPGA. The FPGA has a Sparc V8 compatible CPU (the Leon 2 to be exact) and has a number of auxiliary processors which are dedicated to mathematical calculations. For those who care: we are using Montium cores for the calculations.

Even though we pretty much have solved how to drive the whole thing in theory (not in real life as we are just starting), putting it all together is a bit harder.

It starts with the Matlab interpreter and the grammar needed for the interpreter. I have worked with ANTLR in the past to generate a compiler for my own toy language. It supported pretty much everything the old QuickBasic language did, except that the syntax was much more like Java.

I installed ANTLRWorks to use the shiny new GUI to speed things up and immediately ran into a wall: Matlab sometimes ends a statement with a semicolon (‘;’) and sometimes not, depending on whether the programmer wants to see intermediate results. I am trying to base my grammar on Java here as it is nice and strict, but stuff like this is rapidly making the adaptation a pain in the …
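
To make the terminator problem concrete, here is a small sketch in Python of how statements could be split up before parsing. It is an illustration only, not something from the project: the function name `split_statements` is made up, and it assumes that ';' suppresses the printed result while ',' and a newline do not, and that separators inside '(' or '[' belong to the statement itself (as in a matrix literal). Strings and comments are not handled.

```python
def split_statements(source):
    """Split Matlab-like source into (statement, show_result) pairs."""
    statements, current, depth = [], [], 0
    # The extra newline flushes the last statement if the source does not end with one.
    for ch in source + "\n":
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if depth == 0 and ch in ";,\n":
            text = "".join(current).strip()
            if text:
                # Only a semicolon hides the intermediate result.
                statements.append((text, ch != ";"))
            current = []
        else:
            current.append(ch)
    return statements
```

For the input `"a = 1;\nb = a + 2"` this gives one statement whose result stays hidden and one whose result would be shown, which is the distinction a strict Java-style grammar has no place for.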

Another hair puller is the syntax of the ‘if’ statement: no block structure… Another one lacking the block structure is the function definition. As we are targeting mathematical speedup here and our Virtex board has no display, I will probably sacrifice some stuff to simplify the compiler.
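
Without braces, the only way to find the end of a block is to keep reading until a closing keyword shows up. The Python sketch below illustrates that idea under simplifying assumptions: every statement and every keyword ('if', 'elseif', 'else', 'end') sits on its own line, and only 'if' opens a block. The helper names `first_word`, `parse_block` and `parse_if` are hypothetical and exist only for this example.

```python
def first_word(line):
    """Return the first whitespace-separated word of a line, or '' if it is blank."""
    parts = line.split(None, 1)
    return parts[0] if parts else ""


def parse_block(lines, pos=0, stop=()):
    """Collect statements from lines[pos:] until a keyword in `stop` is reached.

    Returns (block, index of the line where parsing stopped).
    """
    block = []
    while pos < len(lines):
        line = lines[pos].strip()
        word = first_word(line)
        if word in stop:
            return block, pos
        if word == "if":
            node, pos = parse_if(lines, pos)
            block.append(node)
            continue
        if line:
            block.append(line)
        pos += 1
    if stop:
        raise SyntaxError("missing 'end'")
    return block, pos


def parse_if(lines, pos):
    """Parse 'if ... elseif ... else ... end' starting at lines[pos]."""
    branches = []
    while True:
        keyword, _, condition = lines[pos].strip().partition(" ")
        body, pos = parse_block(lines, pos + 1, stop=("elseif", "else", "end"))
        branches.append((keyword, condition.strip(), body))
        if lines[pos].strip() == "end":
            return ("if", branches), pos + 1
```

Each branch ends wherever the next 'elseif', 'else' or 'end' happens to be, so nested 'if' statements have to be consumed recursively before the outer one can find its own 'end'.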

Right now I’m keeping integers and booleans, logic structures like ‘if’-‘else’ and ‘for’ loops. Floating-point numbers, bit operators, function definitions and calls all have to go. Perhaps I will re-add them later on when needed, but right now I don’t see any reason to keep them around.

As soon as my subset grammar is complete I will put it up on my site, as nobody on the internet seems to have done this before (I found posts from 1992 with no solutions…).