Matlab grammar

In a previous post I described how I was trying to parse Matlab code. I've given up on this endeavour because it is way too much work. Writing our own Matlab compiler was only ever a last resort for making everything stand-alone; to speed up development, our project will simply require Matlab instead.

As promised, I will still publish my incomplete and broken grammar. We have dropped the compiler approach completely, but I hope somebody else can make use of it later on.

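On a side note, we figured out why nobody has a complete grammar for Matlab: it's too darn difficult. Even the syntax is full of context-dependent constructs; 'a(1)' can be either a function call or an array index, and a quote can be either a transpose or the start of a string. And if you do manage to describe the syntax, implementing it (type checking and Matlab's built-in functions, for example) will be a heck of a lot of work. But it can be done; that much is obvious by now.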

matlab.g

// Simple grammar for interpreting Matlab files for the MDDP project.
// Written by Berend Dekens
//
// Note: this part of the project is abandoned and will not be completed. The grammar is mostly working in ANTLR except
// for some dodgy errors. If you find this useful and/or manage to fix the parser errors, please let me know so I can
// fix the problems.
//
// Known limitations:
// - No function calls without parentheses
//   Matlab allows function calls in the form 'function_name arguments' (command syntax). This is annoying to parse
//   and thus not allowed.
// - No function calls (period)
//   Currently we do not allow function calls at all. Implementing this means supporting a large portion of the basic
//   Matlab functions as well as functions declared across multiple files. This is beyond the scope of this project.
// - No characters or strings in variables
//   Our application only deals with matrices and vectors of integers. Boolean logic is included for the sake of
//   conditional blocks and loops.
//
grammar simplematlab;
options {
output=AST;
backtrack=true;
}

// Parser rules; the start rule, statementList, matches a list of statements
statementList
: statement (lineSep+ statementList? )?
;

lineSep
: ';' | '\n' | ','
;

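// statementList already swallows trailing separators, so none are required before 'elseif', 'else' or 'end';
// lineSep* allows a separator (newline, ',' or ';') after the 'if', 'elseif', 'else' and 'for' headers.
statement
: 'if' parExpression lineSep* statementList ('elseif' parExpression lineSep* statementList)* ('else' lineSep* statementList)? 'end'
| 'for' Identifier '=' (Identifier | integerLiteral) ':' (Identifier | integerLiteral) (':' (Identifier | integerLiteral))? lineSep* statementList 'end'
| parExpression
;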

// Expressions
parExpression
: '(' expression ')'
| expression
;

expression
: conditionalOrExpression (assignmentOperator expression)?
| '(' conditionalOrExpression (assignmentOperator expression)? ')'
;

assignmentOperator
: '='
| '+='
| '-='
| '*='
| '/='
;

conditionalOrExpression
: conditionalAndExpression ( '||' conditionalAndExpression )*
;

conditionalAndExpression
: equalityExpression ( '&&' equalityExpression )*
;

equalityExpression
: relationalExpression ( ('==' | '~=' | '!=') relationalExpression )* // '~=' is Matlab's "not equal"; '!=' is the Octave spelling
;

relationalExpression
: additiveExpression ( relationalOp additiveExpression )*
;

relationalOp
: '<='
| '>='
| '<'
| '>'
;

additiveExpression
: multiplicativeExpression ( ('+' | '-') multiplicativeExpression )*
;

multiplicativeExpression
: unaryExpressionNotPlusMinus ( ( '*' | '/' ) unaryExpressionNotPlusMinus )*
;

unaryExpressionNotPlusMinus
: '~' unaryExpressionNotPlusMinus
| '!' unaryExpressionNotPlusMinus
| primary ('++'|'--')?
;

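// Use '(' expression ')' here rather than parExpression: parExpression can derive a bare expression,
// which leads straight back to primary and makes these rules left-recursive.
primary
: '(' expression ')'
| literal
| Identifier (identifierSuffix)?
;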

literal
: integerLiteral
| booleanLiteral
| 'null'
;

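integerLiteral
: IntegerLiteral
;

booleanLiteral
: 'true'
| 'false'
;

identifierSuffix
: ('[' ']')+ '.' 'class'
| ('[' expression ']')+ // can also be matched by selector, but do here
| arguments
;

expressionList
: expression (',' expression)*
;

arguments
: '(' expressionList? ')'
;

// Lexer rules
Identifier
: Letter (Letter | Digit | '_')* // Matlab identifiers may contain underscores after the first letter
;

// A whole number is a single token, so skipped whitespace cannot glue '1 2' into 12
IntegerLiteral
: Digit+
;

// Spaces, tabs and the '\r' of Windows line endings are skipped; '\n' remains a statement separator
WS
: (' ' | '\t' | '\r')+ {$channel=HIDDEN;}
;

fragment Letter : 'a'..'z' | 'A'..'Z';
fragment Digit : '0'..'9';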
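The expression rules in the grammar form a precedence ladder: each rule (conditionalOrExpression, conditionalAndExpression and so on down to primary) only handles its own operators and delegates tighter-binding ones to the next rule. As a sketch of how a parser built from these rules would evaluate an expression, here is a small hand-written recursive-descent evaluator in Python with one method per rule. It is not ANTLR output and not code from the project; the names tokenize, Evaluator and BINARY are hypothetical, and it assumes the integer-and-boolean subset described in the header comments (no matrices, strings or function calls). It also accepts Matlab's '~=' for inequality and underscores in identifiers.

```python
import operator
import re

# Hypothetical tokenizer: integers, identifiers (letters/digits/underscores), operators.
TOKEN_RE = re.compile(
    r"\s*(?:(\d+)|([A-Za-z][A-Za-z0-9_]*)|(\|\||&&|==|~=|!=|<=|>=|[<>+\-*/~!()]))"
)

# '~=' (Matlab) and '!=' (Octave) both mean "not equal". Unlike Matlab, '||' and '&&'
# here evaluate both operands, which is harmless in this side-effect-free subset.
BINARY = {
    "||": lambda a, b: bool(a) or bool(b), "&&": lambda a, b: bool(a) and bool(b),
    "==": operator.eq, "~=": operator.ne, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt,
    "+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv,
}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        number, name, op = m.groups()
        kind = "int" if number else "name" if name else "op"
        tokens.append((kind, int(number) if number else name or op))
        pos = m.end()
    return tokens


class Evaluator:
    """Recursive descent with one method per expression rule of the grammar."""

    def __init__(self, text, env=None):
        self.tokens, self.pos, self.env = tokenize(text), 0, env or {}

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def accept(self, *ops):
        kind, value = self.peek()
        if kind == "op" and value in ops:
            self.pos += 1
            return value
        return None

    def parse(self):
        value = self.or_expr()
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected trailing input at token {self.pos}")
        return value

    def binary(self, ops, operand):
        """One left-associative precedence level: operand (op operand)*."""
        value = operand()
        while op := self.accept(*ops):
            value = BINARY[op](value, operand())
        return value

    def or_expr(self):  # conditionalOrExpression
        return self.binary(("||",), self.and_expr)

    def and_expr(self):  # conditionalAndExpression
        return self.binary(("&&",), self.eq_expr)

    def eq_expr(self):  # equalityExpression
        return self.binary(("==", "~=", "!="), self.rel_expr)

    def rel_expr(self):  # relationalExpression
        return self.binary(("<=", ">=", "<", ">"), self.add_expr)

    def add_expr(self):  # additiveExpression
        return self.binary(("+", "-"), self.mul_expr)

    def mul_expr(self):  # multiplicativeExpression
        return self.binary(("*", "/"), self.unary)

    def unary(self):  # unaryExpressionNotPlusMinus, minus the '++'/'--' suffix
        if self.accept("~", "!"):
            return not self.unary()
        return self.primary()

    def primary(self):  # '(' expression ')' | literal | Identifier
        if self.accept("("):
            value = self.or_expr()
            if not self.accept(")"):
                raise SyntaxError(f"expected ')' at token {self.pos}")
            return value
        kind, value = self.peek()
        if kind not in ("int", "name"):
            raise SyntaxError(f"unexpected token at {self.pos}")
        self.pos += 1
        if kind == "int":
            return value
        if value in ("true", "false"):
            return value == "true"
        return self.env[value]  # KeyError for an undefined variable
```

For example, Evaluator("1 + 2 * 3").parse() returns 7 because mul_expr consumes '2 * 3' before add_expr sees the '+', and Evaluator("~(1 == 2) && x > 2", {"x": 3}).parse() returns True.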
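The 'for' rule accepts two or three colon-separated bounds but does not say what they mean. In Matlab the three-operand form is start:step:stop, so an interpreter built on this grammar has to treat the middle operand as the step. A minimal Python sketch of those semantics for the integer-only case the grammar targets (the function name matlab_range is hypothetical):

```python
def matlab_range(start, second, third=None):
    """Values of Matlab's colon operator for integer operands (hypothetical helper).

    start:stop uses a step of 1; start:step:stop takes the step from the
    middle operand, as in Matlab's 'for i = 1:2:10'.
    """
    if third is None:
        step, stop = 1, second
    else:
        step, stop = second, third
    if step == 0:
        return []  # Matlab returns an empty vector for a zero step
    # Python's range() excludes its end, so move the end one past 'stop'
    # in the direction of travel to make 'stop' inclusive, as in Matlab.
    end = stop + 1 if step > 0 else stop - 1
    return list(range(start, end, step))


# Matlab: for i = 1:2:10, disp(i), end
for i in matlab_range(1, 2, 10):
    print(i)  # prints 1, 3, 5, 7, 9 (one per line)
```

A range whose stop lies "behind" the start, such as 5:1, yields no values, so the loop body never runs; this matches Matlab's behaviour for an empty range.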