I described in a previous post how I was trying to parse Matlab code. I’ve given up on this endeavour because it is way too much work. We had designated a Matlab compiler as a last resort to get everything stand-alone; instead, we will require Matlab in our project to speed up development.
I will, however, provide my incomplete and broken grammar, as promised. I hope somebody can use it later on, as we have dropped the compiler approach completely.
On a side note, we figured out why nobody has a complete grammar for Matlab: it's too darn difficult. Even if you manage to somehow describe the syntax, implementing it (type checking and internal Matlab functions, for example) will be a heck of a lot of work. But it can be done; that much is obvious by now.
matlab.g
// Simple grammar for interpreting Matlab files for the MDDP project.
// Written by Berend Dekens
//
// Note: this part of the project is abandoned and will not be completed. The grammar mostly works in ANTLR except
// for some dodgy errors. If you find this useful and/or manage to fix the parser errors, please let me know so I can
// fix the problems.
//
// Known limitations:
// - No function calls without parentheses
//   Matlab allows function calls in the form of 'function_name arguments'. This is annoying and thus not allowed.
// - No function calls (period)
//   Currently we do not allow function calls at all. Implementing this means supporting a large portion of the basic
//   Matlab functions, as well as declaring new functions across files. This is beyond the scope of this project.
// - No characters or strings in variables
//   Our application is matrices and vectors (integers). Boolean logic is included for the sake of logic blocks and loops.
//
// ANTLR 3 expects the file name to match the grammar name, so save this file as simplematlab.g.
grammar simplematlab;
options {
output=AST;
backtrack=true;
}
// Grammar rules below. The start rule is a list of statements.
statementList
: statement (lineSep+ statementList? )?
;
lineSep
: ';' | '\n' | ','
;
statement
: 'if' parExpression statementList ( lineSep 'elseif' parExpression statementList)* ('else' statementList)? lineSep 'end'
| 'for' Identifier '=' (Identifier | integerLiteral) ':' (Identifier | integerLiteral) (':' (Identifier | integerLiteral))? statementList 'end'
| parExpression
;
// Expressions
parExpression
: '(' expression ')'
| expression
;
expression
: conditionalOrExpression (assignmentOperator expression)?
| '(' conditionalOrExpression (assignmentOperator expression)? ')'
;
assignmentOperator
: '='
| '+='
| '-='
| '*='
| '/='
;
conditionalOrExpression
: conditionalAndExpression ( '||' conditionalAndExpression )*
;
conditionalAndExpression
: equalityExpression ( '&&' equalityExpression )*
;
equalityExpression
: relationalExpression ( ('==' | '!=') relationalExpression )*
;
relationalExpression
: additiveExpression ( relationalOp additiveExpression )*
;
relationalOp
: '<='
| '>='
| '<'
| '>'
;
additiveExpression
: multiplicativeExpression ( ('+' | '-') multiplicativeExpression )*
;
multiplicativeExpression
: unaryExpressionNotPlusMinus ( ( '*' | '/' ) unaryExpressionNotPlusMinus )*
;
unaryExpressionNotPlusMinus
: '~' unaryExpressionNotPlusMinus
| '!' unaryExpressionNotPlusMinus
| primary ('++'|'--')?
;
primary
: parExpression
| literal
| Identifier (identifierSuffix)?
;
literal
: integerLiteral
| booleanLiteral
| 'null'
;
integerLiteral
: Digit+
;
booleanLiteral
: 'true'
| 'false'
;
identifierSuffix
: ('[' ']')+ '.' 'class'
| ('[' expression ']')+ // can also be matched by selector, but do here
| arguments
;
expressionList
: expression (',' expression)*
;
arguments
: '(' expressionList? ')'
;
Identifier
: Letter (Letter | Digit)*
;
Letter : 'a'..'z' | 'A'..'Z';
Digit : '0'..'9';
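
For anyone wondering how the layered expression rules above encode operator precedence, here is a minimal hand-written sketch in Python. It is not part of the MDDP project and it is not generated by ANTLR; it only mirrors the chain additiveExpression -> multiplicativeExpression -> unaryExpressionNotPlusMinus -> primary for integers, identifiers, '~' and parentheses. The names `tokenize`, `Parser` and `parse` are made up for this illustration, and the nested-tuple tree format is my own assumption, not the AST that ANTLR would build.

```python
import re

# Hypothetical tokenizer: integers, identifiers and a handful of operators.
TOKEN_RE = re.compile(r"\s*([0-9]+|[A-Za-z][A-Za-z0-9]*|[-+*/()~])")


def tokenize(text):
    """Split the input into a flat list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; lower-precedence rules call higher ones."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def additive(self):
        # additiveExpression: multiplicative ( ('+' | '-') multiplicative )*
        node = self.multiplicative()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.multiplicative())
        return node

    def multiplicative(self):
        # multiplicativeExpression: unary ( ('*' | '/') unary )*
        node = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.unary())
        return node

    def unary(self):
        # unaryExpressionNotPlusMinus, reduced to the '~' alternative only
        if self.peek() == "~":
            self.take()
            return ("~", self.unary())
        return self.primary()

    def primary(self):
        # primary: parenthesised expression, integer literal or identifier
        token = self.take()
        if token == "(":
            node = self.additive()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        if token[0].isalpha():
            return token
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Parse one expression and return a nested-tuple tree."""
    parser = Parser(tokenize(text))
    tree = parser.additive()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Calling `parse("1 + 2 * 3")` yields a tree in which the multiplication sits below the addition, because `additive` only ever sees complete `multiplicative` results. The same reasoning applies to the rules higher up in the grammar (relational, equality, '&&', '||'), which this sketch leaves out.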