An EBNF Auto-Completion Parser

After 3 months of programming I decided it was time to tackle my very own “first project” and boy was it terrible. My scope was a scientific calculator with a front facing GUI and after coding about 1500 lines of rubbish I made, for a lack of better words, something. It wasn’t total trash, but was no where near user ready, had no test, glitches were present, and it never parsed the users input meaning someone could enter “5 + + + 5 – – – 5 ++ ” which as we can tell is NOT an equation.

The Spring of 2022 I was taking 2 classes, 1 being a software engineering class where I could make basically any project I wanted, so I decided to take a second go at my calculator. The second was a language fundamental class where in the first few weeks we tackled EBNF grammar. EBNF, for those unaware, is a context free grammar within the family of metasyntax notations. What this means is that an EBNF defines the structure of a language. For example

<main> ::== a <animal> <action>
<animal> ::== ?<adjective> dog | cat | bird
<adjective> ::== big | small
<action> ::== sits | walks

With our grammar here, we start with <main>, “a <animal> <action>”. Next we can pull the animal, lets say cat. “A cat <action>”. Finally we pull our action, “A cat sits”. EBNF allows flexibility, for example if we pick dog instead of cat we can have “A dog sits” or “A big dog sits”.

EBNF is used to define a programming languages syntax as well, and when a user compiles or interprets their code one thing that happens is it gets parsed in comparison with the EBNF syntax. If there is an intrusion, such as
if (x == y) :
if (and or) :
a compiling error occurs. Some languages EBNFs span only a few hundred lines, others a couple thousand. Pythons for example is just over 600 lines.

So, what is the significance of bringing this up? Well, as I said above, my first calculators iteration allowed users to enter ” 5 ++”… but I wanted to build a defined EBNF that restricted the inputs users could enter. So, I did.

_ exp ::=
=
exp ‘+’ term
exp ‘-‘ term
term

_term ::=
term ‘*’ factor
term “/” factor
term ‘//’ factor
term ‘%’ factor
factor

_factor ::=
‘(‘ exp ‘)’
power

_power ::=
digit ‘**’ factor
function

_function ::=
[digit | function] [abs | sqrt | 1/x | n! | 10^y |x^y | log | neg]
digit

_digit ::=
int ‘.’ int
int

_int ::=
int int
[0 | 1 | .. | 9]

These 20 sum lines of code encompass all user inputs into the calculator. Turns out, building the EBNF parser was the easy part, live parsing became the issue. What I mean by this is say the user presses “5”. “5” is valid, so we will move on. Next they enter “+”. “5 +” is not a valid sentence structure according to our EBNF, as in the simplest form an expression needs to be an ‘expression + term’, where “5 +” is just an ‘expression +’.

So I had to build around predictability. If a user presses a button that can lead to a correct EBNF syntax, but isn’t quite there yet, my program assumed the user was going in the right direction and held off on marking it as incorrect syntax. But what if the user then presses “=” to solve with a “5 +”? Does this break my program and give to much power to the user? Not really. To combat this awkwardness of incomplete equations that follow EBNF partially up to a point I implemented an auto-completion function that would autofill an equation in accordance to the EBNF. So “5 + =” would become “5 + 5 =”. The user loses freedom when they try to bend grammatical rules.

So, you may ask “then what if the user enters “-” after their “5 +”. That’s simple, we change the most recent operator, “+”, to a “-“. “5 -“. What if the user enters a function, say square()? We copy the ‘=’ pattern and autofill, square(5+5), and this is also true for incomplete parentheses. Say ‘(‘ is entered, followed by a ‘)’. Our program throws a 0 in there for ‘(0)’. Its all about making sure that whatever is inside our equation line is following our EBNF rules. Originally, on my first go, a lot of this was if elif else statements following a very linear path, meaning I was only tackling edge cases directly. With my new iteration, the one including the EBNF, I was able to create rules and laws about how things worked and I literally saw 100’s of hard coded solutions turn into a few small functions that tackled the syntax rather than the event.

This of course is a very brief and simple run down as 1) I feel I lack the ability to explain exactly how everything works beyond a brief abstraction, 2) the technicality comes into play with code, and I don’t want to just throw all my code up here. If you’re interested in seeing my 600 lines of EBNF, you can view it on Github. The whole project is here.

Leave a Reply Cancel reply