This is quite important for things like scripting, my cross assembler, etc.
The way it works essentially forces you to parse a string from left to right, and eats up a lot of time in recursions. I don't mind the speed much, and was able to get it quite fast, but there was one issue I couldn't avoid.
Without assignment operators, and with some magic on parenthesis and unary operators, the entire thing was trivial to write as left-associative. Luckily, it matched C's operator associativity in all but one case: ternary expressions. They're right-to-left in C. Now, I could take the cheap way out and claim it's following PHP's style, which is also left-to-right, but I figured it'd be interesting to revisit the idea, and make it more adaptable, so that right-to-left evaluations would be easy to add. Eg in case there's ever a need to add assignment support or somesuch.
To allow interchangeable left/right associativity, and to speed things up a bit, I broke things up into individual functions and used a dual sliding window approach.
The idea is you start with markers on the left and right ends of the string. Left-associative operators move the markers on the left toward the right, and right-associative move the right markers left, until they meet up. Recursion is used to enable operator precedence.
The idea seems to work great, and so far it seems quite fast and clean. My ultimate goal is to write some base templates and allow custom grammars to be generated quickly and easily. Something like boost::spirit for dummies.
But I have one problem I can't solve ... supporting the grouping operator, or parenthesis. I don't suppose anyone here could help me out with that?
It seems easy enough. You'd think I could just add parenthesis below at the highest precedence (eg above muldivmod), and make that call the lowest precedence evaluator (eg ternary.) However, I run into the problem that the lower precedence operators hit their own symbols and thus prematurely shrink the left/right markers.
Eg for example, (1+2)*3. You go to ternary, there is none so go to addition, and it sees +, so it splits the expression to "(1" + "2)*3", which then goes finally to parenthesis, where it is only given "(1".
The only solution I can think of is to embed parenthesis scanning at each and every eval_n() call. And that just seems really messy. Surely there has to be a way to hide that overhead.
How it worked with the RDP was that we could simply detect the first byte as an opening parenthesis, and evaluate that to a single value. With everything packed into one function and only scanning in one direction, that worked fine. But I can't do that the same way with this new left+right marking scanner.
Even if I were to resolve parenthesis first, I wouldn't be able to do anything with the values. For instance, "(1+2)*3" -> I see (1+2) and resolve it. Great, now I have 3 and "*3". But the multiplication function can only evaluate "3*3", not 3 and "*3".
And even if I could, it seems like it'd cause problems with associativity. Eg if I resolve parenthesis, but I need right associativity, I'd have to evaluate the parenthesis in the other direction. Seems like a major hassle.
So, one, can anyone help with this? And two, I suck with the terminology. Is there a name for the type of parser I've created so far? I don't think it strictly qualifies as recursive descent anymore, even though it is still recursive. I also don't see what kind of grammar it cannot parse that a LL/LR/LALR parser cannot. So it doesn't seem to be purely RDP.
Many thanks in advance. Here's the code I have so far:
Code: Select all
//no associativity
int eval_integer(const char *sl, const char *sr) {
int result = 0;
for(; sl < sr; sl++) {
if(*sl < '0' || *sl > '9') printf("bad match %c\n", *sl);
result *= 10;
result += *sl - '0';
}
return result;
}
//left-associative
int eval_muldivmod(const char *sl, const char *sr) {
for(int n = 1; sr - n > sl; n++) {
if(*(sr - n) == '*' || *(sr - n) == '/' || *(sr - n) == '%') {
int left = eval_muldivmod(sl, sr - n);
int right = eval_integer(sr - n + 1, sr);
return
*(sr - n) == '*' ? left * right :
*(sr - n) == '/' ? left / right :
*(sr - n) == '%' ? left % right : 0;
}
}
return eval_integer(sl, sr);
}
//left-associative
int eval_addsub(const char *sl, const char *sr) {
for(int n = 1; sr - n > sl; n++) {
if(*(sr - n) == '+' || *(sr - n) == '-') {
int left = eval_addsub(sl, sr - n);
int right = eval_muldivmod(sr - n + 1, sr);
return
*(sr - n) == '+' ? left + right :
*(sr - n) == '-' ? left - right : 0;
}
}
return eval_muldivmod(sl, sr);
}
//right-associative
int eval_ternary(const char *sl, const char *sr) {
for(int n = 0; sl + n < sr; n++) {
if(*(sl + n) == '?') {
int condition = eval_addsub(sl, sl + n);
for(int o = n + 1; sl + o < sr; o++) {
if(*(sl + o) == ':') {
int left = eval_addsub(sl + n + 1, sl + o);
int right = eval_ternary(sl + o + 1, sr);
return condition ? left : right;
}
}
throw;
}
}
return eval_addsub(sl, sr);
}
int eval(const char *sl) {
const char *sr = sl + strlen(sl);
return eval_ternary(sl, sr);
}
int main() {
int n = eval("1?2:3?4:5");
fprintf(stdout, "%d\n", n);
return 0;
}
Left-associative:
Code: Select all
2+3-1+2 -> (((2+3)-1)+2) -> 6
Code: Select all
1?2:3?4:5 -> 1?2:(3?4:5) -> 2