Re: DialogueChecker (Latest version: V2.19, 12 February 2015)
TheLowKing said:
Pim_gd said:
The problem was not iterating over the lines, it was starting over at the start of the list everytime.
But that would have negligible performance impact in faster languages like, C or C++. Since it clearly has a much larger effect in AS, I figure looping over lines itself must also be slow. In which case looping once might be noticably faster than looping 6(?) times. Obviously the impact won't be anywhere as large as changing from O(n^2) to O(n), but still.
Iterating over the lines can't be what's costing me performance. Performance test:
var lines:Array = dialogue.getLines();
for each (var line:Line in lines) {
checkLine(line, dialogue, options);
}
This takes me 270-278 ms for fleacks 4000 line dialogue (GrammerChecker).
Remove the checkline function and do something simple with each line instead...
var lines:Array = dialogue.getLines();
var num:int = 0;
for each (var line:Line in lines) {
//checkLine(line, dialogue, options);
num += line.getLineNumber();
}
trace(num);
0-2 ms. I think we're having trouble with the granularity of my stopwatch here - it can't be more precise than 2 milliseconds. So likely this is taking around ~1 ms.
Probably be even faster if I used a uint like I should. And a vector.
Point is, it's the actual checking that's eating most of the time.
However, replace the line iteration with a loop that goes from 0 to 8000000...
var lines:Array = dialogue.getLines();
var num:uint = 0;
for (var i:int = 0; i < 8000000; i++) {
num += i;
}
trace(num);
And the GrammarChecker takes 560-580 ms to run.
Add a function call per iteration and 1.5 seconds doesn't seem that strange anymore.
It's entirely the algorithm that was hurting me here, and not the language.
Besides, if I didn't have this problem at 4000 lines it would have popped up at 10000 lines or at 20000 lines. Or perhaps in the future, when the DialogueChecker would check multiple dialogues at the same time (for multi-dialogue projects).
As for the architectural change, that's something that's not worth it to do. Right now I have modularity - I can take out a Checker easily and slide new ones in. You could even write your OWN checker using my framework - you get a Dialogue object that you can ask for lines, and from there you can iterate over the lines and perform your checks. That is far more important than the performance benefits of changing O(5n) to O(n). ... And even that's not the case; If N is a checked line, 4000 lines are still checked. It's only the iteration count that has gone up. So what you're saying is not changing O(5n) to O(n) (80% performance increase), it's O(5m + n) to O(m + n). And m turns out to be... well, given that 8 million iterations take 560-580 milliseconds, 4000 iterations would be 0.29 milliseconds. Meanwhile, checking all the lines normally takes 2200 milliseconds ... with overheads everywhere, granted, but even if we reserve half of this as "overhead"... then if checking 4000 lines takes 1200 milliseconds and 4000 iterations takes 0.29 milliseconds, it looks like m = n*0,0002417.
Math aside, it boils down to the iterations having less than a 1% impact. Adding more lines to the dialogue would impact runtime far more.
Last point;
C++ is a language I'm not that good at. It would likely take me longer to develop the DialogueChecker in C++. For this reason alone, it is not smart to develop the DialogueChecker in C++. You see, I only develop this if I feel like it. And if I don't feel like working on it, no work gets done. So if I wrote the DialogueChecker in C++, I wonder whether we'd even have the wide range of functionality and checks like we have today. Perhaps we'd still be stuck with something from v1 - I'd never have the courage to rewrite the whole thing.
This looks like a rant and it probably is a rant, but you shouldn't feel attacked. Your suggestion has merit, but I've ran the tests, and it's not worth losing functionality for.
Feel free to download the code from the archive in the second post and look for optimizations - 2.2 seconds is fast, but I'd like it to be yet another 50% faster. This because if I do ever start supporting multiple dialogue checking, I want to be able to run WeeWillie's slave bazaar in under 10 seconds. And right now it takes...
69ms
90ms
1433ms
1359ms
212ms
90ms
Huh. It takes 3253ms to run the whole lot.
Well. I assume he'll be expanding the dialogues a lot - I want to be able to check any dialogue under 10 seconds, because if it is too much effort to run the tool repeatedly, people will stop using it.
EDIT:
So after this long winded post I couldn't really stop thinking about the performance and I went to do some more about it. I got myself a profiler, and then spent a lot of time getting the profiler to give me some usable data.
So turns out the SyntaxChecker spends a lot of time toying with TriggerTypes. And identifying a TriggerType is a lot of work. You see, the previous implementation of a TriggerType.identifyType went through multiple arrays and called indexOf(trigger) on them. Basically it makes a list and then goes "nope, nope, nope, nope, nope..." until it finds one it likes.
I changed it to use a lookup table. I'm not quite sure what the internal implementation is of that, but it's a lot faster because it doesn't check them one-by-one. I think it's kinda like looking for a word in a dictionary; you start with the first letter, then the next, and search alphabetically like that until you've found what you need. This is, of course, a lot faster than checking ALL the words in a dictionary for if it's the word you're looking for.
This all in v2.21, which I will release when I'm done dicking about.