Synopsis
OK, so far so good. Here's a summary of how the system works.
The main app is used to test different input files for parsability and output the results. All the app does is create a Router object and run its start() method. This method parses the command-line options for the input file location. Other options can also be specified, like which parser to use, any extra formatting that needs to be done (such as stripping off whitespace), and the output location. If no input file is designated, you can optionally enter the input directly on the command line. If no input is found in either location, it will try to read from standard input, but it's mainly looking for an input file designated with the '-i' switch.
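For illustration, here is a minimal sketch of what that entry point might look like. The project's actual language and option names aren't shown in this synopsis, so this is a Python-flavored sketch; only the '-i' switch comes from the description above, and the other flag names, the import path, and the Router constructor signature are assumptions.

```python
# Hypothetical sketch of the main app: build a Router and hand control to start().
import argparse
import sys

from router import Router  # assumed module name; the real project layout may differ

def main(argv=None):
    ap = argparse.ArgumentParser(description="Test input files for parsability.")
    ap.add_argument("-i", "--input", help="input file location")
    ap.add_argument("-p", "--parser", help="force a specific parser (assumed flag)")
    ap.add_argument("-o", "--output", help="output location, default stdout (assumed flag)")
    ap.add_argument("--strip-whitespace", action="store_true",
                    help="example formatting option (assumed flag)")
    ap.add_argument("text", nargs="?", help="input given directly on the command line")
    args = ap.parse_args(argv)

    router = Router(args)   # Router comes from the project; its constructor is assumed here
    return router.start()   # start() loads input, mogrifies, parses, filters, and writes

if __name__ == "__main__":
    sys.exit(main())
```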
Once the input file is prepared, the router reads the entire file contents into a string, which is then run through any mogrifiers; these allow for pre-parsing input manipulations. At this point the input string is ready for parsing.
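As a rough sketch of that pre-parsing stage (only the mogrifier concept comes from the text; the names and the whitespace example are assumptions):

```python
# Hypothetical sketch of the mogrifier stage: each mogrifier takes the raw input
# string and returns a transformed string, applied in order before parsing.
from typing import Callable, Iterable

Mogrifier = Callable[[str], str]

def strip_whitespace(text: str) -> str:
    # Example mogrifier: strip trailing whitespace from every line.
    return "\n".join(line.rstrip() for line in text.splitlines())

def mogrify(text: str, mogrifiers: Iterable[Mogrifier]) -> str:
    for mog in mogrifiers:
        text = mog(text)
    return text

# Usage: raw file contents in, parse-ready string out.
# prepared = mogrify(open(path).read(), [strip_whitespace])
```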
The next step is to find an appropriate parser; this is done in the router method get_parser(), which contains the so-called "intelligence" to determine which parser is actually the best match for the input string. Currently a brute-force method of parser selection is used, which runs all known parsers on the input before any branching is attempted.
This functionality might be better suited to the parser comparison reporting described in the matrix application below, because it is a bit of overkill here. We don't need to run all parsers on the input if the first parser we run satisfies our selection criteria. A short-circuit approach would offer better performance: test the strongest parser first and stop there if it finds the data it's looking for. A command-line switch for this behavior is in the works.
Parser selection works like this: first we check whether any of the known parsers' results contain an ordered list of questions. If the Question objects that are parsed out start with number one and step up incrementally by one for each question, the data is considered "ordered". Ordered data is the primary thing we're looking for. If none of the parsers produce ordered data, perhaps because one question in the middle had a strange format, then we fall back to the next criterion, which is symmetry.
If all of a parser's questions have the same number of options, like a) b) c) d), then those questions are considered "symmetrical". This is not as good an indicator of a well-formed data set as ordered data, but it does offer us insight. The third and last criterion is magnitude: if a parser finds any data at all at this point, it will be used.
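Putting the three criteria together, the selection logic looks roughly like the sketch below: run every known parser (the current brute-force approach), then prefer ordered results, then symmetrical ones, then anything non-empty. The helper names and the Question attributes (number, options) are illustrative assumptions, not the project's actual API.

```python
# Hypothetical sketch of get_parser(): brute-force run of every parser, then pick
# by the criteria described above: ordered > symmetrical > magnitude.
def is_ordered(questions):
    # "Ordered" means the question numbers run 1, 2, 3, ... with no gaps.
    numbers = [q.number for q in questions]           # Question.number is assumed
    return bool(numbers) and numbers == list(range(1, len(numbers) + 1))

def is_symmetrical(questions):
    # "Symmetrical" means every question has the same count of options, e.g. a) b) c) d).
    counts = {len(q.options) for q in questions}       # Question.options is assumed
    return len(counts) == 1 and counts != {0}

def get_parser(parsers, text):
    results = [(p, p.parse(text)) for p in parsers]    # brute force: run them all
    for criterion in (is_ordered, is_symmetrical, lambda qs: len(qs) > 0):
        for parser, questions in results:               # parser order matters: favorites first
            if questions and criterion(questions):
                return parser
    return None                                         # no parser found anything
```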
The router definitely plays favorites when it comes to parser selection, so the order in which the criteria are applied is important. It would be nice to have this parser order, as well as the criteria selection order, maintainable in a configuration file and eventually in a database.
Once the parser is selected and its parse() method has been run, we have a list of Question objects that we hope is a good representation of the original test. At this point we apply any and all filters specified on the command line to these questions, one at a time. Finally we write the output to the output file, which defaults to standard output if none is specified.
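A rough sketch of that last leg, with illustrative names (filters modeled as callables applied per question, output defaulting to stdout; none of these signatures are confirmed by the synopsis):

```python
# Hypothetical sketch of the filter-and-write step: apply every command-line filter
# to each Question in turn, then write the result to the output file or stdout.
import sys

def apply_filters(questions, filters):
    filtered = []
    for question in questions:
        for f in filters:              # filters were chosen on the command line
            question = f(question)
        filtered.append(question)
    return filtered

def write(questions, path=None):
    out = open(path, "w") if path else sys.stdout
    try:
        for q in questions:
            out.write(str(q) + "\n")   # str(Question) formatting is assumed
    finally:
        if out is not sys.stdout:
            out.close()
```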
Here's what the basic functional flow looks like for the Router:
Router(): start(), load(), setup(), get_input(), mogrify(), parse(), filter(), <show stats>, write()
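Read as a driver method, that flow might look roughly like the sketch below. The method names are taken from the list above; how they pass data between each other, and the show_stats() name standing in for <show stats>, are assumptions.

```python
# Hypothetical sketch of Router.start() following the flow listed above.
class Router:
    def start(self):
        self.load()                      # read command-line options
        self.setup()                     # prepare parsers, mogrifiers, and filters
        text = self.get_input()          # input file, command line, or standard input
        text = self.mogrify(text)        # pre-parsing manipulations
        questions = self.parse(text)     # get_parser() picks the parser used here
        questions = self.filter(questions)
        self.show_stats(questions)       # the <show stats> step in the flow above
        self.write(questions)            # output file, or stdout if none specified
```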