We have a good first implementation in ontodev/valve.py. Now that I've had a chance to use it, I'm considering some revisions and clarifications. As always, I want the user to be able to form a simple mental model of how VALVE works, so that it is easy to learn and use and avoids edge cases and surprises.
- add regular expression matches to the grammar `/foo/` and interpret this as a `match` function
- add regular expression substitutions to the grammar `s/foo/bar/`
- ideally enforce that these are implemented as PCREs
- generalize the datatype table to be reusable conditions in a hierarchy -- maybe rename to "condition" table
- generalize datatypes from a tree to a DAG by allowing multiple parents
- maybe enforce that datatype names are single words
- maybe rework `split(pattern, count, expression, ...)` as `concat(slot, slot, slot)`, e.g. `concat(cell.label, " & ", gates)`
- drop `CURIE` and replace with more general `concat(prefix.prefix, ":", local_name)`
- a tree with a split is not a tree, it's a directed acyclic graph -- I'd like to distinguish `tree` from `dag` (or maybe `hierarchy`)
- I'm worried that the current grammar has a lot of ambiguity: double quoted strings vs double quoted datatypes or column names or table names -- maybe this doesn't matter
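To make the first two items concrete, here is a minimal sketch of how `/foo/` and `s/foo/bar/` might be turned into Python functions. It is only an illustration: the helper names `parse_match` and `parse_substitution` are hypothetical and not part of valve.py, it uses Python's `re` module rather than a PCRE engine, it assumes a match must cover the whole cell, and it does not handle escaped slashes or flags.

```python
import re


def parse_match(expr):
    """Turn '/foo/' into a predicate on a cell value.

    Assumption: the pattern must match the whole cell (fullmatch).
    """
    if len(expr) < 2 or not (expr.startswith("/") and expr.endswith("/")):
        raise ValueError(f"not a match expression: {expr!r}")
    pattern = re.compile(expr[1:-1])
    return lambda cell: pattern.fullmatch(cell) is not None


def parse_substitution(expr):
    """Turn 's/foo/bar/' into a function that rewrites a cell value.

    Assumption: no escaped slashes inside the pattern or replacement.
    """
    parts = expr.split("/")
    if len(parts) != 4 or parts[0] != "s" or parts[3] != "":
        raise ValueError(f"not a substitution expression: {expr!r}")
    pattern = re.compile(parts[1])
    replacement = parts[2]
    return lambda cell: pattern.sub(replacement, cell)


is_foo = parse_match("/fo+/")
foo_to_bar = parse_substitution("s/foo/bar/")
```

With these definitions, `is_foo("foo")` is true, `is_foo("food")` is false, and `foo_to_bar("foo baz")` gives `"bar baz"`.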
A condition defines a list of checks. Each check defines a predicate (function) that takes a string and returns a boolean, as well as a bunch of information about the check: name, parents, level, message, etc. For each cell, we go through the list of checks in order, and ensure that the cell satisfies the predicate.
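The description above could be modelled roughly as follows. This is a sketch under assumptions, not the valve.py implementation: the `Check` class, its field defaults, and `validate_cell` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Check:
    """One check: a predicate on a string plus descriptive metadata."""
    name: str
    predicate: Callable[[str], bool]
    parents: List[str] = field(default_factory=list)
    level: str = "ERROR"  # assumed default, for illustration only
    message: str = ""


def validate_cell(cell, checks):
    """Apply each check in order and return the checks the cell fails."""
    return [check for check in checks if not check.predicate(cell)]


# A hypothetical condition made of two checks.
checks = [
    Check("nonempty", lambda s: s != "", message="cell must not be empty"),
    Check(
        "word",
        lambda s: re.fullmatch(r"\w+", s) is not None,
        parents=["nonempty"],
        message="cell must be a single word",
    ),
]
```

Here `validate_cell("abc", checks)` returns an empty list, while `validate_cell("a b", checks)` returns only the `word` check, whose `level` and `message` can then be reported to the user.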
A predicate can also be thought of as a set of strings for which the predicate is true. A set of strings can be defined extensionally or intensionally. For an extensionally defined set we have a list of all the strings, so we just look up the string in the set -- this is how `in` and `under` work. For an intensionally defined set we have a rule for determining if the string is in the set -- this is how regex matches and `list` work. Even `distinct` can be thought of as: this cell is not in the set of other cells in this column.
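The two kinds of set definition can be sketched as predicate factories. The function names below are hypothetical (`in_set` stands in for `in`, which is a Python keyword), and the sketch is meant only to show the distinction, not how valve.py implements it.

```python
import re


def in_set(allowed):
    """Extensional: the set is given as an explicit collection of strings."""
    allowed = frozenset(allowed)
    return lambda cell: cell in allowed


def matches(pattern):
    """Intensional: the set is given by a rule, here a regular expression."""
    compiled = re.compile(pattern)
    return lambda cell: compiled.fullmatch(cell) is not None


def distinct(other_cells):
    """The cell must not be in the set of the other cells in its column."""
    seen = frozenset(other_cells)
    return lambda cell: cell not in seen


is_color = in_set(["red", "green", "blue"])
is_number = matches(r"\d+")
is_new = distinct(["a", "b"])
```

All three return the same kind of object, a function from a string to a boolean, so the rest of the validator does not need to know how the set was defined.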
`tree` and `lookup` are a bit different. `lookup` takes a pair of strings to a boolean. `tree` does validate a cell but also defines a structure that `under` can use.
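One way to picture the difference is the sketch below. It rests on several assumptions that are not stated above: the tree is stored as a child-to-parent mapping, `tree` validation means every parent also appears as a child, and `under` counts the ancestor itself as being under it. The names `build_tree`, `under`, and `lookup` are used here as plain Python helpers, not as the valve.py functions.

```python
def build_tree(rows):
    """Build a child -> parent mapping from (child, parent) pairs.

    Returns the mapping plus a sorted list of parents that never appear
    as a child (assumed here to be what tree validation reports).
    """
    parents = dict(rows)
    missing = sorted(
        {p for p in parents.values() if p is not None and p not in parents}
    )
    return parents, missing


def under(parents, ancestor):
    """Predicate: is the cell equal to, or a descendant of, `ancestor`?"""
    def predicate(cell):
        seen = set()
        node = cell
        while node is not None and node not in seen:
            if node == ancestor:
                return True
            seen.add(node)  # guard against accidental cycles
            node = parents.get(node)
        return False
    return predicate


def lookup(table):
    """Predicate on a pair of strings: does `key` map to `value`?"""
    return lambda key, value: table.get(key) == value


rows = [("animal", None), ("mammal", "animal"), ("dog", "mammal")]
parents, missing = build_tree(rows)
is_animal = under(parents, "animal")
```

In this sketch `build_tree` both validates (through `missing`) and produces the structure that `under` walks, which is the dual role described above.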