WIP: General improvements #4

@jamesaoverton

We have a good first implementation in ontodev/valve.py. Now that I've had a chance to use it, I'm considering some revisions and clarifications. As always, I want the user to be able to form a simple mental model of how VALVE works, so that it is easy to learn and use and avoids edge cases and surprises.

  • add regular expression matches to the grammar, written /foo/, and interpret each one as a match function
  • add regular expression substitutions to the grammar, written s/foo/bar/
  • ideally enforce that these are implemented as PCREs
  • generalize the datatype table into reusable conditions arranged in a hierarchy -- maybe rename it to the "condition" table
  • generalize datatypes from a tree to a DAG by allowing multiple parents
  • maybe enforce that datatype names are single words
  • maybe rework split(pattern, count, expression, ...) as concat(slot, slot, slot), e.g. concat(cell.label, " & ", gates)
  • drop CURIE and replace it with the more general concat(prefix.prefix, ":", local_name)
  • a tree with a split is not a tree, it's a directed acyclic graph -- I'd like to distinguish tree from dag (or maybe hierarchy)
  • I'm worried that the current grammar has a lot of ambiguity: a double-quoted string looks the same as a double-quoted datatype, column name, or table name -- maybe this doesn't matter
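To make the first two bullets concrete, here is a minimal Python sketch of how the proposed /foo/ and s/foo/bar/ forms might be turned into functions. Everything in it is an assumption: parse_regex is a hypothetical helper, not part of ontodev/valve.py, and Python's re module is not PCRE, so it only approximates what enforcing PCREs would mean. Escaped slashes inside the pattern are not handled.

```python
import re

# Hypothetical parser for the two proposed grammar forms:
#   /pattern/        -> a match predicate (str -> bool)
#   s/pattern/repl/  -> a substitution function (str -> str)
# NOTE: Python's `re` is not PCRE; it stands in here only for illustration.
# Escaped slashes ("\/") inside a pattern are not supported by this sketch.

def parse_regex(expr: str):
    """Turn '/foo/' into a match predicate or 's/foo/bar/' into a substitution."""
    if expr.startswith("s/") and expr.endswith("/"):
        parts = expr[2:-1].split("/")
        if len(parts) != 2:
            raise ValueError(f"malformed substitution: {expr!r}")
        pattern, replacement = parts
        compiled = re.compile(pattern)
        # Replace every occurrence of the pattern in the cell value.
        return lambda value: compiled.sub(replacement, value)
    if len(expr) >= 2 and expr.startswith("/") and expr.endswith("/"):
        compiled = re.compile(expr[1:-1])
        # Assumed semantics: the whole cell must match, not just a substring.
        return lambda value: compiled.fullmatch(value) is not None
    raise ValueError(f"not a regex expression: {expr!r}")

is_word = parse_regex(r"/\w+/")
trim_trailing = parse_regex(r"s/\s+$//")
```

Treating /foo/ as a whole-cell match (fullmatch) rather than a search is one possible choice; with a search, /foo/ would also accept "foobar". Note also that re.sub expects backreferences written as \1, whereas Perl-style replacements use $1, which is one concrete place where "implemented as PCREs" would matter.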

A condition defines a list of checks. Each check defines a predicate (a function that takes a string and returns a boolean), along with information about the check: name, parents, level, message, etc. For each cell, we go through the list of checks in order and ensure that the cell satisfies each predicate.
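The check loop described above might look like the following minimal Python sketch. Check, Problem, validate_cell, and the example condition are hypothetical names, the fields only mirror the ones listed in the paragraph, and stopping at the first failing check is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical data model: fields follow the issue's list
# (name, parents, level, message); the exact schema is assumed.
@dataclass
class Check:
    name: str
    predicate: Callable[[str], bool]  # takes a cell string, returns a boolean
    level: str = "error"
    message: str = ""
    parents: List[str] = field(default_factory=list)

@dataclass
class Problem:
    check: str
    level: str
    message: str

def validate_cell(value: str, condition: List[Check]) -> Optional[Problem]:
    """Apply the condition's checks in order; report the first one that fails."""
    for check in condition:
        if not check.predicate(value):
            message = check.message or f"{value!r} failed check {check.name!r}"
            return Problem(check.name, check.level, message)
    return None  # every predicate was satisfied

# Example condition: general checks first, more specific ones later.
word_condition = [
    Check("not_blank", lambda v: v != "", message="value must not be blank"),
    Check("word", lambda v: v.isalnum(), parents=["not_blank"],
          message="value must be a single word"),
]
```

Stopping at the first failure assumes checks are ordered from general to specific, as a hierarchy of conditions suggests: if not_blank fails, also reporting that word failed adds nothing. Collecting every failure is the obvious alternative.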

A predicate can also be thought of as the set of strings for which it is true. A set of strings can be defined extensionally or intensionally. For an extensionally defined set we have a list of all its strings, so we just look up the string in the set: this is how in and under work. For an intensionally defined set we have a rule for determining whether a string is in the set: this is how regex matches and list work. Even distinct can be thought of this way: this cell is not in the set of other cells in this column.
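The extensional/intensional distinction can be sketched in Python as two ways of building the same kind of predicate. The helper names (extensional, matches, list_of, distinct_flags) are hypothetical, and the list_of signature (a separator plus an item predicate) is an assumption about how list might be parameterized.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List

Predicate = Callable[[str], bool]

def extensional(members: Iterable[str]) -> Predicate:
    """Set given by listing its members: membership is a lookup (like `in`)."""
    allowed = frozenset(members)
    return lambda value: value in allowed

def matches(pattern: str) -> Predicate:
    """Set given by a rule: membership is computed (like a regex match)."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None

def list_of(separator: str, item: Predicate) -> Predicate:
    """Hypothetical `list`: split the cell and require every part to satisfy `item`."""
    return lambda value: all(item(part) for part in value.split(separator))

def distinct_flags(column: List[str]) -> List[bool]:
    """`distinct`: a cell passes if its value is not among the *other* cells of the column."""
    counts = Counter(column)
    # A value that occurs exactly once is absent from the rest of the column.
    return [counts[value] == 1 for value in column]

is_color = extensional(["red", "green", "blue"])
is_integer = matches(r"-?[0-9]+")
is_color_list = list_of("|", is_color)
```

distinct differs from the others in that its "set" depends on the rest of the column, so it is evaluated per column rather than per cell.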

tree and lookup are a bit different. lookup maps a pair of strings to a boolean. tree does validate a cell, but it also defines a structure that under can use.
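To illustrate how tree, under, and lookup differ, here is a minimal Python sketch, assuming a hierarchy given as (child, parent) rows and allowing multiple parents as the DAG bullet proposes. All function names are hypothetical, and modeling lookup as membership of a (key, value) pair in another table's rows is an assumption about its semantics.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

def build_hierarchy(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each node to its parents; an empty parent marks a root.
    Allowing several parents per node makes this a DAG rather than a tree."""
    parents: Dict[str, Set[str]] = {}
    for child, parent in rows:
        parents.setdefault(child, set())
        if parent:
            parents[child].add(parent)
    return parents

def undefined_parents(parents: Dict[str, Set[str]]) -> List[str]:
    """The validation side of `tree`: every parent must itself be a node."""
    return sorted({p for ps in parents.values() for p in ps if p not in parents})

def ancestors(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All nodes reachable via parent links; `seen` also guards against cycles."""
    seen: Set[str] = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def under(parents: Dict[str, Set[str]], top: str) -> Callable[[str], bool]:
    """`under`: the value is `top` or one of its descendants in the structure."""
    return lambda value: value in parents and (
        value == top or top in ancestors(value, parents))

def lookup(rows: Iterable[Tuple[str, str]]) -> Callable[[str, str], bool]:
    """`lookup`: a predicate on a *pair* of strings, e.g. (key, value) must be a row."""
    pairs = set(rows)
    return lambda key, value: (key, value) in pairs

hierarchy = build_hierarchy([
    ("animal", ""),
    ("dog", "animal"),
    ("pet", "animal"),
    ("puppy", "dog"),
    ("puppy", "pet"),  # second parent: this is a DAG, not a tree
])
```

The split mirrors the paragraph: build_hierarchy and undefined_parents play the part of tree (validating cells and building a structure), under only consumes that structure, and lookup is the one predicate here that takes two strings instead of one.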
