
Multi-table relationships #413

@vankesteren

Several use cases (CBS/YOUth) have shown us that it is important to support some form of multi-table relationships, especially for ID variables. Our initial thought is to support primary/foreign key relations between tables; the exact implementation will be discussed in this issue.

At the same time, we do not want to stray too far from our original design principles: supporting primary/foreign keys means introducing relationships between variables, which can have privacy implications. We need to investigate these risks, without letting that investigation completely block this feature.

Note that table relationships form a graph, which could serve as the basis for a metadata file at the database level, similar to how SDV handles multi-table relations (but simpler?).

For example, see this schema from our election database repo:

[Image: schema of the election database]
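A minimal sketch of how such a relation graph could be resolved into a generation order. The JSON layout (`primary_keys`, `foreign_keys`) is a hypothetical format for a `meta_relations.json`-style file, not an existing metasyn or SDV format; the table names mirror the users/products/transactions example. Each table depends on the tables its foreign keys point to, and the standard-library `graphlib.TopologicalSorter` returns an order in which parent tables precede their children (raising `graphlib.CycleError` if the relations are cyclic).

```python
import json
from graphlib import TopologicalSorter

# Hypothetical contents of a meta_relations.json file: each foreign key says
# that a column in a child table refers to the primary key of a parent table.
META_RELATIONS = """
{
  "primary_keys": {"users": "user_id", "products": "product_id", "transactions": "transaction_id"},
  "foreign_keys": [
    {"table": "transactions", "column": "user_id", "references": {"table": "users", "column": "user_id"}},
    {"table": "transactions", "column": "product_id", "references": {"table": "products", "column": "product_id"}}
  ]
}
"""


def relation_graph(relations: dict) -> dict[str, set[str]]:
    """Map each table to the set of parent tables it depends on."""
    graph: dict[str, set[str]] = {table: set() for table in relations["primary_keys"]}
    for fk in relations["foreign_keys"]:
        parent = fk["references"]["table"]
        graph.setdefault(fk["table"], set()).add(parent)
        graph.setdefault(parent, set())
    return graph


def generation_order(relations: dict) -> list[str]:
    """Order tables so that every parent table comes before its children."""
    # static_order() raises graphlib.CycleError if the relations contain a cycle.
    return list(TopologicalSorter(relation_graph(relations)).static_order())


relations = json.loads(META_RELATIONS)
print(generation_order(relations))
```

The relative order of independent tables (here `users` and `products`) is not fixed; only parent-before-child is guaranteed.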

Implementation thoughts

Several options exist:

  • Build this into metasyn directly, with a new module where you can do something like from metasyn.multitable import MetaDatabase
  • Create a new package that imports metasyn. This way, the scope of the metasyn package itself does not change (so, for example, the JOSS paper stays accurate). The downside is that it is a bit less convenient for new users.
  • Build metasyn into SDV as a data generation method (???)
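To make the first two options more concrete, here is a rough sketch of what a `MetaDatabase` container might look like. Everything in it is hypothetical (`MetaDatabase`, `ForeignKey`, `add_table` and `add_foreign_key` do not exist in metasyn); the per-table metadata is duck-typed as any object with a `synthesize(n)` method, such as a fitted metasyn `MetaFrame`. The only logic shown is checking that a foreign key points at a registered table's declared primary key.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol


class TableMetadata(Protocol):
    """Anything that can synthesize n rows, e.g. fitted per-table metadata."""

    def synthesize(self, n: int) -> Any: ...


@dataclass(frozen=True)
class ForeignKey:
    table: str       # child table that contains the foreign key column
    column: str      # foreign key column in the child table
    ref_table: str   # parent table being referenced
    ref_column: str  # primary key column in the parent table


@dataclass
class MetaDatabase:
    """Hypothetical container for per-table metadata plus key relations."""

    tables: dict[str, TableMetadata] = field(default_factory=dict)
    primary_keys: dict[str, str] = field(default_factory=dict)
    foreign_keys: list[ForeignKey] = field(default_factory=list)

    def add_table(self, name: str, meta: TableMetadata, primary_key: str | None = None) -> None:
        self.tables[name] = meta
        if primary_key is not None:
            self.primary_keys[name] = primary_key

    def add_foreign_key(self, fk: ForeignKey) -> None:
        # The child table must be registered ...
        if fk.table not in self.tables:
            raise KeyError(f"unknown table {fk.table!r}")
        # ... and the referenced column must be the parent's declared primary key.
        if self.primary_keys.get(fk.ref_table) != fk.ref_column:
            raise ValueError(f"{fk.ref_table}.{fk.ref_column} is not a declared primary key")
        self.foreign_keys.append(fk)
```

Keeping the relations separate from the per-table metadata means such a wrapper could live outside metasyn (the second option) and reuse existing per-table metadata objects unchanged.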

Questions

  • Do we want to keep a separate .gmf file per table? For example, a metadata folder like this:
    • users.gmf
    • products.gmf
    • transactions.gmf
    • meta_relations.json
  • How does generation actually work?
    1. create graph of table relations
    2. "resolve" graph to determine generation order (probably, just generate all primary keys first and then each individual table)
    3. generate each table in order
  • What should we do with the length of each table? With uniqueness? With the subset proportion (watch out for disclosure)?
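The proposed generation steps could look roughly like this for the key columns alone. This is a sketch under several assumptions: primary keys are integer IDs `1..n`, the row count of each table is simply given (which sidesteps the open question about table lengths), and foreign keys are drawn uniformly with replacement from the parent's generated keys. `generate_keys` and its arguments are hypothetical names. Because all primary keys exist before any foreign key is drawn, the order of the foreign-key step does not matter, which is one argument for generating all primary keys first.

```python
import random


def generate_keys(
    row_counts: dict[str, int],
    primary_keys: dict[str, str],
    foreign_keys: list[tuple[str, str, str]],
    seed: int = 0,
) -> dict[str, dict[str, list[int]]]:
    """Generate only the key columns of every table.

    foreign_keys holds (child_table, column, parent_table) triples; the
    referenced column is assumed to be the parent's primary key.
    """
    rng = random.Random(seed)
    tables: dict[str, dict[str, list[int]]] = {name: {} for name in row_counts}

    # Step 1: all primary keys first, as unique IDs 1..n for each table.
    for table, pk in primary_keys.items():
        tables[table][pk] = list(range(1, row_counts[table] + 1))

    # Step 2: each foreign key column is drawn (with replacement) from the
    # parent's generated primary keys, so every value refers to an existing row.
    for child, column, parent in foreign_keys:
        parent_ids = tables[parent][primary_keys[parent]]
        tables[child][column] = rng.choices(parent_ids, k=row_counts[child])

    # Step 3 (not shown): synthesize the non-key columns of each table from
    # its own per-table metadata.
    return tables


db = generate_keys(
    row_counts={"users": 5, "products": 3, "transactions": 20},
    primary_keys={"users": "user_id", "products": "product_id", "transactions": "transaction_id"},
    foreign_keys=[("transactions", "user_id", "users"), ("transactions", "product_id", "products")],
)
```

Uniform sampling guarantees referential integrity but does not reproduce how many children each parent has or which fraction of parent keys is referenced at all; modelling that is where the subset-proportion and disclosure questions come in.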

Labels: TDCC-SSH (issues related to the TDCC-SSH project), discussion (discussions about how to move ahead)