Make column names valid Python identifiers

Since we need a three-level header in the lyDATA columns to represent (1) diagnostic modality, (2) side, and (3) LNL in the involvement table, every other column, e.g. about age or HPV status, also has three levels. To keep it short, I thought it was a good idea to use the filler `#` as the second-level header for patient information.

However, that turned out to be a stupid idea: pandas allows accessing multi-level headers using dot notation only if the header names are valid Python identifiers that do not clash with an existing `DataFrame` attribute or method. E.g., `dataframe.patient.details.age` would be a valid way of accessing a column `("patient", "details", "age")`. But `dataframe.patient.#.age` is NOT valid Python code, since `#` starts a comment.

Similarly, I thought it might be useful to allow multiple synchronous tumors by enumerating the tumors a patient has in the second-level header. So, a patient could have info about one tumor under `("tumor", "1", "t_stage")` and about another under `("tumor", "2", "t_stage")`. But this too was stupid: few patients have synchronous tumors, and most of those cases are somewhat extraordinary and excluded from most studies. So, all this scheme did was, again, create column headers that are NOT valid Python identifiers.
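To make the problem concrete, here is a minimal sketch that flags header levels which cannot be used with dot notation. It uses only the standard library; the helper name `invalid_levels` is made up for illustration and is not part of lyDATA or pandas.

```python
import keyword


def invalid_levels(column):
    """Return the levels of a column tuple that are not usable as attribute names."""
    return [
        level
        for level in column
        # A level must be a valid identifier and must not be a reserved keyword.
        if not level.isidentifier() or keyword.iskeyword(level)
    ]


# The two header schemes described above.
columns = [
    ("patient", "#", "age"),
    ("tumor", "1", "t_stage"),
]

for column in columns:
    print(column, "->", invalid_levels(column))
```

Both the filler `#` and the tumor number `1` are reported, while the top-level and third-level names pass the check. Note that this does not detect clashes with existing `DataFrame` attributes.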

Therefore, I propose to change the second-level header for both the `"patient"` and the `"tumor"` top-level headers to a simple underscore. Then we could use `dataframe.patient._.age` and `dataframe.tumor._.t_stage`. This would also open the door for nice type hinting and auto-completion in an IDE like VS Code.
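A rough sketch of what the renaming could look like, operating on plain column tuples so it does not depend on any particular pandas version. The function names and the `("CT", "ipsi", "II")` involvement column are assumptions for illustration only. The sketch also assumes that each patient keeps a single tumor, so it raises an error if collapsing the enumerated tumors would produce duplicate columns.

```python
def normalize_column(column):
    """Replace the second-level header of patient/tumor columns with an underscore."""
    top, _middle, bottom = column
    if top in ("patient", "tumor"):
        return (top, "_", bottom)
    # Involvement columns (modality, side, LNL) are left untouched.
    return column


def rename_columns(columns):
    """Normalize all columns and refuse to silently merge distinct ones."""
    renamed = [normalize_column(column) for column in columns]
    if len(set(renamed)) != len(renamed):
        raise ValueError("renaming would create duplicate columns")
    return renamed


old_columns = [
    ("patient", "#", "age"),
    ("tumor", "1", "t_stage"),
    ("CT", "ipsi", "II"),  # hypothetical involvement column
]

print(rename_columns(old_columns))
```

The resulting tuples could then be turned back into a column index, e.g. with `pandas.MultiIndex.from_tuples`. Tables that still contain a second tumor under `("tumor", "2", ...)` would need to be handled separately before the rename.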

Make column names valid Python identifiers #21

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Make column names valid Python identifiers #21

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions