Questions on Project #1

@upadhyd

@invitae-viv, I have a few questions about the project:

  1. Would the dataset contain duplicates? If not, which fields uniquely identify each record?
  2. Length(Gene): are there any constraints on the length of the Gene field?
  3. NUCLEOTIDE_CHANGE: can this field contain an array of nucleotide changes? If so, should each one be displayed on its own line in the table?
  4. In the TSV file, the gene is null for some records (e.g. line 9 of the file).
    • Assumption: in this case, the gene is the same as in the previous record (here, RTP5). Is this assumption correct?

RTP5 CM000664.1:g.242812080_243048760del,NC_000002.11:g.242812080_243048760del236681 CM000664.1,NC_000002.11 not provided Not Provided ClinVar 2017-09-14 https://www.ncbi.nlm.nih.gov/clinvar/RCV000161254 GRCh37 2 242812080 243048760 - - NC_000002.11 NULL NULL
CM000665.1:g.65191847_65215804del,NC_000003.12:g.65206172_65230129del23958,NC_000003.11:g.65191847_65215804del23958 CM000665.1,NC_000003.12,NC_000003.11 not provided Not Provided ClinVar 2017-09-14 https://www.ncbi.nlm.nih.gov/clinvar/RCV000161287 GRCh37 3 65191847 65215804 - - NC_000003.11 NULL NULL
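
To make the assumption in question 4 concrete, here is a minimal Python sketch of what "carry the previous gene forward" would mean, together with one possible duplicate check for question 1. Everything in it is an assumption pending an answer: that the gene is the first TSV column, that a null gene shows up as an empty field, and that gene plus nucleotide change is a usable key. The helper names `forward_fill_gene` and `duplicate_keys` are hypothetical, and the third sample row is an invented duplicate added only to exercise the check.

```python
import csv
import io
from collections import Counter

GENE_COL = 0  # assumption: the gene symbol is the first TSV column


def forward_fill_gene(rows, gene_col=GENE_COL):
    """Yield rows, replacing an empty gene with the last non-empty gene seen."""
    last_gene = None
    for row in rows:
        row = list(row)  # copy so the caller's rows are not mutated
        gene = row[gene_col].strip()
        if gene:
            last_gene = gene
        elif last_gene is not None:
            row[gene_col] = last_gene
        yield row


def duplicate_keys(rows, key_cols):
    """Return the key tuples (values of key_cols) that occur more than once."""
    counts = Counter(tuple(row[i] for i in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Abridged two-column sample: rows 1 and 2 reuse values from the records above,
# row 3 is an invented duplicate of row 2 (not present in the real file).
SAMPLE = (
    "RTP5\tNC_000002.11:g.242812080_243048760del236681\n"
    "\tNC_000003.11:g.65191847_65215804del23958\n"
    "\tNC_000003.11:g.65191847_65215804del23958\n"
)

rows = list(csv.reader(io.StringIO(SAMPLE), delimiter="\t"))
filled = list(forward_fill_gene(rows))

print([row[GENE_COL] for row in filled])
print(duplicate_keys(filled, key_cols=(0, 1)))
```

If the assumption turns out to be wrong and a null gene really means "no gene", the forward-fill step would simply be dropped and the empty value kept as-is.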

  5. Language for the backend: I am most familiar with Java and the Spring framework and plan to design and develop the API with them. If time permits, I will also replicate the same APIs with Python + Flask or Python + Django, though there is a bit of a learning curve for these on my end. Would that be acceptable?
