Import college readiness data #68

@PhantomWatson

This page has college readiness data that should be imported, but it uses a different structure than the rest of IDOE's spreadsheets and doesn't identify schools/corporations by their IDOE codes.

Identifying schools

Since IDOE codes aren't used, each school/corporation's name will need to be matched to an existing record. If no exact match is found, an array of all names could be loaded and the top three candidates displayed: names that start with the same letter, ranked by lowest levenshtein() value. The user could then choose the correct name.
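A rough sketch of that matching step, written in Python purely for illustration (the project's own code would presumably call PHP's built-in levenshtein() instead). The function names `top_candidates` and `levenshtein`, the case-insensitive comparison, and the alphabetical tie-breaking are all assumptions, not anything that exists in the repo:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def top_candidates(name, known_names, limit=3):
    """Return [exact match], or up to `limit` close names to show the user.

    Hypothetical helper: `known_names` stands in for the array of all
    school/corporation names already in the database.
    """
    target = name.strip().lower()
    if not target:
        return []

    # An exact (case-insensitive) match needs no user input.
    for known in known_names:
        if known.strip().lower() == target:
            return [known]

    # Otherwise keep names sharing the first letter, ranked by edit distance.
    same_letter = [k for k in known_names
                   if k.strip().lower()[:1] == target[:1]]
    ranked = sorted(
        same_letter,
        key=lambda k: (levenshtein(target, k.strip().lower()), k),
    )
    return ranked[:limit]
```

One caveat with the first-letter filter: a name whose spelling differs at the very first character (e.g. a leading "The") would never be offered as a candidate, so it may be worth falling back to the full list when the filter returns nothing.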

Reconciling with existing import script

It may take less work to create a script that converts a college readiness spreadsheet into a new spreadsheet formatted like all of the other IDOE sheets, i.e. with

  • Each school appears once per worksheet
  • The first two columns are IDOE code and name
  • The first row contains column headers
  • Every cell to the right of the codes and names and below the headers contains statistical data (or nulls)
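To make the target format concrete, here is a minimal Python sketch of the conversion. It assumes the college readiness data can first be read into (school name, metric, value) rows, which is only a guess at the source layout, and that names have already been resolved to IDOE codes. `to_common_format`, `code_for_name`, and the header labels are hypothetical:

```python
def to_common_format(rows, code_for_name):
    """Pivot (school_name, metric, value) rows into one row per school.

    Assumed input shape, not the actual college readiness layout.
    `code_for_name` maps each already-matched name to its IDOE code.
    """
    metrics = []    # column order = order in which metrics first appear
    by_school = {}  # name -> {metric: value}
    for name, metric, value in rows:
        if metric not in metrics:
            metrics.append(metric)
        by_school.setdefault(name, {})[metric] = value

    # First row: headers. First two columns: IDOE code and name.
    table = [["IDOE code", "Name"] + metrics]
    for name, values in by_school.items():
        # Missing statistics become None (null); an unmatched name
        # raises KeyError here rather than being silently skipped.
        table.append([code_for_name[name], name]
                     + [values.get(m) for m in metrics])
    return table
```

The resulting list of rows could then be written out as a worksheet and passed to import-stats unchanged.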

Rationale

My presumption at this point is that...

  • Feeding these college readiness spreadsheets into the import-stats command would take a very cumbersome overhaul. The command would have to acknowledge two different spreadsheet formats, detect which format a file uses, and adjust each method in ImportStatsCommand and ImportFile accordingly. That would balloon the complexity of these classes and hurt their maintainability.
  • It would also take a tremendous amount of work to manually reformat these spreadsheets to match the expected format.
  • A command that reformats these spreadsheets would be contained and likely not very large.
  • The precedent of putting spreadsheets into a common format and running them through the same import-stats command seems like a better long-term plan than adding code to import-stats that accounts for every format variation that's been encountered.
