Update frequency: Re-crawl SCC calendar every 30 minutes; only do a deep crawl if something changes #7

@davepeck

At the moment, I've configured a GitHub Action to crawl the SCC calendar once a day (at 12:30AM) and, if there is new content, extract & summarize + deploy to https://scc.frontseat.org/

This isn't good enough. Today, a couple of meetings were added in the middle of the day. I'd like summaries to show up as soon as possible.

For this task, we will:

  • Update the ./engage legistar crawl-calendar command to support a new --deep flag. With --deep, we'll do what we currently do: crawl Calendar.aspx and everything it links to. Without --deep, we'll only look at Calendar.aspx itself. If it has materially changed, we'll proceed with a full crawl; otherwise we'll exit with a special status. Our update script will see this status and decide that no further summarization work is needed.
  • Add a deployment pass directly to our update pass: if we update, then we deploy. Currently these are separate (our update action runs at 12:30AM and our deploy at 2:30AM, to leave time for the update to finish).
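
To make the shallow (non --deep) check concrete, here is a rough sketch of one way "materially changed" could be decided: fingerprint Calendar.aspx after stripping per-request noise, then compare against the fingerprint from the previous run. Everything here is an assumption for illustration, not existing code in this repo: the names `fingerprint`, `shallow_check`, and `EXIT_UNCHANGED`, the exit status value 3, and the idea that ignoring the ASP.NET hidden form fields is enough to define "material".

```python
import hashlib
import re
from typing import Optional, Tuple

# Hypothetical exit status meaning "calendar unchanged; skip summarization".
EXIT_UNCHANGED = 3

# Assumption: ASP.NET pages carry hidden fields that can differ between
# requests, so we drop them before hashing to avoid false positives.
_VOLATILE = re.compile(
    r'<input[^>]*name="__(?:VIEWSTATE|VIEWSTATEGENERATOR|EVENTVALIDATION)"[^>]*>',
    re.IGNORECASE,
)


def fingerprint(html: str) -> str:
    """Return a SHA-256 hex digest of the page with volatile parts removed."""
    stripped = _VOLATILE.sub("", html)
    normalized = " ".join(stripped.split())  # collapse whitespace runs
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shallow_check(html: str, previous: Optional[str]) -> Tuple[int, str]:
    """Compare the page against the previous fingerprint.

    Returns (exit_status, current_fingerprint). The status is EXIT_UNCHANGED
    when nothing material changed, and 0 when a full crawl should follow
    (including the first run, when there is no previous fingerprint).
    """
    current = fingerprint(html)
    if previous is not None and current == previous:
        return EXIT_UNCHANGED, current
    return 0, current
```

With something like this, the update script could treat the special status as "nothing to do" and skip both summarization and deployment, while any other non-zero status would still be reported as a real failure.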
