[RFC] "compact" deprecation lists #75

@davidstosik

What

Currently, the YAML files produced by deprecation toolkit can contain hundreds of identical deprecations produced by a single test. This is rooted in the Collector's implementation, which records every occurrence of a deprecation individually.

In this RFC I'd like to suggest a more compact format.

Why

I find the volume of these deprecation files daunting, and because deprecations are not sorted, it can be difficult to evaluate whether a new deprecation is completely new within a test, or "one more" in a long list of deprecations that were already produced by that test.

The same volume also adds noise to pull requests: it unnecessarily inflates the diff line count (which might at least have a psychological effect) and can make a code diff hard to scroll through.

How

My proposal is to switch to a format where, instead of repeating identical deprecations, we count them. The format could look something like this:

---
test_my_test_title:
- deprecation: 'DEPRECATION WARNING: Don''t use bla bla'
  count: 1
- deprecation: 'DEPRECATION WARNING: Don''t do this'
  count: 87
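
To make the mapping between the two formats concrete, here is a minimal sketch of compacting a flat list into counted entries and expanding it back. The `compact` and `expand` helpers are hypothetical names used only for illustration, not part of deprecation toolkit, and `Enumerable#tally` assumes Ruby 2.7 or later.

```ruby
require "yaml"

# Hypothetical helper: turns a flat list of messages into counted entries.
def compact(deprecations)
  deprecations.tally.sort.map do |message, count|
    { "deprecation" => message, "count" => count }
  end
end

# Hypothetical helper: restores the flat list from counted entries.
def expand(entries)
  entries.flat_map { |entry| [entry["deprecation"]] * entry["count"] }
end

flat = ["DEPRECATION WARNING: Don't do this"] * 87 +
       ["DEPRECATION WARNING: Don't use bla bla"]

compacted = compact(flat)
puts YAML.dump("test_my_test_title" => compacted)
```

Sorting the tallied pairs keeps the output stable between runs, which should help keep diffs small. Note that the round trip preserves counts but not the original order of occurrences.
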

I quickly tried writing a patch in an existing app, and it looked like this:

require "deprecation_toolkit/read_write_helper"

# I limited my changes to the YAML read/write logic, but this could be more efficient.

module DeprecationToolkit
  module ReadWriteHelper
    def read(test)
      deprecation_file = Bundler.root.join(recorded_deprecations_path(test))
      data = YAML.load(deprecation_file.read).fetch(test_name(test), [])
      if data.first.is_a?(Hash)
        data.flat_map do |hash|
          [hash["deprecation"]] * hash["count"]
        end
      else # for backwards compatibility
        data
      end
    rescue Errno::ENOENT
      []
    end

    def write(deprecation_file, deprecations_to_record)
      create_deprecation_file(deprecation_file) unless deprecation_file.exist?

      content = YAML.load_file(deprecation_file)

      deprecations_to_record.each do |test, deprecations|
        if deprecations.any?
          content[test] = deprecations.tally.sort.map do |deprecation, count|
            {
              "deprecation" => deprecation,
              "count" => count,
            }
          end
        else
          content.delete(test)
        end
      end

      if content.any?
        deprecation_file.write(YAML.dump(content))
      else
        deprecation_file.delete
      end
    end
  end
end
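
With occurrences tallied, the question raised under "Why" (is a deprecation completely new within a test, or "one more" of an existing one?) becomes a simple comparison of counts. The sketch below is only an illustration: `classify_changes` is a hypothetical helper, not something deprecation toolkit provides, and it works on the flat lists that `read` returns.

```ruby
# Hypothetical helper, not part of deprecation toolkit.
# Compares the recorded deprecations of a test with the ones collected in
# the current run, and separates brand-new messages from count changes.
def classify_changes(recorded, collected)
  before = recorded.tally
  after = collected.tally

  count_changes = {}
  (before.keys & after.keys).sort.each do |message|
    delta = after[message] - before[message]
    count_changes[message] = delta unless delta.zero?
  end

  {
    new: (after.keys - before.keys).sort,
    removed: (before.keys - after.keys).sort,
    count_changes: count_changes,
  }
end

recorded = ["DEPRECATION WARNING: Don't do this"] * 87
collected = recorded + [
  "DEPRECATION WARNING: Don't do this",
  "DEPRECATION WARNING: Don't use bla bla",
]

p classify_changes(recorded, collected)
```
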

Anything else

I guess this is really not a big deal, but somehow I found myself annoyed by these huge lists and thought I'd spend a bit of time understanding and thinking about a possible improvement.

I noticed that the current logic reads and rewrites a given test file's deprecation list YAML after each test in the file is completed. In big test suites where single test files contain a lot of tests and lots of repeated deprecations, the compact format could slightly reduce file read/write time, since there is less to read and write every time.
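
As a rough way to gauge the difference, the sketch below compares the size of the YAML generated for one test in both formats. It only measures bytes and lines as a proxy; it is not a benchmark of actual read/write time, and the test name and message are the placeholder values from the example above.

```ruby
require "yaml"

message = "DEPRECATION WARNING: Don't do this"

# Current format: one list item per occurrence.
flat = { "test_my_test_title" => Array.new(87) { message.dup } }

# Proposed format: one entry per distinct message, with a count.
compact = {
  "test_my_test_title" => [{ "deprecation" => message, "count" => 87 }],
}

flat_yaml = YAML.dump(flat)
compact_yaml = YAML.dump(compact)

puts "flat:    #{flat_yaml.bytesize} bytes, #{flat_yaml.lines.count} lines"
puts "compact: #{compact_yaml.bytesize} bytes, #{compact_yaml.lines.count} lines"
```
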

I also wrote a script that could update all deprecation list files to the new format; it's rather trivial:

require "yaml"

Dir["test/deprecations/**/*.yml"].each do |file|
  compacted = YAML.load_file(file).transform_values do |deprecations|
    deprecations.tally.sort.map do |deprecation, count|
      {
        "deprecation" => deprecation,
        "count" => count,
      }
    end
  end
  File.write(file, YAML.dump(compacted))
end
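
One caveat: as written, the script is not safe to run twice on the same file, because the second pass would tally entries that are already hashes. A possible guard, sketched here with hypothetical helper names (`compact_entry`, `compact_contents`) and operating on the parsed YAML rather than on files, is to leave already-compacted entries untouched.

```ruby
# Hypothetical helper: compacts one test's deprecation list, unless it
# already uses the counted format.
def compact_entry(deprecations)
  return deprecations if deprecations.first.is_a?(Hash)

  deprecations.tally.sort.map do |deprecation, count|
    { "deprecation" => deprecation, "count" => count }
  end
end

# Hypothetical helper: applies compact_entry to every test in a parsed file.
def compact_contents(contents)
  contents.transform_values { |deprecations| compact_entry(deprecations) }
end

contents = {
  "test_my_test_title" => ["DEPRECATION WARNING: Don't do this"] * 3,
}

once = compact_contents(contents)
twice = compact_contents(once)
puts once == twice
```
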

I haven't run the test suite on this change, so it is possible it needs more work to prevent breaking anything.

Before putting more work in, I'd like to hear opinions. 🙏🏻
