pc_align: improved sampling strategy, before/after stats reporting, and doc guidance on evaluating improvement #423

@dshean

Is your feature request related to a problem? Please describe.
pc_align returns a lot of output to stdout, including some key metrics for evaluation of the transformation quality. New users don't know how to interpret all of this, or how to evaluate whether the final transform actually improved the alignment between their input datasets. Many just run the tool, and proceed with analysis, even though sometimes the transformation made the alignment between their datasets worse.

There is some limited information on evaluation in the current doc, but we should offer improved guidelines or recommendations for evaluation of the results.

https://stereopipeline.readthedocs.io/en/latest/tools/pc_align.html#interpreting-the-transform
https://stereopipeline.readthedocs.io/en/latest/tools/pc_align.html#error-metrics-and-outliers

We can also use more sophisticated sampling approaches to validate the improvement of the transformation.

Describe the solution you'd like
pc_align should report statistics for the "improvement" beyond just reporting the initial and final residuals.

Input: error percentile of smallest errors (meters): 16%: 0.604849, 50%: 2.01722, 84%: 3.62022
Input: mean of smallest errors (meters): 25%: 0.442947, 50%: 0.92913, 75%: 1.4753, 100%: 2.09795

and

Output: error percentile of smallest errors (meters): 16%: 0.690319, 50%: 1.67165, 84%: 2.68878
Output: mean of smallest errors (meters): 25%: 0.519557, 50%: 0.974861, 75%: 1.30205, 100%: 1.75889

There should be final lines of output summarizing stats on the difference between input and output residuals, computed on a point-by-point basis, and perhaps differences between the summary statistics...

We typically look at the difference in the median (50%) "before" and "after" numbers, plus the difference in the spread of the distributions ("84% minus 16%" before and "84% minus 16%" after), to evaluate improvement. These two numbers could be used as primary stats for success/improvement. pc_align should compute and display the spread before and after.
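A rough sketch of the two stats described above, assuming the per-point residuals before and after alignment are available as arrays. The function names `summarize` and `improvement` are hypothetical and not part of pc_align:

```python
import numpy as np

def summarize(residuals):
    """Median and 16-84 percentile spread of point distance residuals (meters)."""
    p16, p50, p84 = np.percentile(np.asarray(residuals, dtype=float), [16, 50, 84])
    return {"median": float(p50), "spread": float(p84 - p16)}

def improvement(before, after):
    """Change in median and spread; negative values mean the alignment got tighter."""
    b, a = summarize(before), summarize(after)
    return {
        "median_change": a["median"] - b["median"],
        "spread_change": a["spread"] - b["spread"],
    }
```

For the example output above, the median goes from 2.01722 m to 1.67165 m (a change of -0.34557 m) and the spread goes from 3.015371 m to 1.998461 m, so both stats indicate improvement even though the 16% value got slightly worse.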

I recommend that we change the terms "error percentile of smallest errors (meters)" and "mean of smallest errors (meters)". I realize pc_align throws out 25%, which is why "smallest errors" is included in these terms, but I think we can be more descriptive. Really, we're talking about "point distance residuals", not necessarily "errors", as some of the residuals could be due to real changes in some parts of the surface (e.g., glacier melt, vegetation change).

I think we should report stats for the "inliers" used during the "calibration" as well as the full sample of difference values. I realize this is why two lines of output are provided, but I think we can improve how this is reported so it is easier for users to understand.

Personally, I would like to see a more sophisticated sampling approach that isolates random samples for calibration and validation. One way to do this would be to remove the initial 25% outliers, and then from the inliers, use a random subset (say, 80%) for the calibration and a random subset (say, 20%) for validation to independently check the result. Right now by default, we are using the same set of points for both calibration and validation, unless the user withholds samples before calling pc_align and then does their own validation independently of the tool.
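Something like the following is what I have in mind for the split. This is only a sketch: `split_inliers` is a hypothetical helper, it treats the points with the largest initial residuals as the "outliers", and the 25%/80%/20% fractions are just the example values from above. It does not reflect how pc_align handles outliers internally.

```python
import numpy as np

def split_inliers(residuals, outlier_frac=0.25, calib_frac=0.8, seed=0):
    """Return (calibration_idx, validation_idx) into the residuals array."""
    residuals = np.asarray(residuals, dtype=float)
    n_keep = int(round(residuals.size * (1.0 - outlier_frac)))
    # Indices of the n_keep points with the smallest residuals (the inliers).
    inliers = np.argsort(residuals)[:n_keep]
    # Shuffle the inliers so the two subsets are random and disjoint.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(inliers)
    n_calib = int(round(n_keep * calib_frac))
    return shuffled[:n_calib], shuffled[n_calib:]
```

The calibration indices would be used to solve for the transform, and the "after" stats would then be reported on the validation indices only, so the check is independent of the fit.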

We should at least include some newlines in pc_align output for improved readability, but I think it would be best to report these "improvement metrics" separately from (after) the main stdout stream (which includes runtimes, the transformation and other stuff that can be overwhelming for new users). Right now the numbers that matter are buried. Basically, make it easier for people to see the relevant information and determine whether things worked.

The documentation for pc_align should have a section dedicated to interpreting the stdout (beyond just describing the metrics and recommending people visualize the output). Right now there is only this...

As such, a way of judging the effectiveness of the tool is to look at the mean of the smallest 75% of the errors before and after alignment.

As mentioned above, this is not what we typically use. I am open to other suggestions here on the best stats to use. Really, it might be best to compute signed (rather than absolute) residuals along the local "down" direction (or normal to the ellipsoid), as the absolute errors can hide skewed or biased distributions. This could be done as a final step for reporting, after minimizing absolute distances.
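To illustrate the signed vs. absolute point, here is a minimal sketch assuming the signed vertical differences are already available as an array (`dz` and `signed_stats` are hypothetical names, and skewness is computed as the plain third standardized moment):

```python
import numpy as np

def signed_stats(dz):
    """Summary stats for signed residuals (meters), including skewness."""
    dz = np.asarray(dz, dtype=float)
    mean = dz.mean()
    std = dz.std()
    # Third standardized moment; zero for a symmetric distribution.
    skew = float(np.mean((dz - mean) ** 3) / std ** 3) if std > 0 else 0.0
    return {
        "median": float(np.median(dz)),
        "mean": float(mean),
        "std": float(std),
        "skewness": skew,
    }
```

For example, residuals of [-1, 1] and [1, 1] have the same median absolute value (1 m), but the signed medians are 0 m and 1 m, so only the signed stats reveal the residual bias in the second case.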

I think the doc should also mention how to review the observed translation magnitude and evaluate whether it is appropriate given the expected geolocation accuracy of the two inputs. For example, if aligning a WV DEM with expected horizontal/vertical geolocation accuracy of ~3-5 m CE90/LE90 and ICESat-2 points with expected horizontal/vertical geolocation accuracy of ~3/0.1 m, the combined translation magnitude should be <10 m. If the resulting magnitude is 200 m, then something went wrong, and the output should not be used for analysis.
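The check itself is trivial once the user has picked a threshold. A sketch, where `check_translation` is a hypothetical helper and the threshold (10 m in the example above) is something the user has to choose from the expected accuracy of their own inputs:

```python
import math

def check_translation(translation_m, max_expected_m):
    """Return (magnitude, plausible) for a translation vector given in meters."""
    magnitude = math.sqrt(sum(c * c for c in translation_m))
    return magnitude, magnitude <= max_expected_m

# Example with made-up translation vectors and the 10 m threshold from above.
print(check_translation((3.0, 4.0, 0.0), 10.0))
print(check_translation((200.0, 0.0, 0.0), 10.0))
```

The doc could describe this kind of check in words, and pc_align could optionally warn when the translation magnitude exceeds a user-supplied value.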

Describe alternatives you've considered
We currently do this type of evaluation with custom scripts that ingest the csv files and/or pc_align output log to compute/extract relevant numbers and plot with Python scripts. Seems much better to have pc_align report this directly.
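For reference, the log-scraping part of those scripts looks roughly like this. It is a sketch that assumes the stats lines have exactly the format quoted earlier in this issue; `parse_pc_align_stats` is a hypothetical name, and the regex would need updating if the wording of the output changes:

```python
import re

# Matches lines such as:
# "Input: error percentile of smallest errors (meters): 16%: 0.604849, 50%: ..."
LINE_RE = re.compile(
    r"(Input|Output): (error percentile|mean) of smallest errors \(meters\): (.*)$"
)
PAIR_RE = re.compile(r"(\d+)%: ([0-9.eE+-]+)")

def parse_pc_align_stats(log_text):
    """Map (stage, kind) to {percent: value in meters} for each stats line found."""
    stats = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line.strip())
        if not m:
            continue
        stage, kind, rest = m.groups()
        stats[(stage, kind)] = {int(p): float(v) for p, v in PAIR_RE.findall(rest)}
    return stats
```

From the returned dictionary, the before/after median and the 84% minus 16% spread can be computed directly, which is the kind of summary that would be better reported by pc_align itself.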

