
Duplication detection is prohibitively slow on large crawls #45

@rypptc

Description

When crawling a site with a large number of URLs, the process gets stuck in the duplication detection step and never reaches the completed status.

What happens

  • The crawl itself finishes normally ("No more URLs to crawl")
  • The process then starts "Running duplication detection..."
  • It hangs at this stage and the UI stays in the running state
  • The user has to stop the process manually to see any results

Cause

detect_duplication_issues in issue_detector.py compares every page with every other page using SequenceMatcher.

This is an O(n²) operation: with n pages it performs n(n−1)/2 comparisons, so on large crawls that means millions of SequenceMatcher calls, which can take hours.
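A minimal sketch of the pattern described above. This is not the actual code from issue_detector.py; the function signature, the `pages` structure, and the `threshold` value are assumptions for illustration:

```python
from difflib import SequenceMatcher
from itertools import combinations

def detect_duplicates(pages, threshold=0.9):
    """Pairwise comparison of page bodies: n(n-1)/2 SequenceMatcher calls.

    pages: list of (url, text) tuples (assumed shape, for illustration).
    Each ratio() call is itself expensive on long texts, which is why
    this approach becomes prohibitively slow on large crawls.
    """
    duplicates = []
    # combinations(pages, 2) yields every unordered pair exactly once,
    # so the loop body runs n(n-1)/2 times: quadratic in the page count.
    for (url_a, text_a), (url_b, text_b) in combinations(pages, 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            duplicates.append((url_a, url_b, ratio))
    return duplicates
```

With 10,000 pages this loop already runs roughly 50 million times, consistent with the hang reported above.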

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests