Bump nutch from 1.11 to 1.18 #2

Open
dependabot[bot] wants to merge 1 commit into general_scraper from dependabot/maven/org.apache.nutch-nutch-1.18

dependabot bot commented on behalf of github on Mar 21, 2022

Bumps nutch from 1.11 to 1.18.

Changelog

Sourced from nutch's changelog.

Nutch Change Log

Nutch 1.18 Release 14/01/2021 (dd/mm/yyyy) Release Report: https://s.apache.org/lqara

Breaking Changes

- As part of NUTCH-2805, the plugin urlfilter-domainblacklist has been renamed to urlfilter-domaindenylist, and the plugin's properties urlfilter.domainblacklist.rules and urlfilter.domainblacklist.file have been replaced with urlfilter.domaindenylist.rules and urlfilter.domaindenylist.file, respectively. See NUTCH-2802 for more details.
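
For anyone whose crawl configuration still references the old plugin, the rename above amounts to three string substitutions. The following is a minimal sketch, not part of Nutch or this PR: the entries in `RENAMES` come from the note above, while `migrate_config_text`, `SAMPLE`, and the file name `my-denylist.txt` are hypothetical names chosen for illustration. It assumes the old identifiers appear literally in the configuration text, for example in `plugin.includes` in `nutch-site.xml`.

```python
# Old -> new identifiers, taken from the Nutch 1.18 breaking-change note.
RENAMES = {
    "urlfilter-domainblacklist": "urlfilter-domaindenylist",  # plugin id
    "urlfilter.domainblacklist.rules": "urlfilter.domaindenylist.rules",
    "urlfilter.domainblacklist.file": "urlfilter.domaindenylist.file",
}


def migrate_config_text(text: str) -> str:
    """Return `text` with every old identifier replaced by its new name."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


# Hypothetical configuration fragment, used only to exercise the function.
SAMPLE = """\
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-domainblacklist|parse-html</value>
</property>
<property>
  <name>urlfilter.domainblacklist.file</name>
  <value>my-denylist.txt</value>
</property>
"""

MIGRATED = migrate_config_text(SAMPLE)

if __name__ == "__main__":
    print(MIGRATED)
```

A literal replacement will miss plugin ids that are split across a regular-expression group such as `urlfilter-(regex|domainblacklist)`, so `plugin.includes` is worth reviewing by hand afterwards.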

Sub-task

[NUTCH-2671] - Upgrade ant ivy library
[NUTCH-2672] - Ant build erroneously installs *-test.jar instead of *.jar for target "nightly"
[NUTCH-2805] - Rename plugin urlfilter-domainblacklist
[NUTCH-2809] - Upgrade any23 plugin dependency to 2.4
[NUTCH-2816] - Add Spotbugs target to ant build
[NUTCH-2817] - Avoid check for equality of URL path and file part using ==/!=
[NUTCH-2829] - Fix ant target "clean-cache"

Bug

[NUTCH-2669] - Reliable solution for javax.ws packaging.type
[NUTCH-2697] - Upgrade Ivy to fix the issue of an unset packaging.type property
[NUTCH-2801] - RobotsRulesParser command-line checker to use http.robots.agents as fall-back
[NUTCH-2810] - FreeGenerator to actually apply configured number of fetch lists
[NUTCH-2813] - MoreIndexingFilter - can't parse erroneous date - 2019-07-03T10:28:14
[NUTCH-2814] - HttpDateFormat's internal time zone may change after parsing a date
[NUTCH-2818] - Ant build: upgrade Apache Rat report task
[NUTCH-2823] - IllegalStateException in IndexWriters.describe() when validating url param for SolrIndexer
[NUTCH-2824] - urlnormalizer-basic to unescape percent-encoded host names

Improvement

[NUTCH-1190] - MoreIndexingFilter refactor: move data formats used to parse "lastModified" to a config file.
[NUTCH-2582] - Set pool size of XML SAX parsers used for MIME detection in Tika 1.19
[NUTCH-2730] - SitemapProcessor to treat sitemap URLs as Set instead of List
[NUTCH-2782] - protocol-http / lib-http: support TLSv1.3
[NUTCH-2796] - Upgrade to crawler-commons 1.1
[NUTCH-2799] - Add .asf.yaml file
[NUTCH-2833] - Upgrade to Tika 1.25
[NUTCH-2835] - Upgrade commons-jexl from 2 --> 3
[NUTCH-2836] - Upgrade various commons dependencies
[NUTCH-2837] - Update multiple dependencies
[NUTCH-2841] - Upgrade xercesImpl dependency

Wish

[NUTCH-2834] - Deduplication mode via command line in crawl script

Task

... (truncated)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language
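
The commands above are ordinary PR comments, so they can also be posted through GitHub's REST API endpoint for issue comments. The sketch below only builds the request and does not send it. `build_dependabot_comment_request`, `example-owner`, `example-repo`, and the token placeholder are hypothetical, and it assumes the token belongs to a user whose comments Dependabot will act on.

```python
import json
import urllib.request


def build_dependabot_comment_request(owner, repo, pr_number, command, token):
    """Build (but do not send) a POST request that comments '@dependabot <command>'."""
    # Pull requests share the issue comment endpoint in the GitHub REST API.
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": f"@dependabot {command}"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Owner, repo, and token are placeholders; 2 is this PR's number.
request = build_dependabot_comment_request(
    "example-owner", "example-repo", 2, "rebase", "<token>"
)

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would post the comment; that step is left out here so the sketch runs without network access or credentials.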

You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [nutch](https://github.com/apache/nutch) from 1.11 to 1.18.
- [Release notes](https://github.com/apache/nutch/releases)
- [Changelog](https://github.com/apache/nutch/blob/master/CHANGES.txt)
- [Commits](https://github.com/apache/nutch/commits/release-1.18)

---
updated-dependencies:
- dependency-name: org.apache.nutch:nutch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies label (Pull requests that update a dependency file) on Mar 21, 2022