onilton commented Jul 11, 2014

https://docs.python.org/2/library/urlparse.html#urlparse.urljoin provides a robust way to turn a relative URL into an absolute one.

This fixes some issues like this one:

When accessing this url:
http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/

We find relative links like this:
resultado_busca.html?letra=a

The browser (Chrome) builds the absolute URL like this:
http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/resultado_busca.html?letra=a

But crawley builds the URL like this:
http://www1.abracom.org.br/resultado_busca.html?letra=a

urljoin fixes the issue while keeping the correct behavior for root-relative links such as /relativeurl.html.

On a hypothetical page http://mydomain.com/my/web/page.html:

'/relativeurl.html' link should become 'http://mydomain.com/relativeurl.html'

and

'relativeurl.html' link should become 'http://mydomain.com/my/web/relativeurl.html'
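
For reference, here is a minimal sketch of the behavior described above, using only the standard library. The helper name `make_absolute` is hypothetical and is not crawley's actual code; it only wraps `urljoin` to show how the three cases resolve. The import falls back to the Python 2 `urlparse` module linked above when `urllib.parse` is not available.

```python
try:
    from urllib.parse import urljoin  # Python 3 location
except ImportError:
    from urlparse import urljoin  # Python 2 module linked above


def make_absolute(page_url, href):
    """Hypothetical helper: resolve a link against the page it was found on."""
    return urljoin(page_url, href)


# (page URL, link found on that page) pairs taken from the description above.
examples = [
    ("http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/",
     "resultado_busca.html?letra=a"),
    ("http://mydomain.com/my/web/page.html", "/relativeurl.html"),
    ("http://mydomain.com/my/web/page.html", "relativeurl.html"),
]

for page_url, href in examples:
    print(make_absolute(page_url, href))
```

The first case keeps the /cms/opencms/abracom/pt/associados/ directory because the page URL ends with a slash; the second resets to the site root because the link starts with a slash; the third replaces only the last path segment (page.html).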

onilton commented Nov 13, 2015

Any news on that? :D
