
Use urljoin to fix relative urls #14

Open
onilton wants to merge 1 commit into jmg:master from onilton:urljoin


onilton commented Jul 11, 2014

https://docs.python.org/2/library/urlparse.html#urlparse.urljoin provides a robust way to turn a relative url into an absolute one.

This fixes some issues like this one:

When accessing this url:
http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/

We find relative links like this:
resultado_busca.html?letra=a

The browser (Chrome) builds the absolute url like this:
http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/resultado_busca.html?letra=a

But crawley builds the url like this:
http://www1.abracom.org.br/resultado_busca.html?letra=a

urljoin fixes the issue while keeping the right behavior for root-relative links like /relativeurl:

On a hypothetical page http://mydomain.com/my/web/page.html:

'/relativeurl.html' link should become 'http://mydomain.com/relativeurl.html'

and

'relativeurl.html' link should become 'http://mydomain.com/my/web/relativeurl.html'
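
The cases above can be checked with a minimal sketch like the one below. It is an illustration only, not the code in this commit: `make_absolute` is a hypothetical helper name, and the import fallback assumes you may be running either Python 3 (`urllib.parse`) or Python 2 (`urlparse`, as in the link above).

```python
try:
    from urllib.parse import urljoin  # Python 3 location
except ImportError:
    from urlparse import urljoin  # Python 2 location, as linked above


def make_absolute(page_url, href):
    """Hypothetical helper: resolve a link found on page_url to an absolute url."""
    return urljoin(page_url, href)


page = "http://mydomain.com/my/web/page.html"

# A link starting with "/" is resolved against the domain root.
root_relative = make_absolute(page, "/relativeurl.html")

# A link without a leading "/" is resolved against the directory of the page.
dir_relative = make_absolute(page, "relativeurl.html")

# The real-world case from this description: the base url ends with "/",
# so the link is resolved inside that directory.
abracom = make_absolute(
    "http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/",
    "resultado_busca.html?letra=a",
)

print(root_relative)
print(dir_relative)
print(abracom)
```

With this, the abracom link resolves to the same url that Chrome builds, and the two hypothetical links resolve as described above.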

onilton (Author) commented Nov 13, 2015

Any news on that? :D
