URLs replaced in inappropriate contexts (i.e. inside <a href="">...) #4

@carljm

This is similar/related to #3, but it's a broader issue, not specific to Wikipedia.

The replacement is not context-sensitive, so we've had cases where a link to a Flickr photo (which was meant to stay a plain link) got replaced with totally invalid HTML:

<a href="http://www.flickr.com/photos/gruber/4309828383">something</a>

gets turned into:

<a href="<img src="http://farm3.static.flickr.com/2690/4309828383_6cc07082f6_m.jpg" alt="Jobs Listens to Mossberg\'s Ideas About What\'s Wrong With the iPad"></img>">something</a>
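For anyone who wants to reproduce this without the library, here is a minimal sketch of the failure mode. The pattern `FLICKR_URL` and the helper `fake_embed` are hypothetical stand-ins I made up for illustration (not the library's actual regex or API); the point is only that a bare regex substitution cannot tell an attribute value from body text.

```python
import re

# Hypothetical stand-in for a provider URL pattern (an assumption, not the
# library's real regex).
FLICKR_URL = re.compile(r"http://www\.flickr\.com/photos/[\w-]+/\d+")


def fake_embed(url):
    # Hypothetical: a real consumer would fetch embed HTML from the provider.
    return '<img src="EMBED-FOR-%s">' % url


def naive_replace(text):
    # Substitutes every match, wherever it occurs in the input.
    return FLICKR_URL.sub(lambda m: fake_embed(m.group(0)), text)


html = '<a href="http://www.flickr.com/photos/gruber/4309828383">something</a>'
broken = naive_replace(html)
# The URL inside href="" is replaced too, so an <img> tag ends up nested
# inside the attribute value.
print(broken)
```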

I realize that, given the way OEmbed uses regexes, this is a tough nut to crack in the general case. Is the only real solution never to run OEmbed on chunks of text that might already contain HTML?

Apart from the heavyweight options that don't seem realistic (parsing the text into a DOM tree and only running OEmbed on the text nodes?), one simple "80%" fix would be to require at least one whitespace character on each side of the URL. Technically a link could have href=" http://..." but that's pretty unlikely, so I think this would improve the situation quite a bit.
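To make the "80%" idea concrete, here is a rough sketch of what I have in mind. Everything named here (`BOUNDED_URL`, `fake_embed`, `bounded_replace`) is hypothetical and not taken from the library. One assumption beyond what I described above: the sketch also accepts the start or end of the text as a boundary, so a URL on a line by itself still gets embedded.

```python
import re

# Hypothetical provider pattern wrapped in whitespace boundaries:
#   (?<!\S)  the URL is preceded by whitespace or the start of the text
#   (?!\S)   the URL is followed by whitespace or the end of the text
BOUNDED_URL = re.compile(
    r"(?<!\S)(http://www\.flickr\.com/photos/[\w-]+/\d+)(?!\S)"
)


def fake_embed(url):
    # Hypothetical: a real consumer would fetch embed HTML from the provider.
    return '<img src="EMBED-FOR-%s">' % url


def bounded_replace(text):
    # Only URLs standing alone between whitespace are substituted.
    return BOUNDED_URL.sub(lambda m: fake_embed(m.group(1)), text)


url = "http://www.flickr.com/photos/gruber/4309828383"
in_attr = '<a href="%s">something</a>' % url
in_text = "Look at %s for details" % url

print(bounded_replace(in_attr))  # left untouched: the URL is wrapped in quotes
print(bounded_replace(in_text))  # the bare URL is replaced
```

The trade-off is that some legitimate URLs are now skipped: one followed directly by punctuation, or one butted up against a tag such as `<p>http://...</p>`, no longer matches. And the href=" http://..." case would still break, as noted above.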

Would a working patch like that be considered, or is this just a case of "don't do that"?
