The vulnerability might be in the proof‑of‑concept
Seth Larson @ 2025-08-27
The Security Developer-in-Residence role at the Python Software Foundation is funded by Alpha-Omega. Thanks to Alpha-Omega for sponsoring security in the Python ecosystem.

I'm on the security team for multiple open source projects with ~medium levels of report volume. Over the years, you see patterns in how reporters try to have a report accepted as a vulnerability in the project.

One pattern that I see frequently is submitting proof-of-concept code that itself contains the vulnerability. However, the project code is also used, so the reporters try to convince you that the vulnerability is in the project code.

Here's a simplified version of reports that the Python Security Response Team sees fairly frequently:

user_controlled_value = "..."

# ...(some layers of indirection)

eval(user_controlled_value)  # RCE!!!

This clearly isn't a vulnerability in Python. Python is designed to execute code, so if you tell Python to execute code it will do so. But it can be less obvious when there's a more subtle vulnerability in the proof-of-concept. The example below filters user-controlled URLs and returns an HTTP response for acceptable URLs:

import urllib3
from urllib.parse import urlparse

def safe_url_opener(url):
    input_url = urlparse(url)
    input_scheme = input_url.scheme
    input_host = input_url.hostname

    block_schemes = ["file", "ftp"]
    block_hosts = ["evil.com"]
    if input_scheme in block_schemes:
        return None
    if input_host in block_hosts:
        return None

    return urllib3.request("GET", url)

The reporter claimed that there was a vulnerability in urlparse because the parser behaved differently than urllib3.request and thus an attacker would be able to circumvent the block list with a URL crafted to exploit these differences (“SSRF”).

Keep in mind that urlparse and urllib3 both implement RFC 3986, but for backwards compatibility urllib3 also accepts “scheme-less” URLs of the form “localhost:8080/” and handles them as “http://localhost:8080/”.
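To make the mismatch concrete, here is a minimal sketch using only the standard library. It reuses the block-list logic from the proof-of-concept; the helper name passes_block_list() and the input "evil.com:8080/" are illustrative and not taken from the original report. Under RFC 3986 grammar, urlparse reads the text before the first colon as the scheme, so the block-listed host never shows up in the hostname field:

```python
from urllib.parse import urlparse

# Same block lists as the proof-of-concept above.
BLOCK_SCHEMES = ["file", "ftp"]
BLOCK_HOSTS = ["evil.com"]

def passes_block_list(url):
    """Hypothetical helper: the filtering step of the PoC, without the request."""
    parsed = urlparse(url)
    if parsed.scheme in BLOCK_SCHEMES:
        return False
    if parsed.hostname in BLOCK_HOSTS:
        return False
    return True

parsed = urlparse("evil.com:8080/")
print(parsed.scheme)    # "evil.com" is read as the scheme
print(parsed.hostname)  # None, so the host check has nothing to match
print(passes_block_list("evil.com:8080/"))    # True: the filter lets it through
print(passes_block_list("http://evil.com/"))  # False: the obvious form is blocked
```

If the unparsed string is then handed to a client that treats it as a scheme-less URL, the request goes to the host the filter was supposed to block.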

I didn't agree with this reporter's determination. Instead, I asserted that the safe_url_opener() function contains the vulnerability: it checks the parsed components but then passes the original, unparsed URL to urllib3.request(). To prove this, I implemented a safe_url_opener() function that uses urlparse with urllib3 securely:

import urllib3
from urllib.parse import urlparse

def safe_url_opener(unsafe_url):
    safe_url = urlparse(unsafe_url)

    # Use an allow-list, not a block-list.
    allow_schemes = ["https"]
    allow_hosts = ["good.com"]
    if safe_url.scheme not in allow_schemes:
        return
    if safe_url.hostname not in allow_hosts:
        return

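    # Check the URL doesn't have components we don't expect.
    # urlparse() results expose username and password, not "auth".
    if (
        safe_url.username is not None
        or safe_url.password is not None
        or safe_url.port is not None
    ):
        return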

    # Use the safe parsed values, not the unsafe URL.
    pool = urllib3.HTTPSConnectionPool(
        host=safe_url.hostname,
        assert_hostname=safe_url.hostname,
    )
    target = safe_url.path or "/"
    if safe_url.query:
        target += f"?{safe_url.query}"
    return pool.request("GET", target)

The program above could be even more secure by parsing with urllib3.util.parse_url() instead of urlparse, so that the same parser both validates the URL and drives the request, completely removing the SSRF potential.
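As a rough sketch of that idea, assuming urllib3 is installed, the validation step might look like the following. The function name is_allowed() and the module-level allow-lists are hypothetical, and the sketch covers only the validation, not the request itself:

```python
from urllib3.exceptions import LocationParseError
from urllib3.util import parse_url

# Hypothetical allow-lists, mirroring the example above.
ALLOW_SCHEMES = {"https"}
ALLOW_HOSTS = {"good.com"}

def is_allowed(unsafe_url):
    """Return True only if urllib3's own parser sees an allowed URL."""
    try:
        url = parse_url(unsafe_url)
    except LocationParseError:
        # Anything urllib3 cannot parse is rejected outright.
        return False
    if url.scheme not in ALLOW_SCHEMES or url.host not in ALLOW_HOSTS:
        return False
    # Reject components we don't expect; Url exposes userinfo as "auth".
    if url.auth is not None or url.port is not None:
        return False
    return True

print(is_allowed("https://good.com/path?x=1"))
print(is_allowed("evil.com:8080/"))
print(is_allowed("https://user@good.com/"))
```

Because the allow-list decision is made on the fields that urllib3 itself extracted, there is no second parser whose interpretation could differ.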

This post is meant as a reminder to security teams and maintainers of open source projects that sometimes the vulnerability is in the proof-of-concept and not your own project's code. Having a security policy (e.g. “urlparse strictly implements RFC 3986 regardless of other implementation behaviors”) and threat model (e.g. “users must not combine with other URL parsers”) documented for public APIs means security reports can be treated consistently while minimizing stress and reducing repeated research into historical decisions around API design.
