Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lean on SSLContext to do HTTPS hostname verification #517

Closed
dstufft opened this issue Dec 1, 2014 · 16 comments · Fixed by #2178
Closed

Lean on SSLContext to do HTTPS hostname verification #517

dstufft opened this issue Dec 1, 2014 · 16 comments · Fixed by #2178
Labels
Milestone

Comments

@dstufft
Copy link
Contributor

dstufft commented Dec 1, 2014

In Python 3.4 and 2.7.9 there is a check_hostname attribute on SSLContext which will have the SSLContext instance handle checking the hostname inside of the do_handshake. I think it would be good for urllib3 to rely on this where possible instead of doing the check itself. I think this would go good with the other efforts to use SSLContext as the "bag of configuration" for TLS stuff.

This can be detected by determining if the SSLContext object has a check_hostname attribute or not.

There is one downside, this relies on passing the hostname as part of SSLContext().wrap_socket(server_name=). Originally this only worked if the OpenSSL had SNI enabled. However Python 3.4.3 and 2.7.9 will accept server_name even if SNI isn't enabled.

This would mean that the code a little ugly until you drop older versions of Python 3, it would look something like (paraphrased and super simplified to just be the ugly parts):

import ssl
import sys

# In reality these would use the SSLContext shim that urllib3 has in
# order to work on Python 2.6 too and 2.7 earlier than 

if ssl.HAS_SNI or sys.version_info[0] == 2 or sys.version_info[:3] >= (3, 4, 3):
    ctx = SSLContext(ssl.PROTOCOL_SSLv23)
    ctx.check_hostname = True
    sock = ctx.wrap_socket(sock, server_name="example.com")
else:
    ctx = SSLContext(ssl.PROTOCOL_SSLv23)
    sock = ctx.wrap_socket(sock)

    if not ssl.match_hostname(sock.getpeercert(), "example.com"):
        raise Error()

Maybe this is a bad idea? I think it would be great to lay the ground work for eventually moving the responsibility for checking hostnames into the stdlib, however you wouldn't actually be able to do that completely without the ugly if statement until 3.4.3 was your minimum supports Python 3 version.

@shazow
Copy link
Member

shazow commented Dec 1, 2014

+1

@mjpieters
Copy link

Currently, it appears that because urllib3 doesn't use this, an exception is raised.

I haven't had a chance to isolate this yet, but a pip install traceback from this Stack Overflow question using Python 2.7.9 implicates urllib3's use of wrap_socket():

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/basecommand.py", line 232, in main
    status = self.run(options, args)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/commands/install.py", line 339, in run
    requirement_set.prepare_files(finder)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/req/req_set.py", line 333, in prepare_files
    upgrade=self.upgrade,
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/index.py", line 305, in find_requirement
    page = self._get_page(main_index_url, req)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/index.py", line 783, in _get_page
    return HTMLPage.get_page(link, req, session=self.session)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/index.py", line 872, in get_page
    "Cache-Control": "max-age=600",
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/sessions.py", line 473, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/download.py", line 365, in request
    return super(PipSession, self).request(method, url, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/sessions.py", line 461, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/cachecontrol/adapter.py", line 43, in send
    resp = super(CacheControlAdapter, self).send(request, **kw)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/adapters.py", line 370, in send
    timeout=timeout
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/packages/urllib3/connectionpool.py", line 518, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/packages/urllib3/connectionpool.py", line 322, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/packages/urllib3/connectionpool.py", line 727, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/packages/urllib3/connection.py", line 238, in connect
    ssl_version=resolved_ssl_version)
  File "/usr/local/lib/python2.7/site-packages/pip-6.0.7-py2.7.egg/pip/_vendor/requests/packages/urllib3/util/ssl_.py", line 254, in ssl_wrap_socket
    return context.wrap_socket(sock)
  File "/usr/local/lib/python2.7/ssl.py", line 350, in wrap_socket
    _context=self)
  File "/usr/local/lib/python2.7/ssl.py", line 537, in __init__
    raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname

@mjpieters
Copy link

Followup: This is pip 6.0.7 (obvious from the traceback), which uses requests 2.5.1 (see pypa/pip@ec51b69) which in turn uses a urllib3 checkout with SHA @a27758625e4169330fcf965652b1093faf5aaaa2 (see kennethreitz/requests@d2d576b).

Should a new ticket be opened? Or should this be tracked under a different issue?

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Feb 4, 2015

It's a new ticket, and it's what we should have opened when I closed kennethreitz/requests#2435.

@nartes
Copy link

nartes commented Jan 20, 2019

Looks like the problem is:

  1. ssl_context supports check_hostname to be set to False. As a result match_hostname from ssl module will not be called upon a handshake stage.
  2. But urllib3 has built in a repetitive logic, which relies on its own parameters to control hostname checking.
  3. The behaviour is usefull when we are not relying on hostname verification, but only ensure the certified key matches a public key used by the other side for encryption.
  4. To make a workaround, developer have added fingerprint parameter for PoolManager. This way
    https://github.com/urllib3/urllib3/blob/master/src/urllib3/connection.py#L346
        if self.assert_fingerprint:
            assert_fingerprint(self.sock.getpeercert(binary_form=True),
                               self.assert_fingerprint)
        elif context.verify_mode != ssl.CERT_NONE \
                and not getattr(context, 'check_hostname', False) \
                and self.assert_hostname is not False:
            # While urllib3 attempts to always turn off hostname matching from
            # the TLS library, this cannot always be done. So we check whether
            # the TLS Library still thinks it's matching hostnames.
            cert = self.sock.getpeercert()
            if not cert.get('subjectAltName', ()):
                warnings.warn((
                    'Certificate for {0} has no `subjectAltName`, falling back to check for a '
                    '`commonName` for now. This feature is being removed by major browsers and '
                    'deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 '
                    'for details.)'.format(hostname)),
                    SubjectAltNameWarning
                )
            _match_hostname(cert, self.assert_hostname or server_hostname)

        self.is_verified = (
            context.verify_mode == ssl.CERT_REQUIRED or
            self.assert_fingerprint is not None
        )

no DO REPEAT YOURSELF logic is going to be called.
But fingerprint verification and certificate matching with a public key are not equal things. Since fingerprint is a shorter version of a full public key.
5. So urllib3 provides an abstraction layer which introduce lots of unnecessary clutter, yet doesn't stand for clear, open cryptography standards. That ensure people know and can control what's going on inside.
6. So we are left with the following clumsy hackaround to be built:

class CustomAdapter(requests.adapters.HTTPAdapter):                                                                                                                                                                                                                                                                                               
    def __init__(self, poolmanager_server_hostname, *args, **kwargs):                                                                                                                                                                                                                                                                             
        self.poolmanager_server_hostname = poolmanager_server_hostname                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                  
        requests.adapters.HTTPAdapter.__init__(self, *args, **kwargs)                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                  
    def init_poolmanager(self, *args, **kwargs):                                                                                                                                                                                                                                                                                                  
        self.poolmanager = requests.adapters.PoolManager(                                                                                                                                                                                                                                                                                         
            *args,                                                                                                                                                                                                                                                                                                                                
            server_hostname=self.poolmanager_server_hostname,                                                                                                                                                                                                                                                                                     
            **kwargs)         
s = requests.Session()
prefix = 'https://<ip_addr>'
server_certificate_hostname = '<some_obfuscated_hostname_inside_self_signed_certificate>'
s.mount(prefix, CustomAdapter(poolmanager_server_hostname=server_certificate_hostname))
with s.request(<parameters>)
<etc>

@sethmlarson
Copy link
Member

sethmlarson commented Jan 20, 2019

Maybe I'm not understanding your comment but I wanted to make clear that we do hostname verification of certs, not just fingerprinting. When an HTTPSConnection has assert_hostname set to a non-False and the SSLContext object that we've constructed from create_urllib3_context() has check_hostname set to False (meaning we've successfully told the SSLContext object that we're handling hostname checking) then our own hostname checking fires within _match_hostname. You can find the implementation within urllib3.packages.ssl_match_hostname._implementation. It'll look familiar if you look at how CPython handles hostname checking.

I can't speak for requests but if you want to accomplish what I believe you want to do with just urllib3 you can do this:

import urllib3
pool = urllib3.PoolManager(
    cert_reqs='REQUIRED',
    assert_hostname=server_certificate_hostname,
    # If you want SNI to emit this hostname as well you need this too:
    # server_hostname=server_certificate_hostname
)
r = pool.request('GET', 'https://<ip address>')

If you have more questions about how we do hostname verification, SNI, or TLS I can answer them.

@nartes
Copy link

nartes commented Jan 21, 2019

@sethmlarson, yeah seems like assert_hostname is what needed!

  1. Is it uncommon practice to ignore hostname verification for self-signed certificates?
    I though it is unnecessary, since relying on DNS resolution, or doing IP pinning, won't give more security.
  2. It is still like:
import requests
class CustomAdapter(requests.adapters.HTTPAdapter):                                                                                                                                                                                                                                                                                               
    def __init__(self, *args, assert_hostname=None, **kwargs):                                                                                                                                                                                                                                                                             
        self.assert_hostname = assert_hostname                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
        requests.adapters.HTTPAdapter.__init__(self, *args, **kwargs)                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                  
    def init_poolmanager(self, *args, **kwargs):                                                                                                                                                                                                                                                                                                  
        self.poolmanager = requests.adapters.PoolManager(                                                                                                                                                                                                                                                                                         
            *args,                                                                                                                                                                                                                                                                                                                                
            assert_hostname=self.assert_hostname,                                                                                                                                                                                                                                                                                     
            **kwargs)         
s = requests.Session()
prefix = 'https://<ip_addr>'
s.mount(prefix, CustomAdapter(assert_hostname=False))
with s.request(<parameters>) as resp:
    <etc>
  1. Probably it's better to switch to urllib3 only. I'm looking at the problem from the requests library point of view. They mention almost nothing in their documentation regarding PoolManager options. Just a brisk example.
  • proxy configuration parameters in requests are quite simple
  • although requests doesn't integrate with async, and I need to run thread_pool_executor
  • ssl configuration parameters are unnecessarily simplified in requests, urllib3 has them much closer to the ones in openssl, curl implementations.

@sethmlarson
Copy link
Member

  1. I'm not sure! I'd say that keeping things in development environments as close to production environments as possible is preferable. So if you do hostname verification in production you should strive to do it in development too.

2+3. This is up to you, you've got your adapter written already and now you can just use and extend it if you want. urllib3 is intended to be a low-level interface and requests is a higher-level interface.

@ofek
Copy link
Contributor

ofek commented Mar 26, 2020

Any progress on leveraging stdlib's logic?

@sethmlarson
Copy link
Member

I don't think so, this issue is up for grabs still :)

@sethmlarson
Copy link
Member

sethmlarson commented Dec 9, 2020

Couple of things that came up during my initial work-through of this change:

  • (wrt remove common name support Feature: Remove support for common names in certificates #497) hostname_checks_common_name isn't available in Python 3.6 so we can't disable
    via SSLContext. Do we keep our custom match_hostname() verification code and simply change it to not check commonName for Python 3.6 until Python 3.7 is the minimum supported version?

  • assert_hostname versus server_hostname, currently we allow sending an SNI X and then verifying
    certificate matches hostname Y via server_hostname=X, assert_hostname=Y. I wonder how frequent this
    arrangement is used? To continue supporting this may need to keep our custom match_hostname.
    I've been kind-of bugged by the whole assert_hostname/server_hostname option duality in the past.

@pquentin
Copy link
Member

pquentin commented Dec 9, 2020

  • I'm not clear on the details, but is it possible to use hostname_checks_common_name in Python 3.7+ and only use our custom match_hostname() verification code for Python 3.6? Then the code will be easy to remove when we drop 3.6 support in 12 months.
  • This is definitely something I struggle with too. I think we should continue supporting this because there's no good reason to remove it, but we should probably improve the documentation. https://urllib3.readthedocs.io/en/latest/advanced-usage.html#custom-sni-hostname only gives an example where we use the same value twice.

(The way I'm seeing this is that removing match_hostname() should not be the goal. The goal should be to rely on the standard library whenever possible.)

@pquentin
Copy link
Member

An additional complication is that ssl.match_hostname() has been deprecated in Python 3.7+.

@pquentin
Copy link
Member

pquentin commented Mar 8, 2021

So there are three things that urllib3 does that are not possible by setting context.check_hostname to True:

  • alternative hostnames with assert_hostname (with or without SNI)
  • fingerprint identification
  • stop checking common name in Python 3.6

If we decide to keep the first two, as I think we should, then this is issue is not a breaking change and can always be handled later in v2, alongside #2111. (And it will be easier when we drop Python 3.6 support.)

@pquentin
Copy link
Member

Talked with Seth and he's worried that doing this later breaks weird use cases that call our internal but documented functions. So I'll work on this as part of 2.0, not 2.x.

Another complication: pyOpenSSL only learned recently how to check hostnames in a simple way. But the release that includes that change also drops support for OpenSSL 1.0.2. So we're also going to check hostnames manually with pyOpenSSL. And when we'll drop support for OpenSSL 1.0.2, we'll also require pyOpenSSL >= 20.0.0.

@pquentin
Copy link
Member

An additional complication is that ssl.match_hostname() has been deprecated in Python 3.7+.

Actually, this is fine: we need to maintain our own implementation anyway to remove commonName support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants