Before cutting to the chase, some background is necessary going forward. If the words: gopher, SSRF, libcurl ring any bells, then feel free to skip the first section.

Background

Server-Side Request Forgery - SSRF

SSRF is a security vulnerability where an adversary can abuse a server’s functionality in order to access internal services and subsequently update/read resources or leak information. Usually, the adversary provides (directly or indirectly) a URL in order for the server to fetch information from.

In order to make this more concrete, consider for example a social network that users can set their profile photo to an image that the server will fetch from a user-supplied URL. The server will most likely use some kind of HTTP client in order to fetch the image and store it. For the sake of simplicity assume that the user has some kind of feedback about the “fetching” that’s going on, on the backend. Now, if the user supplies the following URL:

http://localhost:22

A TCP connection will be opened on the server’s localhost interface and if that succeeds, and HTTP request will be send through it. Since port 22 is usually used for SSH, this will throw some kind of error back to the user since the SSH handshake won’t be successful. Then, if the user supplies the following URL:

http://localhost:8081

Assuming that no service is listening on port 8081, no TCP connection will be opened and so the HTTP client will through an error stating that it couldn’t successfully open a TCP connection.

From the information leaked above a malicious user can figure out which ports are open and potentially what service is running behind each port, based on the errors thrown back (or in other cases the time it took to respond).

Although this might seem somewhat harmless, there are scenarios in which, this vulnerability can be exploited with far worse payloads as we’ll see below.

Curl - libcurl

cURL is a software project that provides us with a library, namely libcurl, and a command-line tool, curl, that’s based on libcurl for transferring data using various protocols. Bindings for libcurl exist in almost every language. Due to curl’s and libcurl’s popularity, the impact of a potential security vulnerability is vast and it affect many different projects based on them.

Gopher

Gopher according to wikipedia is:

An TCP/IP application layer protocol designed for distributing, searching, and retrieving documents over the internet.

In short, it was used even before browsers were a thing, in order to browse the web. For the purposes of this post, all we care about is that this protocol opens up a TCP connection and throws in it everything we provide it with, in the form of URL-encoded data in the URI. For instance,

gopher://example.org:8080/_insert%20data%20here

insert data here is going to be thrown into a TCP connection to example.org - port 8080

Exploiting gophers

Let’s assume that the social network referenced above handles user-provided URLs in order to fetch users’ profile photos. Also, for simplicity, assume that a redis instance is running on the same host as the backend of the application, on port 6379. It turns out that, either directly (C code) or indirectly (higher level language bindings for libcurl) libcurl is used to handle the “fetching” functionality.

The developers have made sure that no protocols other than http or https is used for security reasons. After some time in production, users complain that their photo was not uploaded correctly although the content is valid (jpg, below 2MB and so on) under the URL they provided. After some debugging, developers figure out that the issue is that redirects are not being followed by the “fetching” happening in the backend.

Developers look up libcurl’s option that allows for following redirects, and they stumble upon CURLOPT_FOLLOWLOCATION. Sure enough, they enable this option and all works out well for both the users and the developers.

What if a malicious user hosts a website that all it does is redirect to other protocols. Oh well, what can one do with this? Could gopher:// mentioned above be used? I would bet it cannot, no way such an obscure and old protocol that has not seen the light of day the past decade (or more) is allowed.

It turns out up until recently that was the case. As seen in libcurl’s source code the default allowed protocols are all protocols supported by libcurl except for:

  • file
  • scp
  • smb
  • smbs

All of the above have been blacklisted over the years due to security issues

Gopher is included in the allowed protocols. What this means is that in our example application, an adversary can host a website that all it does is redirect to the following URL:

gopher://localhost:6379/_FLUSHALL

What the backend of our service will do is handle the redirect, and happily follow the gopher:// scheme, open a TCP connection to the redis instance, and issue the FLUSHALL command. It goes without saying what the impact of this could be. For more info about exploitation for redis, refer to this.

Similarly, any other TCP-based protocol can be abused, text-based or binary (e.g. MySQL) to delete resources, update them, create new and so on. Another example is deleting elasticsearch indices:

gopher://elasticsearch.host:9200/_DELETE%20/some_index%20HTTP%2F1.0%0A

Patching libcurl

The solution to this for our example is to set libcurl’s CURLOPT_REDIR_PROTOCOLS option, and define the allowed redirect protocols there.

Although people have been starting to revive this protocol by hosting gopher sites lately, questions arise:

  • Do we really need such protocols in the general case?
  • How many of us actually use such protocols?
  • Should they be allowed by default by libcurl?

It turns out that curl developers agreed that this is not sane default behaviour, and that exotic protocols such as Gopher should be explicitly allowed in redirects. In this regard, after discussing it in the libcurl mailing list, I opened a PR that got merged a while ago only allowing HTTP, HTTPS and FTP for redirects, by default.


Shoutout to @apoikos for helping me throughout the process of getting my PR merged and for pushing me to actually open one!

Side note

The issue where a user can supply a URL containing internal hosts, e.g. localhost or IPs from the private IP range (10.0.0.0/8) is a quite difficult one to solve as seen here. This is due to RFC3986’s URI definitions being really complex and also covering many encodings. For instance: http://2130706433/ is http://127.0.0.1 in decimal notation.