Friday, June 7, 2013

WebRTC: Security and Confidentiality

One of the interesting aspects of WebRTC is that it has encryption baked right into it; there's actually no way to send unencrypted media using a WebRTC implementation. The specifications under development currently use DTLS-SRTP keying[1], and that's what both Chrome and Firefox implement. The general idea is that a Diffie-Hellman key exchange takes place in the media channel itself, without the web site (or even the application's JavaScript) being involved at all.
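As a rough illustration of why the web site never needs to see the key, here is a toy finite-field Diffie-Hellman exchange in TypeScript. It is a sketch only: the textbook-sized parameters (p = 23, g = 5) and the private values are made up for readability and are trivially breakable, and real DTLS handshakes use much larger groups or elliptic curves inside an authenticated protocol. What it shows is that only the public values travel between the endpoints, while each side derives the same secret locally.

```typescript
// Toy finite-field Diffie-Hellman with textbook-sized numbers.
// NOT secure: real DTLS handshakes use far larger groups or elliptic curves.

// Modular exponentiation by repeated squaring. Plain numbers are safe here
// because every intermediate product stays far below 2^53.
function modPow(base: number, exp: number, mod: number): number {
  let result = 1;
  let b = base % mod;
  let e = exp;
  while (e > 0) {
    if (e % 2 === 1) {
      result = (result * b) % mod;
    }
    b = (b * b) % mod;
    e = Math.floor(e / 2);
  }
  return result;
}

// Public parameters: anyone, including an eavesdropper, may know these.
const p = 23; // prime modulus
const g = 5;  // generator

// Each endpoint picks a private value that never leaves the browser.
const alicePrivate = 4;
const bobPrivate = 3;

// Only these public values are exchanged.
const alicePublic = modPow(g, alicePrivate, p); // 4
const bobPublic = modPow(g, bobPrivate, p);     // 10

// Each side combines its own private value with the peer's public value
// and arrives at the same secret without ever transmitting it.
const aliceShared = modPow(bobPublic, alicePrivate, p);
const bobShared = modPow(alicePublic, bobPrivate, p);

console.log(aliceShared, bobShared); // 18 18
```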

But who are you talking to?

This is only part of the story, though. While encryption is one of the tools necessary to prevent eavesdropping by other parties, it is in no way sufficient. Unless you have some way to demonstrate that the other end of your encrypted connection is under the control of the person you're talking to, you could easily be sending your media to a server that can share your conversation with an arbitrary third party. One important tool to help with this problem is the WebRTC identity work currently underway in the W3C. This isn't ready in any implementation that I'm aware of yet, but it's definitely something that needs to happen before we consider WebRTC done.

The general idea behind the identity work is that, as part of key exchange, you also get enough information to prove, in a cryptographically verifiable way, that the party at the other end of the connection is who you think it is. Of course, there are still some tricky aspects to this (you have to, for example, trust Google not to sign off on someone other than me being ""[2]), but you can at least reduce the problem from trusting one party (the website hosting the WebRTC application) to trusting that two parties (the website and the identity provider) won't collude.
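One way to picture the check at the heart of the identity work: the identity provider vouches for a particular DTLS certificate fingerprint, and the browser accepts the identity only if that fingerprint matches the one in the session description (and the certificate actually used in the handshake). The TypeScript below is a minimal sketch under that assumption; the `VerifiedAssertion` shape, the helper names, and the identity strings are hypothetical, and verifying the identity provider's signature is deliberately left out. The fingerprint is shortened for readability; a real SHA-256 fingerprint is 32 colon-separated bytes.

```typescript
// Hypothetical shape of an identity assertion whose signature has ALREADY
// been checked with the identity provider (that step is not shown).
interface VerifiedAssertion {
  identity: string;    // who the IdP says is on the other end
  fingerprint: string; // DTLS certificate fingerprint the IdP vouched for
}

// Canonicalize "sha-256 AB:CD:..." so case differences don't cause mismatches.
function normalizeFingerprint(fp: string): string {
  const [alg, hex] = fp.trim().split(/\s+/);
  if (!alg || !hex) return "";
  return `${alg.toLowerCase()} ${hex.toUpperCase()}`;
}

// Return the first "a=fingerprint:" value in an SDP blob, or null.
function sdpFingerprint(sdp: string): string | null {
  const prefix = "a=fingerprint:";
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith(prefix)) {
      return normalizeFingerprint(line.slice(prefix.length));
    }
  }
  return null;
}

// Accept the asserted identity only if the fingerprint the IdP vouched for
// matches the one in the session description. A browser would also confirm
// that the DTLS handshake actually used a certificate with this fingerprint.
function peerIdentity(sdp: string, assertion: VerifiedAssertion): string | null {
  const fromSdp = sdpFingerprint(sdp);
  if (fromSdp === null || fromSdp === "") return null;
  return fromSdp === normalizeFingerprint(assertion.fingerprint)
    ? assertion.identity
    : null;
}

// Shortened, made-up fingerprint for readability.
const remoteSdp = ["v=0", "a=fingerprint:sha-256 AB:CD:EF:01"].join("\r\n");

console.log(peerIdentity(remoteSdp, {
  identity: "alice@idp.example",
  fingerprint: "SHA-256 ab:cd:ef:01",
})); // alice@idp.example
```

If the web site swapped in its own certificate, the fingerprint in the SDP would no longer match what the identity provider signed, and the identity would not be shown.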

The other tool necessary to ensure the confidentiality of the contents is making sure that the media isn’t being copied by the JavaScript itself and sent to an alternate destination. This isn’t part of any current specification, but we’re working on adding a standardized mechanism that will allow specific user media streams to be limited so that they can only be sent, over an encrypted channel, to a specified identity (and nowhere else).
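The restriction being worked on can be modeled as a send-time policy check that the browser, rather than page JavaScript, enforces. The TypeScript below is a hypothetical sketch: `RestrictedTrack`, `ConnectionState`, and `maySend` are illustrative names, not WebRTC APIs, and a real mechanism would also need to stop page script from reading the raw media in the first place.

```typescript
// Hypothetical model of a media track bound to one peer identity.
// None of these names are real WebRTC APIs.
interface RestrictedTrack {
  label: string;
  allowedPeer: string; // the only identity this media may be sent to
}

interface ConnectionState {
  encrypted: boolean;          // DTLS-SRTP keying completed
  verifiedPeer: string | null; // identity from a verified assertion, if any
}

// The browser, not page JavaScript, would run this check before sending
// any media from the track over the connection.
function maySend(track: RestrictedTrack, conn: ConnectionState): boolean {
  return conn.encrypted && conn.verifiedPeer === track.allowedPeer;
}

const camera: RestrictedTrack = { label: "camera", allowedPeer: "alice@idp.example" };

console.log(maySend(camera, { encrypted: true, verifiedPeer: "alice@idp.example" }));   // true
console.log(maySend(camera, { encrypted: true, verifiedPeer: null }));                  // false
console.log(maySend(camera, { encrypted: true, verifiedPeer: "mallory@idp.example" })); // false
```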

On top of this, web browser developers have a very difficult task in presenting this to users in a way that they can use. The nuances between (and implications of) "this is encrypted, but we can't prove who you're talking to" and "this is being encrypted and sent directly to (at least if you trust Google)" are very subtle. Conveying this to users is a thorny challenge, and one that's going to take time to get right.

And who knows who you are talking to?

Of course, none of this is perfect. The recent Verizon brouhaha is about a database of who is communicating with whom (known in the communications interception community as a "pen register"), not about actually listening in on phone calls. It uses telephone numbers as identifiers, which are pretty easy to correlate to an owner. WebRTC can't prevent this kind of information from being collected; the same kind of database would simply use IP addresses where Verizon's uses phone numbers. IP addresses aren't much harder to correlate to people than phone numbers are, as numerous MPAA and RIAA lawsuits have demonstrated.

Even with a good encryption story, WebRTC has no built-in defenses against collecting this kind of information. Anyone with access to the session description can see the IP addresses of both parties to the conversation; and, of course, the website knows where the HTTP requests came from. Beyond that, your ISP (and every backbone provider between you and the other end of the call) can easily see which IP addresses you're sending data to, and picking media streams out of that traffic (even encrypted media streams) is a trivial exercise for the kinds of equipment ISPs and backbone providers deploy.
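To see how little effort this takes, the hypothetical helper below pulls every address out of a session description: the `c=` connection line and each ICE candidate line, including the `raddr` field that can reveal a host's local address. Field positions follow the SDP (RFC 4566) and ICE (RFC 5245) grammars; the addresses come from the documentation ranges, and hostnames are not handled. It is a sketch, not a complete SDP parser.

```typescript
// Collect the IP addresses an SDP blob exposes: the "c=" connection address
// and, for each ICE candidate, its address plus any "raddr" related address.
// Hostnames and other SDP subtleties are deliberately not handled.
function exposedAddresses(sdp: string): string[] {
  const found = new Set<string>();
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("c=")) {
      // c=<nettype> <addrtype> <connection-address>
      const parts = line.slice(2).trim().split(/\s+/);
      if (parts.length >= 3) found.add(parts[2]);
    } else if (line.startsWith("a=candidate:")) {
      // a=candidate:<foundation> <component> <transport> <priority>
      //             <address> <port> typ <type> [raddr <addr> rport <port>] ...
      const parts = line.slice("a=candidate:".length).trim().split(/\s+/);
      if (parts.length >= 6) found.add(parts[4]);
      const raddr = parts.indexOf("raddr");
      if (raddr !== -1 && raddr + 1 < parts.length) found.add(parts[raddr + 1]);
    }
  }
  return Array.from(found); // insertion order, duplicates removed
}

// Documentation-range addresses (RFC 5737) standing in for real ones.
const offer = [
  "v=0",
  "c=IN IP4 203.0.113.7",
  "a=candidate:1 1 udp 2122260223 192.0.2.10 54400 typ host",
  "a=candidate:2 1 udp 1686052607 203.0.113.7 54400 typ srflx raddr 192.0.2.10 rport 54400",
].join("\r\n");

console.log(exposedAddresses(offer)); // [ '203.0.113.7', '192.0.2.10' ]
```

Encryption of the media does nothing to hide any of these values, since they are needed in the clear to route the packets.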

The problem is that it's fundamentally difficult to mask who is talking to whom on a network. There are approaches, such as anonymizers and onion routers, that can make this harder to ascertain; but such approaches have their own weaknesses, and most simply shift trust around from one third party to another.

In summary, WebRTC is taking steps to allow the contents of communication to remain confidential, but it takes a concerted effort by application developers to bring the right tools together. The less tractable problem of masking who talks to whom is left out of scope.

[1] There's been recent talk in the IETF RTCWEB working group of adding Security Descriptions (SDES) as an alternate means of key exchange. SDES uses the signaling channel to send the media encryption keys from one end of the connection to the other, which necessarily allows the web site to access those keys. That means the site (or anyone it hands the keys to) could decrypt the media, if they have access to it. In terms of stopping some random hacker in the same hotel as you from listening in while you talk to your bank, it's still reasonably effective; in the context of programs like PRISM, or even the pervasive collection of personal data by major internet website operators, it's about as much protection as using tissue paper to stop a bullet.

[2] Whether you choose to do so mostly comes down to whether you trust this blog entry more than this slide.
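The difference footnote [1] describes is visible in the session description itself. With DTLS-SRTP, the SDP carries only a certificate fingerprint, not keys; with SDES, it carries the keying material in an `a=crypto` line (RFC 4568), so anything that relays or logs the signaling can read it. The TypeScript sketch below extracts that material; the helper name is hypothetical and the key value is an example in RFC 4568 syntax, not a real key.

```typescript
// One SDES key offer, as carried in an "a=crypto" line (RFC 4568).
interface SdesKey {
  tag: string;
  suite: string;
  keySalt: string; // base64 master key + salt, readable by any relay
}

// Extract SDES keying material from an SDP blob. Anything that sees the
// signaling (the web site, a proxy, a log file) can run this.
function sdesKeys(sdp: string): SdesKey[] {
  const prefix = "a=crypto:";
  const keys: SdesKey[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (!line.startsWith(prefix)) continue;
    // a=crypto:<tag> <crypto-suite> inline:<key||salt>[|lifetime][|MKI:length]
    const [tag, suite, keyParams] = line.slice(prefix.length).trim().split(/\s+/);
    if (!keyParams || !keyParams.startsWith("inline:")) continue;
    // Drop the optional lifetime and MKI fields after the first "|".
    keys.push({ tag, suite, keySalt: keyParams.slice("inline:".length).split("|")[0] });
  }
  return keys;
}

// Example value in RFC 4568 syntax (not a real key).
const answerSdp = [
  "v=0",
  "m=audio 49170 RTP/SAVP 0",
  "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PS1uQCVeeCFCanVmcjkpPywjNWhcYD0mXXtxaVBR|2^20|1:32",
].join("\r\n");

console.log(sdesKeys(answerSdp)[0].keySalt); // PS1uQCVeeCFCanVmcjkpPywjNWhcYD0mXXtxaVBR
```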


  1. Indispensable article, Adam!

    At Quolony Tech we believe that web or mobile applications MUST be able to choose the right provider for the kind of device and service.

    The browser or operating system might provide a collection of user identities and the means to establish a "personal firewall" for communication endpoints, but this should only be an option for the service.

    Services SHOULD be able to delegate authentication and authorization to "outside providers" like those who offer OAuth (typically Google, Facebook, Twitter), Mozilla Persona, ...

    Good luck to WebRTC, a life-saver for the Web.

  2. Hello Adam, first of all congrats, nice articles about WebRTC.

    I'm a student and I'm running my own little experiments with WebRTC in order to understand it, but there is a technical issue that's not so clear to me.

    I'm running some tests on this website and I've captured the traffic using Wireshark in order to figure out what's going on.
    Do you know how clients get the destination IP address? Is this established (standardized) in WebRTC, or does it depend on the web application? And also, what protocols does WebRTC use to control sessions?

    Thank you in advance.


    1. Miguel --

      Thanks for the compliment.

      I actually have a partially finished article on federation that talks a little about signaling (at least as a tangent), but it will probably be a while before I find time to finish writing it. The short answer to your question is that it's entirely up to the applications how they choose to exchange the information necessary to set up a session, including IP addresses. In practice, most applications are going to be sending the literal SDP offers and answers that they get out of the PeerConnection objects to each other, along with the literal ICE candidates (basically, additional IP addresses and ports) that they receive via callbacks. This can be done via the web server hosting the scripts; using a WebSocket-based intermediary; or directly between browsers, using the PeerConnection data channel (which can only be done after using one of the other techniques to set up the data channel in the first place).

      Did that answer your question?
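The reply above leaves the signaling format up to the application. As a hypothetical sketch of the most common pattern it describes, the TypeScript below defines an application-chosen message format for SDP offers, answers, and ICE candidates, plus an in-memory relay standing in for the web server or WebSocket intermediary. In a browser, the SDP strings would come from `RTCPeerConnection`'s `createOffer()`/`createAnswer()` and the candidates from its `icecandidate` event; the message shape, class, and room names here are assumptions, not part of any standard. Note that the relay sees every message, including all the IP addresses discussed in the post.

```typescript
// Hypothetical, application-defined signaling messages. WebRTC does not
// standardize this format; each application chooses its own.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

type Deliver = (msg: SignalMessage) => void;

// Toy stand-in for the web server or WebSocket intermediary: forwards each
// message to every other participant in the same room.
class SignalingRelay {
  private rooms = new Map<string, Map<string, Deliver>>();
  readonly log: SignalMessage[] = []; // the relay sees (and can keep) everything

  join(room: string, peerId: string, deliver: Deliver): void {
    let peers = this.rooms.get(room);
    if (!peers) {
      peers = new Map<string, Deliver>();
      this.rooms.set(room, peers);
    }
    peers.set(peerId, deliver);
  }

  send(room: string, from: string, msg: SignalMessage): void {
    this.log.push(msg);
    this.rooms.get(room)?.forEach((deliver, peerId) => {
      if (peerId !== from) deliver(msg);
    });
  }
}

const relay = new SignalingRelay();
const bobInbox: SignalMessage[] = [];
relay.join("call-42", "alice", () => {});
relay.join("call-42", "bob", (msg) => bobInbox.push(msg));

relay.send("call-42", "alice", { kind: "offer", sdp: "v=0\r\n" });
relay.send("call-42", "alice", {
  kind: "candidate",
  candidate: "candidate:1 1 udp 2122260223 192.0.2.10 54400 typ host",
});

console.log(bobInbox.length, relay.log.length); // 2 2
```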