Cryptographic certificates in WebKit & GCR

One important security feature of a web browser is letting you review the cryptographic certificates of the pages you visit. So tonight I will be describing how WebKit exposes this information to Odysseus et al., and, if I have time, how LibGCR parses it.

I won’t be describing GCR’s UI library, though, because I opted to redesign that UI. It’s the same UI design other browsers act embarrassed to show.

As part of the load process, once the page has been committed to the tab’s history, the sandbox notifies the PageProxy of (amongst other things) the page’s cryptographic certificate. The certificate is then stored in the page’s “main frame” (as a shallow wrapper around the WebCore object, which is in turn a shallow wrapper around GTLS as provided by LibSoup) for the GObject bindings to access and deconstruct later.

That basically covers the WebKit side of things.

In my code, I hand the raw data to GCR.SimpleCertificate so I can access the properties it parses out of it, and render them like a real-life signed certificate (e.g. a diploma). GCR.SimpleCertificate in turn hands the data over to a library called “egg” via its superclass GCR.Certificate.

egg parses the data into the same named tree that’s typically rendered as a tree view (which I object to, as it includes needless noise). GCR.Certificate parses the data lazily, upon first property access, and then reads values from hardcoded paths in that tree.
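To illustrate that “parse on first access, then read fixed paths” pattern, here’s a minimal Python sketch. It assumes a hypothetical `parse` callable that returns the named tree flattened into a dict of slash-separated paths; `LazyCertificate` and its property names are made up for illustration and are not GCR’s API. The path components are borrowed from the X.509 field names in RFC 5280.

```python
from typing import Callable, Dict, Optional

Tree = Dict[str, object]  # hypothetical: slash-separated path -> decoded value


class LazyCertificate:
    """Defers parsing until a property is first read, then reads fixed paths."""

    # Hardcoded paths, named after X.509 fields from RFC 5280.
    SUBJECT_PATH = "tbsCertificate/subject"
    NOT_AFTER_PATH = "tbsCertificate/validity/notAfter"

    def __init__(self, raw: bytes, parse: Callable[[bytes], Tree]) -> None:
        self._raw = raw
        self._parse = parse
        self._tree: Optional[Tree] = None

    def _lookup(self, path: str) -> object:
        if self._tree is None:  # first property access: parse once and cache
            self._tree = self._parse(self._raw)
        return self._tree.get(path)

    @property
    def subject(self) -> object:
        return self._lookup(self.SUBJECT_PATH)

    @property
    def not_after(self) -> object:
        return self._lookup(self.NOT_AFTER_PATH)
```

Constructing a `LazyCertificate` costs nothing; the parser only runs when `subject` or `not_after` is first read, and every later read hits the cached tree.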

To parse a cryptographic certificate, egg first decodes the format’s big ints and length-prefixed lists into an intermediate representation.
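The two primitives involved can be sketched in Python, assuming the DER encoding that X.509 certificates use: a length is either a single byte below 0x80 (short form) or 0x80 | n followed by n big-endian length bytes (long form), and INTEGER contents such as serial numbers are big-endian two’s complement. The function names are hypothetical, not egg’s.

```python
from typing import Tuple


def decode_length(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a DER length starting at buf[pos]; return (length, next_pos)."""
    first = buf[pos]
    if first < 0x80:  # short form: the byte is the length itself
        return first, pos + 1
    count = first & 0x7F  # long form: low 7 bits give the number of length bytes
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    if pos + 1 + count > len(buf):
        raise ValueError("truncated length")
    length = int.from_bytes(buf[pos + 1:pos + 1 + count], "big")
    return length, pos + 1 + count


def decode_integer(content: bytes) -> int:
    """Decode an INTEGER's content bytes: big-endian two's complement."""
    if not content:
        raise ValueError("empty INTEGER")
    return int.from_bytes(content, "big", signed=True)
```

The leading 0x00 in an encoding like `00 FF` is what keeps a positive big int from being read as negative.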

But since the structure’s so close to XML, egg then translates the intermediate representation to GXML (though I haven’t found the import, so I’m not 100% sure). The main difference is that certificates identify their tags with big ints, whereas XML of course uses strings.
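To make the XML analogy concrete, here’s a Python sketch that turns an already-decoded tree of `(class, constructed, tag_number, payload)` tuples into XML using the standard library’s ElementTree. It is only an illustration of the idea; I haven’t confirmed how egg actually does this. The element names are invented, and only a handful of universal tag numbers (from X.680) are mapped. Anything else falls back to a generic name built from its numbers.

```python
import xml.etree.ElementTree as ET

# A few universal tag numbers from X.680, with invented element names.
UNIVERSAL_NAMES = {1: "boolean", 2: "integer", 3: "bit-string", 4: "octet-string",
                   5: "null", 6: "oid", 16: "sequence", 17: "set"}


def to_element(node: tuple) -> ET.Element:
    """Convert (tag_class, constructed, tag_number, payload) into an XML element."""
    tag_class, constructed, number, payload = node
    if tag_class == 0 and number in UNIVERSAL_NAMES:
        name = UNIVERSAL_NAMES[number]
    else:
        # ASN.1 identifies tags by number; XML needs a string name.
        name = f"tag-{tag_class}-{number}"
    elem = ET.Element(name)
    if constructed:
        for child in payload:  # constructed: payload is a list of child nodes
            elem.append(to_element(child))
    else:
        elem.text = payload.hex()  # primitive: show raw content bytes as hex
    return elem
```

For example, `SEQUENCE { INTEGER 5, NULL }` comes out as a `sequence` element containing `integer` and `null` children.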

It finishes off by validating the types for different nodes.
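I haven’t checked exactly which rules egg enforces here, but as an illustration, here’s a Python sketch of per-node type checks using a few real DER constraints from X.690: BOOLEAN must be a single 0x00 or 0xFF byte, INTEGER must be minimally encoded, NULL has no content, UTCTime is `YYMMDDHHMMSSZ`, and SEQUENCE/SET must be constructed. The node layout and the choice of checks are my own assumptions.

```python
import re
from typing import Callable, Dict


def _minimal_integer(c: bytes) -> bool:
    # DER INTEGER: non-empty, with no redundant leading 0x00 or 0xFF byte.
    if not c:
        return False
    if len(c) > 1 and c[0] == 0x00 and c[1] < 0x80:
        return False
    if len(c) > 1 and c[0] == 0xFF and c[1] >= 0x80:
        return False
    return True


# Checks on the content bytes of primitive universal types.
CHECKS: Dict[int, Callable[[bytes], bool]] = {
    1: lambda c: c in (b"\x00", b"\xff"),                    # BOOLEAN
    2: _minimal_integer,                                     # INTEGER
    5: lambda c: c == b"",                                   # NULL
    23: lambda c: re.fullmatch(rb"\d{12}Z", c) is not None,  # UTCTime
}
CONSTRUCTED_ONLY = {16, 17}  # SEQUENCE and SET


def validate(node: tuple) -> bool:
    """Recursively check (tag_class, constructed, tag_number, payload) nodes."""
    tag_class, constructed, number, payload = node
    if tag_class == 0 and number in CONSTRUCTED_ONLY and not constructed:
        return False
    if tag_class == 0 and number in CHECKS and constructed:
        return False  # these types must use the primitive encoding in DER
    if constructed:
        return all(validate(child) for child in payload)
    check = CHECKS.get(number) if tag_class == 0 else None
    return check(payload) if check is not None else True
```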

The other main difference this binary certificate format has from XML is that it is dynamically typed: each field’s identifier byte packs the tag’s class into its top two bits and a primitive/constructed flag into the next bit, with the tag number (which names the type) in the low five bits. That type then guides how the following length-prefixed bytes are parsed, and it may be a recursive “structured” type - hence the XML similarities egg picks up on.
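Here’s a Python sketch of that identifier layout (per X.690) together with the recursion it enables: constructed values are decoded again as a run of child fields, primitive ones are kept as raw bytes. This is my own illustration, not egg’s code, and it only handles single-byte tags (tag numbers below 31).

```python
from typing import List, Tuple, Union

# Intermediate representation: (class, constructed, tag_number, payload), where
# payload is raw bytes for primitive values or a list of child nodes otherwise.
Node = Tuple[int, bool, int, Union[bytes, list]]


def parse_identifier(octet: int) -> Tuple[int, bool, int]:
    tag_class = octet >> 6            # bits 8-7: universal/application/context/private
    constructed = bool(octet & 0x20)  # bit 6: primitive (0) or constructed (1)
    number = octet & 0x1F             # bits 5-1: tag number (31 means multi-byte form)
    if number == 0x1F:
        raise NotImplementedError("high-tag-number form not handled in this sketch")
    return tag_class, constructed, number


def read_length(buf: bytes, pos: int) -> Tuple[int, int]:
    first = buf[pos]
    if first < 0x80:
        return first, pos + 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    return int.from_bytes(buf[pos + 1:pos + 1 + count], "big"), pos + 1 + count


def decode(buf: bytes) -> List[Node]:
    """Decode a run of consecutive fields, recursing into constructed ones."""
    nodes: List[Node] = []
    pos = 0
    while pos < len(buf):
        tag_class, constructed, number = parse_identifier(buf[pos])
        length, start = read_length(buf, pos + 1)
        end = start + length
        if end > len(buf):
            raise ValueError("value runs past end of buffer")
        content = buf[start:end]
        payload = decode(content) if constructed else content
        nodes.append((tag_class, constructed, number, payload))
        pos = end
    return nodes
```

For instance, the bytes `30 05 02 01 05 05 00` decode to a SEQUENCE (universal tag 16, constructed) holding an INTEGER 5 and a NULL.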

GBytes were very handy for parsing the length-prefixed lists. And I just verified that the format I’m talking about is “Abstract Syntax Notation One” (ASN.1), which certificates encode using the Distinguished Encoding Rules (DER).
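Part of what makes GBytes handy is that `g_bytes_new_from_bytes()` can carve out a sub-range that references the parent buffer instead of copying it. Python’s `memoryview` behaves similarly, so here’s a sketch (my own, handling only single-byte tags) that walks the fields of a length-prefixed list and yields zero-copy views of each field’s content.

```python
from typing import Iterator, Tuple


def children(value: memoryview) -> Iterator[Tuple[int, memoryview]]:
    """Yield (identifier byte, content view) for each field in `value`.

    Like g_bytes_new_from_bytes(), each content slice shares the parent's
    buffer rather than copying it.
    """
    pos = 0
    while pos < len(value):
        ident = value[pos]
        first = value[pos + 1]
        if first < 0x80:  # short-form length
            length, start = first, pos + 2
        else:             # long-form length: low 7 bits count the length bytes
            count = first & 0x7F
            length = int.from_bytes(value[pos + 2:pos + 2 + count], "big")
            start = pos + 2 + count
        end = start + length
        if end > len(value):
            raise ValueError("truncated field")
        yield ident, value[start:end]
        pos = end
```

Because each yielded view still points into the original buffer, nested lists can be walked by calling `children()` again on a child’s view without any intermediate copies.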