Dynamic protobuf


This article assumes some basic knowledge of protobuf and Qt.

Protobuf is a nice library for data serialization. It is fast and efficient. The API is ugly (which is not uncommon for Google products), but usable. I have played a bit with protobuf with the aim of replacing a self-written serializer in a C++ Qt project. As usual, the difficulties start with deserialization (maybe that is the reason why all these tools are named “serializers”; I am not aware of any product named “deserializer”). For my current project (a client-server application based on message passing) protobuf has two drawbacks:

  1. No transport mechanism
  2. No dynamic deserialization

The first point means protobuf does not define any method to send/receive serialized messages over the wire; it does not even define a way to send/receive a sequence of messages over one connection. That is not a major problem and can be fixed with a few lines of code.
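For illustration, a length-prefix framing scheme along these lines is enough to split one byte stream back into individual messages. This is only a sketch (the function names `frame` and `unframe` are made up, not protobuf API):

```cpp
#include <cstdint>
#include <string>

// Prepend a 4-byte big-endian length to a serialized message so the
// receiver can split a continuous byte stream into individual messages.
std::string frame(const std::string& payload) {
    uint32_t len = static_cast<uint32_t>(payload.size());
    char hdr[4] = {
        char((len >> 24) & 0xff), char((len >> 16) & 0xff),
        char((len >> 8) & 0xff),  char(len & 0xff)
    };
    return std::string(hdr, 4) + payload;
}

// Extract the next complete message from `buffer`, removing it.
// Returns false if the buffer does not yet hold a full message.
bool unframe(std::string& buffer, std::string& payload) {
    if (buffer.size() < 4) return false;
    uint32_t len = (uint32_t(uint8_t(buffer[0])) << 24) |
                   (uint32_t(uint8_t(buffer[1])) << 16) |
                   (uint32_t(uint8_t(buffer[2])) << 8)  |
                    uint32_t(uint8_t(buffer[3]));
    if (buffer.size() < 4 + len) return false;
    payload = buffer.substr(4, len);
    buffer.erase(0, 4 + len);
    return true;
}
```

The sender passes each serialized message through frame() before writing; the receiver appends incoming bytes to a buffer and calls unframe() in a loop until it returns false.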
The second point is more important: the old mechanism generated a Qt-signal from the receiving function with a signature like this:

signals:
  void gotMessage(const QVariant& v);

The current message was wrapped into a QVariant. Our old serializer (we used our own message file compiler comparable to the protoc compiler) generated the necessary wrapper code and the right qRegisterMetaType() calls. So each message got a unique type-id. For the wire transfer we used a QDataStream, sent first the type-id and after that the serialized message. The receiving side was able to create the right type from this id (wrapped into a QVariant) and fill it from the serialized message. Each client (in Qt speech: the slot connected to the signal above) was able to restore the original type:

void handleMessage(const QVariant& v) {
  if (v.userType() == ExpectedMessage::typeId) {
    ExpectedMessage msg = v.value<ExpectedMessage>();
    handleExpectedMessage(msg);
  }
}

v.userType() returns the type-id transferred over the wire, where ExpectedMessage::typeId is the “static” type-id returned from qRegisterMetaType(). So each module can register itself as a listener (in Qt: connect) for the gotMessage() signal, filter out the interesting message types and handle them.

This is what I mean by “dynamic deserialization”: feed the deserializer with some data and get back a parsed instance of the right message class. In protobuf, each message declared in a .proto file becomes a C++ class derived from a basic Message class. That is the same mechanism we used in our implementation, which makes porting a bit easier. Protobuf does not guess the message type; to read a message, protobuf needs to know which kind of message it has to read. In an application one has to instantiate the right Message subclass and read the data:

ExpectedMessage msg;
msg.ParseFromArray(...);

ExpectedMessage is a class declared as a message in the .proto file and derived from google::protobuf::Message. But what I would like to have is:

Message *msg = magicallyReadTheMessage();
if (name(msg) == "expectedmessage") {
  ExpectedMessage *sub = dynamic_cast<ExpectedMessage*>(msg);
  handleSubclass(sub);
}

So is this possible with protobuf too? Fortunately yes, with two small tricks: we include the .proto file into our application and use some magic from the protoc compiler.

The first thing to do is to make the source code of the .proto file available at runtime. In the Qt world we just create a resource file and include the .proto file. (In a real-world application, where we create different executables for server and client, we put this resource and all the message handling code into a library and link it to both client and server.) Now we read the .proto file at startup and generate a type-id for each message, in the same manner as qRegisterMetaType() did in the old version:

QFile data(":/demo.proto");
if (!data.open(QIODevice::ReadOnly | QIODevice::Text)) {
  qFatal("cannot read proto resource file");
  return;
}
QByteArray protoText = data.readAll();

Now we have a string with our messages and can parse it (courtesy of https://cxwangyi.wordpress.com/2010/06/29/google-protocol-buffers-online-parsing-of-proto-file-and-related-data-files/):

using namespace google::protobuf;
using namespace google::protobuf::io;
using namespace google::protobuf::compiler;

FileDescriptorProto file_desc_proto;

ArrayInputStream proto_input_stream(protoText.data(), protoText.size());
Tokenizer tokenizer(&proto_input_stream, NULL);
Parser parser;
if (!parser.Parse(&tokenizer, &file_desc_proto)) {
  qFatal("Cannot parse .proto file");
}

Now we can read all message types from file_desc_proto and generate a unique id for each message. In fact we push each message name onto a vector and use the index as id:

using MessageNames = std::vector<std::string>;
MessageNames messageNames;
for (int i = 0; i < file_desc_proto.message_type_size(); ++i) {
  const DescriptorProto& dp = file_desc_proto.message_type(i);
  qDebug() << i << dp.name().c_str();
  messageNames.push_back(dp.name());
}

All this stuff goes into an initialization function which has to be called at startup. Note that I have ignored the module prefix (protobuf messages are declared inside a module and get names of the form module.submodule.messageName).
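The lookup in the other direction, from message name back to type-id, is a simple search over that vector. A possible implementation (a sketch; the name findIndexOf() and its use of a global messageNames vector are assumptions of this example):

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

using MessageNames = std::vector<std::string>;
MessageNames messageNames; // filled by the initialization code above

// Hypothetical helper: map a message type name back to its index
// (our type-id) in the messageNames vector.
MessageNames::difference_type findIndexOf(const std::string& name) {
    auto it = std::find(messageNames.begin(), messageNames.end(), name);
    if (it == messageNames.end())
        throw std::runtime_error("unknown message type: " + name);
    return std::distance(messageNames.begin(), it);
}
```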

Sending a message is straightforward:

void sendMsg(const Message& msg) {
    auto id = findIndexOf(msg.GetTypeName()); // looks up a message with this name in the messageNames vector
    QByteArray data((char*)&id, sizeof(id));
    string s = msg.SerializeAsString();
    data.append(s.data(), s.size());
    sendBuffer(data); // send a QByteArray however you like...
}

The corresponding receive method may look like this:

void recvMsg(const QByteArray &data) {
  MessageNames::difference_type idx;
  memcpy(&idx, data.data(), sizeof(idx)); // read the type-id written by sendMsg()
  const string& name = messageNames[idx];
  const google::protobuf::Descriptor *desc = google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName(name);
  const google::protobuf::Message *protoMsg = google::protobuf::MessageFactory::generated_factory()->GetPrototype(desc);
  google::protobuf::Message* resultMsg = protoMsg->New();
  resultMsg->ParseFromArray(data.data() + sizeof(idx), data.size() - sizeof(idx));
  emit handleReceivedMsg(*resultMsg);
  delete resultMsg;
}

After receiving the message type-id and looking up the message name in our messageNames vector, we look up a Descriptor for this message and generate a prototype of this message type by calling GetPrototype(). The protoMsg is already an instance of the right subclass. Each protobuf message contains a virtual New() method to create a fresh instance of this type. We use this to create our own instance and fill it with the received data. Finally we inform all our listening clients about the new message (in Qt speech: emit a signal).
One remaining drawback is that we need to delete the message after processing it, so we cannot use this signal in queued connections. In my case this is not a problem.
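If queued connections were needed, one way out would be to hand the parsed message to the listeners wrapped in a std::shared_ptr, so that ownership is shared and no manual delete is required. A minimal sketch of the ownership scheme, using a stand-in struct instead of google::protobuf::Message (in real Qt code the shared_ptr type would also need a qRegisterMetaType() call for queued delivery):

```cpp
#include <memory>
#include <string>

// Stand-in for google::protobuf::Message in this sketch.
struct FakeMessage {
    std::string payload;
};

// Instead of emitting a reference and deleting afterwards, wrap the
// freshly parsed message in a shared_ptr; every listener that keeps a
// copy extends its lifetime, and the last copy going away frees it.
std::shared_ptr<FakeMessage> makeSharedMessage(const std::string& data) {
    auto msg = std::make_shared<FakeMessage>();
    msg->payload = data;
    return msg;
}
```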

The client (Qt: the receiving slots) can now filter for the right messages:

void handleReceivedMsg(const Message& msg) {
  if (msg.GetTypeName() == "therightmessage") {
    const TheRightMessage& trm = dynamic_cast<const TheRightMessage&>(msg);
    handleTheRightMessage(trm);
  }
}

This code looks very similar to the old code we started with.

Another nice feature of including the .proto file is the possibility to create a hash of the contained .proto messages and send it to the server on connect. This way we can ensure that both client and server use a compatible .proto declaration.
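Such a compatibility hash could look like this. In a Qt application QCryptographicHash (e.g. SHA-1 over the resource contents) would be the obvious choice; the FNV-1a variant below is only used to keep the example free of dependencies:

```cpp
#include <cstdint>
#include <string>

// Hash the embedded .proto source text. Client sends this value on
// connect; the server compares it against its own hash and rejects
// the connection on mismatch.
uint64_t protoHash(const std::string& protoText) {
    uint64_t h = 14695981039346656037ULL; // FNV-1a offset basis
    for (unsigned char c : protoText) {
        h ^= c;
        h *= 1099511628211ULL; // FNV-1a prime
    }
    return h;
}
```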

Of course one could send the message names (instead of the type-ids). But sending long strings instead of short integers is a waste of bandwidth, and receiving a fixed-size integer is much easier.

As the size of each protobuf message may differ (protobuf puts a lot of effort into reducing the size of messages), I consider it a good idea to add a sentinel to each message.

A short demo (using byte buffers to simplify the message transfer part) can be found at https://github.com/valpo/protodemo.

Meeting C++ 2016 Trip report

Disclaimer: as always, everything here is very subjectively colored.

After skipping last year (more precisely: giving preference to the Qt World Summit), this year I made my way to Berlin again for Meeting C++.

Friday started with the keynote by Bjarne Stroustrup. Slightly under the weather but visibly in good spirits, he delivered a mix of status update, history and anecdotes. Right afterwards Bjarne continued, standing in for the ill Peter Sommerlad, with an introduction to the Core Guidelines and especially the GSL. One has to give Bjarne credit for jumping in on short notice, but all in all it was a rather weak talk.

Then lunch, about which I cannot say much: I was still full from breakfast.

Afterwards I listened to Nikos Athanasiou on “Reduce: From functional programming to C++17 Fold expressions”. Unfortunately Nikos rushed in fast-forward mode through far too many slides with far too much source code. Content-wise, though, a nice topic.

Then on to Odin Holmes' “Ranges v3 and microcontrollers, a revolution”. Personally I think that ranges are (or can become) the next big thing in C++. The talk itself was unfortunately somewhat weak; Odin presented with a lot of humor, but I was left with the impression that most of the many ideas are still awaiting their implementation (and with it the proof that they actually work).

And to close, Phil Nash, who presented his thoughts on “Functional C++ for Fun and Profit” with his usual professionalism and structure. I must admit I cannot quite place his string_builder concept yet; all the more valuable were his small tips on functional initialization (IIFE; don't be alarmed, it is described there using JavaScript).

After 10 hours full of talks I was a bit worn out: you don't get any younger. So first some fresh air and relaxation, then off to the party. The party fulfilled every prejudice outsiders hold about programmers: everywhere groups formed and discussed, yes, exactly: C++ problems. I wanted to go to bed early and stay until 10 at the latest. But when I first looked at my watch it was already past 11pm. In other words: I enjoyed it.

Saturday started with “Functional reactive programming in C++” by Ivan Cukic. The talk left me with a somewhat mixed impression; somehow there was nothing in it that was new to me and/or promised practical benefit.

More practical benefit was promised by the “Clang Static Analysis” talk by Gabor Horvath that followed. Gabor focused more on the techniques behind the scenes, but always gave hints on practical application. I need to take a closer look at CodeChecker.

The last talk was really heavy fare again: “How to test static_assert?” by Roland Bock. Briefly summarized: how do you test the meta level? You write tests on the meta-meta-meta level and run them on the meta-meta level. Obviously. I will probably not apply this in practice, but as a thinking exercise it was fun.

All in all two fine days, interesting topics, nice and at least equally interesting people, good organization and a pleasant location. I used the rest of Saturday (completely rained out) and Sunday (glorious sunshine) for a few tours through Berlin.

Caching

As soon as a developer finds a performance problem (or, more likely, someone else finds a performance problem and informs the developer), he starts thinking about caching. Caching is a great thing regarding performance, and a terrible thing regarding consistency. Look at the web: it is slow by design, so if you compete with desktop applications, you have to cheat and use caching. Which will surely break consistency.

Let's look at Firefox (version 39).

We use (of course) all this modern stuff like Jira and Confluence, with single sign-on and Kerberos and all these good things. It works under Linux too. Not that easy, but it is possible to get it running (search for network.negotiate-auth.trusted-uris if you want to know more about this). So you just call kinit on the command line to obtain a ticket, and you get SSO with Firefox. Just in case you made the mistake of starting Firefox first (before kinit), Firefox asks for your login. So you run kinit, get your ticket and refresh the page in Firefox. Bad mistake: Firefox caches your credentials (in this case it caches the fact that you do not have credentials). And there is no other way to clear this cache than restarting Firefox.

This can be very funny, as Firefox caches your credentials per requested page. So you may be able to access Jira, run all queries and open all tickets, except the one you tried first in the morning. Yes, that is caching.

The same thing happens with DNS problems. Firefox never forgets a failed DNS lookup. There is no way to clear the cache; just restart the browser. Cross your fingers: in most cases Firefox can recover all your tabs. If not, life is hard.