Dynamic protobuf


This article assumes some basic knowledge of protobuf and Qt.

Protobuf is a nice library for data serialization. It is fast and efficient. The API is ugly (which is not uncommon for Google products), but usable. I have played a bit with protobuf with the aim of replacing a self-written serializer in a C++ Qt project. As usual, the difficulties start with deserialization (maybe that is the reason why all these tools are named “serializers”; I am not aware of any product named “deserializer”). For my current project (a client-server application based on message passing) protobuf has two drawbacks:

  1. No transport mechanism
  2. No dynamic deserialization

The first point means protobuf does not define any way to send or receive serialized messages over the wire; it does not even define how to send or receive a sequence of messages over one connection. That is not a major problem and can be fixed with a few lines of code.
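To give an idea of what those "few lines" could look like, here is a minimal sketch of length-prefixed framing in plain C++. It is not taken from my project: the names appendFrame and extractFrames and the 4-byte big-endian length prefix are assumptions made for illustration, and the payload would be whatever SerializeAsString() returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Append one frame (4-byte big-endian length, then the payload) to the stream buffer.
void appendFrame(std::string& stream, const std::string& payload) {
    assert(payload.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        stream.push_back(static_cast<char>((len >> shift) & 0xFF));
    }
    stream.append(payload);
}

// Remove all complete frames from the front of the buffer and return their payloads.
// An incomplete trailing frame stays in the buffer until more bytes arrive.
std::vector<std::string> extractFrames(std::string& stream) {
    std::vector<std::string> payloads;
    std::size_t pos = 0;
    while (stream.size() - pos >= 4) {
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i) {
            len = (len << 8) | static_cast<unsigned char>(stream[pos + i]);
        }
        if (stream.size() - pos - 4 < len) break;  // payload not complete yet
        payloads.emplace_back(stream, pos + 4, len);
        pos += 4 + static_cast<std::size_t>(len);
    }
    stream.erase(0, pos);
    return payloads;
}
```

The receiver appends every chunk it reads from the socket to one buffer and calls extractFrames() after each read, so several messages can share one connection even when a message arrives split across reads.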
The second point is more important: the old mechanism generated a Qt-signal from the receiving function with a signature like this:

signals:
  void gotMessage(const QVariant& v);

The current message was wrapped into a QVariant. Our old serializer (we used our own message file compiler, comparable to the protoc compiler) generated the necessary wrapper code and the right qRegisterMetaType() calls, so each message got a unique type-id. For the wire transfer we used a QDataStream: first we sent the type-id and after that the serialized message. The receiving side was able to create the right type from this id (wrapped into a QVariant) and fill it from the serialized message. Each client (in Qt speak: the slot connected to the signal above) was able to restore the original type:

void handleMessage(const QVariant& v) {
  if (v.userType() == ExpectedMessage::typeId) {
    ExpectedMessage msg = v.value<ExpectedMessage>();
    handleExpectedMessage(msg);
  }
}

v.userType() returns the type-id transferred over the wire, where ExpectedMessage::typeId is the “static” type-id returned from qRegisterMetaType(). So each module can register itself as a listener (in Qt: connect) for the gotMessage() signal, filter out the interesting message types and handle them.

This is what I mean by “dynamic deserialization”: feed the deserializer with some data and get back a parsed instance of the right message class. In protobuf each message declared in a .proto file becomes a C++ class derived from a basic Message class. That is the same mechanism we used in our implementation, which makes porting a bit easier. Protobuf does not guess the message type: to read a message, protobuf needs to know which kind of message it has to read. In an application one has to instantiate the right Message subclass and read the data:

ExpectedMessage msg;
msg.ParseFromArray(...);

ExpectedMessage is a class declared as a message in the .proto file and derived from google::protobuf::Message. But what I would like to have is:

Message *msg = magicallyReadTheMessage();
if (name(msg) == "expectedmessage") {
  ExpectedMessage *sub = dynamic_cast<ExpectedMessage*>(msg);
  handleSubclass(sub);
}

So is this possible with protobuf too? Fortunately yes, with two small tricks: we include the .proto file into our application and use some magic from the protoc compiler.

The first thing to do is to make the source code of the .proto file available at runtime. In the Qt world we just create a resource file and include the .proto file. (In a real-world application, where we create different executables for server and client, we put this resource and all the message handling code into a library and link it to both client and server.) Now we read the .proto file at startup and generate a type-id for each message, in the same manner as with qRegisterMetaType() in the old version:

QFile data(":/demo.proto");
if (!data.open(QIODevice::ReadOnly | QIODevice::Text)) {
  qFatal("cannot read proto resource file");
  return;
}
QByteArray protoText = data.readAll();

Now we have a string with our messages and can parse it (courtesy goes to https://cxwangyi.wordpress.com/2010/06/29/google-protocol-buffers-online-parsing-of-proto-file-and-related-data-files/):

using namespace google::protobuf;
using namespace google::protobuf::io;
using namespace google::protobuf::compiler;

FileDescriptorProto file_desc_proto;

ArrayInputStream proto_input_stream(protoText.data(), protoText.size());
Tokenizer tokenizer(&proto_input_stream, NULL);
Parser parser;
if (!parser.Parse(&tokenizer, &file_desc_proto)) {
  qFatal("Cannot parse .proto file");
}

Now we can read all message types from file_desc_proto and generate a unique id for each message. In fact we push each message name onto a vector and use the index as id:

    using MessageNames = std::vector<std::string>;
    MessageNames messageNames;
    for(int i=0; i < file_desc_proto.message_type_size();++i) {
        const DescriptorProto& dp = file_desc_proto.message_type(i);
        qDebug() << i << dp.name().c_str();
        messageNames.push_back(dp.name());
    }

All this goes into an initialization function which has to be called at startup. Note that I have ignored the package prefix (protobuf messages are declared inside a package and get names following the scheme package.subpackage.MessageName).

Sending a message is straightforward:

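void sendMsg(const Message& msg) {
    // look up the index of this message name in the messageNames vector;
    // the id type must match the one the receiver reads
    MessageNames::difference_type id = findIndexOf(msg.GetTypeName());
    QByteArray data(reinterpret_cast<const char*>(&id), sizeof(id));
    std::string s = msg.SerializeAsString();
    // append with an explicit length: the serialized bytes may contain '\0'
    data.append(s.data(), static_cast<int>(s.size()));
    sendBuffer(data); // send a QByteArray however you like...
}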
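The findIndexOf() helper is not shown here. A minimal sketch of how it could look follows; unlike the call above, this version takes the name vector as an explicit parameter (an assumption made so that the snippet stands on its own), and returning -1 for an unknown name is likewise my choice for the sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using MessageNames = std::vector<std::string>;

// Return the index of `name` in `names`, or -1 if the name is not registered.
// The comparison is case-sensitive, like protobuf type names themselves.
MessageNames::difference_type findIndexOf(const MessageNames& names,
                                          const std::string& name) {
    const auto it = std::find(names.begin(), names.end(), name);
    return it == names.end() ? -1 : std::distance(names.begin(), it);
}
```

A linear search is fine for a handful of message types; with many types a map from name to index, filled once at startup, would avoid the scan on every send.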

The corresponding receive method may look like this:

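void recvMsg(const QByteArray &data) {
  using Index = MessageNames::difference_type;
  const int headerSize = static_cast<int>(sizeof(Index));
  if (data.size() < headerSize) return; // too short to contain a type id
  Index idx;
  std::memcpy(&idx, data.constData(), sizeof(idx)); // needs <cstring>; reads the whole id, not only its first byte
  if (idx < 0 || idx >= static_cast<Index>(messageNames.size())) return; // unknown type id
  const std::string& name = messageNames[idx];
  const google::protobuf::Descriptor *desc = google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName(name);
  if (!desc) return; // no generated class linked in for this name
  const google::protobuf::Message *protoMsg = google::protobuf::MessageFactory::generated_factory()->GetPrototype(desc);
  google::protobuf::Message* resultMsg = protoMsg->New();
  if (resultMsg->ParseFromArray(data.constData() + headerSize, data.size() - headerSize))
    emit handleReceivedMsg(*resultMsg);
  delete resultMsg;
}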

After receiving the message type id and looking up the message name in our messageNames vector, we look up a Descriptor for this message and fetch a prototype of this message type by calling GetPrototype(). The protoMsg is already an instance of the right subclass. Each protobuf message has a virtual New() method that creates a fresh instance of its type. We use this to create our own instance and fill it with the received data. Finally we inform all our listening clients about the new message (in Qt speak: emit a signal).
One remaining drawback is that we need to delete the message after processing it. So we cannot use this signal in queued connections. In my case this is not a problem.
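One detail of the wire format deserves care: the type id occupies several bytes, so it has to be copied out of the buffer as a whole rather than read through a single char. The following sketch shows the id header in isolation, using std::string instead of QByteArray. The names TypeId, packMessage and unpackMessage are made up for this example, and like the code above it assumes both ends use the same integer width and byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

using TypeId = std::int32_t;  // assumption: fixed-width id, identical on both ends

// Build a packet: the raw bytes of the id followed by the serialized payload.
std::string packMessage(TypeId id, const std::string& payload) {
    std::string packet(sizeof(TypeId), '\0');
    std::memcpy(&packet[0], &id, sizeof(TypeId));
    packet.append(payload);
    return packet;
}

// Split a packet into id and payload. Returns false if the packet is too
// short to contain an id; `id` and `payload` are left untouched in that case.
bool unpackMessage(const std::string& packet, TypeId& id, std::string& payload) {
    if (packet.size() < sizeof(TypeId)) return false;
    std::memcpy(&id, packet.data(), sizeof(TypeId));  // copies all id bytes
    payload.assign(packet, sizeof(TypeId), std::string::npos);
    return true;
}
```

If client and server may run on machines with different byte order, the id would have to be written in a fixed order instead of being copied as raw memory.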

The client (Qt: the receiving slots) can now filter for the right messages:

void handleReceivedMsg(const Message& msg) {
  // GetTypeName() returns the case-sensitive full name (including the package prefix, if any)
  if (msg.GetTypeName() == "TheRightMessage") {
    const TheRightMessage& trm = dynamic_cast<const TheRightMessage&>(msg);
    handleTheRightMessage(trm);
  }
}

This code looks very similar to the old code we started with.

Another nice feature of including the .proto file is the possibility to create a hash of the contained .proto messages and send it to the server on connect. So we can ensure that both client and server use a compatible .proto declaration.
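Which hash to use is open. As a minimal sketch, assuming a simple non-cryptographic fingerprint is good enough for a compatibility check, the .proto text could be run through 64-bit FNV-1a. The function name schemaFingerprint is invented for this example; in a Qt application QCryptographicHash would be an obvious alternative.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a over the raw .proto text.
// Any change to the text, including whitespace or comments, changes the result.
std::uint64_t schemaFingerprint(const std::string& protoText) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char byte : protoText) {
        hash ^= byte;
        hash *= 1099511628211ULL;                  // FNV 64-bit prime
    }
    return hash;
}
```

The client would send this value right after connecting, and the server would compare it with the fingerprint of its own copy. Because the raw text is hashed, a purely cosmetic edit of the .proto file is reported as a mismatch too; hashing the parsed message declarations instead would avoid that.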

Of course one could send the message names (instead of the type-ids). But sending long strings instead of short integers is a waste of bandwidth, and receiving a fixed-size integer is much easier.

As the size of each protobuf message may differ (protobuf puts a lot of effort into reducing the size of messages), I consider it a good idea to add a sentinel to each message.
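One possible reading of "sentinel" is a fixed marker appended to every packet, which the receiver checks before it trusts the content. The sketch below assumes exactly that; the marker value and the names kSentinel, withSentinel and stripSentinel are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical 4-byte end marker; any value both sides agree on would do.
const std::string kSentinel("\xDE\xAD\xBE\xEF", 4);

// Return the packet with the marker appended.
std::string withSentinel(const std::string& packet) {
    return packet + kSentinel;
}

// Check that the packet ends with the marker and cut it off.
// Returns false and leaves the packet unchanged if the marker is missing.
bool stripSentinel(std::string& packet) {
    if (packet.size() < kSentinel.size()) return false;
    const std::size_t bodySize = packet.size() - kSentinel.size();
    if (packet.compare(bodySize, kSentinel.size(), kSentinel) != 0) return false;
    packet.resize(bodySize);
    return true;
}
```

A missing marker then indicates that sender and receiver have lost sync on the message boundaries, which is otherwise hard to notice, because protobuf will happily try to parse almost any byte sequence.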

A short demo (using byte buffers to simplify the message transfer part) can be found at https://github.com/valpo/protodemo.