I have a serialisation use case which I have not seen described elsewhere on these forums or among “the usual suspects” when searching for alternatives.
Whilst I already have a solution, it would be helpful to avoid one part of the process. I also think a cleaner way of implementing what I have, if possible, would be of interest to the community in general and might serve as a common solution for the dozens of other serialisation-related queries I dutifully read through before penning my question.
I have included some background, details of our current solution, and a question, the answer to which may hold the secret to a better implementation.
Our ML/HST database/messaging uses a number of optional plugins which are either chosen from a suite we provide or commissioned by customers who have particular needs but cannot or do not want to create the code themselves. Oddly, whilst we mostly use ProGuard to improve JVM response, customers want the algorithms in their code to be as obscure as possible as they constitute their “secret sauce.”
Note: Naming has been simplified to make the process simpler to grok, but the process is described in full.
I register messages which require high-speed serialisation by scanning jars, both “built-in” and “dynamic”, and looking for classes which implement an “AutoSerialisable” interface. The registration scanning is done at startup or after receiving a notification that the plug-in jars have changed; yes, it must do this on the fly without restarting. A serialised message includes a class type as a compressed binary id known a priori to other systems. Often these are a single byte (think MessagePack etc.). Billions of these messages are sent and processed per hour (InfiniBand, PCIe serial messages), so the length of the id can make a significant difference to processing efficiency. Just doubling the length of the identifier can lead to a 4x increase in message decoding time.
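To illustrate the scanning step, here is a minimal sketch rather than our production code. `JarClassLister`, `toClassName` and `candidates` are hypothetical names, and the package prefix parameter anticipates the namespace starting point described in the next paragraph. It only lists candidate class names; loading them and checking for “AutoSerialisable” happens afterwards.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists candidate class names in a jar under a package prefix.
final class JarClassLister {

    /** Returns the class name for a jar entry path, or null if it is not a class under the prefix. */
    static String toClassName(String entryPath, String packagePrefix) {
        if (!entryPath.endsWith(".class")) {
            return null; // resources, manifests, directories
        }
        String name = entryPath
                .substring(0, entryPath.length() - ".class".length())
                .replace('/', '.');
        if (name.equals("module-info") || name.endsWith("package-info")) {
            return null; // descriptors, never messages
        }
        return name.startsWith(packagePrefix + ".") ? name : null;
    }

    /** Collects every candidate class name in the jar that lives under the prefix. */
    static List<String> candidates(JarFile jar, String packagePrefix) {
        List<String> result = new ArrayList<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            String name = toClassName(entries.nextElement().getName(), packagePrefix);
            if (name != null) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Filtering on the entry path before loading anything keeps auxiliary code out of the class loader entirely.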
During registration a call is made to a registration method with a namespace starting point; I do not want to scan masses of auxiliary code, hence a starting point. Registration may take place at multiple starting points as I, and the customers, do not want to force all messages to live in a specific namespace. On finding a class which implements “AutoSerialisable” it is registered in an in-memory repository using an id which is hard-coded into the class using a known static member. If an “AutoSerialisable” class does not meet the requirements for being “AutoSerialisable” (both structure and code signing requirements) it is discarded, an error is logged and security alerts are dispatched, but the code continues unless this is a startup scan.
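A simplified sketch of the registration step follows. The names `MessageRegistry`, `MESSAGE_ID` and `Ping` are hypothetical, the id is assumed to be a single byte, and the code signing checks, logging and alerting are omitted. It also assumes the static member keeps its name after obfuscation, since it is looked up reflectively by name.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the marker interface described above.
interface AutoSerialisable {}

// Hypothetical example message with its id hard-coded in a known static member.
final class Ping implements AutoSerialisable {
    static final byte MESSAGE_ID = 0x01;
}

// Hypothetical in-memory repository keyed by the one-byte message id.
final class MessageRegistry {
    private final Map<Integer, Class<? extends AutoSerialisable>> byId = new ConcurrentHashMap<>();

    /** Returns true if the class was registered, false if it was rejected. */
    boolean register(Class<?> candidate) {
        if (candidate.isInterface() || !AutoSerialisable.class.isAssignableFrom(candidate)) {
            return false; // not a concrete AutoSerialisable
        }
        try {
            Field idField = candidate.getDeclaredField("MESSAGE_ID");
            int mods = idField.getModifiers();
            if (!Modifier.isStatic(mods) || !Modifier.isFinal(mods) || idField.getType() != byte.class) {
                return false; // structural requirement not met
            }
            idField.setAccessible(true);
            int id = idField.getByte(null) & 0xFF; // treat the byte as unsigned
            // putIfAbsent returns null only when the id was not already taken
            return byId.putIfAbsent(id, candidate.asSubclass(AutoSerialisable.class)) == null;
        } catch (ReflectiveOperationException e) {
            return false; // missing or unreadable id member
        }
    }

    Class<? extends AutoSerialisable> lookup(int id) {
        return byId.get(id);
    }
}
```

In this sketch a duplicate id is treated as a rejection, so a plug-in cannot silently replace an already registered message.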
Whilst reading the message stream (which implements self-synchronisation and sentinels for error correction) the ‘id’ is read and a “Create” method called for the required class with the message stream reader as a parameter. The class then deserialises itself. (There is stream fast forwarding and message sharding over PCIe lanes etc., but that is unimportant here.)
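The read path can be sketched roughly as below. This is an illustration only: `MessageDispatcher` and `Price` are hypothetical names, a `ByteBuffer` stands in for our stream reader, the id is assumed to be one byte, and synchronisation, sentinels and sharding are left out.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical message which deserialises itself from the reader.
final class Price {
    final long ticks;

    private Price(long ticks) {
        this.ticks = ticks;
    }

    /** The "Create" method: builds the message from the bytes following the id. */
    static Price create(ByteBuffer in) {
        return new Price(in.getLong());
    }
}

// Hypothetical dispatcher: reads the id, then hands the reader to the matching factory.
final class MessageDispatcher {
    private final Map<Integer, Function<ByteBuffer, Object>> factories = new HashMap<>();

    void register(int id, Function<ByteBuffer, Object> factory) {
        factories.put(id, factory);
    }

    Object readNext(ByteBuffer in) {
        int id = in.get() & 0xFF; // one-byte id, read as unsigned
        Function<ByteBuffer, Object> factory = factories.get(id);
        if (factory == null) {
            throw new IllegalStateException("Unknown message id " + id);
        }
        return factory.apply(in); // the class deserialises itself
    }
}
```

A factory would be registered with something like `dispatcher.register(7, Price::create)`, after which `readNext` consumes the id byte and the payload in one call.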
Now to the crux of the matter:
At present I use the ProGuard obfuscation map to generate a simple file which maps the original namespace to the obfuscated namespace. If available, the registration code reads this file and maps the requested namespace onto the obfuscated namespace before searching. This works well but does require the code to be shipped with a matching mapping file. Not ProGuard’s own mapping file, just a map of root namespaces.
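For clarity, the lookup amounts to something like the sketch below. The `original -> obfuscated` line format, the `#` comments and the `NamespaceMap` name are assumptions made for illustration; they are not ProGuard's own mapping format.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical root-namespace map, built from the small file shipped alongside the code.
final class NamespaceMap {
    private final Map<String, String> roots;

    private NamespaceMap(Map<String, String> roots) {
        this.roots = roots;
    }

    /** Parses lines of the assumed form "original -> obfuscated"; blank and '#' lines are skipped. */
    static NamespaceMap parse(String text) {
        Map<String, String> parsed = new HashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int arrow = trimmed.indexOf("->");
            if (arrow < 0) {
                throw new IllegalArgumentException("Bad mapping line: " + trimmed);
            }
            parsed.put(trimmed.substring(0, arrow).trim(), trimmed.substring(arrow + 2).trim());
        }
        return new NamespaceMap(parsed);
    }

    /** Returns the obfuscated namespace, or the requested one unchanged if no mapping exists. */
    String resolve(String requested) {
        return roots.getOrDefault(requested, requested);
    }
}
```

Falling back to the requested namespace means the same registration call works for unobfuscated builds, where no mapping file is present.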
I would like to eliminate the need to ship any kind of mapping file by having ProGuard recognise a string as being a namespace and munge it to match the obfuscated namespace. I did something akin to this in the early 2000s when I wrote my own obfuscator for protecting games, so I know the concept is sound. (Back then ProGuard had issues with “Main” class inheritance chains, i.e. multiple chained entry points, and only reused obfuscated method names on parameter signatures rather than also on return type, as is possible with the JVM.) Does ProGuard offer such an option?
Feel free to ask questions relevant to solving the use case.