What is a grammar?
Normally you’d expect the term grammar to be used for natural languages or maybe compilers. However, binary files also follow rules that allow software to read them in a predictable way.
Hexinator grammars are based on the assumption that binary files consist of some sort of structures which themselves comprise numbers, strings, and other elements.
Hexinator and Synalyze It! both create and use grammars that represent the internal structures found in specific types of binary files. These grammars are stored as a particular kind of XML file, used only by Hexinator and Synalyze It! (The XML schema for grammar files is included in the application package.)
But this alone wouldn’t be sufficient to parse a binary file. At certain points the parser has to decide which structure to read next if several are allowed. Hexinator grammars let you define “fixed values” for strings, numbers and binary elements that decide whether the surrounding structure matches. It’s like a switch/case statement in programming languages, with the difference that the structures themselves contain all relevant information. Somewhat object-oriented, right?
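To illustrate the idea of fixed values, here is a minimal Python sketch. It is not Hexinator's implementation or API; the `STRUCTURES` table and the `match_structure` function are hypothetical names made up for this example. The three signatures are the well-known ZIP record markers, used only as sample data.

```python
# Hypothetical table: each entry pairs a fixed value with the name of the
# structure it identifies. The structure "carries" its own match condition.
STRUCTURES = [
    (b"PK\x03\x04", "ZipLocalFileHeader"),
    (b"PK\x01\x02", "ZipCentralDirectoryEntry"),
    (b"PK\x05\x06", "ZipEndOfCentralDirectory"),
]


def match_structure(data: bytes, offset: int = 0):
    """Return the name of the first structure whose fixed value matches
    the bytes at the given offset, or None if no structure matches."""
    for fixed_value, name in STRUCTURES:
        if data[offset:offset + len(fixed_value)] == fixed_value:
            return name
    return None
```

Unlike a central switch/case block, adding a new structure here means adding one table entry; the matching loop stays unchanged.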
Parsing binary files involves more than structures containing numbers, strings and other elements. Some formats require jumping within a file, a concept supported by “offset elements” in grammars. Both absolute jumps (relative to the start of the file) and relative jumps are possible.
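The difference between the two kinds of jump can be sketched in plain Python. This is an illustration only, not how Hexinator evaluates offset elements: the function name `read_at_offset` is hypothetical, the offset is assumed to be a 4-byte little-endian number, and the relative case is assumed to count from the position of the offset field itself (formats differ on that reference point).

```python
import struct


def read_at_offset(data: bytes, pointer_pos: int, relative: bool = False) -> int:
    """Follow a 4-byte little-endian offset stored at pointer_pos and
    return the byte found at the position it points to."""
    (offset,) = struct.unpack_from("<I", data, pointer_pos)
    # Absolute: the stored value counts from the start of the file.
    # Relative (assumption): it counts from the offset field's own position.
    target = pointer_pos + offset if relative else offset
    return data[target]
```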
So, to sum it up, think of Hexinator grammars as the equivalent of XML Schema or RELAX NG for binary files, specialized for masks, bits and offsets, with the ability to include Python or Lua scripts.
|XML Schema|Hexinator Grammars|
|---|---|
|Attributes|Elements (numbers, strings, offsets, binary, structure references, scripted)|
|<xs:sequence>|Structure containing other structures|
|<xs:restriction>|Definition of fixed values|
The whole magic is shown in the following picture:
What’s the difference between Synalyze It! and Hexinator?
Technically both are the same and share a proven core. Hexinator is basically a “freemium” version of Synalyze It! available on Linux and Windows.
This means Hexinator offers all base functionality (the hex editor) for free and you pay only for what you actually need.
Why didn’t you keep the ‘Synalyze It!’ name?
There are several reasons:
- Program names with spaces or an exclamation mark can cause problems and are harder to type when starting the application from the command line
- I got the feedback that the name “Synalyze It!” is hard to remember. Hexinator is closer to “hex editor” and easier to remember
- I wanted to offer for free the functionality that you can get for free elsewhere (in hex editors). When selling through a store like the Mac App Store, this would mean using in-app purchasing: download for free and pay only for the advanced features. Currently there’s no way to make this transition without annoying the many existing users.
Will ‘Synalyze It!’ and Hexinator be merged into one at a later stage?
There will probably also be a Hexinator version for OS X. Synalyze It! users will get the same updates.
Of course it would be ideal to end up with only one application. However, many users enjoy the advantages of the Mac App Store, like automatic updates and easy installation, so I don’t want to force them onto a version they have to download manually.
What is the roadmap for Hexinator?
There are many ideas, suggestions and wishes. However, the most frequently requested feature is comparison on binary and grammar level. So this is the plan for the near future:
- Comparison of two files on structure (grammar) level
- An SDK that allows using grammars within custom Python scripts
While this is being implemented, grammars will also be published for important file formats like PDF, PE (Windows executables) or Java classes.
Will there be a FreeBSD version?
Technically it would probably be relatively easy. All base libraries of Hexinator are written in plain portable C code, and the integrated open source libraries would also work on FreeBSD.
If enough users committed to buying a FreeBSD version, I’d really consider it.
Where can I find grammars for Hexinator?
The grammars are hosted on the Synalyze It! web site: https://www.synalysis.net/formats.xml
Currently there are about 60 grammars. More grammars relevant for malware analysis and computer forensics are planned.
How can I contribute new grammars?
Simply send the grammar XML file to email@example.com and I’ll publish it. If you want, I’ll even add a link to a web site of your choice.
Can I use one license on Linux and Windows?
Yes, you can activate the license on up to three machines, be it Linux or Windows.
How can I integrate custom text encodings?
Hexinator supports many text encodings by default, but one you need might still be missing. Hexinator uses the ICU library for text conversion, so custom text encodings for ICU also work in Hexinator. Here’s how you do it (see also Generating a new code page converter):
- Create a .ucm file with your text encoding (more info in the ICU User Guide)
- Convert the .ucm file to .cnv
- Copy the .cnv file to the code page directory – on Windows C:\Users\%USER%\AppData\Local\Synalysis\Hexinator\icudt53l and on Linux ~/.local/share/Synalysis/Hexinator/icudt53l
- After the next start your encoding should be selectable
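The copy step above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `code_page_dir` and `install_converter` are hypothetical, and the directory layout is taken directly from the paths listed above. For the conversion step, ICU provides the `makeconv` tool; check the ICU User Guide for the options that match your ICU version.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def code_page_dir(platform: str = sys.platform, home: Optional[Path] = None) -> Path:
    """Return the Hexinator code page directory for the given platform."""
    home = Path.home() if home is None else home
    if platform.startswith("win"):
        # C:\Users\%USER%\AppData\Local\Synalysis\Hexinator\icudt53l
        return home / "AppData" / "Local" / "Synalysis" / "Hexinator" / "icudt53l"
    # ~/.local/share/Synalysis/Hexinator/icudt53l
    return home / ".local" / "share" / "Synalysis" / "Hexinator" / "icudt53l"


def install_converter(cnv_file: Path, target_dir: Optional[Path] = None) -> Path:
    """Copy a compiled .cnv converter into the code page directory
    and return the path of the copied file."""
    target_dir = code_page_dir() if target_dir is None else target_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(cnv_file, target_dir))
```

A call such as `install_converter(Path("myencoding.cnv"))` (the file name is just an example) would place the converter where Hexinator looks for it on the next start.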
How can I enhance a grammar with scripting?
There are two ways to use a script in a grammar:
1. a scripted (“custom”) data type: here the script translates some bytes to a string or back (if the user changed a value). This script is edited in the script editor
2. a script element in a structure. This script is edited in the grammar editor (in the element’s properties)
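The two-way translation behind a scripted data type can be sketched in standalone Python. This does not use Hexinator's scripting API; the function names `bytes_to_text` and `text_to_bytes` are hypothetical, and the sample data type (a 4-byte little-endian Unix time stamp shown in UTC) is an assumption chosen for illustration.

```python
import struct
from datetime import datetime, timezone

TEXT_FORMAT = "%Y-%m-%d %H:%M:%S"


def bytes_to_text(raw: bytes) -> str:
    """Translate 4 raw bytes into the string shown to the user."""
    (seconds,) = struct.unpack("<I", raw)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(TEXT_FORMAT)


def text_to_bytes(text: str) -> bytes:
    """Translate an edited string back into the 4 bytes stored in the file."""
    moment = datetime.strptime(text, TEXT_FORMAT).replace(tzinfo=timezone.utc)
    return struct.pack("<I", int(moment.timestamp()))
```

The important property is that the two directions are inverse to each other, so a value the user did not change is written back unmodified.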
In a script element you have again the choice:
1. parse structures and values on your own and add them to the “results” tree structure
2. call the parser to map a certain structure or element (at a certain or the current position) – the parser will add the parsing results to the tree
You can split your logic into multiple scripts. For example, one script could only collect data and another could process it later.
These grammars (available at https://www.synalysis.net/formats.xml) may help you understand the possibilities:
1. ZIP – here a Lua script gets a structure from the grammar (by name) and lets the parser process it at a certain position (file length – 22). The third script in the grammar maps a structure at the current parsing position. There are some functions for binary operations because Lua didn’t support them when I created the grammar (Python is better here; I don’t know whether Lua supports them now)
2. JPEG – here you find a mixture of direct file reading (“byteView.readByte()”) with usage of the parser (“mapElementWithSize()”)
3. PYC (Python byte code) – here are scripts that translate a time stamp or long object value or call the parser to map a structure at a certain file offset (“mapStructureAtPosition”)
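To show why the ZIP grammar starts at “file length – 22”, here is a standalone Python sketch that reads the end-of-central-directory record at that position. It is not the Lua script from the grammar and does not use Hexinator's parser; the function name and the returned dictionary keys are hypothetical. It assumes the archive has no trailing comment, since only then is the record exactly 22 bytes long.

```python
import struct

EOCD_SIZE = 22                    # record size when there is no archive comment
EOCD_SIGNATURE = b"PK\x05\x06"    # fixed value that identifies the record


def read_end_of_central_directory(data: bytes) -> dict:
    """Read the ZIP end-of-central-directory record at file length - 22."""
    position = len(data) - EOCD_SIZE
    if position < 0 or data[position:position + 4] != EOCD_SIGNATURE:
        raise ValueError("no end-of-central-directory record at file length - 22")
    # Layout: signature, 4 x uint16, 2 x uint32, uint16 (all little-endian).
    (_signature, _disk, _cd_disk, _entries_on_disk, entries,
     cd_size, cd_offset, comment_length) = struct.unpack_from(
        "<4s4H2LH", data, position)
    return {
        "entries": entries,
        "central_directory_size": cd_size,
        "central_directory_offset": cd_offset,
        "comment_length": comment_length,
    }
```

The returned central directory offset is the kind of value a script would then hand to the parser, so that the next structure is mapped at that position.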