Andys Binary Folding Editor

Introduction

Andys Binary Folding Editor is primarily designed for structured browsing, although it also provides minimal editing facilities.

This program is designed to take in a set of binary files, and with the aid of an initialisation file, decode and display the structures within them. BE is particularly suited to displaying non-variable length structures within the files.

This makes examination of known file types easy, and allows rapid and reliable navigation of memory dumps.

Command line arguments

  usage: be [-w width] [-h height] [-c colscheme]
            [-i inifile] {-I incpath} {-D symbol}
            [-d defn] [-a addr]
            [-y symfile] {binfile[@addr]}
  flags: -w width      screen width
         -h height     screen height
         -c colscheme  set colour scheme (0 to 3, default: 0)
         -i inifile    override default initialisation file
         -I incpath    append include path(s) for use by inifile
         -D symbol     pre-$define symbol(s) for use by inifile
         -d defn       initial definition to use (default: main)
         -a addr       initial address to use (default: 0)
         -y symfile    input symbol table file
         binfile@addr  binary file(s) (with optional address, default: 0)

The -w and -h arguments can be used to try to override the current screen size. This doesn't work on UNIX or Win32, but does on OS/2. The -c argument allows you to choose from a small selection of colour schemes.

The -i flag overrides the default initialisation file.

The -I flag affects the operation of the include command in the initialisation file.

The -D flag allows the definition of symbols which may be accessed via the $ifdef and similar directives in the initialisation file.

The initial structure definition and address to decode may be overridden with the -d and -a flags. Normally BE starts by looking up the definition of a 'main' structure, and decoding the data at address 0 as such.

A symbol table may be specified using the -y flag. Each line of the symbol table is of the form :-

  symbolname    472484aa

Note that the address is in hex, and is not preceded by 0x. This conveniently matches the symbol table layout generated by the ARM linker.
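As an illustration only, a symbol table in this layout could be read along the following lines. This is a sketch and not BE's own code: the function name parse_symbol_table is hypothetical, and skipping blank or malformed lines is an assumption :-

```python
def parse_symbol_table(text):
    """Parse lines of the form 'symbolname hexaddress' into a dict."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: blank or malformed lines are skipped
        name, hex_addr = parts
        symbols[name] = int(hex_addr, 16)  # address is hex, with no 0x prefix
    return symbols


table = parse_symbol_table("symbolname    472484aa\n")
print(hex(table["symbolname"]))  # prints 0x472484aa
```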

Multiple input binary files can be specified, and they should be loaded at non-overlapping address ranges.

Typical invocations of BE might be :-

  be -y gizmo.sym gizmo.rom gizmo.ram@0x8000

  be picture.bmp

The initialisation file

One of the first things BE does is to find and load the initialisation file, and this tells BE the layout of various file formats and the structures within them.

Under OS/2 or Windows, BE finds the initialisation file by searching along the path for an .INI file with the same name. Under UNIX, BE looks for ~/.berc, (or ~/.xxrc if the be executable is renamed to xx). BE can be made to look elsewhere using the -i command line option.

This initialisation file may contain C or C++ style comments.

Also, $define, $ifdef, $ifndef, $else, $endif and $error are supported, as a form of a pre-processing/conditional processing step. The -D command line option may be used to pre-$define such conditional processing symbols.

If BE is running on OS/2, then OS2 is pre-$defined. If running on Windows NT, then WIN32 is pre-$defined. If running on a type of UNIX, then UNIX is pre-$defined. If running specifically on AIX, then AIX is pre-$defined. Either BE or LE will be pre-$defined, depending upon whether BE is running on a big-endian or little-endian machine. These $defines allow you to write initialisation files with sensible defaults, relevant for the current environment.

An include directive is supported, and included files will be searched for by looking in the current directory, then along an internal include path, and finally along the PATH environment variable. The internal include path is usually empty, but may be appended to by the use of the -I command line option.
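The search order just described can be sketched as follows. This is an assumption-laden illustration rather than BE's implementation: find_include and its parameters are hypothetical names, and the first match is taken to win :-

```python
import os


def find_include(name, include_path, env_path=None):
    """Return the first existing path for name, or None if not found.

    Search order: current directory, then the internal include path
    (as built up by -I), then the PATH environment variable.
    """
    if env_path is None:
        env_path = os.environ.get("PATH", "")
    candidates = ["."] + list(include_path) + env_path.split(os.pathsep)
    for directory in candidates:
        if not directory:
            continue  # ignore empty PATH entries
        full = os.path.join(directory, name)
        if os.path.isfile(full):
            return full
    return None


# The result depends on what is actually on disk, so none is quoted here.
print(find_include("anotherfile.ini", []))
```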

The initialisation file contains commands to set the default data display attributes, structure definitions, alignment declarations and include statements.

As BE processes the initialisation file, it generates warnings (such as undefined symbol table symbol), and error messages into an internal buffer. If there are no errors, then this buffer is discarded. If there are errors, then all the warnings and errors are listed, and BE aborts.

Numbers

Wherever the initialisation file calls for a number, the following variants may be used :-

number
The number can be signed, specified in binary (eg: 0b1101), octal (eg: 0o15), decimal (eg: 13) or hex (eg: 0x0d).
addr "symbolinthesymboltable"
if a symbol table is loaded, and the symbol can be found then the result is the numeric value of the symbol. Otherwise a warning is generated, and the result is the number 0xffffffff.
sizeof DEFN
this gives the size in bytes of the earlier defined structure DEFN. If DEFN isn't already defined, then an error results.
map MAPNAME "mapletstring"
this gives the numeric value that corresponds to the given string defined in the map definition, as explained below.
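To make the plain number forms concrete, the four example literals above all denote the value 13. The sketch below leans on Python's int() with base 0, which happens to accept the same 0b, 0o and 0x prefixes. It is not BE's parser, and it is more permissive in some details (Python also allows underscores between digits, for instance) :-

```python
def parse_number(text):
    """Parse a signed binary, octal, decimal or hex literal."""
    # Base 0 tells Python to choose the base from the 0b / 0o / 0x prefix.
    return int(text.strip(), 0)


for literal in ("0b1101", "0o15", "13", "0x0d", "-0x0d"):
    print(literal, "=", parse_number(literal))
```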

Expressions may be constructed by use of brackets and also the following operators, with usual C language meanings, listed highest priority first :-

  eg: addr "tablebase" + 4 * sizeof RGB

Such numeric expressions can be used when BE prompts for a number.

Commands to set the default data display attributes

When the program starts parsing the initialisation file, the default data display attributes are le unsigned hex nomul abs nonull nocode nolj noseg nozterm.

To change this default setting, just include one or more of the following keywords in the file :-

Map definitions

These define a mapping between symbolic names and numeric values. A typical mapping definition in the initialisation file might be :-

  map compression_type
    {
    "uncompressed" 1
    "huffman"      2
    "lzw"          3
    }

If the numeric value on display matches the value given, then it can be converted to the textual description.

Bitfields may be achieved in the following fashion :-

  map pending_events
    {
    "reconfiguration" 0x0001 : 0x0001
    "flush_cache"     0x0002 : 0x0002
    "restart_io"      0x0004 : 0x0004
    }

The : symbol introduces an additional mask. The number to string conversion algorithm inside BE works like this :-

  for each maplet in the map
    if ( value & maplet.mask ) == maplet.value then
      display the maplet.name
  if some unexplained bits left over then
      display the remaining value in hex

So, it is possible to have multiple field decodes from a single value :-

  map twobitfields
    {
    "green" 0x0001 : 0x000f
    "blue"  0x0002 : 0x000f
    "red"   0x0003 : 0x000f
    "small" 0x0100 : 0x0f00
    "large" 0x0200 : 0x0f00
    }

The value 0x0243 would be converted to red|large|0x40.
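The conversion algorithm and the twobitfields example above can be checked with a short sketch. The names decode_map and TWOBITFIELDS are hypothetical, each maplet is written as a (name, value, mask) tuple, and showing a bare hex value when nothing matches is an assumption :-

```python
def decode_map(value, maplets):
    """Convert value to text using (name, value, mask) maplets."""
    parts = []
    explained = 0  # union of the masks of every maplet that matched
    for name, want, mask in maplets:
        if (value & mask) == want:
            parts.append(name)
            explained |= mask
    leftover = value & ~explained
    if leftover or not parts:
        parts.append(hex(leftover))  # unexplained bits are shown in hex
    return "|".join(parts)


TWOBITFIELDS = [
    ("green", 0x0001, 0x000F),
    ("blue",  0x0002, 0x000F),
    ("red",   0x0003, 0x000F),
    ("small", 0x0100, 0x0F00),
    ("large", 0x0200, 0x0F00),
]

print(decode_map(0x0243, TWOBITFIELDS))  # prints red|large|0x40
```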

It has been alluded to above, that when supplying numeric expressions, the map keyword may also be used. In the following example, the expression evaluates to 0x0105 :-

  map twobitfields "small" + 5

Structure definitions

Structures are a list of at OFFSET clauses, align ALIGNMENT clauses and field definitions. When the structure definition is processed, then the current-offset is initialised to 0.

An at OFFSET clause moves the current-offset to the specified numeric value.

An align ALIGNMENT clause moves the current-offset to be the next integer multiple of the specified numeric value.

A field definition defines a field which lives at the current-offset into the structure. After definition of the field, the current-offset is moved to the end of the field, so that the next field will immediately follow it (unless another at OFFSET clause is used, or a union is being defined).

The size of the structure is the largest value that the current-offset ever attains. This is the value returned whenever sizeof DEFN is used as a number.

Duplicate definitions of the same named structure are not allowed.

A structure definition may have zero or more fields, align ALIGNMENT clauses and/or at OFFSET clauses.

A structure definition may behave like a C struct definition, in that each field follows on from the previous one in memory. Or it may behave like a C union definition, in that all fields overlay each other in memory, and the total size is the size of the largest field.

  def A_STRUCTURE struct
    {
    n32 "first field, bytes 0 to 3"
    n32 "next field, bytes 4 to 7"
      // sizeof A_STRUCTURE is 8
    }
  def A_UNION union
    {
    n32 "first field, bytes 0 to 3"
    n16 "second field, bytes 0 to 1"
      // sizeof A_UNION is 4
    }

The keyword struct is unnecessary, and may be omitted.

These may be combined, like in the following :-

  def MY_COMPLICATED_STRUCTURE
    {
    n32 "first field, occupying bytes 0 to 3"
    union
      {
      n32 "second field, occupying bytes 4 to 7"
      struct
        {
        n16 "the bottom 16 bits of the second field, occupying bytes 4 to 5"
        n8  "the upper middle byte, occupying byte 6"
        n8  "the top byte, occupying byte 7"
        }
      }
    }

The at OFFSET clause also allows the same areas of a structure to be displayed in more than one way, thus also allowing the implementation of unions :-

  def UNION_THE_HARD_WAY
    {
    n32 le  "first value, bytes 0 to 3"
    at 0 n8 "the lower byte, byte 0"
      // sizeof UNION_THE_HARD_WAY is 4
    }
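The current-offset rules can be modelled in a few lines, which reproduces the sizes quoted in the comments above. This is a sketch under stated assumptions: struct_size and its clause tuples are hypothetical, nested union blocks are not modelled, and align is assumed to leave an already aligned offset where it is :-

```python
def struct_size(clauses):
    """Return the largest value the current-offset ever attains.

    clauses is a list of ("at", offset), ("align", boundary) or
    ("field", size_in_bytes) tuples, in definition order.
    """
    offset = 0
    largest = 0
    for kind, n in clauses:
        if kind == "at":
            offset = n
        elif kind == "align":
            offset = -(-offset // n) * n  # round up to a multiple of n
        elif kind == "field":
            offset += n  # the next field follows this one
        else:
            raise ValueError("unknown clause: %r" % (kind,))
        largest = max(largest, offset)
    return largest


print(struct_size([("field", 4), ("field", 4)]))             # prints 8
print(struct_size([("field", 4), ("at", 0), ("field", 1)]))  # prints 4
```

The first call corresponds to A_STRUCTURE and the second to UNION_THE_HARD_WAY.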

Field definitions

Here are some examples of field definitions :-

  n8 asc "initial"
  n8 buf 20 "surname"
  n16 be unsigned dec "age"
  3 pet "pet names"
  3 n16 be unsigned dec "pet costs"
  2 n32 le unsigned hex ptr person "2 pointers to parents"
  2 n32 ptr person null "2 pointers, null legal"
  person "a person"
  n32 sym code "__main"
  1024 n32 unsigned dec "memory as 32 bit words"
  9 n16 map errorcodes "results"
  buf 100 asc zterm "a C style string"
  GENERIC_POINTER suppress

Each example is of the form :-

  optional-count type optional-attrs name

The field describes count data items of the specified type, count is restricted to being >= 1, and if it is > 1, then the field is initially displayed by just showing its type (eg: 10 n32 le unsigned hex "numbers"). When you select the field, you are presented with an element list, with count lines, from which you can select the element you are interested in.

The type of the data is one of n8, n16, n24, n32, buf N or DEFN, where DEFN is the name of a previously defined structure. This type may be considered to be the way in which BE is told the size of the data item concerned. n8, n16, n24 and n32 mean 8, 16, 24 or 32 bit numeric data item. buf N means a buffer of N bytes.

The field has the default data display attributes, unless data display attribute keywords (as defined above) are included in the field definition.

In addition to the data display attribute keywords given above is the map MAP attribute, which means display the numeric field by looking up a textual equivalent of the numeric value using the mapping, which must have previously been defined.

The ptr DEFN attribute says that the numeric value is in fact a pointer to a structure of type DEFN. DEFN need not be defined yet in the initialisation file. The mul/nomul attribute described above specifies whether to multiply the pointer value by the size of the data item being pointed to. The null/nonull attribute described above specifies whether this pointer may be followed if the numeric value is 0. The keyword add BASE may be used to add the numeric value BASE to the pointer value. Also, the rel/abs attribute described above specifies whether to add the address of the pointer itself to the numeric value. By using combinations of the pointer keywords, various effects may be achieved :-

n32 ptr DEFN abs
fetch pointer value, and decode DEFN at that address. This case is very common for file format decoding and memory dumps.
n32 ptr DEFN add 0x40000 abs
fetch pointer value, add 0x40000, and decode DEFN at that address. This case can be used to handle multiple memory space problems.
n32 ptr DEFN mul add addr "table" abs
fetch pointer value, multiply by the size of a DEFN, add the address of the table (as determined from the symbol table), and decode the DEFN at that address. This case is typical for when the pointer is in fact a table index.
n32 ptr DEFN rel
fetch pointer value, add address of the pointer itself, and decode the DEFN at that address. When a file consists of a list of variable length structures, where the first field is the size of the structure, this provides a handy way to skip past it to the next.
n32 ptr DEFN add 8 rel
fetch pointer value, add address of the pointer itself, add the numeric value 8 (this can be negative), and decode the DEFN at that address. This case is common for when one structure includes a field which identifies an amount of data to skip before the next structure is seen.
n32 le ptr DEFN abs seg
fetch pointer value (explicitly in little endian order), mangle pointer to account for 16:16 segmented mode, and decode the DEFN at that address.

The procedure for following pointers is :-

  1. fetch the pointer's numeric value
  2. if nonull and pointer is 0, then don't follow the pointer.
  3. if mul, then multiply the pointer value by the size of the item being pointed to.
  4. if add BASE, then add BASE to the pointer value.
  5. if rel, then add the address of the pointer itself.
  6. if seg, then mangle pointer address to account for the 16:16 segmented mode of x86 processors.
  7. decode and display data item at resultant address.

The seg keyword works by taking the top 16 bits of the pointer value as the segment, the bottom as the offset, and producing a new pointer value which is segment*16+offset. This feature may be of use for decoding large memory model program dumps which have been running on x86 processors running in real mode, or a 16:16 protected mode with a linear selector mapping. Anyone with a sensible file format to decode, or a dump taken from the memory space of a processor of a sensible architecture, can ignore this feature.
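The pointer following procedure, including the seg calculation, can be expressed as a small function. This is a sketch, not BE's code: resolve_pointer and its keyword arguments are hypothetical names, and returning None stands in for 'don't follow the pointer' :-

```python
def resolve_pointer(value, pointer_addr, item_size, *, mul=False, add=0,
                    rel=False, seg=False, null=False):
    """Return the address to decode at, or None if not to be followed."""
    if value == 0 and not null:
        return None                    # step 2: nonull and pointer is 0
    if mul:
        value *= item_size             # step 3: treat value as an index
    value += add                       # step 4: add BASE (0 if not given)
    if rel:
        value += pointer_addr          # step 5: relative to the pointer
    if seg:
        segment = (value >> 16) & 0xFFFF
        offset = value & 0xFFFF
        value = segment * 16 + offset  # step 6: 16:16 to linear address
    return value


# A table index of 3 into 12 byte items, with the table at 0x1000.
print(hex(resolve_pointer(3, 0, 12, mul=True, add=0x1000)))  # prints 0x1024
# Segment 0x1234, offset 0x0010.
print(hex(resolve_pointer(0x12340010, 0, 1, seg=True)))      # prints 0x12350
```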

The keyword open may be given and this has the effect of increasing the level of detail that is initially displayed. See the description of the level of detail of display feature later in this document. This feature has its problems (bugs), but can be used to ensure that small arrays and short structures are displayed in full without the user having to manually increase the level of detail.

Also, the suppress keyword may be used. Normally all fields are shown when a definition is being viewed, but some can be marked as suppressed. When a whole definition is shown on one line (by expanding the level of detail of display), those fields marked with suppress are not shown.

Finally the name of the field must be given.

Alignment declarations

Normally, when parsing a structure definition, each field is positioned immediately after the one before (unless the union, align, or at keywords are used).

When BE begins processing the initialisation file, it believes that all n8, n16, n24 and n32 variables should be aligned on a 1 byte boundary. In other words, no special alignment is to be automatically performed.

This is radically different from the way high level languages such as C lay out the fields within their structures. These languages enforce constraints such as '32 bit integers are aligned on 4 byte boundaries'. This is usually done because certain processor architectures either can't access certain sizes of data from odd alignments, or are slower doing so. This can be accounted for by manually adding padding to structure definitions :-

  def ALIGNED_USING_MANUAL_PADDING
    {
    n8 "fred"
    buf 3 "padding to align bill on a 4 byte boundary"
    n32 "bill"
    }

Or alternatively, the align keyword could be used :-

  def ALIGN_USING_align_KEYWORD
    {
    n8 "fred"
    align 4
    n32 "bill"
    }

It is possible to tell BE to automatically align n8, n16, n24 or n32 fields on specific byte (offset) boundaries by constructs such as the following (which corresponds to many 32 bit C compilers) :-

  align n16 2
  align n32 4

  def ALIGNED_AUTOMATICALLY
    {
    n8 "fred"
    n32 "bill"
    }

Clearly, this feature is more useful when BE is being used to probe memory spaces of running programs via an extension, or doing post-mortem examination of program dumps.

Most data file formats don't-need-to and/or don't-bother-to align their fields.
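The effect of the align n16 2 and align n32 4 declarations on ALIGNED_AUTOMATICALLY can be illustrated as below. The helper field_offsets is hypothetical and only models plain numeric fields. It assumes that each field is rounded up to its declared boundary before being placed :-

```python
def field_offsets(fields, alignment):
    """Place (type, size, name) fields, honouring per-type alignment.

    alignment maps a type name to its byte boundary. Types that are
    not listed default to 1, meaning no automatic alignment.
    Returns ([(name, offset), ...], total_size).
    """
    offset = 0
    placed = []
    for type_name, size, name in fields:
        boundary = alignment.get(type_name, 1)
        offset = -(-offset // boundary) * boundary  # round up to boundary
        placed.append((name, offset))
        offset += size
    return placed, offset


fields = [("n8", 1, "fred"), ("n32", 4, "bill")]
print(field_offsets(fields, {}))                    # bill at offset 1, size 5
print(field_offsets(fields, {"n16": 2, "n32": 4}))  # bill at offset 4, size 8
```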

Include directives

The initialisation file can contain the following, as long as it is outside of any other definition :-

  include "anotherfile.ini"

A sample initialisation file

Here is a snippet from a real initialisation file :-

le unsigned hex abs // set defaults, just to be sure
lj // allow ARM specific symbolic lookup of code addresses

map DE_
  {
  "DP_Pending" -1
  "DS_Success"  0
  "DE_Failure"  1
  }

def DPB
  {
  n32 ptr DPB       "DPB_Next   "
  n32 sym code      "DPB_Address"
  n8 map DC_        "DPB_Number "
  n8                "DPB_Flag2  "
  n8 map SY_        "DPB_Flag   "
  n8 signed map DE_ "DPB_Dsb    "
  n32               "DPB_Safety "
  }

def NOP
  {
  DPB     "NOP_Header"
  n8      "NOP_Spare1"
  n8      "NOP_Spare2"
  n8      "NOP_Spare3"
  n8 dec  "NOP_Period"
  n32 dec "NOP_Value "
  CLK     "NOP_Clock "
  }

def main // the entire memory map
  {
  at addr "noptable"   100 NOP     "noptable  "
  at addr "currentdpb" n32 ptr DPB "currentdpb"
  }

The supplied initialisation file

The supplied initialisation file contains enough definitions to enable you to examine the contents of many image file formats.

These include Windows / OS/2 Bitmaps, Targa files, KIPS files, ZSoft PCX, M-Motion Video, TIFF, ILBM IFF, Compu$erve GIF, RiscOS sprite, IBM PSEG, and OS/2 resource files.

The definitions in the initialisation file are in no way complete, or intended to be a definitive statement of such files' contents, but are merely intended to aid in the browsing of the contents of such files.

Limitations of BE make it awkward to decode certain data structures in some files, so the attitude taken is typically 'display as best you can', and where data may be of variable length 'display the first few bytes worth...'.

Using the editor

Although not displayed, the arrow keys, such as Up, Down, PgUp, PgDn, Home and End all work in the obvious ways, traversing the list on display. The Wordstar keys ^E, ^X, ^R, ^C, ^W and ^Z also work.

BE displays the non-obvious keys you may press on the 2nd line of its status area, at the top of the screen.

q or @X (ie: Alt+X) exits the program. If you have made any changes, you will be prompted as to whether BE should write them out to disk. On machines which support it, @W can write out any unsaved changes.

Esc exits the current screen back to the previous screen.

p allows you to 'print' the list on display to a file. You can specify the filename, and whether to append to or overwrite any existing file of that name.

f allows you to do a find over the list on display. This only searches as much as the user could see if he were to manually page up and down through the list. The find command is case sensitive. n can be used to repeat the last find. If a find is taking a long time, it may be interrupted using Ctrl+Break on OS/2 and Windows (unfortunately not on UNIX though).

i allows you to generate a display which only has lines which include a pattern you specify. For example, if you have an array of trace-point events, you can easily generate a list of just trace-points from one module. Similarly, x allows you to generate a display which excludes lines which match the pattern. Esc exits back to the original display.

The keys A,O,L,I toggle the display of addresses, offsets, lengths and array indices. On machines which support it, @B, @O, @D and @H may be used to set the display mode of the array indices to binary, octal, decimal or hex. Also, on machines which support it, @Y toggles the display of addresses between raw hex, and symbol table entry and offset.

The t key decodes the current field as if it were raw ASCII text, breaking it up into lines upon CR, LF or CR-LF pair boundaries.
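The line breaking rule for the t key amounts to something like the following sketch. The function name split_text_lines is hypothetical. Dropping the terminators, and matching CR-LF ahead of a lone CR or LF so that a pair counts as one break, are assumptions about the intended behaviour :-

```python
import re


def split_text_lines(data):
    """Split raw bytes into lines on CR-LF, CR or LF boundaries."""
    # CR-LF is listed first so that a pair is treated as a single break.
    return re.split(rb"\r\n|\r|\n", data)


print(split_text_lines(b"one\r\ntwo\nthree\rfour"))
# prints [b'one', b'two', b'three', b'four']
```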

The r key causes a refresh. BE re-fetches all the data on display. The R key is a slightly more aggressive form of refresh. If an extension providing data to BE was caching data, this type of refresh causes it to drop its cache.

g/l is displayed if you are allowed to change the memory interpretation mode to big or little endian.

s/u is displayed if you are allowed to change the signed display mode to signed or unsigned.

A subset of the keys a/e/b/o/d/h/y/m may be displayed if you are allowed to change the viewing mode to ASCII, EBCDIC, binary, octal, decimal, hex, symbolic or via a mapping table.

z is displayed if you are allowed to toggle the attribute that stops the display when a NUL terminator is found.

+/- is displayed to indicate that the level of detail of display may be increased or decreased. Level 0 means display the data type only. Level 1 means display the first level of data. Levels 2 and above mean display additional levels of detail.

Increasing the level of display can make BE open up an array, and enumerate the elements. eg: 3 n32 to [123,123,456].

Increasing the level of display can also make BE open up a definition, and display the fields. eg: VAR to {"name",123}.

This is capable of opening up the data structure pointed to by a pointer, providing the pointer may be fetched and followed.

Some examples :-

  level 0 (=type)  level 1       level 2            level 3
  n32              7             7                  7
  3 n32            3 n32         [8,9,10]           [8,9,10]
  VAR              VAR           {"a",1}            {"a",1}
  2 VAR            2 VAR         [VAR,VAR]          [{"b",2},{"c",3}]
  n16 ptr VAR      22->VAR       22->{"d",4}        22->{"d",4}
  2 n8 ptr VAR     2 n8 ptr VAR  [33->VAR,44->VAR]  [33->{"e",5},44->{"f",6}]
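One way to read the examples is as a recursive rule: arrays and definitions use up one level before opening, while a pointer passes the level straight through to its target. The sketch below is inferred purely from the examples and is not taken from BE itself. The render function and its tuple encoding of items are hypothetical :-

```python
def render(item, level):
    """Render an item at the given level of detail.

    item is one of:
      ("num",    type_name, value)
      ("text",   type_name, string)
      ("array",  type_name, [items])
      ("struct", type_name, [field items])
      ("ptr",    type_name, value, target_item)
    """
    kind, type_name = item[0], item[1]
    if level == 0:
        return type_name                  # level 0 shows the type only
    if kind == "num":
        return str(item[2])
    if kind == "text":
        return '"%s"' % item[2]
    if kind == "ptr":
        # The target is shown at the same level as the pointer itself.
        return "%d->%s" % (item[2], render(item[3], level))
    if level == 1:
        return type_name                  # not yet opened up
    inner = ",".join(render(child, level - 1) for child in item[2])
    return ("[%s]" if kind == "array" else "{%s}") % inner


var = ("struct", "VAR", [("text", "buf 1", "d"), ("num", "n32", 4)])
pointer = ("ptr", "n16 ptr VAR", 22, var)
print([render(pointer, level) for level in range(4)])
# prints ['n16 ptr VAR', '22->VAR', '22->{"d",4}', '22->{"d",4}']
```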

Enter is displayed if you can press enter to either show the contents of the sub-definition, or to follow a pointer and show the definition there. The Esc key brings you back to where you are now.

Pressing @ will cause BE to prompt for a structure definition name, and then an address. It will then decode the memory at the given address as if it were of the specified structure type.

The = key may be used to edit the current field on display.

If the current field is a numeric value, then you can type a new expression, according to the rules for numbers and expressions used when parsing the initialisation file. Examples include :-

  1
  1+2
  addr "symbol"
  sizeof RGBTRIPLE
  map FF_ "FF_Split" | 0x20

If the current field is a buffer, then either ASCII data or raw hex bytes may be supplied :-

  "a string within quotes"
  @1234FF00

If the zterm attribute is applicable to the current field, then after the data is stored, a NUL terminator is appended.
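The two buffer input forms might be turned into bytes roughly as follows. This is a sketch only: parse_buffer_input is a hypothetical name, and the handling of quotes (no escape sequences) and of malformed input is assumed rather than documented :-

```python
def parse_buffer_input(text, zterm=False):
    """Convert '"string"' or '@hexbytes' input into the bytes to store."""
    if text.startswith("@"):
        data = bytes.fromhex(text[1:])        # raw hex bytes
    elif len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        data = text[1:-1].encode("ascii")     # ASCII data within quotes
    else:
        raise ValueError('expected "string" or @hexbytes')
    if zterm:
        data += b"\x00"                       # append the NUL terminator
    return data


print(parse_buffer_input("@1234FF00"))          # prints b'\x124\xff\x00'
print(parse_buffer_input('"hi"', zterm=True))   # prints b'hi\x00'
```

Note that Python prints the byte 0x34 as the character 4, so the first result is the four bytes 12 34 FF 00.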

Extensions

The binary file arguments to BE are normally of the form :-

  filename[@address]

This tells BE to load the file and whenever data at a memory address from address to address+filelength is accessed, to supply the data from the file.
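Conceptually, each loaded file covers one address range, and a read is satisfied by whichever file covers the address. The sketch below illustrates the idea with a hypothetical MemoryMap class. Returning None for addresses that no file covers is an assumption, as is refusing reads that straddle the end of a file :-

```python
class MemoryMap:
    """Maps address ranges onto the contents of loaded files."""

    def __init__(self):
        self.regions = []  # list of (base_address, data) pairs

    def load(self, data, base=0):
        """Make data appear at addresses base .. base+len(data)."""
        self.regions.append((base, data))

    def read(self, addr, length):
        """Return length bytes at addr, or None if no file covers them."""
        for base, data in self.regions:
            if base <= addr and addr + length <= base + len(data):
                start = addr - base
                return data[start:start + length]
        return None


# Rough equivalent of: be gizmo.rom gizmo.ram@0x8000
memory = MemoryMap()
memory.load(b"ROM!", 0)
memory.load(b"RAMDATA", 0x8000)
print(memory.read(0x8002, 3))  # prints b'MDA'
```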

However, it is possible to supply binary file arguments of the form :-

  extension!args[@address]

Under OS/2, BE will ensure that BEextension.DLL is loaded. This DLL should be on the LIBPATH and should contain certain entrypoints which will be used by BE. BE then passes the args and address to the extension DLL, which does something of its own choosing with them. The extension DLL can then supply data to BE on request.

Under Windows, provision for extension DLLs also exists. The DLL is located according to the algorithm used by the Win32 LoadLibrary API.

Under AIX, extensions may be provided as shared libraries. They are located by following the PATH environment variable, and are named beextension.

One use of this might be the provision of an extension for handling files too massive to load into memory all at once. The extension could open a file handle and read bytes demanded by BE upon request. This extension could be provided in BEBIGFIL.DLL, and the user could type :-

  be bigfil!verybigfile.dat

Another use might be in live-debug of adapter cards. The extension would provide data bytes from the memory space of the adapter. args could be used to identify the slot the adapter is in.

Yet another use might be providing BE with access to physical, virtual or process specific linear address spaces, perhaps via the use of a device driver. Shared memory windows might give addressability of data structures in other programs.

Also, the surface of a disk or block device could be made accessible via an extension.

Perhaps bytes sent down a communications port could be made to appear as a stream of binary data.

The file bememext.h documents the extension interface. Currently extensions may only be built for the OS/2 version of BE using the IBM VisualAge C++ compiler, the Win32 version of BE using MS Visual C++, or the AIX version of BE using the IBM xlC++ compiler.

I anticipate learning about shared library support on the various different types of UNIX, enabling similar tricks to be performed there. Apparently this area is becoming more standardised, with the new dlopen, dlsym and dlclose entrypoints.

Flushing

When editing files, changes to the data are recorded in memory. When BE is closed down, it attempts to write any changes back into the disk files where the data originally came from. BE will prompt you as to whether to save the changes back to disk.

If an extension is providing the data to BE for display, and the extension supports modification of the data, it has a choice :-

As most extensions provide a live view of some real-time data, most opt for the first choice.

Installation

BE can be found on the Hobbes FTP site.

The usual supplied be.zip file should be expanded using unzip be or pkunzip -d be on an OS/2 or Windows machine.

You get a selection of executables, and the one to pick depends upon which operating system you wish to run :-

be_os2.exe
Runs on 32 bit OS/2.
be_win.exe
Runs on Windows NT.
be
Runs on AIX.

Installing BE for OS/2

  1. Copy be_os2.exe to be.exe, somewhere on the path.
  2. Copy be.ini to the same directory as be.exe so it can be found.
  3. Optionally copy be.htm to wherever you keep documentation.
  4. Optionally copy be.ico to the same directory as be.exe. This allows BE to have a cute icon when running in the Workplace shell.
  5. Optionally create a Workplace Shell Program Object(s) that references the BE executable. The working directory should be the directory where be.ini can be found.

Installing BE for Windows NT

  1. Copy be_win.exe to be.exe, somewhere on the path.
  2. Copy be.ini to the same directory as be.exe so it can be found.
  3. Optionally copy be.htm to wherever you keep documentation.

Installing BE for UNIX, ie: AIX

  1. Copy the be executable to somewhere like /usr/bin or ~/bin, or wherever on the path you consider appropriate.
  2. Copy the be.ini to .berc in your home directory, or make a soft link to a common .berc somewhere from your home directory.
  3. Optionally copy be.htm to wherever you keep documentation.

Unfortunately I don't have continual access to all the platforms, so improvements in one version may not yet be reflected into the others.

Copying

Copying of this program is encouraged, as it is fully public domain. The source code is not publicly available. Caveat Emptor.


This documentation is written and maintained by the Binary Editor author, Andy Key
nyangau@aladdin.co.uk