Andy's Binary Folding Editor (BE) is primarily designed for structured browsing, although it also provides minimal editing facilities.
This program takes a set of binary files and, with the aid of an initialisation file, decodes and displays the structures within them. BE is particularly suited to displaying fixed-length structures within the files.
This makes examination of known file types easy, and allows rapid and reliable navigation of memory dumps.
usage: be [-w width] [-h height] [-c colscheme] [-i inifile] {-I incpath} {-D symbol} [-d defn] [-a addr] [-y symfile] {binfile[@addr]}
flags:
  -w width       screen width
  -h height      screen height
  -c colscheme   set colour scheme (0 to 3, default: 0)
  -i inifile     override default initialisation file
  -I incpath     append include path(s) for use by inifile
  -D symbol      pre-$define symbol(s) for use by inifile
  -d defn        initial definition to use (default: main)
  -a addr        initial address to use (default: 0)
  -y symfile     input symbol table file
  binfile@addr   binary file(s) (with optional address, default: 0)
The -w and -h arguments can be used to try to override the current screen size.
This works on OS/2, but not on UNIX or Win32.
The -c argument allows you to choose from a small selection of colour schemes.
The -i flag overrides the default initialisation file.
The -I flag affects the operation of the include command in the initialisation file.
The -D flag allows the definition of symbols which may be accessed via the $ifdef and similar directives in the initialisation file.
The initial structure definition and address to decode may be overridden with the -d and -a flags.
Normally BE starts by looking up the definition of a 'main' structure, and decoding the data at address 0 as such.
A symbol table may be specified using the -y flag.
Each line of the symbol table is of the form :-
symbolname 472484aa
Note that the address is in hex, and is not preceded by 0x.
This conveniently matches the symbol table layout generated by the ARM linker.
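The symbol table format described above is simple enough to read in a few lines. The following Python sketch is only an illustration of the format, not BE's own parser; the function name parse_symbol_table and the decision to skip malformed lines are assumptions.

```python
def parse_symbol_table(text):
    """Parse 'symbolname hexaddress' lines into a dict of name -> address."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: blank or malformed lines are skipped
        name, hex_addr = parts
        symbols[name] = int(hex_addr, 16)  # address is bare hex, no 0x prefix
    return symbols


table = parse_symbol_table("symbolname 472484aa\n__main 00008000\n")
```

A lookup such as table["symbolname"] then gives the numeric address that an expression like addr "symbolname" would refer to.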
Multiple input binary files can be specified, and they should be loaded at non-overlapping address ranges.
Typical invocations of BE might be :-
be -y gizmo.sym gizmo.rom gizmo.ram@0x8000
be picture.bmp
One of the first things BE does is to find and load the initialisation file, and this tells BE the layout of various file formats and the structures within them.
Under OS/2 or Windows, BE finds the initialisation file by searching along the path for an .INI file with the same name as the executable.
Under UNIX, BE looks for ~/.berc (or ~/.xxrc if the be executable is renamed to xx).
BE can be made to look elsewhere using the -i command line option.
This initialisation file may contain C or C++ style comments.
Also, $define, $ifdef, $ifndef, $else, $endif and $error are supported, as a form of pre-processing/conditional processing step.
The -D command line option may be used to pre-$define such conditional processing symbols.
If BE is running on OS/2, then OS2 is pre-$defined.
If running on Windows NT, then WIN32 is pre-$defined.
If running on a type of UNIX, then UNIX is pre-$defined.
If running specifically on AIX, then AIX is pre-$defined.
Either BE or LE will be pre-$defined, depending upon whether BE is running on a big-endian or little-endian machine.
These $defines allow you to write initialisation files with sensible defaults, relevant for the current environment.
An include directive is supported, and included files will be searched for by looking in the current directory, then along an internal include path, and finally along the PATH environment variable.
The internal include path is usually empty, but may be appended to by the use of the -I command line option.
The initialisation file contains commands to set the default data display attributes, structure definitions, alignment declarations and include statements.
As BE processes the initialisation file, it generates warnings
(such as undefined symbol table symbol), and error messages
into an internal buffer.
If there are no errors, then this buffer is discarded.
If there are errors, then all the warnings and errors are listed,
and BE aborts.
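Numbers
Wherever the initialisation file calls for a number, the following variants may be used :-
number - a literal in binary (eg: 0b1101), octal (eg: 0o15), decimal (eg: 13) or hex (eg: 0x0d).
addr "symbolinthesymboltable" - the address of the named symbol, looked up in the symbol table. The value 0xffffffff is associated with this form, presumably as the result when the symbol is not found.
sizeof DEFN - the size of the named definition.
map MAPNAME "mapletstring" - the numeric value of the named maplet within the named map.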
Expressions may be constructed by use of brackets and also the following operators, with usual C language meanings, listed highest priority first :-
+, -, ~, ! (unary)
/, *, %, &
+, -, |, ^
eg: addr "tablebase" + 4 * sizeof RGB
Such numeric expressions can be used when BE prompts for a number.
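The four literal forms listed above happen to match the prefixes that Python's int() accepts when given base 0, so their equivalence can be checked quickly. This is only an illustration of the notation; parse_number is a hypothetical helper, and it does not handle addr, sizeof, map or operators.

```python
def parse_number(text):
    """Parse a plain literal in binary (0b), octal (0o), decimal or hex (0x)."""
    # Base 0 tells int() to pick the base from the prefix, if any.
    return int(text, 0)


# All four spellings from the text denote the same value.
values = [parse_number(s) for s in ("0b1101", "0o15", "13", "0x0d")]
```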
Commands to set the default data display attributes
When the program starts parsing the initialisation file, the default data display attributes are le unsigned hex nomul abs nonull nocode nolj noseg nozterm.
To change this default setting, just include one or more of the following keywords in the file :-
be - read multibyte values from memory in a big-endian fashion.
le - read multibyte values from memory in a little-endian fashion.
signed - when fetching numeric values sign extend them, and when displaying numerically show '+signedvalue' or '-signedvalue'.
unsigned - when fetching numeric values zero extend them, and when displaying numerically show 'unsignedvalue'.
asc - set display mode to ASCII.
ebc - set display mode to EBCDIC.
bin - set display mode to binary.
oct - set display mode to octal.
dec - set display mode to decimal.
hex - set display mode to hex.
sym - set display mode to symbolic, ie: look up the value in the symbol table, and if found, display symbol+hexoffset, else display value in hex.
null - allow following of 0 pointers.
nonull - disallow following of 0 pointers.
seg - cope with 16:16 segmented pointers.
noseg - pointers are not segmented.
mul - pointer values should be multiplied by the size of the data type being pointed to.
nomul - pointer values are given in regular byte addresses.
abs - pointer values are absolute.
rel - pointer values are to be considered relative to their own addresses.
code - specify that numeric value is actually a code address.
nocode - specify that numeric value is not a code address.
lj - perform ARM specific long-jump interpretation of code addresses.
nolj - don't do long-jump interpretation.
zterm - stop displaying buf data when a nul terminator is reached.
nozterm - display data beyond nul terminators.
A map defines a mapping between symbolic names and numeric values. A typical map definition in the initialisation file might be :-
map compression_type
  {
  "uncompressed" 1
  "huffman" 2
  "lzw" 3
  }
If the numeric value on display matches the value given, then it can be converted to the textual description.
Bitfields may be achieved in the following fashion :-
map pending_events
  {
  "reconfiguration" 0x0001 : 0x0001
  "flush_cache" 0x0002 : 0x0002
  "restart_io" 0x0004 : 0x0004
  }
The : symbol introduces an additional mask.
The number to string conversion algorithm inside BE works like this :-
for each maplet in the map
  if ( value & maplet.mask ) == maplet.value then
    display the maplet.name
if some unexplained bits left over then
  display the remaining value in hex
So, it is possible to have multiple field decodes from a single value :-
map twobitfields
  {
  "green" 0x0001 : 0x000f
  "blue" 0x0002 : 0x000f
  "red" 0x0003 : 0x000f
  "small" 0x0100 : 0x0f00
  "large" 0x0200 : 0x0f00
  }
The value 0x0243 would be converted to red|large|0x40.
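The conversion just described can be sketched in Python as follows. This is not BE's source; the tuple layout and the function name decode are illustrative, and treating "unexplained bits" as the bits not covered by the mask of any matching maplet is an assumption.

```python
# Each maplet is (name, value, mask), mirroring "name" VALUE : MASK.
TWOBITFIELDS = [
    ("green", 0x0001, 0x000F),
    ("blue",  0x0002, 0x000F),
    ("red",   0x0003, 0x000F),
    ("small", 0x0100, 0x0F00),
    ("large", 0x0200, 0x0F00),
]


def decode(value, maplets):
    """Convert a number to 'name|name|0xleftover' using the given maplets."""
    parts = []
    explained = 0
    for name, want, mask in maplets:
        if (value & mask) == want:
            parts.append(name)
            explained |= mask  # assumption: a match explains its whole mask
    leftover = value & ~explained
    if leftover or not parts:
        parts.append(hex(leftover))
    return "|".join(parts)


print(decode(0x0243, TWOBITFIELDS))  # prints red|large|0x40
```

The low nibble 3 matches red, the 0x0f00 field matches large, and the remaining bit 0x40 is covered by no mask, so it is shown in hex.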
It has been alluded to above that, when supplying numeric expressions, the map keyword may also be used.
In the following example, the expression evaluates to 0x0105 :-
map twobitfields "small" + 5
Structures are a list of at OFFSET clauses, align ALIGNMENT clauses and field definitions.
When the structure definition is processed, the current-offset is initialised to 0.
An at OFFSET clause moves the current-offset to the specified numeric value.
An align ALIGNMENT clause moves the current-offset to be the next integer multiple of the specified numeric value.
A field definition defines a field which lives at the current-offset into the structure.
After definition of the field, the current-offset is moved to the end of the field, so that the next field will immediately follow it (unless another at OFFSET clause is used, or a union is being defined).
The size of the structure is the largest value that the current-offset ever attains.
This is the value returned whenever sizeof DEFN is used as a number.
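The current-offset bookkeeping described above can be modelled in a few lines of Python. This is a sketch of the stated rules rather than BE's implementation; the clause tuples and the name layout are hypothetical, unions are left out, and it assumes that align leaves an already aligned offset where it is.

```python
def layout(clauses):
    """Return ({field name: offset}, structure size) for a list of clauses.

    Each clause is ("at", offset), ("align", alignment)
    or ("field", name, size_in_bytes).
    """
    offset = 0   # the current-offset, initialised to 0
    size = 0     # largest value the current-offset ever attains
    offsets = {}
    for clause in clauses:
        kind = clause[0]
        if kind == "at":
            offset = clause[1]
        elif kind == "align":
            n = clause[1]
            offset = (offset + n - 1) // n * n  # round up to a multiple of n
        elif kind == "field":
            _, name, nbytes = clause
            offsets[name] = offset
            offset += nbytes  # next field follows immediately
        else:
            raise ValueError("unknown clause: %r" % (kind,))
        size = max(size, offset)
    return offsets, size


# An n8, then align 4, then an n32: the n32 lands at offset 4, size is 8.
offsets, size = layout([("field", "a", 1), ("align", 4), ("field", "b", 4)])
```

Because the size is the high-water mark of the current-offset, moving back with at 0 after a 4 byte field and then defining a 1 byte field still gives a size of 4.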
Duplicate definitions of the same named structure are not allowed.
A structure definition may have zero or more fields, align ALIGNMENT clauses and/or at OFFSET clauses.
A structure definition may behave like a C struct definition, in that each field follows on from the previous one in memory.
Or it may behave like a C union definition, in that all fields overlay each other in memory, and the total size is the size of the largest field.
def A_STRUCTURE struct
  {
  n32 "first field, bytes 0 to 3"
  n32 "next field, bytes 4 to 7"
  // sizeof A_STRUCTURE is 8
  }
def A_UNION union
  {
  n32 "first field, bytes 0 to 3"
  n16 "second field, bytes 0 to 1"
  // sizeof A_UNION is 4
  }
The keyword struct is unnecessary, and may be omitted.
These may be combined, like in the following :-
def MY_COMPLICATED_STRUCTURE
  {
  n32 "first field, occupying bytes 0 to 3"
  union
    {
    n32 "second field, occupying bytes 4 to 7"
    struct
      {
      n16 "the bottom 16 bits of the second field, occupying bytes 4 to 5"
      n8 "the upper middle byte, occupying byte 6"
      n8 "the top byte, occupying byte 7"
      }
    }
  }
The at OFFSET clause also allows the same areas of a structure to be displayed in more than one way, thus also allowing the implementation of unions :-
def UNION_THE_HARD_WAY
  {
  n32 le "first value, bytes 0 to 3"
  at 0
  n8 "the lower byte, byte 0"
  // sizeof UNION_THE_HARD_WAY is 4
  }
n8 asc "initial"
n8 buf 20 "surname"
n16 be unsigned dec "age"
3 pet "pet names"
3 n16 be unsigned dec "pet costs"
2 n32 le unsigned hex ptr person "2 pointers to parents"
2 n32 ptr person null "2 pointers, null legal"
person "a person"
n32 sym code "__main"
1024 n32 unsigned dec "memory as 32 bit words"
9 n16 map errorcodes "results"
buf 100 asc zterm "a C style string"
GENERIC_POINTER suppress
Each of the field definitions above is of the form :-
optional-count type optional-attrs name
The field describes count data items of the specified type.
count is restricted to being >= 1, and if it is > 1, then the field is initially displayed by just showing its type (eg: 10 n32 le unsigned hex "numbers").
When you select the field, you are presented with an element list, with count lines, from which you can select the element you are interested in.
The type of the data is one of n8, n16, n24, n32, buf N or DEFN, where DEFN is the name of a previously defined structure.
This type may be considered to be the way in which BE is told the size of the data item concerned.
n8, n16, n24 and n32 mean an 8, 16, 24 or 32 bit numeric data item.
buf N means a buffer of N bytes.
The field has the default data display attributes, unless data display attribute keywords (as defined above) are included in the field definition.
In addition to the data display attribute keywords given above is the map MAP attribute, which means display the numeric field by looking up a textual equivalent of the numeric value using the mapping, which must have previously been defined.
The ptr DEFN attribute says that the numeric value is in fact a pointer to a structure of type DEFN.
DEFN need not be defined yet in the initialisation file.
The mul/nomul attribute described above specifies whether to multiply the pointer value by the size of the data item being pointed to.
The null/nonull attribute described above specifies whether this pointer may be followed if the numeric value is 0.
The keyword add BASE may be used to add the number BASE to the pointer value.
Also, the rel/abs attribute described above specifies whether to add the address of the pointer itself to the numeric value.
By using combinations of the pointer keywords, various effects may be achieved :-
n32 ptr DEFN abs
n32 ptr DEFN add 0x40000 abs
n32 ptr DEFN mul add addr "table" abs
n32 ptr DEFN rel
n32 ptr DEFN add 8 rel
n32 le ptr DEFN abs seg
The procedure for following pointers is :-
1. If nonull and the pointer is 0, then don't follow the pointer.
2. If mul, then multiply the pointer value by the size of the item being pointed to.
3. If add BASE, then add BASE to the pointer value.
4. If rel, then add the address of the pointer itself.
5. If seg, then mangle the pointer address to account for the 16:16 segmented mode of x86 processors.
The seg keyword works by taking the top 16 bits of the pointer value as the segment, the bottom 16 bits as the offset, and producing a new pointer value which is segment*16+offset.
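The pointer procedure above, including the seg arithmetic, can be written out as a short Python sketch. It follows the steps in the order listed; the function name follow and its keyword parameters are illustrative and are not part of BE.

```python
def follow(value, ptr_addr, target_size, *, null=False, mul=False,
           base=None, rel=False, seg=False):
    """Return the address a pointer leads to, or None if it is not followed.

    value       - the numeric value fetched from the pointer field
    ptr_addr    - the address at which the pointer itself lives
    target_size - size in bytes of the item being pointed to
    """
    if not null and value == 0:
        return None                  # nonull: 0 pointers are not followed
    if mul:
        value *= target_size         # value is an index, not a byte address
    if base is not None:
        value += base                # add BASE
    if rel:
        value += ptr_addr            # relative to the pointer's own address
    if seg:
        segment = (value >> 16) & 0xFFFF
        offset = value & 0xFFFF
        value = segment * 16 + offset  # 16:16 real mode style address
    return value


# A pointer holding index 3 into a table of 8 byte items based at 0x40000.
target = follow(3, 0x100, 8, mul=True, base=0x40000)
```

With seg, a value of 0x12340010 is split into segment 0x1234 and offset 0x0010, giving 0x12340 + 0x10 = 0x12350.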
This feature may be of use for decoding large memory model program dumps
which have been running on x86 processors running in real mode, or a 16:16
protected mode with a linear selector mapping.
Anyone with a sensible file format to decode, or a dump taken from the
memory space of a processor of a sensible architecture, can ignore this
feature.
The keyword open may be given, and this has the effect of increasing the level of detail that is initially displayed.
See the description of the level of detail of display feature later in this document.
This feature has its problems (bugs), but can be used to ensure that small arrays and short structures are displayed in full without the user having to increase the level of detail by hand.
Also, the suppress keyword may be used.
Normally all fields are shown when a definition is being viewed, but some can be marked as suppressed.
When a whole definition is shown on one line (by expanding the level of detail of display), those fields marked with suppress are not shown.
Finally the name of the field must be given.
Alignment declarations
Normally, when parsing a structure definition, each field is positioned immediately after the one before (unless the union, align, or at keywords are used).
When BE begins processing the initialisation file, it believes that all n8, n16, n24 and n32 variables should be aligned on a 1 byte boundary.
In other words, no special alignment is to be automatically performed.
This is radically different from the way high level languages such as C lay out the fields within their structures. These languages enforce constraints such as '32 bit integers are aligned on 4 byte boundaries'. This is usually done because certain processor architectures either can't access certain sizes of data from odd alignments, or are slower doing so. This can be accounted for by manually adding padding to structure definitions :-
def ALIGNED_USING_MANUAL_PADDING
  {
  n8 "fred"
  buf 3 "padding to align bill on a 4 byte boundary"
  n32 "bill"
  }
Or alternatively, the align keyword could be used :-
def ALIGN_USING_align_KEYWORD
  {
  n8 "fred"
  align 4
  n32 "bill"
  }
It is possible to tell BE to automatically align n8, n16, n24 or n32 fields on specific byte (offset) boundaries by constructs such as the following (which corresponds to many 32 bit C compilers) :-
align n16 2
align n32 4
def ALIGNED_AUTOMATICALLY
  {
  n8 "fred"
  n32 "bill"
  }
Clearly, this feature is more useful when BE is being used to probe memory spaces of running programs via an extension, or doing post-mortem examination of program dumps.
Most data file formats don't-need-to and/or don't-bother-to align their
fields.
Include directives
The initialisation file can contain the following, as long as it is outside of any other definition :-
include "anotherfile.ini"
Here is a snippet from a real initialisation file :-
le unsigned hex abs // set defaults, just to be sure
lj // allow ARM specific symbolic lookup of code addresses

map DE_
  {
  "DP_Pending" -1
  "DS_Success" 0
  "DE_Failure" 1
  }

def DPB
  {
  n32 ptr DPB "DPB_Next "
  n32 sym code "DPB_Address"
  n8 map DC_ "DPB_Number "
  n8 "DPB_Flag2 "
  n8 map SY_ "DPB_Flag "
  n8 signed map DE_ "DPB_Dsb "
  n32 "DPB_Safety "
  }

def NOP
  {
  DPB "NOP_Header"
  n8 "NOP_Spare1"
  n8 "NOP_Spare2"
  n8 "NOP_Spare3"
  n8 dec "NOP_Period"
  n32 dec "NOP_Value "
  CLK "NOP_Clock "
  }

def main // the entire memory map
  {
  at addr "noptable" 100 NOP "noptable "
  at addr "currentdpb" n32 ptr DPB "currentdpb"
  }
The supplied initialisation file contains enough definitions to enable you to examine the contents of many image file formats.
These include Windows / OS/2 Bitmaps, Targa files, KIPS files, ZSoft PCX, M-Motion Video, TIFF, ILBM IFF, Compu$erve GIF, RiscOS sprite, IBM PSEG, and OS/2 resource files.
The definitions in the initialisation file are in no way complete, or intended to be a definitive statement of such files' contents, but are merely intended to aid in the browsing of the contents of such files.
Limitations of BE make it awkward to decode certain data structures in some files, so the attitude taken is typically 'display as best you can', and where data may be of variable length 'display the first few bytes worth...'.
Although not displayed, the arrow keys, such as Up, Down, PgUp, PgDn, Home and End all work in the obvious ways, traversing the list on display. The Wordstar keys ^E, ^X, ^R, ^C, ^W and ^Z also work.
BE displays the non-obvious keys you may press on the 2nd line of its status area, at the top of the screen.
q or @X (ie: Alt+X) exits the program. If you have made any changes, you will be prompted as to whether BE should write them out to disk. On machines which support it, @W can write out any unsaved changes.
Esc exits the current screen back to the previous screen.
p allows you to 'print' the list on display to a file. You can specify the filename, and whether to append to or overwrite any existing file of that name.
f allows you to do a find over the list on display. This only searches as much as the user could see if he were to manually page up and down through the list. The find command is case sensitive. n can be used to repeat the last find. If a find is taking a long time, it may be interrupted using Ctrl+Break on OS/2 and Windows (unfortunately not on UNIX though).
i allows you to generate a display which only has lines which include a pattern you specify. For example, if you have an array of trace-point events, you can easily generate a list of just trace-points from one module. Similarly, x allows you to generate a display which excludes lines which match the pattern. Esc exits back to the original display.
The keys A, O, L and I toggle the display of addresses, offsets, lengths and array indices. On machines which support it, @B, @O, @D and @H may be used to set the display mode of the array indices to binary, octal, decimal or hex. Also, on machines which support it, @Y toggles the display of addresses between raw hex, and symbol table entry and offset.
The t key will decode the current field as if it were raw ASCII text, and will break it up into lines upon CR, LF or CR-LF pair boundaries.
The r key causes a refresh. BE re-fetches all the data on display. The R key is a slightly more aggressive form of refresh. If an extension providing data to BE was caching data, this type of refresh causes it to drop its cache.
g/l is displayed if you are allowed to change the memory interpretation mode to big or little endian.
s/u is displayed if you are allowed to change the signed display mode to signed or unsigned.
A subset of the keys a/e/b/o/d/h/y/m may be displayed if you are allowed to change the viewing mode to ASCII, EBCDIC, binary, octal, decimal, hex, symbolic or via a mapping table.
z is displayed if you are allowed to toggle the attribute which stops the display when a nul terminator is found.
+/- is displayed to indicate that the level of detail of display may be increased or decreased. Level 0 means display the data type only. Level 1 means display the first level of data. Levels 2 and above mean display additional levels of detail.
Increasing the level of display can make BE open up an array, and enumerate the elements, eg: 3 n32 to [123,123,456].
Increasing the level of display can also make BE open up a definition, and display the fields, eg: VAR to {"name",123}.
This is capable of opening up the data structure pointed to by a pointer, providing the pointer may be fetched and followed.
Some examples :-
level 0 (=type) | level 1 | level 2 | level 3 |
---|---|---|---|
n32 | 7 | 7 | 7 |
3 n32 | 3 n32 | [8,9,10] | [8,9,10] |
VAR | VAR | {"a",1} | {"a",1} |
2 VAR | 2 VAR | [VAR,VAR] | [{"b",2},{"c",3}] |
n16 ptr VAR | 22->VAR | 22->{"d",4} | 22->{"d",4} |
2 n8 ptr VAR | 2 n8 ptr VAR | [33->VAR,44->VAR] | [33->{"e",5},44->{"f",6}] |
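One way to read the table is that arrays and definitions each use up one level of detail when opened, while a pointer shows its value and passes the same level on to its target. The Python sketch below is a model inferred from the table only, not BE's display code; all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Num:          # a numeric item such as n32
    tname: str
    value: int


@dataclass
class Text:         # a buffer shown as a quoted string
    tname: str
    value: str


@dataclass
class Struct:       # a definition such as VAR
    tname: str
    fields: list


@dataclass
class Ptr:          # a numeric item with a ptr attribute
    tname: str
    value: int
    target: object


@dataclass
class Array:        # count > 1 items of one type
    elems: list

    @property
    def tname(self):
        return "%d %s" % (len(self.elems), self.elems[0].tname)


def render(node, level):
    """Render a node at the given level of detail, as in the table."""
    if level == 0:
        return node.tname                       # level 0 is the type only
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Text):
        return '"%s"' % node.value
    if isinstance(node, Ptr):
        return "%d->%s" % (node.value, render(node.target, level))
    if level == 1:
        return node.tname                       # not yet opened up
    if isinstance(node, Array):
        return "[" + ",".join(render(e, level - 1) for e in node.elems) + "]"
    return "{" + ",".join(render(f, level - 1) for f in node.fields) + "}"


def var(name, number):
    """Build a VAR instance; its field types here are made up."""
    return Struct("VAR", [Text("buf 8", name), Num("n32", number)])


print(render(Array([var("b", 2), var("c", 3)]), 2))  # prints [VAR,VAR]
```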
Enter is displayed if you can press enter to either show the contents of the sub-definition, or to follow a pointer and show the definition there. The Esc key brings you back to where you are now.
Pressing @ will cause BE to prompt for a structure definition name, and then an address. It will then decode the memory at the given address as if it were of the specified structure type.
The = key may be used to edit the current field on display.
If the current field is a numeric value, then you can type a new expression, according to the rules for numbers and expressions used when parsing the initialisation file. Examples include :-
1
1+2
addr "symbol"
sizeof RGBTRIPLE
map FF_ "FF_Split" | 0x20
If the current field is a buffer, then either ASCII data or raw hex bytes may be supplied :-
"a string within quotes"
@1234FF00
If the zterm attribute is applicable to the current field, then after the data is stored, a NUL terminator is appended.
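The two buffer input forms can be told apart by their first character. The following Python sketch shows one plausible reading of them; parse_buffer_input is a hypothetical name, and appending the NUL for both forms when zterm applies is an assumption.

```python
def parse_buffer_input(text, zterm=False):
    """Turn '"quoted text"' or '@hexdigits' into the bytes to be stored."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        data = text[1:-1].encode("ascii")      # ASCII data within quotes
    elif text.startswith("@"):
        data = bytes.fromhex(text[1:])         # raw hex bytes, two digits each
    else:
        raise ValueError("expected a quoted string or @ followed by hex digits")
    if zterm:
        data += b"\x00"                        # NUL terminator is appended
    return data


raw = parse_buffer_input("@1234FF00")  # four bytes: 12 34 FF 00
```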
Extensions
The binary file arguments to BE are normally of the form :-
filename[@address]
This tells BE to load the file and, whenever data at a memory address from address to address+filelength is accessed, to supply the data from the file.
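As a rough model of the behaviour just described, each loaded file can be thought of as a region of the address space that is consulted when a byte is fetched. The Python sketch below is illustrative only; split_arg and MemoryMap are hypothetical names, the range is treated as half-open (address up to but not including address+filelength), and rejecting overlapping ranges is an assumption.

```python
def split_arg(arg):
    """Split 'filename[@address]' into (filename, address); address defaults to 0."""
    name, sep, addr = arg.rpartition("@")
    if not sep:
        return arg, 0
    return name, int(addr, 0)  # accepts forms such as 0x8000


class MemoryMap:
    """Supplies bytes for addresses covered by the loaded files."""

    def __init__(self):
        self.regions = []  # list of (base address, file contents)

    def load(self, data, base=0):
        end = base + len(data)
        for other_base, other in self.regions:
            if base < other_base + len(other) and other_base < end:
                raise ValueError("overlapping address ranges")
        self.regions.append((base, data))

    def read_byte(self, addr):
        for base, data in self.regions:
            if base <= addr < base + len(data):
                return data[addr - base]
        return None  # no file covers this address


memory = MemoryMap()
memory.load(b"\x01\x02\x03", base=0)
memory.load(b"\xaa\xbb", base=split_arg("gizmo.ram@0x8000")[1])
```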
However, it is possible to supply binary file arguments of the form :-
extension!args[@address]
Under OS/2, BE will ensure that BEextension.DLL is loaded.
This DLL should be on the LIBPATH and should contain certain entrypoints which will be used by BE.
BE then passes the args and address to the extension DLL, which does something of its own choosing with them.
The extension DLL can then supply data to BE on request.
Under Windows, provision for extension DLLs also exists.
The DLL is located according to the algorithm used by the Win32 LoadLibrary API.
Under AIX, extensions may be provided as shared libraries.
They are located by following the PATH environment variable, and are named beextension.
One use of this might be the provision of an extension for handling files too massive to load into memory all at once.
The extension could open a file handle and read bytes demanded by BE upon request.
This extension could be provided in BEBIGFIL.DLL, and the user could type :-
be bigfil!verybigfile.dat
Another use might be in live-debug of adapter cards.
The extension would provide data bytes from the memory space of the adapter. args could be used to identify the slot the adapter is in.
Yet another use might be providing BE with access to physical, virtual or process specific linear address spaces, perhaps via the use of a device driver. Shared memory windows might give addressability of data structures in other programs.
Also, the surface of a disk or block device could be made accessible via an extension.
Perhaps bytes sent down a communications port could be made to appear as a stream of binary data.
The file bememext.h documents the extension interface.
Currently extensions may only be built for the OS/2 version of BE using the IBM VisualAge C++ compiler, the Win32 version of BE using MS Visual C++, or the AIX version of BE using the IBM xlC++ compiler.
I anticipate learning about shared library support on the various different types of UNIX, enabling similar tricks to be performed there.
Apparently this area is becoming more standardised, with the new dlopen, dlsym and dlclose entrypoints.
When editing files, changes to the data are recorded in memory. When BE is closed down, it attempts to write any changes back into the disk files where the data originally came from. BE will prompt you as to whether to save the changes back to disk.
If an extension is providing the data to BE for display, and the extension supports modification of the data, it has a choice :-
As most extensions provide a live view of some real-time data, most opt for the first choice.
BE can be found on the Hobbes FTP site.
The usual supplied be.zip file should be expanded using unzip be or pkunzip -d be on an OS/2 or Windows machine.
You get a selection of executables, and the one to pick depends upon which operating system you wish to run :-
be_os2.exe - for OS/2
be_win.exe - for Windows
be - for UNIX
To install on OS/2 :-
Copy be_os2.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Copy be.htm to wherever you keep documentation.
Copy be.ico to the same directory as be.exe. This allows BE to have a cute icon when running in the Workplace shell.
be.ini
can be found.
To install on Windows :-
Copy be_win.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Copy be.htm to wherever you keep documentation.
To install on UNIX :-
Copy the be executable to somewhere like /usr/bin or ~/bin, or wherever on the path you consider appropriate.
Copy be.ini to .berc in your home directory, or make a soft link to a common .berc somewhere from your home directory.
Copy be.htm to wherever you keep documentation.
Unfortunately I don't have continual access to all the platforms, so improvements in one version may not yet be reflected in the others.
Copying of this program is encouraged, as it is fully public domain. The source code is not publicly available. Caveat Emptor.