AuxData Documentation
GTIRB Auxiliary Data (AuxData
) tables are a value-store for arbitrary
user-defined analysis data. GTIRB defines a number of AuxData
tables with
stable schemata that are recommended for all GTIRB users as Sanctioned
tables and also a set of tables under consideration for standardization as
Provisional.
Note that all unsanctioned tables are considered unstable and their schemata may change between minor releases.
Ddisasm generates the following sanctioned
, provisional
, and unsanctioned
tables:
ddisasmVersion
unsanctioned
Name | ddisasmVersion |
Type | std::string |
Value | The version of ddisasm used to produce the GTIRB file. |
For example, 1.5.3 (8533031c 2022-03-31) X64
represents version 1.5.3
compiled on commit 8533031c
with support for the X64
ISA.
binaryType
unsanctioned
Name | binaryType |
Type | std::vector<std::string> |
Value | A list of binary type descriptors. |
ELF binaries have either a [DYN
,PIE
] or [DYN
,SHARED
] or [EXEC
] or [REL
] entry.
PE binaries have either a [DLL
] or [EXE
] entry and optionally a subsystem descriptor (e.g. WINDOWS_GUI
).
archInfo
unsanctioned
Name | archInfo |
Type | std::map<std::string, std::string> |
Value | A map of detailed architecture information. |
Currently only generated for ARM32 binaries.
Possible key, value pairs are:
Key | Values |
---|---|
"Profile" | "Application", "RealTime", "Microcontroller", "System" |
"Arch" | "Pre_v4", "v4", "v4T", "v5T", "v5TE", "v5TEJ", "v6", "v6KZ", "v6K", "v7", "v6_M", "v6S_M", "v7E_M", "v8_A", "v8_R", "v8_M_Base", "v8_M_Main", "v8_1_M_Main", "v9_A" |
These correspond to values in the ARM attributes section, but may also be inferred by ddisasm based on the presence of particular instructions if no ARM attributes section is present in the binary.
padding
sanctioned
Name | padding |
Type | std::map<gtirb::Offset, uint64_t> |
Value | Offset of padding in a ByteInterval and the padding length in bytes. |
sectionProperties
provisional
Name | sectionProperties |
Type | std::map<gtirb::UUID, std::tuple<uint64_t, uint64_t>> |
Value | Map from section UUIDs to a tuple with the section type constant and section flags constant. |
Integer constants are defined by the binary format specification:
Note that the section type value is always 0 for PE sections.
symbolForwarding
sanctioned
Name | symbolForwarding |
Type | std::map<gtirb::UUID, gtirb::UUID> |
Value | Map one symbol to another symbol. (Used for flattening linker-induced indirect references.) |
This table is used to resolve certain kinds of indirect constructs used by linkers in symbolic expressions. Currently there are five kinds of constructs that use symbolForwarding.
GOT references
Instructions in the code might refer to function or data through the GOT table. The code will point to a GOT entry, whose content gets resolved to the referred element at runtime.
For example, the code in the binary would be:
mov RAX,QWORD PTR [RIP+L_200ff0]
...
...
# An entry on the GOT table
L_200ff0:
.quad Foo
Where 200ff0
is an entry in the GOT table that contains a relocation to Foo
.
Ddisasm adds an entry to the symbol forwarding table of the form: L_200ff0
-> Foo
,
and the assembly generated by gtirb-pprinter will be:
mov RAX,QWORD PTR [RIP+Foo@GOTPCREL]
The linker will then take care of creating a new GOT entry based on this symbolic expression.
PLT references
The case of PLT entries is similar. The original code would look like:
# the call in the code
call FUN_590
# the thunk in the PLT table
FUN_590:
jmp QWORD PTR [RIP+Foo@GOTPCREL]
The original code calls to the PLT thunk, which then resolves
the reference dynamically and jumps to the corresponding function.
Ddisasm adds an entry to the symbol forwarding table of the form FUN_590
-> Foo
,
and the assembly generated by gtirb-pprinter will be:
call Foo@PLT
COPY relocations
In the case of copy relocations, the original code looks as follows:
mov RAX,QWORD PTR [RIP+stdout]
...
...
# Bss section
stdout:
201020 .zero 8
The entry in the Bss section contains a COPY relocation that will copy the content of stdout from glibc into the reserved slot. Unfortunately, we don’t have a way to tell the linker to generate a copy relocation for the existing slot at address 201020
. Therefore, we will ignore the slot and let the linker reserve a new one.
This is achieved as follows:
Ddisasm renames the defined symbol
stdout
tostdout_copy
The symbolic expressions still point to this renamed symbol.
Ddisasm adds a symbol forwarding entry
stdout_copy
->stdout
, which forwards the renamed symbol to an copy of the symbol with the original name but that points to a ProxyBlock.
The symbol forwarding entry is used by gtirb-pprinter to generate the following assembly code:
mov RAX,QWORD PTR [RIP+stdout] # note that this still points to stdout_copy, but it is being redirected.
...
...
# Bss section
stdout_copy:
201020 .zero 8 # this is now dead data
ABI intrinsic
This is used for symbols that get redefined by the linking process, we want the references to point to the new versions of the symbols defined by the linker, not to where those symbols were in the original binary.
The mechanism is the same as with COPY relocation:
Ddisasm renames the defined symbol to
<Name>_copy
.The symbolic expressions still point to the original definition.
Ddisasm adds a symbol forwarding entry
<Name> -> <Name>_copy
, which forwards the renamed symbol to an copy of the symbol with the original name but that points to a ProxyBlock.
This is applied to symbols like _GLOBAL_OFFSET_TABLE_
or __dso_handle
.
References to imported symbols in PE binaries
PE binaries record imported procedures in the import address table (IAT). Each entry corresponds to an imported procedure and it is populated by the Windows loader.
call [.L_40100C]
...
...
# An entry on the IAT table
.L_40100C:
.quad Foo
This is very similar to the GOT table and Ddisasm generates the same kind of symbol forwarding entry.
Ddisasm adds an entry to the symbol forwarding table of the form IAT_Entry
-> Foo
.
PE binaries can also use small thunks to support imported procedures for code that was not compiled like that. In that case, the user code will have a direct call and the linker will add a small thunk that references the IAT table.
call 200000
...
# The compiler added thunk
.L_200000:
jmp [.L_40100C]
...
# An entry on the IAT table
.L_40100C:
.quad Foo
The original code calls to the thunk, which then jumps to the target function using the IAT.
Ddisasm adds an entry to the symbol forwarding table of the form .L_200000
-> Foo
.
symbolicExpressionSizes
provisional
Name | symbolicExpressionSizes |
Type | std::map<gtirb::Offset, uint64_t> |
Value | Map from an Offset of a symbolic expression in a ByteInterval to its extent, a size in bytes. |
encodings
provisional
Name | encodings |
Type | std::map<gtirb::UUID, std::string> |
Value | Map from (typed) data objects to the encoding of the data, expressed as a std::string containing an assembler encoding specifier: "string", "uleb128" or "sleb128". |
functionEntries
sanctioned
Name | functionEntries |
Type | std::map<gtirb::UUID, std::set<gtirb::UUID>> |
Value | UUIDs of the blocks that are entry points of each function. |
functionBlocks
sanctioned
Name | functionBlocks |
Type | std::map<gtirb::UUID, std::set<gtirb::UUID>> |
Value | UUIDs of the blocks that belong to each function. |
functionNames
sanctioned
Name | functionNames |
Type | std::map<gtirb::UUID, gtirb::UUID> |
Value | UUID of the symbol holding the string name of each function. |
SCCs
provisional
Name | SCCs |
Type | std::map<gtirb::UUID, int64_t> |
Value | Map UUID of blocks to an identifier for a intra-procedural subgraph it belongs to, i.e. a Strongly Connected Component (SCC) identifier. |
libraries
provisional
Name | libraries |
Type | std::vector<std::string> |
Value | Names of the libraries that are needed. |
libraryPaths
provisional
Name | libraryPaths |
Type | std::vector<std::string> |
Value | Paths contained in the rpath of the binary. |
souffleFacts
unsanctioned
Name | souffleFacts |
Type | std::map<std::string, std::tuple<std::string, std::string>> |
Value | Map of Souffle facts by relation name to their associated type signatures and CSV. |
Note: Relation names are namespaced with the name of the pass in which they belong; for example, block_points
is identified by disassembly.block_points
.
souffleOutputs
unsanctioned
Name | souffleOutputs |
Type | std::map<std::string, std::tuple<std::string, std::string>> |
Value | Map of Souffle outputs by relation name to their associated type signatures and CSV. |
Note: Relation names are namespaced with the name of the pass in which they belong; for example, block_points
is identified by disassembly.block_points
.
ELF
dynamicEntries
unsanctioned
Name | dynamicEntries |
Type | std::set<std::tuple<std::string, uint64_t>> |
Value | Dynamic section entries: Name and value. |
sectionIndex
unsanctioned
Name | sectionIndex |
Type | std::map<uint64_t, gtirb::UUID> |
Value | Map from ELF section indices to section UUIDs. |
elfDynamicInit
sanctioned
Label | "elfDynamicInit" |
Type | gtirb::UUID |
Note | The CodeBlock to which a DT_INIT entry in an ELF file's .dynamic section refers. |
elfDynamicFini
Label | "elfDynamicFini" |
Type | gtirb::UUID |
Note | The CodeBlock to which a DT_FINI entry in an ELF file's .dynamic section refers. |
elfSymbolInfo
provisional
Name | elfSymbolInfo |
Type | std::map<gtirb::UUID, std::tuple<uint64_t, std::string, std::string, std::string, uint64_t>> |
Value | Map from symbol UUIDs to their ELF Symbol information containing the Size, Type, Binding, Visibility, and SectionIndex of the symbol. Type can be "NOTYPE", "OBJECT", "FUNC", etc. Binding can be "LOCAL", "GLOBAL", or "WEAK". Visibility can be "DEFAULT", "HIDDEN", "PROTECTED", etc. For a complete list of possible values see e.g. https://refspecs.linuxbase.org/elf/gabi4+/ch4.symtab.html |
elfSymbolTabIdxInfo
unsanctioned
Name | elfSymbolTabIdxInfo |
Type | std::map<gtirb::UUID, std::vector<std::tuple<std::string, uint64_t>>> |
Value | Map from symbol UUIDs to symbol section information including the names of the symbol tables where the symbol was declared (typically ".dynsym" or ".symtab") and the index within that table. |
elfSymbolVersions
provisional
Name | elfSymbolVersions |
Type | std::tuple<ElfSymVerDefs,ElfSymVerNeeded,ElfSymbolVersionsEntries> |
Value | Tuple with symbol version definitions, needed symbol versions, and a mapping of symbol UUIDs to symbol versions. |
ElfSymverDefs = std::map<SymbolVersionId, std::tuple<std::vector<std::string>>, uint16_t>
Symbol version definitions are a map from symbol version identifiers version
definitions. These correspond to ELFxx_Verdef
entries in the ELF section
.gnu.version_d
. The values in the map are tuples containing the list of
versions strings and the verdef flags. The verdef flag may be VER_FLG_BASE
(0x1), which indicates that the given version definition is the file itself,
and must not be used for matching a symbol. The first element of the list is
the version itself, the subsequent elements are predecessor versions.
ElfSymVerNeeded = std::map<std::string, std::map<SymbolVersionId, std::string>>
The needed symbol versions are a map from dynamic library names to the symbol versions that they need. For each library, we have a map from version identifiers to version strings.
ElfSymbolVersionsEntries = std::map<gtirb::UUID, std::tuple<SymbolVersionId,bool>>
Symbol UUIDs are mapped to symbol versions where the bool
represents the
HIDDEN
attribute (i.e., bit 15 of the version ID). Symbol version
identifiers are SymbolVersionId = uint16_t
integers.
HIDDEN
symbol versions correspond to symbols specified with @
and not
visible to the static linker, while the default version of a symbol specified
with @@
will be non-hidden.
Name | cfiDirectives |
Type | std::map<gtirb::Offset, std::vector<std::tuple<std::string, std::vector<int64_t>, gtirb::UUID>>> |
Value | Map from Offsets to vector of cfi directives. |
A cfi directive contains: a string describing the directive, a vector of numeric arguments, and an optional symbolic argument (represented with the UUID of the symbol).
PE
peImportEntries
provisional
Name | peImportEntries |
Type | std::vector<std::tuple<uint64_t, int64_t, std::string, std::string>> |
Value | List of tuples detailing an imported function address, ordinal, function name, and library names for PE. |
peExportEntries
provisional
Name | peExportEntries |
Type | std::vector<std::tuple<uint64_t, int64_t, std::string>> |
Value | List of tuples detailing an exported address, ordinal, and name for PE. |
peImportedSymbols
provisional
Name | peImportedSymbols |
Type | std::vector<gtirb::UUID> |
Value | UUIDs of the imported symbols for PE. |
peExportedSymbols
provisional
Name | peExportedSymbols |
Type | std::vector<gtirb::UUID> |
Value | UUIDs of the exported symbols for PE. |
peSafeExceptionHandlers
provisional
Name | peSafeExceptionHandlers |
Type | std::set<gtirb::UUID> |
Value | UUIDs of the blocks in the SEHandlerTable pointer array of safe exception handlers for PE32 binaries compiled with /SAFESEH . |
peResource
provisional
Name | peResource |
Type | std::vector<std::tuple<std::vector<uint8_t>, gtirb::Offset, uint64_t>> |
Value | List of PE resources. A resource header, data length, and data pointer. |
peLoadConfig
unsanctioned
Name | peLoadConfig |
Type | std::map<std::string, uint64_t> |
Value | Map of PE load configuration field names to integral values. |
See the _IMAGE_LOAD_CONFIG_DIRECTORY32
struct in WINNT.h
for a complete list
of fields and The Load Configuration Structure for additional information.
overlay
unsanctioned
Name | overlay |
Type | std::vector<uint8_t> |
Value | List of bytes appended to the binary that are not loaded to memory. |
comments
sanctioned
std::map<gtirb::Offset, std::string>
Comments are used to bind additional analysis details and debugging information to instructions. See the
--debug
command line flag.