Standard AuxData Schemata
The AuxData class provides generic storage for application-specific data. This allows data to be attached to either the IR or Module classes in GTIRB.
We specify a small number of standard gtirb::AuxData schemata to support interoperability. These are listed below, in two sets:
- Sanctioned: Recommended for GTIRB users. Individual schemata are unlikely to change in future, although the set of Sanctioned schemata may grow.
- Provisional: Under consideration for 'sanctioned' status.
For example, if you want to store alignment requirements for blocks and data objects, you can use an alignment table.
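// Leverage definitions for the sanctioned AuxData tables.
#include <gtirb/AuxDataSchema.hpp>
#include <gtirb/gtirb.hpp>
#include <map>
#include <utility>

// Define your own custom AuxData tables in the namespace gtirb::schema.
// Foo, BuildAFoo() and some_member_function() are placeholders for an
// application-specific type.
namespace gtirb {
namespace schema {
struct MyAuxDataFoo {
  static constexpr const char* Name = "foo";
  typedef Foo Type;
};
} // namespace schema
} // namespace gtirb

using namespace gtirb;
using namespace schema;

void call_me_from_main()
{
  // Register AuxData types before using GTIRB.
  AuxDataContainer::registerAuxDataType<Alignment>();
  AuxDataContainer::registerAuxDataType<MyAuxDataFoo>();
}

void do_stuff_with_gtirb(Module& module)
{
  // Attach an empty alignment table to the module.
  module.addAuxData<Alignment>(std::map<UUID, uint64_t>{});

  // getAuxData returns a null pointer when the table is not present.
  auto* align_map = module.getAuxData<Alignment>();
  if (align_map) {
    // Read or update *align_map here.
  }

  // Attach a custom table, then call through to the stored Foo.
  Foo my_foo = BuildAFoo();
  module.addAuxData<MyAuxDataFoo>(std::move(my_foo));
  module.getAuxData<MyAuxDataFoo>()->some_member_function();
}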
Sanctioned AuxData Tables
The following are the sanctioned AuxData table schemata.
elfDynamicInit
| |
Label | "elfDynamicInit" |
Type | gtirb::UUID |
Value | CodeBlock UUID |
AttachedTo | gtirb::Module |
Note | The CodeBlock to which a DT_INIT entry in an ELF file's .dynamic section refers. |
elfDynamicFini
| |
Label | "elfDynamicFini" |
Type | gtirb::UUID |
Value | CodeBlock UUID |
AttachedTo | gtirb::Module |
Note | The CodeBlock to which a DT_FINI entry in an ELF file's .dynamic section refers. |
elfSoname
| |
Label | "elfSoname" |
Type | std::string |
Value | The SONAME of a library. |
AttachedTo | gtirb::Module |
Note | The string value which the DT_SONAME entry in an ELF file's .dynamic section contains. |
elfStackExec
| |
Label | "elfStackExec" |
Type | bool |
Value | Stack executable flag specified by PT_GNU_STACK segment in ELF files. |
AttachedTo | gtirb::Module |
elfStackSize
| |
Label | "elfStackSize" |
Type | uint64_t |
Value | The size of the PT_GNU_STACK segment in ELF files, which may influence the runtime stack size in certain environments. |
AttachedTo | gtirb::Module |
functionBlocks
| |
Label | "functionBlocks" |
Type | std::map<gtirb::UUID, std::set<gtirb::UUID>> |
Key | Function UUID. |
Value | The set of UUIDs of all the blocks (gtirb::CodeBlock) in the function. |
AttachedTo | gtirb::Module |
Note | This table identifies all of the gtirb::CodeBlocks that belong to each function. These do not necessarily have to be contiguous in the address space. Note that the core GTIRB IR has no notion of a function. A function's UUID is just a unique identifier that is consistently used across all function-related AuxData tables. |
functionEntries
| |
Label | "functionEntries" |
Type | std::map<gtirb::UUID, std::set<gtirb::UUID>> |
Key | Function UUID. |
Value | The set of UUIDs of all the entry blocks (gtirb::CodeBlock) for the function. |
AttachedTo | gtirb::Module |
Note | This table identifies all gtirb::CodeBlocks that represent entry points to each function. A single function may have more than one entry point. Note that the core GTIRB IR has no notion of a function. A function's UUID is just a unique identifier that is consistently used across all function-related AuxData tables. |
functionNames
| |
Label | "functionNames" |
Type | std::map<gtirb::UUID, gtirb::UUID> |
Key | Function UUID. |
Value | The UUID of a gtirb::Symbol whose name field contains the name of the function. |
AttachedTo | gtirb::Module |
Note | There may be more than one gtirb::Symbol associated with the address(es) corresponding to the entry point(s) of a function. This table identifies a canonical gtirb::Symbol to be used for each function. Note that the core GTIRB IR has no notion of a function. A function's UUID is just a unique identifier that is consistently used across all function-related AuxData tables. |
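Because the three function tables are keyed by the same function UUIDs, information about one function can be gathered by looking up the same key in each table. The following is a minimal C++17 sketch of that join. It uses std::string as a stand-in for gtirb::UUID so that it compiles without GTIRB, and FunctionSummary and summarizeFunction are hypothetical names that are not part of the library.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>

// Stand-in for gtirb::UUID (an assumption that keeps the sketch standalone).
using UUID = std::string;

// Shapes of the three function-related tables.
using FunctionBlocks = std::map<UUID, std::set<UUID>>;
using FunctionEntries = std::map<UUID, std::set<UUID>>;
using FunctionNames = std::map<UUID, UUID>;

// Hypothetical summary record; not part of GTIRB.
struct FunctionSummary {
  UUID NameSymbol;        // canonical symbol UUID from functionNames
  std::size_t BlockCount; // number of blocks from functionBlocks
  std::size_t EntryCount; // number of entry blocks from functionEntries
};

// Joins the three tables on a function UUID. Returns std::nullopt if the
// function is missing from any of the tables.
std::optional<FunctionSummary> summarizeFunction(const UUID& Function,
                                                 const FunctionBlocks& Blocks,
                                                 const FunctionEntries& Entries,
                                                 const FunctionNames& Names) {
  auto B = Blocks.find(Function);
  auto E = Entries.find(Function);
  auto N = Names.find(Function);
  if (B == Blocks.end() || E == Entries.end() || N == Names.end())
    return std::nullopt;
  return FunctionSummary{N->second, B->second.size(), E->second.size()};
}
```

With real GTIRB data, the symbol UUID in NameSymbol would still have to be resolved to a gtirb::Symbol to obtain the name string.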
types
alignment
comments
| |
Label | "comments" |
Type | std::map<gtirb::Offset, std::string> |
Key | The gtirb::Offset of a comment. |
Value | A comment string relevant to the specified offset in the specified GTIRB entry. |
AttachedTo | gtirb::Module |
Note | The gtirb::Offset refers to the UUID of an entity in memory and a byte offset within that entity to indicate the point at which the comment applies. Comments can contain arbitrary content and are likely generated by analysis tools. They often do not (but may) represent comments present in the original source code of the binary. |
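Since the table is an ordered map keyed by an entity UUID plus a byte offset, all comments for one entity are adjacent in the map. A minimal sketch of collecting them follows. The Offset struct here is a simplified stand-in for gtirb::Offset, std::string stands in for gtirb::UUID, and commentsFor is a hypothetical helper.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)

// Simplified stand-in for gtirb::Offset: an entity UUID plus a byte offset.
struct Offset {
  UUID ElementId;
  std::uint64_t Displacement;
  bool operator<(const Offset& O) const {
    return std::tie(ElementId, Displacement) <
           std::tie(O.ElementId, O.Displacement);
  }
};

using Comments = std::map<Offset, std::string>;

// Returns the comments attached to one entity, ordered by increasing offset.
std::vector<std::string> commentsFor(const Comments& Table,
                                     const UUID& Element) {
  std::vector<std::string> Result;
  // Keys for one entity are contiguous, so start at its displacement 0 and
  // stop as soon as the entity changes.
  for (auto It = Table.lower_bound(Offset{Element, 0});
       It != Table.end() && It->first.ElementId == Element; ++It)
    Result.push_back(It->second);
  return Result;
}
```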
symbolForwarding
padding
| |
Label | "padding" |
Type | std::map<gtirb::Offset, uint64_t> |
Key | The gtirb::Offset at which padding is present. |
Value | The length of the padding, in bytes. |
AttachedTo | gtirb::Module |
Note | Padding may consist of zero bytes or of valid instructions. An entry in this table indicates that an analysis has determined that the bytes starting at the given gtirb::Offset (UUID of an entity in memory and byte offset into that entity) and spanning the indicated length are unused by the program and are present only to ensure alignment of neighboring objects. Note: some disassemblers may still create a gtirb::CodeBlock or gtirb::DataBlock for the same portion of address space that a padding entry covers. |
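Each padding entry describes a half-open byte range that starts at its key and extends for the stored length. One way to test whether a given byte falls inside padding is sketched below, assuming padding ranges within one entity do not overlap. Offset is a simplified stand-in for gtirb::Offset, std::string stands in for gtirb::UUID, and isPadding is a hypothetical helper.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)

// Simplified stand-in for gtirb::Offset.
struct Offset {
  UUID ElementId;
  std::uint64_t Displacement;
  bool operator<(const Offset& O) const {
    return std::tie(ElementId, Displacement) <
           std::tie(O.ElementId, O.Displacement);
  }
};

using Padding = std::map<Offset, std::uint64_t>;

// True if the byte at (Element, Displacement) lies inside a padding entry.
bool isPadding(const Padding& Table, const UUID& Element,
               std::uint64_t Displacement) {
  // Find the first entry that starts after the queried byte, then step back
  // to the last entry that starts at or before it.
  auto It = Table.upper_bound(Offset{Element, Displacement});
  if (It == Table.begin())
    return false;
  --It;
  return It->first.ElementId == Element &&
         Displacement - It->first.Displacement < It->second;
}
```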
Provisional AuxData Tables
The following are the provisional AuxData table schemata.
Label | Type |
"binaryType" | std::vector<std::string> |
"cfiDirectives" | std::map<gtirb::Offset, std::vector<std::tuple<std::string, std::vector<int64_t>, gtirb::UUID>>> |
"elfSectionProperties" | std::map<gtirb::UUID, std::tuple<uint64_t, uint64_t>> |
"elfSymbolInfo" | std::map<gtirb::UUID, std::tuple<uint64_t, std::string, std::string, std::string, uint64_t>> |
"elfSymbolVersions" | std::tuple<std::map<uint16_t, std::tuple<std::vector<std::string>, uint16_t>>, std::map<std::string, std::map<uint16_t, std::string>>, std::map<gtirb::UUID, std::tuple<uint16_t, bool>>> |
"encodings" | std::map<gtirb::UUID, std::string> |
"functionNameProbabilities" | std::map<std::string, std::map<gtirb::UUID, std::vector<std::tuple<std::string, std::string, float>>>> |
"includedLibraryNames" | std::map<gtirb::UUID, std::string> |
"includedLibraryVersions" | std::map<gtirb::UUID, std::string> |
"libraries" | std::vector<std::string> |
"libraryPaths" | std::vector<std::string> |
"peExportEntries" | std::vector<std::tuple<uint64_t, int64_t, std::string>> |
"peExportedSymbols" | std::vector<gtirb::UUID> |
"peImportEntries" | std::vector<std::tuple<uint64_t, int64_t, std::string, std::string>> |
"peImportedSymbols" | std::vector<gtirb::UUID> |
"peResource" | std::vector<std::tuple<std::vector<uint8_t>, gtirb::Offset, uint64_t>> |
"profile" | std::map<gtirb::UUID, uint64_t> |
"prototypeTable" | std::map<gtirb::UUID, gtirb::UUID> |
"SCCs" | std::map<gtirb::UUID, int64_t> |
"symbolicExpressionSizes" | std::map<gtirb::Offset, uint64_t> |
"typeTable" | std::map<gtirb::UUID, std::variant<uint64_t, std::tuple<uint8_t>, std::tuple<int8_t, uint64_t>, uint64_t, uint64_t, std::tuple<gtirb::UUID, std::vector<gtirb::UUID>>, gtirb::UUID, std::tuple<gtirb::UUID, uint64_t>, std::tuple<uint64_t, std::vector<std::tuple<uint64_t, gtirb::UUID>>>, std::tuple<uint8_t>, gtirb::UUID>> |
encodings
| |
Label | "encodings" |
Type | std::map<gtirb::UUID, std::string> |
Key | The gtirb::UUID of a data object. |
Value | The encoding of the data object. |
AttachedTo | gtirb::Module |
Note | Map from (typed) data objects to the encoding of the data, expressed as a std::string containing an assembler encoding specifier: "string", "uleb128" or "sleb128". |
elfSectionProperties
| |
Label | "elfSectionProperties" |
Type | std::map<gtirb::UUID, std::tuple<uint64_t, uint64_t>> |
Key | The gtirb::UUID of a section. |
Value | A tuple of the ELF section type and flags. |
AttachedTo | gtirb::Module |
Note | Map from section UUIDs to tuples with the ELF section types and flags. |
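As an illustration of reading this table, the sketch below checks whether a section is marked executable. It assumes the tuple order is (type, flags), as the description suggests, and uses std::string as a stand-in for gtirb::UUID. The flag values are the standard ELF SHF_* constants; isExecutableSection is a hypothetical helper.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)
using ElfSectionProperties =
    std::map<UUID, std::tuple<std::uint64_t, std::uint64_t>>;

// Section flag bits from the ELF specification (SHF_WRITE, SHF_ALLOC,
// SHF_EXECINSTR), renamed to avoid clashing with macros from <elf.h>.
constexpr std::uint64_t ShfWrite = 0x1;
constexpr std::uint64_t ShfAlloc = 0x2;
constexpr std::uint64_t ShfExecInstr = 0x4;

// True if the section is present and its flags include SHF_EXECINSTR.
// Assumes the tuple is ordered (type, flags).
bool isExecutableSection(const ElfSectionProperties& Table,
                         const UUID& Section) {
  auto It = Table.find(Section);
  if (It == Table.end())
    return false;
  return (std::get<1>(It->second) & ShfExecInstr) != 0;
}
```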
elfSymbolVersions
| |
Label | "elfSymbolVersions" |
Type | std::tuple<std::map<uint16_t, std::tuple<std::vector<std::string>, uint16_t>>, std::map<std::string, std::map<uint16_t, std::string>>, std::map<gtirb::UUID, std::tuple<uint16_t, bool>>> |
Value | A tuple of symbol version definitions, needed symbol versions, and a mapping of symbol UUIDs to symbol versions. |
AttachedTo | gtirb::Module |
Note | Tuple with symbol version definitions, needed symbol versions, and a mapping of symbol UUIDs to symbol versions. Symbol version identifiers are SymbolVersionId = uint16_t integers. Symbol version definitions are ElfSymDefs = std::map<SymbolVersionId, std::tuple<std::vector<std::string>, uint16_t>>, a map from symbol version identifiers to version definitions. These correspond to ELFxx_Verdef entries in the ELF section .gnu.version_d. The values in the map are tuples containing the list of version strings and the verdef flags. The first element of the list is the version itself; the subsequent elements are predecessor versions. The verdef flag may be VER_FLG_BASE (0x1), which indicates that the given version definition is the file itself and must not be used for matching a symbol. The needed symbol versions are ElfSymVerNeeded = std::map<std::string, std::map<SymbolVersionId, std::string>>, a map from dynamic library names to the symbol versions that they need. For each library, we have a map from version identifiers to version strings. Finally, symbol UUIDs are mapped to symbol versions as ElfSymbolVersionsEntries = std::map<gtirb::UUID, std::tuple<SymbolVersionId, bool>>, where the bool represents the HIDDEN attribute. |
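To show how the three parts of this tuple fit together, here is a C++17 sketch that resolves the version string for one symbol. The lookup order (definitions first, then needed versions) is an assumption made for illustration, std::string stands in for gtirb::UUID, and versionOf is a hypothetical helper.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)
using SymbolVersionId = std::uint16_t;
using ElfSymDefs =
    std::map<SymbolVersionId,
             std::tuple<std::vector<std::string>, std::uint16_t>>;
using ElfSymVerNeeded =
    std::map<std::string, std::map<SymbolVersionId, std::string>>;
using ElfSymbolVersionsEntries =
    std::map<UUID, std::tuple<SymbolVersionId, bool>>;
using ElfSymbolVersions =
    std::tuple<ElfSymDefs, ElfSymVerNeeded, ElfSymbolVersionsEntries>;

constexpr std::uint16_t VerFlgBase = 0x1; // VER_FLG_BASE

// Returns the version string for Symbol, or std::nullopt if the symbol has
// no entry, its version is the base definition, or the id is unknown.
std::optional<std::string> versionOf(const ElfSymbolVersions& Table,
                                     const UUID& Symbol) {
  const auto& [Defs, Needed, Entries] = Table;
  auto E = Entries.find(Symbol);
  if (E == Entries.end())
    return std::nullopt;
  const SymbolVersionId Id = std::get<0>(E->second);

  // Versions defined by this file: the first string is the version itself.
  if (auto D = Defs.find(Id); D != Defs.end()) {
    const auto& [Names, Flags] = D->second;
    if ((Flags & VerFlgBase) != 0 || Names.empty())
      return std::nullopt; // base definition: not used for matching
    return Names.front();
  }

  // Versions needed from dynamic libraries.
  for (const auto& LibraryAndVersions : Needed) {
    const auto& Versions = LibraryAndVersions.second;
    if (auto V = Versions.find(Id); V != Versions.end())
      return V->second;
  }
  return std::nullopt;
}
```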
cfiDirectives
| |
Label | "cfiDirectives" |
Type | std::map<gtirb::Offset, std::vector<std::tuple<std::string, std::vector<int64_t>, gtirb::UUID>>> |
Key | The gtirb::Offset of a CFI directive. |
Value | A vector of CFI directives. Each CFI directive contains: a string describing the directive, a vector of numeric arguments, and an optional symbolic argument (represented with the UUID of the symbol). |
AttachedTo | gtirb::Module |
Note | Map from Offsets to vectors of CFI directives. A CFI directive contains: a string describing the directive, a vector of numeric arguments, and an optional symbolic argument (represented with the UUID of the symbol). |
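A consumer of this table typically walks the directives at an offset and turns each one back into text. The sketch below renders the directive string followed by its numeric arguments and ignores the symbolic argument. The directive strings used in any example data (such as ".cfi_offset") and the comma-separated output format are assumptions, Offset is a simplified stand-in for gtirb::Offset, and renderAt is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)

// Simplified stand-in for gtirb::Offset.
struct Offset {
  UUID ElementId;
  std::uint64_t Displacement;
  bool operator<(const Offset& O) const {
    return std::tie(ElementId, Displacement) <
           std::tie(O.ElementId, O.Displacement);
  }
};

// Directive string, numeric arguments, optional symbol UUID.
using CfiDirective =
    std::tuple<std::string, std::vector<std::int64_t>, UUID>;
using CfiDirectives = std::map<Offset, std::vector<CfiDirective>>;

// Renders every directive stored at Where as "name arg1, arg2, ...".
// The symbolic argument is not rendered in this sketch.
std::vector<std::string> renderAt(const CfiDirectives& Table,
                                  const Offset& Where) {
  std::vector<std::string> Lines;
  auto It = Table.find(Where);
  if (It == Table.end())
    return Lines;
  for (const CfiDirective& D : It->second) {
    std::string Text = std::get<0>(D);
    const std::vector<std::int64_t>& Args = std::get<1>(D);
    for (std::size_t I = 0; I < Args.size(); ++I) {
      Text += (I == 0 ? " " : ", ");
      Text += std::to_string(Args[I]);
    }
    Lines.push_back(Text);
  }
  return Lines;
}
```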
elfSymbolInfo
| |
Label | "elfSymbolInfo" |
Type | std::map<gtirb::UUID, std::tuple<uint64_t, std::string, std::string, std::string, uint64_t>> |
Key | The gtirb::UUID of a symbol. |
Value | The type, binding, and visibility categories of the symbol. |
AttachedTo | gtirb::Module |
Note | On ELF targets only: Map from symbols to their type, binding, and visibility categories. |
libraries
| |
Label | "libraries" |
Type | std::vector<std::string> |
Value | The name of a library. |
AttachedTo | gtirb::Module |
Note | Names of the external libraries that are needed dynamically at run time. |
libraryPaths
| |
Label | "libraryPaths" |
Type | std::vector<std::string> |
Value | A path contained in the rpath of the binary. |
AttachedTo | gtirb::Module |
Note | Paths contained in the rpath of the binary. |
binaryType
| |
Label | "binaryType" |
Type | std::vector<std::string> |
Value | A binary type descriptor. |
AttachedTo | gtirb::Module |
Note | A set of binary type descriptors. For ELF, these indicate whether the binary is PIE ("DYN") or not ("EXEC"). PE binaries have additional descriptors, "DLL" or "EXE", and a subsystem descriptor, e.g. WINDOWS_GUI or WINDOWS_CUI. |
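Because the table is a flat list of strings, querying it amounts to a membership test. A minimal sketch follows, with hasDescriptor and isPie as hypothetical helper names; the interpretation of "DYN" as PIE follows the description of this table.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using BinaryType = std::vector<std::string>;

// True if the table contains the given descriptor, e.g. "DYN" or "DLL".
bool hasDescriptor(const BinaryType& Table, const std::string& Descriptor) {
  return std::find(Table.begin(), Table.end(), Descriptor) != Table.end();
}

// For ELF modules, "DYN" marks a position-independent binary.
bool isPie(const BinaryType& Table) { return hasDescriptor(Table, "DYN"); }
```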
SCCs
| |
Label | "SCCs" |
Type | std::map<gtirb::UUID, int64_t> |
Key | The gtirb::UUID of a block |
Value | The intra-procedural SCC identifier of the block. |
AttachedTo | gtirb::Module |
Note | The intra-procedural SCC identifier of each block. |
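The table maps each block to an identifier, so recovering the members of each SCC means inverting the map. A short C++17 sketch, with std::string standing in for gtirb::UUID and groupByScc as a hypothetical helper:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)
using SCCs = std::map<UUID, std::int64_t>;

// Inverts the table: SCC identifier -> set of blocks with that identifier.
std::map<std::int64_t, std::set<UUID>> groupByScc(const SCCs& Table) {
  std::map<std::int64_t, std::set<UUID>> Groups;
  for (const auto& [Block, Id] : Table)
    Groups[Id].insert(Block);
  return Groups;
}
```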
symbolicExpressionSizes
| |
Label | "symbolicExpressionSizes" |
Type | std::map<gtirb::Offset, uint64_t> |
Key | The gtirb::Offset of a symbolic expression. |
Value | The size of the expression, in bytes. |
AttachedTo | gtirb::Module |
Note | Map from an Offset of a symbolic expression in a ByteInterval to its extent, a size in bytes. |
peImportEntries
| |
Label | "peImportEntries" |
Type | std::vector<std::tuple<uint64_t, int64_t, std::string, std::string>> |
Value | A tuple containing details of an imported function. |
AttachedTo | gtirb::Module |
Note | List of tuples, each detailing an imported function's address, ordinal, function name, and library name for PE. |
peExportEntries
| |
Label | "peExportEntries" |
Type | std::vector<std::tuple<uint64_t, int64_t, std::string>> |
Value | A tuple containing details of an exported function. |
AttachedTo | gtirb::Module |
Note | List of tuples detailing an exported address, ordinal, and name for PE. |
peImportedSymbols
peExportedSymbols
peResource
| |
Label | "peResource" |
Type | std::vector<std::tuple<std::vector<uint8_t>, gtirb::Offset, uint64_t>> |
Value | A resource header, data pointer, and data length. |
AttachedTo | gtirb::Module |
Note | List of PE resources. Each resource is a tuple of a resource header, data pointer, and data length. |
profile
| |
Label | "profile" |
Type | std::map<gtirb::UUID, uint64_t> |
Key | The gtirb::UUID of a gtirb::CodeBlock. |
Value | The number of times that block was executed. |
AttachedTo | gtirb::Module |
Notes | An entry in this table describes how many times a code block was executed. Blocks that are not present in this aux data table should be assumed to have a value of 0, indicating that they were not executed. |
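The rule that absent blocks count as zero executions is easy to get wrong with std::map::operator[], which would insert new entries. A small sketch of a read-only lookup, with std::string standing in for gtirb::UUID and executionCount as a hypothetical helper:

```cpp
#include <cstdint>
#include <map>
#include <string>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)
using Profile = std::map<UUID, std::uint64_t>;

// Returns the execution count for Block, treating a missing entry as 0
// without modifying the table.
std::uint64_t executionCount(const Profile& Table, const UUID& Block) {
  auto It = Table.find(Block);
  return It == Table.end() ? 0 : It->second;
}
```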
functionNameProbabilities
| |
Label | "functionNameProbabilities" |
Type | std::map<std::string, std::map<gtirb::UUID, std::vector<std::tuple<std::string, std::string, float>>>> |
Key | Name of the tool that made the predictions. |
Value | Map from function UUID to a list of weighted predictions. Each prediction is a tuple of <function name, binary name, score>, where the meaning of the score is tool dependent. Several existing tools use the convention that scores range from 0.0 to 1.0 and that a higher score indicates a better match. |
AttachedTo | gtirb::Module |
Notes | Used to collect results from tools that identify functions and their source libraries. Source library information is tracked in includedLibraryNames and includedLibraryVersions. |
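The nesting (tool, then function, then a list of predictions) can be navigated as sketched below to pick the top prediction for one function. This assumes the convention that a higher score is a better match, which not every tool necessarily follows. std::string stands in for gtirb::UUID, and bestPrediction is a hypothetical helper.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)

// Function name, binary name, score.
using Prediction = std::tuple<std::string, std::string, float>;
using FunctionNameProbabilities =
    std::map<std::string, std::map<UUID, std::vector<Prediction>>>;

// Returns the highest-scoring prediction that Tool made for Function, or
// std::nullopt if the tool or function is absent or has no predictions.
std::optional<Prediction>
bestPrediction(const FunctionNameProbabilities& Table,
               const std::string& Tool, const UUID& Function) {
  auto T = Table.find(Tool);
  if (T == Table.end())
    return std::nullopt;
  auto F = T->second.find(Function);
  if (F == T->second.end() || F->second.empty())
    return std::nullopt;
  return *std::max_element(F->second.begin(), F->second.end(),
                           [](const Prediction& A, const Prediction& B) {
                             return std::get<2>(A) < std::get<2>(B);
                           });
}
```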
includedLibraryNames
| |
Label | "includedLibraryNames" |
Type | std::map<gtirb::UUID, std::string> |
Key | Included library UUID. |
Value | The name of the library. |
AttachedTo | gtirb::Module |
Notes | Names of libraries that are included in an executable (i.e., their code is intermingled with the executable code). |
includedLibraryVersions
| |
Label | "includedLibraryVersions" |
Type | std::map<gtirb::UUID, std::string> |
Key | Included library UUID. |
Value | Version string for the included library. |
AttachedTo | gtirb::Module |
Notes | Versions of libraries that are included in an executable (i.e., their code is intermingled with the executable code). |
typeTable
| |
Label | "typeTable" |
Type | std::map<gtirb::UUID, std::variant<uint64_t, std::tuple<uint8_t>, std::tuple<int8_t, uint64_t>, uint64_t, uint64_t, std::tuple<gtirb::UUID, std::vector<gtirb::UUID>>, gtirb::UUID, std::tuple<gtirb::UUID, uint64_t>, std::tuple<uint64_t, std::vector<std::tuple<uint64_t, gtirb::UUID>>>, std::tuple<uint8_t>, gtirb::UUID>> |
Key | UUID of the type object |
Value | Variant of the object type, where the variant's alternatives represent, in order: Unknown, Bool, Int, Char, Float, Function, Pointer, Array, Struct, Void, Alias. |
AttachedTo | gtirb::Module |
Notes | Contains structured type information about objects in the variant. Some empty tuples have been replaced with std::tuple<uint8_t>, which is default-initialized to zero, since some GTIRB implementations cannot store a 0-length tuple. The semantics of each variant alternative are as follows:
- Unknown: uint64_t - Size of the unknown type
- Bool: std::tuple<uint8_t> - Default-initialized to zero
- Int: std::tuple<int8_t, uint64_t> - A tuple of signedness (1 for signed, 0 for unsigned) and width of the int
- Char: uint64_t - Size of the character
- Float: uint64_t - Size of the floating point number
- Function: std::tuple<gtirb::UUID, std::vector<gtirb::UUID>> - A tuple of the return type UUID and a list of parameter type UUIDs
- Pointer: gtirb::UUID - UUID of the pointed-to type
- Array: std::tuple<gtirb::UUID, uint64_t> - A tuple of the UUID of the type of the array's elements and the number of elements in that array
- Struct: std::tuple<uint64_t, std::vector<std::tuple<uint64_t, gtirb::UUID>>> - A tuple of the total size of the structure and a vector of its fields. Each field is represented as a tuple of the offset at which the field is located and the UUID of the type of that field.
- Void: std::tuple<uint8_t> - Default-initialized to zero
- Alias: gtirb::UUID - The type being aliased (note this is effectively a typedef) |
prototypeTable
| |
Label | "prototypeTable" |
Type | std::map<gtirb::UUID, gtirb::UUID> |
Key | UUID of the function |
Value | UUID of the function type in the typeTable |
AttachedTo | gtirb::Module |
Notes | Maps functions' UUIDs to their associated typeTable entry for the purpose of giving them prototypes. NOTE: The associated type table entry must be a Function type object. |
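The typeTable variant repeats several C++ types (for example, uint64_t appears for Unknown, Char, and Float), so alternatives have to be selected by index rather than by type. The C++17 sketch below shows index-based access, a small recursive describer, and a prototypeTable lookup that enforces the Function-type requirement. std::string stands in for gtirb::UUID, and the constants and the helpers describe and prototypeOf are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <variant>
#include <vector>

using UUID = std::string; // stand-in for gtirb::UUID (assumption)

using FunctionType = std::tuple<UUID, std::vector<UUID>>;
using StructType =
    std::tuple<std::uint64_t, std::vector<std::tuple<std::uint64_t, UUID>>>;

// Alternatives in the documented order.
using TypeEntry = std::variant<std::uint64_t,            // 0: Unknown
                               std::tuple<std::uint8_t>, // 1: Bool
                               std::tuple<std::int8_t, std::uint64_t>, // 2: Int
                               std::uint64_t,                   // 3: Char
                               std::uint64_t,                   // 4: Float
                               FunctionType,                    // 5: Function
                               UUID,                            // 6: Pointer
                               std::tuple<UUID, std::uint64_t>, // 7: Array
                               StructType,                      // 8: Struct
                               std::tuple<std::uint8_t>,        // 9: Void
                               UUID>;                           // 10: Alias
using TypeTable = std::map<UUID, TypeEntry>;
using PrototypeTable = std::map<UUID, UUID>;

constexpr std::size_t KFunction = 5, KPointer = 6, KArray = 7, KAlias = 10;

// Describes a type, following Pointer, Array, and Alias links. The depth
// limit guards against cyclic tables.
std::string describe(const TypeTable& Table, const UUID& Id, int Depth = 0) {
  static const char* const Names[] = {"Unknown", "Bool",     "Int",    "Char",
                                      "Float",   "Function", "Pointer", "Array",
                                      "Struct",  "Void",     "Alias"};
  auto It = Table.find(Id);
  if (It == Table.end() || Depth > 16)
    return "?";
  const TypeEntry& T = It->second;
  switch (T.index()) {
  case KPointer:
    return "pointer to " + describe(Table, std::get<KPointer>(T), Depth + 1);
  case KArray: {
    const auto& [Elem, Count] = std::get<KArray>(T);
    return "array[" + std::to_string(Count) + "] of " +
           describe(Table, Elem, Depth + 1);
  }
  case KAlias:
    return describe(Table, std::get<KAlias>(T), Depth + 1);
  default:
    return T.index() < 11 ? Names[T.index()] : "?";
  }
}

// Returns the Function alternative for a function's prototype, or nullptr
// if either table lacks an entry or the entry is not a Function type.
const FunctionType* prototypeOf(const TypeTable& Types,
                                const PrototypeTable& Prototypes,
                                const UUID& Function) {
  auto P = Prototypes.find(Function);
  if (P == Prototypes.end())
    return nullptr;
  auto T = Types.find(P->second);
  if (T == Types.end())
    return nullptr;
  return std::get_if<KFunction>(&T->second);
}
```

Entries are built with std::in_place_index for the same reason; for example, TypeEntry(std::in_place_index<KPointer>, "int") selects the Pointer alternative rather than Alias, even though both hold a UUID.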