gtirb package

Subpackages

Submodules

Module contents

class gtirb.AuxData(data, type_name, lazy_container=None)[source]

Bases: object

AuxData objects can be attached to the gtirb.IR or to individual gtirb.Modules to store additional client-specific data in a portable way.

AuxData represents a portable, language-independent manner of encoding rich data. To do this, all data is stored on disk as a series of bytes with a string describing the format of the data, called a type name. See gtirb.serialization for the list of all default types. Types may also be parameterized; for example, mapping<string,UUID> is a dict from str objects to UUID objects. All AuxData requires a valid type name in order to be serialized.
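To make the idea of a parameterized type name concrete, the following is a minimal sketch of how a string such as mapping<string,UUID> could be split into a head and its parameters. The function parse_type_name is a hypothetical helper written for illustration; it is not the parser that gtirb.serialization uses, and it only assumes the angle-bracket and comma syntax shown above.

```python
from typing import List, Tuple


def parse_type_name(type_name: str) -> Tuple[str, List[tuple]]:
    """Split a type name into (head, [parsed parameters]).

    Hypothetical illustration only; not part of the gtirb API.
    """
    type_name = type_name.strip()
    if "<" not in type_name:
        return (type_name, [])
    if not type_name.endswith(">"):
        raise ValueError(f"unbalanced type name: {type_name!r}")
    head, _, rest = type_name.partition("<")
    inner = rest[:-1]  # drop the closing '>'
    params, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            # A top-level comma separates two parameters.
            params.append(parse_type_name(inner[start:i]))
            start = i + 1
    params.append(parse_type_name(inner[start:]))
    return (head, params)


parsed = parse_type_name("mapping<string,UUID>")
```

Here parsed holds the head "mapping" with two unparameterized children, "string" and "UUID", which mirrors the dict-from-str-to-UUID reading given above.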

Variables:
  • ~.data – The value stored in this AuxData.

  • ~.type_name – A string describing the type of data. Used to determine the proper codec for serializing this AuxData.

__init__(data, type_name, lazy_container=None)[source]
Parameters:
  • data (object) – The value stored in this AuxData.

  • type_name (str) – A string describing the type of data. Used to determine the proper codec for serializing this AuxData.

  • lazy_container (typing.Optional[gtirb.auxdata._LazyDataContainer]) – An object that will lazily deserialize the auxdata table backing this object, or None.

property data: Any
serializer: typing.ClassVar[gtirb.serialization.Serialization] = <gtirb.serialization.Serialization object>

This is a gtirb.Serialization instance, used to encode and decode data fields of all AuxData. See gtirb.serialization for details.

class gtirb.AuxDataContainer(aux_data={}, uuid=None)[source]

Bases: Node

The base class for anything that holds AuxData tables; that is, gtirb.IR and gtirb.Module.

Variables:

~.aux_data – The auxiliary data associated with the object, as a mapping from names to gtirb.AuxData.

__init__(aux_data={}, uuid=None)[source]
Parameters:
  • aux_data (typing.Union[typing.Mapping[str, gtirb.auxdata.AuxData], typing.Iterable[typing.Tuple[str, gtirb.auxdata.AuxData]]]) – The initial auxiliary data to be associated with the object, as a mapping from names to gtirb.AuxData. Defaults to an empty dict.

  • uuid (typing.Optional[uuid.UUID]) – The UUID of this AuxDataContainer, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

deep_eq(other)[source]

This overrides gtirb.Node.deep_eq() to check for AuxData equality.

Because the values stored by AuxData are not necessarily amenable to deep checking, the auxiliary data dictionaries stored for self and other are not deeply checked. Instead, they are considered to be equal if their sets of keys are equal.
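The key-only comparison described above can be sketched with plain dictionaries. This is a stand-in model, not gtirb's implementation: aux_data_keys_equal is a hypothetical name, and the table names and values are invented for illustration.

```python
def aux_data_keys_equal(a: dict, b: dict) -> bool:
    """Model of the documented rule: tables match when their key sets match."""
    return set(a) == set(b)


# Hypothetical tables: same name, different stored values.
left = {"comments": {0x1000: "entry"}}
right = {"comments": {0x2000: "a different value"}}
# Hypothetical table with a different name.
other = {"functionNames": {}}

same_keys = aux_data_keys_equal(left, right)
different_keys = aux_data_keys_equal(left, other)
```

Under this rule same_keys is True even though the stored values differ, so a caller that needs value-level equality has to compare the data fields separately.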

Return type:

bool

class gtirb.Block(uuid=None)[source]

Bases: Node

The base class for blocks. Symbols may have references to any subclass of Block.

property module: Module | None

Get the module this node ultimately belongs to.

property references: Iterator[Symbol]

Get all the symbols that refer to this block.

class gtirb.ByteBlock(*, size=0, offset=0, uuid=None, byte_interval=None)[source]

Bases: Block

The base class for blocks that belong to a ByteInterval and store their bytes there.

Variables:
  • ~.size – The size of the block in bytes.

  • ~.offset – The offset from the beginning of the byte interval to which this block belongs. Multiple blocks in the same interval may have the same offset.

__init__(*, size=0, offset=0, uuid=None, byte_interval=None)[source]
Parameters:
  • size (int) – The size of the block in bytes.

  • offset (int) – The offset from the beginning of the byte interval to which this block belongs.

  • uuid (typing.Optional[uuid.UUID]) – The UUID of this ByteBlock, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

  • byte_interval (typing.Optional[gtirb.byteinterval.ByteInterval]) – The ByteInterval this block belongs to.

property address: int | None

Get the address of this block, or None if not present.

property byte_interval: ByteInterval | None

The ByteInterval this block belongs to.

contains_address(address)[source]

Indicate if the provided address is within this block. Returns False if the block has no address.

Return type:

bool

contains_offset(offset)[source]

Indicate if the provided offset is within this block.

Return type:

bool
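The relationship between offset, address, contains_offset() and contains_address() can be modeled as below. BlockModel is a hypothetical stand-in, not the gtirb class. It assumes that a block's address is the interval's address plus the block's offset, and that both containment checks use a half-open range that excludes the end; neither assumption is stated in this reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockModel:
    """Hypothetical stand-in for a ByteBlock placed inside a ByteInterval."""

    offset: int
    size: int
    interval_address: Optional[int] = None  # None: the interval has no address

    @property
    def address(self) -> Optional[int]:
        # Assumption: block address = interval address + block offset.
        if self.interval_address is None:
            return None
        return self.interval_address + self.offset

    def contains_offset(self, offset: int) -> bool:
        # Assumption: half-open range [offset, offset + size).
        return self.offset <= offset < self.offset + self.size

    def contains_address(self, address: int) -> bool:
        if self.address is None:
            return False  # documented: False when the block has no address
        return self.address <= address < self.address + self.size


placed = BlockModel(offset=4, size=8, interval_address=0x1000)
floating = BlockModel(offset=4, size=8)
```

In this model floating still answers offset queries, but every address query returns False because there is no address to compare against.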

property contents: bytes

Get the bytes in this block.

deep_eq(other)[source]

Check: is self structurally equal to other?

This method should be used only when deep structural equality checks are actually needed, and not for all equality checks. Typically the default implementation of __eq__, which checks pointer equality, is sufficient; Nodes are cached such that references to two Nodes with the same UUID refer to the same exact object. Use this method when you have manually constructed Nodes that may share the same UUID despite being different objects, and you need to check for structural equality.

Return type:

bool
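The difference between the default identity-based __eq__ and a structural check such as deep_eq() can be illustrated with a small stand-in class. NodeModel is hypothetical and compares only a UUID and a size; the real method compares whatever fields each gtirb class defines.

```python
import uuid


class NodeModel:
    """Hypothetical stand-in for a Node; __eq__ is left as identity."""

    def __init__(self, size, node_uuid=None):
        self.uuid = node_uuid if node_uuid is not None else uuid.uuid4()
        self.size = size

    def deep_eq(self, other):
        # Structural check: same type, same UUID, same field values.
        return (
            isinstance(other, NodeModel)
            and self.uuid == other.uuid
            and self.size == other.size
        )


shared = uuid.uuid4()
a = NodeModel(8, shared)
b = NodeModel(8, shared)  # manually constructed with the same UUID

identity_equal = a == b
structurally_equal = a.deep_eq(b)
```

Here identity_equal is False because a and b are distinct objects, while structurally_equal is True, which is the situation the paragraph above describes.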

property ir: IR | None

Get the IR this node ultimately belongs to.

property module: Module | None

Get the module this node ultimately belongs to.

offset

A descriptor that will notify a parent when the value is set and can be otherwise used like a normal attribute.

property section: Section | None

Get the section this node ultimately belongs to.

size

A descriptor that will notify a parent when the value is set and can be otherwise used like a normal attribute.

class gtirb.ByteInterval(*, address=None, size=None, initialized_size=None, contents=b'', blocks=(), symbolic_expressions={}, uuid=None, section=None)[source]

Bases: Node

A contiguous region of bytes in a binary.

A ByteInterval defines a relative ordering for a group of ByteBlocks, optionally at a fixed address in memory. It also stores the bytes associated with these blocks.

If two blocks are in two different ByteIntervals, then it should be considered safe (that is, preserving of program semantics) to move one block relative to the other in memory. If two blocks are in the same ByteInterval, then it should be considered unknown if moving the two blocks relative to one another in memory is a safe operation.
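The rule above has two outcomes, "safe" and "unknown", rather than a yes or no answer. A minimal sketch, assuming each block is tagged with a hypothetical identifier for its containing interval:

```python
from typing import Optional


def relative_move_is_safe(interval_a: str, interval_b: str) -> Optional[bool]:
    """Return True if a relative move is considered safe, None if unknown.

    interval_a and interval_b are hypothetical identifiers for the
    ByteIntervals that contain the two blocks.
    """
    if interval_a != interval_b:
        return True  # different intervals: considered safe
    return None  # same interval: safety is unknown


across = relative_move_is_safe("interval-1", "interval-2")
within = relative_move_is_safe("interval-1", "interval-1")
```

A rewriting tool would normally treat the None case conservatively and keep the two blocks in their original relative order.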

Variables:
  • ~.address – The fixed address of this interval, if present. If this field is present, it may indicate the original address at which this interval was located in memory, or it may indicate that this interval’s address is fixed and must not be changed. If this field is not present, it indicates that the interval is free to be moved around in memory while preserving program semantics.

  • ~.size – The size of this interval in bytes. If this number is greater than initialized_size, this indicates that the high addresses taken up by this interval consist of uninitialized bytes. This often occurs in BSS sections, where data is zero-initialized rather than stored as zeroes in the binary.

  • ~.contents – The bytes stored in this interval.

  • ~.blocks – A set of all ByteBlocks in this interval.

  • ~.symbolic_expressions – A mapping, from offset in the interval, to a SymbolicExpression in the interval.

__init__(*, address=None, size=None, initialized_size=None, contents=b'', blocks=(), symbolic_expressions={}, uuid=None, section=None)[source]
Parameters:
  • address (typing.Optional[int]) – The fixed address of this interval, if present.

  • size (typing.Optional[int]) – The size of this interval in bytes.

  • initialized_size (typing.Optional[int]) – The number of initialized bytes in this interval.

  • contents (typing.ByteString) – The bytes stored in this interval.

  • blocks (typing.Iterable[gtirb.block.ByteBlock]) – A set of all ByteBlocks in this interval.

  • symbolic_expressions (typing.Union[typing.Mapping[int, gtirb.symbolicexpression.SymbolicExpression], typing.Iterable[typing.Tuple[int, gtirb.symbolicexpression.SymbolicExpression]]]) – A mapping, from offset in the interval, to a SymbolicExpression in the interval.

  • uuid (typing.Optional[uuid.UUID]) – The UUID of this ByteInterval, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

  • section (typing.Optional[gtirb.section.Section]) – The Section this interval belongs to.

address

A descriptor that will notify a parent when the value is set and can be otherwise used like a normal attribute.

byte_blocks_at(addrs)[source]

Finds all the byte blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.ByteBlock]

byte_blocks_on(addrs)[source]

Finds all the byte blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.ByteBlock]
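The *_at and *_on query families differ only in the matching rule: "begins at" versus "overlaps". The sketch below models that difference with plain tuples. The names blocks_at and blocks_on, the tuple layout, and the treatment of ranges as contiguous (step 1) are assumptions made for illustration, not gtirb's implementation.

```python
from typing import List, Tuple, Union

# Hypothetical block shape: (label, start address, size in bytes).
Block = Tuple[str, int, int]


def _as_range(addrs: Union[int, range]) -> range:
    # A single address is treated as a one-element range.
    return range(addrs, addrs + 1) if isinstance(addrs, int) else addrs


def blocks_at(blocks: List[Block], addrs: Union[int, range]) -> List[str]:
    """Labels of blocks whose first byte lies in addrs."""
    r = _as_range(addrs)
    return [name for name, start, _ in blocks if start in r]


def blocks_on(blocks: List[Block], addrs: Union[int, range]) -> List[str]:
    """Labels of blocks that share at least one byte with addrs."""
    r = _as_range(addrs)
    if len(r) == 0:
        return []
    return [
        name
        for name, start, size in blocks
        if start < r.stop and r.start < start + size
    ]


blocks = [("a", 0x1000, 4), ("b", 0x1004, 8), ("c", 0x1010, 4)]
at_single = blocks_at(blocks, 0x1006)
on_single = blocks_on(blocks, 0x1006)
on_range = blocks_on(blocks, range(0x1003, 0x1005))
```

Address 0x1006 falls inside block "b" but is not its first byte, so at_single is empty while on_single contains "b".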

code_blocks_at(addrs)[source]

Finds all the code blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.CodeBlock]

code_blocks_on(addrs)[source]

Finds all the code blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.CodeBlock]

data_blocks_at(addrs)[source]

Finds all the data blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.DataBlock]

data_blocks_on(addrs)[source]

Finds all the data blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.DataBlock]

deep_eq(other)[source]

Check: is self structurally equal to other?

This method should be used only when deep structural equality checks are actually needed, and not for all equality checks. Typically the default implementation of __eq__, which checks pointer equality, is sufficient; Nodes are cached such that references to two Nodes with the same UUID refer to the same exact object. Use this method when you have manually constructed Nodes that may share the same UUID despite being different objects, and you need to check for structural equality.

Return type:

bool

property initialized_size: int

The number of initialized bytes in this interval.

Not all bytes in this interval may correspond to bytes physically stored in the underlying file format. This can occur, for example, in BSS sections, which are zero-initialized at load time, but these zeroes are not stored in the file itself. If this number is smaller than size, this indicates that any bytes past this number are uninitialized bytes with values determined at load time. As such, all bytes past this number in this interval’s byte vector are truncated when saving to file.
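The interplay of size, initialized_size and the stored bytes can be sketched as follows. split_interval is a hypothetical helper, not part of gtirb; it models only the documented rule that bytes past initialized_size are not written out.

```python
def split_interval(contents: bytes, size: int, initialized_size: int):
    """Return (bytes that would be stored, count of uninitialized bytes).

    Hypothetical model of the truncation rule described above.
    """
    if not 0 <= initialized_size <= size:
        raise ValueError("initialized_size must be between 0 and size")
    stored = contents[:initialized_size]  # bytes past this point are dropped
    return stored, size - initialized_size


# A 16-byte interval in which only the first 4 bytes are initialized.
stored, uninitialized = split_interval(
    b"\x01\x02\x03\x04\x00\x00", size=16, initialized_size=4
)
```

In this example the two trailing zero bytes of the in-memory byte vector are dropped, and the remaining 12 bytes of the interval exist only at load time.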

property ir: IR | None

Get the IR this node ultimately belongs to.

property module: Module | None

Get the module this node ultimately belongs to.

property section: Section | None

The Section this interval belongs to.

size

A descriptor that will notify a parent when the value is set and can be otherwise used like a normal attribute.

property symbolic_expressions: MutableMapping[int, SymbolicExpression]
symbolic_expressions_at(addrs)[source]

Finds all the symbolic expressions that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[typing.Tuple[gtirb.byteinterval.ByteInterval, int, gtirb.symbolicexpression.SymbolicExpression]]

Returns:

Yields (interval, offset, symexpr) tuples for every symbolic expression in the range.

class gtirb.CFG(edges=None)[source]

Bases: MutableSet[Edge]

A control-flow graph for an IR. Vertices are CfgNodes, and edges may optionally contain Edge.Labels.

The graph may be viewed simply as a set of Edges. For convenience, the out_edges() and in_edges() methods provide access to the outgoing or incoming edges of individual nodes.

For efficiency, only vertices with edges are guaranteed to be stored in this graph. If you want to find all vertices possible (that is, all CfgNodes), use IR.cfg_nodes() instead.

Internally, the graph is stored as a NetworkX instance, which can be accessed using nx(). This allows NetworkX’s large library of graph algorithms to be used on CFGs, if desired.
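The "set of edges with per-node views" description can be modeled without NetworkX. EdgeModel and CfgModel below are hypothetical stand-ins that use strings as vertices; they illustrate why a vertex without edges does not appear in the graph and are not the gtirb classes.

```python
from typing import Iterator, NamedTuple, Optional, Set


class EdgeModel(NamedTuple):
    """Hypothetical stand-in for gtirb.Edge, with strings as vertices."""

    source: str
    target: str
    label: Optional[str] = None


class CfgModel:
    """Hypothetical stand-in for gtirb.CFG: a set of edges plus lookups."""

    def __init__(self, edges=()):
        self._edges: Set[EdgeModel] = set(edges)

    def add(self, edge: EdgeModel) -> None:
        self._edges.add(edge)

    def out_edges(self, node: str) -> Iterator[EdgeModel]:
        return (e for e in self._edges if e.source == node)

    def in_edges(self, node: str) -> Iterator[EdgeModel]:
        return (e for e in self._edges if e.target == node)

    def vertices(self) -> Set[str]:
        # Only endpoints of stored edges are known to the graph.
        return {n for e in self._edges for n in (e.source, e.target)}


cfg = CfgModel(
    [
        EdgeModel("entry", "loop"),
        EdgeModel("loop", "loop"),
        EdgeModel("loop", "exit"),
    ]
)
# A full listing of blocks, standing in for what the IR would report.
all_blocks = {"entry", "loop", "exit", "unreached"}
missing_from_graph = all_blocks - cfg.vertices()
```

The block "unreached" has no edges, so it appears only in the full listing; this is why the text above points to the IR for a complete set of CfgNodes.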

__contains__(edge)[source]
Return type:

bool

__init__(edges=None)[source]
__iter__()[source]
Return type:

typing.Iterator[gtirb.cfg.Edge]

__len__()[source]
Return type:

int

add(edge)[source]

Add an element.

Return type:

None

clear()[source]

This is slow (creates N new iterators!) but effective.

Return type:

None

deep_eq(other)[source]
Return type:

bool

discard(edge)[source]

Remove an element. Do not raise an exception if absent.

Return type:

None

in_edges(node)[source]
Return type:

typing.Iterator[gtirb.cfg.Edge]

nx()[source]
Return type:

networkx.classes.multidigraph.MultiDiGraph

out_edges(node)[source]
Return type:

typing.Iterator[gtirb.cfg.Edge]

update(edges)[source]
Return type:

None

class gtirb.CfgNode(uuid=None)[source]

Bases: Block

The base class for blocks that may appear as vertices in the CFG.

property incoming_edges: Iterable[Edge]

Get the edges that point to this CFG node.

property outgoing_edges: Iterable[Edge]

Get the edges that start at this CFG node.

class gtirb.CodeBlock(*, decode_mode=DecodeMode.Default, size=0, offset=0, uuid=None, byte_interval=None)[source]

Bases: ByteBlock, CfgNode

A basic block in the binary.

Does not directly store data bytes, which are kept in a ByteInterval.

Variables:

~.decode_mode – The decode mode of the block, used in some ISAs to differentiate between sub-ISAs (e.g. differentiating blocks written in ARM and Thumb).

class DecodeMode(value)[source]

Bases: Enum

Variations on decoding a particular ISA

Default = 0

Default decode mode for all ISAs

Thumb = 1

Thumb decode mode for ARM32

__init__(*, decode_mode=DecodeMode.Default, size=0, offset=0, uuid=None, byte_interval=None)[source]
Parameters:
  • size (int) – The length of the block in bytes.

  • decode_mode (gtirb.block.CodeBlock.DecodeMode) – The decode mode of the block, used in some ISAs to differentiate between sub-ISAs (e.g. differentiating blocks written in ARM and Thumb). Defaults to DecodeMode.Default.

  • offset (int) – The offset from the beginning of the byte interval to which this block belongs.

  • uuid (typing.Optional[uuid.UUID]) – The UUID of this CodeBlock, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

  • byte_interval (typing.Optional[gtirb.byteinterval.ByteInterval]) – The ByteInterval this block belongs to.

deep_eq(other)[source]

Check: is self structurally equal to other?

This method should be used only when deep structural equality checks are actually needed, and not for all equality checks. Typically the default implementation of __eq__, which checks pointer equality, is sufficient; Nodes are cached such that references to two Nodes with the same UUID refer to the same exact object. Use this method when you have manually constructed Nodes that may share the same UUID despite being different objects, and you need to check for structural equality.

Return type:

bool

property incoming_edges: Iterator[Edge]

Get the edges that point to this CFG node.

property outgoing_edges: Iterator[Edge]

Get the edges that start at this CFG node.

class gtirb.DataBlock(*, size=0, offset=0, uuid=None, byte_interval=None)[source]

Bases: ByteBlock

Represents a data object, possibly symbolic.

__init__(*, size=0, offset=0, uuid=None, byte_interval=None)[source]
Parameters:
  • size (int) – The size of the data object in bytes.

  • offset (int) – The offset from the beginning of the byte interval to which this block belongs.

  • uuid (typing.Optional[uuid.UUID]) – The UUID of this DataBlock, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

  • byte_interval (typing.Optional[gtirb.byteinterval.ByteInterval]) – The ByteInterval this block belongs to.

class gtirb.Edge(source: CfgNode, target: CfgNode, label: EdgeLabel | None = None)[source]

Bases: NamedTuple

An edge in the CFG from source to target, with optional control-flow details in label.

Variables:
  • ~.source – The source CFG node.

  • ~.target – The target CFG node.

  • ~.label – An optional label containing more control flow information.

Label

alias of EdgeLabel

Type

alias of EdgeType

static __new__(cls, source, target, label=None)[source]

Create new instance of NamedTuple(source, target, label)

Return type:

gtirb.cfg.Edge

__slots__ = ()
class gtirb.EdgeLabel(type: EdgeType, conditional: bool = False, direct: bool = True)[source]

Bases: tuple

Contains a more detailed description of a gtirb.Edge in the CFG.

Variables:
  • ~.conditional – When this edge is part of a conditional branch, conditional is True when the edge represents the control flow taken when the branch’s condition is met, and False when it represents the control flow taken when the branch’s condition is not met. Otherwise, it is always False.

  • ~.direct – True if the branch or call is direct, and False if it is indirect. If an edge is indirect, then all outgoing indirect edges represent the set of possible locations the edge may branch to. If there exists an indirect outgoing edge to a gtirb.ProxyBlock without any gtirb.Symbol objects referring to it, then the set of all possible branch locations is unknown.

  • ~.type – The type of control flow the gtirb.Edge represents.

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

__slots__ = ()
conditional: bool

Alias for field number 1

direct: bool

Alias for field number 2

type: gtirb.cfg.EdgeType

Alias for field number 0
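The field order given by the aliases above (type at index 0, conditional at 1, direct at 2) and the defaults in the class signature can be mirrored with a typing.NamedTuple. EdgeTypeModel and EdgeLabelModel are hypothetical stand-ins, not the gtirb classes, and only two of the edge types are reproduced.

```python
from enum import Enum
from typing import NamedTuple


class EdgeTypeModel(Enum):
    """Hypothetical subset of the edge types, with their documented values."""

    Branch = 0
    Fallthrough = 2


class EdgeLabelModel(NamedTuple):
    """Hypothetical stand-in with the same field order and defaults."""

    type: EdgeTypeModel
    conditional: bool = False
    direct: bool = True


# The two edges leaving a direct conditional branch.
taken = EdgeLabelModel(EdgeTypeModel.Branch, conditional=True)
not_taken = EdgeLabelModel(EdgeTypeModel.Fallthrough)

# Tuple unpacking follows the field numbers listed above.
kind, conditional, direct = taken
```

Because the label is a tuple, it is immutable and hashable, which is what allows edges carrying labels to live in a set.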

class gtirb.EdgeType(value)[source]

Bases: Enum

The type of control flow transfer indicated by a gtirb.Edge.

Branch = 0

This edge is the explicit target of a jump instruction. May be conditional or unconditional. If conditional, there will be a corresponding edge of type gtirb.Edge.Type.Fallthrough.

Call = 1

This edge is the explicit target of a call instruction. Unless the function does not return, there will also be a corresponding edge of type gtirb.Edge.Type.Fallthrough.

Fallthrough = 2

This edge represents two blocks executing in sequence. This occurs on the non-branching paths of conditional branch instructions, after call instructions have returned, and when two blocks have no control flow between them, but another gtirb.Edge targets the target block. If there exists a fallthrough edge from block A to block B, then A must immediately precede B in memory.
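The adjacency requirement for fallthrough edges can be written as a one-line check. The helper name immediately_precedes is hypothetical, and the sketch assumes "immediately precede" means that A's last byte is directly followed by B's first byte.

```python
def immediately_precedes(a_address: int, a_size: int, b_address: int) -> bool:
    """True if block A ends exactly where block B begins (assumed meaning)."""
    return a_address + a_size == b_address


# A 4-byte block at 0x1000 followed by a block at 0x1004.
valid_fallthrough = immediately_precedes(0x1000, 4, 0x1004)
# A one-byte gap between the blocks breaks the requirement.
gap_between_blocks = immediately_precedes(0x1000, 4, 0x1005)
```

A consistency checker could apply this test to every edge of type Fallthrough whose endpoints both have addresses.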

Return = 3

This edge represents a return from a function, generally via a return instruction. Return edges may either go to a symbolless gtirb.ProxyBlock, which indicates that the set of possible return targets is unknown, or there may be one return edge per return target, which indicates that the set of possible return targets is fully known.

Syscall = 4

This edge is the explicit target of a system call instruction. Unless the function does not return, there will also be a corresponding edge of type gtirb.Edge.Type.Fallthrough. This is the system call equivalent to gtirb.Edge.Type.Call.

Sysret = 5

This edge represents a return from a system call, generally via a return instruction. Return edges may either go to a symbolless gtirb.ProxyBlock, which indicates that the set of possible return targets is unknown, or there may be one return edge per return target, which indicates that the set of possible return targets is fully known. This is the system call equivalent to gtirb.Edge.Type.Return.

class gtirb.IR(*, modules=[], aux_data={}, cfg={}, version=4, uuid=None)[source]

Bases: AuxDataContainer

A complete internal representation consisting of multiple Modules.

Variables:
  • ~.modules – A list of Modules contained in the IR.

  • ~.cfg – The IR’s control flow graph.

  • ~.version – The Protobuf version of this IR.

__init__(*, modules=[], aux_data={}, cfg={}, version=4, uuid=None)[source]
Parameters:
  • modules (typing.Iterable[gtirb.module.Module]) – A list of Modules contained in the IR.

  • cfg (typing.Iterable[gtirb.cfg.Edge]) – A set of Edges representing the IR’s control flow graph. Defaults to being empty.

  • aux_data (typing.Union[typing.Mapping[str, gtirb.auxdata.AuxData], typing.Iterable[typing.Tuple[str, gtirb.auxdata.AuxData]]]) – The initial auxiliary data to be associated with the object, as a mapping from names to gtirb.AuxData. Defaults to being empty.

  • version (int) – The Protobuf version of this IR.

  • uuid (typing.Optional[uuid.UUID]) – The UUID of this IR, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

property byte_blocks: Iterator[ByteBlock]

The ByteBlocks in this IR.

byte_blocks_at(addrs)[source]

Finds all the byte blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.ByteBlock]

byte_blocks_on(addrs)[source]

Finds all the byte blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.ByteBlock]

property byte_intervals: Iterator[ByteInterval]

The ByteIntervals in this IR.

byte_intervals_at(addrs)[source]

Finds all the byte intervals that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.byteinterval.ByteInterval]

byte_intervals_on(addrs)[source]

Finds all the byte intervals that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.byteinterval.ByteInterval]

property cfg_nodes: Iterator[CfgNode]

The CfgNodes in this IR.

property code_blocks: Iterator[CodeBlock]

The CodeBlocks in this IR.

code_blocks_at(addrs)[source]

Finds all the code blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.CodeBlock]

code_blocks_on(addrs)[source]

Finds all the code blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.CodeBlock]

property data_blocks: Iterator[DataBlock]

The DataBlocks in this IR.

data_blocks_at(addrs)[source]

Finds all the data blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.DataBlock]

data_blocks_on(addrs)[source]

Finds all the data blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.DataBlock]

deep_eq(other)[source]

This overrides gtirb.Node.deep_eq() to check for AuxData equality.

Because the values stored by AuxData are not necessarily amenable to deep checking, the auxiliary data dictionaries stored for self and other are not deeply checked. Instead, they are considered to be equal if their sets of keys are equal.

Return type:

bool

get_by_uuid(uuid)[source]

Look up a node by its UUID.

This method will find any node currently attached to this IR. It will not find any nodes attached to other IRs, or not attached to any IR.

Parameters:

uuid (uuid.UUID) – The UUID to look up.

Return type:

typing.Optional[gtirb.node.Node]

Returns:

The Node this UUID corresponds to, or None if no node exists with that UUID.

static load_protobuf(file_name)[source]

Load IR from a Protobuf file at the specified path.

Parameters:

file_name – The path to the Protobuf file.

Returns:

A Python GTIRB IR object.

static load_protobuf_file(protobuf_file)[source]

Load IR from a Protobuf object.

Use this function when you have a Protobuf object already loaded, and you want to parse it as a GTIRB IR. If the Protobuf object is stored in a file, use gtirb.IR.load_protobuf() instead.

Parameters:

protobuf_file (typing.BinaryIO) – A byte stream encoding a GTIRB Protobuf message.

Return type:

gtirb.ir.IR

Returns:

An IR object representing the same information that is contained in protobuf_file.

modules_named(name)[source]

Find all modules with a given name

Return type:

typing.Iterator[gtirb.module.Module]

property proxy_blocks: Iterator[ProxyBlock]

The ProxyBlocks in this IR.

save_protobuf(file_name)[source]

Save self to a Protobuf file at the specified path.

Parameters:

file_name – The file path at which to save the Protobuf representation of self.

save_protobuf_file(protobuf_file)[source]

Save self to a Protobuf object.

Parameters:

protobuf_file (typing.BinaryIO) – The byte stream to write the GTIRB Protobuf message to.

Return type:

None

property sections: Iterator[Section]

The Sections in this IR.

sections_at(addrs)[source]

Finds all the sections that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.section.Section]

sections_on(addrs)[source]

Finds all the sections that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.section.Section]

symbolic_expressions_at(addrs)[source]

Finds all the symbolic expressions that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[typing.Tuple[gtirb.byteinterval.ByteInterval, int, gtirb.symbolicexpression.SymbolicExpression]]

Returns:

Yields (interval, offset, symexpr) tuples for every symbolic expression in the range.

property symbols: Iterator[Symbol]

The Symbols in this IR.

class gtirb.Module(*, name, aux_data={}, binary_path='', file_format=FileFormat.Undefined, isa=ISA.Undefined, byte_order=ByteOrder.Undefined, preferred_addr=0, proxies={}, rebase_delta=0, sections={}, symbols={}, entry_point=None, uuid=None, ir=None)[source]

Bases: AuxDataContainer

Represents a loadable object, such as an executable or library.

Variables:
  • ~.binary_path – The path to the loadable binary object represented by this module. An empty string if not specified. The file represented by this path is indicative of what file this Module was initially created from; it is not guaranteed to currently exist or have the same contents.

  • ~.isa – The ISA of the binary.

  • ~.file_format – The file format of the binary.

  • ~.byte_order – The endianness of the binary.

  • ~.name – The name given to the binary. Some file formats use this for linking and/or symbol resolution purposes. The file name (without directory components) if not specified by the format.

  • ~.preferred_addr – The preferred loading address of the binary.

  • ~.proxies – A set containing all the gtirb.ProxyBlocks in the binary.

  • ~.rebase_delta – The rebase delta of the binary.

  • ~.sections – A set containing all the gtirb.Sections in the binary.

  • ~.symbols – A set containing all the gtirb.Symbols in the binary.

  • ~.entry_point – A CodeBlock representing where control flow of this module begins, or None if not present.

class ByteOrder(value)[source]

Bases: Enum

Identifies the endianness of a gtirb.Module.

Big = 1

Big endian.

Little = 2

Little endian.

Undefined = 0

An unknown or uninitialized endianness.

class FileFormat(value)[source]

Bases: Enum

Identifies the executable file format of the binary represented by a gtirb.Module.

COFF = 1

The Common Object File Format.

ELF = 2

The Executable and Linkable Format, formerly the Extensible Linking Format.

IdaProDb32 = 4

A 32-bit IDA Pro database file.

IdaProDb64 = 5

A 64-bit IDA Pro database file.

MACHO = 7

A Mach object file.

PE = 3

Microsoft’s Portable Executable format.

RAW = 8

A raw binary file, with no file format.

Undefined = 0

A file format that has not yet been specified. This is for uninitialized modules; do not use to refer to file formats without FileFormat values.

XCOFF = 6

The Extended Common Object File Format.

class ISA(value)[source]

Bases: Enum

Identifies the instruction set architecture (ISA) targeted by a gtirb.Module.

ARM = 4

The Acorn RISC Machine, 32-bit.

ARM64 = 7

The Acorn RISC Machine, 64-bit.

IA32 = 1

The 32-bit Intel Architecture. Also known as i386, x86, or x32.

MIPS32 = 8

Microprocessor without Interlocked Pipelined Stages, 32-bit.

MIPS64 = 9

Microprocessor without Interlocked Pipelined Stages, 64-bit.

PPC32 = 2

IBM’s 32-bit PowerPC (Performance Optimization with Enhanced RISC / Performance Computing) architecture.

PPC64 = 6

IBM’s 64-bit PowerPC (Performance Optimization with Enhanced RISC / Performance Computing) architecture.

Undefined = 0

An ISA that has not yet been specified. This is for uninitialized modules; use gtirb.Module.ISA.ValidButUnsupported instead for specifying undefined ISAs.

ValidButUnsupported = 5

An unknown or undefined ISA.

X64 = 3

The 64-bit Intel Architecture. Also known as x86_64.

__init__(*, name, aux_data={}, binary_path='', file_format=FileFormat.Undefined, isa=ISA.Undefined, byte_order=ByteOrder.Undefined, preferred_addr=0, proxies={}, rebase_delta=0, sections={}, symbols={}, entry_point=None, uuid=None, ir=None)[source]
Parameters:
  • aux_data (typing.Union[typing.Mapping[str, gtirb.auxdata.AuxData], typing.Iterable[typing.Tuple[str, gtirb.auxdata.AuxData]]]) – The initial auxiliary data to be associated with the object, as a mapping from names to gtirb.AuxData. Defaults to an empty dict.

  • binary_path (str) – The path to the loadable binary object represented by this module.

  • isa (gtirb.module.Module.ISA) – The ISA of the binary.

  • byte_order (gtirb.module.Module.ByteOrder) – The endianness of the binary.

  • file_format (gtirb.module.Module.FileFormat) – The file format of the binary.

  • name (str) – The name given to the binary.

  • preferred_addr (int) – The preferred loading address of the binary.

  • proxies (typing.Iterable[gtirb.block.ProxyBlock]) – A set containing all the gtirb.ProxyBlocks in the binary.

  • rebase_delta (int) – The rebase delta of the binary.

  • sections (typing.Iterable[gtirb.section.Section]) – A set containing all the gtirb.Sections in the binary.

  • symbols (typing.Iterable[gtirb.symbol.Symbol]) – A set containing all the gtirb.Symbols in the binary.

  • entry_point (typing.Optional[gtirb.block.CodeBlock]) – A CodeBlock representing where control flow of this module begins, or None if not present.

  • uuid (typing.Optional[uuid.UUID]) – The UUID of this Module, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

  • ir (typing.Optional[gtirb.ir.IR]) – The IR this module belongs to.

property byte_blocks: Iterator[ByteBlock]

The ByteBlocks in this module.

byte_blocks_at(addrs)[source]

Finds all the byte blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.ByteBlock]

byte_blocks_on(addrs)[source]

Finds all the byte blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.ByteBlock]

property byte_intervals: Iterator[ByteInterval]

The ByteIntervals in this module.

byte_intervals_at(addrs)[source]

Finds all the byte intervals that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.byteinterval.ByteInterval]

byte_intervals_on(addrs)[source]

Finds all the byte intervals that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.byteinterval.ByteInterval]

property cfg_nodes: Iterator[CfgNode]

The CfgNodes in this module.

property code_blocks: Iterator[CodeBlock]

The CodeBlocks in this module.

code_blocks_at(addrs)[source]

Finds all the code blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.CodeBlock]

code_blocks_on(addrs)[source]

Finds all the code blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.CodeBlock]

property data_blocks: Iterator[DataBlock]

The DataBlocks in this module.

data_blocks_at(addrs)[source]

Finds all the data blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.DataBlock]

data_blocks_on(addrs)[source]

Finds all the data blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.DataBlock]

deep_eq(other)[source]

This overrides gtirb.Node.deep_eq() to check for AuxData equality.

Because the values stored by AuxData are not necessarily amenable to deep checking, the auxiliary data dictionaries stored for self and other are not deeply checked. Instead, they are considered to be equal if their sets of keys are equal.

Return type:

bool
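
The key-only comparison of auxiliary data described above can be illustrated with ordinary dictionaries. This is a sketch of the stated rule only; the function name aux_data_considered_equal and the table names are hypothetical, and the rest of the module comparison is not modeled.

```python
from typing import Any, Mapping


def aux_data_considered_equal(a: Mapping[str, Any], b: Mapping[str, Any]) -> bool:
    """Compare two aux-data mappings by their key sets, ignoring the values."""
    return set(a) == set(b)


# Hypothetical table names with different contents on each side.
left = {"comments": {0x1000: "entry"}, "notes": ["reviewed"]}
right = {"comments": {}, "notes": []}

print(aux_data_considered_equal(left, right))             # same keys
print(aux_data_considered_equal(left, {"comments": {}}))  # a key is missing
```

A consequence of this rule is that two modules can compare as deeply equal even when the contents of their auxiliary data tables differ, so table contents need a separate check if they matter.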

property ir: IR | None

The IR this module belongs to.

sections_at(addrs)[source]

Finds all the sections that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.section.Section]

sections_on(addrs)[source]

Finds all the sections that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.section.Section]

symbolic_expressions_at(addrs)[source]

Finds all the symbolic expressions that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[typing.Tuple[gtirb.byteinterval.ByteInterval, int, gtirb.symbolicexpression.SymbolicExpression]]

Returns:

Yields (interval, offset, symexpr) tuples for every symbolic expression in the range.
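
The shape of the yielded tuples can be shown with a small stand-in. This is a sketch under assumptions, not the library's code: SketchInterval is a hypothetical class holding a fixed address and an offset-to-expression mapping, and expressions are represented as plain strings.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union


class SketchInterval:
    """Hypothetical stand-in for a byte interval with a fixed address."""

    def __init__(self, address: int, symbolic_expressions: Dict[int, str]) -> None:
        self.address = address
        # Maps an offset inside the interval to an expression (a str here).
        self.symbolic_expressions = symbolic_expressions


def symbolic_expressions_at(
    intervals: Iterable[SketchInterval], addrs: Union[int, range]
) -> Iterator[Tuple[SketchInterval, int, str]]:
    """Yield (interval, offset, symexpr) for expressions starting in addrs."""
    wanted = range(addrs, addrs + 1) if isinstance(addrs, int) else addrs
    for interval in intervals:
        for offset, expr in sorted(interval.symbolic_expressions.items()):
            if interval.address + offset in wanted:
                yield interval, offset, expr


iv = SketchInterval(0x1000, {4: "foo + 0", 12: "bar + 8"})

for interval, offset, expr in symbolic_expressions_at([iv], range(0x1000, 0x1008)):
    # The absolute address is the interval address plus the offset.
    print(hex(interval.address + offset), expr)
```

Because the offset is relative to the interval, the absolute address of an expression has to be recomputed from the interval, as the loop does.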

symbols_named(name)[source]

Finds all symbols with a given name.

Return type:

typing.Iterator[gtirb.symbol.Symbol]

class gtirb.Node(uuid=None)[source]

Bases: object

A Node is any GTIRB object which can be referenced by UUID.

Variables:

~.uuid – The UUID of this Node.

__init__(uuid=None)[source]
Parameters:

uuid (typing.Optional[uuid.UUID]) – The UUID of this Node, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

deep_eq(other)[source]

Check: is self structurally equal to other?

This method should be used only when deep structural equality checks are actually needed, and not for all equality checks. Typically the default implementation of __eq__, which checks pointer equality, is sufficient; Nodes are cached such that references to two Nodes with the same UUID refer to the same exact object. Use this method when you have manually constructed Nodes that may share the same UUID despite being different objects, and you need to check for structural equality.

Return type:

bool
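
The distinction between identity and structural equality can be sketched with a stand-in class. SketchNode is hypothetical and much simpler than gtirb.Node: it has no node cache, and its deep_eq compares only a UUID and a name.

```python
import uuid
from typing import Optional


class SketchNode:
    """Hypothetical stand-in for a UUID-identified node (not gtirb.Node)."""

    def __init__(self, name: str, node_id: Optional[uuid.UUID] = None) -> None:
        self.name = name
        self.uuid = node_id if node_id is not None else uuid.uuid4()

    def deep_eq(self, other: object) -> bool:
        # Structural comparison: same type, same UUID, same contents.
        return (
            isinstance(other, SketchNode)
            and self.uuid == other.uuid
            and self.name == other.name
        )


shared_id = uuid.uuid4()
a = SketchNode("main", shared_id)
b = SketchNode("main", shared_id)  # manually built copy with the same UUID

print(a == b)        # False: the default __eq__ compares object identity
print(a.deep_eq(b))  # True: the two objects have the same structure
```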

class gtirb.Offset(element_id, displacement)[source]

Bases: NamedTuple

An Offset describes a location inside a gtirb.Node, such as a gtirb.DataBlock or gtirb.ByteInterval.

Variables:
  • ~.element_id – The gtirb.Node containing the location of interest.

  • ~.displacement – The offset inside the Node to point to.
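
Since Offset is a NamedTuple, it behaves like any other named tuple. The sketch below uses a local mirror, SketchOffset, with the two documented fields; the placeholder object stands in for a real node and is an assumption of the example.

```python
from typing import NamedTuple


class SketchOffset(NamedTuple):
    """Hypothetical mirror of the two documented fields."""

    element_id: object  # stand-in for the node containing the location
    displacement: int   # offset inside that node


block = object()  # placeholder for a data block or byte interval
loc = SketchOffset(element_id=block, displacement=4)

# Named tuples unpack positionally, compare by value, and are hashable,
# so they can serve as dictionary keys.
element, displacement = loc
notes = {loc: "start of the second field"}

print(displacement)
print(notes[SketchOffset(block, 4)])
```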

class gtirb.ProxyBlock(*, uuid=None, module=None)[source]

Bases: CfgNode

A placeholder that serves as the endpoint (source or target) of a gtirb.Edge.

ProxyBlock objects allow the construction of CFG edges to or from another node. For example, a call to a function in another module may be represented by a gtirb.Edge that originates at the calling gtirb.CodeBlock and targets a ProxyBlock. Another example would be a gtirb.Edge that represents an indirect jump whose target is not known.

A ProxyBlock does not represent any instructions and so has neither an address nor a size.

__init__(*, uuid=None, module=None)[source]
Parameters:

uuid (typing.Optional[uuid.UUID]) – The UUID of this Node, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

deep_eq(other)[source]

Check: is self structurally equal to other?

This method should be used only when deep structural equality checks are actually needed, and not for all equality checks. Typically the default implementation of __eq__, which checks pointer equality, is sufficient; Nodes are cached such that references to two Nodes with the same UUID refer to the same exact object. Use this method when you have manually constructed Nodes that may share the same UUID despite being different objects, and you need to check for structural equality.

Return type:

bool

property incoming_edges: Iterator[Edge]

Get the edges that point to this CFG node.

property ir: IR | None

Get the IR this node ultimately belongs to.

property module: Module | None

Get the module this node ultimately belongs to.

property outgoing_edges: Iterator[Edge]

Get the edges that start at this CFG node.

class gtirb.Section(*, name='', byte_intervals=(), flags={}, uuid=None, module=None)[source]

Bases: Node

Represents a named section of the binary.

Does not directly store the contents of the section, which are kept in a gtirb.ImageByteMap.

Variables:
  • ~.name – The section name (e.g. “.text” or “.bss”).

  • ~.byte_intervals – The ByteIntervals in this section.

  • ~.flags – The Section.Flags this section has.

class Flag(value)[source]

Bases: Enum

A flag representing a known property of a section.

Executable = 3

This section contains executable code.

Initialized = 5

This section has bytes allocated to it in the binary file.

Loaded = 4

This section is present in memory at runtime.

Readable = 1

This section can be read from at runtime.

ThreadLocal = 6

This section is created in memory once per thread.

Undefined = 0

This value is defined for Protobuf compatibility. Do not use.

Writable = 2

This section can be written to at runtime.
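
The listed values are consecutive integers rather than bit masks, so several flags are naturally held in a set instead of being OR-ed together. The sketch below mirrors the values locally as SketchFlag (it does not import gtirb), and the chosen flag combination is only an example.

```python
from enum import Enum


class SketchFlag(Enum):
    """Local mirror of the values listed above (not imported from gtirb)."""

    Undefined = 0
    Readable = 1
    Writable = 2
    Executable = 3
    Loaded = 4
    Initialized = 5
    ThreadLocal = 6


# Example: flags one might expect on a code section.
text_flags = {
    SketchFlag.Readable,
    SketchFlag.Executable,
    SketchFlag.Loaded,
    SketchFlag.Initialized,
}

print(SketchFlag.Writable in text_flags)   # membership test instead of a bit test
print(SketchFlag(3).name)                  # look a flag up by its integer value
print(sorted(f.name for f in text_flags))
```

Note that Readable | Writable would equal 3 if the integers were combined bitwise, which collides with Executable; this is why the sketch keeps the members in a set.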

__init__(*, name='', byte_intervals=(), flags={}, uuid=None, module=None)[source]
property address: int | None

Get the address of this section, if known.

The address is calculated from the ByteInterval objects in this section. More specifically, if the addresses of all byte intervals in this section are fixed, this is the address of the interval lowest in memory. If any interval does not have an address, this is None, as the address cannot be calculated. A section with no intervals has neither an address nor a size, so this is also None in that case.

property byte_blocks: Iterator[ByteBlock]

The ByteBlocks in this section.

byte_blocks_at(addrs)[source]

Finds all the byte blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.ByteBlock]

byte_blocks_on(addrs)[source]

Finds all the byte blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.ByteBlock]

byte_intervals_at(addrs)[source]

Finds all the byte intervals that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.byteinterval.ByteInterval]

byte_intervals_on(addrs)[source]

Finds all the byte intervals that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.byteinterval.ByteInterval]

property code_blocks: Iterator[CodeBlock]

The CodeBlocks in this section.

code_blocks_at(addrs)[source]

Finds all the code blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.CodeBlock]

code_blocks_on(addrs)[source]

Finds all the code blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.CodeBlock]

property data_blocks: Iterator[DataBlock]

The DataBlocks in this section.

data_blocks_at(addrs)[source]

Finds all the data blocks that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.DataBlock]

data_blocks_on(addrs)[source]

Finds all the data blocks that overlap an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[gtirb.block.DataBlock]

deep_eq(other)[source]

Check: is self structurally equal to other?

This method should be used only when deep structural equality checks are actually needed, and not for all equality checks. Typically the default implementation of __eq__, which checks pointer equality, is sufficient; Nodes are cached such that references to two Nodes with the same UUID refer to the same exact object. Use this method when you have manually constructed Nodes that may share the same UUID despite being different objects, and you need to check for structural equality.

Return type:

bool

property ir: IR | None

Get the IR this node ultimately belongs to.

property module: Module | None

The Module this section belongs to.

property size: int | None

Get the size of this section, if known.

The size is calculated from the ByteInterval objects in this section. More specifically, if the addresses of all byte intervals in this section are fixed, this is the difference between the lowest and highest addresses among the intervals. If any interval does not have an address, this is None, as the size cannot be calculated. A section with no intervals has neither an address nor a size, so this is also None in that case.
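
The rules for a section's address and size can be sketched with plain tuples. This is an illustration under assumptions, not the library's code: intervals are (address or None, size) pairs, the function names are hypothetical, and "highest address" is taken to mean the end of the interval that extends furthest.

```python
from typing import List, Optional, Tuple

Interval = Tuple[Optional[int], int]  # hypothetical: (address or None, size)


def section_address(intervals: List[Interval]) -> Optional[int]:
    """Lowest interval address, or None if empty or any address is unknown."""
    if not intervals or any(addr is None for addr, _ in intervals):
        return None
    return min(addr for addr, _ in intervals)


def section_size(intervals: List[Interval]) -> Optional[int]:
    """Span from the lowest start to the highest end, or None if unknown."""
    if not intervals or any(addr is None for addr, _ in intervals):
        return None
    lowest = min(addr for addr, _ in intervals)
    highest_end = max(addr + size for addr, size in intervals)
    return highest_end - lowest


fixed = [(0x1000, 0x10), (0x2000, 0x20)]
floating = [(0x1000, 0x10), (None, 0x20)]

print(section_address(fixed), section_size(fixed))
print(section_address(floating), section_size(floating))
print(section_address([]), section_size([]))
```

Under this reading, gaps between intervals count toward the size, so the result can be larger than the sum of the interval sizes.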

symbolic_expressions_at(addrs)[source]

Finds all the symbolic expressions that begin at an address or range of addresses.

Parameters:

addrs (typing.Union[int, range]) – Either a range object or a single address.

Return type:

typing.Iterable[typing.Tuple[gtirb.byteinterval.ByteInterval, int, gtirb.symbolicexpression.SymbolicExpression]]

Returns:

Yields (interval, offset, symexpr) tuples for every symbolic expression in the range.

class gtirb.Serialization[source]

Bases: object

Manages codecs used to serialize and deserialize GTIRB objects.

The gtirb.Serialization.decode() method of gtirb.AuxData.serializer is called when GTIRB AuxData is loaded via gtirb.IR.load_protobuf(), and the gtirb.Serialization.encode() method of gtirb.AuxData.serializer is called when GTIRB AuxData is saved to file via gtirb.IR.save_protobuf(). You can alter the encoding and decoding of AuxData values via gtirb.Serialization.codecs. To do this, create a new subclass of gtirb.serialization.Codec and add it to gtirb.Serialization.codecs:

>>> gtirb.AuxData.serializer.codecs['my_custom_type'] = MyCustomCodec

This example registers a new type name, my_custom_type, and associates it with a new codec, MyCustomCodec.

Variables:

~.codecs – A mapping of type names to codecs. Codecs can be added or overridden using this dictionary.
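
The registry pattern behind codecs can be modeled in a few lines. Everything in this sketch is hypothetical and self-contained: SketchSerialization and Utf8Codec are stand-ins, and the encode and decode signatures are simplified, so the real gtirb.serialization.Codec interface may differ.

```python
import io
from typing import BinaryIO, Dict, Type, Union


class Utf8Codec:
    """Hypothetical codec: 4-byte little-endian length, then UTF-8 bytes."""

    @staticmethod
    def encode(out: BinaryIO, val: object) -> None:
        data = str(val).encode("utf-8")
        out.write(len(data).to_bytes(4, "little"))
        out.write(data)

    @staticmethod
    def decode(raw: BinaryIO) -> object:
        length = int.from_bytes(raw.read(4), "little")
        return raw.read(length).decode("utf-8")


class SketchSerialization:
    """Hypothetical registry that dispatches on the type name."""

    def __init__(self) -> None:
        self.codecs: Dict[str, Type[Utf8Codec]] = {}

    def encode(self, out: BinaryIO, val: object, type_name: str) -> None:
        self.codecs[type_name].encode(out, val)

    def decode(self, raw_bytes: Union[bytes, BinaryIO], type_name: str) -> object:
        # Accept either raw bytes or an already opened binary stream.
        if isinstance(raw_bytes, (bytes, bytearray, memoryview)):
            raw_bytes = io.BytesIO(raw_bytes)
        return self.codecs[type_name].decode(raw_bytes)


serializer = SketchSerialization()
serializer.codecs["my_custom_type"] = Utf8Codec  # register the codec class

buffer = io.BytesIO()
serializer.encode(buffer, "hello", "my_custom_type")
print(serializer.decode(buffer.getvalue(), "my_custom_type"))
```

The point of the design is that callers only name a type; the registry picks the codec, so adding or overriding an entry changes how that type is stored without touching the calling code.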

__init__()[source]

Initialize with the built-in gtirb.serialization.Codec subclasses.

decode(raw_bytes, type_name, get_by_uuid=None)[source]

Decode a gtirb.AuxData of the specified type from the specified byte stream.

Parameters:
  • raw_bytes (typing.Union[bytes, bytearray, memoryview, typing.BinaryIO]) – The byte stream from which to read the encoded value.

  • type_name (str) – The type name of the object encoded by raw_bytes.

  • get_by_uuid (typing.Optional[typing.Callable[[uuid.UUID], typing.Optional[gtirb.node.Node]]]) – A function to look up nodes by UUID.

Return type:

object

Returns:

The object encoded by raw_bytes.

encode(out, val, type_name)[source]

Encodes the value of an AuxData to bytes.

Parameters:
  • out (typing.BinaryIO) – A binary stream to write bytes to.

  • val (object) – The gtirb.AuxData to encode.

  • type_name (str) – The type name of the value encapsulated by the gtirb.AuxData.

Return type:

None

class gtirb.SymAddrAddr(scale, offset, symbol1, symbol2, attributes={})[source]

Bases: SymbolicExpression

Represents a symbolic expression of the form “(Sym1 - Sym2) / Scale + Offset”.

Variables:
  • ~.scale – Constant scale factor.

  • ~.offset – Constant offset.

  • ~.symbol1 – Symbol representing the base address.

  • ~.symbol2 – Symbol to subtract from symbol1.
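
As a worked instance of the formula, the sketch below evaluates it on already resolved addresses. The function name is hypothetical, the inputs are plain integers rather than Symbol objects, and integer (floor) division is an assumption, since the rounding behavior is not stated here.

```python
def sym_addr_addr_value(sym1: int, sym2: int, scale: int, offset: int) -> int:
    """Evaluate (Sym1 - Sym2) / Scale + Offset on resolved addresses."""
    # Assumption: integer (floor) division; the rounding rule is not documented here.
    return (sym1 - sym2) // scale + offset


# Distance between two addresses, expressed in 4-byte units.
print(sym_addr_addr_value(0x1040, 0x1000, 4, 0))  # 16
print(sym_addr_addr_value(0x1040, 0x1000, 4, 2))  # 18
```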

__eq__(other)[source]

Return self==value.

Return type:

bool

__hash__()[source]

Return hash(self).

Return type:

int

__init__(scale, offset, symbol1, symbol2, attributes={})[source]
deep_eq(other)[source]
Return type:

bool

property symbols: Iterable[Symbol]

Get all the symbols involved with this symbolic expression, regardless of role.

class gtirb.SymAddrConst(offset, symbol, attributes={})[source]

Bases: SymbolicExpression

Represents a symbolic expression of the form “Sym + Offset”.

Variables:
  • ~.offset – Constant offset.

  • ~.symbol – Symbol representing an address.

__eq__(other)[source]

Return self==value.

Return type:

bool

__hash__()[source]

Return hash(self).

Return type:

int

__init__(offset, symbol, attributes={})[source]
deep_eq(other)[source]
Return type:

bool

property symbols: Iterable[Symbol]

Get all the symbols involved with this symbolic expression, regardless of role.

class gtirb.Symbol(name, uuid=None, payload=None, at_end=False, module=None)[source]

Bases: Node

Represents a symbol, which maps a name to an object in the IR.

Variables:
  • ~.name – The name of this symbol.

  • ~.at_end – True if this symbol is at the end of its referent, rather than at the beginning. Has no meaning for integral symbols.

__init__(name, uuid=None, payload=None, at_end=False, module=None)[source]
Parameters:
  • name (str) – The name of this symbol.

  • uuid (typing.Optional[uuid.UUID]) – The UUID of this Symbol, or None if a new UUID needs to be generated via uuid.uuid4(). Defaults to None.

  • payload (typing.Union[gtirb.block.Block, int, None]) – The value this symbol points to. May be an address, a Node, or None.

  • at_end (bool) – True if this symbol is at the end of its referent, rather than at the beginning.

  • module (typing.Optional[gtirb.module.Module]) – The Module this symbol belongs to.

deep_eq(other)[source]

Check: is self structurally equal to other?

This method should be used only when deep structural equality checks are actually needed, and not for all equality checks. Typically the default implementation of __eq__, which checks pointer equality, is sufficient; Nodes are cached such that references to two Nodes with the same UUID refer to the same exact object. Use this method when you have manually constructed Nodes that may share the same UUID despite being different objects, and you need to check for structural equality.

Return type:

bool

property ir: IR | None

Get the IR this node ultimately belongs to.

property module: Module | None
name

A descriptor that will notify a parent when the value is set and can be otherwise used like a normal attribute.

property referent: Block | None

The object referred to by a Symbol, which is a Block or None. value and referent are mutually exclusive.

property value: int | None

The value of a Symbol, which is an integer or None. value and referent are mutually exclusive.
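
The mutual exclusion of value and referent follows from there being a single payload. The sketch below models that with stand-in classes; SketchSymbol and SketchBlock are hypothetical and omit everything else a real Symbol carries.

```python
from typing import Optional, Union


class SketchBlock:
    """Hypothetical stand-in for a block."""


class SketchSymbol:
    """Hypothetical symbol holding one payload: a block, an int, or None."""

    def __init__(self, name: str, payload: Union[SketchBlock, int, None] = None) -> None:
        self.name = name
        self._payload = payload

    @property
    def value(self) -> Optional[int]:
        # Only an integer payload counts as a value.
        return self._payload if isinstance(self._payload, int) else None

    @property
    def referent(self) -> Optional[SketchBlock]:
        # Only a block payload counts as a referent.
        return self._payload if isinstance(self._payload, SketchBlock) else None


block = SketchBlock()
by_address = SketchSymbol("absolute_sym", 0x4000)
by_block = SketchSymbol("main", block)
undefined = SketchSymbol("extern_sym")

for sym in (by_address, by_block, undefined):
    print(sym.name, sym.value, sym.referent is not None)
```

At most one of the two properties is non-None for any symbol in this model, and both are None when the symbol has no payload.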

class gtirb.SymbolicExpression(attributes={})[source]

Bases: object

Base class of symbolic expression types.

class Attribute(value)[source]

Bases: Enum

Attributes representing a known property of a symbolic expression. See https://grammatech.github.io/gtirb/md__symbolic_expression.html

ABS = 2015
CALL = 22
DISP = 3003
DTPMOD = 19
DTPOFF = 17
DTPREL = 16
G0 = 2001
G1 = 2002
G2 = 2003
G3 = 2004
GOT = 0
GOTNTPOFF = 1000
GOTOFF = 2
GOTPC = 1
GOTREL = 3
GPREL = 3002
H = 4000
HA = 4002
HI = 24
HI12 = 2010
HI16 = 3000
HI21 = 2011
HIGH = 4003
HIGHA = 4004
HIGHER = 25
HIGHERA = 4005
HIGHEST = 26
HIGHESTA = 4006
INDNTPOFF = 1001
L = 4001
LO = 23
LO12 = 2007
LO14 = 2009
LO15 = 2008
LO16 = 3001
LOWER16 = 2006
NC = 2014
NOTOC = 4009
NTPOFF = 18
OFST = 3004
PAGE = 20
PAGEOFF = 21
PCREL = 6
PG = 2013
PLT = 4
PLTOFF = 5
PREL = 2016
PREL31 = 2017
S = 2012
SBREL = 2020
SECREL = 7
TARGET1 = 2018
TARGET2 = 2019
TLS = 8
TLSCALL = 12
TLSDESC = 13
TLSGD = 9
TLSLD = 10
TLSLDM = 11
TLSLDO = 2021
TOC = 4008
TOCBASE = 4007
TPOFF = 15
TPREL = 14
UPPER16 = 2005
__int__()[source]
Return type:

int
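
An enum that defines __int__ can be passed to int() to recover its numeric value. The sketch mirrors three of the listed members in a local class, SketchAttribute, and the __int__ body shown is an assumption about the behavior, not a copy of the library's code.

```python
from enum import Enum


class SketchAttribute(Enum):
    """Local mirror of three listed members (not imported from gtirb)."""

    GOT = 0
    PLT = 4
    PCREL = 6

    def __int__(self) -> int:
        # Assumed behavior: convert a member to its numeric value.
        return self.value


attributes = {SketchAttribute.PLT, SketchAttribute.PCREL}

print(sorted(int(a) for a in attributes))  # [4, 6]
print(SketchAttribute(4).name)             # PLT
```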

__init__(attributes={})[source]
deep_eq(other)[source]
Return type:

bool

property symbols: Iterable[Symbol]

Get all the symbols involved with this symbolic expression, regardless of role.

class gtirb.Variant(index, val)[source]

Bases: object

__eq__(other)[source]

Return self==value.

Return type:

bool

__hash__ = None
__init__(index, val)[source]