code_inference

This module computes the valid instructions organized in blocks of code. It infers the facts code_in_block, block, and code.

Potential code inference failures are recorded in the block_still_overlap and interval_schedule_tie relations.

The disassembly consists of several phases:

  • Traverse backward from invalid instructions through “must” edges

  • Traverse forward from “may” edges from all possible instructions to generate a superset of all possible code blocks in code_in_block_candidate. This traversal has a recursive component and a linear-sweep component.

  • Generate data block candidates using references from code block candidates

  • Split common block tails to ensure code blocks are non-overlapping in code_in_block_candidate_refined

  • Populate known_block based on binary metadata and propagate forward through “must” edges

  • Populate impossible_block with blocks which overlap known_blocks, and propagate backward through “must” edges

  • Identify remaining overlapping blocks in unresolved_block and assign points in block_points and block_points_proportional according to heuristics.

  • Resolve blocks with different boundaries using weighted interval scheduling, which weights blocks according to points and binary coverage

  • Resolve blocks with identical boundaries according to their points.

code_in_block(EA:address, Block:address)

The instruction at address ‘EA’ belong to the block ‘Block’. The block address is the address of the first instruction in the block. These are final addresses where the code is located. They are organized in blocks of sequential code and the block identifier is the address of the first instruction in the block.

block(Block:address)

There is a code block starting at ‘Block’.

code(EA:address)

The instruction at address ‘EA’ is code.

WARNING: Predicate not present in compiled Datalog program (Dead Code)

block_last_instruction(Block:address, EA:address)

The last instruction of ‘Block’ is at address ‘EA’

block_boundaries(block:address, BegAddr:address, EndAddr:address)

The block defined by address ‘Block’ extends from ‘BegAddr’ to ‘EndAddr’. In most cases ‘BegAddr’ and ‘Block’ coincide, but there are exceptions such as ARM 32 Thumb code blocks.

overlapping_instruction(EA:address, EA2: address)

Predicate to capture overlapping instructions. Certain overlapping instructions are permitted and do not generate block conflicts. Those overlapping instructions are captured by this predicate.

data_in_code(Begin:address, End:address)

There is a segment of data in a code section that spans from ‘Begin’ to ‘End’.

block_still_overlap(Block1:address, Type1:block_type, Size1:unsigned, Block2:address, Type2:block_type, Size2:unsigned)

This predicate if there are still overlapping blocks after the code inference. In that case the analysis has failed to resolve all conflicts.

may_fallthrough(From:address, To:address)

Instruction at address From might fallthrough to instruction at address To.

must_fallthrough(From:address, To:address)

Instruction at address From must fallthrough to instruction at address To.

likely_fallthrough(From:address, To:address)

This predicate is a special case of may_fallthrough. must_fallthrough is very conservative and does not include things like conditional jumps. However, conditional jumps can almost always fallthrough (unless there is obfuscation). likely_fallthrough captures those instructions.

may_have_symbolic_immediate(src:address, dest:address)

unlikely_have_symbolic_immediate(EA:address)

invalid(EA:address, Type:symbol)

max_instruction_size(Size:unsigned)

Largest instruction in the binary

possible_ea(EA:address)

WARNING: Predicate not present in compiled Datalog program (Dead Code)

known_code(EA:address)

The address EA is already known to be code in the input GTIRB.

basic_target(ea:address)

Starting addresses for code exploration.

block_limit(EA:address)

possible_target_from(dest:address, src:address)

possible_target(Target:address)

code_in_block_candidate(EA:address, EA_block:address)

aligned_address_in_data(EA:address, Value:address)

Aligned address shown in data

padding_block_limit(Limit:address)

Addresses where the propagation of padding blocks should be limited.

Similar to (and a superset of) block_limit for code blocks.

nop_in_padding_candidate(EA:address, Start:address)

Build padding candidates from nop instructions

This is analagous to code_in_block_candidate, but for blocks of type “padding”.

padding_block_candidate(EA:address, Size:unsigned)

Padding block candidates can help fill in space when evaluating overlapping models.

after_end(EA:address, End:address)

common_tail(EA:address)

code_in_block_candidate_refined(EA:address, Block:address)

composite_data_access(EA:address, EA_load:address, Data:address, Size:unsigned)

A pair of instructions form an access to data with a predetermined size and location

data_block_candidate(Block:address, Size:unsigned)

A data block candidate in a code section. The candidate is located at address ‘Block’ and has ‘Size’ bytes.

block_candidate_boundaries(Block:address, Type:block_type, BegAddr:address, EndAddr:address)

The block candidate defined by address ‘Block’ extends from ‘BegAddr’ to ‘EndAddr’. In most cases ‘BegAddr’ and ‘Block’ coincide, but there are exceptions such as ARM 32 Thumb code blocks. These block candidates can be code blocks or data blocks.

multiple_fallthrough_to(Block:address)

WARNING: Predicate not present in compiled Datalog program (Dead Code)

block_overlap(Block1:address, Type1:block_type, Size1:unsigned, Block2:address, Type2:block_type, Size2:unsigned)

Two blocks overlap.

block_candidate_dependency_edge(BlockA:address, TypeA:block_type, SizeA:unsigned, BlockB:address, TypeB:block_type, SizeB:unsigned)

The block BlockA,TypeA implies the existence of of BlockB,TypeB

block_implies_block(BlockA:address, TypeA:block_type, SizeA:unsigned, BlockB:address, TypeB:block_type, SizeB:unsigned)

If BlockA exists, BlockB must also exist.

known_block(Block:address, Type:block_type, Size:unsigned, Reason:symbol)

Known blocks are blocks that must exist due to metadata.

impossible_block(Block:address, Type:block_type, Size:unsigned, Reason:symbol)

Logically impossible blocks.

unresolved_block(Block:address, Type:block_type, Size:unsigned)

Blocks that are still overlapping after known_block and impossible_block are determined.

unresolved_block_overlap(Block1:address, Type1:block_type, Size1:unsigned, Block2:address, Type2:block_type, Size2:unsigned)

Blocks that still overlap after known_block and impossible_block are determined.

unresolved_interval(Start:address, End:address)

Valid unresolved intervals

unresolved_interval_order(ID:unsigned, Start:address, End:address)

Sort intervals by end address

block_type_priority(Type:block_type, Priority:unsigned)

unresolved_interval_best_block(ID:unsigned, Block:address, Type:block_type, Size:unsigned, Weight:unsigned)

Select best block in an unresolved interval and calculate weight

wis_has_prior(I:unsigned, Count:unsigned)

Weighted interval schedule: Count of intervals that end prior to the start of interval I.

Only calculated for intervals that have prior intervals.

wis_prior(I:unsigned, Count:unsigned)

wis_prior extends wis_has_prior, including intervals that have no prior intervals (i.e., with Count of 0).

wis_memo(I:unsigned, Value:unsigned, Pred:unsigned)

Weighted interval schedule: memoized subproblems

Value is the optimized total weight for intervals [1:I].

Value is the total weight for selected intervals, and Pred (i.e., the Predecessor) is the previous entry in wis_memo, later used to reconstruct the optimal solution.

interval_schedule_tie(BlockA:address, TypeA:block_type, SizeA:unsigned, BlockB:address, TypeB:block_type, SizeB:unsigned)

Blocks A and B had equal weights during interval scheduling, and Block A was selected.

wis_schedule(Interval:unsigned)

Weighted interval schedule: reconstruct schedule of selected intervals

wis_schedule_iter(Iter:unsigned)

block_total_points(Block:address, Type:block_type, Size:unsigned, Points:number)

discarded_block(Block:address, Type:block_type, Size:unsigned, Reason:symbol, BlockPropagated:address)

Block: block to be discarded BlockPropagated (for debugging purpose):

  • previous block that the reason is propagated from (for “propagated”)

  • block that this block overlapped with (for “first instruction overlap”)

candidate_block_is_not_padding(Block:address)

candidate_block_is_padding(Block:address)

block_heuristic(Block:address, Type:block_type, Size:unsigned, Instance:address, HeuristicName:symbol)

block_heuristic defines properties of blocks that will receive points. This predicate uniquely identifies a block and the ‘HeuristicName’ property associated to it.

Some heuristics can be assigned to a block multiple times, this is made possible by the ‘Instance’ field. E.g. if we give points for specific instructions in the block or for each block predecessor. For heuristics that can only be applied once per block, the instance field is always 0.

negative_block_heuristic(Block:address, Type:block_type, Size:unsigned, Instance:address, HeuristicName:symbol)

Negative block heuristics are collected in their own relation.

This allows any block with negative_block_heuristic to be marked as unresolved_block without causing cyclic negation.

block_points(Block:address, Type:block_type, Size:unsigned, Instance:address, Weight:number, HeuristicName:symbol)

A block defined by its address ‘Block’, ‘Type’, and ‘Size’ has been assigned a heuristic ‘HeuristicName’ with weight ‘Weight’. The ‘Instance’ field can be used to assign a single heuristic to a block multiple times, see block_heuristics. The heuristic weights are determined by the heuristic_weight predicate.

block_points_proportional(Block:address, Type:block_type, Size:unsigned, WeightMult:number, HeuristicName:symbol)

This predicate specifies heuristics whose weight is proportional to the block size. A block defined by its address ‘Block’, ‘Type’, and ‘Size’ has been assigned a heuristic ‘HeuristicName’

with weight ‘WeightMult’*’Size’.

resolved_reaches(Block:address, Size:unsigned, Type:symbol)

Points for reachability

Give points to blocks that are reachable from a non-overlapping block. We distinguish two kinds of reachability “strong” and “weak” which get different amount of points. “weak” reachability is for may fallthroughs that have higher uncertainty such as function calls.

data_in_code_propagate(Current:address, Initial:address, EndByteInterval:address)

Auxiliary predicate to compute data_in_code. It propagates through the holes between code blocks and between the last code block of a byte interval and the end of the interval

first_block_in_byte_interval(Begin:address, End:address, BlockBeg:address)

Auxiliary predicate to compute data_in_code. BlockBeg contains the address of the first code block in the byte interval defined by the addresses [Begin,End). If a code byte interval has no code, no predicate will be generated.

next_block_in_byte_interval(Block:address, NextBlock:address)