symbolization

This module performs symbolization. It uses the results of several analyses:

  • use_def

  • value

  • data_access

Part of symbolization is pointer reattribution, that is, detecting cases where a number is the result of a symbol+constant expression. This is done in an independent module ‘pointer_reattribution’, which also uses the results of these analyses.
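As a rough illustration of the symbol+constant idea, the sketch below expresses a raw value as the nearest known symbol address at or below it plus a constant. The function name, the flat list of symbol addresses and the max_offset bound are assumptions made for this example; the pointer_reattribution module itself works from the analysis results listed above and is not reproduced here.

```python
from bisect import bisect_right


def as_symbol_plus_constant(value, symbol_addrs, max_offset=0x100):
    """Return (symbol_address, constant) with symbol_address + constant == value.

    Returns None when no symbol lies at or below `value`, or when the
    constant would exceed `max_offset` (a hypothetical plausibility bound).
    """
    addrs = sorted(symbol_addrs)
    i = bisect_right(addrs, value)  # number of symbols at or below value
    if i == 0:
        return None
    base = addrs[i - 1]
    constant = value - base
    if constant > max_offset:
        return None
    return (base, constant)
```

For example, with symbols at 0x1000 and 0x2000, the value 0x1010 is reported as 0x1000 plus 0x10, while 0x1800 is rejected because the constant exceeds the bound.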

The data symbolization itself uses the following heuristics:

  • address_array: whether we have potential symbols evenly spaced. The more symbols the less likely they are all value collisions. We require at least 3 symbols evenly spaced to consider it an array.

  • preferred_data_access and data_access_patterns (from the data_access analysis):

      - if an address is accessed with the size of a pointer, it is more likely to be a pointer.

      - if an address is accessed with a size other than the size of a pointer, it is almost certainly not a pointer.

  • strings: if we have a pointer candidate in what seems to be a string, it is less likely to be a pointer.

  • aligned location: if a pointer candidate is aligned, it is more likely to be a pointer. Compilers usually (but not always) store pointers aligned.
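The heuristics above can be pictured as a point system in which each piece of evidence adds or subtracts points for a pointer candidate. The following is a minimal sketch under stated assumptions: the weights, the reason names and the 8-byte pointer size are all hypothetical and chosen only for illustration; they are not the values used by this module.

```python
POINTER_SIZE = 8  # assumption: 64-bit target

# Hypothetical weights; positive favors "pointer", negative disfavors it.
WEIGHTS = {
    "address_array": 3,
    "pointer_sized_access": 5,
    "other_sized_access": -8,
    "in_string": -2,
    "aligned": 2,
}


def score_candidate(ea, in_address_array=False, access_size=None, in_string=False):
    """Return (total_points, reasons) for the pointer candidate at address `ea`.

    `access_size` is the size of an observed access to `ea`, or None if
    no access was observed.
    """
    reasons = []
    if in_address_array:
        reasons.append("address_array")
    if access_size is not None:
        if access_size == POINTER_SIZE:
            reasons.append("pointer_sized_access")
        else:
            reasons.append("other_sized_access")
    if in_string:
        reasons.append("in_string")
    if ea % POINTER_SIZE == 0:
        reasons.append("aligned")
    return sum(WEIGHTS[r] for r in reasons), reasons
```

Keeping the reasons alongside the total mirrors the shape of predicates such as data_object_point (points plus a ‘why’ field) and data_object_total_points, which makes it easier to see why a candidate was accepted or rejected.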

This module also computes symbol_minus_symbol.

symbolic_operand(EA:address, Index:operand_index, Value:address, Type:symbol)

Instruction at address ‘EA’ has a symbolic operand with value ‘Value’. ‘Value’ is given as an address. The field ‘Index’ identifies which operand is symbolic and ‘Type’ specifies if the target is “data” or “code”. This predicate only supports symbolic expressions with one symbol and no offset. For symbolic operands with offset see moved_label.

symbolic_data(EA:address, Size:unsigned, Value:address)

There is a symbolic expression in the data at address ‘EA’ of size ‘Size’ pointing to ‘Value’. ‘Value’ is given as an address. This predicate only supports symbolic expressions with one symbol and no offset. For symbolic expressions in data with offset see moved_data_label and for symbol-symbol expressions see symbol_minus_symbol.

symbol_minus_symbol(EA:address, Size:unsigned, Symbol1:address, Symbol2:address, Scale:unsigned, Offset:number)

There is a symbolic expression in data at address ‘EA’ of size ‘Size’ of the form:

(Symbol2-Symbol1)*Scale+Offset

Both symbols are given as addresses.
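To make the formula concrete, the sketch below evaluates it literally and wraps the result to ‘Size’ bytes, as a relative jump-table entry would be stored. The function name and the two's-complement wrapping are assumptions for illustration, not part of this module.

```python
def symbol_minus_symbol_value(symbol1, symbol2, scale=1, offset=0, size=4):
    """Evaluate (Symbol2-Symbol1)*Scale+Offset and wrap it to `size` bytes.

    Negative results are encoded in two's complement (an assumption about
    how the value is stored in the data).
    """
    raw = (symbol2 - symbol1) * scale + offset
    return raw & ((1 << (8 * size)) - 1)


# A 4-byte entry relative to a table at 0x2000 whose target is the code at 0x1040.
entry = symbol_minus_symbol_value(0x2000, 0x1040)
print(hex(entry))  # 0xfffff040
```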

symbolic_expr_from_relocation(EA:address, Size:unsigned, Symbol:symbol, Offset:number, TargetEA:address)

There is a symbolic expression at address ‘EA’ of size ‘Size’ of the form:

Symbol+Offset

This symbolic expression corresponds to a relocation and the symbol is referenced by name.

symbol_minus_symbol_from_relocation(EA:address, Size:unsigned, Symbol1:symbol, Symbol2:symbol, Scale:unsigned, Offset:number)

There is a symbolic expression at address ‘EA’ of size ‘Size’ of the form:

(Symbol2-Symbol1)*Scale+Offset

This symbolic expression corresponds to a relocation and the symbols are referenced by name.

symbolic_expr(EA:address, Size:unsigned, Symbol:symbol, Offset:number)

There is a symbolic expression at address ‘EA’ of size ‘Size’ of the form:

Symbol+Offset

In contrast to symbolic_operand and symbolic_data, the symbol in this predicate is referred to by name. This allows us to include symbolic expressions from relocations and to choose between multiple symbols at the same location. This predicate captures all symbolic expressions from symbolic_operand, moved_label, symbolic_data, moved_data_label, and symbolic_expr_from_relocation.

symbolic_expr_symbol_minus_symbol(EA:address, Size:unsigned, Symbol:symbol, Symbol2:symbol, Scale:unsigned, Offset:number)

There is a symbolic expression at address ‘EA’ of size ‘Size’ of the form:

(Symbol2-Symbol1)*Scale+Offset

The symbols in this predicate are referred to by name. ‘Symbol1’ in the formula corresponds to the parameter declared as ‘Symbol’.

symbolic_operand_attribute(EA:address, Index:unsigned, Attribute:symbol)

The symbolic operand at address ‘EA’ and ‘Index’ has a symbolic expression attribute ‘Attribute’. Note that some attributes may be inferred but not used, if the corresponding symbolic_operand is not selected.

symbolic_expr_attribute(ea:address, attribute:symbol)

The symbolic expression at address ‘EA’ has a symbolic expression attribute ‘Attribute’.

code_pointer_in_data(EA:address, Val:address)

There is a symbolic expression in data at address ‘EA’ pointing to a code block at address ‘Val’.

labeled_ea(Ea:address)

The address ‘Ea’ needs to be labeled so that it can be referred to in symbolic expressions.

data_object_boundary(EA:address)

bss_data(ea:address)

symbolic_operand_candidate(ea:address, operand_index:operand_index, Dest:address, Type:symbol)

symbolic_operand_point(ea:address, operand_index:operand_index, points:number, why:symbol)

symbolic_operand_total_points(ea:address, operand_index:operand_index, points:number)

WARNING: Predicate not present in compiled Datalog program (Dead Code)

labeled_data_candidate(EA:address)

symbol_minus_symbol_candidate(EA:address, Size:unsigned, Symbol1:address, Symbol2:address, Scale:unsigned, Offset:number)

A candidate for a symbol-symbol expression in data (includes jump tables and other relative symbols).

address_in_data_is_printable(EA:address)

The address appearing at ‘EA’ is within a potential ascii_string and therefore more likely to be spurious.

WARNING: Predicate not present in compiled Datalog program (Dead Code)

address_in_data_refined(EA:address, Val:address)

string(EA:address, End:address, Encoding:symbol)

Data-object analysis for string encodings.

Possible string of some ‘Encoding’ at interval [‘EA’, ‘End’).
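As an illustration of the half-open interval convention, the sketch below scans a byte buffer for NUL-terminated runs of printable ASCII and reports each as (EA, End, Encoding). The minimum length, the set of accepted control characters, the choice to include the terminator in the interval and the "ascii" encoding name are assumptions for this example and may differ from what the analysis does.

```python
def ascii_string_candidates(data, base_ea, min_len=4):
    """Return a list of (EA, End, "ascii") tuples for NUL-terminated strings.

    `data` is a bytes object loaded at address `base_ea`. The interval is
    half-open, [EA, End), and here includes the terminating NUL byte.
    """
    results = []
    start = None  # index where the current printable run began
    for i, b in enumerate(data):
        printable = 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D)
        if printable:
            if start is None:
                start = i
        else:
            # Only a NUL directly after a long enough run closes a string.
            if b == 0 and start is not None and i - start >= min_len:
                results.append((base_ea + start, base_ea + i + 1, "ascii"))
            start = None
    return results
```

With this convention, End minus EA is the size of the string object in bytes, and the End of one string can coincide with the EA of the next without the two overlapping.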

string_candidate(EA:address, End:address, Encoding:symbol)

string_candidate_refined(EA:address, End:address, Encoding:symbol)

data_object_candidate(ea:address, size:unsigned, type:symbol)

data_object_point(ea:address, size:unsigned, type:symbol, points:number, why:symbol)

data_object_conflict(ea:address, size:unsigned, type:symbol, ea2:address, size2:unsigned, type2:symbol)

discarded_data_object(ea:address, size:unsigned, type:symbol)

data_object(ea:address, size:unsigned, type:symbol)

after_address_in_data(EA:address, EA_next:address)

next_address_in_data(EA:address, EA_next:address)

address_array_aux(EA:address, Distance:unsigned, type:symbol, InitialEA:address)

Auxiliary predicate to compute address_array.

address_array(EA:address, Distance:unsigned, InitialEA:address)

This predicate is used for the symbolization heuristics. The pointer candidate at address ‘EA’ belongs to a sequence of evenly spaced pointer candidates starting at address ‘InitialEA’. The space between pointers is ‘Distance’. This sequence has at least three pointers. All the pointers in a sequence either point to code or to the same data_segment.
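The evenly spaced sequence condition can be sketched as follows. This is a simplified illustration, not the module's rules: it only groups adjacent candidates greedily, and it omits the requirement that all pointers in a sequence point to code or to the same data segment. The function name and the dictionary result are assumptions for this example.

```python
def address_arrays(candidate_eas, min_len=3):
    """Map each EA in an evenly spaced run to (Distance, InitialEA).

    `candidate_eas` are the addresses of pointer candidates. Runs shorter
    than `min_len` are ignored.
    """
    eas = sorted(set(candidate_eas))
    result = {}
    i = 0
    while i < len(eas) - 1:
        distance = eas[i + 1] - eas[i]
        j = i + 1
        # Extend the run while the spacing stays the same.
        while j + 1 < len(eas) and eas[j + 1] - eas[j] == distance:
            j += 1
        if j - i + 1 >= min_len:
            for ea in eas[i:j + 1]:
                result[ea] = (distance, eas[i])
            i = j + 1
        else:
            i += 1
    return result
```

For candidates at 0x3000, 0x3008, 0x3010, 0x3018, 0x3040 and 0x3100, the first four form one sequence with distance 8 starting at 0x3000, and the last two are left out because they do not reach three evenly spaced members.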

label_conflict(EA:address, Size:unsigned, Kind:symbol)

data_object_total_points(EA:address, Size:unsigned, Type:symbol, Points:number)