pointer_reattribution

Compilers sometimes generate expressions of the form symbol+constant. Such an expression can fall:

  1. in the middle of a pointer (or the middle of a symbol if we know its size)

  2. in the middle of an instruction (only for programs with overlapping instructions)

  3. outside the data sections or

  4. in a data section different from the one that contains the symbol.

We want to detect those cases and generate the appropriate symbol+constant.
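The four cases above can be sketched as a small classifier. This is only an illustration, not the Datalog rules themselves: the `Range` type, the function name, and the returned labels are hypothetical, and the sketch assumes the intended symbol lives in a data section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Range:
    """Half-open address range [begin, end). Hypothetical helper type."""
    begin: int
    end: int

    def contains(self, addr: int) -> bool:
        return self.begin <= addr < self.end

    def strictly_inside(self, addr: int) -> bool:
        # "In the middle" means past the first byte of the object.
        return self.begin < addr < self.end


def classify_destination(dest: int,
                         symbol_section: Range,
                         objects: List[Range],
                         instructions: List[Range],
                         data_sections: List[Range]) -> Optional[str]:
    """Return which of the four cases applies to `dest`, or None."""
    if any(o.strictly_inside(dest) for o in objects):
        return "middle_of_pointer_or_symbol"   # case 1
    if any(i.strictly_inside(dest) for i in instructions):
        return "middle_of_instruction"         # case 2
    section = next((s for s in data_sections if s.contains(dest)), None)
    if section is None:
        return "outside_data_sections"         # case 3
    if section != symbol_section:
        return "different_data_section"        # case 4
    return None                                # nothing to reattribute
```

The checks are ordered so that cases 1 and 2 take precedence, since an address in the middle of an instruction is also outside every data section.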

We generate two predicates:

- moved_data_label for pointers in data sections

- moved_label for pointers in code sections

We only ‘move’ pointers in data sections if their destination falls in the middle of another pointer, symbol or instruction (cases 1 and 2).

In code sections we consider all of the cases above.

In addition, we distinguish three types:

P) the pointer is a pc-relative operand (and does not access memory, i.e. LEA). Pc-relative operands should always be symbolic; we just need to find the best candidate.

D) the pointer appears as a displacement in an indirect operand. We know that indirect operands are used to access memory, which makes them more likely to be symbolic: they cannot be a float, for example, but they could still be a constant. We make the displacement symbolic if we can “prove” that the registers used cannot contain a base address (so the displacement should contain a base address).

I) the pointer appears as an immediate operand. Immediate operands are likely to be symbolic if they are used to compute an address or are compared to an address. We specifically detect cases where immediates are used to initialize loop counters or as loop bounds.

moved_data_label(EA:address, Size:unsigned, Dest:address, NewDest:address)

A symbolic expression at address ‘EA’ pointing to ‘Dest’ should use a symbol pointing to ‘NewDest’ plus an offset. The offset is Dest-NewDest, so that the symbol plus the offset still evaluates to ‘Dest’.

moved_label(EA:address, Index:operand_index, Dest:address, NewDest:address)

A symbolic operand at address ‘EA’ with index ‘Index’ and pointing to ‘Dest’ should use a symbol pointing to ‘NewDest’ plus an offset. The offset is Dest-NewDest, so that the symbol plus the offset still evaluates to ‘Dest’.
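As a rough illustration of how a consumer might use moved_data_label or moved_label facts, the sketch below renders the reattributed expression. It assumes the rewritten expression must still evaluate to the original ‘Dest’, so the offset is taken relative to the symbol at ‘NewDest’. The function name and the symbol table are hypothetical.

```python
from typing import Dict


def reattribute(dest: int, new_dest: int, symbols: Dict[int, str]) -> str:
    """Render `dest` as 'symbol+offset' using the symbol located at `new_dest`.

    `symbols` maps addresses to symbol names (a hypothetical symbol table).
    """
    name = symbols[new_dest]
    # Assumption: symbol address + offset must equal the original destination.
    offset = dest - new_dest
    if offset == 0:
        return name
    sign = "+" if offset > 0 else "-"
    return f"{name}{sign}{abs(offset)}"


symbols = {0x1000: "table"}
above = reattribute(0x1004, 0x1000, symbols)   # destination past the symbol
below = reattribute(0x0FFC, 0x1000, symbols)   # destination before the symbol
```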

boundary_sym_expr(EA:address, Dest:address)

The symbolic expression at address ‘EA’ pointing to ‘Dest’ should point to an ‘at-end’ symbol.

moved_label_class(EA:address, Index:operand_index, Reason:symbol)

cmp_reg_to_reg(EA:address, Reg1:register, Reg2:register)

Instruction at address ‘EA’ compares registers ‘Reg1’ and ‘Reg2’.

dest_enlarged_data_section(EA:address, Reg:register, NewDest:address, Beg:address, End:address, OldBeg:address, OldEnd:address)

Auxiliary predicate to compute moved_label. This predicate detects that the register ‘Reg’ at address ‘EA’ is a loop counter iterating over data in a section spanning [OldBeg,OldEnd). Based on the multiplier of the loop, we compute an extended area [Beg,End). If we find pointers to that extended area related to the same loop, we will move them to ‘NewDest’.

addr_outside_section_used_for_memory_access(EA:address, Reg:register, Addr:address, AddrAccessed:address)

Auxiliary predicate to compute moved_label. This predicate detects an address loaded into a register that falls outside a data section but is ultimately used to access the data section. This is typically the case for the initialization of loop counters when these are pre-incremented.

The address ‘EA’ is where address ‘Addr’ is loaded into the register ‘Reg’. Then that register is used to access ‘AddrAccessed’ at a later point (‘EA_access’).

E.g.

    mov RAX, Addr     // EA_from
  loop:
    sub RAX, 4
    mov RBX, [RAX]    // EA_access accesses AddrAccessed = Addr - 4
    …
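The effect of the loop above can be simulated in a few lines. This is a sketch under stated assumptions: the section bounds, the step of 4, and the iteration count are made-up values chosen so that ‘Addr’ is one past the end of the section.

```python
from typing import List


def accessed_addresses(addr: int, step: int, iterations: int) -> List[int]:
    """Simulate the loop: the register is adjusted before each access."""
    reg = addr                 # mov RAX, Addr
    accesses = []
    for _ in range(iterations):
        reg -= step            # sub RAX, 4
        accesses.append(reg)   # mov RBX, [RAX]
    return accesses


# Hypothetical data section occupying [0x1000, 0x1010).
section = range(0x1000, 0x1010)
addr = 0x1010                  # loaded value, one past the end of the section
accesses = accessed_addresses(addr, step=4, iterations=4)

addr_is_outside = addr not in section
all_accesses_inside = all(a in section for a in accesses)
```

The loaded address itself lies outside the section, yet every access lands inside it, which is why the operand should be attributed to a symbol in that section.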

moved_pc_relative_candidate(EA:address, Index:operand_index, Val:address, NewVal:address, Distance:unsigned)

A moved_label candidate for an instruction that has a pc-relative memory computation.

moved_displacement_candidate(EA:address, Op_index:operand_index, Dest:address, NewDest:address, Distance:unsigned)

A moved_label candidate for an instruction that has an indirect access (non pc-relative) where the displacement should be symbolic.

moved_immediate_candidate(EA:address, Op_index:operand_index, Immediate:address, New_immmediate:address, Distance:unsigned)

A moved_label candidate for an instruction that has an immediate.

moved_label_candidate(EA:address, Op_index:operand_index, Dest:address, NewDest:address, Priority:unsigned)

Auxiliary predicate to decide which moved_label should be taken for a given address. This is decided based on the ‘Priority’. Lower numbers indicate higher priority.
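The selection by priority can be sketched as follows. This is an illustration only: it assumes candidates are grouped per (EA, Op_index) pair and that ties keep the first candidate seen, neither of which is stated above. The function name and the tuple layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (EA, Op_index, Dest, NewDest, Priority), mirroring moved_label_candidate.
Candidate = Tuple[int, int, int, int, int]


def select_moved_labels(
    candidates: Iterable[Candidate],
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Keep, for each operand, the candidate with the lowest priority number."""
    best: Dict[Tuple[int, int], Candidate] = {}
    for cand in candidates:
        ea, op_index, _dest, _new_dest, priority = cand
        key = (ea, op_index)
        # Lower numbers indicate higher priority; ties keep the earlier one.
        if key not in best or priority < best[key][4]:
            best[key] = cand
    return {key: (c[2], c[3]) for key, c in best.items()}


candidates = [
    (0x400, 1, 0x1010, 0x1000, 3),
    (0x400, 1, 0x1010, 0x100C, 1),   # lowest number for this operand
    (0x410, 2, 0x2004, 0x2000, 5),
]
selected = select_moved_labels(candidates)
```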