Next: Clang C Tokenizer, Previous: Multi-objective Fitness, Up: Components [Contents][Index]
The Style Features component extracts a set of features from a software object into a feature vector. We use the Code Stylometry Feature Set (CSFS) described in “De-anonymizing Programmers via Code Stylometry” (https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-caliskan-islam.pdf).
Extracted feature vectors can be used as fitness vectors with the lexicase evolution strategy. One application is to drive evolution towards solutions which better match the features of the surrounding source code.
API support for style features is documented in the entries for classes sel/sw/styleable:style-feature, sel/sw/styleable:styleable, and sel/sw/styleable:style-project. We provide a brief overview here.
To extract the set of feature vectors from a software object, use extract-features, providing a software object and a list of feature extractor functions. Each feature extractor function is expected to operate on a clang object and return a feature vector containing the values for that feature. Function extract-features returns one large feature vector that is the result of concatenating all of these vectors in order.
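The concatenation step can be sketched in plain Common Lisp. This is only an illustration under stated assumptions: toy-extract-features, toy-length-extractor, and toy-min-max-extractor are hypothetical names, the "software object" is just a list of numbers, and none of this is the SEL implementation of extract-features.

```lisp
;; Hypothetical stand-in for extract-features; NOT the SEL implementation.
(defun toy-extract-features (object extractors)
  "Call each function in EXTRACTORS on OBJECT and concatenate the
resulting vectors, in order, into one feature vector."
  (apply #'concatenate 'vector
         (mapcar (lambda (extractor) (funcall extractor object))
                 extractors)))

;; Two hypothetical extractors. Each takes the object and returns a vector.
(defun toy-length-extractor (object)
  (vector (length object)))

(defun toy-min-max-extractor (object)
  (vector (reduce #'min object) (reduce #'max object)))

;; The "object" here is a plain list of numbers, assumed for illustration.
(defparameter *toy-result*
  (toy-extract-features '(3 1 4 1 5)
                        (list #'toy-length-extractor
                              #'toy-min-max-extractor)))
;; *toy-result* is #(5 1 5): one value from the first extractor,
;; followed by two values from the second.
```

Because the vectors are concatenated in the order the extractors are given, the position of each value in the result depends on that order, so the same extractor list should be used when feature vectors are compared.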
The SEL API provides several AST-related feature extractors for clang software objects, i.e., features derived from properties of a clang AST. The available feature extractors (see *feature-extractors*) are:
ast-node-type-tf-extractor | number of nodes of each different ast-class (e.g., IfStmt or DeclStmt) in the AST. |
max-depth-ast-extractor | maximum depth of any node in the AST. |
avg-depth-ast-extractor | the average depth of nodes in the AST. |
ast-full-stmt-bi-grams-extractor | the number of occurrences of each ast-class bi-gram for full statements in the AST. |
ast-bi-grams-extractor | the number of occurrences of each ast-class bi-gram in the AST. |
ast-keyword-tf-extractor | for each C keyword, the number of occurrences of that keyword in the AST. |
By convention, feature extractor functions have names ending in “-extractor”.
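To make the depth and node-count features concrete, here is a minimal sketch on a toy AST. Everything in it is assumed for illustration: the AST is a nested list of the form (CLASS CHILD...), the root is taken to have depth 0, and the toy- functions are hypothetical rather than the SEL extractors of similar name. Unlike the SEL extractors, the node-count sketch takes the list of classes as an explicit second argument so that the output order is fixed.

```lisp
;; A toy AST: each node is (CLASS CHILD...). This representation is an
;; assumption for illustration, not SEL's clang AST structure.
(defparameter *toy-ast*
  '(:compound-stmt
    (:decl-stmt)
    (:if-stmt
     (:binary-operator)
     (:return-stmt))))

(defun toy-node-depths (node &optional (depth 0))
  "Return a list with the depth of NODE and of every descendant.
The root is assumed to have depth 0."
  (cons depth
        (loop for child in (rest node)
              append (toy-node-depths child (1+ depth)))))

(defun toy-node-classes (node)
  "Return the class of NODE and of every descendant, in pre-order."
  (cons (first node)
        (loop for child in (rest node)
              append (toy-node-classes child))))

(defun toy-max-depth-extractor (ast)
  "One-element vector holding the maximum node depth."
  (vector (reduce #'max (toy-node-depths ast))))

(defun toy-avg-depth-extractor (ast)
  "One-element vector holding the mean node depth (an exact rational)."
  (let ((depths (toy-node-depths ast)))
    (vector (/ (reduce #'+ depths) (length depths)))))

(defun toy-node-type-tf-extractor (ast classes)
  "Vector of node counts, one entry per class in CLASSES, in order."
  (let ((seen (toy-node-classes ast)))
    (map 'vector (lambda (class) (count class seen)) classes)))

;; Node depths in *toy-ast* are (0 1 1 2 2), so:
;; (toy-max-depth-extractor *toy-ast*) is #(2)
;; (toy-avg-depth-extractor *toy-ast*) is #(6/5)
;; (toy-node-type-tf-extractor *toy-ast* '(:if-stmt :decl-stmt :while-stmt))
;; is #(1 1 0)
```

A class that never occurs still gets a slot holding 0, which keeps vectors from different programs the same length and comparable position by position.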