Next: Templates, Previous: cl-tree-sitter Setup, Up: Source Code with tree-sitter
The classes generated for tree-sitter use the rules stored in each language's grammar file to reproduce source text implicitly at the class level. This makes working with and mutating ASTs much simpler. For example, if an 'else' clause is added to an 'if' statement AST that lacks one, the AST's source text reflects the new 'else' clause without any other updates. (Before structured text, slots holding connective whitespace and punctuation had to be updated manually to accompany most changes to an AST's content.)
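The idea of deriving source text from a grammar rule, instead of storing it, can be modeled in a few lines. The following is a minimal Python sketch, not SEL's Common Lisp API: the rule format (plain strings as terminals, `"$name"` for child slots, tuples for optional groups) and the `Node`/`IfStatement` classes are hypothetical. It shows that filling the `alternative` slot is enough for the 'else' keyword to appear in the output.

```python
from typing import Any, Dict, List, Union

# Hypothetical rule format (not SEL's): a plain str is a fixed terminal token,
# "$name" refers to a child slot, and a tuple is an optional group of strings
# that is rendered only when every slot it references is set.
RuleItem = Union[str, tuple]


class Node:
    """Minimal rule-driven AST node whose source text is derived, not stored."""
    rule: List[RuleItem] = []

    def __init__(self, **fields: Any) -> None:
        self.fields: Dict[str, Any] = fields

    def _render(self, items) -> str:
        out = []
        for item in items:
            if isinstance(item, tuple):
                # Optional group: emit it only if all of its slots are filled.
                names = [i[1:] for i in item if i.startswith("$")]
                if all(self.fields.get(n) is not None for n in names):
                    out.append(self._render(item))
            elif item.startswith("$"):
                child = self.fields[item[1:]]
                out.append(child.source_text() if isinstance(child, Node) else str(child))
            else:
                out.append(item)  # terminal token: implicit source text
        return "".join(out)

    def source_text(self) -> str:
        return self._render(self.rule)


class IfStatement(Node):
    # Whitespace is folded into the terminals for brevity; in the design
    # described here it would live in before-text/after-text slots instead.
    rule = ["if (", "$condition", ") ", "$consequence", (" else ", "$alternative")]


stmt = IfStatement(condition="x", consequence="a();")
print(stmt.source_text())  # if (x) a();
stmt.fields["alternative"] = "b();"  # add an else clause; nothing else changes
print(stmt.source_text())  # if (x) a(); else b();
```

No slot holding the text " else " is touched; the keyword comes from the rule, which is the point of implicit source text.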
Each generated class can have multiple subclasses, one for each form of source text the base class can take. For example, the update expression in C covers both pre-increment (++i) and post-increment (i++). Two subclasses are generated to disambiguate between these source text forms: one for pre-increment and one for post-increment.
Subclass ASTs are frequently copied with slight modifications to their slot values, which can leave the copy in a state that is invalid for the subclass it was copied from. When this is detected, the AST's class is changed dynamically to the first subclass of the base class that can successfully produce source text from the given slot values. This also applies to objects created with the base class, but the subclass chosen may not produce the desired representation, so it is best to specify the exact subclass where this matters, as with update expressions in C.
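The subclass-selection behavior can be sketched as follows. This is a hypothetical Python model, not SEL's implementation: it assumes a rule "fits" when the slots it references are exactly the non-empty slots, and it uses Python's `__class__` assignment to stand in for the dynamic class change. The class names (`IfElse`, `IfOnly`, `PreIncrement`, `PostIncrement`) and the helper `copy_with` are illustrative only.

```python
from typing import List, Optional


class Ast:
    """Hypothetical base for generated classes; not SEL's actual API."""
    rule: List[str] = []       # "$name" = slot, anything else = terminal
    variants: List[type] = []  # representation subclasses, in definition order

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        if Ast in cls.__bases__:
            cls.variants = []         # a new generated base class
        else:
            cls.variants.append(cls)  # a subclass of that base (shared list)

    def __init__(self, **slots: Optional[str]) -> None:
        self.slots = slots

    @classmethod
    def fits(cls, slots) -> bool:
        # A rule fits when the slots it uses are exactly the non-empty ones.
        used = {item[1:] for item in cls.rule if item.startswith("$")}
        filled = {name for name, value in slots.items() if value is not None}
        return bool(cls.rule) and used == filled

    def source_text(self) -> str:
        if not type(self).fits(self.slots):
            # Switch to the first variant that can render these slot values.
            for variant in type(self).variants:
                if variant.fits(self.slots):
                    self.__class__ = variant
                    break
            else:
                raise ValueError("no subclass can render these slot values")
        return "".join(self.slots[i[1:]] if i.startswith("$") else i
                       for i in type(self).rule)


def copy_with(ast: Ast, **changes: Optional[str]) -> Ast:
    """Copy an AST, keeping its current class, with some slots changed."""
    return type(ast)(**{**ast.slots, **changes})


class IfStatement(Ast):
    pass


class IfElse(IfStatement):
    rule = ["if (", "$condition", ") ", "$consequence", " else ", "$alternative"]


class IfOnly(IfStatement):
    rule = ["if (", "$condition", ") ", "$consequence"]


class UpdateExpression(Ast):
    pass


class PreIncrement(UpdateExpression):
    rule = ["$operator", "$argument"]


class PostIncrement(UpdateExpression):
    rule = ["$argument", "$operator"]


stmt = IfElse(condition="x", consequence="a();", alternative="b();")
clone = copy_with(stmt, alternative=None)  # still an IfElse, but invalid
print(clone.source_text())                 # if (x) a();
print(type(clone).__name__)                # IfOnly

expr = UpdateExpression(argument="i", operator="++")
print(expr.source_text())                  # ++i (first fitting variant)
print(PostIncrement(argument="i", operator="++").source_text())  # i++
```

The last two lines show the caveat from the paragraph above: both increment subclasses fit the same slots, so creating the AST through the base class silently picks the first one, and naming `PostIncrement` explicitly is the only way to get `i++`.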
Structured text ASTs contain at least four slots that store information which is not implicit in the AST or its parent AST:
::
stores text that directly precedes the AST but is not part of the rule associated with the AST. This is generally whitespace. This slot is preferred over the after-text slot when creating ASTs from a string with #'convert.
::
stores text that directly follows the AST but is not part of the rule associated with the AST. This is generally whitespace. This slot is preferred when the AST is directly followed by a terminal token that, being implicit source text, has no before-text slot of its own.
::
stores comment and error ASTs that occur before the AST and before the contents of the before-text slot. The contents of this slot are considered children of the parent AST. When creating ASTs from a string with #'convert, this slot is preferred over its trailing counterpart, the slot for comment and error ASTs that follow an AST.
::
stores comment and error ASTs that occur after the AST and after the contents of the after-text slot. The contents of this slot are considered children of the parent AST. This slot is preferred when the AST is directly followed by a terminal token that, being implicit source text, has no slot of its own for preceding comment and error ASTs.
::
store ASTs that occur between two consecutive terminal tokens that are implicit source text. These slots can contain comment, error, and inner-whitespace ASTs.
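The order in which these surrounding slots contribute to the printed text can be modeled as below. This Python sketch is hypothetical: `before_text` and `after_text` mirror the before-text and after-text slots named in the descriptions above, while `leading_asts` and `trailing_asts` are illustrative names for the two slots that hold comment and error ASTs. The ordering follows the descriptions: leading ASTs come before the before-text, and trailing ASTs come after the after-text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    text: str

    def source_text(self) -> str:
        return self.text


@dataclass
class Wrapped:
    """Hypothetical model of the text and ASTs surrounding one AST."""
    content: str                # text produced by the AST's own rule
    before_text: str = ""       # usually whitespace
    after_text: str = ""        # usually whitespace
    leading_asts: List[Comment] = field(default_factory=list)   # hypothetical name
    trailing_asts: List[Comment] = field(default_factory=list)  # hypothetical name

    def source_text(self) -> str:
        # Order: leading ASTs, before-text, content, after-text, trailing ASTs.
        return ("".join(c.source_text() for c in self.leading_asts)
                + self.before_text + self.content + self.after_text
                + "".join(c.source_text() for c in self.trailing_asts))


stmt = Wrapped("x = 1;", before_text="  ", leading_asts=[Comment("/* init */\n")])
print(repr(stmt.source_text()))  # '/* init */\n  x = 1;'
last = Wrapped("y", after_text=" ", trailing_asts=[Comment("// tail")])
print(last.source_text())        # y // tail
```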
The internal-asts slots are generated from the rule associated with the AST: an internal-asts slot is placed wherever two terminal tokens can appear consecutively in the rule.
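Finding where internal-asts slots belong amounts to finding every pair of terminals that can end up adjacent. The Python sketch below works on a simplified, flat rule (a hypothetical format, not tree-sitter's grammar JSON): plain strings are terminals, `"$name"` is a required child, and `"$name?"` is an optional child. Optional children may be absent, so the scan skips over them. The `for` rule in the example is C-like but simplified, not the real tree-sitter-c grammar.

```python
from typing import List, Tuple


def is_terminal(item: str) -> bool:
    return not item.startswith("$")


def is_optional_field(item: str) -> bool:
    return item.startswith("$") and item.endswith("?")


def internal_slot_positions(rule: List[str]) -> List[Tuple[str, str]]:
    """Return (left, right) terminal pairs that can appear consecutively.

    The k-th pair marks where a hypothetical internal slot k would go.
    """
    pairs = []
    for i, item in enumerate(rule):
        if not is_terminal(item):
            continue
        j = i + 1
        # Optional children may be absent, so look past them.
        while j < len(rule) and is_optional_field(rule[j]):
            j += 1
        if j < len(rule) and is_terminal(rule[j]):
            pairs.append((item, rule[j]))
    return pairs


for_rule = ["for", "(", "$init?", ";", "$cond?", ";", "$update?", ")", "$body"]
for k, (left, right) in enumerate(internal_slot_positions(for_rule)):
    print(f"slot {k}: between {left!r} and {right!r}")
```

For this rule the sketch reports four positions: between `for` and `(`, and between each pair of delimiters that an empty initializer, condition, or update would leave adjacent (as in `for (;;)`). A comment such as `for /* c */ (` would land in the first of these slots.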
An additional text slot is used by a subset of ASTs known as computed-text ASTs. These ASTs hold variable information that must be computed and stored when the AST is created. Computed-text ASTs can be identified with computed-text-node-p.
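One way to see why some ASTs need a stored text slot: a rule built only from fixed strings can always be reprinted, but a rule containing a regular-expression token cannot, because the grammar does not say which matching text was present. The Python sketch below checks for that, assuming tree-sitter's `grammar.json` rule shapes (`STRING`, `PATTERN`, `SEQ`/`CHOICE` with `members`, and `TOKEN`/`REPEAT`/`FIELD`/`PREC` with `content`). This is only an illustrative heuristic. It is an assumption that computed-text-node-p uses a similar criterion, and `contains_pattern` is a hypothetical name.

```python
from typing import Any, Dict


def contains_pattern(rule: Dict[str, Any]) -> bool:
    """True if a grammar.json-style rule has a PATTERN anywhere inside it.

    SYMBOL references to other rules are deliberately not followed.
    """
    if rule["type"] == "PATTERN":
        return True
    children = list(rule.get("members", []))
    if "content" in rule:
        children.append(rule["content"])
    return any(contains_pattern(child) for child in children)


identifier = {"type": "PATTERN", "value": "[a-zA-Z_]\\w*"}
break_stmt = {"type": "SEQ", "members": [{"type": "STRING", "value": "break"},
                                         {"type": "STRING", "value": ";"}]}
print(contains_pattern(identifier))  # True: text must be stored
print(contains_pattern(break_stmt))  # False: text follows from the rule
```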
When creating ASTs, patch-whitespace can be used to insert whitespace in the relevant places. It uses whitespace-between to determine how much whitespace to place in each slot. It does not currently populate whitespace in the internal-asts slots.
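The division of labor between the two functions can be sketched on a flat list of leaves. This Python model is hypothetical and much simpler than SEL, which works over the AST tree. The lower-case names `patch_whitespace` and `whitespace_between` echo the functions above but are illustrative stand-ins, and the spacing rule is an invented example style. One function decides the spacing for a pair of neighbors, and the other walks the leaves and writes that decision into each leaf's before-text slot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    text: str
    before_text: str = ""  # models the before-text slot


def whitespace_between(left: str, right: str) -> str:
    """Hypothetical spacing rule: no space after '(' or before ';', ',' or ')'."""
    if left.endswith("(") or right[:1] in {";", ",", ")"}:
        return ""
    return " "


def patch_whitespace(leaves: List[Leaf]) -> None:
    """Fill each leaf's before_text from its left neighbor (first leaf untouched)."""
    for prev, cur in zip(leaves, leaves[1:]):
        cur.before_text = whitespace_between(prev.text, cur.text)


def source_text(leaves: List[Leaf]) -> str:
    return "".join(leaf.before_text + leaf.text for leaf in leaves)


decl = [Leaf(t) for t in ["int", "x", "=", "1", ";"]]
patch_whitespace(decl)
print(source_text(decl))  # int x = 1;
```

Keeping the spacing decision in a separate function lets the style rule change without touching the traversal.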