feat: Support bundling Python docstrings on attributes #589

Closed
opened 2025-09-15 15:21:44 +02:00 by wetneb (Owner) · 1 comment

In Python, documentation comments are generally added as "docstrings":

```python
class MyClass:
    """
    Documentation of the class
    """

    attribute = None
    """
    Documentation of the attribute
    """

    def method(self):
        """
        Documentation of the method
        """
        return True
```

Those "comments" are actual string literals. For classes and methods, Python stores them in the `__doc__` attribute. The literal that follows a class attribute is not kept at runtime, but PEP 257 calls it an "attribute docstring", and tools such as Sphinx autodoc extract it from the source.
Because those string literals aren't marked as `extra` by the tree-sitter grammar, the bundling algorithm does not apply to them. In particular, docstrings for class attributes aren't bundled. This is less of a problem for docstrings of classes or methods, since they already appear inside the block that defines them, so there is no need for any bundling.
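To make the distinction concrete, here is a small standard-library sketch. The class and method docstrings end up in `__doc__`, while the attribute docstring is simply a separate statement that immediately follows the assignment in the class body. It uses Python's own `ast` module as a stand-in for the tree-sitter tree, so the node names (`Expr`, `Assign`, `FunctionDef`) are those of `ast`, not of tree-sitter-python.

```python
import ast
import textwrap

SOURCE = textwrap.dedent('''
    class MyClass:
        """
        Documentation of the class
        """

        attribute = None
        """
        Documentation of the attribute
        """

        def method(self):
            """
            Documentation of the method
            """
            return True
''')

# Execute the source so we can inspect what Python keeps at runtime.
namespace = {}
exec(SOURCE, namespace)
MyClass = namespace["MyClass"]

# Class and method docstrings are stored in __doc__ ...
class_doc = MyClass.__doc__.strip()
method_doc = MyClass.method.__doc__.strip()
# ... but the attribute just holds None: its docstring is not kept anywhere.
attribute_value = MyClass.attribute

# In the syntax tree, the attribute docstring is its own statement
# (an Expr wrapping a string Constant) right after the Assign node.
class_body = ast.parse(SOURCE).body[0].body
statement_kinds = [type(stmt).__name__ for stmt in class_body]

print(class_doc)        # Documentation of the class
print(method_doc)       # Documentation of the method
print(attribute_value)  # None
print(statement_kinds)  # ['Expr', 'Assign', 'Expr', 'FunctionDef']
```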

To get the docstrings bundled with the attributes, I can imagine multiple approaches:

* extend the `LangProfile` to contain a list of node types (or a tree-sitter query) defining additional nodes to be treated as bundle-able;
* change the Python grammar to do this bundling directly. Other users of the grammar seem interested in special handling for docstrings as well:
  * https://github.com/tree-sitter/tree-sitter-python/issues/168
  * https://github.com/tree-sitter/tree-sitter-python/issues/80
  * https://github.com/tree-sitter/tree-sitter-python/issues/77
  * https://github.com/tree-sitter/tree-sitter-python/issues/197
  * https://github.com/tree-sitter/tree-sitter-python/issues/251
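The first approach can be pictured with a language-agnostic sketch. A profile-supplied predicate, playing the role of a tree-sitter query, marks extra nodes as bundle-able, and each matching sibling is attached to the sibling it follows, so the pair would move as one unit in a commutative merge. Attaching to the *preceding* sibling is a choice made here because attribute docstrings come after the assignment they document. `LangProfileSketch`, `is_bundleable`, `bundle_siblings` and the `"string_statement"` kind are all hypothetical names, not mergiraf's actual `LangProfile` API, and nodes are simplified to `(kind, text)` tuples.

```python
from dataclasses import dataclass
from typing import Callable

# Simplified sibling node: (kind, text). A real implementation would work on
# tree-sitter nodes; here a predicate plays the role of a tree-sitter query.
Node = tuple[str, str]


@dataclass
class LangProfileSketch:
    """Hypothetical stand-in for a language profile, not mergiraf's LangProfile."""
    name: str
    # Returns True for nodes that should be glued to the preceding sibling.
    is_bundleable: Callable[[Node], bool]


def bundle_siblings(profile: LangProfileSketch, siblings: list[Node]) -> list[list[Node]]:
    """Group siblings so each bundle-able node travels with the node before it."""
    bundles: list[list[Node]] = []
    for node in siblings:
        if bundles and profile.is_bundleable(node):
            bundles[-1].append(node)  # attach to the preceding sibling
        else:
            bundles.append([node])    # start a new bundle
    return bundles


# Hypothetical Python profile: a statement made only of a string literal
# (given the made-up kind "string_statement" here) is a docstring.
python = LangProfileSketch(
    name="Python",
    is_bundleable=lambda node: node[0] == "string_statement",
)

class_body = [
    ("assignment", "attribute = None"),
    ("string_statement", '"""Documentation of the attribute"""'),
    ("assignment", "other = 1"),
]
for bundle in bundle_siblings(python, class_body):
    print([text for _, text in bundle])
# ['attribute = None', '"""Documentation of the attribute"""']
# ['other = 1']
```

A docstring with no preceding sibling (such as a class docstring at the top of the body) simply starts its own bundle.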
wetneb (Author, Owner) commented:

In the case of class attributes (which is the only real case I am aware of where docstring bundling would be useful), there is the additional complication that we are not currently merging those commutatively even without any docstrings, because they are locally indistinguishable from assignments in other sorts of blocks. Both a class body and a function body are just `block`s, and both a class attribute declaration and a variable assignment are just `assignment` nodes.

In other words:

```python
if foo:
    bar = 1
    baz = 3
```

and

```python
class Foo:
    bar = 1
    baz = 3
```

are parsed as two assignments in a `block`.
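Python's own `ast` module shows the same ambiguity, although its node names differ from the tree-sitter grammar's `block` and `assignment`. In this sketch, the bodies of the `if` statement and of the class, looked at in isolation, are identical lists of `Assign` statements; only the parent node tells them apart.

```python
import ast

if_source = "if foo:\n    bar = 1\n    baz = 3\n"
class_source = "class Foo:\n    bar = 1\n    baz = 3\n"

if_node = ast.parse(if_source).body[0]
class_node = ast.parse(class_source).body[0]

# Dump only the bodies: without the parent, they are indistinguishable.
if_body = [ast.dump(stmt) for stmt in if_node.body]
class_body = [ast.dump(stmt) for stmt in class_node.body]

print(if_body == class_body)                              # True
print(type(if_node).__name__, type(class_node).__name__)  # If ClassDef
```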

This also calls for a grammar change, I would say. (Not sure if that one is upstream-able…)
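One way to picture what such a grammar change would provide is a contextual classification. The sketch below emulates it with Python's `ast` module: it treats an `Assign` as a class attribute only when it sits directly in a `ClassDef` body, and pairs it with a directly following string statement as its docstring. `class_attributes` is a hypothetical helper for illustration only. Two orderings of the same attributes give the same dictionary, which illustrates why such members could be merged commutatively, while the `if` block yields no attributes at all.

```python
import ast
import textwrap
from typing import Dict, Optional


def class_attributes(source: str) -> Dict[str, Optional[str]]:
    """Map each simple class-level attribute name to its docstring (or None)."""
    attributes: Dict[str, Optional[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue  # only assignments directly inside a class body count
        body = node.body
        for i, stmt in enumerate(body):
            # Keep only simple `name = value` assignments.
            if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)):
                continue
            nxt = body[i + 1] if i + 1 < len(body) else None
            doc = None
            # A bare string literal right after the assignment is its docstring.
            if (isinstance(nxt, ast.Expr) and isinstance(nxt.value, ast.Constant)
                    and isinstance(nxt.value.value, str)):
                doc = nxt.value.value.strip()
            attributes[stmt.targets[0].id] = doc
    return attributes


first = textwrap.dedent('''
    class Foo:
        bar = 1
        """Documentation of bar"""
        baz = 3
''')
second = textwrap.dedent('''
    class Foo:
        baz = 3
        bar = 1
        """Documentation of bar"""
''')

print(class_attributes(first))  # {'bar': 'Documentation of bar', 'baz': None}
print(class_attributes(first) == class_attributes(second))  # True
print(class_attributes("if foo:\n    bar = 1\n    baz = 3\n"))  # {}
```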

Reference: mergiraf/mergiraf#589