chore: Refactor AstNode creation #400

Merged
wetneb merged 6 commits from wetneb/mergiraf:refactor_AstNode_creation into main 2025-05-20 09:35:39 +02:00
Owner

The goal of this PR is to integrate the creation and initialization of the tree-sitter parser inside the constructor of the AstNode struct.
This allows for removing a bunch of duplicate initializations of parsers in various areas of the code.
This is in preparation for #5 where we'll need to intertwine the parsing of multiple languages together, making it impossible to use just one tree-sitter parser to parse a given string.

By doing that we are losing the ability to re-use a parser for parsing multiple strings in a row, but I think it should be okay performance-wise.

The goal of this PR is to integrate the creation and initialization of the tree-sitter parser inside the constructor of the `AstNode` struct. This allows for removing a bunch of duplicate initializations of parsers in various areas of the code. This is in preparation for #5 where we'll need to intertwine the parsing of multiple languages together, making it impossible to use just one tree-sitter parser to parse a given string. By doing that we are losing the ability to re-use a parser for parsing multiple strings in a row, but I think it should be okay performance-wise.
src/ast.rs Outdated
@ -86,0 +87,4 @@
let mut parser = Parser::new();
parser
.set_language(&lang_profile.language)
.map_err(|err| format!("Error loading {} grammar: {}", lang_profile.name, err))?;
Owner

aside: maybe implement Display for LangProfile by just printing out name?

aside: maybe implement `Display` for `LangProfile` by just printing out `name`?
ada4a marked this conversation as resolved
@ -86,0 +78,4 @@
/// Internal method to parse a string to a tree,
/// without doing the DFS precomputation and starting
/// allocation of node ids at the supplied counter.
fn parse_root(
Owner

I.. don't know how much sense it makes to have this as a separate method? If you inline it into Ast::parse and reorder the lines a bit, you end up with a pretty logical arrangement imo:

let mut parser = Parser::new();
parser
    .set_language(&lang_profile.language)
    .map_err(|err| format!("Error loading {} grammar: {}", lang_profile.name, err))?;
let tree = parser
    .parse(source, None)
    .expect("Parsing source code failed");

let mut next_node_id = 1;
let root = Self::internal_new(
    &mut tree.walk(),
    source,
    lang_profile,
    arena,
    &mut next_node_id,
)?;

root.internal_precompute_root_dfs(ref_arena);

Ok(root)
I.. don't know how much sense it makes to have this as a separate method? If you inline it into `Ast::parse` and reorder the lines a bit, you end up with a pretty logical arrangement imo: ```rs let mut parser = Parser::new(); parser .set_language(&lang_profile.language) .map_err(|err| format!("Error loading {} grammar: {}", lang_profile.name, err))?; let tree = parser .parse(source, None) .expect("Parsing source code failed"); let mut next_node_id = 1; let root = Self::internal_new( &mut tree.walk(), source, lang_profile, arena, &mut next_node_id, )?; root.internal_precompute_root_dfs(ref_arena); Ok(root) ```
Author
Owner

It's for #5, when we'll need to parse multiple roots with the same id counter. I can delay the introduction of the method if you prefer… maybe a sign that I'm making too small PRs?

It's for #5, when we'll need to parse multiple roots with the same id counter. I can delay the introduction of the method if you prefer… maybe a sign that I'm making too small PRs?
Owner

It's for #5, when we'll need to parse multiple roots with the same id counter.

I see!

I can delay the introduction of the method if you prefer…

No no, don't bother – it's just that I was genuinely unable to understand the reasoning behind having this method based on the desciption of this PR alone

maybe a sign that I'm making too small PRs?

I suspect you already have a long-ish list of commits, which you're now batching into these PRs.

Maybe try to _over_estimate the number of commits per PR? I'll probably be able to tell if it becomes too much, in which case we can move some of the commits over to the next PR.

One other way to solve this could be for you to upload the branch where you're working on this all, so that I could peek a bit into the future and ask you to pull some commits from there into the PR; but I understand if you don't really want to do that.

> It's for #5, when we'll need to parse multiple roots with the same id counter. I see! > I can delay the introduction of the method if you prefer… No no, don't bother – it's just that I was genuinely unable to understand the reasoning behind having this method based on the desciption of this PR alone > maybe a sign that I'm making too small PRs? I suspect you already have a long-ish list of commits, which you're now batching into these PRs. Maybe try to _over_estimate the number of commits per PR? I'll probably be able to tell if it becomes too much, in which case we can move some of the commits over to the next PR. One other way to solve this could be for you to upload the branch where you're working on this all, so that I could peek a bit into the future and ask you to pull some commits from there into the PR; but I understand if you don't really want to do that.
Author
Owner
sure, it's here: https://codeberg.org/wetneb/mergiraf/src/branch/5-nested-languages
ada4a marked this conversation as resolved
@ -38,4 +36,0 @@
parser
.set_language(&lang_profile.language)
.unwrap_or_else(|_| panic!("Error loading {} grammar", lang_profile.name));
debug!("initializing the parser took {:?}", start.elapsed());
Owner

might consider moving the timing thing into AstNode::parse -- or maybe not, given that this is the only place of parser initialization that had it

might consider moving the timing thing into `AstNode::parse` -- or maybe not, given that this is the only place of parser initialization that had it
Owner

yeah probably a bad idea, since it will end up being printed three times for the places where we parse all three sides

yeah probably a bad idea, since it will end up being printed three times for the places where we parse all three sides
Author
Owner

I don't know, I'd also be fine with that, but also fine with the current state… up to you!

I don't know, I'd also be fine with that, but also fine with the current state… up to you!
Owner

As far as I can see, the timers are only really placed in the functions closer to the outer API, like line_merge_and_structured_resolution. And I'd say AstNode::parse is a bit more low-level than that, so let's not bother...

As far as I can see, the timers are only really placed in the functions closer to the outer API, like `line_merge_and_structured_resolution`. And I'd say `AstNode::parse` is a bit more low-level than that, so let's not bother...
ada4a marked this conversation as resolved
@ -58,2 +50,3 @@
let tree_left = parse(&mut parser, contents_left, lang_profile, &arena, &ref_arena);
let tree_base = AstNode::parse(contents_base, lang_profile, &arena, &ref_arena);
let tree_left = AstNode::parse(contents_left, lang_profile, &arena, &ref_arena);
#[rustfmt::skip]
Owner

this is no longer necessary now!

this is no longer necessary now!
Author
Owner

well spotted!

well spotted!
ada4a marked this conversation as resolved
remove rustfmt::skip
All checks were successful
/ test (pull_request) Successful in 56s
61b15706e1
ada4a approved these changes 2025-05-19 23:09:37 +02:00
wetneb merged commit da356499f3 into main 2025-05-20 09:35:39 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#400
No description provided.