feat: Add command to compare files by tree isomorphism #200

Merged
ada4a merged 2 commits from wetneb/mergiraf:197_mgf_dev_isomorphism into main 2025-02-15 14:13:21 +01:00
Owner

This implements the simple version of tree isomorphism checking, without commutativity (#197).

There is still more code duplication than I'm happy with (for the text reading and parsing steps) but I couldn't figure out a way to refactor this given the ownership constraints. But surely one could try harder.

This implements the simple version of tree isomorphism checking, without commutativity (#197). There is still more code duplication than I'm happy with (for the text reading and parsing steps) but I couldn't figure out a way to refactor this given the ownership constraints. But surely one could try harder.
requested review from ada4a 2025-02-11 19:10:46 +01:00
Owner

But surely one could try harder.

Here's what I came up with:

diff --git a/src/bin/mgf_dev.rs b/src/bin/mgf_dev.rs
index 198e3a5..da3fa1f 100644
--- a/src/bin/mgf_dev.rs
+++ b/src/bin/mgf_dev.rs
@@ -1,11 +1,15 @@
-use std::{fs, path::PathBuf, process::exit};
+use std::{
+    borrow::Cow,
+    fs,
+    path::{Path, PathBuf},
+    process::exit,
+};
 
 use clap::{Parser, Subcommand};
 use mergiraf::{
     lang_profile::LangProfile,
     // XXX: move the uses to lib to avoid making these public?
     newline::normalize_to_lf,
-    tree::Ast,
 };
 use tree_sitter::Parser as TSParser;
 use typed_arena::Arena;
@@ -69,29 +73,26 @@ fn real_main(args: &CliArgs) -> Result<i32, String> {
         .set_language(&lang_profile.language)
         .map_err(|err| format!("Error loading {} grammar: {}", lang_profile.name, err))?;
 
+    let contents = |path: &Path| -> Result<Cow<str>, String> {
+        let original_contents = fs::read_to_string(path)
+            .map_err(|err| format!("Could not read {}: {err}", path.display()))?;
+        let contents = normalize_to_lf(original_contents);
+
+        Ok(contents)
+    };
+
     match &args.command {
         Command::Parse { path } => {
-            let original_contents = fs::read_to_string(path)
-                .map_err(|err| format!("Could not read {}: {err}", path.display()))?;
-            let contents = normalize_to_lf(original_contents);
-
-            let ts_tree = parser.parse(&*contents, None).ok_or("Parsing failed")?;
-            let tree = Ast::new(&ts_tree, &contents, lang_profile, &arena, &ref_arena)
+            let contents = contents(path)?;
+            let tree = mergiraf::parse(&mut parser, &contents, lang_profile, &arena, &ref_arena)
                 .map_err(|err| format!("File has parse errors: {err}"))?;

             print!("{}", tree.root().ascii_tree(lang_profile));
             Ok(0)
         }
         Command::Compare { first, second } => {
-            let original_contents_first = fs::read_to_string(first)
-                .map_err(|err| format!("Could not read {}: {err}", first.display()))?;
-            let contents_first = normalize_to_lf(original_contents_first);
-
-            let ts_tree_first = parser
-                .parse(&*contents_first, None)
-                .ok_or("Parsing failed")?;
-            let tree_first = Ast::new(
-                &ts_tree_first,
+            let contents_first = contents(first)?;
+            let tree_first = mergiraf::parse(
+                &mut parser,
                 &contents_first,
                 lang_profile,
                 &arena,
@@ -99,15 +100,9 @@ fn real_main(args: &CliArgs) -> Result<i32, String> {
             )
             .map_err(|err| format!("File has parse errors: {err}"))?;
 
-            let original_contents_second = fs::read_to_string(second)
-                .map_err(|err| format!("Could not read {}: {err}", first.display()))?;
-            let contents_second = normalize_to_lf(original_contents_second);
-
-            let ts_tree_second = parser
-                .parse(&*contents_second, None)
-                .ok_or("Parsing failed")?;
-            let tree_second = Ast::new(
-                &ts_tree_second,
+            let contents_second = contents(second)?;
+            let tree_second = mergiraf::parse(
+                &mut parser,
                 &contents_second,
                 lang_profile,
                 &arena,
diff --git a/src/lib.rs b/src/lib.rs
index 31e016f..088d1c9 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -85,7 +85,7 @@ pub const DISABLING_ENV_VAR: &str = "mergiraf";
 pub(crate) const FROM_PARSED_ORIGINAL: &str = "from_parsed_original";
 
 /// Helper to parse a source text with a given tree-sitter parser.
-pub(crate) fn parse<'a>(
+pub fn parse<'a>(
     parser: &mut TSParser,
     contents: &'a str,
     lang_profile: &LangProfile,

We use something very similar to fn contents() in mergiraf.rs (read_file_to_string+normalize_to_lf) -- we might consider defining a function for that in the lib. While we're at it, we could use read_file_to_string across the codebase -- currently we wrap every call to fs::read_to_string with (very helpful!) error messages mentioning what file couldn't be read. We could put both methods in newline.rs and rename the module to readutils.rs or something similar, since we use all the functions there to read and write files.

> But surely one could try harder. Here's what I came up with: ```diff diff --git a/src/bin/mgf_dev.rs b/src/bin/mgf_dev.rs index 198e3a5..da3fa1f 100644 --- a/src/bin/mgf_dev.rs +++ b/src/bin/mgf_dev.rs @@ -1,11 +1,15 @@ -use std::{fs, path::PathBuf, process::exit}; +use std::{ + borrow::Cow, + fs, + path::{Path, PathBuf}, + process::exit, +}; use clap::{Parser, Subcommand}; use mergiraf::{ lang_profile::LangProfile, // XXX: move the uses to lib to avoid making these public? newline::normalize_to_lf, - tree::Ast, }; use tree_sitter::Parser as TSParser; use typed_arena::Arena; @@ -69,29 +73,26 @@ fn real_main(args: &CliArgs) -> Result<i32, String> { .set_language(&lang_profile.language) .map_err(|err| format!("Error loading {} grammar: {}", lang_profile.name, err))?; + let contents = |path: &Path| -> Result<Cow<str>, String> { + let original_contents = fs::read_to_string(path) + .map_err(|err| format!("Could not read {}: {err}", path.display()))?; + let contents = normalize_to_lf(original_contents); + + Ok(contents) + }; + match &args.command { Command::Parse { path } => { - let original_contents = fs::read_to_string(path) - .map_err(|err| format!("Could not read {}: {err}", path.display()))?; - let contents = normalize_to_lf(original_contents); - - let ts_tree = parser.parse(&*contents, None).ok_or("Parsing failed")?; - let tree = Ast::new(&ts_tree, &contents, lang_profile, &arena, &ref_arena) + let contents = contents(path)?; + let tree = mergiraf::parse(&mut parser, &contents, lang_profile, &arena, &ref_arena) .map_err(|err| format!("File has parse errors: {err}"))?; print!("{}", tree.root().ascii_tree(lang_profile)); Ok(0) } Command::Compare { first, second } => { - let original_contents_first = fs::read_to_string(first) - .map_err(|err| format!("Could not read {}: {err}", first.display()))?; - let contents_first = normalize_to_lf(original_contents_first); - - let ts_tree_first = parser - .parse(&*contents_first, None) - .ok_or("Parsing failed")?; - let tree_first = Ast::new( - &ts_tree_first, + let contents_first = contents(first)?; + let tree_first = mergiraf::parse( + &mut parser, &contents_first, lang_profile, &arena, @@ -99,15 +100,9 @@ fn real_main(args: &CliArgs) -> Result<i32, String> { ) .map_err(|err| format!("File has parse errors: {err}"))?; - let original_contents_second = fs::read_to_string(second) - .map_err(|err| format!("Could not read {}: {err}", first.display()))?; - let contents_second = normalize_to_lf(original_contents_second); - - let ts_tree_second = parser - .parse(&*contents_second, None) - .ok_or("Parsing failed")?; - let tree_second = Ast::new( - &ts_tree_second, + let contents_second = contents(second)?; + let tree_second = mergiraf::parse( + &mut parser, &contents_second, lang_profile, &arena, diff --git a/src/lib.rs b/src/lib.rs index 31e016f..088d1c9 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -85,7 +85,7 @@ pub const DISABLING_ENV_VAR: &str = "mergiraf"; pub(crate) const FROM_PARSED_ORIGINAL: &str = "from_parsed_original"; /// Helper to parse a source text with a given tree-sitter parser. -pub(crate) fn parse<'a>( +pub fn parse<'a>( parser: &mut TSParser, contents: &'a str, lang_profile: &LangProfile, ``` We use something very similar to `fn contents()` in `mergiraf.rs` (`read_file_to_string`+`normalize_to_lf`) -- we might consider defining a function for that in the lib. While we're at it, we could use `read_file_to_string` across the codebase -- currently we wrap every call to `fs::read_to_string` with (very helpful!) error messages mentioning what file couldn't be read. We could put both methods in `newline.rs` and rename the module to `readutils.rs` or something similar, since we use all the functions there to read and write files.
Author
Owner

Makes sense! I'm pushing your refactoring for now.

Makes sense! I'm pushing your refactoring for now.
ada4a requested changes 2025-02-14 18:13:31 +01:00
Dismissed
ada4a left a comment
Owner

one more thing, after that we're good to go!

one more thing, after that we're good to go!
@ -24,12 +28,21 @@ enum Command {
/// Print the parsed tree for a file, for debugging purposes
Owner

could you please #[deny(missing_docs)] at Command while we're at it?

could you please `#[deny(missing_docs)]` at `Command` while we're at it?
Author
Owner

ah nice, didn't know it existed.

ah nice, didn't know it existed.
Owner

And maybe extract the first 3 commits into a separate PR?

And maybe extract the first 3 commits into a separate PR?
wetneb force-pushed 197_mgf_dev_isomorphism from a2e7b0f2d2 to a80eb303fe 2025-02-15 13:48:56 +01:00 Compare
ada4a approved these changes 2025-02-15 14:12:47 +01:00
ada4a merged commit daf2866a97 into main 2025-02-15 14:13:21 +01:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#200
No description provided.