feat: manual language selection #327

Merged
wetneb merged 16 commits from wetneb/mergiraf:manual-language-selection-v2 into main 2025-04-28 20:29:09 +02:00
Owner

Continuation of #296 by @mathstuf.

Continuation of #296 by @mathstuf.
requested review from ada4a 2025-04-17 15:42:35 +02:00
ada4a requested changes 2025-04-17 17:04:44 +02:00
Dismissed
ada4a left a comment
Owner

Love the tests, and docs! There were still some things left unaddressed from #296 -- I mostly copied the comments from there verbatim

Love the tests, and docs! There were still some things left unaddressed from #296 -- I mostly copied the comments from there verbatim
doc/src/usage.md Outdated
@ -84,0 +84,4 @@
### Manually specifying the file's language
You can use the `--language` option (short: `-L`) to specify the language of the files to merge.
It accepts both file extensions (`--language js`) and language names (`--language javascript`), as specified in the list of [supported languages](./languages.md).
Owner

It accepts [...] file extensions

it doesn't actually though?.. At least right now, LangProfile::find_by_name only looks at name and alternate_names, not extensions

> It accepts [...] file extensions it doesn't actually though?.. At least right now, `LangProfile::find_by_name` only looks at `name` and `alternate_names`, not `extensions`
Author
Owner

Ah, great that you point this out! Wouldn't it make sense to also include extensions then? Arguably, they are language identifiers that users will be familiar with, no?

Ah, great that you point this out! Wouldn't it make sense to also include extensions then? Arguably, they are language identifiers that users will be familiar with, no?
Owner

Definitely! I mean the only problem with that would be .pl, which is used both for Prolog and Perl. But:

  1. it's recommended to use .pro extension for Prolog because of that
  2. are we seriously going to support Prolog...

Other than that, I think it's pretty much a no-brainer

Definitely! I mean the only problem with that would be `.pl`, which is used both for Prolog and Perl. But: 1. it's [recommended](https://www.swi-prolog.org/pldoc/man?section=fileext) to use `.pro` extension for Prolog because of that 2. are we seriously going to support Prolog... Other than that, I think it's pretty much a no-brainer
@ -53,1 +53,4 @@
conflict_marker_size: Option<usize>,
/// Override automatic language detection.
#[arg(short = 'L', long)]
language: Option<String>,
Owner

#296 (comment):

I think the option would make sense for mgf_dev as well (for both subcommands in fact), but we can leave that for another PR

https://codeberg.org/mergiraf/mergiraf/pulls/296#issuecomment-3497861: > I think the option would make sense for `mgf_dev` as well (for both subcommands in fact), but we can leave that for another PR
@ -162,6 +163,7 @@ pub static SUPPORTED_LANGUAGES: LazyLock<Vec<LangProfile>> = LazyLock::new(|| {
},
LangProfile {
name: "Java properties",
Owner

#296 (comment):

I'd definitively have a test case for languages whose names contain spaces

https://codeberg.org/mergiraf/mergiraf/pulls/296#issuecomment-3136124: > I'd definitively have a test case for languages whose names contain spaces
ada4a marked this conversation as resolved
@ -349,6 +354,7 @@ pub static SUPPORTED_LANGUAGES: LazyLock<Vec<LangProfile>> = LazyLock::new(|| {
},
LangProfile {
name: "Javascript",
alternate_names: &[],
Owner

#296 (comment):

(This is just an example but) I really don't expect anyone to type out "javascript" -- the first instinct will definitively be "js". This raises a question of whether we should allow specifying languages just by their extensions

#296 (comment):

I think the list can certainly be expanded in the future. Probably want a test to make sure there aren't any ambiguous options…

https://codeberg.org/mergiraf/mergiraf/pulls/296#issuecomment-3136799: > (This is just an example but) I really don't expect anyone to type out "javascript" -- the first instinct will definitively be "js". This raises a question of whether we should allow specifying languages just by their extensions https://codeberg.org/mergiraf/mergiraf/pulls/296#issuecomment-3143180: > I think the list can certainly be expanded in the future. Probably want a test to make sure there aren't any ambiguous options…
Author
Owner

Yes, I think that's a good case for including extensions then, no?

Yes, I think that's a good case for including extensions then, no?
ada4a changed title from feat: Manual language selection to feat: manual language selection 2025-04-20 13:36:44 +02:00
wetneb force-pushed manual-language-selection-v2 from 5bc5517e72 to c010642734 2025-04-22 14:52:27 +02:00 Compare
ada4a requested changes 2025-04-22 22:31:45 +02:00
Dismissed
ada4a left a comment
Owner

mostly stylistic nits

mostly stylistic nits
@ -215,0 +216,4 @@
let repo_dir = tempfile::tempdir().expect("failed to create the temp dir");
let test_file = repo_dir.path().join(Path::new("file.txt"));
fs::copy(
Path::new("examples/java/working/demo/Base.java"),
Owner

you should be able to omit Path::new here and directly above, since the methods take AsRef<Path>, which is implemented for &str

you should be able to omit `Path::new` here and directly above, since the methods take `AsRef<Path>`, which is implemented for `&str`
ada4a marked this conversation as resolved
@ -34,0 +38,4 @@
pub fn find_by_name(name: &str) -> Option<&'static Self> {
SUPPORTED_LANGUAGES.iter().find(|lang_profile| {
lang_profile.name.eq_ignore_ascii_case(name)
|| lang_profile
Owner
-                || lang_profile
-                    .alternate_names
-                    .iter()
+                || (lang_profile.alternate_names.iter())
```diff - || lang_profile - .alternate_names - .iter() + || (lang_profile.alternate_names.iter()) ```
ada4a marked this conversation as resolved
@ -34,0 +41,4 @@
|| lang_profile
.alternate_names
.iter()
.chain(lang_profile.extensions.iter())
Owner

You should be able to replace .iter() with & here

You should be able to replace `.iter()` with `&` here
ada4a marked this conversation as resolved
@ -237,0 +290,4 @@
#[test]
fn find_by_filename_or_name() {
assert_eq!(
LangProfile::find_by_filename_or_name(Path::new("file.json"), None)
Owner

you shouldn't need all the Path::news in this test, since LangProfile::find_by_filename_or_name accepts AsRef<Path>, so a &str works on its own

you shouldn't need all the `Path::new`s in this test, since `LangProfile::find_by_filename_or_name` accepts `AsRef<Path>`, so a `&str` works on its own
ada4a marked this conversation as resolved
@ -237,0 +301,4 @@
.name,
"JSON"
);
LangProfile::find_by_filename_or_name(
Owner

expect_err is kind of confusing -- I think using assert!(/* ... */.is_err(), /* the message */) might be clearer

`expect_err` is kind of confusing -- I think using `assert!(/* ... */.is_err(), /* the message */)` might be clearer
ada4a marked this conversation as resolved
@ -41,3 +38,1 @@
fname_base.display()
);
return line_based_merge(contents_base, contents_left, contents_right, &settings);
let lang_profile = match LangProfile::find_by_filename_or_name(fname_base, language) {
Owner

couldn't you use let Ok(lang_profile) = /* ... */ else { /* ... */ }?

couldn't you use `let Ok(lang_profile) = /* ... */ else { /* ... */ }`?
Author
Owner

I thought about it and the problem is that I need the error message in the else block… do you have a solution for that?

I thought about it and the problem is that I need the error message in the `else` block… do you have a solution for that?
Owner

Oh, completely missed that! It looks like that match is indeed required then, yeah.

By the way, message already ends with a ., so you'll end up with two of them – I think it makes more sense to remove the ones in find_by_filename_or_name

Oh, completely missed that! It looks like that `match` is indeed required then, yeah. By the way, `message` already ends with a `.`, so you'll end up with two of them – I think it makes more sense to remove the ones in `find_by_filename_or_name`
Author
Owner

Good eyes! :)

Good eyes! :)
ada4a marked this conversation as resolved
src/solve.rs Outdated
@ -28,3 +26,1 @@
fname_base.display()
)
})?;
let lang_profile = if let Some(lang_name) = language {
Owner

can't this whole block be replaced with LangProfile::detect_from_filename_or_name?

can't this whole block be replaced with `LangProfile::detect_from_filename_or_name`?
Author
Owner

Very good catch, of course that's much better.

Very good catch, of course that's much better.
ada4a marked this conversation as resolved
ada4a requested changes 2025-04-27 15:50:21 +02:00
Dismissed
ada4a left a comment
Owner

more of the same

more of the same
@ -215,0 +223,4 @@
"parse",
"--language",
"java",
test_file.display().to_string().as_str(),
Owner

I'm pretty sure .display() is plain redundant here, since to_string is implemented in terms of it -- so you can just remove that.

You could also replace to_string().as_str() with to_str().unwrap() (first one replaces non-Unicode characters with the ? thing, while the second one returns None on non-UTF8 characters, both of which won't happen in this case), but it's not like one is really shorter than the other... Replacing .as_str() with &, on the other hand, will save some characters

I'm pretty sure `.display()` is plain redundant here, since `to_string` is implemented in terms of it -- so you can just remove that. You could also replace `to_string().as_str()` with `to_str().unwrap()` (first one replaces non-Unicode characters with the `?` thing, while the second one returns `None` on non-UTF8 characters, both of which won't happen in this case), but it's not like one is really shorter than the other... Replacing `.as_str()` with `&`, on the other hand, _will_ save some characters
Owner

I see that you used to_str().unwrap() in mergiraf.rs, so maybe go with that for consistency... up to you

I see that you used `to_str().unwrap()` in `mergiraf.rs`, so maybe go with that for consistency... up to you
ada4a marked this conversation as resolved
@ -237,0 +301,4 @@
);
assert!(
LangProfile::find_by_filename_or_name(
Path::new("file.json"),
Owner

missed one Path::new here:)

missed one `Path::new` here:)
ada4a marked this conversation as resolved
@ -237,0 +307,4 @@
.is_err(),
"If a language name is provided, the file name should be ignored"
);
LangProfile::find_by_filename_or_name(Path::new("file.unknown_extension"), None)
Owner

this could also be turned into an assert!, and Path::new can be removed as well

this could also be turned into an `assert!`, and `Path::new` can be removed as well
ada4a marked this conversation as resolved
Style fixes
All checks were successful
/ test (pull_request) Successful in 1m4s
d7e2290b5e
@ -237,0 +276,4 @@
assert_eq!(LangProfile::find_by_name("python").unwrap().name, "Python");
assert_eq!(LangProfile::find_by_name("py").unwrap().name, "Python");
assert_eq!(
LangProfile::find_by_name("Java properties").unwrap().name,
Owner

When I wrote about the need for tests for languages whose names contain spaces, I was thinking more about cli tests... but I mean the users will probably be able to realize that such names need quoting. So yeah, I guess that doesn't really need tests after all, and having this test is also good

When I wrote about the need for tests for languages whose names contain spaces, I was thinking more about cli tests... but I mean the users will probably be able to realize that such names need quoting. So yeah, I guess that doesn't really need tests after all, and having this test is also good
ada4a requested changes 2025-04-28 19:30:29 +02:00
Dismissed
ada4a left a comment
Owner

One final change, and we can finally merge this. Thank you for bringing this to completion:)

One final change, and we can finally merge this. Thank you for bringing this to completion:)
@ -542,6 +559,7 @@ pub static SUPPORTED_LANGUAGES: LazyLock<Vec<LangProfile>> = LazyLock::new(|| {
},
LangProfile {
name: "Typescript (TSX)",
alternate_names: &["TSX"],
Owner

this one is redundant now that we match on extensions

this one is redundant now that we match on extensions
Remove explicit TSX alternate name
All checks were successful
/ test (pull_request) Successful in 1m2s
230050c63b
ada4a approved these changes 2025-04-28 20:12:41 +02:00
wetneb merged commit b5dcca663b into main 2025-04-28 20:29:09 +02:00
wetneb referenced this pull request from a commit 2025-04-28 20:29:11 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#327
No description provided.