manual-language-selection #296
No reviewers
Labels
No labels
Compat/Breaking
Kind
Bad merge
Kind
Bug
Kind
Documentation
Kind
Enhancement
Kind
Feature
Kind
New language
Kind
Security
Kind
Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: mergiraf/mergiraf#296
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "mathstuf/mergiraf:manual-language-selection"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Given that I found #295, this would help users work around the problem without needing to patch the source code.
LangProfile
is'static
c9cf6122f6I'll add test cases if the code looks OK.
Hi, thanks for the PR(s)!
I've left a... larger-than-usual number of comments, but I guess that's partly because this a non-trivial change
@ -34,0 +36,4 @@
/// Load a profile by language name.
pub fn find_by_name(name: &str) -> Option<&'static Self> {
SUPPORTED_LANGUAGES.iter().find(|lang_profile| {
lang_profile.name.eq_ignore_ascii_case(name)
having this be case-insensitive is definitely a good call
@ -34,2 +46,3 @@
/// Detects the language of a file based on its filename
pub fn detect_from_filename<P>(filename: &P) -> Option<&Self>
pub fn detect_from_filename<P>(filename: &P) -> Option<&'static Self>
good catch!
@ -113,14 +113,27 @@ pub fn line_merge_and_structured_resolution(
attempts_cache: Option<&AttemptsCache>,
debug_dir: Option<&'static Path>,
timeout: Duration,
language: Option<String>,
Option<&str>
ought to suffice here. Looks like a false-negative ofclippy::needless_pass_by_value
(which we have on)Same for
resolve_merge_cascading
@ -124,0 +118,4 @@
let lang_profile = if let Some(lang_name) = language {
let Some(lang_profile) = LangProfile::find_by_name(&lang_name) else {
debug!(
"Unknown language {}. Falling back to a line-based merge.",
I feel like "unknown language" is not very descriptive. I'd like this to somehow mention that it has to do with the specified language -- something as simple as "Language specified could not be found".
Also this should arguably be a
warn!
. I think the "Could not find a supported language ..." thing sets a false precedent -- we should probably be warning there as well. I think we use warnings too sparingly in general, but that can be a discussion for another time -- this is good for now.Here's the reasoning behind the existing
debug!
: #135 (comment). I think I forgot aboutinfo!
at the time, which IMO would've been a better option thandebug!
. So maybe this could beinfo!
as well.I think a warning for a manually-specified language makes sense;
info
for the auto-detection fallback also makes sense.@ -124,0 +119,4 @@
let Some(lang_profile) = LangProfile::find_by_name(&lang_name) else {
debug!(
"Unknown language {}. Falling back to a line-based merge.",
lang_name
Could you inline
lang_name
into the format string please? We have this warning for{e,}println!
s, but it doesn't flag the logging functions unfortunately... (since they aren't from std)@ -124,0 +121,4 @@
"Unknown language {}. Falling back to a line-based merge.",
lang_name
);
return line_based_merge(contents_base, contents_left, contents_right, &settings);
Part of me wishes that we'd at least try to
LangProfile::detect_from_filetype
here... I guess that depends on whether false detection is more probable than failed detection -- I.. don't think it is?The compelling case, to me, is the
.pl
extension: it is used for both Perl and Prolog. Any kind of auto-detection there is destined for failure without considering the contents of the file as well (but Perl is certainly the global safer-choice default). My view is that the user said it is language X, let's not second guess if things are going downhill.I mean even if we were to choose the wrong
lang_profile
, we'd fail almost immediately after that, when we try to parse the file with the wrong parser. But I see your point -- let's leave this as is@ -419,3 +430,1 @@
fname_base.display()
)
})?;
let lang_profile = language
This might be a personal thing, but I find
or_else
andok_or_else
in the same method chain to be very confusing. Also I think needing to resort toOption::transpose
is almost always.. not a good sign.The following should be identical logic-wise while also being more legible?:
It being similar to the thing over at
line_merge_and_structured_resolution
might be considered an additional benefit@ -422,0 +430,4 @@
let lang_profile = language
.map(|lang_name| {
LangProfile::find_by_name(&lang_name)
.ok_or_else(|| format!("Unknown language {lang_name}"))
See #296/files (comment)
@ -162,6 +163,7 @@ pub static SUPPORTED_LANGUAGES: LazyLock<Vec<LangProfile>> = LazyLock::new(|| {
},
LangProfile {
name: "Java properties",
I'd definitively have a test case for languages whose names contain spaces -- but that's for later
@ -349,6 +354,7 @@ pub static SUPPORTED_LANGUAGES: LazyLock<Vec<LangProfile>> = LazyLock::new(|| {
},
LangProfile {
name: "Javascript",
alternate_names: &[],
(This is just an example but) I really don't expect anyone to type out "javascript" -- the first instinct will definitively be "js". This raises a question of whether we should allow specifying languages just by their extensions
I think the list can certainly be expanded in the future. Probably want a test to make sure there aren't any ambiguous options…
8bbf5415d4
to6bf6706c2e
Ok I think the code looks pretty good now! The only thing left are tests...
mergiraf {merge,solve}
out into separate modules #297Oh and this will definitely need docs.. I think a paragraph in mergiraf.org/usage should do, but I'm not sure where exactly one would put it. Since this is supposed to be a last-resort (or at least unhappy-path) kind of option, probably somewhere near the end?
I think the option would make sense for
mgf_dev
as well (for both subcommands in fact), but we can leave that for another PRJust wanted to chime back in: not sure when I'll get time to look at this again.
Another thing I thought about was using
gitattributes
to store language information for files that use this to avoid having to pass the flag all the time. But that's a separate PR :) . Something likemergiraf.language=c++
or the like.@mathstuf thanks a lot for the update, and for the contributions! :)
I'd be happy to help out with any of the outstanding things (tests, documentation) to get this PR merged, so it doesn't get stale.
Using
.gitattributes
for this does make a lot of sense.The only reservation I have about this is that this kind of language-identifier-attribute feels like something many more programs would use, not just Mergiraf. Therefore I'd be keen to use some standard attribute instead of introducing a Mergiraf-specific one. Unfortunately, the closest thing to that (that I know of) is GitHub's Linguist, and it doesn't even need the specifier most of the time, since its heuristics seem to work good enough on their own.
mergiraf {merge,solve}
out into separate modules #297I'm working on fixing conflicts, adding tests and documenting this PR.
Let's continue this in #327 (I can't force-push to this branch).
Sorry for going dormant here, just haven't had time for this. I subscribed to the new PR (strange that a mention didn't do so…).
Also, is there an option I missed to allow maintainers to push or does Forgejo just not support it (yet)?
@mathstuf normally, maintainers should be allowed to push to PR branches, but maybe not force-push (which is what I tried to fix the conflicts)
Pull request closed