feat: Support for nested languages #403
No reviewers
Labels
No labels
Compat/Breaking
Kind
Bad merge
Kind
Bug
Kind
Documentation
Kind
Enhancement
Kind
Feature
Kind
New language
Kind
Security
Kind
Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: mergiraf/mergiraf#403
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "wetneb/mergiraf:5-nested-languages"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #5.
This uses the injection queries defined by tree-sitter parsers to recursively parse the corresponding areas with other parsers.
The failing test case demonstrates a small issue with whitespace handling that I'm interested to investigate further.
I normally try to go over the whole PR at once in order to have enough context for my reviews, but since this is a bigger PR (and since I've been struggling to cobble together free time lately 😅), I'd try going with the approach of multiple small reviews this time. And so, here's the first one
@ -95,0 +116,4 @@
source: &'a str,
lang_profile: &'a LangProfile,
) -> FxHashMap<usize, &'static LangProfile> {
let mut node_id_to_injection_lang = FxHashMap::default();
I'd like to avoid the indentation caused by
if let Some
. How about changing the first 2 lines to the following?Maybe it would even make sense to change the return type to an
Option
? Not sure about this one thoughSure! Is it more convenient for you if I amend the commits and force-push, or add those refactorings as separate commits on top of the branch?
That's a good question. On one hand, I do prefer a cleaner history. On the other hand, I think Codeberg doesn't quite like it when a change is pulled from under the feet a review comment (since it doesn't keep track of PR versions AFAICT). So let's play it safe and go with the second option
@ -95,0 +149,4 @@
.expect("injection.language capture didn't match any node");
source[lang_node.byte_range()].to_owned()
});
m.nodes_for_capture_index(content_capture_index)
if let Some()
doesn't depend on the particularnode
, it can be pulled out offor_each
nodes.for_each( map.insert() )
can be replaced withmap.extend(nodes)
@ -95,0 +151,4 @@
});
m.nodes_for_capture_index(content_capture_index)
.for_each(|node| {
if let Some(injected_lang) = LangProfile::find_by_name(&language) {
shouldn't we maybe return an error if we don't support the injected language? Because I think that if we ignore its actual language, we'll end up handling it like the language it's embedded in, which might backfire
no, in that case we just don't re-parse the injection recursively at all, and just leave it to how it was parsed by the parent grammar… which feels appropriate to me, given that the parent grammar will generally leave this part un-analyzed.
For instance, we currently don't support CSS, and the HTML grammar sets up injections for
<style>
elements to be parsed as CSS. Because we don't find a suitable language, the<style>
block is left un-analyzed. The presence of a<style>
block shouldn't prevent the HTML from parsing, since we can still resolve conflicts in the rest of the HTML tree.Right, I since saw that explained at line 189. Maybe it would make sense to copy that comment into here as well
@ -0,0 +2,4 @@
```yml
foo2:
- bar
It took me longer than I'd like to admit to notice the yaml->yml change introduced by the left side. Maybe add a README like:
or, if you want to be more terse:
My bad, actually I should have kept this test case in the PR that introduced the markdown parser, since it's independent from injections support…
I guess that would've stopped me from trying to look inside the codeblock... But all good:)
But I think having the README is helpful even despite that 😅
Ah, you did add it! Thanks:)
@ -0,0 +5,4 @@
from bookmaker import front_page, compile, table_of_contents
def main():
compile(toc=table_of_contents, debug=True)
Nice reference 😅
@ -95,0 +132,4 @@
let pattern_properties = query.property_settings(m.pattern_index);
// first, check if the language is statically defined in this clause of the query as a property
let lang_property_value = pattern_properties.iter().find_map(|property| {
if property.key == "injection.language".into() {
This
.into()
turns the"injection.content"
on the right into an ownedBox<str>
, since that's the type ofproperty.key
on the left.Box<str>
is an owned equivalent of&str
(the only difference fromString
being that the latter can be resized, so the triple is isomorphic toBox<[T]>
vs&[T]
vsVec<T>
), and it implementsDeref<Target=str>
, so it can be coerced to a&str
with a reborrow:Pretty much the same goes for
property.value
, except that there theBox<str>
is wrapped in anOption
, so we need to useOption::as_deref
to reborrow the value. The rest should follow, but here's the solution just in case:)Thanks a lot, you can tell I've struggled with those boxes :-D
@ -95,0 +142,4 @@
let language = lang_property_value.unwrap_or_else(|| {
// otherwise, look if the language is defined via a capture
let lang_node = m
.nodes_for_capture_index(language_capture_index.expect(
I think extracting this into a variable would be a bit cleaner...
@ -779,0 +809,4 @@
if let Some(injection) = lang_profile.injections {
assert!(
!injection.trim().is_empty(),
"Injection query for language {} set as an empty string, use None instead",
can use the
Display
impl forLangProfile
here:)@ -133,2 +214,2 @@
(range, local_source)
};
let (range, local_source) =
// don't trim injections, because their children could expand beyond the node itself
is there an example of this you know of? maybe even a test case that shows an injection being trimmed resulting in an incorrect output...
If you remove the condition on the injection, you should get a nice error when running
examples/markdown/failing/nested_injections
.Interesting. But the fact that the test is failing even with this restriction in place could mean that the latter is not quite correct, no?
The error you mention appears in
trailing_whitespace
, in particular when (panickingly) slicing intoself.source
, so I tried replacing that with aget
, but then the test output was a whole different kind of wrong... Maybe I'll admit my defeat here, and trust you to debug this in the future 😅But why do children of injection expand beyond the node itself! That feels so wrong to me
The root of a tree as returned by tree-sitter generally (if not always) spans the whole source that you give it to parse. Because we do not to trim roots (condition
node.parent().is_some()
), this means that the containing node (the one picked out by the injection query) shouln't be trimmed either. This doesn't need to propagate further up the tree since the node picked out by the injection query is the one defining the source area to parse, so we know we're not expanding beyond that.An alternative (which I would like to explore as a follow up) would be to trim the injected roots as well, and only not trim the real, actual root of the entire document.