Estimate cursor position within ligatures
For ligatures like ffi or ff, it should still be possible to select each individual letter. Estimate the cursor position by dividing the glyph width.
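The estimate can be illustrated with a small sketch. This is not the code from this MR; the function name and parameters are hypothetical, and it assumes the glyph width is split evenly between the characters that formed the ligature and that the ligature follows the text direction.

```python
def ligature_cursor_x(glyph_x, glyph_width, n_chars, char_index, rtl=False):
    """Estimate the cursor x coordinate inside a ligature glyph.

    glyph_x     -- left edge of the glyph (hypothetical layout coordinate)
    glyph_width -- advance width of the whole ligature glyph
    n_chars     -- number of characters the ligature replaces (3 for "ffi")
    char_index  -- cursor slot, 0 (before first char) .. n_chars (after last)
    rtl         -- True when the glyph belongs to a right-to-left run
    """
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    if not 0 <= char_index <= n_chars:
        raise ValueError("char_index out of range")
    # Assumption: every character gets an equal share of the glyph width.
    # In an RTL run the first character sits at the right edge of the glyph.
    steps = (n_chars - char_index) if rtl else char_index
    return glyph_x + glyph_width * steps / n_chars
```

For an "ffi" glyph starting at x = 10 with width 9, the cursor after the first "f" would land one third of the way in for LTR text, and two thirds of the way in when the same geometry is used in an RTL run.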
This is not a perfect solution, since a ligature can look completely different from the side-by-side characters that formed it. It is not as bad as it might be, though, since the underlying text layout libraries already filter out a good chunk of the cases that shouldn't allow placing the cursor in the middle.
This commit does not change the set of possible cursor positions, as it was already possible to reach them using the keyboard. It only improves the visual cursor position, which allows selecting them using the mouse as well and makes keyboard navigation clearer.
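Mouse selection is the inverse mapping: from a click position to the nearest cursor slot inside the glyph. A minimal sketch, again with hypothetical names, assuming an even split of the glyph width; it is not the implementation in this MR.

```python
def ligature_char_index_at(click_x, glyph_x, glyph_width, n_chars, rtl=False):
    """Return the cursor slot (0..n_chars) nearest to click_x.

    All parameters are hypothetical layout values: glyph_x is the left edge
    of the ligature glyph, glyph_width its advance width, n_chars the number
    of characters it replaces.
    """
    if n_chars <= 0 or glyph_width <= 0:
        raise ValueError("n_chars and glyph_width must be positive")
    # Position of the click inside the glyph as a fraction, clamped to [0, 1].
    fraction = (click_x - glyph_x) / glyph_width
    fraction = min(1.0, max(0.0, fraction))
    if rtl:
        # In an RTL run, slot 0 is at the right edge of the glyph.
        fraction = 1.0 - fraction
    # Round to the nearest slot (ties go to the higher slot).
    return int(fraction * n_chars + 0.5)
```

With this mapping a click anywhere in the glyph snaps to one of the n_chars + 1 slots that the keyboard could already reach, so the set of cursor positions stays the same.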
Fixes #5972
Based on my tests, other software like LibreOffice Writer or Firefox isn't much better. Many of the cases that don't work too well with this approach behave similarly in other software.
Warning: the test data of this MR contains bidirectional text. I hope this won't break any of the CI linters or other tooling. In the worst case I can replace it with Unicode escape sequences, but I would prefer to keep it as is if possible, so that it can be more easily copied for manual testing and verification.
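For reference, this is roughly what the escape-sequence fallback would look like. The sketch below uses Python's built-in `unicode_escape` codec purely as an illustration; the helper names and the sample string are made up, and the actual test data would use whatever escape syntax its own language supports.

```python
def to_unicode_escapes(text: str) -> str:
    """Replace non-ASCII characters with \\uXXXX style escapes."""
    return text.encode("unicode_escape").decode("ascii")


def from_unicode_escapes(escaped: str) -> str:
    """Turn an escaped ASCII string back into the original text."""
    return escaped.encode("ascii").decode("unicode_escape")


# Hypothetical sample: Latin text followed by four Hebrew letters,
# written with escapes so that this snippet itself stays ASCII-only.
sample = "abc \u05e9\u05dc\u05d5\u05dd"
escaped = to_unicode_escapes(sample)

# The escaped form is plain ASCII and round-trips to the original string.
assert escaped.isascii()
assert from_unicode_escapes(escaped) == sample
```

The escaped form is safe for tooling, but it cannot be pasted directly into a text field for manual testing, which is why the literal text is preferable.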
Overall I would say there are 5 cases:
- regular text where characters map directly to glyphs, no ligatures
- ligatures that look visually similar to the original sequence of characters, just with some adjustment -> this MR improves the behavior and should work correctly even for RTL and vertical text, assuming the ligature follows the text direction
- ligatures that combine multiple symbols in non-linear order (CJK scripts have quite a few that combine 4 into a square)
  - some of these are correctly marked by Pango with the no-cursor-position attribute, so they are treated as a single thing
  - others (especially discretionary ligatures) permit a cursor position in the middle, but at least it's more or less obvious that a square doesn't fit the linear text selection model. Things are already confusing enough with bidirectional text selection, no need to introduce multi-row glyphs
- ligatures that replace certain character sequences with custom symbols (example: Noto Sans CJK JP has a JIS logo); no reasonable cursor behavior is possible, and it's somewhat obvious that it's not regular text
- ligatures that permit cursor placement and preserve linear order, but whose components are reordered (Noto Sans Hebrew has one)
In some ways case 5 seems worse than 3.2 and 4, since the selection behavior looks normal and the weird part only starts when you try to apply a style. Although if you are a native speaker and recognize which parts form a single unit, it might not be so bad.
Other observations:
VSCode seems to be quite bad at dealing with bidirectional text. Some of the lines in the test data look completely wrong.
CLion has a nice feature: when the cursor is inside mixed-direction text, it adds a small hook indicating the current direction.