#115 hex mode and utf-8

Group: v3.8
Status: closed-fixed
Owner: nobody
Applies To: v4.5
Priority: 5
Updated: 2017-09-26
Created: 2005-06-11
Private: No

There are some interesting issues with hex mode in a UTF-8
environment.

If you step on a byte of a UTF-8 sequence that is not the first
byte, the cursor immediately advances to the next character in the
left column (hex view), but in the right column (ascii view) it is
only adjusted about a second later.

The left arrow doesn't work on UTF-8 sequences: the cursor goes one
byte backwards, lands in the middle of the sequence, and then jumps
forward again to the next character. So what you see is that nothing
happens in the left column, while in the right column the cursor goes
one step to the left and soon one step back to the right. The left
arrow should instead go back one complete UTF-8 sequence.
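
A minimal sketch of "go back one complete UTF-8 sequence", written in Python for illustration and not taken from JOE's source. The function name `prev_char_start` is hypothetical. It assumes the buffer is a plain byte string and only looks at the continuation-byte bit pattern (10xxxxxx); it does not check that the lead byte it stops on announces the right length.

```python
def prev_char_start(buf: bytes, pos: int) -> int:
    """Return the offset where the character before byte offset `pos` starts."""
    if pos <= 0:
        return 0
    i = pos - 1
    skipped = 0
    # Continuation bytes look like 0b10xxxxxx. A UTF-8 sequence has at most
    # three of them, so never skip more than that; on malformed input the
    # cursor then simply moves back byte by byte.
    while i > 0 and skipped < 3 and (buf[i] & 0xC0) == 0x80:
        i -= 1
        skipped += 1
    return i
```

With the bytes of "aé€b" (61 C3 A9 E2 82 AC 62), moving left from offset 6 lands on offset 3, the first byte of "€", and moving left again lands on offset 1, the first byte of "é".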

But that's only the beginning of a longer story.

Basically I think the problem is that Linux distros are quickly
switching to UTF-8, so very soon it will be the default everywhere.
However, I guess users expect a hex editor to edit the raw file and
hence deal with bytes instead of UTF-8 characters. Okay, I know it
only takes a "^T E latin1" to get this behavior, but that may not be
obvious to other users.

I know it is really not trivial to find out how a hex editor in utf-8
mode should work. Here's how I'd imagine it, which pretty much
differs from the current method, but you may find some of my
ideas useful:

In hex view, I'd like to be able to easily enter hex values, e.g.
where I see "48 65 6c 6c 6f" ("Hello") I'd like to be able to type
simply the hex digits, that is, 21 and not `x21, to get a "!". This
is especially important for bytes >127, since there the Unicode
value and the hex representation are not easy to convert in your
head, so currently it's kind of impossible (in UTF-8 mode) to insert
a particular >127 byte into the file.
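
To illustrate the requested behavior, here is a small Python sketch, not JOE code. The helper `type_hex_byte` is a hypothetical name, and overwriting the byte under the cursor (rather than inserting) is an assumption made for the example.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def type_hex_byte(buf: bytearray, pos: int, digits: str) -> None:
    """Overwrite the byte at `pos` with the value of two typed hex digits."""
    if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
        raise ValueError("expected exactly two hex digits")
    # No UTF-8 validation on purpose: in the hex column any byte is allowed.
    buf[pos] = int(digits, 16)

buf = bytearray(b"Hello")
type_hex_byte(buf, 4, "21")   # typing "21" over the final "6f"
```

After the call, `buf` holds the bytes of "Hell!". The mismatch for values above 127 is easy to see with "é": its code point is U+00E9, but its UTF-8 encoding is the two bytes C3 A9, so a single byte E9 can only be produced by entering the hex value directly.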

On the other hand it's also good to be able to type letters directly.

So, my idea:

The TAB key should quickly switch between two modes. In one
of the modes the cursor is in the hex column, and the character
corresponding to it in the ascii column is underlined or something
like that (so that my eye can easily find the corresponding spot,
but I can still see which of the two is the cursor itself). In
this mode I could directly type hex digits, whatever I want (even
invalid UTF-8), I could put my cursor wherever I wanted,
Delete or Backspace would always erase one single byte, and the
right (ascii) column would be updated to reflect what I type.

TAB would switch the cursor to the ascii column, and now the hex
counterpart would be underlined or something like this. In this
mode I could type letters from my keyboard. This could work as
the current hex editor works, that is, it could force the cursor not
to stand in the middle of a UTF-8 sequence, Del or BS could erase
a whole character, and, of course, the left arrow should also jump
back a whole UTF-8 sequence.

And finally: I'd prefer if the characters above 127 were also
displayed in the right column. This would be easy for latin-1 and
similar encodings. In UTF-8 I still think it's better to see the
accented letter followed by (n-1) dots or a special symbol rather
than n dots. How to display double-width (CJK and similar) symbols
is a good question, though.
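
The "letter followed by (n-1) dots" idea could look roughly like the Python sketch below. It is an illustration under stated assumptions, not how JOE renders its ascii column: `ascii_column` is a hypothetical name, the filler is a plain dot, and double-width characters are not handled (they would still occupy two screen cells and shift the alignment).

```python
def ascii_column(row: bytes, filler: str = ".") -> str:
    """Render one hex-dump row so that the output has one cell per byte."""
    out = []
    i = 0
    while i < len(row):
        b = row[i]
        # Expected sequence length, taken from the lead byte.
        if b < 0x80:
            n = 1
        elif 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0  # stray continuation byte or invalid lead byte
        try:
            ch = row[i:i + n].decode("utf-8") if n else ""
        except UnicodeDecodeError:
            ch = ""  # malformed, or the sequence is cut off at the row end
        if ch and ch.isprintable():
            out.append(ch + filler * (n - 1))
            i += n
        else:
            out.append(filler)
            i += 1
    return "".join(out)
```

For the bytes of "Héllo" (48 C3 A9 6C 6C 6F) this gives "Hé.llo": six cells for six bytes, with the accented letter visible. Invalid bytes and control characters each fall back to a single dot.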

Discussion

  • Egmont Koblinger

    I use joe with the -keepup option.
    Without -keepup it's worse, the cursor doesn't even advance
    automatically in the hex column, unless I press ^R to refresh the screen.

     
  • Joe Allen

    Joe Allen - 2008-10-29

    Now in CVS: I turn off UTF-8 mode when the user switches to hex display. If the user tries to set UTF-8 while there is a hex window open on the buffer, it fails.

    I'll leave this bug open to remind me of your other good, but more difficult to implement suggestions.

     
  • John J. Jordan

    John J. Jordan - 2017-09-26
    • status: open --> closed-fixed
    • Applies To: --> v4.5
    • Group: --> v3.8
     
  • John J. Jordan

    John J. Jordan - 2017-09-26

    These were mostly fixed around the 3.8-4.0 timeframe. Reopen if that's not the case.

     
