#2576 jEdit 4.3pre5 does not open files with right encoding

Status: closed
Owner: nobody
Priority: 7
Updated: 2007-05-25
Created: 2006-07-17
Creator: Ian Lewis
Private: No

jEdit often opens files in the default encoding instead
of opening them in the encoding saved in recent.xml. It
also doesn't detect that the file is UTF-8 based on the
UTF-8 magic numbers like it used to.

You can reproduce it as follows.

1.) Create a new file.
2.) Open the buffer options and set the encoding to UTF-8
3.) Copy and paste some non-ascii characters in the
buffer, such as German umlauts.
4.) Save the buffer.
5.) Close jEdit.
6.) Verify that the recent.xml file has the correct
encoding in it.
7.) Start up jEdit again.
8.) Right click on the file in the browser and look at
the encoding. It is now the DEFAULT encoding rather
than UTF-8.
9.) Open the file and see the characters as garbage. jEdit
also fails to detect that the file is UTF-8 from the magic
characters.

This has caused a number of files to inadvertently get
saved in the wrong encoding because the files are read
using the wrong encoding and then saved in the wrong
encoding. This kills many characters that are non-ASCII.
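
A minimal sketch of that failure mode, written in Python for illustration and not taken from jEdit's Java source. The sample text and the choice of US-ASCII as the wrong encoding are assumptions picked to make the loss visible:

```python
# Illustration only: simulate reading UTF-8 bytes with the wrong encoding
# and then saving the buffer. Sample text and US-ASCII are assumptions.
original = "Grüße"
utf8_bytes = original.encode("utf-8")

# Opening with the wrong encoding: every undecodable byte becomes U+FFFD.
misread = utf8_bytes.decode("ascii", errors="replace")

# Saving the misread buffer writes the replacement characters, not the umlauts.
resaved = misread.encode("utf-8")

# Reading the saved file back as UTF-8 no longer yields the original text.
recovered = resaved.decode("utf-8")
```

Once the buffer has been saved this way, the original bytes are gone and switching the encoding back cannot restore them.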

Discussion

  • Ian Lewis

    Ian Lewis - 2006-07-17
     
  • Ian Lewis

    Ian Lewis - 2006-07-17

    The activity.log after starting jEdit at step #7

     
  • Ian Lewis

    Ian Lewis - 2006-07-17
    • summary: jEdit does not open files with right encoding --> jEdit 4.3pre5 does not open files with right encoding
     
  • Ian Lewis

    Ian Lewis - 2006-07-18

    Try the scenario with this file.

     
  • Ian Lewis

    Ian Lewis - 2006-07-18

    Logged In: YES
    user_id=478898

    It seems to depend on the file but happens every time with a
    particular file I've created using jEdit. I attached it below.

     
  • Marcelo Vanzin

    Marcelo Vanzin - 2006-07-18

    Logged In: YES
    user_id=75113

    Hi Ian,

    I can't reproduce the problem following your tests. By any
    chance, does the file *name* you're saving have any extra
    characters? I think the current code might mess things up in
    some cases if that happens...

     
  • Skeeve

    Skeeve - 2006-07-18

    Logged In: YES
    user_id=864970

    That file does not contain any Byte Order Mark (
    http://en.wikipedia.org/wiki/Byte_Order_Mark ) so jEdit
    can't see that the file is supposed to be UTF-8

    It should start with EF BB BF but starts with 23 20 4a
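
The check described here can be sketched as follows. This is a Python illustration, not jEdit's code, and the function name is hypothetical:

```python
# Illustration only: test whether a file's first bytes are the UTF-8 BOM.
UTF8_BOM = b"\xef\xbb\xbf"  # EF BB BF

def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string begins with the UTF-8 byte order mark."""
    return data.startswith(UTF8_BOM)

# The attached file starts with 23 20 4a, which is the ASCII text "# J".
attached_head = bytes([0x23, 0x20, 0x4A])
has_bom = starts_with_utf8_bom(attached_head)
```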

     
  • Marcelo Vanzin

    Marcelo Vanzin - 2006-07-19

    Logged In: YES
    user_id=75113

    UTF-8 doesn't have any BOM; UTF-8Y does. As for the problem,
    I'd need a copy of the $HOME/.jedit/recent.xml file that
    causes the problem, otherwise, I can't reproduce it...

     
  • Skeeve

    Skeeve - 2006-07-19

    Logged In: YES
    user_id=864970

    So the Unicode organization is wrong when they show the BOM
    for UTF-8?
    http://www.unicode.org/unicode/faq/utf_bom.html#BOM :-)

    To be precise: UTF-8 doesn't *need* to have a BOM, but jEdit
    will know from it that the file is UTF-8. How else should it
    know?

     
  • Marcelo Vanzin

    Marcelo Vanzin - 2006-07-19

    Logged In: YES
    user_id=75113

    That's beside the point of the bug; jEdit will recognize the
    BOM if it's there, and should restore the file with whatever
    encoding the history file says it was last edited with. So
    until we see the recent.xml that causes the problem, no
    discussion here is going to do any good.

     
  • Ian Lewis

    Ian Lewis - 2006-08-09

    recent.xml file.

     
  • Ian Lewis

    Ian Lewis - 2006-08-09

    Logged In: YES
    user_id=478898

    I can reproduce this bug with these steps (I also added the
    recent.xml after running this scenario).

    1.) Delete the recent.xml file.
    2.) Open jEdit.
    3.) Right click on the file in the file browser. Select
    encoding and see that it is set to the default encoding
    rather than UTF-8 (that by itself is a bug). Select UTF-8 as
    the encoding from the File Browser for this file.
    4.) Open the file.
    5.) Close the file.
    6.) Close jEdit.
    7.) Open jEdit. Right click on the file in the file browser
    and note the encoding (it's the default encoding). Don't
    select a new encoding.
    8.) Open the file. Notice it's opened in the default
    encoding rather than UTF-8.

     
  • Skeeve

    Skeeve - 2006-08-09

    Logged In: YES
    user_id=864970

    I downloaded the file and it didn't open with UTF-8 when I switched my default
    encoding to MacRoman.

    When I copied the first 2 Japanese characters (from .ok=...) to the comment in
    the first line, it opened as UTF-8.

    Then I concatenated all lines up to these first characters into one line and found
    that the Japanese characters appear at position 188 (or a bit later). Maybe jEdit
    doesn't check more than 128 characters to find the proper encoding?

    just wild guessing...
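
To make the guess concrete, here is a hypothetical Python sketch of a detector that only inspects a fixed-size prefix. Nothing in this thread confirms that jEdit works this way; the function name, the window sizes, and the sample data are all assumptions:

```python
import codecs

def guess_is_utf8(data: bytes, window: int) -> bool:
    """Hypothetical heuristic: look only at the first `window` bytes."""
    head = data[:window]
    if all(b < 0x80 for b in head):
        # Pure ASCII prefix: nothing distinguishes UTF-8 from the default.
        return False
    # An incremental decoder tolerates a multi-byte sequence that is cut
    # off by the window edge, but still rejects invalid byte sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True

# Assumed sample: 188 ASCII bytes followed by two Japanese characters.
sample = b"#" * 188 + "日本".encode("utf-8")
short_window = guess_is_utf8(sample, 128)  # non-ASCII bytes lie past the window
long_window = guess_is_utf8(sample, 256)   # non-ASCII bytes are inside the window
```

Under this assumption a file whose first non-ASCII character appears after the window would always fall back to the default encoding, which would match the behaviour observed above.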

     
  • Alan Ezust

    Alan Ezust - 2006-12-15
    • priority: 5 --> 7
     
  • Kazutoshi Satoda

    Logged In: YES
    user_id=1483238
    Originator: NO

    Hi Ian. I can't see what the bug is in the steps you
    sent.

    In step 3, it is normal that the encoding is set to
    the default encoding because jEdit has no information
    about the encoding of the file.

    In step 8, it is normal that the file is opened in
    the default encoding because it is selected in step 7.

    I can see an issue at step 7. It might be correct
    for the File System Browser to look up the encoding in
    recent.xml. If this is your issue, it seems it has to be
    a new feature request. This tracker item is full of
    non-essential information.

     
  • Ian Lewis

    Ian Lewis - 2007-04-27

    Logged In: YES
    user_id=478898
    Originator: YES

    Hey k_satoda,

    jEdit automatically detects if a file is UTF-8 or not. It has done this for a while, even if no information is contained in the recent.xml file (it's in BufferIORequest.java if you care to take a look). Perhaps my mentioning that, and how jEdit is failing to do so, is beside the point and perhaps a separate bug. But any perceived "non-essential" information is simply me trying to explain the issue I am having as well as I can, and to provide as much information as I can about reproducing it.

    You are right that step 3 is not a bug, and that the recent.xml file has no information about the encoding of the file at that point. So jEdit may not see that the file is encoded in UTF-8 at that point.

    However, in step 4 I open the file in UTF-8. In steps 5 and 6 I close the file and close jEdit, which should write to the recent.xml file that the file is encoded in UTF-8. It is, after all, the encoding the file was set to when I closed it. (I believe it does, if you check the recent.xml after step 6.)

    However, when I reopen jEdit and reopen the file, it doesn't know the file is in UTF-8, even though it should have read that info from the recent.xml file (step 8). That is definitely a bug. I did not select the default encoding in step 7. It was already selected. I just looked at it. In fact, I say in step 7 specifically NOT to select an encoding.

     
  • Ian Lewis

    Ian Lewis - 2007-04-27

    Logged In: YES
    user_id=478898
    Originator: YES

    Ok, attempting to simplify things.

    1.) Apparently you're right. jEdit, since at least 4.2, has not used the encoding from the recent.xml file when opening a file from the File Browser, though it does use the caret info. I think not using the encoding is a bug, since it uses the other info from recent.xml, though I'm willing to move it to a feature request.

    2.) I noticed that BufferIORequest doesn't exist anymore, so you can't look at it in SVN. But it used to detect whether a file was UTF-8, UTF-8Y, or UTF-16 when opening the file. If it could detect one of these encodings, it would open the file in that encoding rather than the default encoding. jEdit is no longer doing that. Whether that is a bug or a feature that was removed, I'm not sure.

     
  • Kazutoshi Satoda

    Logged In: YES
    user_id=1483238
    Originator: NO

    > 1.) ...
    I looked into the code.

    The problem is that the browser has a "current encoding".
    Though the "Encoding" menu is shown in the pop-up menu for a
    file, it's not for the file. It shows the current
    encoding of the browser. The current encoding is
    initialized to the global default encoding at start-up
    of the browser. The current encoding is used for all
    files opened from the browser. So the encoding in
    recent.xml is not used, because an encoding for opening
    the file is specified by the browser.
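
The precedence described above can be sketched like this. It is a simplified Python illustration, not jEdit's code; the function and parameter names are hypothetical, and the encoding names are example values:

```python
from typing import Optional

def encoding_for_open(requested: Optional[str],
                      recent: Optional[str],
                      default: str) -> str:
    """Pick the encoding used to open a file (simplified, hypothetical)."""
    if requested is not None:
        # An explicitly specified encoding wins over everything else.
        return requested
    if recent is not None:
        # Otherwise fall back to what recent.xml remembers for the file.
        return recent
    return default

# The browser always passes its own "current encoding", which starts out
# as the global default, so the recent.xml value is never consulted.
from_browser = encoding_for_open("ISO-8859-1", "UTF-8", "ISO-8859-1")

# Without an explicit encoding, the remembered one would be used.
without_request = encoding_for_open(None, "UTF-8", "ISO-8859-1")
```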

    A new feature request is fine. This problem seems to
    need some discussion of how it should be fixed.

    > 2.) ...
    jEdit can detect UTF-8Y, but can't detect UTF-8 that
    doesn't have a BOM. UTF-8 can be detected only if the file
    is an XML file and has the right encoding declaration. jEdit
    can also detect UTF-16 LE and BE with a BOM. I think
    this has not changed between 4.2final and 4.3pre9 (and
    current svn trunk).
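
A rough Python sketch of the detection rules listed above. It is not jEdit's implementation; the function name, the returned encoding names, and the simplified XML declaration pattern are assumptions:

```python
import re
from typing import Optional

# Byte order marks and the encoding each one signals.
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8Y"),   # UTF-8 with BOM
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# Simplified match for <?xml ... encoding="..."?> at the start of a file.
XML_DECL = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def detect_encoding(head: bytes) -> Optional[str]:
    """Return a detected encoding name, or None if nothing identifies one."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    match = XML_DECL.match(head)
    if match:
        return match.group(1).decode("ascii")
    # Plain UTF-8 without a BOM or an XML declaration is not detected.
    return None
```

With these rules, a BOM-less UTF-8 text file such as the attached one yields no detection, so the caller falls back to another source for the encoding.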

     
  • Ian Lewis

    Ian Lewis - 2007-05-19

    Logged In: YES
    user_id=478898
    Originator: YES

    Created new feature request 1721796.

     
  • Kazutoshi Satoda

    • status: open --> closed
     
  • Kazutoshi Satoda

    Logged In: YES
    user_id=1483238
    Originator: NO

    Closing this one because the remaining issue was moved
    into the feature request.

     
