jEdit / Bugs / #2576 jEdit 4.3pre5 does not open files with right encoding

Ian Lewis - 2006-07-17

activity.log

Ian Lewis - 2006-07-17

The activity.log after starting jEdit at step #7

activity.log

Ian Lewis - 2006-07-17

summary: jEdit does not open files with right encoding --> jEdit 4.3pre5 does not open files with right encoding

Ian Lewis - 2006-07-18

Try the scenario with this file.

messages.ja

Ian Lewis - 2006-07-18

Logged In: YES
user_id=478898

It seems to depend on the file but happens every time with a
particular file I've created using jEdit. I attached it below.

Marcelo Vanzin - 2006-07-18

Logged In: YES
user_id=75113

Hi Ian,

I can't reproduce the problem following your tests. By any
chance, does the file *name* you're saving have any extra
characters? I think the current code might mess things up in
some cases if that happens...

Skeeve - 2006-07-18

Logged In: YES
user_id=864970

That file does not contain any Byte Order Mark (
http://en.wikipedia.org/wiki/Byte_Order_Mark ) so jEdit
can't see that the file is supposed to be UTF-8

It should start with EF BB BF but starts with 23 20 4a
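
A quick way to repeat this check on any file is to print its leading bytes in hex. This is a minimal sketch, not jEdit code; the class and method names (`LeadingBytes`, `hexPrefix`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class LeadingBytes {
    /** Formats the first {@code count} bytes of {@code data} as lowercase hex pairs. */
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(count, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so bytes above 0x7F are not printed as negative ints.
            sb.append(String.format("%02x", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LeadingBytes <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // A UTF-8 BOM would show up here as "ef bb bf".
        System.out.println(hexPrefix(data, 3));
    }
}
```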

Marcelo Vanzin - 2006-07-19

Logged In: YES
user_id=75113

UTF-8 doesn't have any BOM; UTF-8Y does. As for the problem,
I'd need a copy of the $HOME/.jedit/recent.xml file that
causes the problem, otherwise, I can't reproduce it...

Skeeve - 2006-07-19

Logged In: YES
user_id=864970

So the Unicode organization is wrong when they show the BOM
for UTF-8?
http://www.unicode.org/unicode/faq/utf_bom.html#BOM :-)

To be precise: UTF-8 doesn't *need* to have a BOM, but jEdit
will know from it that the file is UTF-8. How else should it
know?
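
One common answer for BOM-less files is a strict trial decode: if the bytes contain at least one non-ASCII byte and still decode as UTF-8 without errors, UTF-8 is a reasonable guess. Nothing in this thread says jEdit does this; the sketch below is only an illustration, and `Utf8Guess` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Guess {
    /** True if the bytes decode as UTF-8 with no malformed sequences. */
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** True if any byte has its high bit set, i.e. the data is not pure ASCII. */
    static boolean hasNonAscii(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * Heuristic only: pure ASCII is valid UTF-8 too, but gives no evidence
     * either way, so it is not reported as UTF-8 here.
     */
    static boolean looksLikeUtf8(byte[] data) {
        return hasNonAscii(data) && isValidUtf8(data);
    }
}
```

The guess can still be wrong for short inputs, since some byte sequences are valid in several encodings, which is why a BOM or a stored encoding is more reliable when available.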

Marcelo Vanzin - 2006-07-19

Logged In: YES
user_id=75113

That's beside the point of the bug; jEdit will recognize the
BOM if it's there, and should restore it with whatever
encoding the history file says it was last edited with. So
until we see the recent.xml that causes the problem, no
discussion here is gonna do any good.

Ian Lewis - 2006-08-09

recent.xml file.

recent.xml

Ian Lewis - 2006-08-09

Logged In: YES
user_id=478898

I can reproduce this bug with these steps (I also added the
recent.xml after running this scenario).

1.) Delete the recent.xml file.
2.) Open jEdit.
3.) Right click on the file in the file browser. Select
encoding and see that it is set to the default encoding
rather than UTF-8 (that by itself is a bug). Select UTF-8 as
the encoding from the File Browser for this file.
4.) Open the file.
5.) Close the file.
6.) Close jEdit.
7.) Open jEdit. Right click on the file in the file browser
and note the encoding (it's the default encoding). Don't
select a new encoding.
8.) Open the file. Notice it's opened in the default
encoding rather than UTF-8.

Skeeve - 2006-08-09

Logged In: YES
user_id=864970

I downloaded the file and it didn't open as UTF-8 when I switched my default
encoding to MacRoman.

When I copied the first 2 Japanese characters (from .ok=...) to the comment in
the first line, it opened as UTF-8.

Then I concatenated all lines up to these first characters into one line and found
that the Japanese characters appear at position 188 (or a bit later). Maybe jEdit
doesn't check more than 128 characters to find the proper encoding?

just wild guessing...
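
The guess can be made concrete with a small sketch. It finds the offset of the first non-ASCII byte and shows how a detector that only inspects a fixed-size prefix would miss it. The 128-byte window is the unconfirmed guess from this comment, not a documented jEdit limit, and `NonAsciiOffset` is a hypothetical name.

```java
final class NonAsciiOffset {
    /** Returns the index of the first byte with the high bit set, or -1 if all bytes are ASCII. */
    static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Hypothetical prefix-limited check: reports non-ASCII content only if it
     * occurs within the first {@code window} bytes.
     */
    static boolean prefixHasNonAscii(byte[] data, int window) {
        int idx = firstNonAscii(data);
        return idx >= 0 && idx < window;
    }
}
```

With the first non-ASCII byte at offset 188, a 128-byte window sees only ASCII, while a 256-byte window would see the multi-byte characters.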

Alan Ezust - 2006-12-15

priority: 5 --> 7

Kazutoshi Satoda - 2007-04-11

Logged In: YES
user_id=1483238
Originator: NO

Hi Ian. I can't see which of the steps you sent shows a
bug.

In step 3, it is normal that the encoding is set to
the default encoding because jEdit has no information
about the encoding of the file.

In step 8, it is normal that the file is opened in
the default encoding because it is selected in step 7.

I can see an issue at step 7. It might be correct
for the File System Browser to look for the encoding in
recent.xml. If this is your issue, it seems it has to be
a new feature request. This tracker item is full of
non-essential information.

Ian Lewis - 2007-04-27

Logged In: YES
user_id=478898
Originator: YES

Hey k_satoda,

jEdit automatically detects whether a file is UTF-8 or not, and has done so for a while, even if no information is contained in the recent.xml file (it's in BufferIORequest.java if you care to take a look). Perhaps my mentioning that, and how jEdit is failing to do so, is beside the point and perhaps a separate bug. But any perceived "non-essential" information is simply me trying to explain the issue I am having as best I can, and to provide as much information as I can about reproducing it.

You are right that step 3 is not a bug. You are right that the recent.xml file has no information about the encoding of the file at that point, so jEdit may not see that the file is encoded in UTF-8 at that point.

However, in step 4 I open the file in UTF-8. In steps 5 and 6 I close the file and close jEdit, which should write to the recent.xml file that the file is encoded in UTF-8. It is, after all, the encoding the file was set to when I closed it. (I believe it does if you check the recent.xml after step 6.)

However, when I reopen jEdit and reopen the file (step 8), it doesn't know the file is in UTF-8, even though it should have read that info from the recent.xml file. That is definitely a bug. I did not select the default encoding in step 7. It was already selected. I just looked at it. In fact, I say in step 7 specifically NOT to select an encoding.

Ian Lewis - 2007-04-27

Logged In: YES
user_id=478898
Originator: YES

Ok, attempting to simplify things.

1.) Apparently you're right. jEdit, since at least 4.2, has not used the encoding from the recent.xml file when opening the file using the File Browser, though it does use the caret info. I think not using the encoding is a bug since it uses the other info from the recent.xml, though I'm willing to move it to a feature request.

2.) I noticed that BufferIORequest doesn't exist anymore, so you can't look at it in SVN. But it used to detect whether a file was UTF-8, UTF-8Y, or UTF-16 when opening the file. If it could detect one of these encodings, it would open the file in that encoding rather than the default encoding. jEdit is no longer doing that. Whether that is a bug or a feature that was removed, I'm not sure.

Kazutoshi Satoda - 2007-04-28

Logged In: YES
user_id=1483238
Originator: NO

> 1.) ...
I looked into the code.

The problem is that the browser has a "current encoding".
Though an "Encoding" menu is shown in the pop-up menu for a
file, it's not for the file. It shows the current
encoding of the browser. The current encoding is
initialized to the global default encoding at start-up
of the browser. The current encoding is used for all
files opened from the browser. So the encoding in
recent.xml is not used, because an encoding for opening
the file is already specified by the browser.
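
The precedence described here can be modelled in a few lines. This is a simplified sketch of the behaviour as described in this comment, not jEdit's actual code; `EncodingResolution`, `resolve`, and the parameter names are all hypothetical.

```java
final class EncodingResolution {
    /**
     * Picks the encoding used to open a file.
     *
     * @param requestedByCaller encoding passed explicitly by the caller, or null
     * @param fromHistory       encoding recorded for the file in the history, or null
     * @param globalDefault     the global default encoding, never null
     */
    static String resolve(String requestedByCaller, String fromHistory, String globalDefault) {
        if (requestedByCaller != null) {
            // An explicit request wins, so the history entry is never consulted.
            return requestedByCaller;
        }
        if (fromHistory != null) {
            return fromHistory;
        }
        return globalDefault;
    }
}
```

In this model, a browser that always passes its own current encoding (initialized to the global default) makes `resolve("MacRoman", "UTF-8", "MacRoman")` return `"MacRoman"`, while a caller that passes nothing would get the stored `"UTF-8"`.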

A new feature request is fine. This problem seems to
need some discussion of how it should be fixed.

> 2.) ...
jEdit can detect UTF-8Y, but can't detect UTF-8 that
doesn't have a BOM. UTF-8 can be detected only if the file
is an XML file and has the right encoding declaration. jEdit
can also detect UTF-16 LE and BE with a BOM. I think
this has not changed between 4.2final and 4.3pre9 (and
current svn trunk).
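
BOM-based detection of the kind listed here amounts to comparing the first two or three bytes against known signatures. The following is a minimal sketch under that assumption, not the jEdit implementation; `BomSniffer` is a hypothetical name, and the returned labels simply follow the names used in this thread.

```java
final class BomSniffer {
    /**
     * Returns an encoding label if {@code head} starts with a known BOM,
     * or null if no BOM is recognized.
     */
    static String detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8Y"; // UTF-8 with BOM, as named in this thread
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            // Note: a UTF-32LE BOM (FF FE 00 00) also matches this test;
            // this sketch does not try to tell the two apart.
            return "UTF-16LE";
        }
        return null;
    }
}
```

For the attached file, whose first bytes are 23 20 4a, `detect` returns null, so the decision falls through to whatever other source of encoding information is available.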

Ian Lewis - 2007-05-19

Logged In: YES
user_id=478898
Originator: YES

Created new feature request 1721796.

Kazutoshi Satoda - 2007-05-25

status: open --> closed

Kazutoshi Satoda - 2007-05-25

Logged In: YES
user_id=1483238
Originator: NO

Closing this one because the remaining issue was moved
into the feature request.
