zipfile header language encoding bit set differently between Python2 and Python3
I would like this code to work the same when run with Python 2 or Python 3:
    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
        content = "content"
        info = ZipInfo()
        info.filename = "file.txt"
        info.flag_bits = 0x800
        info.file_size = len(content)
        zf.writestr(info, content)
However, under Python 2 out.zip starts:
    50 4b 03 04 14 00 00 08
Under Python 3, it starts:
    50 4b 03 04 14 00 00 00
The differing part is flag_bits, set to 0x800 for Python 2 and 0x00 for Python 3. That's bit 11: language encoding. Bit 11 seems to get set if filename.encode("ascii") throws.
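To check the flag without reading a hex dump, the first bytes of the archive can be unpacked with struct. This is a minimal sketch, assuming the archive starts with a local file header (as out.zip does here); read_flag_bits and UTF8_FLAG are hypothetical names for illustration, not part of zipfile.

```python
import struct

UTF8_FLAG = 0x800  # general purpose bit 11: language encoding (UTF-8 names)

def read_flag_bits(header):
    """Return the general purpose bit flag from a ZIP local file header.

    `header` holds at least the first 8 bytes of the archive: a 4-byte
    signature, a 2-byte "version needed" field and the 2-byte flag field,
    all little-endian.
    """
    signature, _version, flags = struct.unpack("<4sHH", header[:8])
    if signature != b"PK\x03\x04":
        raise ValueError("not a local file header")
    return flags

# The two 8-byte prefixes shown in the question, as raw bytes.
py2_header = bytes(bytearray([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x08]))
py3_header = bytes(bytearray([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00]))

print(hex(read_flag_bits(py2_header)))  # 0x800
print(hex(read_flag_bits(py3_header)))  # 0x0
```

The flag field is stored little-endian, which is why the 08 shows up in the eighth byte of the dump rather than the seventh.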
I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().
I wonder if anyone here has a good solution. Ideally I'd like both outputs to have the flag set, because that mirrors what the jar utility does.
EDIT: Updated to add the info.flag_bits = 0x800 line just to spell out what I'm trying to achieve. I've reproduced this on Windows:
ActivePython 3.6.0.3600, vs ActivePython 2.7.14.2717, Windows 10.
And on Linux:
Python 3.6.6 vs Python 2.7.11
In case it matters, I am running this exactly as my example, no hashbang, invoking the interpreter directly:
pythonX test.py
python python-2.7 zipfile python-3.7
Perhaps I am mistaken but I seem to get the output 50 4b 03 04 14 00 00 00 for both Python 2 and Python 3 on my Debian machine under Python 3.5.3 and Python 2.7.13
– Algorithmic Canary
Nov 12 at 1:06
Likewise, it's the same output on Windows with Python 2 and 3 for me (as what you show for Python 3 in your question). Sounds like something OS-dependent. What are you running?
– martineau
Nov 12 at 2:50
@martineau that's still not what I want, I want the bit set for both, I've changed my question as it wasn't so clear before. Thanks for testing this, it's useful feedback, perhaps you can post your versions.
– Keeely
Nov 12 at 9:36
Keeely: Got it. Your code creates a file. You want to make sure a bit is set at a certain offset in that file. Seems like if nothing else you could modify the file manually after it's created using binary file I/O.
– martineau
Nov 12 at 9:42
@martineau, indeed that is my last resort, but it's a pretty horrible solution.
– Keeely
Nov 12 at 9:47
Question asked Nov 12 at 0:29 by Keeely (edited Nov 12 at 9:52)
2 Answers
Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):
    $ cat zipf.py
    from __future__ import print_function
    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
        content = "content"
        info = ZipInfo()
        info.filename = "file.txt"
        info.flag_bits = 0x800
        # don't set info.file_size here: zf.writestr() does that
        zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
        byteseq = stream.read(8)
        for i in byteseq:
            if isinstance(i, str): i = ord(i)
            print('{:02x}'.format(i), end=' ')
        print()
Run as:
    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08
but:
    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00
It's certainly possible to make it work, by making sure the entry is opened for writing before the flag is set on info. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):
    from __future__ import print_function
    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
        info = ZipInfo()
        info.filename = "file.txt"
        content = "content"
        if not isinstance(content, bytes):
            content = content.encode('utf8')
        info.file_size = len(content)
        with zf.open(info, 'w') as stream:
            info.flag_bits = 0x800
            stream.write(content)

    with open('out.zip', 'rb') as stream:
        byteseq = stream.read(8)
        for i in byteseq:
            if isinstance(i, str): i = ord(i)
            print('{:02x}'.format(i), end=' ')
        print()
It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.
Original answer below
I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:
    def _encodeFilenameFlags(self):
        if isinstance(self.filename, unicode):
            try:
                return self.filename.encode('ascii'), self.flag_bits
            except UnicodeEncodeError:
                return self.filename.encode('utf-8'), self.flag_bits | 0x800
        else:
            return self.filename, self.flag_bits
(Python 2.7 zipfile.py source) or:
    def _encodeFilenameFlags(self):
        try:
            return self.filename.encode('ascii'), self.flag_bits
        except UnicodeEncodeError:
            return self.filename.encode('utf-8'), self.flag_bits | 0x800
(Python 3.6 zipfile.py source).
To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:
    info.filename = u"sch\N{latin small letter o with diaeresis}n"  # "file.txt"
(this notation works with both Python 2.7 and 3.6).
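As a quick way to see that behaviour, here is a minimal sketch that writes one entry to an in-memory archive and reads the flag field back from the local header. local_header_flags is a hypothetical helper name, and the example assumes nothing else in the library touches bit 11 for these names.

```python
import io
import struct
from zipfile import ZipFile, ZipInfo

def local_header_flags(filename):
    """Write one entry to an in-memory archive and return its header flags."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zf:
        zf.writestr(ZipInfo(filename), b"content")
    # Local header layout: signature (4 bytes), version needed (2), flags (2).
    _sig, _version, flags = struct.unpack("<4sHH", buf.getvalue()[:8])
    return flags

ascii_flags = local_header_flags(u"file.txt")
utf8_flags = local_header_flags(u"sch\N{latin small letter o with diaeresis}n.txt")

# Bit 11 is expected only for the name that cannot be encoded as ASCII.
print(bool(ascii_flags & 0x800), bool(utf8_flags & 0x800))
```

Using io.BytesIO keeps the check free of temporary files; the same unpacking works on the first 8 bytes of out.zip.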
> I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().
If I add:
    info.filename = "file.txt"
    info.flag_bits |= 0x0800
(just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).
Can you post your full code if you got the bit set for filename==file.txt with Python3?
– Keeely
Nov 12 at 9:34
@Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
– torek
Nov 12 at 9:55
Thanks, but can I have your precise major + minor versions for all Pythons used? I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions) so I cannot accept it.
– Keeely
Nov 12 at 10:33
One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
– torek
Nov 12 at 11:26
I am using something like this for the time being:
    from zipfile import ZipFile, ZipInfo
    import struct

    # Keep a reference to the original method so the wrapper can call it.
    orig_function = ZipInfo.FileHeader

    def new_function(self, zip64=None):
        header = orig_function(self, zip64)
        fmt = "B"*len(header)
        blist = list(struct.unpack(fmt, header))
        # Byte 7 is the high byte of the flag field, so 0x8 here is 0x800.
        blist[7] |= 0x8
        return struct.pack(fmt, *blist)

    setattr(ZipInfo, "FileHeader", new_function)

    with ZipFile("out.zip", 'w') as zf:
        content = "content"
        info = ZipInfo()
        info.filename = "file.txt"
        info.file_size = len(content)
        zf.writestr(info, content)
Hopefully it won't break too soon; FileHeader() seems like something that won't be changing in the future.
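A variation on the same idea, offered as a sketch rather than a recommendation: instead of patching the packed header bytes, a ZipInfo subclass can override _encodeFilenameFlags(), the private method quoted in the other answer, so the bit is OR-ed in wherever zipfile asks for the flags. Utf8FlagZipInfo, build_archive and UTF8_FLAG are hypothetical names. Because _encodeFilenameFlags() is private, this carries the same risk of breaking in a future release, and it assumes zipfile calls that method for both the local header and the central directory.

```python
import io
import struct
from zipfile import ZipFile, ZipInfo

UTF8_FLAG = 0x800  # general purpose bit 11

class Utf8FlagZipInfo(ZipInfo):
    """ZipInfo that always reports the UTF-8 flag (hypothetical helper)."""

    def _encodeFilenameFlags(self):
        # Private zipfile method: returns (encoded_name, flag_bits).
        name, flags = ZipInfo._encodeFilenameFlags(self)
        return name, flags | UTF8_FLAG

def build_archive(filename, content):
    """Return the bytes of a one-entry archive with bit 11 forced on."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zf:
        zf.writestr(Utf8FlagZipInfo(filename), content)
    return buf.getvalue()

raw = build_archive("file.txt", b"content")

# Flags as written in the local file header at the start of the archive.
local_flags = struct.unpack("<4sHH", raw[:8])[2]

# Flags as recorded in the central directory, read back through zipfile.
with ZipFile(io.BytesIO(raw)) as zf:
    central_flags = zf.infolist()[0].flag_bits
    data = zf.read("file.txt")

print(hex(local_flags), hex(central_flags), data == b"content")
```

An ASCII name is also valid UTF-8, so readers that honour the flag should decode file.txt the same way with or without it.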
First answer above: answered Nov 12 at 1:09 by torek (edited Nov 12 at 11:59)
Can you post your full code if you got the bit set for filename==file.txt with Python3?
– Keeely
Nov 12 at 9:34
@Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as thezipfile
library code is the same...
– torek
Nov 12 at 9:55
thanks, but can I have your precise major+ minor versions for all Pythons used. I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions) so cannot accept.
– Keeely
Nov 12 at 10:33
One issys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0)
, the other issys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0)
. Let me try re-creating the test, too.
– torek
Nov 12 at 11:26
I am using something like this for the time being:
from zipfile import ZipFile, ZipInfo
import struct

# Keep a reference to the original method so the wrapper can call it.
orig_function = ZipInfo.FileHeader

def new_function(self, zip64=None):
    header = orig_function(self, zip64)
    fmt = "B" * len(header)
    blist = list(struct.unpack(fmt, header))
    # Byte 7 is the high byte of the little-endian flags field,
    # so 0x8 here corresponds to 0x800 (bit 11, language encoding).
    blist[7] |= 0x8
    return struct.pack(fmt, *blist)

# Monkey-patch ZipInfo so every local file header it builds has the bit set.
setattr(ZipInfo, "FileHeader", new_function)

with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.file_size = len(content)
    zf.writestr(info, content)

Hopefully it won't break too soon; FileHeader() seems unlikely to change in future releases.
answered Nov 12 at 13:48
Keeely
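The patch in the answer above rewrites only what FileHeader() returns, which is the local file header. The central directory record is written separately, so it may be worth checking both copies of the flags. The following is a minimal sketch of such a check, not part of the original answer: the names `first_entry_flags` and `make_zip` are hypothetical helpers, and the code assumes a small, non-zip64 archive whose first entry starts at offset 0 and whose archive comment does not itself contain the end-of-central-directory signature.

```python
from __future__ import print_function

import io
import struct
from zipfile import ZipFile

UTF8_FLAG = 0x800  # general purpose bit 11: "language encoding"


def first_entry_flags(data):
    """Return (local_flags, central_flags) for the first entry of a zip.

    `data` is the whole archive as bytes. Assumes the first local file
    header sits at offset 0 and the archive is neither zip64 nor multi-disk.
    """
    # Local file header: signature, version needed, general purpose flags.
    sig, _version, local_flags = struct.unpack_from("<4sHH", data, 0)
    if sig != b"PK\x03\x04":
        raise ValueError("no local file header at offset 0")

    # End-of-central-directory record: the central directory offset is the
    # 4-byte field located 16 bytes after the record's signature.
    eocd = data.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    (cd_offset,) = struct.unpack_from("<I", data, eocd + 16)

    # Central directory header: signature, version made by, version needed,
    # general purpose flags.
    sig, _made_by, _needed, central_flags = struct.unpack_from(
        "<4sHHH", data, cd_offset)
    if sig != b"PK\x01\x02":
        raise ValueError("no central directory header at recorded offset")
    return local_flags, central_flags


def make_zip(filename):
    """Build a one-entry archive in memory (hypothetical demo helper)."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zf:
        zf.writestr(filename, b"content")
    return buf.getvalue()


if __name__ == "__main__":
    # An ASCII name and a non-ASCII name, to compare how bit 11 is set.
    for name in (u"file.txt", u"f\u00efle.txt"):
        local_flags, central_flags = first_entry_flags(make_zip(name))
        print(repr(name), hex(local_flags), hex(central_flags))
```

To check a file on disk such as out.zip, read it in binary mode and pass the bytes to `first_entry_flags`.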
Perhaps I am mistaken, but I seem to get the output 50 4b 03 04 14 00 00 00 for both Python 2 and Python 3 on my Debian machine, under Python 3.5.3 and Python 2.7.13.
– Algorithmic Canary
Nov 12 at 1:06
Likewise, I get the same output on Windows with both Python 2 and 3 (matching what you show for Python 3 in your question). Sounds like something OS-dependent. What are you running?
– martineau
Nov 12 at 2:50
@martineau that's still not what I want: I want the bit set for both. I've changed my question, as it wasn't so clear before. Thanks for testing this, it's useful feedback; perhaps you can post your versions.
– Keeely
Nov 12 at 9:36
Keeely: Got it. Your code creates a file. You want to make sure a bit is set at a certain offset in that file. Seems like, if nothing else, you could modify the file manually after it's created using binary file I/O.
– martineau
Nov 12 at 9:42
@martineau, indeed that is my last resort, but it's a pretty horrible solution.
– Keeely
Nov 12 at 9:47
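For reference, the "last resort" discussed in the comments above (patching the archive after it has been written) could look roughly like the sketch below. It is an illustration only, not code from the thread: `set_utf8_flag` is a hypothetical name, the archive is assumed to be small enough to hold in memory, not zip64, and without an archive comment containing the end-of-central-directory signature. It sets bit 11 in every central directory header and in the local header each one points to. Marking names as UTF-8 is harmless for pure ASCII names, but would mislabel names stored in another encoding.

```python
from __future__ import print_function

import io
import struct
from zipfile import ZipFile

UTF8_FLAG = 0x800  # general purpose bit 11: "language encoding"
# Fixed 46-byte part of a central directory file header.
CENTRAL_HEADER = struct.Struct("<4s6H3I5H2I")


def _or_flag(buf, offset):
    """Set bit 11 in the 2-byte little-endian flags field at `offset`."""
    (flags,) = struct.unpack_from("<H", buf, offset)
    struct.pack_into("<H", buf, offset, flags | UTF8_FLAG)


def set_utf8_flag(data):
    """Return a copy of the archive bytes with bit 11 set on every entry."""
    buf = bytearray(data)
    eocd = buf.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # Total entry count, central directory size, central directory offset.
    total_entries, _cd_size, pos = struct.unpack_from("<HII", buf, eocd + 10)

    for _ in range(total_entries):
        fields = CENTRAL_HEADER.unpack_from(buf, pos)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("bad central directory header")
        name_len, extra_len, comment_len = fields[10:13]
        local_offset = fields[16]

        _or_flag(buf, pos + 8)  # flags field inside the central header
        if buf[local_offset:local_offset + 4] != b"PK\x03\x04":
            raise ValueError("bad local file header")
        _or_flag(buf, local_offset + 6)  # flags field inside the local header

        # Skip the variable-length name, extra field and comment.
        pos += CENTRAL_HEADER.size + name_len + extra_len + comment_len
    return bytes(buf)


if __name__ == "__main__":
    original = io.BytesIO()
    with ZipFile(original, "w") as zf:
        zf.writestr("file.txt", b"content")
    patched = set_utf8_flag(original.getvalue())
    info = ZipFile(io.BytesIO(patched)).infolist()[0]
    print(info.filename, hex(info.flag_bits))
```

Working on a bytes copy rather than editing the file in place keeps the sketch easy to test; to apply it to out.zip, read the file in binary mode, call `set_utf8_flag`, and write the result back.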