Post by glen herrmannsfeldt
Post by AndyHancock
There have been times in the past when a zip file I created was
corrupt (or became corrupt somehow). I encountered this either using
WinZip or command-line zip, possibly in old Unix environments. To me,
there is a risk in relying solely on zip archives and deleting the
original unzipped files. The risk isn't only in losing one file that
is corrupt -- any corruption anywhere in the entire archive could
render all the files therein inaccessible. Hence, the risk increases
with the size of the archive.
For tgz, the usual Unix gzipped tar file, any corruption makes it
pretty hard to recover anything past the damaged point.
As I understand zip, though, each file is compressed separately.
The index is at the end, and could be lost or corrupt, but it is usually
possible to find the beginning of a file, and uncompress it, even
without the index.
it is also (often) possible to recover files from chunks of multi-part
ZIP archives, if the person knows how to do so...
there is a problem though that some tools are stupid and will make no
attempt to access an archive if they can't find the "end of central
directory" marker, but this is more of a problem of stupid tools than
the ZIP format itself.
this is partly because the ZIP format compresses each file separately,
and actually stores information about each file in several locations:
directly preceding the compressed file data;
in the central directory (stored at the end of the archive).
more so, each entry also has a nifty magic-code which can be used for
resynchronization.
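for example, here is a rough sketch (in Python) of what carving entries out
of a damaged archive can look like; the function names are made up, but the
header layout follows the published ZIP spec. it scans for the local-header
magic to resynchronize, then parses each header and inflates the data with
zlib, never touching the central directory. (caveats: entries that use the
"data descriptor" flag store zero sizes in the local header, and the magic
can occasionally show up by chance inside compressed data, so a real
recovery tool needs more smarts than this.)

import struct, zlib

LOCAL_SIG = b"PK\x03\x04"   # magic at the start of every local file header

def carve_entries(data):
    """scan a damaged archive for local headers, try to inflate each one."""
    pos = data.find(LOCAL_SIG)
    while pos != -1:
        try:
            yield extract_entry(data, pos)
        except Exception:
            pass   # header or data too mangled, skip it
        pos = data.find(LOCAL_SIG, pos + 4)

def extract_entry(data, off):
    # fixed 30-byte local file header, per the published ZIP layout
    (sig, ver, flags, method, mtime, mdate,
     crc, csize, usize, namelen, extralen) = struct.unpack_from(
        "<4s5H3I2H", data, off)
    name = data[off + 30:off + 30 + namelen].decode("cp437", "replace")
    start = off + 30 + namelen + extralen
    raw = data[start:start + csize]
    if method == 8:                                    # 8 = deflate, 0 = stored
        raw = zlib.decompressobj(-15).decompress(raw)  # -15 = raw deflate
    return name, raw

# usage (hypothetical file name):
# data = open("damaged.zip", "rb").read()
# for name, blob in carve_entries(data):
#     print(name, len(blob))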
note though that one thing which can irreparably foul up ZIP
archives is LF <-> CR-LF autoconversion, which sometimes happens silently.
this was often as a result of buggy FTP software (which mistakenly
identified a binary file as text), or occasionally brain-damaged
filesystem code (such as cases of UMSDOS auto-conversion being enabled).
(possibly as a result) some formats (such as PNG) include special logic
to at least detect if the file has gotten screwed up by such a conversion.
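for example, the 8-byte PNG signature deliberately mixes a CR-LF pair, a
lone LF, and a leading byte with the high bit set, so line-ending or 7-bit
mangling breaks it in a recognizable way. a rough sketch of such a check
(the function name here is made up):

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def check_png_signature(path):
    head = open(path, "rb").read(8)
    if head == PNG_SIG:
        return "signature intact"
    if head[:4] == b"\x89PNG":
        # the CR/LF tail of the signature got rewritten: classic symptom of
        # a text-mode (ASCII) FTP transfer or CR-LF autoconversion
        return "line-ending damage"
    return "not a PNG (or the first bytes were lost)"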
Post by glen herrmannsfeldt
Post by AndyHancock
How reliable is the zip that is native to Windows 7? In addition to
that general question, what about specifically for files in the
Gigabyte range (fraction of a GB or several GBs)? If it is very
reliable, then I will use Windows 7's "compressed (zipped) folder"
to create archives for writing to DVD.
You mean how often does it create corrupt zip files?
I would expect that, more often, the corruption occurs later.
You could store them on a RAID (redundant) disk to reduce the chance of
corruption.
yep.
although, ironically, while RAID protects fairly well against physical
failure of disks, it (sadly) generally lacks any protection against
OS-induced corruption (such as cases where the OS kernel gets corrupted
somehow and manages to go berserk for a bit before finally crashing /
blue-screening...).
following these events, files would often be "sliced and diced",
with the contents of one file mixed in with another, ...
this partly gave me a mistrust of NTFS on WinXP computers, as IME
NTFS drives seemed to get fouled up by crashes a lot more often
than FAT32 drives.
luckily, this issue seems to have largely gone away AFAICT in Vista and
Win7.
for external archiving, it is a tradeoff...
I have often found the long-term reliability of CD-Rs and DVD-Rs to
be a bit lacking...
granted, even as such, they still seem to hold up better IME than old
HDDs. if an HDD is left sitting unused for a number of years, often
either the spindle is stuck (so it can't spin up) or the contents are
otherwise corrupt/unreadable.
OTOH, IME, HDDs seem to last a lot longer when used occasionally, as
this seems to keep the spindles from seizing, and causes data to
regenerate (I suspect because HDDs will read and rewrite sectors to help
keep their contents from degrading and similar).
usually, at least some contents can be recovered from an old CD-R, but
not a whole lot can be recovered from a seized HDD...
or such...