About ZIP Unicode Processing
Someone in Shanghai zips a folder of design assets and emails it over.
You extract on macOS and the filenames arrive as
寤鸿_˙??_璁捐.pdf — pure gibberish. A Japanese
contractor sends a ZIP and Windows shows
ƒvƒƒWƒFƒNƒg.txt instead of プロジェクト.txt ("project"). A
German colleague's filenames have ü turned into garbage. The
file contents are intact; the metadata about which encoding
the filenames were stored in never made it into the ZIP, and
your extractor guessed wrong.
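The Japanese case above can be reproduced in a few lines. This Python sketch assumes the original name was プロジェクト.txt ("project" in katakana), stored as Shift-JIS bytes and then decoded by the extractor as Windows-1252; the variable names are illustrative only.

```python
# Assumed original filename (katakana for "project").
original = "プロジェクト.txt"

# The archiver stores the name as raw Shift-JIS bytes, with no UTF-8 flag.
raw = original.encode("shift_jis")

# The extractor guesses Windows-1252. Byte 0x8D has no mapping there,
# so it is dropped here to mimic what ends up on screen.
garbled = raw.decode("cp1252", errors="ignore")
print(garbled)  # ƒvƒƒWƒFƒNƒg.txt

# The bytes themselves were never damaged: the right codec recovers the name.
restored = raw.decode("shift_jis")
print(restored == original)  # True
```

Nothing in the archive changed between the two decodes; only the assumed encoding did.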
This tool reads the ZIP central directory, identifies the original
filename encoding (UTF-8, GBK, GB2312, Shift-JIS / CP932, Latin-1 or
CP437) and re-encodes every entry name to UTF-8 with the UTF-8 flag
(general-purpose bit 11) set. You can repair the archive (output a new
ZIP with corrected names), extract just the file list to verify the
fix before downloading, or override the source encoding manually when
auto-detection picks the wrong one. No installing The Unarchiver,
Bandizip or 7-Zip; no fiddling with unzip -O cp932 on
the command line.
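As a rough illustration of the steps described above (read the central directory, recover the raw name bytes, pick an encoding, rewrite the entries as UTF-8), here is a minimal Python sketch using only the standard zipfile module. It is not this tool's implementation: decode_name, repair and DEFAULT_CANDIDATES are hypothetical names, and the "detection" is simply the first codec that decodes without error, tried in a caller-supplied order. It also relies on CPython decoding unflagged entry names as CP437 by default. Real detection needs statistical scoring, because GBK and Shift-JIS bytes often decode cleanly under each other's codec.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: name and comment are UTF-8

# Hypothetical default order; CP437 accepts every byte, so it acts as a fallback.
DEFAULT_CANDIDATES = ("utf-8", "gbk", "cp932", "cp437")


def decode_name(raw, candidates=DEFAULT_CANDIDATES):
    """Return (name, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the name")


def repair(data, candidates=DEFAULT_CANDIDATES):
    """Return a new ZIP (as bytes) whose entry names are stored as UTF-8."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.flag_bits & UTF8_FLAG:
                name = info.filename  # already flagged as UTF-8
            else:
                # zipfile decoded the raw bytes as CP437; undo that to get
                # the bytes back, then decode them with a better guess.
                raw = info.filename.encode("cp437")
                name, _ = decode_name(raw, candidates)
            fixed = zipfile.ZipInfo(name, date_time=info.date_time)
            fixed.compress_type = info.compress_type
            fixed.external_attr = info.external_attr
            # zipfile sets the UTF-8 flag itself for non-ASCII names.
            dst.writestr(fixed, src.read(info))
    return out.getvalue()


def list_names(data):
    """List entry names, e.g. to check a repair before saving the result."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [info.filename for info in zf.infolist()]
```

Passing a single candidate, for example repair(data, candidates=("cp932",)), plays the role of the manual override: it forces Shift-JIS / CP932 when the default order would have accepted the bytes as GBK.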
Typical scenarios: receiving a ZIP from a colleague on a different OS locale, downloading email attachments where the UTF-8 flag was never set, working with legacy archives created before version 6.3.0 of the ZIP specification (APPNOTE) standardized UTF-8 filenames, or sharing project files across multilingual teams. Archives up to 50 MB are processed in a stateless serverless function and discarded after the response.