ZIP Unicode Processing

Fix and convert Unicode filenames in ZIP files. Supports encoding repair, file list extraction, and encoding conversion.

Version 1.0.0

About ZIP Unicode Processing

Someone in Shanghai zips a folder of design assets and emails it over. You extract on macOS and the filenames arrive as 寤鸿_˙??_璁捐.pdf, pure gibberish. A Japanese contractor sends a ZIP and Windows shows ƒvƒƒWƒFƒNƒg.txt instead of プロジェクト.txt. A German colleague's filenames have ü turned into garbage. The file contents are intact; the information about which encoding the filenames were stored in never made it into the ZIP, and your extractor guessed wrong.

This tool reads the ZIP central directory, identifies the original filename encoding (UTF-8, GBK, GB2312, Shift-JIS / CP932, Latin1 or CP437) and re-encodes every entry name to UTF-8 with the UTF-8 flag (general-purpose bit 11) set. You can repair the archive (the output is a new ZIP with corrected names; the uploaded file itself is not modified), extract just the file list to verify the fix before downloading, or override the source encoding manually when auto-detection picks the wrong one. No installing The Unarchiver, Bandizip or 7-Zip; no fiddling with unzip -O cp932 on the command line.
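The repair rests on one round trip: re-encode the garbled string with the code page the extractor wrongly assumed, which recovers the bytes stored in the archive, then decode those bytes with the encoding the archive was really written in. A minimal Python sketch of that idea, assuming the extractor fell back to CP437 and the real encoding is already known; `recover_name` is an illustrative helper, not part of this tool:

```python
def recover_name(garbled: str, source_encoding: str,
                 assumed_encoding: str = "cp437") -> str:
    """Undo a wrong-codec decode of a ZIP entry name.

    garbled          -- the name as the extractor displayed it
    source_encoding  -- the encoding the archive was really written in
    assumed_encoding -- the codec the extractor used by mistake (an assumption)
    """
    raw = garbled.encode(assumed_encoding)  # back to the bytes stored in the ZIP
    return raw.decode(source_encoding)      # decode them the intended way


# Simulate the failure: GBK bytes read through a CP437 lens, then repaired.
original = "设计.pdf"
garbled = original.encode("gbk").decode("cp437")
assert recover_name(garbled, "gbk") == original
```

The round trip is lossless here because Python's CP437 codec maps every byte value to a character. If the extractor used a codec with unmapped bytes, or replaced undecodable bytes with "?", the stored bytes cannot be recovered from the displayed name alone and have to be read from the archive itself.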

Typical scenarios: receiving a ZIP from a colleague on a different OS locale, downloading email attachments where the UTF-8 flag was never set, working with legacy archives created before version 6.3 of the ZIP specification standardized UTF-8 filename metadata, or sharing project files across multilingual teams. Archives up to 50 MB are processed in a stateless serverless function and discarded after the response.

ZIP Unicode Processing Use Cases

  • Receiving ZIP archives from Chinese-, Japanese- or Korean-language Windows machines
  • Sharing project files across cross-OS teams (Windows / macOS / Linux) without filename garbling
  • Extracting archives from email attachments where the UTF-8 flag was not set correctly
  • Cleaning up legacy ZIP archives created before version 6.3 of the ZIP specification standardized UTF-8 filename metadata
  • Verifying filenames before extracting a large download to avoid filesystem write errors
  • Converting GBK or Shift-JIS archives to UTF-8 for upload to cloud storage services
  • Recovering re-zipped archives where mojibake was baked into the inner filenames

ZIP Unicode Processing Features

  • Auto-detects original encoding — UTF-8, GBK, GB2312, Shift-JIS / CP932, Latin1, CP437
  • Three actions: fix archive (rewrite with UTF-8 names), extract file list, or convert from a manually chosen source encoding
  • Outputs a clean ZIP with the proper UTF-8 flag set — opens identically on Windows, macOS and Linux
  • File-list preview shows original (garbled) and repaired names side-by-side before download
  • Handles ZIP archives up to 50 MB containing thousands of entries
  • No software install required — replaces The Unarchiver, Bandizip and unzip -O cp932 for ad-hoc fixes
  • Stateless processing — uploaded archive is discarded after the response, nothing retained on disk

How to Use ZIP Unicode Processing

Upload your garbled ZIP

Drag and drop a .zip archive (up to 50 MB) into the upload area. Filenames in any language work: the tool inspects the central directory to identify the original encoding.

Pick an action

Fix Encoding rewrites the archive with UTF-8 names (most common). Extract File List shows what's inside without modifying anything. Convert Encoding lets you force a specific source encoding if auto-detection picks the wrong one.

Click Start Processing

The tool scans every entry, decodes the original bytes, re-encodes to UTF-8 and writes a new archive with the proper Unicode flag set. Most ZIPs process in under a second.
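The same scan, decode, re-encode and rewrite loop can be approximated with Python's standard `zipfile` module. This is a sketch under stated assumptions, not the tool's implementation: `repair_zip` is a hypothetical name, the source encoding is passed in rather than detected, and entries are decompressed and recompressed instead of being copied through, so extra fields are dropped. It relies on two `zipfile` behaviours: names without the UTF-8 flag are decoded as CP437 on read, and the flag is set automatically on write whenever a name is not pure ASCII.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: entry name is UTF-8


def repair_zip(data: bytes, source_encoding: str) -> bytes:
    """Return a new archive whose entry names are stored as UTF-8."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.flag_bits & UTF8_FLAG:
                name = info.filename  # already UTF-8, keep as is
            else:
                # zipfile decoded the stored bytes as CP437; reverse that,
                # then decode with the encoding the archive really used.
                raw = info.orig_filename.encode("cp437")
                name = raw.decode(source_encoding)
            fixed = zipfile.ZipInfo(name, date_time=info.date_time)
            fixed.compress_type = info.compress_type
            fixed.external_attr = info.external_attr  # keep permissions
            fixed.comment = info.comment
            dst.writestr(fixed, src.read(info))
    return out.getvalue()
```

A call such as `repair_zip(open("in.zip", "rb").read(), "gbk")` returns the bytes of the corrected archive; a `UnicodeDecodeError` indicates that the chosen encoding does not fit the stored names.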

Verify the file list

The result panel shows original and repaired names so you can confirm the fix worked. Entries that still look wrong after the repair usually mean the source encoding was unusual; switch to Convert mode and try another encoding.

Download the cleaned ZIP

The output archive opens cleanly on any modern OS — no configuring system locale, installing The Unarchiver, or running unzip -O cp932 from the command line.

ZIP Unicode Processing FAQ

Filenames come out garbled because older ZIP archives stored them in whatever encoding the local OS used at compression time: CP932 on Japanese Windows, GBK on Chinese Windows, Latin1 on Western European systems. The ZIP format had no field to say 'these names are GBK', so extractors on a different OS guess (often CP437) and produce mojibake. Version 6.3.0 of the ZIP specification added a UTF-8 flag (general-purpose bit 11) in 2006, but tools written before then, or tools that still write names in the local code page, continue to produce non-UTF-8 archives.
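Whether an archive carries the UTF-8 flag can be checked per entry. A short sketch using Python's standard `zipfile` module; `utf8_flag_report` is an illustrative name, and the `0x800` mask is bit 11 of the general-purpose flags:

```python
import io
import zipfile
from typing import Dict

UTF8_FLAG = 0x800  # general-purpose bit 11


def utf8_flag_report(data: bytes) -> Dict[str, bool]:
    """Map each entry name (as zipfile decodes it) to whether bit 11 is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}
```

Entries reported as `False` with non-ASCII names are the ones an extractor has to guess about. Pure ASCII names are also commonly written without the flag and are harmless.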

Supported encodings are UTF-8 (the modern standard), GBK and GB2312 (Simplified Chinese), Shift-JIS / CP932 (Japanese), CP949 (Korean, via auto-detect), Latin1 / ISO-8859-1 (Western European), and CP437 (the historical IBM PC default). Auto-detection picks the most likely encoding from byte patterns; you can override manually in Convert mode if needed.
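One simple way to approximate detection from byte patterns is to try strict decoders in a fixed order and keep the first that accepts every name. The sketch below is a deliberately naive illustration, not the tool's detector; `guess_encoding` and the candidate order are assumptions made for the example.

```python
from typing import Optional, Sequence

# Order is an assumption: strict multi-byte codecs first, CP437 last
# because it accepts any byte sequence and therefore always "succeeds".
CANDIDATES = ("utf-8", "gbk", "cp932", "cp949", "cp437")


def guess_encoding(raw_names: Sequence[bytes],
                   candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate that decodes all names without error."""
    for enc in candidates:
        try:
            for raw in raw_names:
                raw.decode(enc)  # strict mode raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return enc
    return None
```

First-success decoding is ambiguous by nature: many Shift-JIS names also decode without error as GBK, so a practical detector has to score the decoded text (for example by expected character ranges) rather than stop at the first codec that does not fail. That ambiguity is why a manual override remains useful.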

The uploaded file is processed in a stateless serverless function and discarded immediately after the response is returned. Nothing is logged to disk and no copy is retained. If the archive contains sensitive payloads (source code, contracts, PII), the only copy that leaves the function is the repaired ZIP downloaded to your own browser.

The upload limit is 50 MB per archive. For larger archives, prefer a desktop tool like The Unarchiver (macOS), Bandizip (Windows) or the command-line unzip -O <encoding> (Linux / macOS). Those have no practical size cap and run faster than uploading over the network.

No, the repair does not change file contents. The tool only rewrites filename metadata in the ZIP central directory; the compressed file contents are passed through unchanged byte-for-byte. Any binary, document, image or video inside the archive opens identically to the original.
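If you want to confirm this for a particular archive, the CRC-32 and uncompressed size recorded for each entry can be compared between the original and the repaired ZIP. A minimal sketch, assuming both archives list their entries in the same order; `same_payloads` is an illustrative helper:

```python
import io
import zipfile


def same_payloads(original: bytes, repaired: bytes) -> bool:
    """True if both archives hold the same (CRC-32, size) pairs in order."""
    with zipfile.ZipFile(io.BytesIO(original)) as a, \
            zipfile.ZipFile(io.BytesIO(repaired)) as b:
        sig_a = [(i.CRC, i.file_size) for i in a.infolist()]
        sig_b = [(i.CRC, i.file_size) for i in b.infolist()]
    return sig_a == sig_b
```

This compares the checksums stored in the archive metadata and ignores names entirely. Calling `testzip()` on the repaired archive additionally verifies that the stored checksums match the actual data.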

If auto-detection picks the wrong encoding, switch to Convert mode and select the source encoding manually. If you know the archive came from Japanese Windows, force Shift-JIS / CP932; from Chinese Windows, force GBK; from older Korean Windows, force CP949. Auto-detection is heuristic, so manual override is more reliable when you know the origin.
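When the origin is unknown, decoding the same stored bytes under several candidate encodings and reading the results side by side is often the quickest way to pick the right one. A small sketch of that comparison; `decode_candidates` is a hypothetical helper and the encoding names are Python codec names:

```python
from typing import Dict, Iterable, Optional


def decode_candidates(raw: bytes,
                      encodings: Iterable[str]) -> Dict[str, Optional[str]]:
    """Decode one stored name under each encoding; None marks a failure."""
    results: Dict[str, Optional[str]] = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None  # these bytes are invalid in this encoding
    return results


# Bytes as a Japanese Windows tool would have stored them.
raw_name = "プロジェクト.txt".encode("cp932")
options = decode_candidates(raw_name, ["cp932", "gbk", "cp949", "utf-8"])
```

Several encodings may decode without error yet produce implausible text, so the final choice is the reading that makes sense to a human. With Python's `zipfile`, the stored bytes of an unflagged entry can be obtained as `info.orig_filename.encode("cp437")`.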

Compared with desktop tools: The Unarchiver is excellent on macOS but requires a download and install, and only fixes names during extraction, so it doesn't produce a repaired ZIP you can re-share. unzip -O works on the command line but likewise only extracts. This tool produces a corrected ZIP you can hand back to your collaborator, runs anywhere with a browser, and supports more encodings than CP932-only tools.

ZIP Unicode Processing Tutorial

Fix Encoding

  1. Select the ZIP file to fix
  2. Choose the "Fix Encoding" processing mode
  3. Click the "Start Processing" button
  4. Wait for processing to complete, then download the fixed ZIP file

Extract File List

  1. Select a ZIP file
  2. Choose the "Extract File List" processing mode
  3. Click the "Start Processing" button
  4. View the file list, including original and fixed filenames

Convert Encoding

  1. Select the ZIP file to convert
  2. Choose the "Convert to UTF-8 Encoding" processing mode
  3. Click the "Start Processing" button
  4. Download the converted ZIP file

Features

  • Supports Unicode filename encoding repair
  • Supports ZIP file list extraction
  • Supports filename encoding conversion (UTF-8)
  • Automatically detects and fixes common encoding issues (GBK, GB2312, etc.)
  • Processing results include file list and error information
  • Supports ZIP files up to 50 MB