Update dependency chardet to v7 #71

Merged
timatlee merged 1 commit from renovate/chardet-7.x into main 2026-03-05 18:42:54 -07:00

This PR contains the following updates:

Package               Update   Change
chardet (changelog)   major    ==6.0.0.post1 → ==7.0.1

Release Notes

chardet/chardet (chardet)

v7.0.1

Compare Source

Fixes

  • Fixed false UTF-7 detection of SHA-1 git hashes (#324, fixing #323): requirements files with VCS pins (e.g., +4bafdea3...) were misdetected as UTF-7, breaking tools like tox
  • Fixed _SINGLE_LANG_MAP missing aliases for single-language encoding lookup (e.g., big5 → big5hkscs)
  • Fixed PyPy TypeError in UTF-7 codec handling
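
To illustrate why a VCS pin can resemble UTF-7, here is a minimal standard-library sketch. It is not chardet's detection logic; the helper `looks_like_utf7_shift` is a hypothetical name introduced only for this example. In UTF-7 (RFC 2152), a `+` opens a run of modified-base64 characters, and every hexadecimal digit is also a valid base64 character, so a `+`-prefixed hash fragment passes a purely syntactic check.

```python
import string

# Characters allowed inside a UTF-7 shifted (modified base64) run, per RFC 2152.
UTF7_BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def looks_like_utf7_shift(token: str) -> bool:
    """Hypothetical helper: True if token is '+' followed by base64 characters.

    This is only a syntactic check for illustration, not chardet's algorithm.
    """
    return (
        len(token) > 1
        and token[0] == "+"
        and all(ch in UTF7_BASE64_ALPHABET for ch in token[1:])
    )


# A genuine UTF-7 sequence: U+00A3 (pound sign) is written as "+AKM-".
genuine = b"+AKM-".decode("utf-7")

# Every hex digit belongs to the base64 alphabet, so hash fragments qualify too.
hex_is_subset = set(string.hexdigits) <= UTF7_BASE64_ALPHABET

if __name__ == "__main__":
    print(repr(genuine))
    print(hex_is_subset)
    # "+4bafdea3" is the truncated fragment quoted in the release notes.
    print(looks_like_utf7_shift("+4bafdea3"))
```

A detector that relies on syntax alone cannot separate the two cases, which is presumably why the fix needed extra handling for hash-like runs.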

Improvements

  • Retrained bigram models — 24 previously failing test cases now pass
  • Updated language equivalences for mutual intelligibility (Slovak/Czech, East Slavic + Bulgarian, Malay/Indonesian, Scandinavian languages)

New Contributors

  • @rembish made their first contribution, both reporting the UTF-7 false detection issue and submitting the fix! (#323, #324)

v7.0.0

Compare Source

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate!

Highlights:

  • MIT license (previous versions were LGPL)
  • 96.8% accuracy on 2,179 test files (+2.3pp vs chardet 6.0.0, +7.7pp vs charset-normalizer)
  • 41x faster than chardet 6.0.0 with mypyc (28x pure Python), 7.5x faster than charset-normalizer
  • Language detection for every result (90.5% accuracy across 49 languages)
  • 99 encodings across six eras (MODERN_WEB, LEGACY_ISO, LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME)
  • 12-stage detection pipeline — BOM, UTF-16/32 patterns, escape sequences, binary detection, markup charset, ASCII, UTF-8 validation, byte validity, CJK gating, structural probing, statistical scoring, post-processing
  • Bigram frequency models trained on CulturaX multilingual corpus data for all supported language/encoding pairs
  • Optional mypyc compilation — 1.49x additional speedup on CPython
  • Thread-safe detect() and detect_all() with no measurable overhead; scales on free-threaded Python 3.13t+
  • Negligible import memory (96 B)
  • Zero runtime dependencies

Breaking changes vs 6.0.0:

  • detect() and detect_all() now default to encoding_era=EncodingEra.ALL (6.0.0 defaulted to MODERN_WEB)
  • Internal architecture is completely different (probers replaced by pipeline stages). Only the public API is preserved.
  • LanguageFilter is accepted but ignored (deprecation warning emitted)
  • chunk_size is accepted but ignored (deprecation warning emitted)
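
Since the default era changes from MODERN_WEB to ALL, a caller that depended on the 6.0.0 behavior may want to pass `encoding_era` explicitly. The following is a minimal sketch under stated assumptions: `make_pinned_detector` is a hypothetical wrapper, and the import path `chardet.enums.EncodingEra` is an assumption not confirmed by the release notes above. Only the `encoding_era` keyword and the `EncodingEra.MODERN_WEB` member are taken from the notes.

```python
from typing import Any, Callable


def make_pinned_detector(detect: Callable[..., Any], era: Any) -> Callable[[bytes], Any]:
    """Hypothetical wrapper: return a detector that always passes `era` explicitly.

    Passing the era on every call means the result no longer depends on the
    library's default, which changed between 6.0.0 and 7.0.0.
    """

    def pinned(data: bytes) -> Any:
        return detect(data, encoding_era=era)

    return pinned


if __name__ == "__main__":
    try:
        import chardet
        # Assumption: the enum is importable from chardet.enums.
        from chardet.enums import EncodingEra
    except ImportError:
        print("chardet with EncodingEra is not installed; skipping the demo")
    else:
        detect_modern_web = make_pinned_detector(chardet.detect, EncodingEra.MODERN_WEB)
        print(detect_modern_web("naïve café".encode("utf-8")))
```

Pinning the era in one place keeps the choice visible in this repository rather than implied by whichever chardet version is installed.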

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot.

renovate-bot added 1 commit 2026-03-03 18:00:21 -07:00
Update dependency chardet to v7
All checks were successful
Build Docker Image / build (pull_request) Successful in 1m40s
d10b684776
renovate-bot force-pushed renovate/chardet-7.x from d10b684776 to 3238f42cb1 2026-03-04 15:00:17 -07:00
timatlee merged commit 9e0e43b26f into main 2026-03-05 18:42:54 -07:00

Reference: timatlee/cloudflare-ddns-docker-updated#71