msgspec build issue
Date: 2025 11-November 21
okay
i am having an issue with packaging my python project which uses msgspec
in several case i have has a msgspec._core warning when trying to run my shiv built pyz file. it builds without PYC
i think i might want to migrate to ujson or json, something in pure python, rather than use msgspec.__core
what are the drawbacks?
oolong@CIT-36NZRL3 /mnt/c/Users/george.bennett/dev/pipeline dev ± grep -r "import msgspec"
src/pipeline/api/status_api.py:import msgspec.json
src/pipeline/gui_litestar_msgspec_plotly.py:import msgspec
src/pipeline/gui_starlette_msgspec_plotly.py:import msgspec
src/pipeline/server/config_server.py:import msgspec.json # New import for fast JSON serialization
src/pipeline/server/trend_server_eds.py:import msgspec.json # New import for fast JSON serialization
src/pipeline/server/trend_server_eds.py:#import msgspec.struct # Import for creating data structures
src/pipeline/server/trend_server_eds.py:# FIX: Use 'from msgspec import Struct' instead of 'import msgspec.struct'
oolong@CIT-36NZRL3 /mnt/c/Users/george.bennett/dev/pipeline dev ± grep -r "from msgspec"
src/pipeline/gui_litestar_msgspec_plotly.py:from msgspec import Struct
src/pipeline/server/trend_server_eds.py:# FIX: Use 'from msgspec import Struct' instead of 'import msgspec.struct'
src/pipeline/server/trend_server_eds.py:from msgspec import Struct
oolong@CIT-36NZRL3 /mnt/c/Users/george.bennett/dev/pipeline dev ±
some of these are old defunct sample files
This is a classic "zipapp vs. C-extension" conflict. Tools like shiv, pex, and standard Python zipapps struggle to load compiled C-extensions (like msgspec._core or ujson) directly from within the compressed .pyz archive because the OS loader needs a physical file on the filesystem to link against.
Here is an analysis of the drawbacks of migrating away from msgspec, specifically given your codebase's usage.
1. The Major Drawback: Loss of Struct (Schema & Validation)
According to your grep results, you are using:
src/pipeline/server/trend_server_eds.py:from msgspec import Struct
This is the biggest friction point. msgspec isn't just a JSON serializer; it is also a schema validation library similar to Pydantic, but faster.
-
Current Behavior: When you define a
class MyModel(msgspec.Struct),msgspecenforces types at runtime (e.g., ensuring a field is anint) and handles decoding JSON bytes directly into that class instance efficiently. -
The Migration Cost: Standard
jsonandujsondo not support Structs or schemas. They only output rawdictandlistobjects. -
The Workaround: To replicate this functionality without
msgspec, you would need to:-
Rewrite all
msgspec.Structclasses as standard Python@dataclasses. -
Manually handle the conversion from
dict(whatjson.loadproduces) to your Dataclass, or use a library likedacite.
-
2. Performance Impact (Trend Server)
You have a file named trend_server_eds.py. If "Trend" implies time-series data or large arrays of numbers:
-
msgspec: Optimized for low memory usage and high speed. It decodes directly into objects.
-
standard json: Significantly slower and uses more memory because it creates intermediate dictionary objects for everything.
-
ujson: Faster than standard
json, but often faster by cutting corners (less strictly compliant).
Verdict: If your payloads are small (config files, status checks), you won't notice the difference. If you are shipping megabytes of trend data, the migration to standard json will notably increase CPU usage and latency.
3. ujson is likely a Lateral Move
You mentioned considering ujson (UltraJSON).
-
The Trap:
ujsonis also written in C. -
If your
shivbuild is failing because it cannot load themsgspec._coreC-extension, it will likely fail to load theujsonC-extension for the exact same reason. -
Safe Bet: If you want guaranteed packaging stability, you must use the standard library
jsonmodule.
Migration Comparison Table
| Feature | msgspec | ujson | json (Std Lib) |
|---|---|---|---|
| Packaging | Difficult (C-ext, needs extraction) | Difficult (C-ext, needs extraction) | Easy (Pure Python) |
| Speed | Excellent | Very Good | Moderate/Slow |
| Type Safety | High (Struct) |
None (Returns Dicts) | None (Returns Dicts) |
| Validation | Built-in | None | None |
| Code Change | N/A | High (Must remove Struct) |
High (Must remove Struct) |
Alternative: Fix the Packaging (Keep msgspec)
Before you rewrite your code, you might be able to fix the shiv build. shiv (and pex) has a mechanism to handle C-extensions: Use the preamble to extract the files.
By default, shiv tries to run code out of the zip. C-extensions cannot run out of a zip. You need to tell shiv to unzip the underlying packages into a cache directory (usually ~/.shiv) on the target machine.
Try building your pyz with a preamble that forces extraction:
If you haven't explicitly configured shiv to extract, the .so / .pyd files are trapped inside the zip.
-
Ensure your build command does not imply "pure python only" if that flag exists in your version.
-
When running the
.pyz,shivshould automatically detect the need to extract if the dependencies are packaged correctly.
However, if you are cross-compiling (building on WSL/Linux for a different Linux distro, or building on Windows for Linux), msgspec will fail because the .so file inside the pyz matches the build OS, not the runtime OS.
Summary Recommendation
-
If the app handles heavy data: Do not migrate. The performance penalty of standard
jsonis high. Focus on fixing theshivbuild (ensure the build environment matches the deployment environment exactly, e.g., build inside a Docker container that mimics the target). -
If the app is low-throughput: Migrate to standard
json+dataclasses.-
Change
msgspec.Structto@dataclass. -
Change
msgspec.json.decodetojson.loads. -
Avoid
ujsonas it doesn't solve the C-extension packaging risk.
-
Would you like me to show you the code equivalent of converting a msgspec.Struct to a standard library dataclass implementation?
This is a classic "zipapp vs. C-extension" conflict. Tools like `shiv`, `pex`, and standard Python zipapps struggle to load compiled C-extensions (like `msgspec._core` or `ujson`) directly from within the compressed `.pyz` archive because the OS loader needs a physical file on the filesystem to link against.
Here is an analysis of the drawbacks of migrating away from `msgspec`, specifically given your codebase's usage.
### 1. The Major Drawback: Loss of `Struct` (Schema & Validation)
According to your grep results, you are using:
src/pipeline/server/trend_server_eds.py:from msgspec import Struct
This is the biggest friction point. `msgspec` isn't just a JSON serializer; it is also a schema validation library similar to Pydantic, but faster.
- **Current Behavior:** When you define a `class MyModel(msgspec.Struct)`, `msgspec` enforces types at runtime (e.g., ensuring a field is an `int`) and handles decoding JSON bytes directly into that class instance efficiently.
- **The Migration Cost:** Standard `json` and `ujson` **do not** support Structs or schemas. They only output raw `dict` and `list` objects.
- **The Workaround:** To replicate this functionality without `msgspec`, you would need to:
1. Rewrite all `msgspec.Struct` classes as standard Python `@dataclasses`.
2. Manually handle the conversion from `dict` (what `json.load` produces) to your Dataclass, or use a library like `dacite`.
### 2. Performance Impact (Trend Server)
You have a file named `trend_server_eds.py`. If "Trend" implies time-series data or large arrays of numbers:
- **msgspec:** Optimized for low memory usage and high speed. It decodes directly into objects.
- **standard json:** Significantly slower and uses more memory because it creates intermediate dictionary objects for everything.
- **ujson:** Faster than standard `json`, but often faster by cutting corners (less strictly compliant).
**Verdict:** If your payloads are small (config files, status checks), you won't notice the difference. If you are shipping megabytes of trend data, the migration to standard `json` will notably increase CPU usage and latency.
### 3. `ujson` is likely a Lateral Move
You mentioned considering `ujson` (UltraJSON).
- **The Trap:** `ujson` is _also_ written in C.
- If your `shiv` build is failing because it cannot load the `msgspec._core` C-extension, it will likely fail to load the `ujson` C-extension for the exact same reason.
- **Safe Bet:** If you want guaranteed packaging stability, you must use the standard library `json` module.
---
### Migration Comparison Table
|**Feature**|**msgspec**|**ujson**|**json (Std Lib)**|
|---|---|---|---|
|**Packaging**|**Difficult** (C-ext, needs extraction)|**Difficult** (C-ext, needs extraction)|**Easy** (Pure Python)|
|**Speed**|Excellent|Very Good|Moderate/Slow|
|**Type Safety**|High (`Struct`)|None (Returns Dicts)|None (Returns Dicts)|
|**Validation**|Built-in|None|None|
|**Code Change**|N/A|**High** (Must remove `Struct`)|**High** (Must remove `Struct`)|
---
### Alternative: Fix the Packaging (Keep `msgspec`)
Before you rewrite your code, you might be able to fix the `shiv` build. `shiv` (and `pex`) has a mechanism to handle C-extensions: **Use the preamble to extract the files.**
By default, `shiv` tries to run code out of the zip. C-extensions cannot run out of a zip. You need to tell `shiv` to unzip the underlying packages into a cache directory (usually `~/.shiv`) on the target machine.
**Try building your pyz with a preamble that forces extraction:**
If you haven't explicitly configured `shiv` to extract, the `.so` / `.pyd` files are trapped inside the zip.
1. Ensure your build command does **not** imply "pure python only" if that flag exists in your version.
2. When running the `.pyz`, `shiv` should automatically detect the need to extract if the dependencies are packaged correctly.
However, if you are cross-compiling (building on WSL/Linux for a different Linux distro, or building on Windows for Linux), `msgspec` will fail because the `.so` file inside the `pyz` matches the _build_ OS, not the _runtime_ OS.
### Summary Recommendation
1. **If the app handles heavy data:** Do not migrate. The performance penalty of standard `json` is high. Focus on fixing the `shiv` build (ensure the build environment matches the deployment environment exactly, e.g., build inside a Docker container that mimics the target).
2. **If the app is low-throughput:** Migrate to **standard `json`** + **`dataclasses`**.
- Change `msgspec.Struct` to `@dataclass`.
- Change `msgspec.json.decode` to `json.loads`.
- Avoid `ujson` as it doesn't solve the C-extension packaging risk.
**Would you like me to show you the code equivalent of converting a `msgspec.Struct` to a standard library `dataclass` implementation?**
Yes, this is 100% expected. In fact, it is guaranteed to fail.
The reason is simple: WSL is Linux.1 When you build inside WSL, you are building a Linux application, but you are trying to run it on Windows.
The Core Problem: .so vs .pyd
Since msgspec is a C-extension, it compiles into a binary file specific to the operating system.
-
On WSL (Linux):
msgspeccompiles to a.so(Shared Object) file (e.g.,msgspec.cpython-310-x86_64-linux-gnu.so). -
On Windows:
msgspeccompiles to a.pyd(Python DLL) file (e.g.,msgspec.cp310-win_amd64.pyd).
When you run the PYZ on Windows, the Windows Python interpreter looks inside the zip, finds the Linux .so file, and crashes because it doesn't know how to read Linux binary formats.
Why shiv doesn't auto-fix this
shiv is just a wrapper around pip install.2
-
When you run
shivinside WSL, it asks PyPI for "msgspec for Linux." -
PyPI gives it the Linux version.
-
shivzips that Linux version up.
The Solution: "Build where you Run"
If you want to deploy this PYZ to Windows users, you must build the PYZ on Windows (using Command Prompt or PowerShell, not WSL).
If you need to support both Linux (WSL) and Windows users:
You cannot create a single "Universal" PYZ if it contains C-extensions.3 You must build two separate artifacts:
-
pipeline-linux.pyz(Built inside WSL/Linux) -
pipeline-windows.pyz(Built in PowerShell/CMD)
Summary
This confirms my previous point: switching to ujson would not solve this. ujson is also a C-extension and would suffer the exact same "Build on Linux -> Fail on Windows" issue.
Would you like me to explain how to set up a simple GitHub Action (or script) to automatically build both Windows and Linux versions of your PYZ?
hm
i said ujson erroneously
i don't know the right answer
are dataclasses and json really the only pure python alternative?
how is it that pipx installed smoothly?
i would like to have a flow like pipx install except off line and file based
is there a way i can provide the .so and the .pyd in the .pyz to work cross platform?
Here are the answers to your questions regarding pure Python alternatives, why pipx worked, and how to achieve that "offline file-based" workflow.
1. Are dataclasses + json the only pure Python alternative?
No, but they are the most "standard" ones. If you want the schema validation and structure handling that msgspec provided—but in Pure Python—the best alternative is cattrs (Class Converters).
-
What is it?
cattrsworks with standard@dataclasses. It handles the complex logic of converting unstructured data (dicts/JSON) into structured data (Dataclasses) and back. -
Why it fits: It is pure Python. It effectively replaces the
msgspec.Structfunctionality. -
The Stack: You would use
json(standard lib) for parsing the string to a dict, andcattrsfor validating/converting that dict to a class.
Python
import json
from dataclasses import dataclass
from cattrs import structure
@dataclass
class TrendData:
id: int
values: list[float]
# Raw JSON string
raw_json = '{"id": 101, "values": [1.1, 2.2, 3.3]}'
# 1. Parse JSON (Pure Python standard lib)
data_dict = json.loads(raw_json)
# 2. Validate and Structuring (Pure Python library 'cattrs')
trend = structure(data_dict, TrendData)
2. How is it that pipx installed smoothly?
pipx works because it compiles/installs at runtime on the target machine, not the build machine.
-
You ran
pipx install .on Windows. -
pipxlooked at yourpyproject.toml/requirements.txt. -
It saw
msgspec. -
It reached out to PyPI from the Windows machine.
-
It downloaded the Windows binary wheel (
.pyd). -
It installed it.
Shiv, by contrast, froze the dependencies at build time on the Linux machine. It downloaded the Linux binary (.so) and locked it inside the zip.
3. Can I bundle .so and .pyd in one .pyz?
With shiv: No.
Shiv installs packages into a flat site-packages folder inside the zip. You cannot have two folders named msgspec (one for Linux, one for Windows) in the same directory; they will overwrite each other.
With pex: Yes.
PEX is a more advanced alternative to shiv. It supports multi-platform builds. It allows you to specify "I want the wheels for Linux and Windows bundled in this file."
However, PEX files are complex. When you run them, they check the current OS and activate the correct internal wheel.
The PEX Command for a multi-platform build:
Bash
pex . --platform manylinux2014_x86_64 --platform win_amd64 -o pipeline.pex
Note: This creates a larger file because it contains binaries for both OSs.
4. The Solution: The "Offline Pipx" Flow
If you want the reliability of pipx (installing the correct binary for the correct OS) but you want it to be offline and file-based, you should use Wheel Bundling.
You don't give the user a .pyz. You give them a folder containing your source wheel and all dependency wheels.
Step 1: Download wheels for both platforms (On your build machine)
You can download dependencies for platforms other than the one you are on.
Bash
# Create a folder for the artifacts
mkdir dist_offline
# Download Windows dependencies
pip download -d dist_offline --only-binary=:all: --platform win_amd64 --python-version 3.11 msgspec ujson plotly
# Download Linux dependencies
pip download -d dist_offline --only-binary=:all: --platform manylinux2014_x86_64 --python-version 3.11 msgspec ujson plotly
# Build your own package into a wheel
python -m build --wheel --outdir dist_offline
Step 2: The Offline Install (On the Target Machine)
You zip up that dist_offline folder and send it to the user. They run:
Bash
# Install using only the local folder, no internet allowed
pip install --no-index --find-links=./dist_offline pipeline-project
Summary Recommendation
-
If you want a Single File Executable: You must switch to PEX and use the
--platformflags to bundle both Windows and Linux binaries. -
If you want Pure Python (easiest packaging): Switch to
cattrs+json. You lose some speed, but you gain "run anywhere" simplicity. -
If you can ship a Folder: Use the "Offline Pipx" method (Wheel bundling).
Would you like me to generate the code to replace msgspec.Struct with cattrs + dataclasses for one of your files?