Size Comparisons: Dataset-JSON vs. XPT
Overview
This report examines one practical consideration when evaluating Dataset-JSON as a replacement for the legacy SAS XPORT (XPT) format: file size. The XPT format allocates fixed-width fields and pads records with blanks, which can waste considerable storage. Dataset-JSON is a text-based format that stores each value at its actual length, so it tends to be smaller for sparse datasets and for datasets dominated by short strings. It can be larger for numerically dense data, because JSON writes numbers as human-readable decimal text.
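The padding effect described above can be illustrated with a toy model. The sketch below is not an XPT writer and does not reproduce the real record layout. The column widths, the example row, and the helper names (`fixed_width_size`, `json_row_size`, `numeric_json_size`) are assumptions made for illustration, as is the 8-byte slot for each numeric value.

```python
import json

# Hypothetical character columns as (name, declared width in bytes).
COLUMNS = [("USUBJID", 20), ("AETERM", 200), ("AESEV", 10)]
NUMERIC_BYTES = 8  # assumption: each numeric occupies a fixed 8-byte slot


def fixed_width_size(row):
    """Bytes used when every value is blank-padded to its declared width."""
    return sum(len(row[name].ljust(width).encode("ascii")) for name, width in COLUMNS)


def json_row_size(row):
    """Bytes used by the same row written as a compact JSON array."""
    values = [row[name] for name, _ in COLUMNS]
    return len(json.dumps(values, separators=(",", ":")).encode("utf-8"))


def numeric_json_size(value):
    """Bytes used by one number written as JSON decimal text."""
    return len(json.dumps(value).encode("utf-8"))


row = {"USUBJID": "01-701-1015", "AETERM": "HEADACHE", "AESEV": "MILD"}
print("fixed-width row:", fixed_width_size(row), "bytes")
print("JSON row:       ", json_row_size(row), "bytes")
print("small integer:  ", numeric_json_size(1), "vs", NUMERIC_BYTES, "bytes")
print("long decimal:   ", numeric_json_size(0.123456789012345), "vs", NUMERIC_BYTES, "bytes")
```

In this toy case the padded row takes 230 bytes and the JSON row 33 bytes, while the long decimal takes more bytes as text than the assumed 8-byte binary slot. Both directions of the trade-off appear even in a single record.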
The datasets used here come from a real clinical trial study (CDISC SDTM pilot data), spanning domain types such as demography, adverse events, laboratory results, questionnaires, and supplemental qualifiers. By comparing file sizes side-by-side we can see where Dataset-JSON offers storage savings and where XPT may be more compact.
Key questions this analysis answers:
- How does the size of each Dataset-JSON file compare to its XPT counterpart?
- Which datasets benefit most from the JSON format?
- Are there datasets where the JSON representation is actually larger than XPT?
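Answering these questions starts with pairing each XPT file with its Dataset-JSON counterpart and reading both sizes from disk. The following is a minimal sketch of that step, not the code used to produce this report. It assumes the two formats sit in separate folders and share a file stem (for example `dm.xpt` and `dm.json`), and the function name `collect_sizes` is hypothetical.

```python
from pathlib import Path


def collect_sizes(xpt_dir, json_dir):
    """Return {DATASET: (xpt_bytes, json_bytes)} for datasets found in both folders."""
    xpt_dir, json_dir = Path(xpt_dir), Path(json_dir)
    sizes = {}
    for xpt_file in sorted(xpt_dir.glob("*.xpt")):
        # Assumed naming convention: same stem, different extension.
        json_file = json_dir / (xpt_file.stem + ".json")
        if json_file.exists():
            sizes[xpt_file.stem.upper()] = (
                xpt_file.stat().st_size,
                json_file.stat().st_size,
            )
    return sizes
```

Calling `collect_sizes("data/xpt", "data/json")` (both paths are placeholders) would return one size pair per dataset. Datasets without a JSON counterpart are skipped rather than reported as missing.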
Interactive Size Comparison Table
The table below shows all 21 datasets. By default it displays 12 rows sorted by XPT file size (largest first). You can:
- Sort any column by clicking the column header
- Search across all columns using the search box
- Adjust the number of rows shown using the “Show entries” control
- Page through results using the navigation at the bottom
Bar Chart: XPT vs. JSON File Sizes (Top 12 by XPT Size)
The chart below focuses on the 12 largest datasets (by XPT size). Hover over any bar for exact values.
Ratio Chart: JSON / XPT File Size Ratio
The dashed red line marks a ratio of 1.0. A ratio below the line indicates the JSON file is smaller than the XPT file; a ratio above it means JSON is larger. Hover for dataset details.
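The ratio itself is simple arithmetic: the JSON size divided by the XPT size. A short sketch follows, in which the function names `size_ratio` and `classify` and the returned labels are illustrative assumptions.

```python
def size_ratio(xpt_bytes, json_bytes):
    """JSON size divided by XPT size; below 1.0 means JSON is smaller."""
    if xpt_bytes <= 0:
        raise ValueError("XPT size must be positive")
    return json_bytes / xpt_bytes


def classify(ratio):
    """Label a ratio relative to the 1.0 reference line."""
    if ratio < 1.0:
        return "JSON smaller"
    if ratio > 1.0:
        return "JSON larger"
    return "same size"


# Invented sizes for illustration only.
print(classify(size_ratio(1000, 420)))
print(classify(size_ratio(1000, 1300)))
```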
Summary
Across 21 datasets compared:
- 19 datasets are smaller as Dataset-JSON than as XPT.
- 2 datasets are larger as Dataset-JSON than as XPT.
- The average JSON/XPT ratio is 0.42 (a value below 1 means the JSON files are smaller than their XPT counterparts on average).
- The biggest space saving is in SE, while SC sees the largest size increase in JSON format.
These results highlight that Dataset-JSON is not universally more compact than XPT. Datasets that hold many short values in fixed-width fields (including small numbers stored at full width) tend to shrink in JSON, because the padding is not written out, while datasets with many long, dense numeric columns may grow. Understanding this trade-off is important when planning a regulatory submission strategy centred on Dataset-JSON.
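The summary figures listed above could be derived from per-dataset sizes along the following lines. This is a sketch under stated assumptions rather than the report's actual code. The function name `summarize` and the dataset names and sizes are invented, and "biggest saving" is taken to mean the lowest JSON/XPT ratio, which the report does not specify.

```python
from statistics import mean


def summarize(sizes):
    """Summarize {dataset: (xpt_bytes, json_bytes)} into counts and ratios."""
    ratios = {name: j / x for name, (x, j) in sizes.items()}
    total_xpt = sum(x for x, _ in sizes.values())
    total_json = sum(j for _, j in sizes.values())
    return {
        "datasets": len(ratios),
        "json_smaller": sum(r < 1 for r in ratios.values()),
        "json_larger": sum(r > 1 for r in ratios.values()),
        "mean_ratio": mean(ratios.values()),      # average of per-dataset ratios
        "pooled_ratio": total_json / total_xpt,   # total JSON bytes / total XPT bytes
        "lowest_ratio": min(ratios, key=ratios.get),
        "highest_ratio": max(ratios, key=ratios.get),
    }


# Invented sizes for illustration only; not the figures from this report.
example = {"AA": (1000, 200), "BB": (4000, 3000), "CC": (500, 600)}
print(summarize(example))
```

The sketch reports both a mean of per-dataset ratios and a pooled ratio because the two can differ: the mean weights every dataset equally, while the pooled figure is dominated by the largest files.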