Get Started
Main Idea
The main idea of admiralpy is that an ADaM dataset is built by a sequence of
derivations. Each derivation adds one or more variables or records to the
processed dataset. This modular approach makes it easy to adjust code by adding,
removing, or modifying individual derivation steps. Each derivation is a single
function call that accepts a pandas.DataFrame and returns a new DataFrame.
In this guide we explore the different types of derivation functions offered by
admiralpy, the naming conventions they follow, and how to chain them together.
Setup
We will use a small set of illustrative SDTM-like datasets throughout this guide. In a real project you would read these from SAS Transport (XPT) files or a data warehouse.
import pandas as pd
import numpy as np
from admiralpy import (
derive_vars_dtm,
derive_vars_merged,
derive_param_computed,
convert_dtc_to_dt,
filter_extreme,
derive_var_base,
derive_var_chg,
derive_var_pchg,
)
# ── Exposure (EX) ─────────────────────────────────────────────────────────────
ex = pd.DataFrame({
"STUDYID": ["PILOT01"] * 5,
"USUBJID": ["01-701-1015", "01-701-1015", "01-701-1023",
"01-703-1086", "01-716-1024"],
"EXSEQ": [1, 2, 1, 1, 1],
"EXSTDTC": [
"2014-01-02T10:00",
"2014-02-01T08:00",
"2014-01-05T09:30",
"2014-01-08T11:00",
"2014-01-10T14:00",
],
"EXTRT": ["XANO", "XANO", "XANO", "XANO", "XANO"],
"EXDOSE": [54, 54, 54, 54, 54],
})
# ── Demographics (DM) ─────────────────────────────────────────────────────────
dm = pd.DataFrame({
"STUDYID": ["PILOT01"] * 5,
"USUBJID": ["01-701-1015", "01-701-1023", "01-703-1086",
"01-703-1096", "01-716-1024"],
"RFSTDTC": [
"2014-01-02",
"2014-01-05",
"2014-01-08",
"2014-01-12",
"2014-01-10",
],
"AGE": [63, 52, 70, 45, 58],
"SEX": ["F", "M", "F", "M", "F"],
"COUNTRY": ["USA", "USA", "CAN", "CAN", "USA"],
})
# ── Disposition (DS) ─────────────────────────────────────────────────────────
ds = pd.DataFrame({
"STUDYID": ["PILOT01"] * 5,
"USUBJID": ["01-701-1015", "01-701-1023", "01-703-1086",
"01-703-1096", "01-716-1024"],
"DSDECOD": ["FINAL LAB VISIT", "FINAL LAB VISIT", "COMPLETED",
"FINAL LAB VISIT", "COMPLETED"],
"DSSTDTC": [
"2014-07-02",
"2014-07-10",
"2014-06-20",
"2014-07-15",
"2014-06-28",
],
})
# ── Vital Signs (VS) ──────────────────────────────────────────────────────────
vs = pd.DataFrame({
"USUBJID": (["01-701-1015"] * 4 + ["01-701-1023"] * 4),
"VSTESTCD": ["SYSBP", "DIABP", "SYSBP", "DIABP"] * 2,
"VSSTRESN": [121.0, 51.0, 121.0, 50.0, 130.0, 79.0, 128.0, 77.0],
"VSPOS": ["SUPINE"] * 8,
"VISIT": ["Baseline", "Baseline", "Week 2", "Week 2"] * 2,
"VISITNUM": [1, 1, 3, 3] * 2,
})
# Build a working ADVS base
advs = vs.assign(
PARAM = vs["VSTESTCD"].map({"SYSBP": "Systolic BP (mmHg)",
"DIABP": "Diastolic BP (mmHg)"}),
PARAMCD = vs["VSTESTCD"],
AVAL = vs["VSSTRESN"],
AVISIT = vs["VISIT"],
AVISITN = vs["VISITNUM"],
)
# Build an ADSL skeleton
adsl = dm.copy()
print("EX shape:", ex.shape)
print("DM shape:", dm.shape)
print("ADVS shape:", advs.shape)
Derivation Functions
The most important functions in admiralpy are the derivation functions.
They add variables or records to the input dataset without changing existing
rows. All derivation functions start with derive_.
| Prefix | Pattern | Example |
|---|---|---|
derive_var_* |
Derives a single variable | derive_var_base() → BASE |
derive_vars_* |
Can derive multiple variables | derive_vars_dt() → *DT + *DTF |
derive_param_* |
Derives a new parameter record | derive_param_computed() → MAP |
Example: Adding Variables with derive_vars_merged
derive_vars_merged() adds variable(s) from an additional dataset, mirroring
the R derive_vars_merged() function.
Below we add the treatment start datetime (TRTSDTM) and its imputation
flag (TRTSTMF) to adsl by pulling the first non-missing exposure start
datetime for each subject.
# Step 1: Convert the DTC string to a proper datetime column
ex_ext = derive_vars_dtm(
ex,
dtc="EXSTDTC",
new_vars_prefix="EXST",
highest_imputation="n",
)
print(ex_ext[["USUBJID", "EXSTDTM"]].head())
USUBJID EXSTDTM
0 01-701-1015 2014-01-02 10:00:00
1 01-701-1015 2014-02-01 08:00:00
2 01-701-1023 2014-01-05 09:30:00
3 01-703-1086 2014-01-08 11:00:00
4 01-716-1024 2014-01-10 14:00:00
# Step 2: Merge TRTSDTM (first exposure datetime per subject) into ADSL
adsl = derive_vars_merged(
adsl,
dataset_add=ex_ext,
by_vars=["STUDYID", "USUBJID"],
new_vars={"TRTSDTM": "EXSTDTM", "TRTSTMF": "EXSTTMF"},
filter_add=lambda df: df["EXSTDTM"].notna(),
order=["EXSTDTM", "EXSEQ"],
mode="first",
)
print(adsl[["USUBJID", "TRTSDTM", "TRTSTMF"]])
USUBJID TRTSDTM TRTSTMF
0 01-701-1015 2014-01-02 10:00:00 None
1 01-701-1023 2014-01-05 09:30:00 None
2 01-703-1086 2014-01-08 11:00:00 None
3 01-703-1096 NaT None
4 01-716-1024 2014-01-10 14:00:00 None
Example: Adding Records with derive_param_computed
derive_param_computed() adds a derived parameter record for each
by-group. Below we compute Mean Arterial Pressure (MAP) from SYSBP and DIABP:
$$MAP = \frac{SYSBP + 2 \times DIABP}{3}$$
advs = derive_param_computed(
advs,
by_vars=["USUBJID", "AVISIT", "AVISITN"],
parameters=["SYSBP", "DIABP"],
set_values_to={
"AVAL": lambda w: (w["AVAL_SYSBP"] + 2 * w["AVAL_DIABP"]) / 3,
"PARAMCD": "MAP",
"PARAM": "Mean Arterial Pressure (mmHg)",
},
)
print(
advs[advs["PARAMCD"] == "MAP"][
["USUBJID", "AVISIT", "PARAMCD", "AVAL"]
].reset_index(drop=True)
)
USUBJID AVISIT PARAMCD AVAL
0 01-701-1015 Baseline MAP 74.333333
1 01-701-1015 Week 2 MAP 73.666667
2 01-701-1023 Baseline MAP 96.000000
3 01-701-1023 Week 2 MAP 94.000000
Note
For convenience, admiralpy also provides compute_bmi() and
compute_bsa() to directly compute BMI and BSA parameters.
Other Types of Functions
Computation Functions
Computation functions operate on scalar values or pd.Series objects and
return converted values. They are typically used inside
DataFrame.assign() / mutate() calls.
# Add the date of the final lab visit to ADSL using convert_dtc_to_dt()
adsl_labs = derive_vars_merged(
dm,
dataset_add=ds,
by_vars=["USUBJID"],
new_vars={"FINLABDT": "DSSTDTC"},
filter_add='DSDECOD == "FINAL LAB VISIT"',
)
adsl_labs["FINLABDT"] = convert_dtc_to_dt(adsl_labs["FINLABDT"])
print(adsl_labs[["USUBJID", "FINLABDT"]])
USUBJID FINLABDT
0 01-701-1015 2014-07-02
1 01-701-1023 2014-07-10
2 01-703-1086 NaT
3 01-703-1096 2014-07-15
4 01-716-1024 NaT
convert_dtc_to_dt() can also be applied in a DataFrame.assign() call:
adsl_labs = adsl_labs.assign(
RFSTDT=lambda df: convert_dtc_to_dt(df["RFSTDTC"])
)
print(adsl_labs[["USUBJID", "RFSTDTC", "RFSTDT"]].head(3))
USUBJID RFSTDTC RFSTDT
0 01-701-1015 2014-01-02 2014-01-02
1 01-701-1023 2014-01-05 2014-01-05
2 01-703-1086 2014-01-08 2014-01-08
Filter Functions
Filter functions return a subset of the input dataset. filter_extreme()
extracts the first or last record within each by-group.
# Extract the most recent MAP record per subject
advs_lastmap = (
advs[advs["PARAMCD"] == "MAP"]
.pipe(
filter_extreme,
by_vars=["USUBJID"],
order=["AVISITN", "PARAMCD"],
mode="last",
)
)
print(advs_lastmap[["USUBJID", "AVISIT", "PARAMCD", "AVAL"]].reset_index(drop=True))
BDS Findings Workflow: Baseline, CHG, and PCHG
A very common BDS workflow derives baseline (BASE), change from baseline
(CHG), and percent change from baseline (PCHG).
# Add a baseline flag column for the first visit
advs_bds = advs.copy()
advs_bds["ABLFL"] = advs_bds["AVISITN"].apply(lambda x: "Y" if x == 1 else None)
# Chain all three derivations together
advs_bds = (
advs_bds
.pipe(derive_var_base, by_vars=["USUBJID", "PARAMCD"])
.pipe(derive_var_chg)
.pipe(derive_var_pchg)
)
print(
advs_bds[["USUBJID", "PARAMCD", "AVISIT", "AVAL", "BASE", "CHG", "PCHG"]]
.sort_values(["USUBJID", "PARAMCD", "AVISITN"])
.reset_index(drop=True)
)
USUBJID PARAMCD AVISIT AVAL BASE CHG PCHG
0 01-701-1015 DIABP Baseline 51.0 51.0 0.0 0.000000
1 01-701-1015 DIABP Week 2 50.0 51.0 -1.0 -1.960784
2 01-701-1015 MAP Baseline 74.33 74.33 0.0 0.000000
...
Date/Time Derivation
derive_vars_dt() converts an ISO 8601 character DTC column to a proper
date variable, with optional partial-date imputation.
from admiralpy import derive_vars_dt
mhdt = pd.DataFrame({
"MHSTDTC": [
"2019-07-18T15:25:40", # full datetime → date extracted
"2019-07-18", # complete date
"2019-02", # missing day → imputed to first
"2019", # missing month+day → imputed to Jan 1
"", # empty → NaT
]
})
result = derive_vars_dt(
mhdt,
new_vars_prefix="AST",
dtc="MHSTDTC",
highest_imputation="M", # impute up to month level
date_imputation="first",
)
print(result)
MHSTDTC ASTDT ASTDTF
0 2019-07-18T15:25:40 2019-07-18 None
1 2019-07-18 2019-07-18 None
2 2019-02 2019-02-01 D
3 2019 2019-01-01 M
4 NaT None
The ASTDTF flag column indicates which date component was imputed:
"D" = day, "M" = month (and day).
Naming Conventions
admiralpy uses consistent naming to make functions easy to find:
| Pattern | Description |
|---|---|
derive_var_* |
Derives a single variable |
derive_vars_* |
Can derive multiple variables |
derive_param_* |
Derives a new parameter record |
compute_* |
Vector computation (scalar or Series input) |
convert_* |
Type conversion helper |
filter_* |
Returns a filtered subset of the input dataset |
Next Steps
- Explore the full API Reference for detailed parameter descriptions and more examples.
- See the Changelog for the list of functions available in the current release.
- Raise issues or ask questions on the GitHub repository.