Introduction
Our team specializes in migrating companies from SAS to open source alternatives such as R. We have successfully transitioned companies across industries including finance, market research, and non-profit research. Typically these migrations are straightforward and we follow a standardized process that delivers a high-fidelity migration to our clients.
Clinical research in the pharmaceutical industry presents our team with unique challenges because of the importance of data and analytic standards in the regulatory submission process. The Clinical Data Interchange Standards Consortium (CDISC) prescribes a set of guidelines and standards for clinical research reporting that has been embraced by the industry. The Food and Drug Administration (FDA) now requires that submissions conform to many of these standards, including SDTM and ADaM datasets and Define-XML metadata.
SAS was closely involved in the development and implementation of these standards, and the industry is understandably reluctant to transition away from an established analytic platform and stake its development pipeline on the 20,000+ open source R packages available on CRAN.
To address this concern, an informal collection of companies and statistical programmers, known as the pharmaverse, has developed a set of open source R packages for clinical reporting. The pharmaverse builds on packages already established in the R ecosystem for data management and reporting, such as the tidyverse. Industry representatives identified gaps in these existing packages and are developing unified solutions to common industry workflows.
This blog post is the first in a series describing the pharmaverse. We will provide a broad overview of the various workflows covered by these packages. Subsequent blog posts will focus on individual packages and how they can be leveraged across each stage of clinical trial analysis, from SDTM mapping through ADaM derivations to the production of tables, listings, and graphs (TLGs).
SDTM
The Study Data Tabulation Model (SDTM) provides a set of data standards that streamline the collection, management, and reporting of clinical trial data. Implementing these standards facilitates data aggregation, reanalysis of historical data, data portability, and regulatory review. It is one of the standards required for all data submissions to the FDA.
OAK is an R-based solution developed at Roche to automate the creation of SDTM domains. It is implemented in the sdtm.oak package, available on CRAN. The reusable algorithms in this package provide a framework for modular programming, with the goal of automating the conversion of raw clinical data to SDTM according to a standardized specification.
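To make the idea of reusable mapping algorithms concrete, here is a minimal base R sketch. It does not use the sdtm.oak API: the helpers map_as_is() and map_with_ct(), the raw data, and the terminology lookup are all hypothetical and exist only to illustrate how a small set of generic functions can build a domain one variable at a time.

```r
# Hypothetical helpers written for this post; they are NOT the sdtm.oak API.

# Copy a raw variable into a target SDTM variable unchanged.
map_as_is <- function(tgt, raw, raw_var, tgt_var) {
  tgt[[tgt_var]] <- raw[[raw_var]]
  tgt
}

# Copy a raw variable into a target variable, recoding collected values
# to controlled terminology through a named lookup vector.
map_with_ct <- function(tgt, raw, raw_var, tgt_var, ct) {
  tgt[[tgt_var]] <- unname(ct[as.character(raw[[raw_var]])])
  tgt
}

# Toy raw data as it might arrive from a data capture export (made-up values).
raw_dm <- data.frame(
  SUBJ = c("001", "002"),
  SEX_RAW = c("Male", "Female"),
  stringsAsFactors = FALSE
)

# Illustrative terminology lookup: collected value -> submission value.
sex_ct <- c(Male = "M", Female = "F")

# Build the DM domain one variable at a time by reusing the same two helpers.
dm <- data.frame(DOMAIN = rep("DM", nrow(raw_dm)), stringsAsFactors = FALSE)
dm <- map_as_is(dm, raw_dm, raw_var = "SUBJ", tgt_var = "SUBJID")
dm <- map_with_ct(dm, raw_dm, raw_var = "SEX_RAW", tgt_var = "SEX", ct = sex_ct)

print(dm)
```

The same two helpers could be reused for any other domain, which is the appeal of the modular approach: the mapping logic is written once and each domain becomes a short list of calls.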
ADaM
Analysis Data Model (ADaM) datasets are a mandatory part of any submission to the FDA. The ADaM standards are detailed in the “Analysis Data Model Implementation Guide” available through CDISC. The admiral package, available on CRAN, introduces a toolbox of functions that sequentially derive new variables or parameters to construct an ADaM dataset.
Most of the functions in the admiral package follow a descriptive set of naming conventions that indicate their purpose and identify groups of similar functions. The advantage of these conventions is that the programmer can focus on the data rather than the programming. The developers of the admiral package have provided users with a helpful database of functions mapped to variables across ADaM domains.
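As a small illustration of the sequential, convention-driven style described above, the sketch below derives treatment start and end dates and then treatment duration. It assumes a recent admiral release in which derive_vars_dt() and derive_var_trtdurd() are available; the subject IDs and dates are made up for the example.

```r
library(admiral)

# Toy subject-level data with character ISO 8601 dates (made-up values).
adsl <- data.frame(
  USUBJID = c("01-001", "01-002"),
  TRTSDTC = c("2021-01-05", "2021-03-01"),
  TRTEDTC = c("2021-02-10", "2021-03-15"),
  stringsAsFactors = FALSE
)

# Each step adds variables to the dataset and passes it to the next step.
adsl <- adsl |>
  # derive_vars_*: adds several variables sharing a prefix (here TRTSDT)
  derive_vars_dt(new_vars_prefix = "TRTS", dtc = TRTSDTC) |>
  derive_vars_dt(new_vars_prefix = "TRTE", dtc = TRTEDTC) |>
  # derive_var_*: adds a single variable (here TRTDURD, using the
  # function's default start and end dates TRTSDT and TRTEDT)
  derive_var_trtdurd()

print(adsl)
```

The prefixes carry the meaning: derive_vars_ functions can add more than one variable, while derive_var_ functions add exactly one.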

CDISC compliant xpt files
Clinical submissions require data packages that conform to standards for variable names and types, labels, and character lengths, among others. These data packages are submitted as .xpt files that are often sent through additional validation pipelines developed within the organization. Without extensive automation macros, this is often a labor-intensive process that is prone to human error.
Many of these standards were developed based on idiosyncrasies of SAS that R programmers don’t often think about. For example, R data frames rarely include variable labels, and a character variable in R has no length attribute that applies to the column as a whole. Regardless, these standards must be adhered to.
The xportr package, available on CRAN, was designed to streamline this process for an R-centric analytic pipeline while reducing the errors that result from manual processes. Clinical programmers can create ADaM datasets entirely in R using the admiral package and write CDISC-compliant xpt files with well-defined metadata. The xportr package also runs a series of validation checks on the data, covering variable names and types, labels and character lengths, and any unsupported values.
To take advantage of the xportr package, users must import a specification matching their prepared ADaM dataset. In this case we are using sample data provided with the package.
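To show what kind of problems these checks look for, here is a minimal base R sketch. The helper check_xpt_basics() is hypothetical and is not part of xportr. It tests a data frame against the well-known limits of the version 5 transport format (variable names of at most 8 characters, labels of at most 40 characters, character values of at most 200 bytes) and flags factors and missing labels.

```r
# Hypothetical helper for illustration only; xportr's own checks are
# more thorough and are driven by the specification file.
check_xpt_basics <- function(df) {
  issues <- character()
  for (nm in names(df)) {
    x <- df[[nm]]
    lbl <- attr(x, "label")
    if (nchar(nm) > 8) {
      issues <- c(issues, paste0(nm, ": name longer than 8 characters"))
    }
    if (is.null(lbl)) {
      issues <- c(issues, paste0(nm, ": no label attribute"))
    } else if (nchar(lbl) > 40) {
      issues <- c(issues, paste0(nm, ": label longer than 40 characters"))
    }
    if (is.factor(x)) {
      issues <- c(issues, paste0(nm, ": factor is not a supported type"))
    }
    if (is.character(x) && any(nchar(x, type = "bytes") > 200)) {
      issues <- c(issues, paste0(nm, ": value longer than 200 bytes"))
    }
  }
  issues
}

# Toy data (made-up values): two well-formed columns and one problem column.
adsl <- data.frame(
  USUBJID = c("01-001", "01-002"),
  TREATMENT_ARM = factor(c("Placebo", "Active")),
  AGE = c(34, 51),
  stringsAsFactors = FALSE
)
attr(adsl$USUBJID, "label") <- "Unique Subject Identifier"
attr(adsl$AGE, "label") <- "Age"

issues <- check_xpt_basics(adsl)
print(issues)
```

Only the TREATMENT_ARM column is reported: its name is too long, it has no label, and it is stored as a factor. None of these would draw a second glance in everyday R work.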
library(xportr)
library(magrittr)

# Load sample data shipped with xportr
data("adsl_xportr")

# Variable-level specification: one row per variable
var_spec <- readxl::read_xlsx("data/ADaM_spec.xlsx", sheet = "Variables") %>%
  dplyr::rename(type = "Data Type") %>%
  dplyr::rename_with(tolower)

# Dataset-level specification: one row per dataset
dataset_spec <- readxl::read_xlsx("data/ADaM_spec.xlsx", sheet = "Datasets") %>%
  dplyr::rename(label = "Description") %>%
  dplyr::rename_with(tolower)
xportr includes functions that apply single parts of the specification file to the dataset, including xportr_type(), xportr_format(), and others. However, the real power of the package comes from a single wrapper function, xportr(), that applies all specifications to the dataset and writes the final .xpt file.
# Apply the full specification and write the transport file in one call
adsl_xportr |>
  xportr::xportr(
    var_metadata = var_spec,
    df_metadata = dataset_spec,
    domain = "ADSL",
    path = "data/adsl.xpt"
  )
── All variables in specification file are in dataset ──
── 50 reordered in dataset ──
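For comparison, the individual functions can also be chained by hand. The sketch below is self-contained: the three-variable dataset and its specification are made up for the example, and the specification uses column names (dataset, variable, type, label, length, order) that we assume match xportr's default options in a recent release. The file is written to a temporary directory.

```r
library(xportr)

# Toy dataset (made-up values)
adsl <- data.frame(
  STUDYID = c("XYZ", "XYZ"),
  USUBJID = c("XYZ-001", "XYZ-002"),
  AGE = c(34, 51),
  stringsAsFactors = FALSE
)

# Hand-written variable specification standing in for the spec file
var_spec <- data.frame(
  dataset = "ADSL",
  variable = c("STUDYID", "USUBJID", "AGE"),
  type = c("character", "character", "numeric"),
  label = c("Study Identifier", "Unique Subject Identifier", "Age"),
  length = c(3, 7, 8),
  order = 1:3,
  stringsAsFactors = FALSE
)

out_path <- file.path(tempdir(), "adsl.xpt")

adsl |>
  xportr_metadata(var_spec, "ADSL") |> # attach the spec and domain once
  xportr_type() |>                     # coerce types to match the spec
  xportr_length() |>                   # set variable lengths
  xportr_label() |>                    # set variable labels
  xportr_order() |>                    # reorder columns per the spec
  xportr_write(path = out_path)        # validate and write the .xpt file
```

Chaining the steps this way is more verbose than the wrapper, but it lets a programmer inspect the dataset after any single step.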
To reduce the risk of passing dataset errors into the submission, xportr includes a number of validation checks that either make the necessary changes to the data or alert the programmer to the errors. We can demonstrate the utility of this feature by changing the type of a variable to something unsupported.
# Introduce an unsupported type: SITEID becomes a factor
adsl_errors <- adsl_xportr |>
  dplyr::mutate(SITEID = as.factor(SITEID))

# Attach the specification, then check types and warn on any mismatch
adsl_errors <- adsl_errors |>
  xportr::xportr_metadata(var_spec, "ADSL") |>
  xportr::xportr_type(verbose = "warn")
── Variable type mismatches found. ──
✔ 1 variables coerced
Warning: Variable type(s) in dataframe don't match metadata: `SITEID`
Conclusion
In summary, the pharmaverse ecosystem is rapidly evolving to meet the needs of clinical research professionals transitioning from SAS to R. By leveraging packages like sdtm.oak for SDTM automation, admiral for ADaM derivations, and xportr for CDISC-compliant submissions, the industry is establishing a robust, standardized, and regulatory-compliant open-source workflow. These tools not only streamline the clinical trial analysis process but also reduce the risk of human error while maintaining flexibility and efficiency.
As this blog series continues, we will explore these packages in greater depth, providing practical insights into their implementation. By embracing the pharmaverse, pharmaceutical companies can modernize their data pipelines while maintaining compliance with FDA and CDISC requirements. Stay tuned for our next post, where we dive deeper into the admiral package and its role in simplifying ADaM dataset creation.