From SAS Macros to R Functions: A Seamless Transition for Efficient Programming


Introduction 

My team specializes in SAS to R migrations. SAS is an established company that offers a wide variety of technologies to its customers, so every migration requires some flexibility in our approach. But every SAS migration requires my team to be fluent in the SAS macro system.

SAS users love their macros for the same reason R developers love their functions: essentially, they serve the same purpose, automating efficient workflows that are less prone to error. I think that a good macro programmer can transition to writing good R functions with relatively little fuss. This blog post documents my experience guiding this transition by highlighting the similarities and important differences between macros and functions.

 

SAS Macros 

The best description of SAS macros I’ve found is that macros are programs that write other programs. They are programming tools that generate a piece of code incorporating a set of parameters. This is more efficient than programming multiple iterations that differ by only a few things. But essentially, there is no difference under the hood between making manual changes to the program and using macros, because macros are just shortcuts.

The macro system in SAS includes two types: the macro object and the macro variable. Macro objects call some sort of subroutine that is either user-defined or bundled with SAS. They are easy to spot because all of them have the % prefix. Macro variables are parameters that can be used in SAS programs and are identified by the & prefix. These macro variables can be anything: character strings, data set variable names, entire data sets, or even other macros.

The following example shows how a macro variable can be added to the environment using the %LET macro and then used in a data step. In this case, %LET is a macro subroutine that assigns a value to the &macro_var variable.


A well-constructed SAS program often employs a section of macro variables used to parameterize the workflow. These parameters can be used to pull data from a given source, subset these data based on a particular criterion, and output and file a report to the appropriate location. A process can be fully updated in the future by simply editing this section. It is all very handy and limits the opportunity for errors!

SAS users will be comforted to know that this is very similar in R. Many production R programs will include a similar set of parameters. These can be loaded automatically at startup or defined explicitly inside the program. Once defined, these variables can be used in R much the same way they can be used in SAS.
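As a minimal sketch (the names and paths here are hypothetical, not from the post), such a parameter block at the top of an R script plays the same role as a section of %LET statements in SAS:

```r
# Parameter block: edit these values to update the entire workflow
params <- list(
  year       = 2024,
  month      = "JUNE",
  input_dir  = "/data/source",   # hypothetical path
  output_dir = "/data/reports"   # hypothetical path
)

params$month  # [1] "JUNE"
```

Collecting the parameters in a single list keeps them organized and makes them easy to pass into functions later in the script.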

 

SAS Code

 

R Code

The glue package is a handy set of tools from the tidyverse that allows a program to interpret expressions within character strings. If variables or expressions are included in curly braces, glue will evaluate them and return the output into the final string.

				
year <- 2024
month <- "JUNE"
programmer <- "BRIANCARTER"

input <- glue::glue("/users/{programmer}/SAS/{year}")
output <- glue::glue("/users/{programmer}/SAS/{year}/{month}")

readRDS(glue::glue("{input}/data_for_{year}.rds")) |>
  dplyr::filter(month == !!month) |>
  saveRDS(glue::glue("{output}/{month}.rds"))

User-defined macros

The most powerful application of the macro system is the ability for SAS programmers to develop custom macros for any given task. Earlier in my career, I would run a series of models for a research project, pull and organize the estimates, and then use PROC REPORT to generate a pretty table for publication. Rather than develop de novo code each time, I could write a general-purpose macro that did the work for me.

When I work with SAS programmers migrating to R, I like to point out that the general anatomy and construction of a custom R function is very similar to a SAS macro. A user-defined SAS macro begins with the %MACRO command to create a named object that includes a series of macro variables that are included as parameters in the body of the macro to automate some task. The end of the macro is closed using the %MEND command.

R functions follow a very similar structure: a user defines a named function() object that includes an arbitrary number of arguments that parameterize the body of the function. Any object the user wishes to retain from the function is included in return().

 

SAS Code

R Version

				
fun <- function(dat, event, time, categorical, continuous) {
  require(survival)

  # define my model formula
  f <- formula(
    paste("Surv(", time, ",", event, ") ~ ",
          paste(c(categorical, continuous), collapse = "+"))
  )

  # fit the model
  fit <- coxph(f, data = dat, ties = "breslow")

  # pull the confidence intervals
  limits <- confint(fit) |> exp() |> data.frame()

  # format the output
  out <- data.frame(estimates = exp(coef(fit)),
                    limits)
  names(out) <- c("HazardRatio", "HRLowerCL", "HRUpperCL")

  # return a single object
  return(out)
}

fun(dat = df,
    event = "event",
    time = "time",
    categorical = "exposure",
    continuous = "covariate")
Loading required package: survival
                    HazardRatio HRLowerCL HRUpperCL
exposureNot Exposed    1.625042 0.3829869  6.895170
covariate              1.159639 0.9735426  1.381307
				
			

Conditional programming

A basic use of the SAS data step is IF-ELSE conditional logic for cleaning and deriving variables, but there is a complementary %IF-%ELSE for conditionally evaluating a macro flow. This gives the programmer the option to build a single macro that behaves differently based on its input parameters, and it corresponds directly to if-else logic within an R function.

 

SAS Code

R Code

				
fun <- function(dat, event, time, categorical, continuous, subset) {

  require(survival)

  # define my model formula
  f <- formula(
    paste("Surv(", time, ",", event, ") ~ ",
          paste(c(categorical, continuous), collapse = "+"))
  )

  # fit the model
  fit <- coxph(f, data = dat, ties = "breslow")

  # pull the confidence intervals
  limits <- confint(fit) |> exp() |> data.frame()

  # format the output
  out <- data.frame(estimates = exp(coef(fit)),
                    limits)
  names(out) <- c("HazardRatio", "HRLowerCL", "HRUpperCL")

  if (subset == 1) {
    out <- out["exposureNot Exposed", ]
  }

  # return a single object
  return(out)
}

fun(dat = df,
    event = "event",
    time = "time",
    categorical = "exposure",
    continuous = "covariate",
    subset = 1)
				
			

Evaluation of macro arguments

Both SAS macros and R functions allow users to include arguments that are not used. This is referred to as lazy evaluation and is often mentioned in discussions of R functions. Lazy evaluation simply means that an object is not evaluated until it is explicitly required. Although I’ve never heard the term applied to SAS macros, the functionality is the same. A quick demonstration of lazy evaluation in SAS and R is below. Although both include arguments for x and y, only x is referenced, so the program will run without error if we fail to provide a value for y.

 

SAS Code

R Code

				
fun <- function(x, y) {
  i <- x^2
  cat(glue::glue("the value i is {i}"))
}

fun(x = 2)
the value i is 4
				
			

Default values are included in SAS macros to provide options to the end user. In my past life as a data analyst, I was responsible for running models and formatting output for several lead researchers. Each of these investigators had her own preferences for formatting output; in particular, each had her own preference for p-values. Rather than make post hoc manual adjustments to the output, I could simply program these preferences into my macros. The default option would leave the p-values as they came from SAS; however, simply changing an option could fix them to the required style.

R provides an identical mechanism for default function arguments. If the user does not wish to change a default value, they do not have to explicitly reference it in the function call.

 

SAS Code

R Code

				
df <- data.frame(pvalues = c(0.0128945, 0.001, 0.0682))

p <- function(dat, style = "None") {

  if (style == "Mia") {
    dat <- dat |>
      dplyr::mutate(pvalues = format(pvalues, scientific = TRUE, digits = 3))
  }

  if (style == "Vicky") {
    dat <- dat |>
      dplyr::mutate(pvalues = round(pvalues, digits = 4))
  }

  return(dat)
}

p(df) # using default value
    pvalues
1 0.0128945
2 0.0010000
3 0.0682000
p(df, "Mia") # p-values for Mia
   pvalues
1 1.29e-02
2 1.00e-03
3 6.82e-02
p(df, "Vicky") # p-values for Vicky
  pvalues
1  0.0129
2  0.0010
3  0.0682
				
			

So how are functions and macros different?

Until this point, SAS macros and R functions have seemed pretty similar in their structure and use; however, there are notable differences under the hood that tend to cause problems for SAS users. As discussed above, a SAS macro is simply a programming hack: it takes a set of parameters as input to write a program that is executed when called. Under the hood, this is no different than simply writing the same DATA and PROC steps repeatedly.

The consequence of this is that anything created by a SAS macro will persist in the programming environment after execution. In the example below, I’ve written a macro that simply creates three subsets of an input dataset. Afterwards, we can run PROC DATASETS to print all the objects in the WORK library and see that these subsets are available to use.

SAS Code

 

R Code

The R function here is identical to the SAS macro, but after running it, none of the intermediate data frames persist in the environment.

				
df <- data.frame(
  month = c("January", "February", "March", "April", "May",
            "June", "July", "August", "September", "October",
            "November", "December")
)

fun <- function(dat) {
  jan <- dat |>
    dplyr::filter(month == "January")

  feb <- dat |>
    dplyr::filter(month == "February")

  march <- dat |>
    dplyr::filter(month == "March")
}
fun(df)

ls() # ls() displays all the objects in the environment
[1] "df"  "fun"
				
			

This demonstrates a fundamental difference in how R and SAS work under the hood. R functions are not just a programming trick; they are self-contained objects that create a unique operating environment to safely process data without side effects. Side effects change the state of the program environment, either by adding or removing objects from memory or by changing the value of an object. In short, any object created and used within an R function ceases to exist after execution unless it is explicitly returned to the global environment.

I have found that this often frustrates SAS programmers who are accustomed to macros as programming shortcuts. Often it is useful for a macro to create a bunch of new datasets and output for reference later in the program. A good macro will use parameters to label all this output, and SAS will keep it nicely organized. When SAS programmers try to replicate this in an R function, they don’t understand why their data aren’t returned.

It takes a little practice to get into the new mindset, but this is an important feature of R that I have come to rely on. An R function will only return what I explicitly return. R functions will not accidentally overwrite existing objects; they will not accidentally change a system option; and they will not litter system memory with unneeded garbage.
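This isolation is easy to verify for yourself. As a quick sketch (the objects here are my own, not from the examples above), an assignment inside a function rebinds only the local copy and never touches a global object of the same name:

```r
x <- 10

bump <- function(x) {
  x <- x + 1  # rebinds only the local copy inside the function's environment
  x
}

bump(x)  # returns 11
x        # still 10: the global x is untouched
```

This is exactly the behavior that surprises macro programmers, and exactly the behavior that keeps an R workflow safe.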

 

Return values

R functions are designed to return only a single object. Users can return this object implicitly by just calling it as the last step of the function or explicitly using the return() function.

 

Implicit return

				
fun <- function(x) {
  x * 5
}

fun(x = 5)
[1] 25
				
			

Explicit return

				
fun <- function(x) {
  return(x * 5)
}

fun(x = 5)
[1] 25
				
			

Return multiple objects

SAS users are accustomed to everything inside the macro being available after the macro runs. This is really useful! I would always construct my macros to provide me with a lot of output, even if I didn’t necessarily need it. I would run models and format the output into a table, but I would also want to save all the raw output from the models in case I needed to go back and review it.

We can do the same thing in R if we just organize our output into a list object and return that. An R list is a heterogeneous collection of objects. I like to think of it as a data bucket that I can throw anything into. I love this feature of R because it forces me to organize my output and drop the junk that I don’t need.

				
fun <- function(dat, y, x) {

  f <- formula(paste0(y, "~", x))

  fit <- lm(f, data = dat)

  results <- summary(fit)$coef |>
    data.frame()

  final <- list(results = results, model_fit = fit)

  return(final)
}

foo <- fun(mtcars, "mpg", "cyl")
names(foo)
[1] "results"   "model_fit"
				
			

A second way of returning multiple objects is less preferred because it breaks R’s no-side-effect policy. Users can employ the <<- assignment operator to force the creation of a variable in the parent environment. In principle, programmers can use this operator to output multiple objects from a function, but this is considered poor practice. As a general rule, a function without any side effects is a safe program that can be used across contexts. That said, there are times when the <<- operator is helpful. For example, I often use <<- to create a counter that records how many times a function is called in a process. This is useful for logging, reproducing errors, and general housekeeping of my processes.

				
counter <- function() {
  if (!exists("i")) i <- 0 # initialize the counter on first call
  i <<- i + 1
}

someFunction <- function() {
  ### do something useful
  counter()
}

someFunction()
someFunction()
someFunction()

print(i)
[1] 3
				
			

Conclusions and best practices

In my experience, advanced SAS programmers accustomed to working in a macro environment will find the transition to R functions to be fairly intuitive once they have overcome the general differences between the two languages. When I started my career as a SAS programmer, my manager told me that the best way to learn macro programming was to do a lot of macro programming and I’d offer the same advice for R functions. Don’t be afraid to jump right in!

 

In conclusion, I’d like to offer a few tips for functional programming that SAS users should keep in mind when making their transition:

  1. Have a plan before you start typing: I like to think backwards. I explicitly define the details of my function output before outlining the steps needed to produce that output. Outlining those steps tells me all the inputs I need. I often write these steps out as # comments before programming the function. I’ve learned the hard way that working without a thorough plan will only produce a mess.
  2. Simple is better than complex: Follow the Unix philosophy with your functions. Each function should do exactly one thing well. It is better to string together many simple functions than to add features to one big, complicated function. I’ve written 9,000-line functions; they are rarely generalizable, can’t be reused, and are full of bugs that are difficult to find.
  3. Functions should be self-contained: Every input required by a function should be an argument to the function. Don’t rely on a function to pull values from the global environment; explicitly name those parameters in your function definition.
  4. Think about the end user: A good function includes helpful messages for its users. Adding helpful message(), warning(), and stop() calls in your functions will keep you aware of the limits of your function. You’ll use the function a year later and be glad you provided some guardrails.
  5. Documentation saves time: The roxygen2 package will create skeleton documentation for each of your functions. This documentation mirrors the R documentation bundled with every package, and it only takes a few minutes to fully document your function. You will never regret taking the time to do this.
  6. Don’t be afraid to jump right in: The best way to learn functional programming is to jump right in. Framing your workflows as repeatable functions is a mindset that can be learned. The more you practice, the more intuitive automation becomes.
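As a tiny illustration of tips 3 and 4 (the function and its messages are my own sketch, not from the post), every input arrives as an argument, and guardrails catch bad inputs early:

```r
# Tip 3: all inputs are arguments; nothing is pulled from the global environment.
# Tip 4: message(), warning(), and stop() act as guardrails for the end user.
rate_per_1000 <- function(events, population) {
  if (!is.numeric(events) || !is.numeric(population)) {
    stop("`events` and `population` must be numeric")
  }
  if (any(population <= 0)) {
    stop("`population` must be positive")
  }
  message("Computing rates for ", length(events), " group(s)")
  1000 * events / population
}

rate_per_1000(events = c(5, 12), population = c(2000, 3000))  # [1] 2.5 4.0
```

A year from now, a bad input fails loudly at the door instead of producing a silently wrong table.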

 

Thank you for reading and keep an eye out for our next post in our SAS to R Guide series where we will be taking a look at the SAS Data Step.
