Applied best practices in code conversion from SAS production processes to R

Company Information

For one of the world’s largest market research companies, staying the engineering leader means staying abreast of the latest trends and technology in data management and analysis. SAS just wasn’t the right tool for 21st century analytics: it had an outdated syntax, simplistic data structures, processing capacity woefully inadequate for modern algorithms, and a $7 million/year price tag. The incentive was there to make a big change with a big payoff, and ProCogia was instrumental in designing and executing a migration plan.

The Challenge

Like many organizations that have deeply invested in SAS, the company faced a variety of challenges in migrating away from it. SAS not only offered a comprehensive suite of tools and analytic environments, but it also supported an entire infrastructure that needed an adequate replacement. Complicating matters further, the data was dispersed—analysts often had to access necessary programs and data from file shares or local workstations. The situation was exacerbated by the departure of the authors of the most complex macros, leaving behind only a few analysts who understood certain processes. Moreover, most of the programs were run interactively with minimal logging or documentation, making it difficult to trace past processes. There was also a notable lack of long-term testing strategies to ensure the integrity of these programs. Additionally, the company lacked internal staff fully comfortable working with both SAS and R, adding another layer of complexity to the migration effort.

ProCogia’s Approach

The company set an ambitious goal of migrating nearly 200,000 lines of SAS code into R within a year, and ProCogia stepped up to the challenge. We proposed a line-for-line translation of all programs as the fastest approach. We designed a standardized structure for each process that included logging, unit and integration testing, sample data, and data integrity processes. Our client had identified Databricks as its analytic platform, and we designed a style guide for migrating the SAS programs into tidyverse/sparklyr code that would run efficiently in the new environment.
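As an illustration of what a line-for-line translation into tidyverse code can look like, here is a minimal sketch. The SAS step, the data set, and all column names are hypothetical and are not taken from the client's programs or style guide.

```r
library(dplyr)

# Hypothetical SAS step being translated:
#   data work.high_value;
#     set work.orders;
#     where amount > 100;
#     revenue = amount * qty;
#     keep region revenue;
#   run;

# Hypothetical sample input standing in for work.orders
orders <- data.frame(
  region = c("East", "West", "East", "West"),
  amount = c(50, 150, 200, 120),
  qty    = c(1, 2, 1, 3)
)

# Each dplyr verb mirrors one statement of the SAS step
high_value <- orders %>%
  filter(amount > 100) %>%            # where amount > 100;
  mutate(revenue = amount * qty) %>%  # revenue = amount * qty;
  select(region, revenue)             # keep region revenue;

print(high_value)
```

Keeping one verb per SAS statement makes the translation easy to review side by side with the original. The same dplyr verbs can generally be applied to a Spark table through sparklyr, although that is not shown here.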

Staffing

ProCogia is proud to serve our clients with a globe-spanning staff. Technical leads and client managers serve our clients locally in North America. A team of developers was recruited in India with expertise in SAS and R. We worked closely with a team of stakeholders within the company and provided daily updates and progress reports. Our staffing approach allows for a rapid deployment of experienced talent, and we were able to fill the team in short order.

Preparation

All SAS code had to be documented and prepared prior to migration. We met individually with subject matter experts to identify and verify appropriate code. Sample input data had to be created and cached that could be used for both SAS and R. Once these programs were ready, our team began the meticulous process of a thorough code review of each SAS program, identifying possible bugs and errors, and individually outputting all intermediate data steps to a folder to verify the translation. Final outputs were verified with the product owners and the programs were turned over to the development team for migration.

Code Migration

ProCogia’s development team translated each SAS program line by line. As our developers migrated each section of code, they could reference the cached SAS outputs created in the preparation phase. This ensured that at any point in the migration, developers had a point of truth to refer to, so that errors didn’t compound through the course of the program. We developed a set of testing scripts to verify that the migration reproduced the SAS output to within predetermined acceptance criteria.
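A comparison of migrated output against a cached SAS result could be sketched as follows. The helper name `matches_sas`, the key column, and the tolerance value are assumptions made for illustration; they are not the project's actual testing scripts or acceptance criteria.

```r
# Hypothetical helper: TRUE when the R output reproduces the cached SAS
# output within a numeric tolerance, regardless of row or column order.
matches_sas <- function(r_out, sas_out, key, tolerance = 1e-8) {
  # Both tables must contain the same set of columns
  if (!setequal(names(r_out), names(sas_out))) {
    return(FALSE)
  }
  # Align column order and sort rows by the key column
  r_out   <- r_out[order(r_out[[key]]), names(sas_out), drop = FALSE]
  sas_out <- sas_out[order(sas_out[[key]]), , drop = FALSE]
  rownames(r_out)   <- NULL
  rownames(sas_out) <- NULL
  # all.equal() compares numeric columns up to the given tolerance
  isTRUE(all.equal(sas_out, r_out, tolerance = tolerance))
}

# Hypothetical cached SAS output and two candidate R results
sas_cached <- data.frame(id = c(1, 2, 3), total = c(10, 20.5, 30.25))
r_good     <- data.frame(total = c(30.25, 10 + 1e-12, 20.5), id = c(3, 1, 2))
r_bad      <- data.frame(id = c(1, 2, 3), total = c(10, 20.6, 30.25))

print(matches_sas(r_good, sas_cached, key = "id"))
print(matches_sas(r_bad, sas_cached, key = "id"))
```

Running a check like this after each translated section, against the intermediate outputs cached during preparation, is one way to keep a discrepancy from propagating into later steps.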

CI/CD Strategy

Essential to this plan was the long-term viability of these programs. To this end, we made the testthat package a core part of each set of programs. We included tests of the cached input data to verify its integrity, unit tests for all functions, and integration tests that evaluated the outputs against verified versions cached in the repository. Each time a process was kicked off, these tests would run automatically and alert the analytic team to any changes or failures.
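The three kinds of tests described above might look like the following minimal testthat sketch. The function `weighted_share`, the inline data, and the expected values are hypothetical stand-ins; in a real repository the cached input and verified output would be read from files rather than defined in the test script.

```r
library(testthat)

# Hypothetical migrated function under test
weighted_share <- function(counts, weights) {
  stopifnot(length(counts) == length(weights))
  weighted <- counts * weights
  weighted / sum(weighted)
}

# Hypothetical cached input (stand-in for a file stored in the repository)
cached_input <- data.frame(
  segment = c("A", "B", "C"),
  count   = c(10, 30, 60),
  weight  = c(1, 1, 0.5)
)

# Hypothetical verified output (stand-in for a cached SAS result)
verified_output <- c(0.1428571, 0.4285714, 0.4285714)

# Data integrity test: the cached input still has the expected shape
test_that("cached input data is intact", {
  expect_named(cached_input, c("segment", "count", "weight"))
  expect_false(any(is.na(cached_input)))
  expect_true(all(cached_input$count >= 0))
})

# Unit test: the function behaves correctly on its own
test_that("weighted_share returns shares that sum to one", {
  shares <- weighted_share(c(1, 1), c(2, 2))
  expect_equal(shares, c(0.5, 0.5))
  expect_equal(sum(shares), 1)
})

# Integration test: the output matches the verified cached result
test_that("output matches the verified cached result", {
  shares <- weighted_share(cached_input$count, cached_input$weight)
  expect_equal(shares, verified_output, tolerance = 1e-6)
})
```

Separating the data check from the unit and integration tests helps show whether a failure comes from changed input data or from changed code.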

Staff Support

Staff were trained in R through Posit’s training program, but they were not prepared to immediately manage their code. During the migration process, our developers met with the client analysts to support them as they began their journey into open-source analytics. We have continued to support their work past the initial migration.

The Results

The bulk of the migration was completed within the year, and the client was able to end their contract with SAS after 18 months. The success of this project helped our client realize the following benefits immediately:

Annual savings of $7 million from SAS licenses.

The analytic workforce has successfully transitioned from SAS analysts to R developers.

We identified hundreds of bugs and errors and made their programs run faster, more efficiently, and more accurately.

Each business-critical set of programs included version control, CI/CD checks, and data integrity verification.

All on-prem data analytics servers were retired in favor of Azure cloud resources.
