Reproducibility with R
To analyze any dataset you need a process. When data sets involve vast amounts of information, how do you ensure that exactly the same process is followed each time?
If you analyze a data set several times over using the same algorithms and the same tools, the assumption is that the results produced would be the same each time. Right? Wrong. This is not always the case. It is often very difficult to keep track of every step of the analysis, and if any step is changed or skipped, the results will vary.
Variations in results undermine confidence in data analysis. Businesses that are fully invested in data science simply can’t afford inaccuracies in their data, as the results often drive key strategy and business decision making.
A new R package, called Targets, has been developed at a leading pharmaceutical company, Eli Lilly. Targets is an important tool for data science because it allows a reproducible workflow to be maintained.
Workshops to facilitate data reproducibility with R
As R experts at ProCogia, our experience working with R is unrivalled; we are currently the only full service RStudio partner on the Pacific West Coast. We work with cutting edge technology, such as the newly developed Targets package, which allows us to offer our clients the very latest in data science techniques.
As well as delivering a keynote and workshop at the global CascadiaR conference, our team holds bespoke training sessions and practical workshops for our clients focusing on the R Targets package. If you’d like to know more about our training programs, which can be delivered online or in person, we’d love to hear from you.
To discuss your requirements, contact us or call us on 1 425-230-7396
Cutting edge technology
The open source, freely available Targets package supersedes Drake, an older R-focused package. Targets provides a framework that wraps existing analysis code and detects when any part of that code, or the data it depends on, has changed. Because changes are detected, users can see which results are out of date and reproduce an analysis exactly.
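To make the change detection concrete, here is a minimal sketch in R. It assumes the targets package is installed, and the two-step pipeline (`raw` and `result`) is made up purely for illustration. It runs in a temporary folder so that no real project is touched.

```r
library(targets)

# Work in a throwaway folder so the sketch does not touch a real project.
demo_dir <- tempfile("targets-demo-")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# First version of a tiny, made-up pipeline, written to _targets.R.
tar_script({
  list(
    tar_target(raw, c(1, 2, 3, 4)),
    tar_target(result, mean(raw))
  )
}, ask = FALSE)

tar_make()                          # builds both steps
outdated_before <- tar_outdated()   # checks for steps that need rerunning

# Change the upstream step and overwrite _targets.R.
tar_script({
  list(
    tar_target(raw, c(1, 2, 3, 400)),
    tar_target(result, mean(raw))
  )
}, ask = FALSE)

outdated_after <- tar_outdated()    # now reports the steps affected by the edit
tar_make()                          # reruns only what is out of date
new_result <- tar_read(result)      # reads the stored, refreshed result

setwd(old_wd)                       # return to the original folder
```

Comparing `outdated_before` with `outdated_after` shows the idea: nothing needs attention straight after a run, while an edit to one step flags that step and everything downstream of it.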
Targets has been developed to be accessible to R users and allows data scientists and researchers to work entirely within R. It can easily be adopted and combined within existing workflows, helping users maintain their data analysis projects.
Many organisations use R to analyze statistical information such as customer retention figures and customer churn rates. This analysis often runs to many thousands of lines of code and, prior to Targets, the onus was always on the analyst to document their approach diligently.
The introduction of Targets helps ensure the accuracy of replicated analysis whilst also supporting compliance as, in effect, it creates an audit trail of how a data set has been analyzed. Targets enables complicated workflows to be reproduced at the push of a button.
Benefits of using Targets
The benefits of using R’s open source Targets package include:
- Time savings
- Ease of reproducibility
- Creation of audit trail
- Customisable coding
How Targets works
Targets is applied at a project level; a small amount of code wraps around existing analytics code. Targets then tracks each step of that code and the dependencies between the steps, so that only the steps affected by a change need to be rerun.
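As a rough illustration of what that wrapping code can look like, the sketch below shows a possible `_targets.R` file, the script the package reads from the project folder. The churn example, the simulated data and the function names (`simulate_customers`, `fit_churn_model`, `summarise_model`) are hypothetical and stand in for a client's existing analysis code.

```r
# _targets.R: a minimal pipeline sketch (all names are illustrative)
library(targets)

# Existing analysis code, written as ordinary R functions.
# Simulated data is used here only so that the sketch runs on its own.
simulate_customers <- function(n = 500) {
  set.seed(42)
  tenure <- rexp(n, rate = 1 / 24)
  data.frame(
    tenure_months = tenure,
    churned = rbinom(n, size = 1, prob = plogis(1 - tenure / 12))
  )
}

fit_churn_model <- function(data) {
  glm(churned ~ tenure_months, data = data, family = binomial())
}

summarise_model <- function(model) {
  as.data.frame(coef(summary(model)))
}

# The wrapper: each tar_target() declares one tracked step, and the
# dependencies are inferred from the names used in each command.
list(
  tar_target(customers, simulate_customers()),
  tar_target(churn_model, fit_churn_model(customers)),
  tar_target(model_summary, summarise_model(churn_model))
)
```

With this file in place, calling `targets::tar_make()` from the project folder builds the three steps in order. A second call should skip any step whose code and upstream data are unchanged, and `targets::tar_read(model_summary)` retrieves the stored result.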
“We always take time to communicate to our clients the importance of starting with the correct base. It’s worth spending time to get the right systems and software in place at the start of a task; this will ensure that future headaches further down the road can be avoided,” explains Mike, Data Science Consultant.
Who uses Targets?
Targets is perfect for repeating analytics work on differing data sets, for example in drug discovery trials. “The ability to create a new visualization or modify your findings on the fly without messing up any other part of your analytics or sacrificing accuracy is key,” adds Mike.
To find out more about the Targets R package and see the benefits, contact us or call us on 1 425-230-7396