Recognising the growing demand for R programming skills in the healthcare domain, the Global Health Data Science community hub is putting a Spotlight on: R for Health Data Research, which brings together freely available and helpful educational materials tailored to beginners in R for health data science.
This resource covers fundamental R concepts, data manipulation, analysis techniques and data visualisation, along with specialised packages and techniques used in health research. It is aimed at students, researchers, healthcare professionals and anyone interested in learning R programming to efficiently process, analyse and visualise health data.
Please note that this initiative does not claim ownership of the materials published. If you have any feedback, comments or know of additional materials that could benefit those seeking to learn R, we encourage you to share them with the community here.
Webinar: Getting started with R for Health Data Science
The 'Getting started with R for Health Data Science' webinar explored the world of R programming and its applications in health data research. The webinar serves as an entry point, offering insights into the capabilities and versatility that R brings to a diverse range of health research projects, with a focus on data manipulation, analysis and presentation/visualisation.
Presentations:
- Overview of the R programming language and demonstration. Miss Aashna Uppal
- Analysis of stunting in Bangladesh: a case study of presenting findings in R. Mr Md. Sojibul Islam
- Using R to support data preparation and visualisation for a research study in Brazil. Dr Soraida Aguilar
- User-Centred Dashboards for COVID-19 Trends in Africa. Dr Frank Kagoro
Materials from the session:
- Presentation slides (PDF)
- Downloading and installing R instructions (PDF)
- Sample case data file (Excel)
- Demo R script (R file)
This webinar is available to replay below in English, Spanish and Portuguese:
What is R, and how can it be used in health data science?
R is a powerful and versatile programming language and environment for statistical computing and graphics. It has gained increasing attention in recent years, particularly for its data science capabilities in data manipulation, analysis and visualisation.
You can find a quick overview of R on the R Project website. Below are some YouTube videos that give a good overview of the benefits and uses of R for data analysis:
Alongside R, there are other languages and software packages for statistical analysis that you may be aware of or have used, such as Python, Stata, SPSS and SAS. The links below outline some of the differences between them, along with the pros and cons of each, which can help you decide whether R is right for your use case:
Examples of health research using R
R is widely used in health data science. Below are some publications from teams around the world that have used R in health data research.
- The National COVID-19 Epi Model (NCEM): Estimating cases, admissions and deaths for the first wave of COVID-19 in South Africa
- Primary healthcare protects vulnerable populations from inequity in COVID-19 vaccination: An ecological analysis of nationwide data from Brazil
Learning the foundations of R
This section highlights resources that are helpful for those beginning their journey of learning R for health data science. They explore the fundamental concepts that underpin R, from variables and data types to basic operations and control structures. Whether you're completely new to programming or transitioning from another language, this knowledge will give you a solid foundation to confidently navigate the world of R and move on to more complex data analysis.
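To give a flavour of these fundamentals, the short sketch below shows variables of different data types, a vectorised calculation and an if/else inside a for loop. The weights, heights and variable names are hypothetical, chosen only for illustration; the BMI cut-offs follow the commonly used WHO adult categories.

```r
# Variables are created with the assignment operator <-
patient_id <- "P001"   # character (text)
is_smoker  <- FALSE    # logical (TRUE/FALSE)

# Vectors hold several values of the same type (hypothetical data)
weight_kg <- c(58, 72, 95, 49)
height_m  <- c(1.65, 1.80, 1.75, 1.68)

# Arithmetic is vectorised: BMI is calculated for every person at once
bmi <- weight_kg / height_m^2
round(bmi, 1)      # 21.3 22.2 31.0 17.4

class(patient_id)  # "character"
class(bmi)         # "numeric"

# Control structures: a for loop with if/else to categorise each BMI value
bmi_category <- character(length(bmi))
for (i in seq_along(bmi)) {
  if (bmi[i] < 18.5) {
    bmi_category[i] <- "underweight"
  } else if (bmi[i] < 25) {
    bmi_category[i] <- "normal"
  } else if (bmi[i] < 30) {
    bmi_category[i] <- "overweight"
  } else {
    bmi_category[i] <- "obese"
  }
}
bmi_category       # normal, normal, obese, underweight
```

Once you are comfortable with loops, you will find that R often offers vectorised alternatives (for example, cut() for binning numbers into categories) that do the same job in a single call.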
Setting up R and RStudio
The two resources below contain simple instructions on how to download and set up R and RStudio for the first time.
Learning by doing
Often the best way to learn a new programming language is by doing. There are many great written, video and interactive tutorials available to practise R skills and test your knowledge. The W3 R tutorial is a great start and offers an interactive editor to try out R code. The YouTube video R Programming Tutorial - Learn the Basics of Statistical Computing gives a great introduction to R and its foundational features.
Books and manuals that contain R code to try along the way are another great option. A Succinct Introduction to R, by Steve Haroz, is a great example: it gives the reader a short introduction to the R language, covers the basics and is a quick way to get up to speed before diving into analysis and visualisation.
A great foundational and comprehensive book to begin with is the second edition of R for Data Science. The first edition is also available in Spanish, Italian and Turkish. This book teaches you how to get your data into R, get it into the most useful structure, transform it and visualise it, with lots of code examples and step-by-step instructions.
Applied Epi provides a free online resource that teaches R from the basics, and you can download the entire course as a PDF. It is particularly useful because it comes with an R package that you can install to practise all the exercises.
R in Action, Third Edition: Data analysis and graphics with R and Tidyverse, by Robert I. Kabacoff, is another great resource for beginner to intermediate programmers. It contains ample exercises and examples. This is a paid-for resource.
There are also many high-quality R courses available on e-learning sites such as Coursera, Udemy and edX. Not all of these are free; however, many allow you to access some of the content for free, with the option to pay for the full set of features and certificates. Some notable courses include:
Both Codecademy and DataCamp are excellent platforms for learning R. They offer a wide range of well-structured courses for different skill levels, with interactive coding environments where you can run and test code and track your progress. Both sites offer paid-for and free courses and features.
The Global Health Network (TGHN) Asia team, in partnership with the International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), recently delivered data science club sessions titled Introduction to R for Epidemiology. These hands-on sessions used materials from The Epidemiologist R Handbook and introduced epidemiological research and activities using R and RStudio. Sessions one and two are available below to re-watch and code along with.
The use of R in each stage of the Health Data Research Lifecycle
Research planning, workflows and reproducibility
An important benefit of using programming languages such as R for data analysis is the ability to create reproducible workflows. These give structure and flow to the preparation, analysis and output of the data you work with, so that the same steps can be rerun to produce the same results. The article R Workflow, from Statistical Thinking, describes this approach and gives a great overview of the book R Workflow for Reproducible Data Analysis and Reporting.
A great tool to support your R analysis and research is Git. Git is a version control system that enables more effective collaboration, reproducibility and sharing of your R code. An Introduction to GIT and how to use it with RStudio is a quick guide to getting started.
Data preparation, cleaning and standardisation
The management and curation of data is a very important stage in the lifecycle of a data analysis project. It is often very time-consuming and requires a deep understanding of both the content of the data and the techniques needed to wrangle it into shape for analysis. Below are some resources that show how this can be done using R.
Tidyverse is a collection of R packages designed to improve data management and analysis in R. With Tidyverse, you can perform essential data tasks such as cleaning, wrangling and curation using packages like dplyr and tidyr. These packages provide a consistent and intuitive framework for filtering, transforming and reshaping data (with ggplot2, also part of Tidyverse, covering visualisation), which helps make your data preparation and exploration more accessible and less error-prone.
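As a minimal sketch of what this looks like in practice, the example below uses dplyr and tidyr to tidy a small, hypothetical table of case counts: it standardises inconsistent district labels, reshapes the data from wide to long format, removes missing values and combines the duplicate rows left by the inconsistent labels. The column names and values are invented for illustration, and the native pipe (|>) assumes R 4.1 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw data with common problems: inconsistent labels
# ("North" vs "north "), a missing value and a wide layout (one column per year)
raw <- tibble(
  district   = c("North", "north ", "South", "East"),
  cases_2021 = c(12, 8, NA, 20),
  cases_2022 = c(15, 10, 9, 18)
)

clean <- raw |>
  # Standardise district names: trim whitespace, then use title case
  mutate(district = tools::toTitleCase(trimws(district))) |>
  # Reshape from wide to long: one row per district and year
  pivot_longer(
    cols         = starts_with("cases_"),
    names_to     = "year",
    names_prefix = "cases_",
    values_to    = "cases"
  ) |>
  mutate(year = as.integer(year)) |>
  # Drop rows where the case count is missing
  filter(!is.na(cases)) |>
  # Combine rows that now share the same district and year
  group_by(district, year) |>
  summarise(cases = sum(cases), .groups = "drop") |>
  arrange(district, year)

clean
```

Each step is a single, named verb, so the cleaning logic reads top to bottom and can be rerun unchanged when the raw file is updated.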
The guide Tidyverse Basics: Load and Clean Data with R tidyverse Tools is a great way to start learning the Tidyverse packages and how they can streamline your data curation in R.
Statistical analysis
R offers a vast array of statistical capabilities for health research. Whether it be in the fields of epidemiology, health analytics or bioinformatics, R stands out as a powerful and versatile tool for researchers and scientists. Below are some resources for different types of statistical analysis in R.
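As one illustration, the sketch below fits a logistic regression, a common model for binary health outcomes, with base R's glm() function on simulated data. The variables (age, smoker, outcome) and the coefficients used to simulate them are hypothetical; the point is the workflow, not the numbers.

```r
set.seed(42)  # make the simulated data reproducible

# Simulate a hypothetical study of 200 participants
n <- 200
study <- data.frame(
  age    = round(runif(n, min = 20, max = 70)),
  smoker = rbinom(n, size = 1, prob = 0.3)
)

# Binary outcome whose log-odds depend on age and smoking
# (these coefficients are made up purely for illustration)
log_odds <- -4 + 0.05 * study$age + 1.0 * study$smoker
study$outcome <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: a generalised linear model with a binomial family
fit <- glm(outcome ~ age + smoker, data = study, family = binomial)
summary(fit)

# Odds ratios with 95% Wald confidence intervals
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))
odds_ratios
```

confint.default() gives Wald intervals; calling confint() on a glm instead gives profile-likelihood intervals, which can differ slightly in small samples.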
Visualisations, dashboards and other outputs
Visualisations are an effective way to understand, interpret and present your research findings. R is highly regarded for its data visualisation capabilities thanks to its rich ecosystem of packages, with base R graphics and the ggplot2 package being the most prominent. It excels at producing high-quality, customisable visualisations that cater for many different research needs. Below are some resources that give an overview of visualisation and how it can be done using R.
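The following is a minimal ggplot2 sketch of a line chart of weekly case counts, similar to an epidemic curve. The counts and district names are made up for illustration. The ggsave() line is commented out so that running the example does not write an image file; uncomment it to save a high-resolution PNG.

```r
library(ggplot2)

# Hypothetical weekly case counts for two districts
weekly <- data.frame(
  week     = rep(1:6, times = 2),
  district = rep(c("North", "South"), each = 6),
  cases    = c(5, 8, 12, 9, 7, 4, 3, 4, 6, 10, 14, 11)
)

# Map variables to aesthetics, then add layers, labels and a theme
p <- ggplot(weekly, aes(x = week, y = cases, colour = district)) +
  geom_line() +
  geom_point() +
  labs(
    title  = "Weekly reported cases by district (illustrative data)",
    x      = "Week",
    y      = "Number of cases",
    colour = "District"
  ) +
  theme_minimal()

p  # draw the plot (in RStudio it appears in the Plots pane)

# ggsave("weekly_cases.png", p, width = 7, height = 4, dpi = 300)
```

Because the plot is built from layers, switching to a bar chart or splitting districts into separate panels is a matter of swapping geom_line() for geom_col() or adding facet_wrap(~ district).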
To support your visualisations and other research findings, R Markdown provides a format that combines R code, text, tables and visualisations in a single document, making it easy to create narrative, reproducible reports and presentations. R Markdown documents can be published on RPubs, enabling collaborative and accessible sharing of your data analysis.
Quarto is another great tool for publishing your work. Besides R, it supports other languages such as Python, Julia and Observable JS, and allows you to publish dynamic documents that include descriptions, code, visualisations and citations to enhance reproducibility and sharing.
R also supports interactive apps and dashboards through Shiny. Shiny apps are interactive web applications that let users explore your data, models, visualisations and analysis. The free book Mastering Shiny is a comprehensive guide to creating Shiny apps, with lots of examples.
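A minimal Shiny app has two parts: a user interface (ui) and a server function that reacts to user input. The sketch below, using hypothetical weekly case data, lets the user pick a district and redraws a plot and a table for it. Assigning the app object created by shinyApp() does not launch it; runApp() (or printing the object) does.

```r
library(shiny)

# Hypothetical weekly case counts for two districts
weekly <- data.frame(
  week     = rep(1:6, times = 2),
  district = rep(c("North", "South"), each = 6),
  cases    = c(5, 8, 12, 9, 7, 4, 3, 4, 6, 10, 14, 11)
)

# User interface: a dropdown in the sidebar, a plot and a table in the main panel
ui <- fluidPage(
  titlePanel("Weekly cases (illustrative data)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("district", "District", choices = unique(weekly$district))
    ),
    mainPanel(
      plotOutput("cases_plot"),
      tableOutput("cases_table")
    )
  )
)

server <- function(input, output, session) {
  # Reactive subset: recomputed whenever the user picks a different district
  selected <- reactive({
    weekly[weekly$district == input$district, ]
  })

  output$cases_plot <- renderPlot({
    d <- selected()
    plot(d$week, d$cases, type = "b",
         xlab = "Week", ylab = "Cases", main = input$district)
  })

  output$cases_table <- renderTable(selected())
}

app <- shinyApp(ui, server)
# runApp(app)  # opens the app in a browser or the RStudio viewer
```

The reactive() wrapper means the filtering is done once per selection and shared by both outputs, rather than repeated inside each render function.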
Sometimes in data analysis pipelines you may want to use the power of R from other software or programming languages, such as Python, JavaScript or C#. For example, you may have an existing application that needs to show a plot or table, but you cannot install R on that application's server. In that case, you can use plumber, an R package that exposes R functions as a web API (Application Programming Interface): R runs behind the API on a machine where it is installed, and your application requests results over HTTP without running R itself. Plumber is similar to Shiny, except that it has no user interface of its own and only returns the elements you request. For example, if you need a table, you can call the table endpoint, which returns just the data table. You are free to use any software of your choice to consume these results. To get you started, here is an introduction to plumber article and the video Integrating R with Plumber APIs, which demonstrates plumber.
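The sketch below illustrates the idea using plumber's programmatic interface (pr() and pr_get(), available in plumber 1.0 and later); the same endpoint can also be declared with #* @get comments in a file loaded by plumb(). The data, the /table path and the get_case_table() function are hypothetical. Query parameters in a request, such as ?district=North, are passed to the handler argument with the same name, and the result is returned as JSON by default.

```r
# Hypothetical summary table produced by an earlier R analysis
case_summary <- data.frame(
  district = c("East", "North", "South"),
  cases    = c(38, 45, 9)
)

# Hypothetical handler: returns the table, optionally filtered by district
get_case_table <- function(district = NULL) {
  if (is.null(district) || district == "") {
    return(case_summary)
  }
  case_summary[case_summary$district == district, , drop = FALSE]
}

# Build the API only if plumber is installed
if (requireNamespace("plumber", quietly = TRUE)) {
  api <- plumber::pr() |>
    plumber::pr_get("/table", get_case_table)

  # Uncomment to start the server; another application could then request
  #   http://127.0.0.1:8000/table?district=North
  # and receive the matching rows as JSON.
  # plumber::pr_run(api, port = 8000)
}
```

Keeping the handler as an ordinary R function means it can be tested directly in R before it is exposed over HTTP; once the server is running, any language that can make HTTP requests, such as Python, JavaScript or C#, can fetch the JSON from that URL.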
Ways to get help
Hands-on projects using R for health data science
Below are some great hands-on projects to help gain practical experience working with different health-relevant data.
Supplementary resources
Below are some other great resources to help continue your journey in learning R. These dive deeper into specific aspects of R and cater to various learning preferences and levels of expertise.