The R Toolkit for Pathogen Genomics was developed with support from a Wellcome-funded project, as a practical learning resource for anyone who wants to build skills in R for data analysis, with a particular focus on pathogen genomics. It brings together a curated collection of learning materials, including courses, videos, tutorials, handbooks, and other useful resources, to support learners at different stages of their journey.
The toolkit is designed to serve both beginners who are new to R and more experienced users seeking to strengthen intermediate-level skills or apply R in genomics-related work. It includes foundational content on installing R and RStudio, understanding key concepts in R programming, and developing confidence in data handling and analysis. It also provides more advanced resources for users interested in topics such as data visualisation, reporting, modelling, and genomics-specific applications in R.
By organising resources into beginner, intermediate, and genomics-focused sections, this toolkit offers a structured pathway for self-directed learning. Users can explore materials according to their current level of experience and learning needs, while gradually building the knowledge and practical skills required to use R effectively in pathogen genomics studies.
This toolkit is intended to support capacity strengthening in data analysis and to help widen access to high-quality training opportunities in pathogen genomics. Whether you are a student, researcher, laboratorian, epidemiologist, or public health professional, this resource provides a starting point for developing practical and relevant R skills for genomics research and analysis.
Before You Start
Use this section to get oriented and to install R and RStudio.
How to use this toolkit
Start with the installation resources, then move through the stages in order: beginner, intermediate, and genomics-focused material.
Install R and RStudio
Follow the step-by-step PDF guide first, then use the video demonstration if you need a visual walkthrough.
Important note
You need both R and RStudio installed before most of the linked materials will be useful.
Foundational
Core introductions to R syntax and data handling, plus practical health-data learning resources.
R programming for beginners - Why you should use R
R Programming 101 (YouTube) | Video | 2018-12-14
R is typically used to analyze data and perform statistical analysis. In this video, the host argues that R is a better option than other statistical packages and software (including SPSS, Stata, and SAS). They also give a short demonstration in which they calculate the mean and median of two variables, plot a histogram, and calculate the correlation coefficient.
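As a rough idea of what such a demonstration involves, here is a minimal sketch in base R. It assumes the built-in `mtcars` dataset and the variables `mpg` and `wt` purely for illustration; the video uses its own data, which is not reproduced here.

```r
# Illustrative only: mtcars ships with base R; the video's dataset is different.
data(mtcars)

# Two numeric variables chosen as an assumption for this sketch
mpg <- mtcars$mpg  # fuel economy
wt  <- mtcars$wt   # vehicle weight

# Mean and median of each variable
mean_mpg   <- mean(mpg)
median_mpg <- median(mpg)
mean_wt    <- mean(wt)
median_wt  <- median(wt)

# Histogram of one variable
hist(mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Correlation coefficient between the two variables (Pearson by default)
r <- cor(mpg, wt)

print(c(mean_mpg = mean_mpg, median_mpg = median_mpg,
        mean_wt = mean_wt, median_wt = median_wt, correlation = r))
```

Every function used here (`mean`, `median`, `hist`, `cor`) is part of base R, so no extra packages are needed.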
R Tutorial
W3Schools | Tutorial | Unknown
Learn R syntax through a step-by-step tutorial. Sign up for a free account; with the "Try it Yourself" editor, you can edit R code and view the result.
The Epidemiologist R Handbook
Applied Epi | Textbook (Online) | 2024-09-18
This handbook has been used over 3 million times by 850,000 people around the world. It serves as a quick R code reference manual (online and offline) with task-centered examples that address common epidemiological problems.
Introduction to R for Health Data Science
The University of Manchester | Textbook (Online) | Unknown
As a health data scientist, it is vitally important that you have a firm understanding of a statistical programming language, and that you can work in a clear, reproducible fashion. This course will provide you with the baseline skills to use R for health data science.
Introduction to R for Health Researchers
Afredac | eLearning Course | 2024-10-06
This e-learning course is designed to equip clinical health researchers, public health professionals, bioinformaticians, and final-year bachelor’s and postgraduate students in health-related fields with essential data analysis skills. It provides a comprehensive introduction to R programming, focusing on practical applications in health research. Participants will learn to handle, analyse, and visualise data, enhancing their capacity to make evidence-based decisions in research and public health interventions. Whether you're new to coding or seeking to expand your analytical toolkit, this course offers a solid foundation for leveraging R in health sciences.
Intermediate
Practice resources for analysis, visualization, reporting, and applied workflows.
Making an antigenic map from titer data using Racmacs
GitHub | Training material
Covers the process of making an antigenic map with Racmacs, then viewing and saving the resulting map, together with images and interactive plots of it.
Racmacs: Antigenic Cartography Macros
CRAN repository | Training material | 2025-07-01
A toolkit for making antigenic maps from immunological assay data, in order to quantify and visualize antigenic differences between different pathogen strains as described in Smith et al. (2004) <doi:10.1126/science.1097211> and used in the World Health Organization influenza vaccine strain selection process. Additional functions allow for the diagnostic evaluation of antigenic maps and an interactive viewer is provided to explore antigenic relationships amongst several strains and incorporate the visualization of associated genetic information.
Introduction to Modelling Infectious Diseases in R
GRAPH network | e-learning modules
Interactive video lessons that cover the basics of mathematical modelling of infectious diseases: the principles of compartmental SIR-type models, how to express them as differential equations, and how to build and solve the models in R.
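For orientation, the basic SIR model that courses of this kind usually start from can be written as the system below. The notation is the conventional one and is an assumption here, not taken from the course itself: S, I, and R are the numbers of susceptible, infectious, and recovered individuals, N = S + I + R is the total population, β is the transmission rate, and γ is the recovery rate.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
```

The three right-hand sides sum to zero, so the total population N stays constant in this basic form.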
Further Data Analysis with R, Using Real-World Data from HIV, TB and Malaria
GRAPH network | e-learning modules
The curriculum includes:
- Working with dates, factors and strings
- Techniques for thorough data cleaning
- Implementing conditional functions for advanced data examination
- Joining data sources for enriched analysis
- Employing loops for systematic data processing
- Advanced string manipulation
Data Reporting with R
GRAPH network | e-learning modules
The curriculum includes:
- Building demographic pyramids to represent population structures
- Creating data tables for comprehensive data overview
- Developing choropleth maps to visualize patterns across regions
- Automating the generation of visualizations
- Analyzing time series trends
- Parametrizing reports for tailored data presentation
General compilation of resources
Dr Gabriela K Hajduk | Blog post with compilation of different R sources
Beyond the Basics: Intermediate R Training Part 1-Data Visualization
Afredac | eLearning Course | 2025-08-12
This course is part two of a series organized by the RForFun Club and aimed at empowering health workers with practical R programming skills. The club is dedicated to fostering peer-to-peer learning and collaboration among health researchers and data enthusiasts. Facilitated by experienced professionals, the community of practice (CoP) offers ongoing support, discussions, and hands-on sessions to enhance participants’ R programming skills. Through this community, members can engage with like-minded individuals, share resources, and stay updated on the latest trends and tools in health data analysis.
Advanced
Genomics, bioinformatics, single-cell, and antigenic cartography resources.
A repository of free resources on GitHub
GitHub | A repository of resources
A curated collection of free resources to help the aspiring computational biologist learn the R programming language. It covers learning R more broadly, but the genomics section is the most relevant here.
Computational Genomics with R
Dr. Altuna Akalin | Textbook | 2020
The aim of this book is to provide the fundamentals for data analysis for genomics. We developed this book based on the computational genomics courses we are giving every year. We have had invariably an interdisciplinary audience with backgrounds from physics, biology, medicine, math, computer science or other quantitative fields. We want this book to be a starting point for computational genomics students and a guide for further data analysis in more specific topics in genomics. This is why we tried to cover a large variety of topics from programming to basic genome biology. As the field is interdisciplinary, it requires different starting points for people with different backgrounds. A biologist might skip sections on basic genome biology and start with R programming, whereas a computer scientist might want to start with genome biology. In the same manner, a more experienced person might want to refer to this book when needing to do a certain type of analysis, but having no prior experience.
Introduction to R for Bioinformatics
UC Davis Bioinformatics Core | Textbook - practical | 2022-02-01
Get off to a good start in bioinformatics with this three-part online workshop in R. This workshop lays the foundation for successful bioinformatics experiments, including RNA-Seq, single cell RNA-Seq, epigenetics, and more. The three three-hour sessions combine lecture and exercises in a survey of the basics of R for bioinformatics. Completion of this material will allow participants to get the most out of our other experiment-centric workshops. We recommend this course for all beginners.
Depository of materials
Washington University in St. Louis | Depository - lecture recordings & materials freely available
This is a cloud storage location holding the freely available lecture recordings and materials.
Introduction to Single-Cell Analysis with Bioconductor
Textbook - R & Bioconductor Manual | 2025