The R Toolkit for Pathogen Genomics was developed with support from a Wellcome-funded project, as a practical learning resource for anyone who wants to build skills in R for data analysis, with a particular focus on pathogen genomics. It brings together a curated collection of learning materials, including courses, videos, tutorials, handbooks, and other useful resources, to support learners at different stages of their journey.

The toolkit is designed to serve both beginners who are new to R and more experienced users seeking to strengthen intermediate-level skills or apply R in genomics-related work. It includes foundational content on installing R and RStudio, understanding key concepts in R programming, and developing confidence in data handling and analysis. It also provides more advanced resources for users interested in topics such as data visualisation, reporting, modelling, and genomics-specific applications in R.

By organising resources into beginner, intermediate, and genomics-focused sections, this toolkit offers a structured pathway for self-directed learning. Users can explore materials according to their current level of experience and learning needs, while gradually building the knowledge and practical skills required to use R effectively in pathogen genomics studies.

This toolkit is intended to support capacity strengthening in data analysis and to help widen access to high-quality training opportunities in pathogen genomics. Whether you are a student, researcher, laboratorian, epidemiologist, or public health professional, this resource provides a starting point for developing practical and relevant R skills for genomics research and analysis.

Before You Start

Use this section to get oriented and to install R and RStudio.

How to use this toolkit

Start with the installation resources, then move through the stages in order: beginner, intermediate, and genomics-focused material.

Install R and RStudio

Follow the step-by-step PDF guide first, then use the video demonstration if you need a visual walkthrough.

Important note

You need both R and RStudio installed before most of the linked materials will be useful.

Foundational

Core introductions to R syntax, data handling, and practical health-data learning resources.

R programming for beginners - Why you should use R

R Programming 101 (YouTube) | Video | 2018-12-14

R is typically used for data analysis and statistics. In this video, the host explains why they consider R a better option than other statistical packages (including SPSS, Stata, and SAS). They also give a short demonstration in which they calculate the mean and median of two variables, plot a histogram, and calculate the correlation coefficient.

Open resource
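As a rough illustration of the kind of demonstration described in the entry above (this is not the code from the video), the sketch below uses R's built-in `mtcars` dataset. The dataset and the choice of the `mpg` and `wt` columns are assumptions made only for convenience.

```r
# Illustrative only: mtcars ships with base R and stands in for
# whatever data the video actually uses.
data(mtcars)

# Mean and median of two variables (fuel economy and weight)
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
wt_mean    <- mean(mtcars$wt)
wt_median  <- median(mtcars$wt)

# Histogram of one variable
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Pearson correlation coefficient between the two variables
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)

print(c(mpg_mean = mpg_mean, mpg_median = mpg_median))
print(c(wt_mean = wt_mean, wt_median = wt_median))
print(mpg_wt_cor)
```

Running the lines one at a time in the RStudio console is a reasonable first check that both R and RStudio are installed correctly.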

R Tutorial

W3Schools | Tutorial | Unknown

Learn R syntax through a step-by-step tutorial. Sign up for a free account to use the "Try it Yourself" editor, where you can edit R code and view the result.

Open resource

The Epidemiologist R Handbook

Applied Epi | Textbook (Online) | 2024-09-18

This handbook has been used over 3 million times by 850,000 people around the world. It serves as a quick R code reference manual (online and offline) with task-centered examples that address common epidemiological problems.

Open resource

Introduction to R for Health Data Science

The University of Manchester | Textbook (Online) | Unknown

As a health data scientist, it is vitally important that you have a firm understanding of a statistical programming language, and that you can work in a clear, reproducible fashion. This course will provide you with the baseline skills to use R for health data science.

Open resource

Introduction to R for Health Researchers

Afredac | eLearning Course | 2024-10-06

This e-learning course is designed to equip clinical health researchers, public health professionals, bioinformaticians, and final-year bachelor’s and postgraduate students in health-related fields with essential data analysis skills. It provides a comprehensive introduction to R programming, focusing on practical applications in health research. Participants will learn to handle, analyse, and visualise data, enhancing their capacity to make evidence-based decisions in research and public health interventions. Whether you're new to coding or seeking to expand your analytical toolkit, this course offers a solid foundation for leveraging R in health sciences.

Open resource

Intermediate

Practice resources for analysis, visualization, reporting, and applied workflows.

Making an antigenic map from titer data using Racmacs

GitHub | Training material

This tutorial covers making an antigenic map with Racmacs, then viewing and saving the resulting map along with images and interactive plots of it.

Open resource

Racmacs: Antigenic Cartography Macros

CRAN repository | Training material | 2025-07-01

A toolkit for making antigenic maps from immunological assay data, in order to quantify and visualize antigenic differences between different pathogen strains as described in Smith et al. (2004) <doi:10.1126/science.1097211> and used in the World Health Organization influenza vaccine strain selection process. Additional functions allow for the diagnostic evaluation of antigenic maps and an interactive viewer is provided to explore antigenic relationships amongst several strains and incorporate the visualization of associated genetic information.

Open resource

Introduction to Modelling Infectious Diseases in R

GRAPH Network | E-learning modules

Interactive video lessons that cover the basics of mathematical modelling of infectious diseases, starting from the principles of SIR-type compartmental models, then how to express them theoretically using differential equations, and finally how to build and solve the models in R.

Open resource
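For orientation, here is a minimal sketch of a basic SIR compartmental model in base R. It is not taken from the course above. It assumes the standard equations dS/dt = -beta * S * I / N, dI/dt = beta * S * I / N - gamma * I, and dR/dt = gamma * I, and solves them with simple fixed-step Euler integration. The function name `simulate_sir` and all parameter values are hypothetical. Courses on this topic often use a dedicated ODE solver package instead.

```r
# Minimal SIR sketch using fixed-step Euler integration in base R.
# The function name and all parameter values are illustrative assumptions.
simulate_sir <- function(beta, gamma, s0, i0, r0, days, dt = 0.1) {
  n_steps <- as.integer(round(days / dt))
  n <- s0 + i0 + r0                      # total population, assumed constant

  S <- numeric(n_steps + 1)
  I <- numeric(n_steps + 1)
  R <- numeric(n_steps + 1)
  S[1] <- s0
  I[1] <- i0
  R[1] <- r0

  for (k in seq_len(n_steps)) {
    new_infections <- beta * S[k] * I[k] / n   # flow from S to I
    new_recoveries <- gamma * I[k]             # flow from I to R
    S[k + 1] <- S[k] - dt * new_infections
    I[k + 1] <- I[k] + dt * (new_infections - new_recoveries)
    R[k + 1] <- R[k] + dt * new_recoveries
  }

  data.frame(time = seq(0, by = dt, length.out = n_steps + 1), S = S, I = I, R = R)
}

# Example run: 1000 people, one initial infection, beta / gamma = 3
sir <- simulate_sir(beta = 0.3, gamma = 0.1, s0 = 999, i0 = 1, r0 = 0, days = 160)
peak_day <- sir$time[which.max(sir$I)]

plot(sir$time, sir$I, type = "l", xlab = "Day", ylab = "Infectious")
```

Euler integration is used here only because it needs no extra packages. A smaller `dt` gives a closer approximation at the cost of more steps.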

Further Data Analysis with R, Using Real-World Data from HIV, TB and Malaria

GRAPH Network | E-learning modules

The curriculum includes:
- Working with dates, factors and strings
- Techniques for thorough data cleaning
- Implementing conditional functions for advanced data examination
- Joining data sources for enriched analysis
- Employing loops for systematic data processing
- Advanced string manipulation

Open resource
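To give a flavour of the skills listed in the entry above, here is a small base R sketch that touches dates, strings, factors, a conditional, a join, and a loop. The data frames, column names, and values are invented for illustration and do not come from the course, which may well teach these tasks with other packages.

```r
# Hypothetical toy data: all names and values are invented for illustration.
cases <- data.frame(
  patient_id = c("p01", "p02", "p03"),
  onset      = c("2024-01-15", "2024-02-03", "2024-02-20"),
  district   = c(" north", "South ", "north"),
  stringsAsFactors = FALSE
)
labs <- data.frame(
  patient_id = c("p01", "p03"),
  result     = c("positive", "negative"),
  stringsAsFactors = FALSE
)

# Dates: convert text to Date and derive a year-month label
cases$onset <- as.Date(cases$onset)
cases$onset_month <- format(cases$onset, "%Y-%m")

# Strings and cleaning: trim whitespace and standardise case
cases$district <- tolower(trimws(cases$district))

# Factors: fix the set and order of categories
cases$district <- factor(cases$district, levels = c("north", "south"))

# Conditional function: flag cases by onset date
cases$period <- ifelse(cases$onset < as.Date("2024-02-01"), "early", "late")

# Join: keep every case, adding lab results where available
merged <- merge(cases, labs, by = "patient_id", all.x = TRUE)

# Loop: report the number of cases per district
for (d in levels(merged$district)) {
  cat(d, ":", sum(merged$district == d), "case(s)\n")
}
```

Cases without a matching lab record keep `NA` in the `result` column, which is the usual behaviour of a left join.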

Data Reporting with R

GRAPH Network | E-learning modules

The curriculum includes:
- Building demographic pyramids to represent population structures
- Creating data tables for comprehensive data overview
- Developing choropleth maps to visualize patterns across regions
- Automating the generation of visualizations
- Analyzing time series trends
- Parametrizing reports for tailored data presentation

Open resource

General compilation of resources

Dr Gabriela K Hajduk | Blog post with compilation of different R sources

Open resource

Beyond the Basics: Intermediate R Training Part 1-Data Visualization

Afredac | eLearning Course | 2025-08-12

This course is part two of a series, organized by the RForFun Club, that aims to empower health workers with practical R programming skills. The club is dedicated to fostering peer-to-peer learning and collaboration among health researchers and data enthusiasts. Facilitated by experienced professionals, its community of practice (CoP) offers ongoing support, discussions, and hands-on sessions to enhance participants’ R programming skills. Through this community, members can engage with like-minded individuals, share resources, and stay updated on the latest trends and tools in health data analysis.

Open resource

Advanced

Genomics, bioinformatics, single-cell, and antigenic cartography resources.

A repository of free resources on GitHub

GitHub | A repository of resources

A curated collection of free resources to help aspiring computational biologists learn the R programming language. It covers R learning broadly, but the genomics section is the most relevant here.

Open resource

Computational Genomics with R

Dr. Altuna Akalin | Textbook | 2020

The aim of this book is to provide the fundamentals for data analysis for genomics. We developed this book based on the computational genomics courses we are giving every year. We have had invariably an interdisciplinary audience with backgrounds from physics, biology, medicine, math, computer science or other quantitative fields. We want this book to be a starting point for computational genomics students and a guide for further data analysis in more specific topics in genomics. This is why we tried to cover a large variety of topics from programming to basic genome biology. As the field is interdisciplinary, it requires different starting points for people with different backgrounds. A biologist might skip sections on basic genome biology and start with R programming, whereas a computer scientist might want to start with genome biology. In the same manner, a more experienced person might want to refer to this book when needing to do a certain type of analysis, but having no prior experience.

Open resource

Introduction to R for Bioinformatics

UC Davis Bioinformatics Core | Textbook - practical | 2022-02-01

Get off to a good start in bioinformatics with this three-part online workshop in R. This workshop lays the foundation for successful bioinformatics experiments, including RNA-Seq, single cell RNA-Seq, epigenetics, and more. The three three-hour sessions combine lecture and exercises in a survey of the basics of R for bioinformatics. Completion of this material will allow participants to get the most out of our other experiment-centric workshops. We recommend this course for all beginners.

Open resource

Depository of materials

Washington University in St. Louis | Depository - lecture recordings & materials freely available

This is a cloud location

Open resource

Introduction to Single-Cell Analysis with Bioconductor

Textbook - R & Bioconductor Manual | 2025

Open resource

An introduction to antigenic cartography

Open resource