Data science skills are crucial to analysing ever-growing amounts of health data, but these skills are unequally distributed across the world. Many recent initiatives have focused on building research capacity; however, a substantial gap in data science skills remains among researchers and healthcare workers. There is a need to build data science capacity and to encourage local collaboration, both to advance data science expertise and to share best practices.

Notably, data science capacity building is broad. Data science draws on disciplines such as statistics, machine learning, computer science, and mathematics, among others, whose techniques are often used to analyse and interpret data. However, its scope is not limited to applying statistical or machine learning techniques to data; it also incorporates interdisciplinary and ethical considerations across the research pipeline as a whole.

The Global Health Network (TGHN) and Health Data Research (HDR) Global strive to drive equity and build local research capacity across low- and middle-income countries (LMICs). To facilitate this capacity building, TGHN and HDR Global have fostered global partnerships in Africa, Asia, and Latin America and the Caribbean (LAC). These partnerships include the African coalition for Epidemic Research, Response and Training (ALERRT), the International Centre for Diarrheal Disease Research, Bangladesh (icddr,b), and Fiocruz. Together, these partners have designed a learning platform to facilitate data science capacity building. This learning platform consists of broad workshop-style training (“Data Clubs”), combined with tailored sessions for discussion and troubleshooting of analytic issues (“Data Clinics”).


The purpose of this toolkit is to provide a structured walkthrough of establishing a Data Science Club and Clinic, informed by the experience of piloting such activities in Bangladesh by teams at TGHNAsia and icddr,b. A further module describing the implementation in Bangladesh in detail will be added soon.

This toolkit is designed for professionals in all healthcare settings who are interested in enhancing data science capacity in their institutes or regions. It guides them through the essential steps of creating a space that encourages collaborative learning, knowledge sharing, and practical application of data science skills. The toolkit covers every stage of setting up Data Science Clubs and Clinics, from initial planning and organization to implementation and troubleshooting.


A Data Club is a forum where participants come together over multiple recurring workshop-style sessions to learn a particular set of data science skills. As noted in the introduction, these skills are broad: workshops may cover statistics, computer science, or other data science disciplines. Importantly, these workshops are tailored to local priorities and sit within a research context, meaning the skills don’t have to be taught in isolation but can be delivered alongside important research skills. These sessions are an opportunity to build skills and networks that may not be offered as part of formal training or incorporated into job responsibilities.

A Data Clinic is a scheduled session in which researchers seek assistance from data science experts in resolving specific issues encountered at various stages of research data management. As such, Data Clinics provide targeted support for challenges that researchers have identified in the collection, management, analysis, interpretation, preservation, sharing, and reuse of health data. Like Data Clubs, Data Clinics can focus on a particular set of skills, but they are geared towards troubleshooting and discussion rather than workshop-style teaching.

  • Data Clubs and Clinics can be coupled: Data Clubs are held to teach particular skills and are then followed by Data Clinics to troubleshoot and address specific problems. However, the two don’t have to be coupled. An institution that already has a strong baseline of data science skills may opt to run Data Clinics on their own.

  • Data Clubs and Clinics have the potential to foster mutually beneficial collaborations among participating researchers, such as joint publications or research projects. Together, they aim to enhance researchers' data management and analytical abilities, enabling them to better achieve their research goals. The data experts serve as short-term tutors, providing guidance on processes without assuming responsibility for the final outcomes, as execution largely depends on the participants' efforts.

