Books, literature reviews, datasets, online courses (MOOCs), and other reference material about process-oriented data science for healthcare:

Literature Reviews

  • Process mining in healthcare: A literature review. Eric Rojas, Jorge Munoz-Gama, Marcos Sepúlveda, Daniel Capurro. Journal of Biomedical Informatics 61: 224-236 (2016).  Process Mining focuses on extracting knowledge from data generated and stored in corporate information systems in order to analyze executed processes. In the healthcare domain, process mining has been used in different case studies, with promising results. Accordingly, we have conducted a literature review of the usage of process mining in healthcare. The scope of this review covers 74 papers with associated case studies, all of which were analyzed according to eleven main aspects, including: process and data types; frequently posed questions; process mining techniques, perspectives and tools; methodologies; implementation and analysis strategies; geographical analysis; and medical fields. The most commonly used categories and emerging topics have been identified, as well as future trends, such as enhancing Hospital Information Systems to become process-aware. This review can: (i) provide a useful overview of the current work being undertaken in this field; (ii) help researchers to choose process mining algorithms, techniques, tools, methodologies and approaches for their own applications; and (iii) highlight the use of process mining to improve healthcare processes.

Massive Open Online Courses (MOOCs)

  • Process Mining in Healthcare (FutureLearn): Within healthcare there are thousands of complex and variable processes that generate data including treatment of patients, lab results and internal logistic processes. Analysing this data is vital for improving these processes and ending bottlenecks. On this course you will explore how process mining can help turn this data into valuable insights by looking at different areas of process mining and seeing how it has been applied. You will even get the chance to apply process mining on real life healthcare data.


Books

  • Process Mining in Healthcare – Evaluating and Exploiting Operational Healthcare Processes. Ronny Mans, Wil M. P. van der Aalst, Rob J. B. Vanwersch. SpringerBriefs in Business Process Management, Springer 2015, ISBN 978-3-319-16070-2. What are the possibilities for process mining in hospitals? In this book the authors provide an answer to this question by presenting a healthcare reference model that outlines all the different classes of data that are potentially available for process mining in healthcare and the relationships between them. Subsequently, based on this reference model, they explain the application opportunities for process mining in this domain and discuss the various kinds of analyses that can be performed. They focus on organizational healthcare processes rather than medical treatment processes. The combination of event data and process mining techniques allows them to analyze the operational processes within a hospital based on facts, thus providing a solid basis for managing and improving processes within hospitals. To this end, they also explicitly elaborate on data quality issues that are relevant for the data aspects of the healthcare reference model. This book mainly targets advanced professionals involved in areas related to business process management, business intelligence, data mining, and business process redesign for healthcare systems as well as graduate students specializing in healthcare information systems and process analysis.


Datasets

  • Sepsis Cases. Mannhardt, F. (Felix) (2016) Sepsis Cases – Event Log. Eindhoven University of Technology. Dataset. This real-life event log contains events of sepsis cases from a hospital. Sepsis is a life-threatening condition typically caused by an infection. One case represents the pathway of one patient through the hospital. The events were recorded by the ERP (Enterprise Resource Planning) system of the hospital. There are about 1000 cases with a total of 15,000 events that were recorded for 16 different activities. Moreover, 39 data attributes are recorded, e.g., the group responsible for the activity, the results of tests and information from checklists. Events and attribute values have been anonymized. The time stamps of events have been randomized, but the time between events within a trace has not been altered.
  • Hospital Billing. Mannhardt, F. (Felix) (2017) Hospital Billing – Event Log. Eindhoven University of Technology. Dataset. The ‘Hospital Billing’ event log was obtained from the financial modules of the ERP system of a regional hospital. The event log contains events that are related to the billing of medical services that have been provided by the hospital. Each trace of the event log records the activities executed to bill a package of medical services that were bundled together. The event log does not contain information about the actual medical services provided by the hospital. The 100,000 traces in the event log are a random sample of process instances that were recorded over three years. Several attributes such as the ‘state’ of the process, the ‘caseType’, the underlying ‘diagnosis’ etc. are included in the event log. Events and attribute values have been anonymized. The time stamps of events have been randomized for this purpose, but the time between events within a trace has not been altered.
  • BPIC 2011 Hospital Log. van Dongen, B.F. (2011) Real-life event logs - Hospital log. Eindhoven University of Technology. Dataset. We have prepared a real-life log, taken from a Dutch Academic Hospital. This log contains some 150,000 events in over 1100 cases. Apart from some anonymization, the log contains all data as it came from the Hospital’s systems. Each case is a patient of a Gynaecology department. The log contains information about when certain activities took place, which group performed the activity and so on. Many attributes have been recorded that are relevant to the process. Some attributes are repeated more than once for a patient, indicating that this patient went through different (maybe overlapping) phases, where a phase consists of the combination Diagnosis & Treatment.
  • MIMIC. MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. Scientific Data (2016). DOI: 10.1038/sdata.2016.35.  MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.
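
The Sepsis Cases and Hospital Billing descriptions above report counts of cases, events and activities, and note that time stamps were randomized while the time between events within a trace was kept. The sketch below illustrates both ideas on a made-up toy log. It is not the procedure used to prepare those datasets, which the descriptions do not specify; shifting every event of a case by the same offset is only one simple way to obtain that property, and all names here (`events`, `summarize`, `gaps`, `shift_per_case`) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Toy event log: (case id, activity, time stamp). All values are invented.
events = [
    ("c1", "Registration", datetime(2020, 1, 1, 8, 0)),
    ("c1", "Triage", datetime(2020, 1, 1, 8, 30)),
    ("c1", "Discharge", datetime(2020, 1, 2, 10, 0)),
    ("c2", "Registration", datetime(2020, 1, 5, 9, 0)),
    ("c2", "Discharge", datetime(2020, 1, 5, 17, 0)),
]


def summarize(log):
    """Return (number of cases, number of events, number of distinct activities)."""
    cases = {case for case, _, _ in log}
    activities = {activity for _, activity, _ in log}
    return len(cases), len(log), len(activities)


def gaps(log):
    """Per case, the time between consecutive events, ordered by time stamp."""
    by_case = defaultdict(list)
    for case, _, stamp in log:
        by_case[case].append(stamp)
    result = {}
    for case, stamps in by_case.items():
        ordered = sorted(stamps)
        result[case] = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return result


def shift_per_case(log, offsets):
    """Assumed anonymization: add one fixed offset to every event of a case."""
    return [(case, activity, stamp + offsets[case]) for case, activity, stamp in log]


# Arbitrary per-case offsets; a real pipeline would draw them at random.
offsets = {"c1": timedelta(days=-400), "c2": timedelta(days=123, hours=5)}
shifted = shift_per_case(events, offsets)

print(summarize(events))
print(gaps(shifted) == gaps(events))
```

Because each case receives a single offset, absolute dates change while durations inside a trace stay intact, so waiting times and throughput times remain analyzable; comparisons of absolute dates across cases do not.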