Books, literature reviews, datasets, online courses (MOOCs), and other reference material about process-oriented data science for healthcare:

  • Call for Papers: “Special Issue on Innovative informatics methods for process mining in health care” in the Journal of Biomedical Informatics (JBI). Due date for submissions: June 1, 2021. This special issue provides a high-quality forum for interdisciplinary researchers to propose novel informatics methods for process mining in health care. Submitted papers should center on relevant problems encountered in the medical domain and propose innovative process mining methods to address them. Hence, submissions should go beyond using a brief health care application merely to illustrate a highly generic process mining method.

  • Process mining in healthcare: A literature review. Eric Rojas, Jorge Munoz-Gama, Marcos Sepúlveda, Daniel Capurro. Journal of Biomedical Informatics 61: 224-236 (2016). Process mining focuses on extracting knowledge from data generated and stored in corporate information systems in order to analyze executed processes. In the healthcare domain, process mining has been used in different case studies, with promising results. Accordingly, we have conducted a literature review of the use of process mining in healthcare. The scope of this review covers 74 papers with associated case studies, all of which were analyzed according to eleven main aspects, including: process and data types; frequently posed questions; process mining techniques, perspectives and tools; methodologies; implementation and analysis strategies; geographical analysis; and medical fields. The most commonly used categories and emerging topics have been identified, as well as future trends, such as enhancing Hospital Information Systems to become process-aware. This review can: (i) provide a useful overview of the current work being undertaken in this field; (ii) help researchers to choose process mining algorithms, techniques, tools, methodologies and approaches for their own applications; and (iii) highlight the use of process mining to improve healthcare processes.
  • Recommendations for enhancing the usability and understandability of process mining in healthcare. Niels Martin, Jochen De Weerdt, Carlos Fernández-Llatas, Avigdor Gal, Roberto Gatta, Gema Ibáñez, Owen Johnson, Felix Mannhardt, Luis Marco-Ruiz, Steven Mertens, Jorge Munoz-Gama, Fernando Seoane, Jan Vanthienen, Moe Thandar Wynn, David Baltar Boilève, Jochen Bergs, Mieke Joosten-Melis, Stijn Schretlen, and Bart Van Acker. Artificial Intelligence in Medicine (2020). Healthcare organizations are confronted with challenges including the tension between tightening budgets and increasing care needs. In light of these challenges, they are becoming increasingly aware of the need to improve their processes to ensure quality of care for patients. Identifying process improvement opportunities requires a thorough process analysis, which can be based on real-life process execution data captured by health information systems. Process mining is a research field that focuses on the development of techniques to extract process-related insights from process execution data, providing valuable and previously unknown information to instigate evidence-based process improvement in healthcare. However, despite the potential of process mining, its uptake in healthcare organizations outside case studies in a research context is rather limited. This observation was the starting point for an international brainstorm seminar. Based on the seminar’s outcomes and with the ambition to stimulate wider use of process mining in healthcare, this paper formulates recommendations to enhance the usability and understandability of process mining in healthcare. These recommendations are mainly intended for process mining researchers and the wider community to consider when developing a new research agenda for process mining in healthcare. In addition, a smaller number of recommendations are directed at healthcare organizations and health information system vendors, to consider when shaping an environment that enables the continuous use of process mining.
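
Several of the works above describe process mining as extracting process-related insights from execution data recorded by (hospital) information systems. As a rough illustration of one of the most basic discovery steps, the Python sketch below counts directly-follows relations (how often one activity immediately follows another within the same case) in a small, made-up event log. The log, the activity names and the helper functions `traces` and `directly_follows` are hypothetical; real analyses would typically rely on dedicated tools and the richer discovery algorithms surveyed in the literature review above.

```python
from collections import Counter, defaultdict

# Hypothetical toy event log: (case_id, activity, timestamp) tuples.
# Activity names and timestamps are illustrative, not taken from any real dataset.
EVENT_LOG = [
    ("p1", "Registration", 1),
    ("p1", "Triage", 2),
    ("p1", "Lab test", 3),
    ("p1", "Discharge", 5),
    ("p2", "Registration", 1),
    ("p2", "Triage", 4),
    ("p2", "Discharge", 6),
    ("p3", "Registration", 2),
    ("p3", "Lab test", 3),
    ("p3", "Triage", 4),
    ("p3", "Discharge", 7),
]


def traces(log):
    """Group events by case and order each case's activities by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in log:
        by_case[case_id].append((ts, activity))
    return {case: [a for _, a in sorted(events)] for case, events in by_case.items()}


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b within a case."""
    counts = Counter()
    for trace in traces(log).values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    for (a, b), n in sorted(directly_follows(EVENT_LOG).items()):
        print(f"{a} -> {b}: {n}")
```

The resulting counts are the edges of a directly-follows graph; frequent edges show the common pathways, while rare ones point at deviations worth inspecting with domain experts.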


  • Interactive Process Mining in Healthcare. Carlos Fernández-Llatas (Ed.). Springer 2020, ISBN 978-3-030-53992-4. This book provides a practical guide to the methodologies and technologies for applying the interactive process mining paradigm. It presents case studies in which this paradigm has been successfully applied to emergency medicine, surgical processes, human behavior modelling, stroke, and outpatient services, helping readers develop a deep understanding of how to apply process mining technologies in healthcare to infer new knowledge from past actions and to provide accurate, personalized knowledge that improves future clinical decision-making. Interactive Process Mining in Healthcare also covers in depth how machine learning algorithms can be used to generate real scientific evidence for improving daily healthcare protocols, and is a valuable resource for health professionals seeking new methods to improve their clinical decision-making.
  • Process Mining in Healthcare – Evaluating and Exploiting Operational Healthcare Processes. Ronny Mans, Wil M. P. van der Aalst, Rob J. B. Vanwersch. SpringerBriefs in Business Process Management, Springer 2015, ISBN 978-3-319-16070-2. What are the possibilities for process mining in hospitals? In this book the authors answer this question by presenting a healthcare reference model that outlines all the different classes of data potentially available for process mining in healthcare and the relationships between them. Based on this reference model, they then explain the application opportunities for process mining in this domain and discuss the various kinds of analyses that can be performed. They focus on organizational healthcare processes rather than medical treatment processes. The combination of event data and process mining techniques allows them to analyze the operational processes within a hospital based on facts, thus providing a solid basis for managing and improving processes within hospitals. To this end, they also explicitly elaborate on data quality issues that are relevant for the data aspects of the healthcare reference model. This book mainly targets advanced professionals involved in areas related to business process management, business intelligence, data mining, and business process redesign for healthcare systems, as well as graduate students specializing in healthcare information systems and process analysis.

  • Process Mining in Healthcare (FutureLearn): Healthcare involves thousands of complex and variable processes that generate data, including patient treatment, lab results, and internal logistics. Analysing this data is vital for improving these processes and removing bottlenecks. On this course, you will explore how process mining can help turn this data into valuable insights by looking at different areas of process mining and seeing how they have been applied. You will also get the chance to apply process mining to real-life healthcare data.


  • Sepsis Cases. Mannhardt, F. (Felix) (2016) Sepsis Cases – Event Log. Eindhoven University of Technology. Dataset. This real-life event log contains events of sepsis cases from a hospital. Sepsis is a life-threatening condition typically caused by an infection. Each case represents a patient's pathway through the hospital. The events were recorded by the hospital's ERP (Enterprise Resource Planning) system. There are about 1,000 cases with a total of 15,000 events, recorded for 16 different activities. In addition, 39 data attributes are recorded, e.g., the group responsible for the activity, the results of tests, and information from checklists. Events and attribute values have been anonymized. The timestamps of events have been randomized, but the time between events within a trace has not been altered.
  • Hospital Billing. Mannhardt, F. (Felix) (2017) Hospital Billing – Event Log. Eindhoven University of Technology. Dataset. The ‘Hospital Billing’ event log was obtained from the financial modules of the ERP system of a regional hospital. The event log contains events related to the billing of medical services provided by the hospital. Each trace of the event log records the activities executed to bill a package of medical services that were bundled together. The event log does not contain information about the actual medical services provided by the hospital. The 100,000 traces in the event log are a random sample of process instances recorded over three years. Several attributes, such as the ‘state’ of the process, the ‘caseType’, and the underlying ‘diagnosis’, are included in the event log. Events and attribute values have been anonymized. For this purpose, the timestamps of events have been randomized, but the time between events within a trace has not been altered.
  • BPIC 2011 Hospital Log. van Dongen, B.F. (2011) Real-life event logs – Hospital log. Eindhoven University of Technology. Dataset. We have prepared a real-life log, taken from a Dutch Academic Hospital. This log contains some 150,000 events in over 1,100 cases. Apart from some anonymization, the log contains all data as it came from the Hospital’s systems. Each case is a patient of a Gynaecology department. The log contains information about when certain activities took place, which group performed the activity, and so on. Many attributes relevant to the process have been recorded. Some attributes appear more than once for a patient, indicating that this patient went through different (possibly overlapping) phases, where a phase is the combination of a Diagnosis and a Treatment.
  • MIMIC. MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. Scientific Data (2016). DOI: 10.1038/sdata.2016.35. MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.
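
The Sepsis Cases and Hospital Billing logs above are described as having randomized timestamps while keeping the time between events within a trace unchanged. The exact procedure used for those datasets is not given here; the Python sketch below shows one simple way to obtain that property, assuming every event of a trace is shifted by the same random offset. The toy log, the activity names and the functions `shift_trace_timestamps` and `inter_event_durations` are hypothetical and for illustration only.

```python
import random
from datetime import datetime, timedelta

# Hypothetical toy log: case ID -> ordered list of (activity, timestamp).
# Case IDs, activity names and dates are invented for illustration.
LOG = {
    "case_A": [
        ("Registration", datetime(2014, 3, 1, 10, 0)),
        ("Lab test", datetime(2014, 3, 1, 10, 20)),
        ("Admission", datetime(2014, 3, 1, 12, 5)),
    ],
    "case_B": [
        ("Registration", datetime(2014, 5, 9, 8, 30)),
        ("Release", datetime(2014, 5, 12, 16, 0)),
    ],
}


def shift_trace_timestamps(log, max_shift_days=365, seed=None):
    """Move each trace by its own random offset (an assumed anonymization scheme).

    Every event in a trace is shifted by the same amount, so the durations
    between events within that trace are preserved while the original
    calendar dates are hidden.
    """
    rng = random.Random(seed)
    shifted = {}
    for case_id, events in log.items():
        offset = timedelta(
            days=rng.randint(-max_shift_days, max_shift_days),
            seconds=rng.randint(0, 86_399),
        )
        shifted[case_id] = [(activity, ts + offset) for activity, ts in events]
    return shifted


def inter_event_durations(events):
    """Time elapsed between consecutive events of one trace."""
    return [later - earlier for (_, earlier), (_, later) in zip(events, events[1:])]


if __name__ == "__main__":
    anonymized = shift_trace_timestamps(LOG, seed=7)
    for case_id, events in anonymized.items():
        print(case_id, events[0][1], inter_event_durations(events))
```

Because each trace gets its own offset in this sketch, durations within a case survive (so throughput and waiting times can still be analyzed), but the relative timing between different cases is lost, which limits analyses such as workload or resource contention across patients.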