International Workshop on Machine Learning Strategies for Treatment Outcomes

21-23 February 2018, at INDICASAT, Panama City, Panama

Understanding individual variation in response to disease treatment underpins long-term success in precision medicine: improving outcomes, reducing toxicities and optimizing health care. Individual responses are currently analyzed by obtaining samples for genomics and other high-throughput analyses, which has generated large amounts of data. There is a need to use patient information to build robust predictors of treatment-related outcomes, such as toxicity and drug response, in order to translate artificial intelligence into the clinic and ultimately allow for personalized treatment strategies.
This workshop will explore the use of integrative methodologies and machine learning (ML) techniques for exploitation of clinical and biological Big Data. This involves baseline and longitudinal data including genomics, transcriptomics, metabolomics, metagenomics and other assay data with patient characteristics, disease severity and treatment response. Attendees are expected to already be working with data relevant to this theme.


This workshop is aimed at PhD students in machine learning and bioinformatics with applications to disease and, preferably, treatment outcomes. Active participation is required during workshop sessions, PhD talks and poster sessions. Applicants already working with machine learning in disease or treatment outcomes will be prioritised. Early application is encouraged.

Due to limited space, we may not be able to accept all applicants, in which case selections will be made by the programme committee. Accepted applications will be announced on November 24th, 2017. When the application is accepted we will send a link to registration.

In order to participate, we ask for a reference from your supervisor, a project description, an indication of which session you would like to present in, and your preferred presentation type: oral (10-15 minutes) or poster.

Application Deadline: November 15th, 2017

Please send application to:


Registration Fee: €150


Marylyn D Ritchie, PhD, University of Pennsylvania, USA

Andrea Califano, PhD, Columbia University, USA

Jason Moore, PhD, University of Pennsylvania, USA

Anders Gorm Pedersen, PhD, Technical University of Denmark, Denmark

Bjarne Ersbøll, PhD, Technical University of Denmark, Denmark


The workshop is held at:
Building 219, City of Knowledge
Panama City, Panama


Rooms for conference participants are to be booked individually. A workshop rate for hotels will be provided at a later stage.


INDICASAT - Scientific Investigation Institute and High End Technology Services
DTU Bioinformatics – Department of Bio and Health Informatics at Technical University of Denmark
DTU Compute – Department of Applied Mathematics and Computer Science at Technical University of Denmark


Agnes Martine Nielsen, PhD student, Technical University of Denmark, Denmark

Rikke Linnemann Nielsen, PhD student, Technical University of Denmark, Denmark

Bjarne Ersbøll, PhD, Technical University of Denmark, Denmark

Ramneek Gupta, PhD, Technical University of Denmark, Denmark

Carlos Mario Restrepo Arboleda, PhD, INDICASAT, Panama

Supported by: Poul V. Andersen’s Foundation


Challenges in Moving from Associations to Predictions
Genomics and other high-throughput analyses have, to a large extent, focused on associations and statistical modelling between a phenotypic outcome and genetic markers [1,2]. These approaches have led to an increased understanding of factors likely associated with phenotypic outcomes; however, these associations hold at the population level and often have weak predictive value for individual outcomes. Translating the impact of discovered genetic variants into personal risk assessment is a recurrent challenge [3]. Alternative methods and predictive frameworks are emerging [0] that shift the focus from learning about a population towards prediction at the individual or subgroup level. This is not a straightforward change, since the goals are fundamentally different, and it also requires an evolution in accepted methods, statistics and data representation. Furthermore, a strong association is no guarantee of a strong predictor, and even a very clear population-level association does not necessarily lead to individual predictions or learnings that are implementable in the clinic. This session aims to highlight the importance of the shift, the challenges therein, including differences in evaluation metrics, and emerging solutions and successes.
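To make the gap between association and prediction concrete, the following minimal sketch computes the odds ratio and the area under the ROC curve (AUC) for a single binary marker used as a risk score. The carrier frequencies (0.40 in cases, 0.30 in controls) are hypothetical values chosen for illustration and are not taken from any of the cited studies.

```python
def odds_ratio(p_case: float, p_control: float) -> float:
    """Odds ratio for carrying the marker in cases versus controls."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def auc_binary_marker(p_case: float, p_control: float) -> float:
    """AUC when the marker itself (0/1) is used as the risk score.

    AUC = P(case score > control score) + 0.5 * P(tie).
    """
    case_higher = p_case * (1 - p_control)
    tie = p_case * p_control + (1 - p_case) * (1 - p_control)
    return case_higher + 0.5 * tie


# Hypothetical carrier frequencies, chosen only for illustration.
P_CASE, P_CONTROL = 0.40, 0.30

print(f"odds ratio: {odds_ratio(P_CASE, P_CONTROL):.2f}")
print(f"AUC:        {auc_binary_marker(P_CASE, P_CONTROL):.2f}")
```

Under these assumed frequencies the odds ratio is about 1.56, an effect a large study would readily detect, while the AUC is 0.55, barely better than chance for an individual patient.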

Incorporating High-dimensional Data in Prediction Models
Genetics and other high-throughput omics data, such as transcriptomics, metabolomics and metagenomics, are increasingly high-dimensional, with the number of variables far exceeding the sample size. Few prediction models work well out of the box with this many variables, and overfitting is a frequent challenge [1]. Various strategies for addressing this are emerging, including intelligent feature selection [1,4], functional groupings of features and multi-omics scaffolding. However, challenges also arise when the selected variables are integrated with other types of variables in a prediction model. This session will focus on feature selection as well as ways to integrate data of different types.
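The overfitting risk can be illustrated with a small simulation, sketched here under stated assumptions: the outcome and all features are independent Gaussian noise, and the sample sizes and feature count are hypothetical. The sketch selects the feature most correlated with the outcome on a small training set and then re-evaluates that same feature on held-out samples.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def noise_column(rng, n):
    """Return n independent standard-normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)
N_TRAIN, N_TEST, N_FEATURES = 20, 200, 2000  # hypothetical sizes

# Outcome and features are independent noise: there is nothing to learn.
y_train = noise_column(rng, N_TRAIN)
y_test = noise_column(rng, N_TEST)
features_train = [noise_column(rng, N_TRAIN) for _ in range(N_FEATURES)]
features_test = [noise_column(rng, N_TEST) for _ in range(N_FEATURES)]

# Select the feature most correlated with the outcome on the training set.
train_r = [pearson(col, y_train) for col in features_train]
best = max(range(N_FEATURES), key=lambda j: abs(train_r[j]))

# Re-evaluate the selected feature on samples that played no part in selection.
test_r = pearson(features_test[best], y_test)

print(f"selected feature {best}: train r = {train_r[best]:+.2f}, test r = {test_r:+.2f}")
```

With far more features than samples, the selected feature typically shows a sizeable training correlation that largely disappears on held-out samples. This is one reason feature selection is usually repeated inside each cross-validation fold rather than performed once on the full data set.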

Use of Prior Knowledge in Associations/Predictions
Biology is shaped by multiple interactions in the cellular space. Hence, single-marker genomics is limited, since the only information used is genomic-level linkage. Domain knowledge can be used to improve the predictive power of a model. This may include knowledge from pathways, protein-protein interactions or genetic interaction networks. Studies applying prior knowledge in their models have improved predictions based on genetic markers and clinical biomarkers [2,5]. However, a limitation of including prior knowledge is that models may become biased towards known processes and therefore might miss novel interactions with weaker signals [2]. This session will focus on how to include known cellular pathways, networks and/or protein-protein interactions in prediction models.
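One simple way to bring prior knowledge into a model is to collapse gene-level measurements into pathway-level features. The sketch below assumes a hypothetical pathway membership table and placeholder gene names; a real analysis would draw membership from a curated resource, and mean aggregation is only one of several possible choices.

```python
from statistics import mean

# Hypothetical pathway membership, for illustration only.
PATHWAYS = {
    "pathway_1": ["gene_a", "gene_b"],
    "pathway_2": ["gene_c", "gene_d", "gene_e"],
}


def pathway_scores(sample, pathways=PATHWAYS):
    """Collapse gene-level values into one mean score per pathway.

    Genes absent from the sample are skipped; a pathway with no measured
    genes gets None so the gap stays visible downstream.
    """
    scores = {}
    for name, genes in pathways.items():
        measured = [sample[g] for g in genes if g in sample]
        scores[name] = mean(measured) if measured else None
    return scores


# Hypothetical gene-level measurements for one patient.
patient = {"gene_a": 1.0, "gene_b": 3.0, "gene_c": 0.5, "gene_z": 9.0}
print(pathway_scores(patient))  # {'pathway_1': 2.0, 'pathway_2': 0.5}
```

Note that `gene_z`, which belongs to no listed pathway, is dropped entirely despite its large value. This is a small-scale version of the bias towards known processes described above.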

Operational Data Challenges
Disease-associated data come with a number of challenges that must be solved before the data can be presented to machine learning models. By their nature, observational data contain missing values, which leads to a loss of samples, input features and power. One strategy for dealing with missing data is imputation. However, this may introduce bias and requires suitable assumptions. Another data-quality issue with patient health records is that samples and data are collected at different time points, which makes comparisons across time points difficult. In order to combine heterogeneous data types, data must be scaled and encoded appropriately for the model to learn patterns. This session will focus on how to clean and prepare data for machine learning models through imputation strategies, handling of longitudinal measurements and encoding of input features of different data types.
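The preparation steps named above can be sketched in a few lines. This is a minimal illustration, assuming median imputation with a missingness indicator, z-score scaling and one-hot encoding; the variable names and values are hypothetical, and in practice the imputation and scaling parameters would be estimated on training data only.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Fill missing entries (None) with the median of the observed values.

    Also returns a 0/1 indicator so the model can see which values were imputed.
    """
    fill = median(v for v in values if v is not None)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Encode a categorical variable as one 0/1 column per observed level."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values], levels


# Hypothetical measurements for four patients.
lab_value = [4.0, None, 10.0, 6.0]
treatment = ["A", "B", "A", "C"]

lab_filled, lab_missing = impute_median(lab_value)
lab_scaled = standardize(lab_filled)
treatment_encoded, treatment_levels = one_hot(treatment)

print(lab_filled)         # [4.0, 6.0, 10.0, 6.0]
print(lab_missing)        # [0, 1, 0, 0]
print(treatment_levels)   # ['A', 'B', 'C']
print(treatment_encoded)  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Keeping the missingness indicator alongside the imputed value is a design choice: it lets a model pick up cases where the absence of a measurement is itself informative.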


[0] Tien Yin Wong, Neil M. Bressler. Artificial Intelligence With Deep Learning Technology Looks Into Diabetic Retinopathy Screening. JAMA. 2016;316(22):2366-2367.


[1] M. L. Bermingham, R. Pong-Wong, A. Spiliopoulou, C. Hayward, I. Rudan, H. Campbell, A. F. Wright, J. F. Wilson, F. Agakov, P. Navarro, C. S. Haley. Application of high-dimensional feature selection: evaluation for genomic prediction in man. Scientific Reports 5, Article number: 10312 (2015).


[2] Sebastian Okser, Tapio Pahikkala, and Tero Aittokallio. Genetic variants and their interactions in disease risk prediction – machine learning and network perspectives. BioData Min. 2013; 6: 5.


[3] Nguyen, Tuan V., Eisman, John A. Genetics and the Individualized Prediction of Fracture. Current Osteoporosis Reports 2012, Volume 10, Issue 3, pp. 236-244.


[4] Saeys, Yvan, Iñaki Inza, and Pedro Larrañaga. "A review of feature selection techniques in bioinformatics." Bioinformatics 23.19 (2007): 2507-2517.


[5] Pedersen, H. K., Gudmundsdottir, V., Pedersen, M. K., Brorsson, C. A., Brunak, S., & Gupta, R. (2016). Ranking factors involved in diabetes remission after bariatric surgery using machine-learning integrating clinical and genomic biomarkers. npj Genomic Medicine, 1, 16035. DOI: 10.1038/npjgenmed.2016.35




Marylyn D Ritchie, PhD, University of Pennsylvania, USA
Dr. Ritchie’s research focuses on improving our understanding of the underlying genetic architecture of common diseases and pharmacogenomic traits among others. The approaches involve development and application of new statistical and computational methods which involve the integration of multiple types of ‘omics data.

Andrea Califano, PhD, Columbia University, USA
Dr. Califano’s interests reside in the assembly and interrogation of gene regulatory models for the elucidation of mechanisms presiding over cell physiology and their dysregulation in disease, with specific applications to cancer, stem cells, and neurodegenerative disease.

Jason Moore, PhD, University of Pennsylvania, USA

Dr. Moore’s research focuses on the development and application of artificial intelligence and machine learning methods for analysis of big biomedical data from research studies aimed at improving our understanding of human health. Recent work has focused on automated machine learning and accessible artificial intelligence.

Anders Gorm Pedersen, PhD, Technical University of Denmark, Denmark

The research of Anders Gorm Pedersen has two main focus areas: pathogen evolution (e.g., identifying and understanding molecular determinants of virulence and species specificity) and Bayesian statistical modeling. Bayesian modeling can be used to address a wide range of problems and makes it possible to integrate diverse data types in a stringent manner, while quantifying the uncertainty about inferences and predictions. Examples of recent work include inference of microbial species interactions and persistence probabilities from gut metagenomic data, analysis of plague bacteria in Bronze Age DNA, and risk factors for childhood asthma.

Bjarne Ersbøll, PhD, Technical University of Denmark, Denmark
Ersbøll’s work is mainly on applied statistics and data analysis. He has considerable experience in the application of these disciplines in industrial and medical projects. His research and teaching are largely inspired by finding solutions to actual problems in industry and other institutions, often in collaboration with these.


Wed 21 Feb 2018, 8:00 to Fri 23 Feb 2018


DTU Compute
DTU Bioinformatik


21 NOVEMBER 2017