Invited Session Program

The IBC2022 Invited Session Program has been confirmed. We are thrilled to announce that 21 Invited Sessions have been selected to present during the International Biometric Conference (IBC2022), live and in person in Riga, Latvia, on 11-15 July 2022.

The sessions cover a wide range of topics, including ecology, clinical trials, general modelling approaches, health, epidemiology, and environmental health. Congratulations to the following!

IS.01 Recent Developments in Probabilistic Machine Learning Methods for Causal Inference

Session Chair: Arman Oganisian, Department of Biostatistics, Brown University (USA)

SESSION INFORMATION

Motivation

Probabilistic machine learning (pML) models leverage Bayesian nonparametric priors to flexibly estimate outcome distributions. Prominent examples include Gaussian Process regressions, Dirichlet Process mixtures, and Bayesian Additive Regression Trees. These models are increasingly employed for estimating causal effects from complex, observational data due to several unique advantages. In our session, we present new developments of pML in causal inference including mediation estimation, analysis of sequential adaptive treatment strategies, causal survival analysis, and semi-parametric estimation under non-random missingness. Applications range from health economics and policy to randomized trials and medicine. Thus, we believe this topic will have broad appeal within the IBC community.

A key advantage of pML is full posterior inference. Once a causal parameter is identified, pML yields inference for it under minimal assumptions about the data-generating mechanism. Thus, it combines the flexibility of classical ML with the uncertainty quantification that is so valued by statisticians. For these reasons, pML for causal inference is becoming increasingly popular and, we believe, will be a major growth area in the future – making it a relevant topic for IBC.

The rationale for this proposed session is threefold:

In recent years, pML for causal inference has produced important new advances and the community can benefit from an invited session that helps summarize, discuss, and disseminate this knowledge to practitioners, methodologists, and theorists – thus narrowing the theory-practice knowledge gaps in the community.
This topic lies at the intersection of several areas: statistical computing, machine learning, causality, and Bayesian modeling. An invited session will help bring together researchers from a variety of fields in biometrics who may otherwise not collaborate.
The speakers and discussant in this session have a strong record of research in the area and will be presenting work that is on the research frontier.

Proposed Speakers

Maria Josefsson, Statistics Unit; Umeå School of Business, Economics and Statistics; Umeå University (Sweden)

Bayesian semi-parametric inference for estimating an incremental intervention effect from longitudinal cohort data with non-ignorable dropout and death

Arman Oganisian, Department of Biostatistics, Epidemiology, and Informatics; Perelman School of Medicine; University of Pennsylvania (USA)

Nonparametric Bayesian estimation of optimal sequential treatment rules with missing time-varying confounders

Michael J. Daniels, Department of Statistics; University of Florida (USA)

A Bayesian nonparametric approach to causal mediation in cluster randomized trials

Joseph W. Hogan, Department of Biostatistics; School of Public Health; Brown University (USA)

A flexible approach for causal inference with multiple treatments and clustered survival outcome

IS.02 New Advances in Software bring Joint Models to the Statisticians' Toolbox

Session Chair: Nicole Erler, Department of Biostatistics, Erasmus University Medical Center, Rotterdam (Netherlands)

SESSION INFORMATION

Motivation

Joint models for longitudinal and time-to-event data constitute a class of statistical models applicable in survival analysis with endogenous time-varying covariates and longitudinal data analysis with non-random missing data. Joint models have seen considerable development in recent years. They have been extended to accommodate multiple longitudinal outcomes of different types (continuous, categorical, or subject to limits of detection), nonlinear shapes of longitudinal profiles, competing risks and multi-state processes, and combinations of recurrent events and terminating events. An even more exciting development of joint models has been their successful application in the context of precision medicine. The individualized dynamic predictions derived from joint models have been prominently used in shared decision-making and scheduling of invasive procedures.

Despite these developments, applied researchers have not been using joint models in their everyday practice. The main difficulty has been the lack of user-friendly and robust statistical software to streamline such analyses. This session aims to present a set of newly developed software packages that fills this gap. Based on the lessons learned from the first iteration of software for joint models, the new breed of packages is more versatile, covering almost all of the extensions of joint models presented in the literature. Apart from fitting the models, these packages also provide a suite of supporting functions that enable users to depict results, evaluate the model fit, statistically compare models, and calculate (dynamic) predictions and assess their accuracy.

This session will showcase to IBC participants the wide range of settings in which joint models are applicable and illustrate how they can be used in practice.

The session brings together the developers of the three most prominent software packages for fitting joint models and a discussant with long experience applying these models in practice. In particular,

Dr. Michael Crowther (Karolinska Institutet, Sweden) will showcase the capabilities of the merlin package in Stata. The package implements shared random effects joint models using maximum likelihood. Extensions to multiple longitudinal outcomes (of different types), multi-state processes, recurrent events, and custom models are covered.

Dr. Cecile Proust-Lima (University of Bordeaux, France) will present the capabilities of the lcmm package in R. The package implements joint models based on latent classes and latent processes using maximum likelihood. Extensions to multiple longitudinal outcomes of different types and competing risks are covered.
Pedro Miranda Afonso (Erasmus MC, the Netherlands) will showcase the capabilities of the JMbayes2 package in R. The package implements shared random effects joint models under the Bayesian paradigm. It can fit models with multiple longitudinal outcomes of different types, competing risks and multi-state processes, recurrent events and terminating events.
Dr. Francesca Little (University of Cape Town, South Africa) will provide an overview of what applied researchers have been missing so far and how these packages fulfill these needs. She will also offer suggestions for future developments and how joint models could be further promoted.

The makeup of the session reflects diversity in different dimensions.

Open source and commercial statistical software: R and Stata
Types of joint models: Shared random effects and latent classes
Estimation frameworks: Maximum Likelihood and Bayesian
Regions of speakers: Nordic-Baltic, French, the Netherlands, and South African
Gender of speakers: Two females and two males

Proposed Speakers & Discussant

Dr. Michael Crowther, Karolinska Institute (Sweden)

Extended multivariate mixed effects models with merlin in Stata

Dr. Cecile Proust-Lima, University of Bordeaux (France)

Joint Models based on Latent Classes and Latent Processes using lcmm

Pedro Miranda Afonso, Erasmus University MC (Netherlands)

Extended Joint Models under the Bayesian Approach using JMbayes2

Discussant: Dr. Francesca Little, University of Cape Town (South Africa)

IS.03 Estimands in clinical trials - causal inference to the rescue

Session Chair: Jonathan Bartlett, University of Bath (UK)

SESSION INFORMATION

Motivation

The ICH E9 Addendum on estimands was recently published, formulating a framework for defining the treatment effect estimand of interest in randomized clinical trials. The need for such a framework is due to the occurrence of different types of events (termed intercurrent in the Addendum) which complicate the interpretation of treatment effect estimates from trials, such as patients discontinuing randomized treatment, receiving rescue medication, or dying before the endpoint of interest can be ascertained. The Addendum qualitatively describes a framework for defining estimands, but remains mostly silent as to what assumptions would be required to estimate a given estimand and which statistical methods should be used to estimate it.

Thus far, standard methods for handling missing data have been advocated and used for estimating a range of different estimands. Conversely, relatively little has been written about whether and how concepts, assumptions, and estimation methods from the field of modern causal inference can be fruitfully applied to tackle estimands in trials. This proposed session will bring together three speakers and a discussant who have recently been working to bridge this gap. The talks will be of interest both to statisticians involved in trials who may be less familiar with causal inference methods and to researchers in causal inference, whose research has historically tended to focus on applications in observational non-randomized studies.

Proposed Speakers & Discussant

Camila Olarte Parra, University of Bath (UK)

Hypothetical estimands in clinical trials - a unification of causal inference and missing data methods

Hege Michiels, University of Ghent (Belgium)

A novel estimand to adjust for rescue treatment in randomized clinical trials

Jack Bowden, University of Exeter (UK)

Connecting Instrumental Variable methods for causal inference to the Estimand Framework

Discussant: Rhian Daniel, Cardiff University (UK)

IS.04 Obtaining Valid Insights & Inference with Electronic Health Records Data

Session Chair: Benjamin Goldstein, Duke University (USA)

SESSION INFORMATION

Motivation

Electronic health records (EHR) data have become a key data source for clinical research, being used for a wide variety of studies including clinical trials, comparative effectiveness research and clinical risk prediction. As real-world observational data, EHR data pose particular challenges that biostatisticians need to consider when analyzing them. These challenges typically revolve around the complex selection process that determines what data are observed and for whom. In this series of talks, we will discuss how what we observe may affect the insights we draw, along with strategies for addressing these challenges across different analytic domains. Examples will be drawn from a range of application domains such as clinical trials, comparative effectiveness and risk prediction.

This session will provide an in-depth look at a widely used data source: electronic health records data. Many biostatisticians have likely used EHR data either in their applied or methodological work. While these data share many characteristics with other forms of observational data, there are also unique aspects underlying the data generating distribution that can make working with them, and deriving valid inference and insights, challenging. We will take a unique approach by examining this question from a number of angles: comparative effectiveness research, clinical trials, and risk prediction. While most biostatisticians engage in one or more of these domains, it is not as common for talks to cross these application areas. By combining these talks, and centering the theme around EHR data, the audience will develop an appreciation for the breadth of questions that arise from working with these data.

Our panel is diverse with regard to region (US and Europe), gender, and departmental affiliations (traditional biostatistics, school of medicine, data science/informatics).

Proposed Speakers & Discussant

Rebecca Hubbard, University of Pennsylvania (USA)

Combining EHR and clinical trial data to improve validity and generalizability of treatment effectiveness estimates

Sebastien Haneuse, Harvard University (USA)

A unified framework for robust causal inference in EHR-based cohort studies with missing confounder data

Matthew Sperrin, University of Manchester (UK)

Challenges in prediction using EHR data

Benjamin Goldstein, Duke University (USA)

Identifying and accounting for algorithmic bias in clinical prediction models

Discussant: Angela Wood, University of Cambridge (UK)

IS.05 New horizons in disease mapping: scalable models and multivariate proposals

Session Chair: Lola Ugarte, Universidad Pública de Navarra (Spain)

SESSION INFORMATION

Motivation

Spatial and spatio-temporal techniques have been widely explored in biometrical applications leading to fruitful research. In areas such as disease mapping, the methodology has contributed enormously to a better understanding of cancer, one of the most serious diseases of modern societies with a high impact on economy, society and individual lives. However, new problems emerge bringing in new challenges.

Though the research on cancer has been intense, underlying risk factors in many cancer locations remain unknown. On the one hand, unveiling spatial and spatio-temporal patterns is the first step to identifying risk factors, and research is abundant. However, when the number of small areas is large and the time periods increase, techniques are not scalable. Hence, new methodology to deal with “large” data sets is required. On the other hand, establishing relationships between different cancer sites becomes necessary and multivariate modelling can discover unmeasured associations between different types of tumors, contributing to a better understanding of cancer. Research on multivariate modelling in disease mapping is now becoming popular, but there are still many problems to solve such as guaranteeing positive definiteness of covariance matrices, studying the impact of the order in which diseases enter the model, computational burden, interpretability, and also scalability.

Finding solutions to these problems will promote new methodological advances that will have direct impact on public health policies and on the research of cancer and many other non-transmissible diseases as well as in other realms involving spatio-temporal data.

The idea behind this session is to bring together experts on different areas of spatial and spatio-temporal modelling of areal count data to share new proposals on the topic. This session will look into different models and strategies to improve inference in multivariate modelling of different cancer types, but also on new scalable proposals that can be applied to large data sets providing accurate inference.

The speakers will provide insight on the topic from different perspectives. A first speaker will propose a “divide and conquer” based methodology to deal with large data sets. A second speaker will explain a different class of spatial models that can be scalable to large data sets. The third speaker will present a multivariate approach to disease mapping using Integrated Nested Laplace Approximations (INLA), and the last speaker will present multivariate methods to analyze a large number of diseases with direct applications in public health. An excellent discussant with an outstanding research career on the topic will close the session.

The session is very convenient and timely, as inference in spatio-temporal disease mapping can be onerous if the number of small areas and time periods increases. Multivariate modelling brings additional challenges such as guaranteeing positive definite covariance matrices, including covariates, or defining models whose parameters have practical and useful interpretations.

Proposed Speakers & Discussant

Aritz Adin, Public University of Navarre (Spain)

Scalable spatio-temporal models for disease risk smoothing

Abhirup Datta, Johns Hopkins University (USA)

Directed Acyclic Graph Auto-Regressive (DAGAR) models for downscaling of disease incidence data

Anna Freni Sterrantino, Imperial College London (UK)

Multivariate Conditional Autoregressive models with penalized priors

Miguel Ángel Martínez Beneito, University of Valencia (Spain)

On M-models in multivariate disease mapping

Discussant: Ying MacNab, School of Public Health, University of British Columbia (Canada)

IS.06 Recent Advancement in Endpoints for observational Studies and clinical trials

Session Chair: Tobias F. Chirwa, University of the Witwatersrand, Wits Medical School, School of Public Health (South Africa)

SESSION INFORMATION

Motivation

In order to obtain a complete overview of treatment benefit for patients, a multimodal approach is often required with the collection of several key endpoints such as survival time, quality of life, biomarkers and measures of daily functioning. These endpoints are often highly correlated with one another, and, in many instances, longitudinal biomarkers or intermediate outcomes could act as potential surrogate markers for definite clinical events such as death, disease onset or hospitalization. In practice, however, these endpoints are virtually always considered separately and their interrelationships remain largely underutilized. Combined assessment of several endpoints could provide additional insights in the treatment effect and explain a large part of the variability, which may lead to more precise estimates in clinical trials or observational studies, and to a better use of available information. In addition, what a patient considers the most important endpoint may differ from individual to individual, and a treatment effect observed on one endpoint may not be considered relevant by the majority of patients.

In this session, therefore, we evaluate several strategies to assess surrogacy, account for the interrelationships between endpoints collected in observational research and clinical trials, evaluate combined treatment effect estimates and incorporate the patient preference when defining treatment benefit.

In this session, we will discuss innovative theory and application of statistics in the estimation of treatment effects on multiple endpoints simultaneously for clinical trials and observational studies. The topic has received considerable interest in the literature over the last decade, but uptake in real-world settings has been limited. Given its high potential and possible impact on important aspects in the field of medicine, especially in the area of personalized medicine and for rare diseases, we believe this topic is important for the general audience of the IBS to advance the use of information and disentanglement of treatment effects in various settings.

For this session, we have assembled a diverse group of speakers and a discussant with respect to gender (3:2 male:female ratio), geographic region (United States, Spain, Netherlands, Australia, Africa), field of expertise and background, and level of seniority (ranging from early-career researcher to established investigator). In order to make the topics accessible to a ‘non-expert’ audience, the session will be a mix of methodological advances combined with illustration of real-world applications to address specific subject-matter challenges.

Proposed Speakers & Discussant

Ying Lu, Department of Biomedical Data Science, Stanford University School of Medicine (USA)

A Composite Endpoint Approach to Evaluate Treatment Benefits Based on Patient Preferences

María del Carmen Pardo Llorente, Department of Statistics and O.R. Faculty of Mathematics, Complutense University of Madrid (Spain)

A family of entropy measures for assessing surrogacy in clinical trials

Marissa Lassere, Faculty of Medicine, University of NSW, Sydney, New South Wales, Australia Department of Rheumatology (Australia)

A systematic review and meta-regression analytic approach to assess surrogacy

Ruben P.A. van Eijk, Department of Neurology, Department of Biostatistics, UMC Utrecht (the Netherlands)

Joint Modeling of Longitudinal and Time-to-event Endpoints in Clinical Trials

Discussant: Tobias F. Chirwa, Division of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand (South Africa)

IS.07 Recent Advances in Joint Species Distribution Modeling in Ecology

Session Chair: David Dunson, Department of Statistical Science, Duke University (USA)

SESSION INFORMATION

Motivation

In ecology, joint species distribution models (JSDMs) are a standard tool for studying biodiversity, biological communities and the impact of species traits and environmental covariates including climate. Data typically consist of samples over time at different spatial locations and take one of two forms: (1) a multivariate binary indicator vector denoting which of p species are present in each sample; or (2) counts of abundances for each species in each sample. There is also commonly measurement error in inferring species presence and/or counts based on images, audio or DNA. Modern automated data collection methods are revolutionizing our understanding of biodiversity and our ability to collect broad scale data simultaneously for hundreds of thousands of species. However, statistical methods for analyzing such data are lagging behind the data collection. Critical questions include: (1) how to reliably fit realistic JSDMs that take into account the spatiotemporal structure and covariates while providing realistic uncertainty quantification? (2) how to interpret results for data from moderate to large numbers of species including covariate effects and species dependence? (3) how to take into account the fact that there are large numbers of rare species with many currently unknown to science (particularly for insects and fungi)? Beyond the pressing interest in ecology and in understanding how global species biodiversity is impacted by environmental change, there are many interesting and fundamental statistical questions related to multivariate analysis for high-dimensional sparse binary and count data having a complex dependence structure.

This session provides an overview of this interesting and important area through some recent advances by both ecologists and statisticians having a variety of backgrounds. There is substantial diversity represented with two female speakers, speakers from multiple continents, and backgrounds in statistics vs ecology vs mathematics.

Although joint species distribution modeling is a canonical topic in ecology of increasing impact and focus, it is relatively unknown in the statistics and biometrics communities. In addition to the very interesting and important applications, there are many open statistical methodology questions, providing an intellectually stimulating topic for an IBC session. In addition, we have chosen speakers with a high degree of diversity. We have two female speakers, and speakers based in the US, Europe and Australia. We have speakers who publish primarily in statistics journals and others who publish in the ecology literature. The technical level is designed to be accessible to a general audience of biostatisticians including those who may not have background in ecology or in models for high-dimensional multivariate data.

Proposed Speakers & Discussant

Otso Ovaskainen, University of Jyväskylä (Finland)

Joint species distribution modelling: where are we now and where should we go?

Yuqi Gu, Department of Statistics, Columbia University (USA)

Fine-grained discrete latent variable models for joint species distribution modeling: integrating covariates, clustering, and variable selection

David Warton, University of New South Wales (Australia)

Using automatic differentiation for faster estimation of joint species distribution models

Sara Taskinen, University of Jyväskylä (Finland)

Fast analysis of multivariate abundance data with generalized linear latent variable models

Discussant: David Dunson, Duke University (USA)

IS.08 Integrative and Comprehensive Methods in (gen)omic data analysis

Session Chair: Dr. Hae-Won Uh, Department of Data Science and Biostatistics, Div Julius Center, UMC Utrecht (Netherlands)

SESSION INFORMATION

Motivation

Comprehensive understanding of human health and diseases requires interpretation of molecular intricacy and variations at multiple levels such as genome, epigenome, transcriptome, proteome, and metabolome. Conventional approaches like genome-wide association studies (GWAS) do not capture the complexity of biological mechanisms and measured data. In recent years, many innovative methods to understand complex systems based on these data have been proposed. On the one hand, machine learning (ML) techniques to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase have been proposed [1]. Alternatively, various statistical approaches have been developed to summarize and incorporate information from different levels and sources of the biological system: genome, multiple omics, and functional databases. To leap forward, a 'rigorous' statistical framework in conjunction with algorithmic ML techniques is needed. Hence, a critical assessment of the underlying methodologies and their applicability to the specific areas targeted is due.

Although GWAS have successfully revealed thousands of genetic loci, the identified variants explain only a small proportion of the genetic contributions to complex diseases. There is still much debate regarding the best model for how heritability varies across the genome. Prof. David Balding et al. evaluated and improved the heritability model in a series of articles and software. They examined assumptions about the distribution of heritability across the genome and showed how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty [2]. The LDAK Model formulates the expected heritability contributed by a SNP depending on its MAF and local LD levels. Using summary statistics from GWAS, the software SumHer estimates confounding bias, SNP heritability, enrichments of heritability and genetic correlations [3, 4].

The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high-dimensionality, correlations and heterogeneity pose statistical and computational challenges. Dr. Said el Bouhaddani et al. introduced novel methods for statistical data integration of heterogeneous omics data and showed the potential of data integration in biomedical studies with two-way orthogonal partial least squares (O2PLS), implemented in the OmicsPLS R package [5]. Since algorithmic O2PLS lacks a statistical framework to quantify statistical evidence, a general framework, probabilistic two-way partial least squares (PO2PLS), has been proposed [6, 7].

For clinical translation of an evidence-based approach, attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. Prof. Xihong Lin et al. extended unsupervised single scoring frameworks, such as EIGEN [8] and GenoCanyon [9], to Multi-dimensional Annotation Class Integrative Estimation (MACIE) [10]. MACIE summarizes different aspects of variant function by integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. MACIE constructs multi-dimensional integrative scores capable of capturing multiple facets of variant function simultaneously.

Our focus in this session is on the interface between statistics and (gen)omics: statistical framework, computational biology, and application of tractable methods to unravel the genetic basis of complex disease and traits.

Proposed Speakers & Discussant

Prof. David Balding, Statistical Genomics, Melbourne Integrative Genomics (MIG) (Australia)

Evaluating and Improving Heritability Models using Summary Statistics

Dr. Said el Bouhaddani, Data Science and Biostatistics, Div Julius Center, UMC Utrecht (Netherlands)

Statistical Integration of Heterogeneous Data with PO2PLS

Prof. Xihong Lin, Biostatistics, Harvard T. H. Chan School of Public Health (USA)

A Multi-dimensional Integrative Scoring Framework for Predicting Functional Regions in the Human Genome

Discussant: Prof. Jeanine Houwing-Duistermaat, Data Analytics and Statistics, Leeds University (UK)

IS.09 Surrogate Markers: Evaluation and Use in Clinical Studies

Session Chair: Layla Parast, RAND Corporation (USA)

SESSION INFORMATION

Motivation

This proposed session consists of four speakers and the topics have been selected to logically flow in terms of content. All four speakers are excellent presenters, with extensive experience presenting at conferences and teaching to large audiences. The overall aim of the session is to describe recent novel work on the topic of surrogate marker evaluation and use, a topic which has been an active area of research for the last 30 years. The presentations will cover both methodological details (including theoretical arguments) and practical application of the methods (via available code and illustrations using real clinical trial data). The topics will flow in logical sequence. First, Dr. Cai will discuss methods to identify and evaluate surrogate markers, including methods appropriate for censored and noncensored outcomes. Second, Dr. Tian will present an approach that allows one to use a surrogate marker for early testing of a treatment effect. This method is specific to a time-to-event outcome and will include discussions regarding power/sample size implications. Third, Dr. Parast will discuss how to test for heterogeneity in the utility of a surrogate marker; for example, a surrogate marker may appear to be “good” overall but may in fact be excellent for one subgroup and poor for another subgroup (this is illustrated using data from a diabetes clinical trial where the subgroups are males and females). Lastly, Dr. Elliott will present a method to assess the surrogacy paradox in subpopulations. The surrogacy paradox is an especially important topic to consider, as almost all methods that evaluate or use surrogate markers rely on strict assumptions that are needed to ensure one is not in a surrogacy paradox situation. As such assumptions are generally untestable, it is important to have and utilize methods that can examine the potential for violations of these assumptions.

This topic is timely and extremely relevant to IBS. Now, more than ever, there is increased pressure to make decisions regarding the effectiveness of a treatment, intervention, or vaccine and often, studies needed to evaluate these treatments require long-term follow-up of patients. If an identified surrogate marker can appropriately be used to make decisions about a treatment effect, this could potentially accelerate the acquisition of clinical information. The content of this proposal directly reflects the mission of IBS with respect to our focus on the development and application of novel robust statistical methods in biomedical science and public health.

Proposed Speakers & Discussant

Tianxi Cai, Harvard University, Department of Biostatistics (USA)

Identification and Evaluation of Surrogate Markers

Lu Tian, Stanford University, Department of Biomedical Data Science (USA)

Using a Surrogate Marker for Early Testing of a Treatment Effect

Layla Parast, RAND Corporation, Statistics Group (USA)

Testing for Heterogeneity in the Utility of a Surrogate Marker

Michael Elliott, University of Michigan, Department of Biostatistics (USA)

Assessing the Surrogacy Paradox in Subpopulations

IS.11 Statistical Methods and Considerations for Cancer Screening Programs and Biomarker Research in Early Detection

Session Chair: Ping Hu, ScD, SM, U.S. National Cancer Institute (USA)

SESSION INFORMATION

Motivation

This invited session will present newly developed statistical methodologies and their applications to the design and analysis of cancer screening trials, biomarker research in early detection, and database analysis. These are important and challenging areas in the conduct of successful cancer screening trials and programs.

The purpose of cancer screening is to find cancer early and increase the chance of successful treatment. Most cancer screening trials involve special statistical considerations not found in therapeutic trials. More generally, cancer screening has its own special features, such as the sensitivity of the test, the prevalence and incidence of the disease, the sojourn time in the preclinical stage, the lead time and length bias, and overdiagnosis. The statistical methods presented in this session will address these (plus more) special features of cancer screening trials and programs, and they target some optimality considerations.

With the recent advances and rapid growth of biomarker studies in cancer screening trials, statistical considerations in clinical biomarker research have also grown considerably. We will show that the statistical methods are not only important for the design and analysis of cancer screening trials and programs, but that they are also crucial for biomarker research in early detection within cancer screening trials.

In this session, we focus on the following topics: designing a new generation of studies of screening; the development and implementation of statistical models for learning about disease history from screening data, including the latent tumor growth process; and statistical models and considerations for biomarker studies in cancer screening. The speakers will discuss the challenges that arise in model development and implementation, and address future developments.

The screening setting involves many statistical aspects that are central to the interests of participants of the IBS conferences, both from a methodological and from an applied perspective. There are crucial aspects of design, modeling, biomarker research in early detection, and estimation, with a large array of features and complications in cancer screening research. Relevant problems include the identification of optimal screening strategies, the development of risk-based screening, the design of trials to evaluate the impact of screening, the effect of treatment advances on the cost-effectiveness of screening, and biomarker research in early detection within cancer screening. The session strongly supports the mission of IBS, “devoted to the development and application of statistical and mathematical theory and methods in the biosciences”.

This session will present both methodological research and applications, and it is meant to be of interest to academic scientists and researchers, particularly those working in the areas of clinical trials, statistical modeling, and biomarker research in early detection. We expect that the many relevant issues that are associated with screening will make the session appealing to a wide and diverse array of participants.

Also, the ongoing pandemic has increased the general awareness of the issues associated with monitoring and detection (for infectious diseases), so we may expect this session to be of interest to many.

We have three male speakers and one female who is a speaker and discussant (as well as the organizer of the session). Two speakers are from the USA and two speakers are from European countries. All speakers are actively involved in international collaborations and cancer screening research. The two speakers from the USA are also actively involved in the design and analysis of cancer screening trials – PLCO, NLST, and TMIST.

Proposed Speakers & Discussant

Constantine Gatsonis, PhD, Brown University (USA)

Designing a new generation of studies of screening: Experience from the TMIST trial of breast cancer screening

Marco Bonetti, PhD, Bocconi University (Italy)

Statistical models for the natural history of breast cancer: Likelihood and likelihood-free estimation

Keith Humphreys, PhD, Karolinska Institutet (Sweden)

Continuous growth, random effects models for breast cancer screening data - studies of aggressive breast cancer and screening performance

Ping Hu, ScD, SM, U.S. National Cancer Institute (USA)

Statistical Considerations for Biomarker Research in Planning a Cancer Screening trial

Discussant: Ping Hu, ScD, SM

IS.12 Recent advances in ROC analysis

Session Chair: Christos T. Nakas, University of Thessaly, Volos (Greece); Inselspital, Bern University Hospital (Switzerland)

SESSION INFORMATION

Motivation

Our session includes novel methodological work in two exciting research areas in diagnostic testing, namely covariate-adjusted ROC surfaces for three-class classification problems and use of the length of the ROC curve and its properties in two-class classification problems.

Accurate diagnosis of disease is of great importance in clinical practice and medical research. The receiver operating characteristic (ROC) surface is a popular tool for evaluating the discriminatory ability of continuous diagnostic test outcomes when there are three ordered disease classes. Building on her previous research (Rodriguez-Alvarez, Inacio, arXiv:2003.13111, 2020), the first speaker will present approaches for incorporating covariates in ROC surface analysis, since the test's discriminatory ability may depend on them and accounting for them can enhance the information gathered from the diagnostic test. A Bayesian distributional regression approach for covariate-specific ROC surface estimation will be presented where, in the model specification, the covariate-specific ROC surface is indirectly modelled using probabilistic distributional models capturing location, scale, shape, and possibly other aspects of the diagnostic test's distribution in each of the three groups. Covariate effects are modelled flexibly through penalised splines. Changes with age and gender in the capacity of several Alzheimer's disease biomarkers for discriminating between subjects showing no clinical symptoms, subjects with mild disease impairment, and subjects suffering from dementia will be shown.

The length of the ROC curve has recently been proposed by the second speaker as an alternative index for assessing the diagnostic performance of markers (Franco-Pereira, Nakas, Pardo, AStA Adv Stat Anal 104, 625–647, 2020). Two estimation procedures for this summary measure will be presented, based on (1) normality assumptions and (2) transformations to normality. These are compared in terms of bias and root mean square error in an extensive simulation study. Testing procedures for the assessment of a single marker and for the comparison of biomarkers will be shown. Furthermore, cases in which the length of the ROC curve outperforms the AUC and the Youden index are illustrated. Finally, an illustration through a real-world application will be provided.

The third speaker further expounds on the use of the length of the ROC curve. His recent publication revolves around the theoretical foundations and utility of the length index (Bantis et al, Stat Med, 2021, https://doi.org/10.1002/sim.8869). During the early stage of biomarker discovery, high-throughput technologies allow for simultaneous input of thousands of biomarkers that attempt to discriminate between healthy and diseased subjects. In such cases, proper ranking of biomarkers is highly important. Common measures, such as the area under the receiver operating characteristic (ROC) curve (AUC), as well as affordable sensitivity and specificity levels, are often taken into consideration. Strictly speaking, such measures are appropriate under a stochastic ordering assumption, which implies that higher (or lower) measurements are more indicative of the disease. Such an assumption is not always plausible and may lead to rejection of extremely useful biomarkers at this early discovery stage. The length of a smooth ROC curve as a measure for biomarker ranking is not subject to a single directionality. The length corresponds to a φ divergence, is identical to the corresponding length of the optimal (likelihood ratio) ROC curve, and is an appropriate measure for ranking biomarkers. A complete framework for the evaluation of a biomarker in terms of sensitivity and specificity through a proposed ROC analogue for use in improper settings will be considered. Applications to real data that relate to pancreatic and esophageal cancer will be shown.
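
As a rough illustration of the length index described above, the sketch below computes the length of a binormal ROC curve numerically. It assumes normal marker distributions in the healthy and diseased groups, and it uses the standard change of variables that writes the length as the integral of sqrt(f0(x)^2 + f1(x)^2) over the marker scale, where f0 and f1 are the two group densities. The function names and the integration grid are hypothetical choices for this example; this is not the estimator or the software of the speakers.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def roc_length_binormal(mu0, sd0, mu1, sd1, n_grid=20001):
    """Length of the ROC curve when both groups are normally distributed.

    Assumption of this sketch: the length is evaluated as the integral of
    sqrt(f0(x)^2 + f1(x)^2) using the trapezoid rule on a grid that covers
    ten standard deviations around each group mean.
    """
    lo = min(mu0 - 10.0 * sd0, mu1 - 10.0 * sd1)
    hi = max(mu0 + 10.0 * sd0, mu1 + 10.0 * sd1)
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * math.hypot(normal_pdf(x, mu0, sd0),
                                     normal_pdf(x, mu1, sd1))
    return total * step


if __name__ == "__main__":
    # Hypothetical markers: useless, shifted mean, and equal mean with
    # different spread (an "improper" setting for the usual ROC curve).
    print("no discrimination:", roc_length_binormal(0.0, 1.0, 0.0, 1.0))
    print("shifted mean:     ", roc_length_binormal(0.0, 1.0, 1.5, 1.0))
    print("different spread: ", roc_length_binormal(0.0, 1.0, 0.0, 3.0))
```

Under this sketch the length is sqrt(2) for a marker with no discriminatory ability and approaches 2 for a perfectly separating marker. A marker with equal means but different variances has an AUC of 0.5 under the binormal model, yet its length is still above sqrt(2), which is the sense in which the index is not tied to a single directionality.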

Overall, the session will present recent advances in ROC analysis, an active area of research with a very large number of applications. ROC-related topics are useful in a very wide range of applied research problems and contribute to the interdisciplinarity and usefulness of biometry in the advancement of science.

We have an international set of speakers from three countries and two continents (UK, Spain, USA). The discussant and the organizer further represent another three countries from the Eastern Mediterranean (Israel, Greece) and Central Europe (Switzerland), while the panel is well balanced both with respect to gender and academic seniority.

The panel comprises researchers renowned in this field, with a large number of methodological and applied contributions alike, and is well balanced in terms of experience.

Proposed Speakers & Discussant

Vanda Inacio, School of Mathematics, University of Edinburgh (UK)

Distributional ROC surface regression

Alba Maria Franco Pereira, Department of Statistics and Operational Research, Faculty of Mathematics, Complutense University of Madrid (Spain)

The length of the ROC curve for assessing diagnostic markers

Leonidas Bantis, Department of Biostatistics and Data Science, University of Kansas Medical Center (USA)

The length of the ROC curve and the two cutoff Youden index within a broad framework for discovery, evaluation, and cutoff estimation in biomarker studies involving improper ROCs

Discussant: Benjamin Reiser, Department of Statistics, University of Haifa (Israel)

IS.13 Prediction with observational data: STRATOS perspective

Session Chairs: Michal Abrahamowicz, Department of Epidemiology and Biostatistics, McGill University (Canada) & Willi Sauerbrei, Institute of Medical Biometry and Statistics, University Medical Center Freiburg (Germany)

SESSION INFORMATION

Motivation

Prediction is one of the most practically important but also most challenging tasks of statistical analyses. Indeed, the ultimate objective of many modern data analyses is to predict how, for example, the expected future health outcomes may be affected by alternative treatments, exposures and/or prognostic factors. Yet, to ensure the accuracy of the resulting predictions and the valid use of such predictions, researchers must address several complex analytical challenges, many of which require specialized statistical methods, often developed within different branches of statistical research. From this perspective, the proposed session aims at presenting and confronting separate views on some of the most practically relevant issues related to predictive modelling, by experts in different areas of modern biostatistics.

This approach is consistent with the overarching goal of the STRATOS (STRengthening Analytical Thinking for Observational Studies) Initiative, created in 2013 with the aim to systematically evaluate existing methodologies, identify unresolved issues, stimulate research in these areas, and develop guidance to enhance methodological accuracy of real-life data analyses. At present, STRATOS involves >100 researchers with expertise in statistical and epidemiological methods, from 19 countries worldwide, who work in 9 topic groups (TGs) and 11 panels. Each TG includes experts in a specific area of statistical research, who often have not worked together before and may represent different views and/or favor different methods but try to reach consensus through internal discussions and new joint projects. Furthermore, by promoting inter-TG collaborations, STRATOS creates opportunities for developing new interdisciplinary approaches to complex analytical challenges that require cutting-edge expertise in various areas of modern statistics.

In this spirit, the proposed session will involve speakers from 6 STRATOS TGs who will discuss a range of methodological challenges related to different stages of predictive modelling. Specifically, the 4 talks will address: (i) the need for careful initial data analysis to enhance the accuracy and reproducibility of prediction; (ii) the choice of the analytical approach for prediction based on high-dimensional data; (iii) criteria and methods to validate and evaluate predictions for survival outcomes; (iv) the use of the predicted values as either exposures or outcomes in further statistical analyses.

Proposed Speakers & Discussant

Marianne Huebner, Department of Statistics and Probability, Michigan State University (USA)

Initial data analysis to support prediction modelling in observational studies

Georg Heinze, Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna (Austria)

Initial data analysis to support prediction modelling in observational studies

Jörg Rahnenführer, Department of Statistics, TU Dortmund University, Dortmund (Germany)

Statistical and machine learning techniques: relative advantages and weaknesses for prediction with high-dimensional data

Lara Lusa, Department of Mathematics, Faculty for Mathematics, Natural Sciences and Information Technologies, University of Primorska & Institute for Biostatistics and Medical Informatics, Medical Faculty, University of Ljubljana (Slovenia)

Statistical and machine learning techniques: relative advantages and weaknesses for prediction with high-dimensional data

David McLernon, Medical Statistics Team, Institute of Applied Health Sciences, University of Aberdeen (UK)

Assessing performance of survival prediction models

Terry Therneau, Division of Biomedical Statistics and Informatics, Mayo Clinic (USA)

Assessing performance of survival prediction models

Pamela Shaw, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Perelman School of Medicine (USA)

Cautionary notes for regression analyses that use a predicted value as either an outcome or an exposure

Laurence Freedman, Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center (Israel)

Cautionary notes for regression analyses that use a predicted value as either an outcome or an exposure

IS.14 Latent variable and mediation approaches in environmental health and exposomics research

Session Chair: Shelley H. Liu, PhD, Assistant Professor, Center for Biostatistics, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai (USA)

SESSION INFORMATION

Motivation

I. Summary: Our proposal will highlight novel methodology development and application of latent variable and mediation models to environmental health research. Methods we will present include Bayesian factor analysis, item response theory, multiple mediator models, and Bayesian joint latent class models, each of which is motivated by and applied to environmental health datasets.

II. Scientific background on environmental health: A key question in environmental health research is how chemicals we encounter in daily life (which we may be exposed to through diet, water, air pollution, etc.) can affect health. Traditionally, environmental health focuses on risk assessment for a single chemical at a time. However, this is not reflective of daily life, as we are exposed jointly to many chemicals, so-called chemical “mixtures”. There is mounting evidence that chemical mixture exposures are worse for health than exposure to a single chemical. At the same time, rapid technological advances in exposure assessment have enabled researchers to measure high-dimensional internal and external chemical and biomarker exposures (known collectively as exposomics).

III. The need for methods development: However, there are still gaps in statistical methods for environmental health. Methods are needed that can optimally characterize chemical mixtures by accounting for extreme dose response and similarities in exposure profiles within a family; identify synergistic relationships between individual chemical exposures; estimate exposure burden; and account for multiple exposures and multiple mediators. Our speakers will present their methodological innovations addressing these and other research questions.

IV. Brief description of each talk:

Dr. Albert will describe latent class models which account for extreme types of dose response, in which disease risk may be influenced by a few chemical exposures at a high dose, or a small dose of many chemicals. These methods are applied to the National Cancer Institute’s Agricultural Health Study to investigate links between farmers’ pesticide use and cancer risk.
Dr. Colicino will describe Bayesian factor analysis to study how the maternal serum prenatal exposome affects birthweight. She will identify metabolites that contribute nonlinearly and synergistically to inter-individual variation of newborn fetal growth. These methods are applied to PRISM, a prospective cohort study of mother-child dyads.
Dr. Chen will describe Bayesian hierarchical approaches for multiple exposure, multiple mediator models, using regularization through a two-stage approach, a product-of-coefficient approach, and a difference-of-coefficient approach. These methods are applied to epidemiological studies of human reproduction and women’s health.
Dr. Liu will describe item response theory and latent variable approaches to quantify latent exposure burden to chemical classes, and the use of these methods to harmonize across cohorts which have measured a set of common, but also distinct, chemicals. These methods will be applied to recent data from the US National Health and Nutrition Examination Survey.
Dr. Hwang will describe a Bayesian joint latent risk class modeling framework that describes interaction in chemical exposure patterns between male and female partners of a couple, and how they affect infertility risk. This is applied to the Longitudinal Investigation of Fertility and the Environment (LIFE) study.
Dr. Ryan, our discussant, is an expert in statistical methods for environmental health research and has extensive publications in methodology and applications of latent variable models and mediation analysis.

V. Goals of this session: Our aim is to share insights on environmental health and exposomics research to encourage other IBC researchers, who may not be familiar with this field, to work on methods development in this area. Further, we hope to share our methods, as well as software implementation, which could also be applied to other fields. We look forward to connecting with other researchers working in fields where there is a need for latent variable and mediation methods.

Proposed Speakers & Discussant

Paul Albert, Senior Investigator and Branch Chief, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute (USA)

Latent class modeling approaches for analyzing chemical mixtures in epidemiologic studies

Shelley H. Liu, Assistant Professor, Center for Biostatistics, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai (USA)

Quantifying latent exposure burden to chemical mixtures using item response theory

Zhen Chen, Senior Investigator, Biostatistics and Bioinformatics Branch, National Institute of Child Health and Human Development (USA)

Mediation analyses with multiple exposures and multiple mediators

Elena Colicino, Assistant Professor, Department of Environmental Medicine and Public Health, Division of Biostatistics, Icahn School of Medicine at Mount Sinai (USA)

Bayesian factor analysis for interaction to identify non-linear and non-additive associations between the prenatal exposome and birthweight

Beom Seuk Hwang, Associate Professor, Department of Applied Statistics, Chung-Ang University (Korea)

A Bayesian multi-dimensional couple-based latent risk model with an application to infertility

Discussant: Louise Ryan, University of Technology Sydney (Australia)

IS.15 Innovative Complex Designs for Confirmatory Clinical Trials with Multiple Primary Research Questions

Session Chair: James Carpenter, London School of Hygiene and Tropical Medicine (UK)

SESSION INFORMATION

Motivation

There is an urgent need for new methods in the clinical trials arena. Confirmatory clinical trials are often large, slow, expensive and inflexible. Therefore, the primary challenge is finding feasible new designs that efficiently evaluate multiple new therapies under one protocol. Such designs, which allow testing of multiple primary research hypotheses, will lead to faster decisions about experimental treatments.

This session will consider four distinct designs, highlighting why they represent substantial progress in the confirmatory setting. Their utility and practicality will be illustrated with a range of trials, including the STAMPEDE trial in prostate cancer and COVID-19 trials. The presentations will also explore the statistical methodology for error rate control when adding or dropping arms, and for determining power. Important issues such as the availability of software and what type of error rate control (family-wise error rate, pair-wise error rate, false discovery rate) is appropriate for a confirmatory trial will be discussed.

The four designs that will be included are:

DURATIONS designs [1]: The development of this design was motivated by antibiotic resistance. Most antibiotic treatments have a very limited, sometimes non-existent, evidence base: to minimise antibiotic resistance we need to minimise treatment durations while retaining efficacy. The DURATIONS design involves recruitment to several comparator durations and flexible modelling of the duration-response relationship to identify a regimen that is close to optimal.
Multi-Arm Multi-Stage (MAMS) designs [2]: MAMS Group Sequential designs are generalizations of Two-arm Multi-stage Group Sequential designs for comparing more than one treatment arm to a common control arm with possible early stopping for efficacy or futility. Adaptive MAMS group sequential designs permit, in addition to early stopping, treatment selection and sample size re-estimation at one or more interim analysis time points. In the adaptive setting, the group sequential stopping boundaries may be constructed initially, but they can be modified in the presence of adaptive changes. We show how the family-wise type-1 error rate can be controlled in this adaptive setting.
Error rate control in platform trials [3]: Platform trials have a single master protocol in which multiple treatments are evaluated over time. They include flexible features such as early stopping of accrual to treatments for lack of benefit, as well as the opportunity to add new treatment arms over the course of the trial. This approach has been used in a number of trials for the treatment of COVID-19, as well as in cancer, infectious diseases such as TB, neurodegenerative diseases and surgical wound infection. This talk will address the issues of error rate control when designing a (MAMS) platform trial and adding a new research arm, with applications to the STAMPEDE and RAMPART platform trials.
Correlated Endpoints and Graphical Multiplicity Control [4]: We apply the Maurer and Bretz (2013) graphical method for controlling tests of multiple hypotheses in group sequential designs. We provide tools to take advantage of known correlations to relax group sequential bounds. Various complexities regarding timing and spending are dealt with. Both testing in multiple populations and multiple arms versus a common control as well as combinations of these are supported.
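
To make the distinction between pair-wise and family-wise error rates concrete, here is a minimal simulation sketch. It assumes a single-stage trial under the global null, several experimental arms compared with one shared control arm, equal allocation, known variance, and one-sided z-tests. None of these simplifications come from the designs above, and the function name and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_error_rates(n_arms, alpha_per_test, n_sims=20000, seed=1):
    """Monte Carlo estimate of (PWER, FWER) under the global null.

    PWER: proportion of individual comparisons that reject.
    FWER: proportion of simulated trials with at least one rejection.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha_per_test)
    total_rejections = 0
    trials_with_rejection = 0
    for _ in range(n_sims):
        control = rng.gauss(0.0, 1.0)  # standardized control-arm mean
        rejections = 0
        for _ in range(n_arms):
            # Difference of two unit-variance means, rescaled to variance 1.
            # Sharing `control` gives the z statistics a correlation of 0.5.
            z = (rng.gauss(0.0, 1.0) - control) / math.sqrt(2.0)
            if z > crit:
                rejections += 1
        total_rejections += rejections
        if rejections > 0:
            trials_with_rejection += 1
    pwer = total_rejections / (n_sims * n_arms)
    fwer = trials_with_rejection / n_sims
    return pwer, fwer


if __name__ == "__main__":
    # Three arms, each tested at a one-sided 0.025 level (no adjustment).
    print("unadjusted (PWER, FWER):", simulate_error_rates(3, 0.025))
    # Same trial with a Bonferroni split of the 0.025 level across arms.
    print("Bonferroni (PWER, FWER):", simulate_error_rates(3, 0.025 / 3))
```

In this simplified setting the per-comparison rate stays near the nominal level while the family-wise rate is noticeably higher; splitting the level across arms brings the family-wise rate back under control at the cost of power. Multi-stage and adaptive designs require the more elaborate boundaries discussed in the talks.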

The confirmatory trial designs which are the focus of this session require novel statistical research to resolve key questions about their operating characteristics (with a range of interim and outcome measures), in order that they can be applied in a broad range of disease settings. Both the methodology for resolving these questions, and the applications themselves, are of interest to biostatisticians from low- and middle-income as well as high-income settings.

The session speakers come from diverse backgrounds in industry, pharma, and academic institutions. They are also at different stages of their careers.

Professor Cyrus Mehta is a prominent statistician and a fellow of the American Statistical Association. He is also a co-founder of Cytel Inc. He has developed the underlying methodology for many adaptive trial designs, including those for the MAMS designs. These methods have been implemented in Cytel’s EAST software. He is also an adjunct professor of biostatistics at the Harvard T.H. Chan School of Public Health.
Professor Keaven Anderson heads the methodology research group at Merck & Co., Inc, and has an interest in adaptive trials with a focus on group sequential designs. He has implemented these methods in the gsDesign package in R.
Dr Babak Choodari-Oskooei is a senior statistician at the MRC Clinical Trials Unit at UCL. He is interested in clinical trials methodology, and is an expert in adaptive multi-arm multi-stage (MAMS) platform randomised clinical trials. He has been a member of the UK-wide adaptive design working group (ADWG), and an independent statistician on data monitoring and trial steering committees of many randomised trials.
Dr Matteo Quartagno is a senior statistician in clinical methodology research, with a particular interest in the design of adaptive randomized clinical trials. He has developed the methodology underlying the DURATIONS design and implemented these methods in an R package.
James Carpenter is professor of medical statistics at the London School of Hygiene and Tropical Medicine. Besides missing data, a key focus of his research is finding practical methodological solutions to challenges in Phase III clinical trials and observational research.

Proposed Speakers & Discussant

Dr. Matteo Quartagno, Institute of Clinical Trials and Methodology at UCL (UK)

The DURATIONS design: a practical trial design to optimise treatment duration

Prof. Cyrus Mehta, Cytel Inc (USA)

Design of Adaptive Multi-arm Multi-stage Clinical Trials

Dr. Babak Choodari-Oskooei, MRC Clinical Trials Unit at UCL (UK)

Error rate control (FWER and PWER) when designing a platform trial and adding a new research arm: with application to STAMPEDE and RAMPART trials

Prof. Keaven Anderson, Merck & Co., Inc (USA)

Correlated Endpoints and Graphical Multiplicity Control

Discussant: Prof. James Carpenter, London School of Hygiene and Tropical Medicine (UK)

IS.16 New Advances in Bayesian Modeling

Session Chair: Pere Puig, Universitat Autònoma de Barcelona (Spain)

SESSION INFORMATION

Motivation

The Web of Science (WOS) reports 286 papers published since 2017 whose titles include the term “Bayesian modeling” or “Bayesian modelling”. Moreover, the journal Biometrics has published more than 100 papers on Bayesian models since 2020. Clearly, Bayesian modeling (methods and applications) is an important topic in medical sciences, biology and ecology. One of the proposed invited speakers (Simon Wood) is the author of one of these recent papers on Bayesian modeling published in Biometrics:
https://onlinelibrary.wiley.com/doi/10.1111/biom.13462

The purpose of this invited session is twofold. Firstly, we want to bring together researchers with recent publications on Bayesian models in order to present their methodological advances and applications. Secondly, we want to give a larger audience access to these developments and their potential, raising awareness of new statistical models requiring Bayesian methods and attracting researchers to them. Two of the invited speakers will present Bayesian models applied to medical sciences, and the other two speakers will present applications in biology and ecology. In this way, we will reach a wide audience.

Proposed speakers are the following (in alphabetical order):

Sara Martino; she is associate professor at the Norwegian University of Science and Technology, Norway. She is an expert in spatial Bayesian modeling and in approximate Bayesian inference, publishing papers in journals like Atmospheric Environment and Spatial Statistics. Her paper in JRSS B (2009), a seminal paper on INLA, coauthored with Havard Rue and Jo Eidsvik, currently has 1915 citations (WOS).
Dorota Młynarczyk; Universitat Autònoma de Barcelona, Spain. She is a PhD student, working on Bayesian modeling, supervised by Pere Puig (UAB), Carmen Armero (UV) and Virgilio Gómez (UCM).
Janet van Niekerk; she is a postdoctoral fellow in Statistics at the King Abdullah University of Science and Technology (KAUST), under the supervision of Professor Havard Rue in his research group BAYESCOMP. She was the youngest PhD graduate at the University of Pretoria in 2017. She has published papers in journals like Statistical Methods in Medical Research and the Journal of Statistical Software.
Simon Wood; chair of Computational Statistics at the School of Mathematics in the University of Edinburgh. He has published papers in journals like Biometrics, JASA, JRSS B and Biometrika. He introduced the synthetic likelihood methodology in his seminal paper published in Nature (2010).

The discussant is Sebastien JPA Haneuse, professor of Biostatistics at the Harvard T.H. Chan School of Public Health, Boston, USA. He has published papers in journals like Biometrics, Statistical Methods in Medical Research, JASA, JRSS A and the American Journal of Epidemiology.

Proposed Speakers & Discussant

Sara Martino, Norwegian University of Science and Technology (Norway)

A new computational approach to fit hierarchical models with INLA

Dorota Młynarczyk, Universitat Autònoma de Barcelona (Spain)

A Bayesian inverse non-linear regression model applied to Biodosimetry

Janet van Niekerk, King Abdullah University of Science and Technology (KAUST) (Saudi Arabia)

Complex joint survival models with INLA

Simon Wood, University of Edinburgh (UK)

Empirical Bayes methods for semi-mechanistic biological dynamic models

Discussant: Sebastien JPA Haneuse, Harvard T.H. Chan School of Public Health (USA)

IS.17 Statistical Methods for Modern Temporal Devices

Session Chair: Jeanine Houwing-Duistermaat, Department of Statistical Sciences, University of Bologna (Italy)

SESSION INFORMATION

Motivation

With advances in technology, temporal, often high-dimensional, datasets have emerged in epidemiological studies on diseases and health, for example data from wearable devices and 3D brain images over time. The goal of these studies is to relate these types of datasets to disease and health outcomes. The time scale, missing data mechanisms, and high dimensionality require new statistical models.

One set of tools is functional data analysis (FDA), which can be used for modelling data varying over a continuum such as dense time observations, curves and surfaces. With the increasing number of dense datasets, FDA is a popular research area. In addition, it can also be used to model irregular sparse datasets, which might arise from observational biomarker studies where patients might skip visits. State-of-the-art methods for modeling functional data include functional principal component analysis (FPCA), which aims to carry out dimension reduction, and functional regression analysis (FRA), which comprises a number of approaches for investigating the (linear) relationship between response and predictors. However, currently available datasets require new developments to include different types of datasets in one model, to deal with high dimensions and to account for the presence of nonrandom missingness.

In this session, we will bring together statisticians with expertise in modeling temporal and functional data. The invited speakers will share their novel methods and data applications on how to extract physical activity patterns and model them (Mei-Cheng), how to model longitudinal high dimensional datasets (Jeanine) and how to account for observations subject to detection limits in temporal datasets (Haiyan).

Proposed Speakers & Discussant

Jeanine Houwing-Duistermaat, University of Leeds (UK)

A novel statistical model for longitudinal high dimensional datasets

Haiyan Liu, University of Leeds (UK)

Local maximum likelihood estimation and inference of FPCA with missing values

Mei-Cheng Wang, The Johns Hopkins University (USA)

Time-dependent activities and physical profiles from wearable device data

IS.18 Recent advances in methods for ordinal data and its applications in health-related data sets

Session Chair: Daniel Fernández, Polytechnic University of Catalonia-BarcelonaTech (Spain)

SESSION INFORMATION

Motivation

In contrast to the usual inference problems we solve for continuous variables, statistical inference for ordinal data has another layer of complexity. Consequently, common ideas and concepts may not readily apply to the analysis of such data. This lack of generality demands special treatment to address the inference challenges, especially in studies relevant to medicine as well as public health.

The topics of this session will cover a comprehensive range of those challenges. In the first presentation, Dr. Spiess will discuss a new methodology for modelling longitudinal ordinal data. Dr. Dungang Liu will then talk about association measures to assess the relation between continuous and ordinal variables. As missing data commonly arise in many studies that are particularly relevant to IBC2022, Dr. Yucel will summarize his recent work on computationally efficient multiple imputation inference for incomplete ordinal data in complex high dimensional data structures. Finally, the presentation by Dr. Ivy Liu will focus on recent methods for model validation for ordinal data.

We believe that these topics represent the frontier of research on ordinal data, stimulating new lines of research while disseminating state-of-the-art techniques for ordinal data to the audience of IBC2022. We would like to emphasize that the theoretical and methodological concepts of all talks will be communicated in a coherent and complementary manner, making them accessible to a diverse audience. In addition, all talks will illustrate a variety of applications in medicine and public health, demonstrating the translational aspect of these statistical methods.

Proposed Speakers & Discussant

Dr. Martin Spiess, Universitaet Hamburg (Germany)

GEE estimation of the ordered stereotype panel logit model with application to an arthritis data set

Dr. Dungang Liu, University of Cincinnati (USA)

Evaluating partial association between continuous and ordinal outcomes: an application to assessing college student wellbeing during COVID-19 pandemic

Dr. Recai Yucel, Temple University (USA)

Calibration-based sequential methods for multiple imputation inference for ordinal clustered data with application to cancer registry data

Dr. Ivy Liu, Victoria University of Wellington (New Zealand)

New methods of goodness-of-fit for ordinal data with an application to health

Discussant: Dr. Daniel Fernández, Polytechnic University of Catalonia-BarcelonaTech (Spain)

IS.19 Biostatistical Methods in Toxicology

Session Chair: Christian Ritz, National Institute of Public Health, SDU, Copenhagen (Denmark)

SESSION INFORMATION

Motivation

Appropriate biostatistical methods are of fundamental importance in pharmacological and environmental toxicology. Therefore, many sophisticated statistical approaches tailor-made for the analysis of toxicological data have been developed in the last decades. In general, this applies to non-clinical experiments in drug development as well as to environmental toxicological studies. However, in the planning, execution, and evaluation of practical experiments, the available new statistical methods and tools are rarely used, and sometimes used incorrectly. Modern and innovative statistical methods are usually not implemented in, for example, standard lab software, and statistical support is often not available. Thus, investigators often hesitate to use scientifically up-to-date statistical strategies due to practical limitations.

Therefore, there is an urgent need to transfer advances in statistical methodology, software development and visualization more efficiently to practitioners. The first talk of this session directly discusses these issues in the context of statistical software for dose-response analysis and addresses challenges towards robust approaches that may be used routinely, for instance in risk assessment in a regulatory context, screening of harmful compounds, and monitoring of climate change through changes in seed behaviour.

Both pharmacological and environmental toxicology benefit from the rapid advancements in information technology and molecular biology that lead to the generation of data sets with multiple high-dimensional omics measurements. However, the new opportunities pose even more challenges for the practical procedures used in toxicological experiments. They thereby risk further widening the gap between statistical method development and the insight of the practitioner. The second talk addresses this challenge in the field of pharmacological toxicology. Statistical methods to best exploit biological structures in high-dimensional data for dose-response analysis and for constructing classifiers of compounds are introduced and evaluated with respect to practical usefulness and robustness.

In environmental toxicology, the effects of natural and synthetic chemicals on health and the environment are studied. The third talk explains a profitable use of high-dimensional data for the construction of risk scores in this context. An important question is which procedures are most adequate for the generation of genetic risk scores, in particular in the analysis of gene-exposure, gene-gene, and gene-environment interactions. As in pharmacological studies, it is crucial to assess the additional value of the high-dimensional data and evaluate the usefulness of the scores on other data sets not used for their construction. Such procedures require robust protocols to be applied more routinely in toxicological research.

The last talk presents an industry-specific view on current analysis practice in toxicology. It addresses which procedures are used and in which order, and contrasts traditional approaches such as the Ames test with new applications and requirements, e.g. regarding in-silico models and biomarkers. A hot topic, as in clinical studies, is the inclusion of historical controls. Here, it is also extremely important to transfer suitable statistical strategies to practice, and to be aware of and take into account difficulties and possible analysis biases.

Overall, the benefit for the audience of this session will be two-fold. First, up-to-date overviews of current developments in statistical methods for pharmacological and environmental toxicology will be presented. Second, and at least as important, the session will contribute ideas on how to make toxicological analyses more robust and less susceptible to false conclusions, especially for practitioners.

Proposed Speakers & Discussant

Christian Ritz, Department of Nutrition, Exercise and Sports, University of Copenhagen (Denmark)

Advances and challenges in dose-response studies

Franziska Kappenberg, Department of Statistics, TU Dortmund University, Dortmund (Germany)

Exploiting high-dimensional genetic data in pharmacological toxicology

Anke Huels, Department of Epidemiology, Rollins School of Public Health, Emory University (USA)

Methylation risk scores for environmental toxicology

Bernd-Wolfgang Igl, Non-Clinical Statistics, Biostatistics and Data Sciences Europe, Boehringer Ingelheim Pharma GmbH & Co.KG (Germany)

Statistical evaluation of toxicological assays in drug development

IS.20 Recent Advances in Dynamic Risk Prediction Using Longitudinal Data

Session Chair: Paul S. Albert, National Cancer Institute (USA)

SESSION INFORMATION

Motivation

Assessing the risk of disease is important for identifying individuals for increased disease monitoring or possible early intervention. Examples include using longitudinal biomarkers to determine risk of cancer, cardiovascular disease, pre-term birth, and liver disease. In recent years, there has been substantial biostatistical research on the inclusion of longitudinal data (e.g., repeated biomarkers) to provide a dynamic assessment of disease risk. Different modeling frameworks have been proposed to formulate the association between the longitudinal data and the disease outcome used to quantify risk (e.g., survival). These include shared random parameter models, landmark analysis, and pattern mixture models. Understanding the relationships between these different modeling strategies will clarify the advantages and disadvantages of the different approaches. A number of difficult methodological challenges remain, including incorporating high-dimensional longitudinal processes (e.g., a panel of biomarkers measured repeatedly in time), assessing the quality of the prediction, and choosing individualized measurement times for the longitudinal data to optimize the overall prediction of risk.

The goal of this session is to bring together a diverse, internationally recognized list of speakers to discuss the latest research findings in this area of methodological research. The session will provide a series of talks that present state-of-the-art methodology with an interesting range of applications where dynamic prediction can be applied. The work should be easily accessible to a large group of IBS members.

The session brings together a group of leading researchers in this area. The session has a mix of topics that reflect novel methodological contributions as well as innovative applications of these methods to advance public health. All speakers have an internationally recognized reputation. We have a diverse set of speakers: the four speakers represent four countries, namely the UK, the Netherlands, France, and the United States.

Proposed Speakers & Discussant

Jessica Barrett, University of Cambridge (UK)

A landmarking approach for the dynamic scheduling of cardiovascular risk assessments

Danping Liu, National Cancer Institute (USA)

Dynamic Risk Prediction for Cervical Precancer Screening with Continuous and Binary Longitudinal Biomarkers

Robin Genuer, University of Bordeaux (France)

Extension of random forests to compute individual dynamic prediction using large dimensional longitudinal data

Hein Putter, Leiden University (Netherlands)

Landmarking 2.0: Bridging the gap between joint models and landmarking

IS.21 Flexible Extensions of the AFT model

Session Chair: Ingrid Van Keilegom, KU Leuven, ORSTAT (Belgium)

SESSION INFORMATION

Motivation

The proposed session focuses on new statistical developments in survival analysis. The three talks will present different new flexible extensions of the multivariable Accelerated Failure Time (AFT) model, which has recently gained increasing attention in the statistical and causal literature. Because most health outcomes depend on many predictors (risk/prognostic factors, exposures and/or treatments), survival analyses in clinical and health research rely on multivariable regression models. Here, one of the main analytical challenges is to specify a model that accurately represents the effects of multiple predictors and to develop efficient estimation methods. Over the past four decades, Cox’s proportional hazards (PH) model has become the ‘default’ model for the vast majority of real-life applications and the main focus of statistical research in survival analysis. Yet, the AFT model provides a plausible alternative way of accounting for predictors’ effects, which are assumed to accelerate or slow down the survival process (rather than act multiplicatively on the hazard, as in the PH model). Indeed, the AFT model avoids some limitations of the PH model, such as the attenuation of effects due to omitted predictors (that is, non-collapsibility) and built-in selection bias, making the AFT model increasingly popular, e.g. in the causal inference literature. On the other hand, similar to the constant hazard ratio assumption imposed by the PH model, the classical AFT model relies on the a priori assumption that the ‘acceleration factors’, i.e. event time ratios associated with a given change in the predictor, are constant across the entire follow-up duration. It is plausible that, in multivariable analyses, this assumption is violated by some predictors. Similarly, another conventional assumption, that continuous predictors have linear relationships with the log event time ratio, may also be inconsistent with the true mode of action for some predictors of interest.
Yet, to date, there are only very few statistical publications that propose methods to relax either the constant time ratio or the linearity assumptions within the AFT framework, in contrast to the vast literature on relaxing the PH and/or linearity assumptions in flexible extensions of the Cox model. We are aware of only very few real-life applications of these methods. Furthermore, AFT modeling also requires estimating the baseline distribution of event times, and almost all existing AFT implementations restrict this distribution to a few conventional parametric models. To address these important limitations of the current AFT literature, the proposed session will present three different new flexible extensions of the AFT model that permit relaxing the constant time ratio and/or the linearity assumptions as well as assumption-free modeling of the survival time distribution. Each talk will present a different, novel estimation procedure. The proposed methods will be validated in simulations, and their potential to yield new insights about the role of specific risk/prognostic factors will be illustrated by real-life clinical applications. In conclusion, we believe that the proposed session will be of considerable interest for IBS participants who work on new methods for survival analysis as well as for those who face challenges regarding modeling of complex real-life processes in their collaborative research. We also hope that the presentations will stimulate both further statistical research on Accelerated Failure Time modeling and its more widespread application.

Relevance
Survival analysis is one of the most active areas of methodological research in modern biostatistics and has enormous impact on real-life applications in studies of health. To accurately account for complex relationships of several exposures and risk or prognostic factors with time to event, it is important to develop and validate viable alternatives to the PH model, which has dominated both the statistical developments in survival analysis and their applications for the past half-century. Thus, research on the AFT model is of high relevance for IBS. Furthermore, from a more general perspective, one of the paramount challenges of modern biometrics and biostatistics is to continue refining the existing models to relax their conventional restrictive assumptions in order to match the complexity of the underlying real-life processes of disease occurrence, progression and outcomes. From this angle, the proposed session will present three new, alternative, flexible ways to extend the AFT model, each using a different novel estimation approach.

Proposed Speakers & Discussant

Grace Yi, Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario (Canada)

Parametric and semiparametric estimation methods for survival data under partially linear single index AFT models

Mark Clements, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet (Sweden)

A flexible parametric accelerated failure time model

Michal Abrahamowicz, Department of Epidemiology & Biostatistics, McGill University, Montreal, Quebec (Canada)

Flexible extension of AFT model to account for non-linear effects and time-dependent effects of covariates on the hazard

Discussant: Ingrid Van Keilegom, KU Leuven, ORSTAT (Belgium)