
This function performs longitudinal clustering with flexmix. To obtain robust results, the data are repeatedly subsampled and the clustering is performed on each subsample. The results are combined in a consensus matrix, and a final hierarchical clustering step is performed on this matrix. This follows the approach of the ConsensusClusterPlus package.
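As a rough illustration of the procedure described above, the following base R sketch subsamples subjects, clusters each subsample, accumulates a consensus matrix and cuts a hierarchical tree built on it. It is not the package's implementation: `kmeans()` stands in for the flexmix fit, the toy data have one row per subject, and all variable names are made up for the example.

```r
# Minimal consensus clustering sketch in base R.
# Assumption: kmeans() replaces the flexmix model fit; names are illustrative.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = -2), ncol = 2),  # 10 subjects around -2
           matrix(rnorm(20, mean = 2), ncol = 2))   # 10 subjects around  2
n_subjects <- nrow(x)
reps <- 10      # number of subsampling repetitions
p_item <- 0.8   # fraction of subjects drawn per repetition
k <- 2          # number of clusters requested

together <- matrix(0, n_subjects, n_subjects)  # pair landed in the same cluster
sampled  <- matrix(0, n_subjects, n_subjects)  # pair was drawn in the same subsample

for (r in seq_len(reps)) {
  idx <- sort(sample(n_subjects, size = floor(p_item * n_subjects)))
  cl <- kmeans(x[idx, , drop = FALSE], centers = k)$cluster
  sampled[idx, idx] <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# Consensus value: share of joint draws in which a pair was co-clustered
consensus <- ifelse(sampled > 0, together / sampled, 0)
diag(consensus) <- 1

# Final hierarchical clustering on the consensus matrix (average linkage)
tree <- hclust(as.dist(1 - consensus), method = "average")
consensus_class <- cutree(tree, k = k)
```

Using `1 - consensus` as the distance means that subjects that are frequently clustered together end up close in the tree.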


longitudinal_consensus_cluster(
  data = NULL,
  id_column = NULL,
  max_k = 3,
  reps = 10,
  p_item = 0.8,
  model_list = NULL,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"),
  title = "untitled_consensus_cluster",
  final_linkage = c("average", "ward.D", "ward.D2", "single", "complete", "mcquitty",
    "median", "centroid"),
  seed = 3794,
  verbose = FALSE
)



data: a data.frame with one or several observations per subject. It needs to contain one column that specifies which subject each entry (row) belongs to. This ID column is specified in id_column. Otherwise, there are no restrictions on the column names, as the model is specified in flexmix_formula.

id_column: name (character vector) of the ID column in data to identify all observations of one subject

max_k: maximum number of clusters, default is 3

reps: number of repetitions, default is 10

p_item: fraction of samples contained in each subsample, default is 0.8

model_list: either one flexmix driver or a list of flexmix drivers of class FLXMR

flexmix_formula: a formula object that describes the flexmix model relative to the formula in the flexmix drivers (the dot in the flexmix drivers is replaced, see the example). That means that you usually only specify the right-hand side of the formula here. However, this is neither enforced nor checked, which gives you more flexibility over the flexmix interface.

title: name of the clustering; used if writeTable = TRUE

final_linkage: linkage used for the last hierarchical clustering step on the consensus matrix; has to be average, ward.D, ward.D2, single, complete, mcquitty, median or centroid. The default is average

seed: seed for reproducibility

verbose: boolean indicating whether status messages should be displayed. Default is FALSE
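The way flexmix_formula fills the dot of a driver formula can be pictured with `update()` from base R. This is only an analogy under stated assumptions: flexmix does its own formula handling, and the grouping part after `|` is assumed to be treated separately, so it is left out here. The variable names `driver_formula` and `flexmix_rhs` are invented for the illustration.

```r
# Base R analogy only; flexmix combines the formulas internally.
driver_formula <- var_1 ~ .            # formula given to one flexmix driver
flexmix_rhs    <- ~ s(visit, k = 4)    # right-hand side, grouping part omitted

# The dot in driver_formula is replaced by the right-hand side of flexmix_rhs.
# update() only manipulates the formula; s() is not evaluated here.
combined <- update(flexmix_rhs, driver_formula)
print(combined)
```

Each driver in model_list would get its own outcome on the left-hand side while sharing the same right-hand side.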


An object (list) of class lcc with length max_k. The first entry general_information contains the entries:

consensus_matrices: a list of all consensus matrices (for all specified clusters)
cluster_assignments: a data.frame with an ID column named after id_column and a column for every specified number of clusters, e.g. assignment_num_clus_2
call: the call, i.e. all arguments with which longitudinal_consensus_cluster was called

The other entries correspond to the number of specified clusters (e.g. the second entry corresponds to 2 specified clusters) and each contains a list with the following entries:

consensus_matrix: the consensus matrix
consensus_tree: the result of the hierarchical clustering on the consensus matrix
consensus_class: the resulting class for every observation
found_flexmix_clusters: a vector of the actual found number of clusters by flexmix (which can deviate from the specified number)
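To make the layout of the returned list easier to picture, the sketch below builds a small stand-in by hand and navigates it. The object `mock_result` and all its values are invented for illustration and are not output of longitudinal_consensus_cluster; only the entry names are taken from the description above, with the ID column assumed to be called patient_id and max_k assumed to be 3.

```r
# Hand-built stand-in mirroring the described layout for max_k = 3.
# Assumption: values are made up; this is NOT real clustering output.
mock_result <- list(
  general_information = list(
    consensus_matrices = list(),           # left empty in this mock
    cluster_assignments = data.frame(
      patient_id = 1:4,
      assignment_num_clus_2 = c(1, 1, 2, 2),
      assignment_num_clus_3 = c(1, 2, 3, 3)
    ),
    call = NULL                            # placeholder for the stored call
  ),
  # second entry: results for 2 specified clusters
  list(consensus_class = c(1, 1, 2, 2), found_flexmix_clusters = c(2, 2, 2)),
  # third entry: results for 3 specified clusters
  list(consensus_class = c(1, 2, 3, 3), found_flexmix_clusters = c(3, 2, 3))
)

# Cluster assignments for all values of k, one row per subject
assignments <- mock_result$general_information$cluster_assignments

# Results for 2 specified clusters sit in the second entry
k2 <- mock_result[[2]]
print(table(k2$consensus_class))
print(k2$found_flexmix_clusters)
```

Accessing the per-k results by position follows the statement that the second entry corresponds to 2 specified clusters.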


The data types longitudinal_consensus_cluster can handle depend on how the flexmix models are set up. In principle, all data types are supported for which there is a flexmix driver with the desired outcome variable.

If you follow the dimension reduction approach outlined in vignette("Example clustering analysis", package = "longmixr"), the input data types depend on what FAMD from the FactoMineR package can handle. FAMD accepts numeric variables and treats all other variables as factor variables which it can handle as well.


test_data <- data.frame(patient_id = rep(1:10, each = 4),
                        visit = rep(1:4, 10),
                        var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
                          rep(seq(from = 0, to = 1.5, length.out = 4), 10),
                        var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
                          rep(seq(from = 1.5, to = 0, length.out = 4), 10))
model_list <- list(flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
                   flexmix::FLXMRmgcv(as.formula("var_2 ~ .")))
clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"))
#> 2 : *
#> 2 : *
#> 2 : *
# not run
# plot(clustering)
# end not run