This function performs longitudinal clustering with flexmix. To get robust results, the data is repeatedly subsampled and the clustering is performed on each subsample. The results are combined in a consensus matrix, and a final hierarchical clustering step is performed on this matrix. This follows the approach of the ConsensusClusterPlus package.

Usage

longitudinal_consensus_cluster(
  data = NULL,
  id_column = NULL,
  max_k = 3,
  reps = 10,
  p_item = 0.8,
  model_list = NULL,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"),
  title = "untitled_consensus_cluster",
  final_linkage = c("average", "ward.D", "ward.D2", "single", "complete", "mcquitty",
    "median", "centroid"),
  seed = 3794,
  verbose = FALSE
)

Arguments

data

a data.frame with one or several observations per subject. It needs to contain one column that specifies which subject each entry (row) belongs to. This ID column is specified in id_column. Otherwise, there are no restrictions on the column names, as the model is specified in flexmix_formula.

id_column

name (character vector) of the ID column in data to identify all observations of one subject

max_k

maximum number of clusters, default is 3

reps

number of repetitions, default is 10

p_item

fraction of samples drawn in each subsample, default is 0.8

model_list

either one flexmix driver or a list of flexmix drivers of class FLXMR

flexmix_formula

a formula object that describes the flexmix model relative to the formula in the flexmix drivers (the dot in the flexmix drivers is replaced, see the example). This means that you usually only specify the right-hand side of the formula here. However, this is neither enforced nor checked, to give you more flexibility over the flexmix interface

title

name of the clustering; used if writeTable = TRUE

final_linkage

linkage used for the last hierarchical clustering step on the consensus matrix; has to be average, ward.D, ward.D2, single, complete, mcquitty, median or centroid. The default is average

seed

seed for reproducibility

verbose

logical; whether status messages should be displayed. Default is FALSE
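
The dot replacement described for flexmix_formula can be illustrated with stats::update() from base R. This is only a sketch of the semantics, not longmixr code: it assumes that the grouping part after "|" is handled separately, so only the part before "|" takes the place of the dot. The variable names rhs_formula and full_formula are made up for this illustration.

```r
# Illustration only: base R formula updating, no flexmix or longmixr calls.
# Driver formula as passed to a flexmix driver, with a dot as placeholder.
driver_formula <- var_1 ~ .

# Right-hand side of flexmix_formula without the "| patient_id" grouping
# (assumption: the grouping is handled separately from the model terms).
rhs_formula <- ~ s(visit, k = 4)

# update() substitutes the dot in the driver formula with the right-hand side.
full_formula <- update(rhs_formula, driver_formula)
print(full_formula)
```

With two drivers, as in the example at the end of this page, each outcome variable would be combined with the same right-hand side in this way.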

Value

An object (list) of class lcc with length max_k. The first entry general_information contains the entries:

consensus_matrices

a list of all consensus matrices (for all specified clusters)

cluster_assignments

a data.frame with an ID column named after id_column and a column for every specified number of clusters, e.g. assignment_num_clus_2

call

the call, i.e. all arguments with which longitudinal_consensus_cluster was called

The other entries correspond to the number of specified clusters (e.g. the second entry corresponds to 2 specified clusters) and each contains a list with the following entries:

consensus_matrix

the consensus matrix

consensus_tree

the result of the hierarchical clustering on the consensus matrix

consensus_class

the resulting class for every observation

found_flexmix_clusters

a vector of the number of clusters actually found by flexmix (which can deviate from the specified number)

Details

The data types longitudinal_consensus_cluster can handle depend on how the flexmix models are set up. In principle, all data types are supported for which there is a flexmix driver with the desired outcome variable.

If you follow the dimension reduction approach outlined in vignette("Example clustering analysis", package = "longmixr"), the input data types depend on what FAMD from the FactoMineR package can handle. FAMD accepts numeric variables and treats all other variables as factor variables which it can handle as well.
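
The subsample-and-combine procedure from the description at the top of this page can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: kmeans() stands in for the flexmix model fit, the toy data has one value per subject, and the names together and sampled are invented for this sketch.

```r
# Minimal consensus clustering sketch (base R only).
# Assumption: kmeans() replaces the flexmix fit used by the real function.
set.seed(1)
n <- 20
# Toy data: one value per subject, two well-separated groups.
x <- matrix(c(rnorm(10, mean = -3), rnorm(10, mean = 3)), ncol = 1)

reps <- 10     # number of subsampling repetitions
p_item <- 0.8  # fraction of subjects drawn per repetition
k <- 2         # number of clusters

together <- matrix(0, n, n)  # how often a pair landed in the same cluster
sampled <- matrix(0, n, n)   # how often a pair was drawn together

for (r in seq_len(reps)) {
  idx <- sort(sample(n, size = floor(p_item * n)))
  cl <- kmeans(x[idx, , drop = FALSE], centers = k)$cluster
  sampled[idx, idx] <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# Consensus value: share of joint draws in which a pair was co-clustered.
consensus <- ifelse(sampled > 0, together / sampled, 0)
diag(consensus) <- 1

# Final hierarchical clustering on the consensus matrix.
tree <- hclust(as.dist(1 - consensus), method = "average")
consensus_class <- cutree(tree, k = k)
```

Here method = "average" plays the role of final_linkage, and reps and p_item have the same meaning as the arguments of the same name.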

Examples

set.seed(5)
test_data <- data.frame(
  patient_id = rep(1:10, each = 4),
  visit = rep(1:4, 10),
  var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
    rep(seq(from = 0, to = 1.5, length.out = 4), 10),
  var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
    rep(seq(from = 1.5, to = 0, length.out = 4), 10)
)
model_list <- list(
  flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
  flexmix::FLXMRmgcv(as.formula("var_2 ~ ."))
)
clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id")
)
#> 2 : *
#> 2 : *
#> 2 : *
# not run
# plot(clustering)
# end not run