# Longitudinal consensus clustering with flexmix

Source: `R/longitudinal_consensus_cluster.R`

`longitudinal_consensus_cluster.Rd`

This function performs longitudinal clustering with flexmix. To obtain robust
results, the data are repeatedly subsampled and the clustering is performed on
each subsample. The results are combined in a consensus matrix, and a final
hierarchical clustering step is performed on this matrix. This follows the
approach of the `ConsensusClusterPlus` package.
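
The subsample-and-combine idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `kmeans` stands in for the flexmix clustering step, the toy data have a single value per subject, and the names `co_clustered` and `co_sampled` are hypothetical.

```r
set.seed(1)

# Toy data (assumption): 20 subjects in two well-separated groups,
# one value per subject
n <- 20
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 6))

reps <- 10     # number of subsampling repetitions
p_item <- 0.8  # fraction of subjects drawn in each repetition
k <- 2         # number of clusters

co_clustered <- matrix(0, n, n)  # how often a pair ended up in the same cluster
co_sampled <- matrix(0, n, n)    # how often a pair was drawn together

for (r in seq_len(reps)) {
  idx <- sort(sample(n, size = floor(p_item * n)))
  # Stand-in clustering step; the package fits flexmix models here instead
  cl <- kmeans(x[idx], centers = k)$cluster
  co_sampled[idx, idx] <- co_sampled[idx, idx] + 1
  co_clustered[idx, idx] <- co_clustered[idx, idx] + outer(cl, cl, "==")
}

# Consensus: share of joint draws in which a pair was clustered together
consensus_matrix <- co_clustered / pmax(co_sampled, 1)
```

Each entry of `consensus_matrix` lies between 0 and 1; values near 1 mark pairs of subjects that were clustered together in almost every subsample that contained both.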

## Usage

```
longitudinal_consensus_cluster(
  data = NULL,
  id_column = NULL,
  max_k = 3,
  reps = 10,
  p_item = 0.8,
  model_list = NULL,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"),
  title = "untitled_consensus_cluster",
  final_linkage = c("average", "ward.D", "ward.D2", "single", "complete", "mcquitty",
    "median", "centroid"),
  seed = 3794,
  verbose = FALSE
)
```

## Arguments

- data
  a `data.frame` with one or several observations per subject. It needs to contain one column that specifies which subject each entry (row) belongs to. This ID column is specified in `id_column`. Otherwise, there are no restrictions on the column names, as the model is specified in `flexmix_formula`.

- id_column
  name (character vector) of the ID column in `data` that identifies all observations of one subject

- max_k
  maximum number of clusters, default is `3`

- reps
  number of repetitions, default is `10`

- p_item
  fraction of samples contained in each subsample, default is `0.8`

- model_list
  either one `flexmix` driver or a list of `flexmix` drivers of class `FLXMR`

- flexmix_formula
  a `formula` object that describes the `flexmix` model relative to the formula in the flexmix drivers (the dot in the flexmix drivers is replaced, see the example). This means that you usually specify only the right-hand side of the formula here. However, this is neither enforced nor checked, to give you more flexibility over the `flexmix` interface.

- title
  name of the clustering; used if `writeTable = TRUE`

- final_linkage
  linkage used for the last hierarchical clustering step on the consensus matrix; has to be one of `average`, `ward.D`, `ward.D2`, `single`, `complete`, `mcquitty`, `median` or `centroid`. The default is `average`.

- seed
  seed for reproducibility

- verbose
  `boolean`; whether status messages should be displayed. Default is `FALSE`.
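
The dot replacement described for `flexmix_formula` can be illustrated with base R's `update()` on formulas. This is a sketch by analogy and does not call flexmix itself: it assumes the right-hand side is substituted for the dot in a similar way, and it leaves out the grouping part (`| patient_id`), which is not part of this substitution. The names `driver_formula`, `rhs_formula` and `full_formula` are hypothetical.

```r
# Formula as passed to a driver: the dot is a placeholder for the right-hand side
driver_formula <- var_1 ~ .

# Right-hand side as given in flexmix_formula, without the grouping part
rhs_formula <- ~ s(visit, k = 4)

# update() replaces the dot with the right-hand side of the old formula
# and takes the response from the driver formula
full_formula <- update(rhs_formula, driver_formula)
print(full_formula)
```

With two drivers, as in the example at the end of this page, the same right-hand side is combined with each response (`var_1` and `var_2`) in turn.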

## Value

An object (list) of class `lcc` with length `max_k`.
The first entry, `general_information`, contains the following entries:

| Entry | Description |
| --- | --- |
| `consensus_matrices` | a list of all consensus matrices (for all specified clusters) |
| `cluster_assignments` | a `data.frame` with an ID column named after `id_column` and a column for every specified number of clusters, e.g. `assignment_num_clus_2` |
| `call` | the call, i.e. all arguments with which `longitudinal_consensus_cluster` was called |

The other entries correspond to the number of specified clusters (e.g. the second entry corresponds to 2 specified clusters) and each contains a list with the following entries:

| Entry | Description |
| --- | --- |
| `consensus_matrix` | the consensus matrix |
| `consensus_tree` | the result of the hierarchical clustering on the consensus matrix |
| `consensus_class` | the resulting class for every observation |
| `found_flexmix_clusters` | a vector of the number of clusters actually found by `flexmix` (which can deviate from the specified number) |
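
How `consensus_tree` and `consensus_class` relate to a consensus matrix can be sketched with base R. The matrix below is hand-made for illustration, and using `1 - consensus_matrix` as the dissimilarity is an assumption borrowed from the `ConsensusClusterPlus` approach rather than something stated on this page.

```r
# Hand-made consensus matrix for four subjects (illustrative values only)
ids <- c("p1", "p2", "p3", "p4")
consensus_matrix <- matrix(
  c(1.0, 0.9, 0.1, 0.0,
    0.9, 1.0, 0.2, 0.1,
    0.1, 0.2, 1.0, 0.8,
    0.0, 0.1, 0.8, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Assumption: high consensus means small distance
consensus_tree <- hclust(as.dist(1 - consensus_matrix), method = "average")

# Cut the tree into the requested number of clusters
consensus_class <- cutree(consensus_tree, k = 2)
print(consensus_class)
```

The `method` argument of `hclust` accepts the same linkage names that are listed for `final_linkage`.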

## Details

The data types `longitudinal_consensus_cluster` can handle depend on
how the `flexmix` models are set up. In principle, all data types are
supported for which there is a `flexmix` driver with the desired
outcome variable.

If you follow the dimension reduction approach outlined in
`vignette("Example clustering analysis", package = "longmixr")`, the
input data types depend on what `FAMD` from the `FactoMineR` package can
handle. `FAMD` accepts `numeric` variables and treats all other variables as
`factor` variables, which it can handle as well.

## Examples

```
set.seed(5)
# Simulated data: 10 patients with 4 visits each; patients 1-5 and 6-10
# are drawn from different distributions
test_data <- data.frame(
  patient_id = rep(1:10, each = 4),
  visit = rep(1:4, 10),
  var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
    rep(seq(from = 0, to = 1.5, length.out = 4), 10),
  var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
    rep(seq(from = 1.5, to = 0, length.out = 4), 10)
)
# One flexmix driver per outcome; the dot is replaced by flexmix_formula
model_list <- list(
  flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
  flexmix::FLXMRmgcv(as.formula("var_2 ~ ."))
)
clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id")
)
#> 2 : *
#> 2 : *
#> 2 : *
# not run
# plot(clustering)
# end not run
```