Cluster mapping data into mineral species

cluster_xmap(
  xmap,
  centers,
  elements = intersect(names(xmap), colnames(centers)),
  saving = TRUE,
  suffix = "_.*",
  ...
)

Arguments

xmap

A qm_xmap class object returned by read_xmap().

centers

A c-by-p matrix (c clusters and p features), typically a data.frame or matrix, returned by find_centers() or prepared manually. It gives the initial guesses of the cluster centers (or centroids). See find_centers().

elements

A character vector choosing the elements to be used in cluster analysis. The default selects as many elements as possible, i.e. those common to names(xmap) and colnames(centers).

saving

TRUE or FALSE: whether to save the result. Specifying xte forces saving to be FALSE.

suffix

A regular expression matching the suffix of cluster names. Clusters that share the same prefix form a super cluster. For example, "Pl_NaRich" and "Pl_NaPoor" are merged into a "Pl" cluster if suffix = "_.*" (default).
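
The effect of the suffix pattern can be sketched in base R. This is only an illustration: the names "Qtz" and "Ol_core" are hypothetical, and the use of sub() is an assumption, not necessarily how cluster_xmap() strips suffixes internally.

```r
# Hypothetical cluster names; only the two "Pl_*" names come from the text above.
clusters <- c("Pl_NaRich", "Pl_NaPoor", "Qtz", "Ol_core")
suffix <- "_.*"

# Remove whatever the suffix pattern matches to obtain super cluster names.
# Names without an underscore are left unchanged.
super_clusters <- sub(suffix, "", clusters)
print(super_clusters)
```

Here both "Pl_NaRich" and "Pl_NaPoor" map to "Pl", so they would be treated as one super cluster.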

...

Arguments passed on to PoiClaClu::Classify

xte

An m-by-p data matrix: m test observations and p features. The classifier fit on the training data set x will be tested on this data set. If NULL, then testing will be performed on the training set.

rho

Tuning parameter controlling the amount of soft thresholding performed, i.e. the level of sparsity (the number of nonzero features in the classifier). rho = 0 means that there is no soft thresholding, so all features are used in the classifier. Larger rho means that fewer features will be used.
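
As a rough illustration of what soft thresholding does, the generic operator can be written in a few lines of base R. The helper name soft_threshold and the input values are hypothetical, and this sketch does not show how PoiClaClu::Classify scales or applies rho internally.

```r
# Generic soft-thresholding operator (illustrative helper, not a PoiClaClu function):
# shrink each value toward zero by rho and set values within rho of zero to zero.
soft_threshold <- function(x, rho) sign(x) * pmax(abs(x) - rho, 0)

x <- c(-2, -0.5, 0, 0.3, 1.5)   # hypothetical per-feature effects

no_shrink <- soft_threshold(x, 0)   # rho = 0 leaves every value unchanged
shrunk    <- soft_threshold(x, 1)   # small values become exactly zero

print(no_shrink)
print(shrunk)
print(sum(shrunk != 0))             # number of features that remain nonzero
```

With rho = 1, only the two values whose magnitude exceeds 1 stay nonzero, which is the sense in which a larger rho uses fewer features.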

beta

A smoothing term. A Gamma(beta,beta) prior is used to fit the Poisson model. It is recommended to leave it at 1, the default value.

rhos

A vector of tuning parameters that control the amount of soft thresholding performed. If "rhos" is provided then a number of models will be fit (one for each element of "rhos"), and a number of predicted class labels will be output (one for each element of "rhos").

type

How should the observations be normalized within the Poisson model, i.e. how should the size factors be estimated? Options are "quantile" or "deseq" (more robust) or "mle" (less robust). In greater detail: "quantile" is quantile normalization approach of Bullard et al 2010 BMC Bioinformatics, "deseq" is median of the ratio of an observation to a pseudoreference obtained by taking the geometric mean, described in Anders and Huber 2010 Genome Biology and implemented in Bioconductor package "DESeq", and "mle" is the sum of counts for each sample; this is the maximum likelihood estimate under a simple Poisson model.
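
The "mle" and "deseq" descriptions can be sketched directly from their definitions above. The count matrix is hypothetical, and the code is a minimal sketch under the assumption that observations are rows and all counts are positive; PoiClaClu may additionally rescale the size factors, which is not shown here.

```r
# Hypothetical count matrix: 3 observations (rows) by 4 features (columns).
x <- matrix(c(10, 20, 30, 40,
              20, 40, 60, 80,
               5, 10, 15, 20), nrow = 3, byrow = TRUE)

# "mle": the sum of counts for each observation.
size_mle <- rowSums(x)

# "deseq": median of the ratios of an observation to a pseudoreference,
# where the pseudoreference is the per-feature geometric mean.
# Note: log() requires strictly positive counts in this simple sketch.
pseudo_ref <- exp(colMeans(log(x)))
size_deseq <- apply(x, 1, function(obs) median(obs / pseudo_ref))

print(size_mle)
print(size_deseq)
```

In this toy matrix the rows are exact multiples of one another, so both estimates give the same relative sizes. They differ when a few features dominate the total count, which is where the median-based estimate is more robust.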

prior

Vector of length equal to the number of classes, representing prior probabilities for each class. If NULL then uniform priors are used (i.e. each class is equally likely).

transform

Should the data matrices x and xte first be power transformed so that they more closely fit the Poisson model? TRUE or FALSE. Power transformation is especially useful if the data are overdispersed relative to the Poisson model.

alpha

If transform=TRUE, this determines the power to which the data matrices x and xte are transformed. If alpha=NULL, then the transformation that makes the Poisson model best fit the data matrix x is computed (note that alpha is computed based on x, not on xte). Alternatively, the user can supply a value of alpha with 0<alpha<=1.
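
A user-supplied alpha amounts to raising every entry of the data matrix to that power. The matrix and the choice alpha = 0.5 below are hypothetical, and the sketch does not cover the automatic selection of alpha performed when alpha=NULL.

```r
# Hypothetical 2-by-3 count matrix with one large, overdispersed entry.
x <- matrix(c(0, 1, 4,
              9, 16, 100), nrow = 2, byrow = TRUE)

alpha <- 0.5          # user-chosen power, must satisfy 0 < alpha <= 1
x_t <- x^alpha        # element-wise power transformation

print(x_t)
```

Powers below 1 compress large counts much more than small ones, which reduces the spread of overdispersed data; alpha = 1 leaves the data unchanged.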