| Title: | Estimate Chla Concentrations of Phytoplankton Groups |
|---|---|
| Description: | Determine the chlorophyll a (Chl a) concentrations of different phytoplankton groups from their pigment biomarkers. The method uses non-negative matrix factorisation and simulated annealing to minimise the error between observed and estimated pigment concentrations (Hayward et al. 2023, <doi:10.1002/lom3.10541>). The approach is similar to the widely used 'CHEMTAX' program (Mackey et al. 1996, <doi:10.3354/meps144265>), but is more straightforward and accurate, and does not rely on initial guesses of the pigment to Chl a ratios for each phytoplankton group. |
| Authors: | Alexander Hayward [aut, cre, cph], Tylar Murray [aut], Sebastian Di Geronimo [aut], Mohd Aasim Maqsood Khan [aut], Andy McKenzie [aut] |
| Maintainer: | Alexander Hayward |
| License: | MIT + file LICENSE |
| Version: | 3.0.0 |
| Built: | 2026-05-10 09:20:24 UTC |
| Source: | https://github.com/phytoclass/phytoclass |
Calculates weights for the pigment columns of the sample data, capped at a maximum value.
Bounded_weights(S, weight.upper.bound = 30)
| Argument | Description |
|---|---|
| S | Sample data matrix: a matrix of pigment samples |
| weight.upper.bound | Upper bound for weights (default is 30) |
A numeric vector of weights, one per pigment column, each capped at weight.upper.bound
Bounded_weights(Sm, weight.upper.bound = 30)
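The help page does not say how each weight is derived. As a rough illustration, the sketch below assumes a CHEMTAX-style rule in which each pigment column is weighted by the inverse of its mean concentration and then capped at weight.upper.bound. The function name bounded_weights_sketch and the inverse-mean rule are assumptions, not the package's code.

```r
# Hypothetical sketch: inverse-mean pigment weights, capped at an upper bound.
# The 1 / column-mean rule is an assumption, not taken from phytoclass.
bounded_weights_sketch <- function(S, weight.upper.bound = 30) {
  w <- 1 / colMeans(S)          # low-concentration pigments get larger weights
  pmin(w, weight.upper.bound)   # cap every weight at the bound (Inf from all-zero columns too)
}

S <- matrix(c(0.01, 0.03,   # pigment 1: mean 0.02, so 1 / 0.02 = 50, capped to 30
              1.00, 3.00),  # pigment 2: mean 2, so weight 0.5
            nrow = 2)
bounded_weights_sketch(S)
```

The cap stops a pigment with very low concentrations from dominating the weighted error.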
Clusters samples by their pigment composition using hierarchical clustering.
Cluster(Data, minSamplesPerCluster, row_ids = NULL, dist_method = "euclidean", hclust_method = "ward.D2")
| Argument | Description |
|---|---|
| Data | S (sample) matrix |
| minSamplesPerCluster | The minimum number of samples required for a cluster |
| row_ids | A vector of custom row names to be added to the dendrogram |
| dist_method | Distance metric passed to stats::dist() |
| hclust_method | Clustering method passed to stats::hclust() |
A named list of length two. The first element, "cluster.list", is a list of clusters, and the second, "cluster.plot", is the cluster analysis object (dendrogram) that can be plotted.
Cluster.result <- Cluster(Sm, 14)
Cluster.result$cluster.list
plot(Cluster.result$cluster.plot)
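The help page does not describe how minSamplesPerCluster is applied when the tree is cut. The base-R sketch below shows one plausible approach using stats::dist(), stats::hclust() and stats::cutree(). It keeps the largest number of clusters in which every cluster still has at least min_samples members. That cutting rule and the name cluster_sketch are assumptions, and phytoclass may cut the tree differently.

```r
# Hypothetical base-R sketch of clustering pigment samples.
# Assumed rule: keep the largest k for which every cluster has >= min_samples members.
cluster_sketch <- function(S, min_samples, dist_method = "euclidean",
                           hclust_method = "ward.D2") {
  hc <- stats::hclust(stats::dist(S, method = dist_method), method = hclust_method)
  best <- rep(1L, nrow(S))               # fallback: one cluster holding every sample
  for (k in seq_len(nrow(S))) {
    groups <- stats::cutree(hc, k = k)
    # raising k only splits clusters, so the smallest cluster never grows
    if (min(table(groups)) < min_samples) break
    best <- groups
  }
  list(cluster.list = split(seq_len(nrow(S)), best), cluster.plot = hc)
}

# Two well-separated toy groups of five samples each (made-up values)
S <- rbind(matrix(c(0, 0.1, 0.2, 0.1, 0, 0, 0.1, 0, 0.2, 0.1), ncol = 2),
           matrix(c(10, 10.1, 10.2, 10.1, 10, 10, 10.1, 10, 10.2, 10.1), ncol = 2))
res <- cluster_sketch(S, min_samples = 5)
res$cluster.list
# plot(res$cluster.plot)   # draws the dendrogram
```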
A figure to show the pigment ratios for each phytoplankton group for each iteration.
convergence_figure(fm_iter, niter = NULL)
| Argument | Description |
|---|---|
| fm_iter | A data.frame with columns iter, phyto, pigment and ratio |
| niter | Optional: the number of iterations on the x axis. If … |
A figure with each pigment ratio per iteration per group
# ADD_EXAMPLES_HERE
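No example is given for convergence_figure. The sketch below builds a toy fm_iter table with the documented columns (iter, phyto, pigment, ratio). It then reshapes the table into one series per group and pigment pair and plots each ratio against iteration in base R. The group and pigment names and the ratio values are made up, and the plot only approximates what convergence_figure draws.

```r
# Hypothetical toy fm_iter in the documented long format; names and values are made up.
fm_iter <- expand.grid(iter = 1:10, phyto = c("GroupA", "GroupB"),
                       pigment = c("Pig1", "Pig2"), stringsAsFactors = FALSE)
# toy ratios that settle as iterations proceed, offset so each series is distinct
fm_iter$ratio <- 0.5 + 0.2 * exp(-fm_iter$iter / 3) +
  0.1 * (fm_iter$phyto == "GroupB") + 0.2 * (fm_iter$pigment == "Pig2")

# rows = iterations, columns = "group pigment" series
ratios <- tapply(fm_iter$ratio,
                 list(fm_iter$iter, paste(fm_iter$phyto, fm_iter$pigment)),
                 mean)
matplot(as.numeric(rownames(ratios)), ratios, type = "l", lty = 1,
        xlab = "Iteration", ylab = "Pigment:Chl a ratio")
legend("topright", legend = colnames(ratios), col = seq_len(ncol(ratios)),
       lty = 1, cex = 0.8)
```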
Fm data
Fm
A data frame with 9 rows and 15 columns:
XX
XX
XX
...
XX
Fp data
Fp
A data frame with 9 rows and 15 columns:
XX
XX
XX
...
XX
Applies the following checks to the S and F matrices:
drops pigment columns whose values sum to 0
drops taxa with missing major pigments, which are indicated with a '2'
drops pigments with < 1% in samples
Matrix_checks(S, Fmat)
| Argument | Description |
|---|---|
| S | Sample data matrix: a matrix of pigment samples |
| Fmat | Pigment to taxa matrix |
A named list with the checked matrices Snew and Fnew
MC <- Matrix_checks(Sm, Fm)
Snew <- MC$Snew
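The first check above drops pigment columns that sum to 0. The sketch below shows only that check, applied to both S and Fmat. It assumes the two matrices share the same pigment column names. The checks for missing major pigments and for rare pigments are not shown, and the function name drop_zero_pigments_sketch is hypothetical.

```r
# Hypothetical sketch of one check: drop pigment columns that sum to 0 across
# all samples, from both S and Fmat (assumes identical pigment column names).
drop_zero_pigments_sketch <- function(S, Fmat) {
  keep <- colnames(S)[colSums(S) > 0]   # pigments detected in at least one sample
  list(Snew = S[, keep, drop = FALSE],
       Fnew = Fmat[, keep, drop = FALSE])
}

# Made-up matrices: Pig2 is never detected in the samples
S <- matrix(c(0.2, 0.4, 0, 0, 1.0, 1.5), nrow = 2,
            dimnames = list(NULL, c("Pig1", "Pig2", "Tchla")))
Fmat <- matrix(c(0.5, 0, 0.3, 0.1, 1, 1), nrow = 2,
               dimnames = list(c("GroupA", "GroupB"), c("Pig1", "Pig2", "Tchla")))
checked <- drop_zero_pigments_sketch(S, Fmat)
colnames(checked$Snew)
```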
min_max data
min_max
A data frame with 51 rows and 4 columns:
XX
XX
XX
max
...
XX
Performs non-negative matrix factorisation of the sample pigment data, given the pigment ratios, to estimate phytoplankton class abundances.
NNLS_MF(Fn, S, S_weights = NULL)
| Argument | Description |
|---|---|
| Fn | Pigment to Chl a matrix |
| S | Sample data matrix: a matrix of pigment samples |
| S_weights | Weights for each pigment column of S |
A list containing
The F matrix of pigment:Chl a ratios
The root mean square error (RMSE)
The C matrix (class abundances for each group)
Fmat <- as.matrix(phytoclass::Fm)
S <- as.matrix(phytoclass::Sm)
S_weights <- as.numeric(phytoclass:::Bounded_weights(S))
# Run the non-negative matrix factorisation
result <- phytoclass:::NNLS_MF(Fmat, S, S_weights)
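The factorisation above can be read as S ≈ C %*% F, where S is samples × pigments, F is groups × pigments and C ≥ 0 is samples × groups. The sketch below estimates C one sample at a time with projected gradient non-negative least squares, scaling each pigment by the square root of its weight. The projected gradient solver stands in for whatever solver NNLS_MF actually uses. The names nnls_pg and nnls_mf_sketch are hypothetical.

```r
# Hypothetical sketch (not the phytoclass code): S ~ C %*% Fmat with C >= 0.

# Projected gradient descent for min ||A x - b||^2 subject to x >= 0
nnls_pg <- function(A, b, n_iter = 2000) {
  AtA <- crossprod(A)
  Atb <- as.vector(crossprod(A, b))
  step <- 1 / max(eigen(AtA, symmetric = TRUE, only.values = TRUE)$values)
  x <- numeric(ncol(A))
  for (i in seq_len(n_iter)) {
    x <- pmax(x - step * as.vector(AtA %*% x - Atb), 0)  # gradient step, then clip at 0
  }
  x
}

nnls_mf_sketch <- function(Fmat, S, S_weights = NULL) {
  if (is.null(S_weights)) S_weights <- rep(1, ncol(S))
  w <- sqrt(S_weights)      # weighted least squares via sqrt-scaled pigment rows
  A <- t(Fmat) * w          # pigments x groups; row j scaled by w[j]
  fits <- vapply(seq_len(nrow(S)), function(i) nnls_pg(A, S[i, ] * w),
                 numeric(nrow(Fmat)))
  C <- matrix(fits, nrow = nrow(S), byrow = TRUE)   # samples x groups
  rmse <- sqrt(mean((S - C %*% Fmat)^2))            # unweighted RMSE of the fit
  list(F = Fmat, RMSE = rmse, C = C)
}

# Made-up example: 2 groups x 3 pigments, abundances recovered from exact data
Fmat <- matrix(c(1, 0, 0.5,
                 0, 1, 0.5), nrow = 2, byrow = TRUE)
C_true <- matrix(c(2, 1,
                   0, 3), nrow = 2, byrow = TRUE)
S <- C_true %*% Fmat
res <- nnls_mf_sketch(Fmat, S)
res$C
```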
Plots the class abundances output by simulated_annealing.
phyto_figure(c_matrix)
| Argument | Description |
|---|---|
| c_matrix | C matrix of class abundance concentrations |
A stacked line plot with sample number on the x axis, Chl a concentration on the y axis, and phytoplankton groups shown as colours
# ADD_EXAMPLES_HERE
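No example is given for phyto_figure. The base-R sketch below shows what a stacked line plot of a C matrix (samples × groups) involves. Each group's Chl a is added on top of the groups before it, so the top line traces total Chl a per sample. The group names and values are made up, and the function stack_abundances is a hypothetical helper, not part of phytoclass.

```r
# Hypothetical sketch: running totals of Chl a across groups for a stacked line plot.
stack_abundances <- function(C) {
  stacked <- C
  for (j in seq_len(ncol(C))[-1]) {
    stacked[, j] <- stacked[, j - 1] + C[, j]   # add group j on top of groups 1..j-1
  }
  stacked
}

C <- matrix(c(0.2, 0.5, 0.1,     # GroupA Chl a in samples 1-3 (made up)
              0.3, 0.1, 0.4),    # GroupB Chl a in samples 1-3 (made up)
            nrow = 3, dimnames = list(NULL, c("GroupA", "GroupB")))
stacked <- stack_abundances(C)
matplot(seq_len(nrow(C)), stacked, type = "l", lty = 1,
        xlab = "Sample", ylab = "Chl a concentration")
legend("topleft", legend = colnames(C), col = seq_len(ncol(C)), lty = 1)
```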
This is the main phytoclass algorithm. It runs the simulated annealing algorithm on the S and F matrices. See the example data (Fm, Sm) for how to set up the matrices, and the vignette for more detailed instructions. Different pigments and phytoplankton groups may be used.
simulated_annealing(S, Fmat = NULL, user_defined_min_max = NULL, do_matrix_checks = TRUE, niter = 500, step = 0.009, weight.upper.bound = 30, verbose = TRUE, seed = NULL, check_converge = 100, alt_pro_name = NULL)
| Argument | Description |
|---|---|
| S | Sample data matrix: a matrix of pigment samples |
| Fmat | Pigment to Chl a matrix |
| user_defined_min_max | A data frame with the same format as the built-in min_max data |
| do_matrix_checks | Set to TRUE only when using the default pigment and phytoplankton group names; this removes pigment columns whose column sums are 0. Set to FALSE if using customised names for pigments and phytoplankton groups |
| niter | Number of iterations (default is 500) |
| step | Step ratio used (default is 0.009) |
| weight.upper.bound | Upper limit of the weights applied (default is 30) |
| verbose | Logical; if TRUE (the default), prints the error and temperature at each iteration |
| seed | Random seed; set to a number to reproduce the same results |
| check_converge | TRUE/FALSE/integer; the number of F matrices used for convergence checking |
| alt_pro_name | Optional: additional alternative spellings of divinyl chlorophyll a used to detect Prochlorococcus (default: "dvchl", "dvchla", "dv_chla") |
A list containing
Fmat matrix
RMSE (Root Mean Square Error)
condition number
Class abundances
Figure (plot of results)
MAE (Mean Absolute Error)
Error
F_mat_iter
converge_plot
# Using the built-in matrices Sm and Fm
set.seed(5326)
sa.example <- simulated_annealing(Sm, Fm, niter = 5)
sa.example$Figure
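To illustrate what niter and step control, the sketch below runs a generic bounded simulated annealing over a vector of ratios. Each iteration perturbs every ratio by a random factor in [1 - step, 1 + step] and clips it to its minimum and maximum. A worse candidate is accepted with the Metropolis probability exp(-Δ/T) under a linearly falling temperature. The cost function, cooling schedule, perturbation rule and the name sa_bounded_sketch are all assumptions. In phytoclass the cost would come from fitting the samples with the candidate F matrix, which this toy replaces with a distance to made-up target ratios.

```r
# Hypothetical sketch of bounded simulated annealing (not the phytoclass code).
sa_bounded_sketch <- function(x0, lower, upper, cost, niter = 500,
                              step = 0.009, temp0 = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  x <- x0
  fx <- cost(x)
  best <- x
  fbest <- fx
  for (i in seq_len(niter)) {
    temp <- temp0 * (1 - i / niter)   # linear cooling (assumed schedule)
    # propose: scale each ratio by a factor in [1 - step, 1 + step], clip to bounds
    cand <- pmin(pmax(x * (1 + runif(length(x), -step, step)), lower), upper)
    fc <- cost(cand)
    # Metropolis rule: accept improvements, and worse moves with prob exp(-delta / temp)
    if (fc < fx || (temp > 0 && runif(1) < exp((fx - fc) / temp))) {
      x <- cand
      fx <- fc
    }
    if (fx < fbest) {                 # remember the best ratios seen so far
      best <- x
      fbest <- fx
    }
  }
  list(par = best, value = fbest)
}

# Toy usage: made-up target ratios and made-up min/max bounds
target <- c(0.5, 1.2, 0.3)
rmse <- function(r) sqrt(mean((r - target)^2))
x0 <- c(0.4, 1.0, 0.4)
fit <- sa_bounded_sketch(x0, lower = c(0.1, 0.5, 0.1), upper = c(1, 2, 1),
                         cost = rmse, niter = 2000, step = 0.05,
                         temp0 = 0.01, seed = 1)
fit$par
```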
Performs the simulated annealing algorithm for samples containing divinyl chlorophyll a and Prochlorococcus. Chlorophyll a must be the final column of both the S and F matrices, with divinyl chlorophyll a the second-to-last column. See how the example Sp and Fp matrices are organised.
simulated_annealing_Prochloro(S, Fmat = NULL, user_defined_min_max = NULL, do_matrix_checks = TRUE, niter = 500, step = 0.009, weight.upper.bound = 30, verbose = TRUE, seed = NULL, check_converge = 100)
| Argument | Description |
|---|---|
| S | Sample data matrix: a matrix of pigment samples |
| Fmat | Pigment to Chl a matrix |
| user_defined_min_max | A data frame with the same format as the built-in min_max data |
| do_matrix_checks | Set to TRUE only when using the default pigment and phytoplankton group names; this removes pigment columns whose column sums are 0. Set to FALSE if using customised names for pigments and phytoplankton groups |
| niter | Number of iterations (default is 500) |
| step | Step ratio used (default is 0.009) |
| weight.upper.bound | Upper limit of the weights applied (default is 30) |
| verbose | Logical; if TRUE (the default), prints the error and temperature at each iteration |
| seed | Random seed; set to a number to reproduce the same results |
| check_converge | TRUE/FALSE/integer; the number of F matrices used for convergence checking |
A list containing
Fmat matrix
RMSE (Root Mean Square Error)
condition number
Class abundances
Figure (plot of results)
MAE (Mean Absolute Error)
Error
# Using the built-in matrices Sp and Fp
set.seed(5326)
sa.example <- simulated_annealing_Prochloro(Sp, Fp, niter = 1)
sa.example$Figure
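simulated_annealing_Prochloro requires chlorophyll a to be the last column and divinyl chlorophyll a the second-to-last. The small helper below reorders the columns of a matrix or data frame to meet that requirement before the call, keeping the other pigments in their original order. The helper name and the column names "DVchla" and "Tchla" are placeholders; substitute the names used in your own S and F matrices.

```r
# Hypothetical helper: move divinyl chlorophyll a and chlorophyll a columns to
# the second-to-last and last positions, keeping other columns in order.
order_prochloro_columns <- function(M, dvchl_col, chl_col) {
  stopifnot(all(c(dvchl_col, chl_col) %in% colnames(M)))
  others <- setdiff(colnames(M), c(dvchl_col, chl_col))
  M[, c(others, dvchl_col, chl_col), drop = FALSE]
}

# Made-up matrix with placeholder pigment names
M <- matrix(1:8, nrow = 2,
            dimnames = list(NULL, c("Tchla", "Fuco", "DVchla", "Zea")))
M_ordered <- order_prochloro_columns(M, dvchl_col = "DVchla", chl_col = "Tchla")
colnames(M_ordered)
```

The same reordering would need to be applied to both the S and F matrices so their columns stay aligned.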
Sm data
Sm
A data frame with 29 rows and 15 columns:
XX
XX
XX
...
XX
Sp data
Sp
A data frame with 29 rows and 15 columns:
XX
XX
XX
...
XX
A stand-alone version of the steepest descent algorithm, similar to the one in CHEMTAX. Using this function is optional; because its results are not bounded by the minimum and maximum ratios, they may be unrealistic.
Steepest_Desc(Fmat, S, num.loops)
| Argument | Description |
|---|---|
| Fmat | Pigment to Chl a matrix |
| S | Sample data matrix: a matrix of pigment samples |
| num.loops | Number of loops/iterations to perform (no default) |
A list containing
The F matrix of pigment:Chl a ratios
RMSE (Root Mean Square Error)
Condition number
Class abundances
Figure (plot of results)
MAE (Mean Absolute Error)
MC <- Matrix_checks(Sm, Fm)
Snew <- MC$Snew
Fnew <- MC$Fnew
SDRes <- Steepest_Desc(Fnew, Snew, num.loops = 20)
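A CHEMTAX-style steepest descent alternates between updating the class abundances and the pigment ratios. The sketch below shows only the ratio half of that loop. It holds C fixed and takes gradient steps of size 1/L on the non-zero entries of Fmat to reduce the squared error. L is twice the largest eigenvalue of CᵀC. Ratios are clipped at zero but, as the description above warns, no minimum or maximum bounds are applied. The function name and the exact update are assumptions, not the Steepest_Desc implementation.

```r
# Hypothetical sketch: steepest descent on Fmat with class abundances C fixed.
steepest_desc_F_sketch <- function(Fmat, C, S, num.loops) {
  mask <- Fmat > 0                                   # zero ratios stay zero
  L <- 2 * max(eigen(crossprod(C), symmetric = TRUE, only.values = TRUE)$values)
  for (i in seq_len(num.loops)) {
    grad <- -2 * crossprod(C, S - C %*% Fmat)        # gradient of sum((S - C F)^2)
    Fmat <- pmax(Fmat - grad / L, 0) * mask          # step of size 1/L, then project
  }
  list(Fmat = Fmat, cost = sum((S - C %*% Fmat)^2))
}

# Made-up example: start from inflated ratios and let the error fall
C <- matrix(c(1, 0.5, 0.2, 2, 1, 0.3), nrow = 3)       # samples x groups
F_true <- matrix(c(0.8, 0, 1,
                   0, 0.4, 1), nrow = 2, byrow = TRUE)  # groups x pigments
S <- C %*% F_true
F0 <- F_true * 1.5
cost0 <- sum((S - C %*% F0)^2)
res <- steepest_desc_F_sketch(F0, C, S, num.loops = 20)
c(start = cost0, end = res$cost)
```

With a step of 1/L, each projected step cannot increase the squared error, so the cost is non-increasing over the loops.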