Title SOM networks for comparing patterns with peak shifts
Maintainer Ron Wehrens <[email protected]>
Description SOM networks for comparing patterns with peak shifts.
bucket
cepha
classvec2classmat
degelder
expand.som
plot.wccsom
predict.wccsom
summary.wccsom
unit.distances
wcc
wccassign
wccmap
wccsom
wccxyf
Variable averaging (bucketing) for data matrices
Function bucket decreases the size (i.e., the number of columns) of a data matrix by averaging variables. Function debucket achieves the reverse by linear interpolation.
Data matrix: each variable corresponds with a column.
Bucket factor: this number of variables will be averaged.
Required number of variables after debucketing.
Returns a data matrix of the new dimensions.
data(cepha)
gr <- somgrid(3,3, "hexagonal")
set.seed(7)
## further arguments of both wccsom calls are truncated in the source; defaults apply here
system.time(x <- wccsom(cepha$patterns, grid=gr, trwidth=20))
X <- bucket(cepha$patterns, 4)
system.time(x <- wccsom(X, grid=gr, trwidth=5))
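The averaging and interpolation steps described for bucket and debucket can be sketched in base R. This is a minimal illustration, not the package code: the names bucket_sketch and debucket_sketch are hypothetical, and the handling of a final, shorter bucket is an assumption.

```r
# Hypothetical helpers illustrating the idea; not the package's own code.

# Average each run of `factor` adjacent columns. Assumption: a final,
# shorter run is averaged over the columns it actually contains.
bucket_sketch <- function(x, factor) {
  groups <- ceiling(seq_len(ncol(x)) / factor)
  sums <- rowsum(t(x), groups)          # one row per bucket
  counts <- as.vector(table(groups))    # number of columns per bucket
  t(sums / counts)
}

# Restore `nout` columns by linear interpolation along each row.
debucket_sketch <- function(x, nout) {
  t(apply(x, 1, function(row) approx(seq_along(row), row, n = nout)$y))
}

x <- matrix(as.numeric(1:16), nrow = 2, byrow = TRUE)  # 2 patterns, 8 variables
xb <- bucket_sketch(x, 4)       # 2 x 2 matrix of bucket means
xd <- debucket_sketch(xb, 8)    # interpolated back to 2 x 8
```

Note that debucketing cannot recover detail lost by averaging; it only restores the original number of columns.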
X-ray powder patterns of 20 cephalosporin / antibiotic complexes.
This yields a list with three components: the first component, ’"patterns"’, is a matrix of 20 rows and 425 variables, containing the powder patterns; the second component is ’"class.names"’, and gives information on the class of the crystal structure. The final component, "thetas", contains the 2theta values at which intensities have been measured.
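R. de Gelder, R. Wehrens, and J.A. Hageman. A generalized expression for the similarity of spectra: application to powder diffraction pattern classification. J. Comput. Chem., 22(3), 273-289, 2001.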
data(cepha)
plot(cepha$thetas, cepha$patterns[1,], type="l", xlab="2theta", ylab="Counts")
matplot(cepha$thetas, t(cepha$patterns), type="l",
        xlab="2theta", ylab="Counts",
        col=as.integer(factor(cepha$class.names))+1, lty=1)
Convert a classification vector into a matrix or the other way around.
Functions toggle between a matrix representation, where class membership is indicated with a single ’1’ per row and zeros elsewhere, and a class vector (integers or class names). The classification matrix contains one column per class. Conversion from a class matrix to a class vector assigns each row to the column with the highest value. An optional argument can be used to assign only those objects that have a probability higher than a certain threshold (default is 0).
classvec2classmat(yvec)
classmat2classvec(ymat, threshold=0)
yvec    class vector. Usually integer values, but other types are also allowed.
ymat    class matrix: every column corresponds to a class.
threshold    only classify into a class if the probability is larger than this threshold.
classvec2classmat returns the classification matrix, where each column consists of zeros and ones; classmat2classvec returns a class vector (integers).
classes <- c(rep(1, 5), rep(2, 7), rep(3, 9))
classmat <- classvec2classmat(classes)
classmat
classmat2classvec(classmat)
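The two conversions can be sketched in base R as follows. The function names classvec_to_mat and classmat_to_vec are hypothetical stand-ins, and returning NA for rows that do not exceed the threshold is an assumption; the source does not say how the package marks unassigned objects.

```r
# Hypothetical stand-ins for the two conversions; not the package code.

# One column per class, a single 1 per row.
classvec_to_mat <- function(yvec) {
  classes <- sort(unique(yvec))
  ymat <- outer(yvec, classes, "==") * 1
  colnames(ymat) <- classes
  ymat
}

# Assign each row to the column with the highest value, but only if that
# value is larger than `threshold`. Assumption: other rows become NA.
classmat_to_vec <- function(ymat, threshold = 0) {
  best <- max.col(ymat, ties.method = "first")
  top <- ymat[cbind(seq_len(nrow(ymat)), best)]
  best[top <= threshold] <- NA
  best
}

classes <- c(rep(1, 5), rep(2, 7), rep(3, 9))
classmat <- classvec_to_mat(classes)
back <- classmat_to_vec(classmat)

# With class probabilities, a threshold leaves doubtful rows unassigned.
probs <- rbind(c(0.5, 0.3, 0.2), c(0.34, 0.33, 0.33))
assigned <- classmat_to_vec(probs, threshold = 0.4)
```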
X-ray powder patterns of 131 crystallographic structures, contributed by Rene de Gelder.
This yields a list with three components: the first component, ’"patterns"’, is a matrix of 131 rows and 441 variables, containing the powder patterns; the second component is "thetas", the 2theta values at which intensities have been measured. The final component, ’"properties"’, gives information on the crystallographic properties of the structures.
Rene de Gelder, Institute of Molecules and Materials, Radboud University Nijmegen.
data(degelder)
set.seed(1)
geldermap <- wccsom(degelder$patterns, somgrid(6, 6, "hexagonal"))
options(digits = 3)
summary(geldermap, "unit", nr=1, properties = degelder$properties)
Increases the size of a network by a factor of 2 in both x and y directions; code vectors for new units are interpolated.
Object of class "wccsom", generated by wccsom.
If TRUE, plots the new network with the original units coloured white, the new ones red.
For hexagonal grids, the six closest units are used for interpolation, weighted by distance; for rectangular grids, the four closest are used.
data(cepha)
gr <- somgrid(3,3, "hexagonal")
set.seed(7)
x <- wccsom(cepha$patterns, grid=gr, trwidth=20, rlen=50)
x2 <- expand.som(x, plotit=TRUE)
Plot self-organising map, obtained from function wccsom. Several types of plots are supported.
## S3 method for class ’wccsom’
plot(x, type = c("changes", "codes", "counts", "mapping",
                 "prediction", "property", "quality"),
     classif = NULL, labels = NULL, pchs = NULL, main = NULL,
     palette.name = heat.colors, ncolors, unit.colors,
     unit.bgcol = NULL, zlim = NULL, property = NULL,
     heatkey=TRUE, contin, ...)
classif    classification object, as returned by wccassign. Only needed if type equals ’"mapping"’, ’"quality"’, or ’"counts"’. A vector of class numbers may also be given. If the network was trained with keep.data equal to TRUE, then the object already contains this information.
labels    (optional) labels to plot when type equals ’"mapping"’.
pchs    (optional) plotting symbols to use when type equals ’"mapping"’.
palette.name    colors to use as unit background for ’"codes"’, ’"counts"’, ’"prediction"’, ’"property"’ and ’"quality"’ plotting types.
ncolors    number of colors to use in the palette.
unit.colors    explicit definition of the colors for the units in the ’"codes"’, ’"counts"’, ’"prediction"’, ’"property"’ and ’"quality"’ plotting types.
unit.bgcol    background color for units if no other color information is to be plotted. For plotting type ’"codes"’, the default is ’"transparent"’; in other cases the default is ’"gray"’.
zlim    Optional range for color coding of unit backgrounds.
property    Values to use with the ’"property"’ plotting type. Can be used for colouring units in general.
heatkey    Whether or not to generate a heatkey at the left side of the plot in the ’"quality"’, ’"counts"’, and ’"property"’ plotting types.
contin    Whether the heatkey should show a range of values (TRUE) or a series of discrete values. The function tries to make a good guess; in case of strange-looking results it may pay to explicitly provide a value for this argument.
...    Other graphical parameters, e.g. colours of labels in the ’"mapping"’ plotting type.
Several different types of plots are supported:
’"changes"’ Shows the mean change in similarity to the best matching codebook vector for each epoch. Since codebook vectors become more similar to the data that are mapped to them, the changes should always be positive. Upon convergence, the changes should be very small.
’"codes"’ Shows the codebook vectors.
’"counts"’ Shows the number of objects mapped to the individual units. Empty units are depicted
’"mapping"’ Show where a set of objects is mapped. It needs a ’"labels"’ argument: a string name
’"prediction"’ Shows predictions for units; if no ’"property"’ argument is given, the function will show, for supervised maps, the predictions for every unit; for unsupervised maps, where this information is not available, it gives an error.
’"property"’ Plot a map with the units coloured according to a specific property. The standard application is to precompute the similarities of an object to all units in the map, and plot these with a colour key. Other quantities may also be used to colour the units: see the example of distances below.
’"quality"’ Shows the units coloured according to the mean agreement (WCC) of mapped objects to the unit vector. A colour key is plotted on the left. The variation in the WCCs of the mapped objects is indicated by the blue line: if it is pointing downwards it indicates low variation, if pointing upwards large variation.
If type equals ’"property"’, the wcc values for all units are returned.
R. Wehrens, W.J. Melssen, L.M.C. Buydens and R. de Gelder. Representing Structural Databases in a Self-Organising Map. Acta Cryst. B61, 548-557, 2005.
data(cepha)
gr <- somgrid(3, 3, "hexagonal")
set.seed(7)
x <- wccsom(cepha$patterns, grid = gr, trwidth = 20, rlen = 100)
par(mfrow = c(3,2))
plot(x, type = "codes", main = "Codebook vectors")
plot(x, type = "changes", main = "Convergence")
plot(x, type = "counts", main = "Mapping counts")
plot(x, type = "quality", main = "Mapping quality")
plot(x, type = "mapping", main = "Mapping",
     labels = cepha$class.names, col = as.integer(factor(cepha$class.names)))
plot(x, type = "mapping", main = "Mapping",
     pchs = as.integer(factor(cepha$class.names)),
     col = as.integer(factor(cepha$class.names)))
par(mfrow=c(1,1))
obj1.wccs <- wccmap(x, cepha$patterns[1,])
plot(x, type = "property", property = obj1.wccs)
Predict properties from self-organising maps
Function to predict property values for every unit in a supervised or unsupervised SOM. These, in turn, are used to provide predictions for individual objects.
## S3 method for class ’wccsom’
predict(object, newdata, trainX, trainY, unit.predictions, ...)
newdata    If new objects are supplied (in the form of a matrix), they are mapped to the SOM; predictions for the new data are the predicted values associated with the units to which they are mapped. In order to calculate these, the map should be supervised, or the training data should be stored in the map, or these data should be provided through arguments ’"trainX"’ and ’"trainY"’, or, finally, the unit predictions can be given explicitly (’"unit.predictions"’).
trainX    Training data, only used when they have not been stored in the trained map.
trainY    Dependent values for the training data.
unit.predictions    Alternatively, one can provide predictions for every unit.
For supervised SOMs, predictions per unit are available after training. For unsupervised SOMs, these predictions can be obtained from the average values of the properties of training set objects mapping to specific units. New objects that are mapped to the SOM will receive the predicted value of the unit to which they are mapped.
The function returns a list with components
Property predictions per unit of the map.
Property predictions for the new data.
data(degelder)
gr <- somgrid(5, 5, "hexagonal")
set.seed(7)
## arguments after the property vector are truncated in the source; grid = gr is assumed
x <- wccxyf(degelder$patterns, degelder$properties[,"cell.vol"], grid = gr)
predicted.volumes <- predict(x)
plot(degelder$properties[,"cell.vol"], predicted.volumes$predictions,
     xlab="Cell volume", ylab="Predicted cell volume")
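For an unsupervised map, the per-unit averaging described in the details can be sketched with plain vectors. All values below are made up for illustration, the variable names are hypothetical, and leaving empty units as NA is an assumption rather than documented package behaviour.

```r
# Made-up training data: the unit each object maps to, and its property value.
unit_classif <- c(1, 1, 2, 3, 3, 3)
property <- c(10, 12, 20, 30, 33, 36)
n_units <- 4                      # unit 4 receives no training objects

# Prediction per unit: mean property of the training objects mapped to it.
# Assumption: units without training objects get NA.
unit_pred <- rep(NA_real_, n_units)
means <- tapply(property, unit_classif, mean)
unit_pred[as.integer(names(means))] <- means

# New objects inherit the prediction of the unit to which they are mapped.
new_units <- c(3, 1)
new_pred <- unit_pred[new_units]
```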
Summary for objects of class ’"wccsom"’
## S3 method for class ’wccsom’
summary(object, type = c("unit", "object", "smoothness", "quality"),
        nr, labels, data = object$data,
        classif = object$unit.classif,
        wccs = object$wccs, properties = NULL, ...)
type    One of "unit", "object", "smoothness" or "quality".
nr    Number of the unit or object that is to be summarised (only needed for type "unit" and "object").
labels    Labels for all objects (not needed for type equaling "smoothness" and "quality"). Defaults to integer numbers.
data    Data matrix (not needed for type equaling "smoothness"). Default is to take the training objects (if available).
classif    Mapping of all objects (not needed for type equaling "smoothness"). Default is to take the mapping of the training objects (if available). Alternatively, a mapping object (output of wccassign) may be given for this argument.
wccs    WCC values of all mapped objects (not needed for type equaling "smoothness"). Default is to take the training objects (if available).
properties    Other properties of the crystals that should be shown in the summary, e.g. reduced cell parameters.
...    Further arguments. Currently ignored.
Several types of summary are calculated and printed to the screen. If type equals "unit", a summary is given of all objects mapped to that unit. In case type equals "object", the summary indicates the unit to which the object is mapped and shows other objects mapped to it. For a type of "smoothness", the function returns the ratio of WCC values between neighbouring and non-neighbouring units. Values larger than 1 show that neighbouring units are "more alike" than non-neighbouring units. Type "quality" does something similar: it compares all data objects and takes the ratio of WCCs between objects mapped to the same unit and objects mapped to different units.
data(cepha)
gr <- somgrid(3,3, "hexagonal")
set.seed(7)
x <- wccsom(cepha$patterns, grid=gr, trwidth=20, rlen=100)
summary(x, "unit", nr=1, labels=paste(cepha$class.names, 1:20, sep=""))
summary(x, "object", nr=1, labels=paste(cepha$class.names, 1:20, sep=""))
summary(x, type = "smoothness")
summary(x, "quality")
Function to calculate distances between units in a SOM
Function calculates Euclidean distances between units in a SOM; if argument ’"toroidal"’ is TRUE, the edges of the map are considered to be joined so that the overall shape of the map is a torus. The distances are calculated correspondingly.
For toroidal maps, equal to TRUE. Default is FALSE.
gr <- somgrid(3, 3, "hexagonal")
x <- list(grid = gr)
class(x) <- "wccsom"
par(mfrow = c(1,2))
unit.dists <- unit.distances(gr, toroidal = FALSE)
plot(x, type = "property", property = unit.dists[1,],
     main = "Distances to unit 1", zlim = c(0,2.75), contin = TRUE)
unit.dists <- unit.distances(gr, toroidal = TRUE)
plot(x, type = "property", property = unit.dists[1,],
     main = "Toroidal distances to unit 1", zlim = c(0,2.75), contin = TRUE)
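The effect of joining the edges can be sketched for a rectangular grid in base R. The function grid_dist_sketch is hypothetical and deliberately simpler than unit.distances: it assumes unit spacing on a rectangular layout and does not reproduce the row offsets of hexagonal grids.

```r
# Hypothetical sketch for a rectangular grid with unit spacing.
# On a torus, the separation along each axis is the shorter of the direct
# route and the route that wraps around the edge.
grid_dist_sketch <- function(xdim, ydim, toroidal = FALSE) {
  pts <- expand.grid(x = seq_len(xdim), y = seq_len(ydim))
  dx <- abs(outer(pts$x, pts$x, "-"))
  dy <- abs(outer(pts$y, pts$y, "-"))
  if (toroidal) {
    dx <- pmin(dx, xdim - dx)
    dy <- pmin(dy, ydim - dy)
  }
  sqrt(dx^2 + dy^2)
}

flat <- grid_dist_sketch(3, 3)
torus <- grid_dist_sketch(3, 3, toroidal = TRUE)
flat[1, 3]    # units at opposite ends of the first row
torus[1, 3]   # the same pair becomes adjacent once the edges are joined
```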
Agreement between patterns including peak shifts
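Weighted cross correlation and autocorrelation, as described in De Gelder et al. (2001), for assessing similarities in spectra-like data containing peak shifts. Euclidean distances are of little use in this situation, because even a small shift of a sharp peak makes two otherwise identical patterns look very different.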
wcc(pattern1, pattern2, trwdth, wghts, acors)
wac(pattern1, trwdth, wghts)
wacmat(patterns, trwdth, wghts, do.transpose = TRUE)
patterns    Pattern matrix: rows correspond with patterns.
trwdth    Triangle width, given in the number of data points.
wghts    Optional weights vector, will be calculated from triangle width if necessary. Sometimes it is more efficient to pre-calculate it and give it as an argument.
acors    Autocorrelation, also optional to speed up calculations.
do.transpose    Internally, columns should correspond with patterns, so normally one should leave this value at its default: TRUE. If a matrix is already in the correct format, one can avoid unnecessary double transpose operations.
Function wcc returns the WCC value, a similarity value between 0 and 1. Functions wac and wacmat return weighted autocorrelations for one pattern and a matrix of patterns, respectively.
R. de Gelder, R. Wehrens, and J.A. Hageman. A generalized expression for the similarity of spectra: application to powder diffraction pattern classification. J. Comput. Chem., 22(3), 273-289, 2001.
data(cepha)
wac(cepha$patterns[1,], 20)
wacmat(t(cepha$patterns), 20)
wcc(cepha$patterns[1,], cepha$patterns[2,], 20)
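The idea behind the weighted cross correlation can be sketched in base R. This is an illustration under stated assumptions, not the package implementation: wcc_sketch is a hypothetical name, and the triangle weight 1 - |lag| / trwdth (zero from lag trwdth onwards) is assumed to match the weighting the package uses.

```r
# Hypothetical sketch: cross products are summed over nearby lags with
# triangular weights, then normalised by the weighted autocorrelations.
wcc_sketch <- function(p1, p2, trwdth) {
  lag <- abs(outer(seq_along(p1), seq_along(p2), "-"))
  W <- pmax(1 - lag / trwdth, 0)        # triangle weights per pair of points
  cross <- function(a, b) drop(a %*% W %*% b)
  cross(p1, p2) / sqrt(cross(p1, p1) * cross(p2, p2))
}

# Two single-peak patterns whose peaks are two data points apart.
f <- numeric(30); f[10] <- 1
g <- numeric(30); g[12] <- 1

plain <- sum(f * g) / sqrt(sum(f^2) * sum(g^2))  # no overlap at all
shifted <- wcc_sketch(f, g, trwdth = 5)          # still clearly similar
same <- wcc_sketch(f, f, trwdth = 5)             # identical patterns
```

The unweighted correlation treats the two patterns as unrelated, whereas the weighted version credits the shifted peak in proportion to how close it lies.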
Assign patterns to nodes in a SOM network by WCC value
KNN assignment of patterns to units in a Kohonen map, with maximal WCC as the criterion.
Unit numbers to which rows in the data matrix are assigned
wcc value of rows in the data matrix and the units to which they are assigned.
data(cepha)
gr <- somgrid(3,3, "hexagonal")
set.seed(7)
## further wccsom arguments are truncated in the source
x <- wccsom(cepha$patterns, grid = gr, trwidth = 20, rlen = 50)
sombins <- wccassign(x, cepha$patterns)
Map one powder pattern to a trained Kohonen map
Function calculates the agreement, as measured by WCC, of a powder pattern to all units in a network (SOM or XYF).
Kohonen network, either from wccsom or wccxyf
Returns a vector of length equal to the number of units in the network, containing all WCC values, i.e. similarities of the new pattern to every unit.
R. Wehrens, W.J. Melssen, L.M.C. Buydens and R. de Gelder. Representing Structural Databases in a Self-Organising Map. Acta Cryst. B61, 548-557, 2005.
data(cepha)
gr <- somgrid(3,3, "hexagonal")
set.seed(7)
x <- wccsom(cepha$patterns, grid=gr, trwidth=20, rlen=100)
wccs1 <- wccmap(x, cepha$patterns[1,])
par(mfrow=c(1,2))
plot(x, "property", property = wccs1, main="Unit similarities to object 1")
plot(x, "property", property = wccs1,
     main="Unit similarities to object 1", zlim=c(0.96, 1))
Mapping spectra with self-organising maps
Self-organising maps for mapping high-dimensional spectra or patterns to 2D; instead of Euclidean distance, the weighted cross correlation (WCC) similarity measure is used. Modelled after the SOM function in package ’class’. wccsom takes ’continuous’ patterns, i.e. data points are equidistant.
wccsom(data, grid=somgrid(), rlen = 100, alpha = c(0.05, 0.01),
       radius = quantile(nhbrdist, 0.7), init, nhbrdist, trwidth = 20,
       toroidal = FALSE, FineTune = TRUE, keep.data = TRUE)
data    Spectra or patterns to be mapped: a matrix, with each row representing a compound.
grid    A grid for the representatives: see ’somgrid’.
rlen    the number of times the complete data set will be presented to the network.
alpha    a vector of two numbers indicating the amount of change. Default is to decline linearly from 0.05 to 0.01 over rlen updates.
radius    the initial radius of the neighbourhood to be used for each update: the decrease is exponential over rlen updates in such a way that after one-third of the updates only the winning unit is updated. The default is to start with a value that covers 2/3 of all units.
init    the initial representatives, represented as a matrix. If missing, chosen (without replacement) randomly from ’data’.
nhbrdist    optionally, the distance matrix for the units.
trwidth    width of the triangle function used in the WCC measure, given in the number of data points.
toroidal    if TRUE, then the edges of the map are joined. Note that in a toroidal hexagonal map, the number of rows must be even.
FineTune    apply kmeans for fine-tuning the codebook vectors.
keep.data    store training data and their mapping in the network.
an object of class ’"wccsom"’ with components
the grid, an object of class ’"somgrid"’.
vector of mean average deviations from code vectors
the triangle width used for the WCC measure
autocorrelations of the code vectors.
setting of parameter ’toroidal’.
setting of parameter ’FineTune’.
mapping of training data: a vector of unit numbers. Only if keep.data equals TRUE.
WCC values of all training data, compared to the best matching codebook vector. Only if keep.data equals TRUE.
WAC values for training data. Only if keep.data equals TRUE.
R. Wehrens, W.J. Melssen, L.M.C. Buydens and R. de Gelder. Representing Structural Databases in a Self-Organising Map. Acta Cryst. B61, 548-557, 2005.
data(cepha)
gr <- somgrid(3,3, "hexagonal")
set.seed(7)
x <- wccsom(cepha$patterns, grid=gr, trwidth=20, rlen=100)
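The training schedules described for alpha and radius can be sketched as follows. The linear decline of alpha follows the argument description directly; the radius formula is only one possible reading and rests on two assumptions: neighbouring units are one distance unit apart, and "only the winning unit is updated" means the radius has dropped below that spacing. The initial radius r0 is a made-up value.

```r
rlen <- 100

# Learning rate: linear decline from 0.05 to 0.01 over rlen updates.
alpha <- seq(0.05, 0.01, length.out = rlen)

# Neighbourhood radius: exponential decrease from a hypothetical start value.
# Assumption: the radius reaches 1 (the spacing between neighbouring units)
# after one-third of the updates, so later updates affect the winner only.
r0 <- 3
step <- seq_len(rlen) - 1
radius <- r0 * exp(-log(r0) * step / (rlen / 3))

winner_only <- which(radius < 1)[1]   # first update that touches only the winner
```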
Supervised mapping of spectra with self-organising maps
Supervised self-organising maps for mapping high-dimensional spectra or patterns to 2D; instead of Euclidean distance, the weighted cross correlation (WCC) similarity measure is used. Modelled after the SOM function in package ’class’. wccxyf takes ’continuous’ patterns, i.e. data points are equidistant.
At this point, no facilities are implemented for growing networks or k-means-like fine-tuning of the maps, such as in function wccsom.
wccxyf(data, Y, grid=somgrid(), rlen = 100, alpha = c(0.05, 0.01),
       radius = quantile(nhbrdist, 0.67), xweight = 0.5, trwidth = 20,
       toroidal = FALSE, keep.data = TRUE)
data    Spectra or patterns to be mapped: a matrix, with each row representing a compound.
Y    Property for each pattern, either a numerical vector or matrix, or a class matrix. In the latter case, the Tanimoto distance is used for Y; in all other cases (also for combinations of numerical and class properties) the Euclidean distance is used.
grid    A grid for the representatives: see ’somgrid’.
rlen    the number of times the complete data set will be presented to the network.
alpha    a vector of two numbers indicating the amount of change. Default is to decline linearly from 0.05 to 0.01 over rlen updates.
radius    the initial radius of the neighbourhood to be used for each update: the decrease is exponential over rlen updates in such a way that after one-third of the updates only the winning unit is updated. The default is to start with a value that covers 2/3 of all units.
xweight    weight of X matrix in determining the distances of objects to units.
trwidth    width of the triangle function used in the WCC measure, given in the number of data points.
toroidal    if TRUE, then the edges of the map are joined. Note that in a toroidal hexagonal map, the number of rows must be even.
keep.data    store training data and their mapping in the network.
an object of class ’"wccsom"’ with components
the grid, an object of class ’"somgrid"’.
vector of mean average deviations from code vectors
the triangle width used for the WCC measure
autocorrelations of the code vectors.
setting of parameter ’toroidal’.
setting of parameter ’FineTune’.
mapping of training data: a vector of unit numbers. Only if keep.data equals TRUE.
WCC values of all training data, compared to the best matching codebook vector. Only if keep.data equals TRUE.
WAC values for training data. Only if keep.data equals TRUE.
FIXME: this page is a copy of wccsom, should be edited further
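data(degelder)
gr <- somgrid(5, 5, "hexagonal")
set.seed(7)
## arguments after the property vector are truncated in the source; grid = gr is assumed
x <- wccxyf(degelder$patterns, degelder$properties[,"cell.vol"], grid = gr)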