by

J. Podani

Department of Plant Taxonomy and Ecology, Eotvos University, H-1083 Ludovika
ter, Budapest, Hungary Fax: +36 1 1338 764 (From October 30, 1996: +36 1
3338 764). *Email:* PODANI@LUDENS.ELTE.HU

A unique user-friendly interactive environment with pictorial and text menus to facilitate choice among methods. Parameter windows to specify details of the analysis. On-line help. Easy access to DOS, Utilities and Graphics from any part of the interactive session. The programs can also be run in batch mode as more than 50 stand-alone applications.

You may copy a User Interface DEMO program which is a self-extractable STD.EXE file contained in a ZIP file. After unzipping, you will have to run STD.EXE to obtain two files: STDEMO.EXE and STDEMO.OVR. Just execute STDEMO in the presence of STDEMO.OVR in the same folder to see the menu system of SYN-TAX.

*NEW:* Additional features are available in Version 5.1, released on August 17, 1997.

**1. CLASSIFICATION**

**1.A Hierarchical clustering.**

*Agglomerative methods:* Single and complete link, centroid and median, beta flexible, simple and
group average (WPGMA and UPGMA), minimization of variance, sum of squares
or average within cluster distance in new clusters, minimization of the
increase of variance, sum of squares or average within cluster distance.
Minimizing the ratio of within- and between-cluster average distances and
dissimilarities. Agglomeration using 4 information statistics. Treatment
of ties in the input matrix.

*Coefficients:* Yule, Jaccard, Sorensen, Ochiai, Anderberg I-II, Simple matching coefficient,
Russell-Rao, Rogers-Tanimoto, Kulczynski symmetric, Sokal-Sneath, PHI,
Baroni-Urbani-Buser I-II. Correlation, Percentage dissimilarity, Ruzicka,
Similarity Ratio, City block metric, Mean character difference, Canberra
metric, Euclidean distance, Mahalanobis generalized distance, Chord distance,
Angular separation, Penrose size and shape, Balakrishnan-Sanghvi, Horn.
User supplied distance matrix. Gower general index for mixed data.
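
As an illustration of how two of the listed presence/absence coefficients are usually defined, here is a minimal Python sketch of the Jaccard and Sorensen indices. It is not SYN-TAX code; the function names, the sample plots and the choice to return 0.0 when both objects are empty are assumptions made for this example.

```python
def binary_counts(x, y):
    """Return (a, b, c): joint presences, presences only in x, presences only in y."""
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if q and not p)
    return a, b, c


def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c)."""
    a, b, c = binary_counts(x, y)
    total = a + b + c
    # Assumption: two empty objects get similarity 0.0 (the index is undefined there).
    return a / total if total else 0.0


def sorensen(x, y):
    """Sorensen similarity: 2a / (2a + b + c)."""
    a, b, c = binary_counts(x, y)
    total = 2 * a + b + c
    return 2 * a / total if total else 0.0


# Hypothetical presence/absence records of five species in two plots.
plot1 = [1, 1, 0, 1, 0]
plot2 = [1, 0, 0, 1, 1]
print(jaccard(plot1, plot2))   # 2 shared species out of 4 present in either plot
print(sorensen(plot1, plot2))
```

Sorensen gives joint presences double weight, so it is never smaller than Jaccard for the same pair of objects.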

*Run time standardization and transformation:* Range, Unit variance, Logarithmic, Power, Clymo, Arc sin.
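
The first two options can be sketched for a single variable as follows. This is only an illustration: the function names are hypothetical, and centring on the mean and using the sample standard deviation (n - 1) are assumptions, since the page does not say which conventions the package follows.

```python
from statistics import mean, stdev


def standardize_range(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize_unit_variance(values):
    """Centre on the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: n - 1 in the denominator
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


abundances = [2, 4, 6]  # hypothetical scores of one variable in three objects
print(standardize_range(abundances))
print(standardize_unit_variance(abundances))
```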

*Divisive methods:* Monothetic divisions based on mutual information or information fall for
variables.

Minimum spanning trees.
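
A minimum spanning tree connects all objects with the smallest possible total edge length. The sketch below builds one from a symmetric distance matrix with Prim's algorithm; the algorithm choice, the function name and the sample matrix are assumptions for illustration and do not describe the SYN-TAX implementation.

```python
def minimum_spanning_tree(dist):
    """Return MST edges (i, j, distance) for a symmetric distance matrix (Prim's algorithm)."""
    n = len(dist)
    in_tree = [False] * n
    in_tree[0] = True  # start growing the tree from object 0
    # best[k] = (shortest distance from k to the tree, tree node realizing it)
    best = [(dist[0][k], 0) for k in range(n)]
    edges = []
    for _ in range(n - 1):
        # Pick the object outside the tree that is closest to it.
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        d, i = best[j]
        edges.append((i, j, d))
        in_tree[j] = True
        # The newly added object may offer shorter links to the remaining ones.
        for k in range(n):
            if not in_tree[k] and dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return edges


# Hypothetical distances among four objects.
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
tree = minimum_spanning_tree(D)
print(tree)
```

With tied distances the tree is not unique; this sketch simply takes the first candidate it meets.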

**1.B Non-hierarchical classification.**

K-means clustering, multiple partitioning (hierarchical setup for k-means),
minimizing the ratio of within- and between-cluster average distances and
dissimilarities to obtain partitions using the coefficients listed above.
Quick clustering of very large data sets, using most of the coefficients
listed above. Fuzzy c-means clustering.

Principal components analysis based on cross products, covariances or correlations.
PCA biplots. Percentage contributions.

Canonical correlation analysis. Variances and redundancy. Interset and
intraset correlations. Bartlett tests. Scatterplots for various combinations
of axes. Correspondence analysis with symmetric and asymmetric weighting
of rows and columns. COA Joint plots.

Principal coordinates analysis (metric multidimensional scaling) and nonmetric
multidimensional scaling with the same coefficients as agglomerative clustering.
Shepard diagrams. Flexible shortest path adjustment of input distances to
diminish arch effect. Scatterplots. Canonical variates analysis. Discriminant
functions, spherized or non-spherized scores, Bartlett tests, biplots visualizing
correlation between variables and functions, isodensity and confidence circles.
Eigenanalysis of symmetric matrices.

Optimization of the block structure of data matrices based on chi-square, sum of squares or entropy (block clustering). Seriation (diagonalization) of data and distance matrices to maximize diagonal structure. Row and column contributions. Analysis of concentration.

Ordering of variables based on cross products, sum of squares, variances or information statistics. Elimination of residuals or simple ranking. Save of reordered matrices.

Pairwise and multiple comparison of partitions (12 coefficients), fuzzy
partitions, dendrograms (16 coefficients). Procrustes analysis of ordinations.
Mantel tests for the comparison of distance matrices. Cophenetic correlation
to compare dendrograms and distance matrices. Hierarchical, nonhierarchical
and fuzzy consensus of partitions. Generalized Procrustes analysis for consensus
ordinations.
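
To make the Mantel test concrete, here is a minimal permutation sketch in Python. It assumes the Pearson correlation between the upper triangles as the test statistic and a one-tailed test; the function names, the default of 999 permutations and the sample matrix are hypothetical, and the procedure used by SYN-TAX may differ.

```python
import random
from math import sqrt


def upper_triangle(m, order=None):
    """List the above-diagonal values of m, optionally after reordering its objects."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Product-moment correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (observed r, one-tailed permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    x = upper_triangle(d1)
    observed = pearson(x, upper_triangle(d2))
    n = len(d1)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel the objects of the second matrix
        if pearson(x, upper_triangle(d2, order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical distances among four objects, compared with themselves.
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
r, p = mantel(D, D)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary significance test for the correlation is not appropriate here.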

Simulation of the distribution of coefficients of partition agreement,
dendrogram dissimilarity and Procrustes distance.

Evaluation of the importance of variables in hierarchical and non-hierarchical
clustering. Finding the optimum number of clusters in dendrograms based
on dissimilarity-compatible ranking methods. Probability ellipses to enhance
ordination interpretability.

Simulated sampling based on digitized coordinates of individuals (user-specified plot size, shape, arrangement and sample size). Expectation of inter-plot similarities to evaluate scale of point pattern. Information theory measures of scale dependence in point patterns.

Data analysis routines automatically visualize graphics results, which can be saved in TIF or PCX formats, or printed on HP-laserjets, EPSON 24-pin dot matrix printers and compatibles. Graphics data may be saved for future visualization by separate graphics routines. Main features include: dendrograms with and without labels, minimum spanning tree topologies, scattergrams for objects, biplots and joint plots, rotating plots (to create an illusion of a 3D ordination of grouped or ungrouped objects), canonical variates scatterplots indicating group memberships and isodensity circles, superposition of probability ellipses, partition convex hulls, minimum spanning trees over ordination scattergrams, Shepard diagrams and graphical matrix comparisons, point patterns with or without species labels, histograms, line diagrams.

Permanent data standardization and transformation: range, unit variance,
unit length, logarithmic, power, Clymo, arc sin, row or column total, row
or column maximum, centring, double centring, binarization. Built-in full
screen file editor with mouse support. Simple stepwise entering of data,
distance or other special files (dendrogram merge matrices, partition cluster
membership files, etc).

Listview (file browser) program. Conversion of distance matrix formats.
Conversion of Cornell data formats. Transposing matrices.

**System requirements:** IBM PC or compatibles with DOS 3.0 or higher. Icons provided for WINDOWS
3.1 support. CGA, Hercules, EGA or VGA monitor for graphics. Math coprocessor
not required but used if present. Min. 520 kbytes of free RAM, 300 kbytes
of EMS for the interface. Support of HP Laserjets and Epson 24-pin dot matrix
printers and compatibles.

**Cost:** 200 USD (educational), 150 USD (private), 300 USD (site license with 5 manuals),
100 USD (upgrade from ver. 4.0 or older), 300 USD (PC and MAC versions together).

**Packlist:** Two 3.5 in diskettes with programs and sample data in the form of self-extracting
compressed files, a 104-page User's Manual.

**Further reference** (not included)

Podani, J. 1994. *Multivariate Data Analysis in Ecology and Systematics. A methodological
guide to the SYN-TAX 5.0 package.* SPB Publishing, The Hague, The Netherlands.

This page last updated on Sept. 30, 1996