SYN-TAX 5.02 Macintosh version


 

Computer program for multivariate data analysis


For research and education


by
 

J. Podani
Department of Plant Taxonomy and Ecology, Eotvos University, H-1083 Ludovika ter, Budapest, Hungary. Fax: +36 1 1338 764 (From October 30, 1996: +36 1 3338 764). Email: PODANI@LUDENS.ELTE.HU
 

0. MAIN FEATURES

A package of six independent applications, each covering a set of data exploration methods (see sections 1-6 below). The structure of the applications follows the Mac standard (menu bar; the standard File, Edit and Fonts menus; the program-specific menus Options, Graphics, Utilities and Settings; standard file and print dialogs; alerts; etc.). The output window is used by a simple text editor. The programs use regular ASCII data files.

1. HIERCLUS: HIERARCHICAL CLUSTERING.

Agglomerative methods: Single and complete link, centroid and median, beta flexible, simple and group average (WPGMA and UPGMA), minimization of variance, sum of squares or average within-cluster distance in new clusters, minimization of the increase of variance, sum of squares or average within-cluster distance. Minimizing the ratio of within- and between-cluster average distances and dissimilarities. Agglomeration using 4 information statistics. Treatment of ties in the input matrix.
Coefficients: Yule, Jaccard, Sorensen, Ochiai, Anderberg I-II, Simple matching coefficient, Russell-Rao, Rogers-Tanimoto, Kulczynski symmetric, Sokal-Sneath, PHI, Baroni-Urbani-Buser I-II. Correlation, Percentage dissimilarity, Ruzicka, Similarity Ratio, City block metric, Mean character difference, Canberra metric, Euclidean distance, Mahalanobis generalized distance, Chord distance, Angular separation, Penrose size and shape, Balakrishnan-Sanghvi, Horn. User supplied distance matrix. Gower general index for mixed data.
Run-time standardization and transformation: Range, Unit variance, Logarithmic, Power, Clymo, Arc sin.
Divisive methods: Monothetic divisions based on mutual information or information fall for variables.
Minimum spanning trees.
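
As an illustration of two of the binary coefficients listed above, the sketch below computes the Jaccard and Sorensen similarities for a pair of presence/absence vectors. It uses the usual textbook definitions and is not taken from the SYN-TAX code; the function names and the handling of two empty vectors are assumptions made for this example.

```python
def binary_counts(x, y):
    """Return (a, b, c): joint presences, presences only in x, only in y."""
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if q and not p)
    return a, b, c


def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c)."""
    a, b, c = binary_counts(x, y)
    total = a + b + c
    # Returning 0.0 for two empty vectors is a convention chosen for this sketch.
    return a / total if total else 0.0


def sorensen(x, y):
    """Sorensen similarity: 2a / (2a + b + c)."""
    a, b, c = binary_counts(x, y)
    total = 2 * a + b + c
    return 2 * a / total if total else 0.0


# Two hypothetical sample plots scored for five species.
plot_1 = [1, 1, 0, 1, 0]
plot_2 = [1, 0, 0, 1, 1]
print(jaccard(plot_1, plot_2))   # a=2, b=1, c=1, so 2/4 = 0.5
print(sorensen(plot_1, plot_2))  # 4/6
```

Joint absences do not enter either coefficient, which is why both differ from the simple matching coefficient.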

2. NONHIER: PARTITIONS.

K-means clustering, multiple partitioning (hierarchical setup for k-means), minimizing the ratio of within- and between-cluster average distances and dissimilarities to obtain partitions using the coefficients listed above. Quick clustering of very large data sets, using most of the coefficients listed above. Fuzzy c-means clustering.
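
For readers unfamiliar with k-means, the following is a minimal sketch of the standard alternating assignment/update procedure with Euclidean distance. It is a generic illustration, not the NONHIER implementation; the function name, the requirement that initial centers be supplied by the caller, and the iteration limit are all assumptions.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain k-means: assign each point to its nearest center, then
    move each center to the mean of its members, until labels stop changing."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the partition has converged
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical data: two well-separated groups of three objects each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(data, centers=[(0, 0), (10, 10)])
print(labels)
print(centers)
```

The result depends on the initial centers, which is one reason a program may offer repeated or hierarchically initialized runs.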

3. ORDIN: ORDINATION.

Principal components analysis based on cross products, covariances or correlations. PCA biplots. Percentage contributions.
Canonical correlation analysis. Variances and redundancy. Interset and intraset correlations. Bartlett tests. Scatterplots for various combinations of axes. Correspondence analysis with symmetric and asymmetric weighting of rows and columns. COA joint plots. Principal coordinates analysis (metric multidimensional scaling) and nonmetric multidimensional scaling with the same coefficients as agglomerative clustering. Shepard diagrams. Flexible shortest path adjustment of input distances to diminish the arch effect. Scatterplots. Canonical variates analysis. Discriminant functions, spherized or non-spherized scores, Bartlett tests, biplots visualizing correlation between variables and functions, isodensity and confidence circles. Eigenanalysis of symmetric matrices. Scree graphs showing relative magnitude of eigenvalues.

4. MATRANK: MATRIX REARRANGEMENTS and CHARACTER RANKING

Optimization of the block structure of data matrices based on chi-square, sum of squares or entropy (block clustering). Seriation (diagonalization) of data and distance matrices to maximize diagonal structure. Row and column contributions. Analysis of concentration.
Ordering of variables based on cross products, sum of squares, variances or information statistics. Elimination of residuals or simple ranking. Saving of reordered matrices.

5. EVAL: EVALUATION AND COMPARISON OF RESULTS

Pairwise and multiple comparison of partitions (12 coefficients), fuzzy partitions, dendrograms (16 coefficients). Procrustes analysis of ordinations. Mantel tests for the comparison of distance matrices. Cophenetic correlation to compare dendrograms and distance matrices. Hierarchical, nonhierarchical and fuzzy consensus of partitions. Generalized Procrustes analysis for consensus ordinations.
Simulation of the distribution of coefficients of partition agreement, dendrogram dissimilarity and Procrustes distance.
Evaluation of the importance of variables in hierarchical and non-hierarchical clustering. Finding the optimum number of clusters in dendrograms based on dissimilarity-compatible ranking methods. Probability ellipses to enhance ordination interpretability.
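
The idea behind a Mantel test can be sketched as follows: correlate the off-diagonal elements of two distance matrices, then permute the objects of one matrix to obtain a reference distribution. This is a generic illustration, not the EVAL implementation; the function names, the one-sided test, the default of 999 permutations and the fixed seed are assumptions.

```python
import math
import random


def upper_triangle(m):
    """Flatten the above-diagonal elements of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(u, v):
    """Product-moment correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): the matrix correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(d1)
    v2 = upper_triangle(d2)
    observed = pearson(upper_triangle(d1), v2)
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Rows and columns of d1 are permuted together, keeping it a distance matrix.
        permuted = [[d1[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), v2) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical example: distances among four points on a line, and the same doubled.
coords = [0, 1, 3, 7]
d_a = [[abs(p - q) for q in coords] for p in coords]
d_b = [[2 * abs(p - q) for q in coords] for p in coords]
r, p = mantel(d_a, d_b)
print(r, p)
```

Permuting whole objects rather than individual distances respects the dependence among the entries of a distance matrix.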

6. MULPATT: ANALYSIS OF MULTISPECIES POINT PATTERNS

Simulated sampling based on digitized coordinates of individuals (user-specified plot size, shape, arrangement and sample size). Expectation of inter-plot similarities to evaluate scale of point pattern. Information theory measures of scale dependence in point patterns.

7. GRAPHICS

Graphics results are automatically visualized, each diagram in a separate, resizable window. The drawing can be printed, saved to a file or copied to the clipboard for use by other programs (e.g. word processors).
Graphics data may be saved for future visualization. Font type and size, and the colors in graphs may be specified by the user. Main features available in modules wherever appropriate: dendrograms with and without labels, minimum spanning tree topologies, scattergrams for objects, biplots and joint plots, ternary plots for 3-group fuzzy clustering results, canonical variates scatterplots indicating group memberships and isodensity circles, superposition of probability ellipses, partition convex hulls, minimum spanning trees over ordination scattergrams, scree graphs, Shepard diagrams and graphical matrix comparisons, shaded data or distance matrices, point patterns with or without species labels, histograms, line diagrams.

8. UTILITIES

The options below are available in modules wherever appropriate.
Permanent data standardization and transformation: range, unit variance, unit length, logarithmic, power, Clymo, arc sin, row or column total, row or column maximum, centring, double centring, binarization. Simple stepwise entering of data, distance or other special files (dendrogram merge matrices, partition cluster membership files, etc.). Conversion of distance matrix formats. Transposing matrices.
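
Two of the standardizations named above can be sketched for a single variable as shown below. The formulas are the common textbook ones and may differ in detail from what the program does; in particular, centring the variable and using the sample standard deviation (n - 1 divisor) for unit variance are assumptions, as are the function names.

```python
import statistics


def range_standardize(values):
    """Rescale a variable to the interval [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant variable has zero range")
    return [(v - lo) / (hi - lo) for v in values]


def unit_variance(values):
    """Centre a variable and divide by its sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 divisor; an assumption of this sketch
    if sd == 0:
        raise ValueError("a constant variable has zero variance")
    return [(v - mean) / sd for v in values]


# Hypothetical variable measured on four objects.
scores = [2, 4, 6, 10]
print(range_standardize(scores))  # [0.0, 0.25, 0.5, 1.0]
print(unit_variance(scores))
```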

System requirements: Mac OS 6.0 or higher. The program runs on all Macs. A color monitor is preferred. At least 2.5 MB of RAM is required.

Cost: 200 USD (educational), 150 USD (private), 300 USD (site license with 5 manuals), 100 USD (upgrade from ver. 4.0 or older), 300 USD (PC and Mac versions together).

Packing list: One 3.5 in diskette with programs and sample data in the form of self-extracting compressed files, and a 91-page User's Manual describing technical details.

Further reference (not included)
Podani, J. 1994. Multivariate Data Analysis in Ecology and Systematics. A methodological guide to the SYN-TAX 5.0 package. SPB Publishing, The Hague, The Netherlands.

 

This page last updated on Oct. 1, 1996


Thanks for your interest in SYN-TAX !