SYN-TAX 5.02 Macintosh version
Computer program for multivariate data analysis
For research and education
by
J. Podani
Department of Plant Taxonomy and Ecology, Eotvos
University, H-1083 Ludovika ter, Budapest, Hungary.
Fax: +36 1 1338 764 (From October 30, 1996: +36 1 3338
764). Email:
PODANI@LUDENS.ELTE.HU
0. MAIN FEATURES
A package of six independent applications, each including
a certain set of methods of data exploration (according
to sections 1-6 below). The structure of the applications
follows the Mac standard (menu bar; File, Edit and Fonts
standard menus; the program-specific menus Options,
Graphics, Utilities and Settings; standard file and print
dialogs, alerts, etc.). The output window is used by the
simple text editor. The programs use regular ASCII data
files.
1. HIERCLUS: HIERARCHICAL CLUSTERING.
Agglomerative methods: Single and complete link, centroid
and median, beta flexible, simple and group average
(WPGMA and UPGMA), minimization of variance, sum of
squares or average within cluster distance in new
clusters, minimization of the increase of variance, sum
of squares or average within cluster distance. Minimizing
the ratio of within- and between-cluster average
distances and dissimilarities. Agglomeration using 4
information statistics. Treatment of ties in the input
matrix.
Coefficients: Yule, Jaccard, Sorensen, Ochiai, Anderberg
I-II, Simple matching coefficient, Russell-Rao,
Rogers-Tanimoto, Kulczynski symmetric, Sokal-Sneath, PHI,
Baroni-Urbani-Buser I-II. Correlation, Percentage
dissimilarity, Ruzicka, Similarity Ratio, City block
metric, Mean character difference, Canberra metric,
Euclidean distance, Mahalanobis generalized distance,
Chord distance, Angular separation, Penrose size and
shape, Balakrishnan-Sanghvi, Horn. User-supplied distance
matrix. Gower general index for mixed data.
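
As an illustration of how a few of the presence/absence coefficients
listed above are usually defined, here is a minimal Python sketch. It is
not taken from SYN-TAX; the function names and the 2x2 count notation
(a, b, c, d) are assumptions made for this example, and the package may
use different conventions (e.g. for two all-absent objects).

```python
def binary_counts(x, y):
    """Return (a, b, c, d) for two 0/1 vectors:
    a = joint presences, b = present in x only,
    c = present in y only, d = joint absences."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c). Joint absences are ignored."""
    a, b, c, _ = binary_counts(x, y)
    total = a + b + c
    return a / total if total else 0.0  # assumption: 0 when nothing is present


def sorensen(x, y):
    """Sorensen similarity: 2a / (2a + b + c)."""
    a, b, c, _ = binary_counts(x, y)
    total = 2 * a + b + c
    return 2 * a / total if total else 0.0  # same assumption as above


def simple_matching(x, y):
    """Simple matching coefficient: (a + d) / (a + b + c + d)."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)
```

The first two functions disregard joint absences, whereas the simple
matching coefficient counts them as agreement; this is the main
practical difference when choosing among them.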
Run-time standardization and transformation: Range, Unit
variance, Logarithmic, Power, Clymo, Arc sin.
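
The first two options can be sketched for a single variable as follows.
This is an illustrative Python example, not the package's own code: the
function names are hypothetical, and the unit-variance version shown
here also centres the variable (the z-score form) and uses the sample
standard deviation, which is an assumption about the exact definition.

```python
from statistics import mean, stdev


def standardize_range(values):
    """Rescale a variable to the interval [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize_unit_variance(values):
    """Centre a variable and divide by its sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # requires at least two values
    if s == 0:
        # Assumption: a constant variable is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]
```

Either function would be applied to each variable (column) in turn
before the distance or similarity matrix is computed.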
Divisive methods: Monothetic divisions based on mutual
information or information fall for variables.
Minimum spanning trees.
2. NONHIER: PARTITIONS.
K-means clustering, multiple partitioning (hierarchical
setup for k-means), minimizing the ratio of within- and
between-cluster average distances and dissimilarities to
obtain partitions using the coefficients listed above.
Quick clustering of very large data sets, using most of
the coefficients listed above.
Fuzzy c-means clustering.
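
For readers unfamiliar with k-means, the basic iteration can be sketched
in Python as below. This is a generic Lloyd-style procedure with squared
Euclidean distance and user-supplied starting centroids; it is an
assumption-laden illustration, not the algorithm variant implemented in
NONHIER, and the names `kmeans` and `sq_dist` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving.

    points    -- list of equal-length tuples of numbers
    centroids -- list of starting centroids (one per cluster)
    Returns (labels, centroids).
    """
    centroids = [tuple(float(v) for v in c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

The result depends on the starting centroids, which is one reason why
repeated runs or a hierarchical setup are often used in practice.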
3. ORDIN: ORDINATION.
Principal components analysis based on cross products,
covariances or correlations. PCA biplots. Percentage
contributions.
Canonical correlation analysis. Variances and redundancy.
Interset and intraset correlations. Bartlett tests.
Scatterplots for various combinations of axes.
Correspondence analysis with symmetric and asymmetric
weighting of rows and columns. COA Joint plots.
Principal coordinates analysis (metric multidimensional
scaling) and nonmetric multidimensional scaling with the
same coefficients as agglomerative clustering. Shepard
diagrams. Flexible shortest path adjustment of input
distances to diminish arch effect. Scatterplots.
Canonical variates analysis. Discriminant functions,
spherized or non-spherized scores, Bartlett tests,
biplots visualizing correlation between variables and
functions, isodensity and confidence circles.
Eigenanalysis of symmetric matrices.
Scree graphs showing relative magnitude of eigenvalues.
4. MATRANK: MATRIX REARRANGEMENTS AND CHARACTER RANKING
Optimization of the block structure of data matrices
based on chi-square, sum of squares or entropy (block
clustering). Seriation (diagonalization) of data and
distance matrices to maximize diagonal structure. Row and
column contributions. Analysis of concentration.
Ordering of variables based on cross products, sum of
squares, variances or information statistics. Elimination
of residuals or simple ranking. Saving of reordered
matrices.
5. EVAL: EVALUATION AND COMPARISON OF RESULTS
Pairwise and multiple comparison of partitions (12
coefficients), fuzzy partitions, dendrograms (16
coefficients). Procrustes analysis of ordinations. Mantel
tests for the comparison of distance matrices. Cophenetic
correlation to compare dendrograms and distance matrices.
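
The idea behind a Mantel test can be sketched in Python as follows. This
is a generic, one-sided permutation version based on the Pearson
correlation of the off-diagonal elements; it is offered only as an
illustration under these assumptions and does not reproduce the EVAL
implementation. All function and parameter names are hypothetical.

```python
import random
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def upper_triangle(d, order):
    """Upper-triangle entries of square matrix d after reordering objects."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r_obs, p) for two symmetric distance matrices.

    Rows and columns of d2 are permuted together; p is the share of
    permutations (observed arrangement included) with r >= r_obs.
    """
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

The same correlation, computed between an input distance matrix and the
matrix of cophenetic distances read from a dendrogram, gives the
cophenetic correlation; only the source of the second matrix differs.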
Hierarchical, nonhierarchical and fuzzy consensus of
partitions. Generalized Procrustes analysis for consensus
ordinations.
Simulation of the distribution of coefficients of
partition agreement, dendrogram dissimilarity and
Procrustes distance.
Evaluation of the importance of variables in hierarchical
and non-hierarchical clustering. Finding the optimum
number of clusters in dendrograms based on dissimilarity-
compatible ranking methods. Probability ellipses to
enhance ordination interpretability.
6. MULPATT: ANALYSIS OF MULTISPECIES POINT PATTERNS
Simulated sampling based on digitized coordinates of
individuals (user-specified plot size, shape, arrangement
and sample size). Expectation of inter-plot similarities
to evaluate scale of point pattern. Information theory
measures of scale dependence in point patterns.
7. GRAPHICS
Graphical results are displayed automatically, each
diagram in a separate, resizable window. The drawing can
be printed, saved to a file or copied to the clipboard
for use by other programs (e.g. word processors).
Graphics data may be saved for future visualization. Font
type and size, and the colors in graphs may be specified
by the user. Main features available in modules wherever
appropriate: dendrograms with and without labels, minimum
spanning tree topologies, scattergrams for objects,
biplots and joint plots, ternary plots for 3-group fuzzy
clustering results, canonical variates scatterplots
indicating group memberships and isodensity circles,
superposition of probability ellipses, partition convex
hulls, minimum spanning trees over ordination
scattergrams, scree graphs, Shepard diagrams and
graphical matrix comparisons, shaded data or distance
matrices, point patterns with or without species labels,
histograms, line diagrams.
8. UTILITIES
The options below are available in modules wherever
appropriate.
Permanent data standardization and transformation: range,
unit variance, unit length, logarithmic, power, Clymo,
arc sin, row or column total, row or column maximum,
centring, double centring, binarization.
Simple stepwise entering of data, distance or other
special files (dendrogram merge matrices, partition
cluster membership files, etc).
Conversion of distance matrix formats.
Transposing matrices.
System requirements: Mac OS 6.0 or higher. The program
runs on all Macs. A color monitor is preferred. At least
2.5 MB of RAM is required.
Cost: 200 USD (educational), 150 USD (private), 300 USD
(site license with 5 manuals), 100 USD (upgrade from ver.
4.0 or older), 300 USD (PC and MAC versions together).
Package contents: One 3.5 in diskette with programs and
sample data in the form of self-extracting compressed
files, and a 91-page User's Manual describing technical
details.
Further reference (not included)
Podani, J. 1994. Multivariate Data Analysis in Ecology
and Systematics. A methodological guide to the SYN-TAX
5.0 package. SPB Publishing, The Hague, The Netherlands.
This page last updated on Oct. 1, 1996
Thanks for your interest in SYN-TAX!