SYN-TAX 5.02 PC DOS version

Computer program for multivariate data analysis

For research and education


by

J. Podani
Department of Plant Taxonomy and Ecology, Eotvos University, H-1083 Ludovika ter, Budapest, Hungary. Fax: +36 1 1338 764 (from October 30, 1996: +36 1 3338 764). Email: PODANI@LUDENS.ELTE.HU

0. MAIN FEATURES

A unique user-friendly interactive environment with pictorial and text menus to facilitate choice among methods. Parameter windows to specify details of the analysis. On-line help. Easy access to DOS, Utilities and Graphics from any part of the interactive session. The programs can also be run in batch mode as more than 50 stand-alone applications.

DEMO version

You may copy a User Interface DEMO program, distributed as a self-extracting STD.EXE file contained in a ZIP file. After unzipping, run STD.EXE to obtain two files: STDEMO.EXE and STDEMO.OVR. Then run STDEMO, with STDEMO.OVR in the same folder, to see the menu system of SYN-TAX.

NEW: Version 5.1, released on August 17, 1997, offers additional features.

1. CLASSIFICATION

1.A Hierarchical clustering.

Agglomerative methods: Single and complete link, centroid and median, beta flexible, simple and group average (WPGMA and UPGMA), minimization of variance, sum of squares or average within-cluster distance in new clusters, minimization of the increase of variance, sum of squares or average within-cluster distance. Minimizing the ratio of within- and between-cluster average distances and dissimilarities. Agglomeration using 4 information statistics. Treatment of ties in the input matrix.
Coefficients: Yule, Jaccard, Sorensen, Ochiai, Anderberg I-II, Simple matching coefficient, Russell-Rao, Rogers-Tanimoto, Kulczynski symmetric, Sokal-Sneath, PHI, Baroni-Urbani-Buser I-II. Correlation, Percentage dissimilarity, Ruzicka, Similarity Ratio, City block metric, Mean character difference, Canberra metric, Euclidean distance, Mahalanobis generalized distance, Chord distance, Angular separation, Penrose size and shape, Balakrishnan-Sanghvi, Horn. User-supplied distance matrix. Gower general index for mixed data.
Run-time standardization and transformation: Range, Unit variance, Logarithmic, Power, Clymo, Arc sin.
Divisive methods: Monothetic divisions based on mutual information or information fall for variables.
Minimum spanning trees.
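
As an illustration of the presence/absence coefficients listed above, the following minimal Python sketch computes three of them (Jaccard, Sorensen and simple matching) from the usual 2x2 counts a, b, c, d. It is not SYN-TAX code; the function names are hypothetical, and returning 0.0 when the denominator is zero is an assumed convention.

```python
def binary_counts(x, y):
    """Return (a, b, c, d) for two presence/absence vectors of equal length.

    a = joint presences, b = present in x only,
    c = present in y only, d = joint absences.
    """
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c); joint absences are ignored."""
    a, b, c, _ = binary_counts(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0  # assumed convention for empty vectors


def sorensen(x, y):
    """Sorensen similarity: 2a / (2a + b + c); joint presences count double."""
    a, b, c, _ = binary_counts(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0  # assumed convention


def simple_matching(x, y):
    """Simple matching coefficient: (a + d) / (a + b + c + d)."""
    a, b, c, d = binary_counts(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0  # assumed convention
```

For example, for x = [1, 1, 0, 1, 0] and y = [1, 0, 0, 1, 1] the counts are a = 2, b = 1, c = 1, d = 1, so the Jaccard similarity is 2/4 = 0.5.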

1.B Non-hierarchical classification.

K-means clustering, multiple partitioning (hierarchical setup for k-means), minimizing the ratio of within- and between-cluster average distances and dissimilarities to obtain partitions using the coefficients listed above. Quick clustering of very large data sets, using most of the coefficients listed above. Fuzzy c-means clustering.

2. ORDINATION

Principal components analysis based on cross products, covariances or correlations. PCA biplots. Percentage contributions.
Canonical correlation analysis. Variances and redundancy. Interset and intraset correlations. Bartlett tests. Scatterplots for various combinations of axes. Correspondence analysis with symmetric and asymmetric weighting of rows and columns. COA Joint plots.
Principal coordinates analysis (metric multidimensional scaling) and nonmetric multidimensional scaling with the same coefficients as agglomerative clustering. Shepard diagrams. Flexible shortest path adjustment of input distances to diminish arch effect. Scatterplots. Canonical variates analysis. Discriminant functions, spherized or non-spherized scores, Bartlett tests, biplots visualizing correlation between variables and functions, isodensity and confidence circles. Eigenanalysis of symmetric matrices.

3. MATRIX REARRANGEMENTS

Optimization of the block structure of data matrices based on chi-square, sum of squares or entropy (block clustering). Seriation (diagonalization) of data and distance matrices to maximize diagonal structure. Row and column contributions. Analysis of concentration.

4. CHARACTER RANKING

Ordering of variables based on cross products, sum of squares, variances or information statistics. Elimination of residuals or simple ranking. Saving of reordered matrices.

5. EVALUATION AND COMPARISON OF RESULTS

Pairwise and multiple comparison of partitions (12 coefficients), fuzzy partitions, dendrograms (16 coefficients). Procrustes analysis of ordinations. Mantel tests for the comparison of distance matrices. Cophenetic correlation to compare dendrograms and distance matrices. Hierarchical, nonhierarchical and fuzzy consensus of partitions. Generalized Procrustes analysis for consensus ordinations.
Simulation of the distribution of coefficients of partition agreement, dendrogram dissimilarity and Procrustes distance.
Evaluation of the importance of variables in hierarchical and non-hierarchical clustering. Finding the optimum number of clusters in dendrograms based on dissimilarity-compatible ranking methods. Probability ellipses to enhance ordination interpretability.
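
To make the idea of a Mantel test for comparing distance matrices concrete, here is a minimal Python sketch of a simple one-sided permutation version. It is an illustration under stated assumptions, not necessarily the variant implemented in SYN-TAX: the statistic is the Pearson correlation between the upper triangles of two symmetric matrices, and the names `upper_triangle`, `pearson` and `mantel` are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(m, order=None):
    """List the above-diagonal entries of square matrix m.

    If `order` is given, rows and columns are permuted together by it.
    """
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(u, v):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value).

    Objects of d1 are relabelled at random; d2 is kept fixed.
    """
    rng = random.Random(seed)
    v2 = upper_triangle(d2)
    observed = pearson(upper_triangle(d1), v2)
    order = list(range(len(d1)))
    hits = 1  # the observed arrangement is counted as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        # small tolerance guards against floating-point rounding in ties
        if pearson(upper_triangle(d1, order), v2) >= observed - 1e-12:
            hits += 1
    return observed, hits / (permutations + 1)
```

Rows and columns are permuted together so that each random matrix is still a valid distance matrix on relabelled objects; shuffling individual entries independently would not preserve that structure.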

6. ANALYSIS OF MULTISPECIES POINT PATTERNS

Simulated sampling based on digitized coordinates of individuals (user-specified plot size, shape, arrangement and sample size). Expectation of inter-plot similarities to evaluate scale of point pattern. Information theory measures of scale dependence in point patterns.

7. GRAPHICS

Data analysis routines automatically display graphical results, which can be saved in TIF or PCX formats, or printed on HP LaserJet printers, EPSON 24-pin dot matrix printers and compatibles. Graphics data may be saved for future visualization by separate graphics routines. Main features include: dendrograms with and without labels, minimum spanning tree topologies, scattergrams for objects, biplots and joint plots, rotating plots (to create an illusion of a 3D ordination of grouped or ungrouped objects), canonical variates scatterplots indicating group memberships and isodensity circles, superposition of probability ellipses, partition convex hulls, minimum spanning trees over ordination scattergrams, Shepard diagrams and graphical matrix comparisons, point patterns with or without species labels, histograms, line diagrams.

8. UTILITIES

Permanent data standardization and transformation: range, unit variance, unit length, logarithmic, power, Clymo, arc sin, row or column total, row or column maximum, centring, double centring, binarization. Built-in full screen file editor with mouse support. Simple stepwise entering of data, distance or other special files (dendrogram merge matrices, partition cluster membership files, etc).
Listview (file browser) program. Conversion of distance matrix formats. Conversion of Cornell data formats. Transposing matrices.
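
Two of the standardizations named among the utilities above, range and unit variance, can be sketched in a few lines of Python. This is only an illustration: the exact definitions used by SYN-TAX are not given here, so the sketch assumes range standardization rescales each variable to [0, 1] and unit variance standardization centres each variable and divides by its sample standard deviation. All function names are hypothetical.

```python
from math import sqrt


def range_standardize(values):
    """Rescale values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # assumed convention for constant variables
    return [(x - lo) / (hi - lo) for x in values]


def unit_variance(values):
    """Centre values and divide by the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if sd == 0:
        return [0.0] * n  # assumed convention for constant variables
    return [(x - mean) / sd for x in values]


def standardize_columns(matrix, method):
    """Apply `method` to every column of a row-major data matrix."""
    columns = [method(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]
```

For instance, the variable [2, 4, 6] becomes [0.0, 0.5, 1.0] under range standardization and [-1.0, 0.0, 1.0] under unit variance standardization.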

System requirements: IBM PC or compatible with DOS 3.0 or higher. Icons provided for Windows 3.1 support. CGA, Hercules, EGA or VGA monitor for graphics. Math coprocessor not required but used if present. Min. 520 kbytes of free RAM, 300 kbytes of EMS for the interface. Support of HP LaserJet and Epson 24-pin dot matrix printers and compatibles.

Cost: 200 USD (educational), 150 USD (private), 300 USD (site license with 5 manuals), 100 USD (upgrade from ver. 4.0 or older), 300 USD (PC and MAC versions together).

Packlist: Two 3.5-inch diskettes with programs and sample data in the form of self-extracting compressed files, and a 104-page User's Manual.

Further reference (not included)
Podani, J. 1994. Multivariate Data Analysis in Ecology and Systematics. A methodological guide to the SYN-TAX 5.0 package. SPB Publishing, The Hague, The Netherlands.

This page last updated on Sept. 30, 1996


Thanks for your interest in SYN-TAX!