ArrayMiner - FAQs

ArrayMiner FAQ

ArrayMiner 2 FAQ

Troubleshooting

ArrayMiner FAQs

Q1	Why should I buy another clustering method when many are already available in the public domain?
AM	Unlike classic clustering tools, ArrayMiner is a rigorous optimization tool: it searches for the best possible clusters instead of applying a simple algorithm that supplies a suboptimal classification. As a result, no important similarity between genes goes unnoticed, and no bogus cluster is produced.

Q2	Is there a demo version of ArrayMiner?
AM	Yes. You can request a copy here.

Q3	Are there differences between the PC and the Macintosh versions?
AM	The two versions share a large amount of code, and the algorithm is exactly the same. The current Macintosh version communicates with GeneSpring through files, because of operating system issues.

Q4	Is there a Mac OS 9 version of ArrayMiner?
AM	No. Because of the large body of cross-platform code shared between the Mac OS X and Windows versions, we are not planning to support Mac OS 9.

Q5	Is it possible to use ArrayMiner without GeneSpring?
AM	Yes, your data are easily imported and exported in popular file formats. That said, the seamless integration of ArrayMiner with GeneSpring gives you an integrated solution for exploiting your genomics/proteomics data.

ArrayMiner 2 FAQs

Q1	What are the advantages of the new Gaussian clustering model?
*AM2*	The Gaussian model takes into account the fact that different functional groups of genes may (and usually do) have a different spread (variance) in their observed expression profiles. Purely distance-based clustering tools (such as k-Means) cannot do this, because they compare distances only and do not model the spread of each cluster. Also unique to the Gaussian clustering model is its ability to identify outliers, i.e. expression profiles that cannot reasonably be clustered with other profiles.
In practice, this means that the Gaussian model is better able to identify the true structure of the data. This is illustrated by the fact that the clusters supplied by this model are typically stable whatever the requested number of clusters.
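The effect of cluster spread can be illustrated with a minimal sketch. This is not ArrayMiner's algorithm; it is a one-dimensional toy example in which the cluster names, the means and standard deviations, and the outlier threshold `min_density` are all assumptions chosen for illustration. A point that lies closer to the mean of a tight cluster can still be far better explained by a broad cluster, and a point that no cluster explains is flagged as an outlier.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def nearest_centroid(x, clusters):
    """Distance-only rule (as in k-Means): pick the cluster with the closest mean."""
    return min(clusters, key=lambda name: abs(x - clusters[name][0]))

def most_likely_cluster(x, clusters, min_density=1e-4):
    """Gaussian rule: pick the cluster with the highest density at x.

    Returns None (an outlier) when even the best cluster has a density
    below min_density. The threshold is an illustrative assumption.
    """
    best = max(clusters, key=lambda name: gaussian_pdf(x, *clusters[name]))
    if gaussian_pdf(x, *clusters[best]) < min_density:
        return None
    return best

# Hypothetical clusters: name -> (mean, standard deviation)
clusters = {"tight": (0.0, 0.5), "broad": (6.0, 3.0)}

print(nearest_centroid(2.5, clusters))      # tight (distance 2.5 versus 3.5)
print(most_likely_cluster(2.5, clusters))   # broad (5 std devs from "tight")
print(most_likely_cluster(40.0, clusters))  # None (outlier)
```

The value 2.5 is nearer to the mean of the tight cluster, but it lies five standard deviations away from it, so the variance-aware rule assigns it to the broad cluster instead.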

Q2	The white paper says the Gaussian model eliminates the problem of specifying the number of clusters, but ArrayMiner2 still asks for it. How come?
*AM2*	Only the user knows how much detail they want in their clusters, and they convey this by specifying the number of clusters. However, giving the "wrong" number of clusters will merely yield a set of clusters that is too detailed or too sketchy; the clusters will still make sense and be consistent with a result obtained with a different number of clusters.
This is in contrast with most other tools, which typically supply inconsistent clusters when different numbers of clusters are requested. Identifying a stable and trustworthy set of clusters is often extremely difficult with those tools.

Q3	In the Gaussian model, the clusters are quite stable when I request different numbers of clusters - when I load the successive classifications into the Classification Compare tool, I can actually see a hierarchy of clusters. Is ArrayMiner2 some kind of hierarchical clustering?
*AM2*	No. ArrayMiner always performs a non-hierarchical clustering of your data. In particular, when it clusters into, say, 10 clusters, it does not cluster into 9 first. The fact that the clusters are nevertheless very stable is a desirable "byproduct" of the tool's ability to identify the true structure of the data.
Hierarchical clustering tools are very different: the stability of the "clusters" they supply is obtained only by explicitly building the tree from the bottom up. The major disadvantage of this approach is that the tree is fully contingent on the quality of the clustering at the lowest levels, which can be shown to be dubious in many cases.

Q4	What distance measures are available with the new Gaussian clustering model?
*AM2*	The three most widely used distance measures are all available, namely Euclidean, Pearson Coefficient and the standard Correlation. The distance measure is used to evaluate the probability that a given expression profile belongs to a given cluster, which is represented by a Gaussian distribution.
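As a rough illustration of how these three measures differ, here is a minimal sketch in plain Python. It assumes that "standard Correlation" means the uncentered correlation (the cosine of the angle between two profiles), which is how GeneSpring uses the term. That reading, the function names and the sample profiles are assumptions, and the way ArrayMiner turns a distance into a cluster-membership probability is not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance: sensitive to both shape and amplitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson coefficient: correlation after subtracting each profile's mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den

def uncentered_correlation(a, b):
    """Uncentered correlation: like Pearson, but the means are not subtracted."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

# Hypothetical expression profiles over four experiments
base = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]    # same shape, doubled amplitude
shifted = [3.0, 4.0, 5.0, 6.0]   # same shape, constant offset of +2

for name, other in (("scaled", scaled), ("shifted", shifted)):
    print(name,
          round(euclidean(base, other), 3),
          round(pearson(base, other), 3),
          round(uncentered_correlation(base, other), 3))
```

Both correlation measures ignore a change in amplitude, but only the Pearson coefficient also ignores a constant offset. The Euclidean distance is affected by both.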

Q5	How do I use the new Gaussian clustering model?
*AM2*	After your data have been loaded, simply select the second tab, "Gaussian clustering", in the Analysis selection window.

Q6	Are clusters obtained with the Gaussian model readily comparable with clusters obtained with other tools, e.g. k-Means or a dendrogram?
*AM2*	Not in general. We are not aware of another gene expression clustering tool that takes cluster variance into account, which means that the other tool most probably uses a simpler clustering model. Consequently, the clusters it supplies probably lack the additional detail of ArrayMiner2's clusters and will therefore be difficult to compare with them.

Q7	Computing the Gaussian model seems to take longer than the classic minimum variance model. Is this normal?
*AM2*	Yes. The Gaussian model is a more complete model of your data than the classic minimum variance model (used e.g. by k-Means). As a result, there are more parameters to be estimated, so the computational burden is somewhat greater.
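To give a feel for the extra parameters, here is a small counting sketch. It rests on textbook assumptions, not on ArrayMiner's documented internals: a minimum variance model estimates one mean vector per cluster, while a Gaussian mixture also estimates variances (either one per cluster, or one per cluster and experiment) and the cluster weights. The function names and the example sizes are hypothetical.

```python
def minimum_variance_parameters(k, d):
    """k cluster means, each with d coordinates (one per experiment)."""
    return k * d

def gaussian_parameters(k, d, per_experiment_variance=False):
    """Means plus variances plus mixing weights of a Gaussian mixture.

    Assumes either one variance per cluster, or one variance per cluster
    and experiment; the k weights sum to 1, so only k - 1 are free.
    """
    means = k * d
    variances = k * d if per_experiment_variance else k
    weights = k - 1
    return means + variances + weights

# Example: 10 clusters, 20 experiments
print(minimum_variance_parameters(10, 20))                       # 200
print(gaussian_parameters(10, 20))                               # 219
print(gaussian_parameters(10, 20, per_experiment_variance=True)) # 409
```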

Troubleshooting

The installer reports a command failure.

AM*

This typically happens when you are running Windows 95, 98 or Me and your installation target folders have very long names.

Solution: Install the application to a location with a shorter name, and skip the GeneSpring wrapper installation. If you skip the wrapper, use the clipboard to exchange data between the two programs (see the appropriate pages in the ArrayMiner online help).

I am getting the data from GeneSpring in the wrong order.

AM*

In some cases, the data received by ArrayMiner from GeneSpring are not the ones expected - the experiments (data columns) may be in the wrong order or there may be supplementary experiments. This is a glitch in the current implementation of GeneSpring's External Programs interface.

Solutions

Use the reorder option of ArrayMiner (available as of version 2.6).
Within GeneSpring, select the View as Spreadsheet option in the View menu, and check the Normalized checkbox only. Your data should now appear in the proper order, one column per experiment. Click the Copy All button (if the button is not enabled, click Clear Selection first). This copies the data to the system clipboard, from which they can be retrieved in ArrayMiner with "Get Data from Clipboard" in the File menu.
Create a new Experiment within GeneSpring, adding the appropriate data in the desired order. You can then run ArrayMiner on this Experiment as usual.

The ArrayMiner Macintosh wrapper stalls

AM*

This typically happens when you are running a large dataset and the virtual machine reports a memory full exception.

Solution: The problem comes from the GeneSpring launch application, which ignores the amount of memory you specify in GeneSpring. If you are familiar with OS X, open the GeneSpring.app package, locate the file named MRJApp.properties and open it in a plain text editor (Word and TextEdit are not suitable). Add the line "com.apple.mrj.application.vm.options=-Xmx953m" at the beginning of the file, then restart GeneSpring. Your data should now load successfully. If you are not comfortable with OS X, contact GeneSpring support here.
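For those who prefer to script the edit, the following is a minimal sketch that adds the line at the top of a properties file. The function name `prepend_vm_option`, the backup step and the Latin-1 encoding are assumptions made for illustration. The location of MRJApp.properties inside the GeneSpring.app package is not stated above, so the path must be supplied by you.

```python
import shutil
from pathlib import Path

# The exact line quoted in the solution above
VM_OPTION = "com.apple.mrj.application.vm.options=-Xmx953m"

def prepend_vm_option(properties_path):
    """Insert VM_OPTION as the first line of the given properties file.

    Returns True if the file was modified, or False if the line was
    already present. A backup copy with a ".bak" suffix is written first.
    Note: line endings are normalized to the platform default on rewrite.
    """
    path = Path(properties_path)
    # Latin-1 is an assumption; it round-trips any byte sequence unchanged.
    text = path.read_text(encoding="latin-1")
    if VM_OPTION in text.splitlines():
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(VM_OPTION + "\n" + text, encoding="latin-1")
    return True

# Usage (the path is a placeholder that you must replace):
# prepend_vm_option("/path/to/MRJApp.properties")
```

Running the function a second time leaves the file untouched, so it is safe to repeat.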