Tissue Microarray Software -- Combiner PCL File Format

TMA-Combiner File Format

File format requirements

The standard file format for the TMA-Combiner is PCL (Pre CLuster), which is described in the TMA-Deconvoluter walkthrough. PCL is one of the two output formats of the TMA-Deconvoluter. Because the TMA-Combiner is designed to work with the TMA-Deconvoluter, this requirement should not present a problem for most users.

Below is an example of the PCL format:

PCL File layout -- screenshot


Column A: UID (for Unique IDentifier; required). If you use Stainfinder, this column contains the image filename and antibody stains that are passed into the Stainfinder program. The procedure is described in the Stainfinder walkthrough.
Column B: NAME (required). This is the most important column in the file, because the TMA-Combiner uses it to identify replicates for subsequent "compression". Each cell contains a case number followed by various descriptors, separated from one another by a "pipe" ("|") delimiter. For example:

1208 | breast | malignant | carcinoma | ductal | invasive

Your NAME column must contain the case number as the very first item (1208 in this example) and must use the "pipe" ("|") character as the delimiter. Again, this is the standard format used in the TMA-Deconvoluter output files, so it should not pose a problem for most users.
Column C: GWEIGHT (optional but highly recommended). This is the "GWEIGHT" column used by the Cluster program for providing the option of weighting cases differently (see TMA website and Cluster manual for details). The PCL file format incorporates this column by default; if it is absent in the input file, it will be inserted by the TMA-Combiner.
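
The NAME convention described above (case number first, descriptors separated by pipes) can be checked with a few lines of Python. This is only a sketch for validating your own files; the function name parse_name is hypothetical and is not part of the TMA-Combiner, and how the TMA-Combiner itself matches replicates is not shown here.

```python
def parse_name(cell):
    """Split a NAME cell into (case_number, descriptors).

    Assumes the convention from this page: the case number is the first
    item and the items are separated by the pipe ("|") character.
    """
    parts = [part.strip() for part in cell.split("|")]
    if not parts[0]:
        raise ValueError("NAME cell must begin with a case number")
    return parts[0], parts[1:]


case_number, descriptors = parse_name(
    "1208 | breast | malignant | carcinoma | ductal | invasive"
)
print(case_number)   # 1208
print(descriptors)   # ['breast', 'malignant', 'carcinoma', 'ductal', 'invasive']
```

Whitespace around each item is stripped, so "1208|breast" and "1208 | breast" yield the same case number.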

Column D, etc.: Antibodies. Row 1 contains the name of the target protein for the antibody stain. If different TMAs contain the same antibody, and/or if a given TMA contains replicates or multiple score sets (e.g. by different pathologists), the name of the target protein should be separated with an underscore ("_") from the initials of the scorer or other unique identifying information. This is very important because the TMA-Combiner uses the target protein name to decide which columns are combined; names that are not identical are treated as different entities and are not combined. For example:

Column	Before	After
D	bcl2	bcl2
E	mib1	mib1
F	er_mv-10-00	(2)er
G	er_lt-03-01	--
H	ER	ER
I	mib2_yv-10_03	mib2

Columns F and G share the name "er" and are combined into a single column, shown as "(2)er" in the After column. Note that Column H will NOT be combined with Columns F and G, because the name matching is case sensitive. Furthermore, the leftmost underscore ("_") and any annotation after it are removed (as for Column I), even if the column is not combined with any other column in the final dataset.
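
The matching rule in the table above can be sketched in Python as follows. This is an illustration of the rule as described on this page, not the TMA-Combiner's actual code; the function names are hypothetical, and the "(n)" label for combined columns is inferred from the "(2)er" entry in the table.

```python
def base_name(header):
    """Return the text before the leftmost underscore, with case preserved."""
    return header.split("_", 1)[0]


def group_columns(headers):
    """Group column headers by base name; the comparison is case sensitive."""
    groups = {}
    for header in headers:
        groups.setdefault(base_name(header), []).append(header)
    return groups


headers = ["bcl2", "mib1", "er_mv-10-00", "er_lt-03-01", "ER", "mib2_yv-10_03"]
groups = group_columns(headers)

for name, members in groups.items():
    # Assumed labeling: prefix combined columns with the number of members.
    label = f"({len(members)}){name}" if len(members) > 1 else name
    print(label, members)
```

Running the sketch on the headers from the table groups "er_mv-10-00" and "er_lt-03-01" under "er", keeps "ER" separate, and reduces "mib2_yv-10_03" to "mib2".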

Row 2: EWEIGHT (optional but highly recommended). This is the "EWEIGHT" row used by the Cluster program for providing the option of weighting antibodies differently (see TMA website and Cluster manual for details). The PCL file format incorporates this row by default; if it is absent in the input file, it will be inserted by the TMA-Combiner.

Note: the TMA-Combiner will output files in PCL format, regardless of the format of the input files.

Other formats

While PCL is the native format of the TMA-Combiner, the program also recognizes two other formats for the user's convenience. Note: if problems arise with these formats, please use the TMA-Deconvoluter to produce PCL output instead.

The K-M format. A detailed description of this format is provided on a separate page. An example of this format is shown below.

K-M File layout -- screenshot


A simple text tab-delimited format. This would be equivalent to the PCL format, except that the UID and GWEIGHT columns and the EWEIGHT row are missing.
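
To make the relationship between the simple tab-delimited format and PCL concrete, the sketch below adds the missing UID and GWEIGHT columns and the EWEIGHT row to a small table. Several details are assumptions and not taken from the TMA-Combiner: the function name simple_to_pcl, the placeholder UID values ("row1", "row2", ...), the default weight of 1, and the made-up sample scores.

```python
import csv
import io


def simple_to_pcl(rows, default_weight="1"):
    """Add UID and GWEIGHT columns and an EWEIGHT row to a simple table.

    `rows` is a list of lists: a header row (NAME, then antibody names)
    followed by data rows. UID values and weights are placeholders.
    """
    header, *data = rows
    antibody_count = len(header) - 1
    pcl = [["UID", header[0], "GWEIGHT"] + header[1:]]
    # Row 2: EWEIGHT, with empty cells under NAME and GWEIGHT.
    pcl.append(["EWEIGHT", "", ""] + [default_weight] * antibody_count)
    for index, row in enumerate(data, start=1):
        pcl.append([f"row{index}", row[0], default_weight] + row[1:])
    return pcl


# Made-up sample input in the simple tab-delimited format.
text = "NAME\tbcl2\ter\n1208 | breast | malignant\t2\t0\n"
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
pcl = simple_to_pcl(rows)

for row in pcl:
    print("\t".join(row))
```

The TMA-Combiner performs its own conversion on input, so a step like this is only useful for inspecting or preparing files by hand.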

Last edited by Chih Long Liu on August 15, 2005