Inter-species Array Design

Download and usage of the Inter-species Array Probe Design program

Generates Agilent custom gene expression array probes that are shared by the genes in each homolog group listed in an input file.
Contact: Jun Sese <sesejun_AT_cs.titech.ac.jp>
Download: ArrayDesign.zip, a zipped archive containing an executable Java file (ArrayDesign.jar) and sample files.

1. Requirements

  • Java 1.5 or later
  • Memory: depends on the total sequence size. Large inputs require a large amount of memory.
  • Two types of input files
    • (a) One multi-FASTA file for each target species. For example, to generate probes common to human and mouse, you need two multi-FASTA files: one for human and the other for mouse.
    • (b) A file listing the homolog groups, in tab-delimited format.
      • 1st column: homolog group name
      • 2nd column: name of the gene belonging to the homolog group in the 1st species
      • 3rd column: name of the gene belonging to the homolog group in the 2nd species (add one further column for each additional species)
      • Note: the program prioritizes generating probe sequences from the 1st column to the last column, so the column order may affect the final output.
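
Because the columns of the homolog group file must line up with the multi-FASTA files, it can help to check the inputs before a long run. The following is a minimal Python sketch, not part of ArrayDesign. It assumes that each gene name in the group file equals the first whitespace-delimited token of a FASTA header line; the program's actual matching rule is not documented here, so adjust read_fasta_names if your headers differ. All function names are hypothetical.

```python
def read_fasta_names(path):
    """Return the set of sequence names found in a multi-FASTA file.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def read_homolog_groups(path):
    """Return a list of (group_name, [gene name per species column])."""
    groups = []
    with open(path) as handle:
        for line in handle:
            cells = [cell.strip() for cell in line.rstrip("\r\n").split("\t")]
            if not cells[0]:
                continue  # skip blank lines
            groups.append((cells[0], cells[1:]))
    return groups


def find_missing_genes(groups, fasta_paths):
    """List (group, column_number, gene) entries absent from the FASTA file
    of the corresponding species.

    fasta_paths must be given in the same order as the species columns.
    column_number is 1-based, so the first species is column 2.
    """
    name_sets = [read_fasta_names(path) for path in fasta_paths]
    missing = []
    for group, genes in groups:
        for index, (gene, names) in enumerate(zip(genes, name_sets)):
            if gene and gene not in names:
                missing.append((group, index + 2, gene))
    return missing
```

An empty list from find_missing_genes(read_homolog_groups("GroupData.txt"), ["HumanGenes.fasta", "RatGenes.fasta", "MouseGenes.fasta"]) would suggest that the file order and the column order agree, under the header assumption stated above.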

Usage:

In the directory ArrayDesign, run the following command.

$ java -jar ArrayDesign.jar HumanGenes.fasta,RatGenes.fasta,MouseGenes.fasta GroupData.txt HumanGenes.fasta,RatGenes.fasta,MouseGenes.fasta result.xls

Arguments:

  1. Multi-FASTA files, one per target species, separated by commas. Their order must correspond to the column order in the homolog group file.
  2. Homolog group file.
  3. Multi-FASTA files used to check for cross-hybridization, separated by commas. In most cases this argument is the same as the 1st argument. If you want to design probes that measure rat and mouse genes but not certain bacterial genes, list the bacterial genes' file here in addition to the files from the 1st argument.
  4. Output filename.

Example: This archive includes the following test data.

  • Multifasta files: HumanGenes.fasta, RatGenes.fasta and MouseGenes.fasta
  • Homolog group file: GroupData.txt

Let us generate probes common to human, rat and mouse. In GroupData.txt, each line contains a group name, a human gene annotation, a rat gene annotation and a mouse gene annotation. The arguments are therefore:

  • 1st argument: the gene sequence files, in the same order as the columns: HumanGenes.fasta,RatGenes.fasta,MouseGenes.fasta
  • 2nd argument: the homolog group file: GroupData.txt
  • 3rd argument: the same as the 1st argument
  • 4th argument: the output filename

Finally, we run the following command:

$ java -jar ArrayDesign.jar HumanGenes.fasta,RatGenes.fasta,MouseGenes.fasta GroupData.txt HumanGenes.fasta,RatGenes.fasta,MouseGenes.fasta result.xls

In this case the dataset is very small, so we did not specify the memory size. In most cases, however, you need to specify the memory size with the Java VM's -Xmx option.
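
For scripted runs, the command line can be assembled from its parts. The Python sketch below is one possible helper, not something shipped with ArrayDesign. The function name build_command, its parameters, and the file name Bacteria.fasta in the usage note are all hypothetical. It places -Xmx before -jar, as the Java launcher requires for VM options, and builds the 3rd argument as the species files plus any extra cross-hybridization files.

```python
def build_command(species_fastas, group_file, output_file,
                  extra_crosshyb_fastas=(), max_heap=None,
                  jar="ArrayDesign.jar"):
    """Build the argument list for one ArrayDesign run.

    species_fastas        -- multi-FASTA files, in homolog-column order
    extra_crosshyb_fastas -- additional files to check for cross-hybridization
    max_heap              -- JVM heap limit such as "4g", or None to omit -Xmx
    """
    command = ["java"]
    if max_heap:
        command.append("-Xmx" + max_heap)  # VM options must precede -jar
    command += ["-jar", jar]
    # 3rd argument: the species files plus any extra files to avoid.
    crosshyb = list(species_fastas) + list(extra_crosshyb_fastas)
    command += [
        ",".join(species_fastas),
        group_file,
        ",".join(crosshyb),
        output_file,
    ]
    return command
```

For example, build_command(["RatGenes.fasta", "MouseGenes.fasta"], "GroupData.txt", "result.xls", extra_crosshyb_fastas=["Bacteria.fasta"], max_heap="4g") returns a list that can be passed to subprocess.run(..., check=True). The heap size that is actually needed depends on the input sequences.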

Result file format:

The result file is a tab-delimited file containing one probe per line.

  • 1st column: group name associated with the probe
  • 2nd column: 60 bp probe sequence

Example: result.xls contains

group	seq
Group1	AAATTTAGAGCTTAACACCAGTTGAAAAATAAAACTCACAGCTCCAACGATTTTAGCAGG

The sequence is the common probe for the Group1 genes. GroupData.txt includes two different groups, "Group1" and "Group2", but this file contains only Group1 because no common probe could be designed for "Group2". Note that neither the mouse nor the rat gene sequences contain a sequence identical to the common probe for Group1, because our experimental results indicate that the probe can hybridize to the mouse and rat genes even when they carry mutations. The reason is described in our papers (submitted).
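
Since groups without a common probe are simply left out of the result file, a short script can list them. The Python sketch below is an illustration under assumptions, not part of ArrayDesign. It assumes the result file may start with a "group"/"seq" header line and that empty tab-separated fields can be ignored. The function names are hypothetical.

```python
def read_probes(path):
    """Return {group_name: [probe sequences]} from a result file.

    Assumptions: fields are tab-separated, empty fields are ignored,
    and a header line reading "group" / "seq" may be present.
    """
    probes = {}
    with open(path) as handle:
        for line in handle:
            fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
            fields = [f for f in fields if f]
            if len(fields) < 2:
                continue  # blank or incomplete line
            group, seq = fields[0], fields[1]
            if group.lower() == "group" and seq.lower() == "seq":
                continue  # header line
            probes.setdefault(group, []).append(seq)
    return probes


def groups_without_probe(group_names, probes):
    """Return the group names, in input order, that have no probe."""
    return [name for name in group_names if name not in probes]
```

With the sample data described above, groups_without_probe(["Group1", "Group2"], read_probes("result.xls")) would be expected to report Group2, provided the file matches the assumed layout.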

 

Human-Rat-Mouse inter-species array

Probe information and experimental results will be available on NCBI GEO.
