
Use a splice-aware aligner, TopHat2, to align short reads

  1. We are using paired-end reads, so set the “Is this library mate-paired?” pulldown to “Paired-end”; a second pulldown will then appear for specifying the second FASTQ file.

  2. The value for the “Mean Inner Distance between Mate Pairs” parameter should be obtained from the person in charge of the sequencing.

  3. Mean Inner Distance between Mate Pairs = length of the fragments used for sequencing – (length of the Illumina adapters (often 120 bp) + length of the sequenced part (76 + 76 bp))

  4. The genome should be obtained from the Sol Genomics Network (ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/) and then selected from the history.

    _images/galaxy_tophat_1.png
  5. This library has been prepared to preserve the strandedness of the RNAs.

    _images/galaxy_tophat_2.png

  6. The minimum and maximum intron lengths should be changed according to the genome used.

    _images/galaxy_tophat_3.png

  7. Change the intron lengths for split reads as well.

    _images/galaxy_tophat_4.png
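
The inner-distance formula in step 3 is simple arithmetic, and a short sketch can help when checking the value supplied by the sequencing facility. The function name and the 500 bp fragment length below are illustrative assumptions, not values from this library; the 120 bp adapter length and 76 bp read lengths are the ones quoted in step 3.

```python
def mean_inner_distance(fragment_len, read1_len, read2_len, adapter_len=120):
    """Inner distance = fragment length - (adapters + both sequenced ends).

    fragment_len is the full library fragment length, adapters included.
    adapter_len defaults to the "often 120 bp" figure from step 3.
    """
    return fragment_len - (adapter_len + read1_len + read2_len)


# Hypothetical 500 bp fragments sequenced as 76 + 76 bp paired-end reads.
inner = mean_inner_distance(500, 76, 76)
print(inner)  # 500 - (120 + 152) = 228
```

A negative result would mean the two reads overlap, so it is worth confirming the fragment length with the sequencing facility rather than guessing.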

Output files:

  1. accepted_hits (BAM, BAI)

    Two binary files: .BAM (data) and .BAI (index). These are the actual paired reads mapped to their position on the genome, and split across exon junctions. This can be visualized in IGV, IGB or UCSC, but you must download both .BAM and .BAI files to the same directory.

  2. splice_junctions (BED)

    BED file (list of genomic locations, no sequence) listing all the places TopHat had to split a read into two pieces to span an exon junction. This can be visualized at UCSC or in IGV, etc.

  3. deletions (BED) (if indel search is on)

  4. insertions (BED) (if indel search is on)
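
Beyond viewing it in a genome browser, the splice_junctions BED file can be inspected with a few lines of Python. The sketch below assumes the file uses the 12-column BED layout that TopHat describes for its junctions output, where each junction is two blocks (the read overhangs on either side) and the score column holds the number of alignments spanning the junction. The function name, the chromosome name, and the single example record are made up for illustration.

```python
import io


def parse_junctions(handle):
    """Yield (chrom, intron_start, intron_end, strand, read_count) per junction.

    Assumes 12-column BED with two blocks per record; coordinates are
    0-based, half-open, as in BED.
    """
    for line in handle:
        # Skip blank lines, the optional track header, and comments.
        if not line.strip() or line.startswith(("track", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, strand = fields[0], int(fields[1]), fields[5]
        read_count = int(fields[4])
        block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        # The intron lies between the end of block 1 and the start of block 2.
        intron_start = start + block_sizes[0]
        intron_end = start + block_starts[1]
        yield chrom, intron_start, intron_end, strand, read_count


# Illustrative record only, not real output from this analysis.
example = io.StringIO(
    "track name=junctions\n"
    "SL3.0ch01\t1000\t1300\tJUNC00000001\t12\t+\t1000\t1300\t255,0,0\t2\t50,60\t0,240\n"
)
junctions = list(parse_junctions(example))
for chrom, i_start, i_end, strand, n in junctions:
    print(chrom, i_start, i_end, strand, n, "intron length:", i_end - i_start)
```

Tabulating intron lengths this way is one quick check on whether the minimum and maximum intron length settings were sensible for the genome.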