A short overview of a possible workflow for de novo transcriptome assembly using Trinity. See the getting started with your data tutorial for details about raw read processing, and the Trinity website for more information about Trinity. Some of the Python scripts linked to on this page use the screed module, so you will need to download and install screed.

1. Concatenate raw read files:
For paired-end data: combine all R1 reads into one file and all R2 reads into a second file.
For single-end data: combine all reads into a single file.
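#Combine R1 reads into a single file (the glob expands in sorted order, so matching R1 and R2 files are concatenated in the same order)
cat *R1.fq > R1.fq
#Combine R2 reads into a single file
cat *R2.fq > R2.fq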
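Mates in R1.fq and R2.fq should appear in the same order, which only holds if the lane files were concatenated in matching order. The Python sketch below is one way to check this; it is not one of the scripts linked from this page, and the function names are hypothetical. It assumes 4-line FASTQ records and mate names that are identical once an older-style trailing /1 or /2, or Casava 1.8-style text after the first space, is removed.

```python
import itertools


def fastq_ids(path):
    """Yield the mate-independent read name of each 4-line FASTQ record."""
    with open(path) as handle:
        # Every 4th line, starting with the first, is an @header line.
        for header in itertools.islice(handle, 0, None, 4):
            name = header.split()[0]  # drop Casava 1.8-style " 1:N:0:..." text
            if name.endswith(("/1", "/2")):
                name = name[:-2]      # drop older-style /1 or /2 suffix
            yield name


def check_pairs(r1_path, r2_path):
    """Return the number of read pairs; raise ValueError on the first mismatch."""
    count = 0
    pairs = itertools.zip_longest(fastq_ids(r1_path), fastq_ids(r2_path))
    for count, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:  # also catches one file being longer (None)
            raise ValueError(f"record {count}: {id1!r} != {id2!r}")
    return count
```

For example, `check_pairs("R1.fq", "R2.fq")` returns the number of pairs, or raises ValueError at the first record where the two files disagree.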
2. Quality trim raw reads. One option is to use q-trim; see the scripts page for more details.
There are a few variables in q-trim you can modify:
QSCORE: set QSCORE using an ASCII table of FASTQ quality characters (e.g., QSCORE = '5' should trim reads in your FASTQ file where Phred < 21).
INTERCRAP: determines how many contiguous bases of low quality you are willing to ignore inside any given read (default is 5 bp).
MINLENGTH: determines the minimum read length you want to retain (default = 30 bp).
#Using q-trim
python infile.fq outfile.fq
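To make the QSCORE example concrete: in the Phred+33 encoding, a quality character's score is its ASCII code minus 33, so '5' (ASCII 53) is Q20, and treating scores at or below it as low quality removes bases with Phred < 21. The sketch below is a hypothetical illustration of how QSCORE, INTERCRAP and MINLENGTH could interact on a single read; it is not q-trim's source, the Phred+33 offset is assumed, and the rule for splitting a read at long low-quality runs is an assumption. The `qscore="5"` default is the example value from the text, not necessarily q-trim's default.

```python
PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ encoding (assumed)


def phred(char):
    """Convert one FASTQ quality character to its Phred score."""
    return ord(char) - PHRED_OFFSET


def trim(seq, qual, qscore="5", intercrap=5, minlength=30):
    """Return the trimmed (seq, qual) pair, or None if the read is discarded.

    Hypothetical rule (not q-trim's code): a base is low quality if its score
    is at or below phred(qscore); a run of more than `intercrap` low-quality
    bases splits the read; each piece is trimmed back to its outermost good
    bases; the longest piece is kept if it has at least `minlength` bases.
    """
    cutoff = phred(qscore)
    best = (0, 0)     # (start, end) of the longest acceptable piece so far
    start = None      # index of the first good base in the current piece
    last_good = None  # index of the most recent good base
    bad_run = 0       # length of the current run of low-quality bases
    for i, q in enumerate(qual):
        if phred(q) > cutoff:
            if start is None:
                start = i
            last_good = i
            bad_run = 0
        else:
            bad_run += 1
            if start is not None and bad_run > intercrap:
                # Too many consecutive bad bases: close the piece at last_good.
                if last_good + 1 - start > best[1] - best[0]:
                    best = (start, last_good + 1)
                start = None
    # Close the final piece, dropping any trailing low-quality bases.
    if start is not None and last_good + 1 - start > best[1] - best[0]:
        best = (start, last_good + 1)
    begin, end = best
    if end - begin < minlength:
        return None
    return seq[begin:end], qual[begin:end]
```

Under this rule, an interior run of up to five '#' (Q2) bases is kept inside the read, while a run of six splits it and only the longer half survives.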
3. Optional: if you are assembling paired-end reads, you may want to keep only the reads whose mate also survived trimming. You can use a read-pairing script to do this:
python R1_trimmed.fq R2_trimmed.fq
#Executing the above command creates two files in your working directory: R1_trimmed.fq.both and R2_trimmed.fq.both
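The Python sketch below shows one way to do the same pairing; it is an illustration, not the script referenced above, and the function names are hypothetical. It assumes that trimming only removes reads (so surviving mates keep the same relative order in both files), 4-line FASTQ records, and mate names that match after dropping a trailing /1 or /2 and anything after the first space. It holds read names in memory, which may be too much for very large libraries.

```python
import itertools


def read_fastq(path):
    """Yield (mate-independent name, record text) for each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            record = list(itertools.islice(handle, 4))
            if not record:
                return
            if len(record) < 4:
                raise ValueError(f"truncated FASTQ record in {path}")
            name = record[0].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, "".join(record)


def keep_both(r1_path, r2_path):
    """Write <file>.both copies holding only reads whose mate survived."""
    r2_names = {name for name, _ in read_fastq(r2_path)}
    kept = set()
    with open(r1_path + ".both", "w") as out:
        for name, record in read_fastq(r1_path):
            if name in r2_names:
                kept.add(name)
                out.write(record)
    with open(r2_path + ".both", "w") as out:
        for name, record in read_fastq(r2_path):
            if name in kept:
                out.write(record)
    return len(kept)
```

For example, `keep_both("R1_trimmed.fq", "R2_trimmed.fq")` writes R1_trimmed.fq.both and R2_trimmed.fq.both and returns the number of pairs kept.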
4. Download and install Trinity:
After downloading Trinity, cd to the installation directory and type make. See the installing programs page for more details.
5. Execute Trinity:
#Set the stack size to unlimited.
ulimit -s unlimited
# Example Trinity run for paired-end data:
--seqType fq --left R1_trimmed_reads.fq --right R2_trimmed_reads.fq --CPU 10 --output out/directory
All Trinity options are documented on the Trinity website. The options used above are:
--seqType <string> :type of reads: ( cfa, cfq, fa, or fq )
--left <string> :left reads
--right <string> :right reads
--output <string> :name of directory for output (will be created if it doesn't already exist), default( "trinity_out_dir" )
--CPU <int> :number of CPUs to use, default: 2

You can use Trinity to assemble libraries with multiple paired-end fragment sizes: set --group_pairs_distance (default 500) according to the library with the largest inserts. Pairings that exceed that distance are treated as unpaired by Butterfly. Trinity's defaults are tuned for a library with a 300-base fragment length and 76-base reads.

If you are running Trinity on the ittc server, you can use a script called Colony.bash. This script monitors Java's memory use and garbage collection, which makes Trinity runs more efficient. Below is a sample PBS script for submitting a Trinity job to qsub using Colony.bash. More information about queuing systems is available on a separate page of this site.
#PBS -N Job_Name
#PBS -l nodes=1:ppn=16,mem=120G,walltime=48:00:00
#PBS -S /bin/bash
#PBS -q bigm
#PBS -M your_email
#PBS -m abe
#PBS -o /path/to/out.log
#PBS -e /path/to/error.log
#Set stack size to unlimited.
ulimit -s unlimited
#cd to the directory containing the Trinity data and Trinity output directories.
cd /my/trinity/data
#Execute Trinity:
/bio/tools/5.1/trinity/RBMM/Colony.bash -w /working/directory -o /out/directory -s fq -l R1.fq -r R2.fq --CPU 10 \
--bfly JavaVM64bit --bflyHeapSpace 20G --bflyMinHeapSpace 20G --bflyHeapNursery 12G --bflyJavaGCParallel \
--bflyJavaGCThreads 16 --repeat 5 --bflyJavaCmdLifespan_min 5 --bflyJavaCmdLifespan_max 1800 \
--bfly_opts "-V 10 --stderr"
6. What to do if the assembly doesn't finish: sometimes a Trinity run does not complete. If all or most of the Butterfly runs have finished, you can combine their results into a FASTA file of contigs using the command below.
#Execute this command from your Trinity output folder to concatenate all completed Butterfly assemblies
find chrysalis/ -name "*allProbPaths.fasta" -exec cat {} \; > Trinity.fasta
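If you prefer to do this from Python, for example to see how much of the run finished, the sketch below mirrors the find command: it concatenates every *allProbPaths.fasta under chrysalis/ into Trinity.fasta and reports how many files and contigs it found. The function name is hypothetical. Unlike plain cat, it adds a newline when a file does not end with one, and it sorts the file paths so repeated runs give the same output order.

```python
from pathlib import Path


def collect_butterfly_fasta(out_dir, dest="Trinity.fasta"):
    """Concatenate every *allProbPaths.fasta under <out_dir>/chrysalis into dest.

    Returns (files_found, contigs_written).
    """
    out_dir = Path(out_dir)
    files = sorted((out_dir / "chrysalis").rglob("*allProbPaths.fasta"))
    contigs = 0
    with open(out_dir / dest, "w") as out:
        for path in files:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        contigs += 1  # one FASTA header per contig
                    # Guard against a file whose last line lacks a newline.
                    out.write(line if line.endswith("\n") else line + "\n")
    return len(files), contigs
```

Run from the Trinity output folder, `collect_butterfly_fasta(".")` writes ./Trinity.fasta and returns the two counts.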