Analyzing Barcode Data in SPIDER

Spider is an R package for visualizing and analyzing barcode sequence data. This tutorial is a condensed version of the **available Spider tutorial** with an alternative method for plotting the barcoding gap. More information about this package can be found at **the Spider website**.



* **1.** Align your sequences and export the alignment as a fasta file.
* **2.** Get R: Download R from a selected **CRAN mirror**.
* **3.** Start R.
* **4.** Get and load Spider:

```r
install.packages("spider")
library(spider)
```

* **5.** Import your alignment into R:

```r
# read.dna() comes from the ape package, which is loaded together with spider
Aln <- read.dna("path/to/mySequences.fas", format = "fasta")
```

* **6.** Define genus and species vectors:

```r
# These commands assume that the sequences in the alignment are labeled
# in the following way: genus_species_accession
SplitNames <- strsplit(dimnames(Aln)[[1]], split = "_")
# Species vector (genus_species)
Spp <- sapply(SplitNames, function(x) paste(x[1], x[2], sep = "_"))
# Genus vector (genus)
Gen <- sapply(SplitNames, function(x) x[1])
```

* **7.** Calculate summary statistics:

```r
dataStat(Spp, Gen)
```

This function prints summary statistics for your data, as in the example below.

```
Genera Species Min Max Median Mean Thresh
     1       4   2  16     10    9      1
```

**Genera**: number of genera; **Species**: number of species; **Min**: minimum number of individuals per species; **Max**: maximum number of individuals per species; **Median**: median number of individuals per species; **Mean**: mean number of individuals per species; **Thresh**: number of species with fewer individuals than the threshold (default of 5).

* **8.** Generate a distance matrix using the Kimura 2-parameter model:

```r
# dist.dna() (from ape) uses the Kimura 2-parameter model ("K80") by default; it is set explicitly here.
# pairwise.deletion = TRUE drops sites with missing data separately for each pair of sequences.
Dist <- dist.dna(Aln, model = "K80", pairwise.deletion = TRUE)
```

* **9.** Generate distributions of the furthest intraspecific and the closest interspecific distances. The following explanation is taken from the Spider tutorial:

"The “barcoding gap” (Meyer & Paulay, 2005) is an important concept in DNA barcoding. It is the assumption that the amount of genetic variation within species is smaller than the amount of variation between species. This allows the two to be distinguished. As pointed out by Meier et al. (2008), the barcode gap should be calculated using the smallest, rather than the mean, interspecific distances. Spider generates two statistics for each individual in the dataset: the furthest intraspecific distance among its own species (maxInDist), and the closest non-conspecific, i.e. interspecific, distance (nonConDist)."

```r
# Closest non-conspecific (interspecific) distance for each individual
inter <- nonConDist(Dist, Spp)
# Furthest intraspecific distance for each individual
intra <- maxInDist(Dist, Spp)
```

* **10.** Plot the density distributions of the furthest intraspecific and the closest interspecific distances (see the **Spider tutorial** for an alternative way of representing the barcoding gap) and save the plot as a pdf. You can also plot the distributions as overlapping histograms using the directions **here**.

```r
# Open a pdf file
pdf("My_Barcode_Plot.pdf")
# Estimate density distributions from the inter and intra distances
DensityInter <- density(inter)
DensityIntra <- density(intra)
# Plot the intraspecific density distribution: xaxt = 'n' and yaxt = 'n' suppress the
# x and y axes (no tick marks or numbers), main sets the plot title, xlab sets the x-axis label.
# xlim and ylim cover both curves so the interspecific curve added below is not clipped.
plot(DensityIntra, xaxt = 'n', yaxt = 'n', main = 'My Plot', xlab = "genetic distance",
     xlim = range(DensityIntra$x, DensityInter$x),
     ylim = range(0, DensityIntra$y, DensityInter$y))
# Color the intraspecific density distribution red
polygon(DensityIntra, col = "red")
# Add the interspecific density distribution to the same plot
lines(DensityInter)
# Color the interspecific density distribution a transparent yellow
polygon(DensityInter, col = rgb(1, 1, 0, 0.5))
# Add a legend: the first argument sets the location,
# legend sets the text and fill sets the color for each term in legend
legend('topright', legend = c('intra', 'inter'), fill = c('red', rgb(1, 1, 0, 0.5)))
# Stop writing to the pdf file
dev.off()
```

* **11. Optional:** Calculate the percent overlap between the count distributions of the furthest intraspecific and the closest interspecific distances.

```r
# Store histogram data for the intra and inter distributions
IntraHist <- hist(intra, plot = FALSE)
InterHist <- hist(inter, plot = FALSE)
# Determine common bins for both histograms: the bin width is taken from the intra
# histogram, and the upper limit is padded by one bin so that every value is counted
BinWidth <- IntraHist$breaks[2] - IntraHist$breaks[1]
Breaks <- seq(min(IntraHist$breaks, InterHist$breaks),
              max(IntraHist$breaks, InterHist$breaks) + BinWidth, by = BinWidth)
# Store histogram data with the common bins for the intra and inter distributions
IntraHist <- hist(intra, breaks = Breaks, plot = FALSE)
InterHist <- hist(inter, breaks = Breaks, plot = FALSE)
# Extract the maximum and minimum count values at each bin
MaxValues <- pmax(IntraHist$counts, InterHist$counts)
MinValues <- pmin(IntraHist$counts, InterHist$counts)
# Calculate the total area
Area <- sum(MaxValues)
# Calculate the percent overlap
(sum(MinValues) / Area) * 100
```
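The sketch below shows what the two per-individual statistics from step 9 measure. It uses a small, made-up distance matrix and needs only base R. The labels, distances, and the names `max_intra`, `min_inter` and `gap` are hypothetical. The code mirrors the idea behind `maxInDist()` and `nonConDist()`; it is not spider's own implementation.

```r
# Hypothetical toy data: five individuals of two made-up species,
# labeled genus_species_accession as assumed in step 6
labels <- c("Aus_bus_1", "Aus_bus_2", "Aus_bus_3", "Aus_cus_1", "Aus_cus_2")
spp <- sapply(strsplit(labels, split = "_"), function(x) paste(x[1], x[2], sep = "_"))

# Made-up symmetric pairwise distance matrix (not real K2P distances)
d <- matrix(c(
  0.00, 0.01, 0.02, 0.08, 0.09,
  0.01, 0.00, 0.03, 0.07, 0.10,
  0.02, 0.03, 0.00, 0.06, 0.08,
  0.08, 0.07, 0.06, 0.00, 0.02,
  0.09, 0.10, 0.08, 0.02, 0.00
), nrow = 5, byrow = TRUE, dimnames = list(labels, labels))

idx <- seq_along(spp)

# Furthest distance from each individual to another member of its own species
# (NA if the species is represented by a single individual)
max_intra <- sapply(idx, function(i) {
  same <- spp == spp[i] & idx != i
  if (any(same)) max(d[i, same]) else NA
})

# Closest distance from each individual to any member of a different species
min_inter <- sapply(idx, function(i) min(d[i, spp != spp[i]]))

# A positive value means the individual's nearest other-species neighbor
# is further away than its most distant conspecific
gap <- min_inter - max_intra
print(gap)
```

In this toy data every `gap` value is positive, so a barcoding gap exists for every individual. With real data, a similar comparison can be made on the `inter` and `intra` vectors from step 9, which hold one value per individual. For example, `sum(inter > intra, na.rm = TRUE)` counts the individuals whose closest interspecific distance exceeds their furthest intraspecific one.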