Below are some examples of simple scripting solutions to commonly encountered file manipulation problems.

Fastq to Fasta File | Subset File | Replace Mac Line Breaks | Remove Line Breaks from Sequences

Fastq to Fasta File

To convert a fastq file to a fasta file, remove the quality header line and the quality score line from each record, and replace the @ preceding the sequence header with a >.

Fastq File:
@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
CATCATCATCATCATCATCATCATCATCATCATCAT
+
BBBBCCCC?<A?BC?7@@???????DBBA@@@@A@@

Fasta File:
>EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
CATCATCATCATCATCATCATCATCATCATCATCAT

Python:
myFastq = open('myfile.fastq', 'r') #open fastq file for reading
myFasta = open('myfile.fasta', 'w') #open fasta file for writing
 
while 1: #initiate infinite loop
    #read the 4 lines of one fastq record
    SequenceHeader= myFastq.readline()
    Sequence= myFastq.readline()
    QualityHeader= myFastq.readline()
    Quality= myFastq.readline()
    if SequenceHeader == '': #exit loop when end of file is reached
        break
    #write output, replacing only the leading @ of the header with >
    myFasta.write('>%s%s' %(SequenceHeader[1:], Sequence))
 
#close files
myFastq.close()
myFasta.close()

Bash:
#grep for all sequence header lines and the line following each (-A 1) in your fastq file. Delete the separator
#lines ('--') introduced by grep. Replace the leading @ with >. The '|' character pipes the output from the previous
#command into the following command. The grep search relies on 'EAS' being common to all sequence headers in your
#fastq file.
 
grep -A 1 '^@EAS' myfile.fastq | sed '/^--$/d' | sed 's/^@/>/' > myfile.fasta

Subset File

Here is an example of splitting a fastq file containing 1000 sequences into 10 fastq files containing 100 sequences each.

Bash:
#Each fastq record spans 4 lines, so 1000 sequences occupy 4000 lines and 100 sequences occupy 400 lines.
#Loop over the range of files you need to generate (1000/100 = 10).
#Create a variable j that keeps track of how many lines you have processed.
#Pipe (|) the top j lines (head -n) from your file to the tail command to grab the last 400 lines (tail -n 400).
#Redirect (>) the lines grabbed by tail into a new file.
 
for((i=1; i<=10; i=i+1)); do j=$((i*400)); head -n $j myfile.fastq | tail -n 400 > new_$i.fastq; done
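
The same split can also be sketched in Python. This is a minimal sketch, assuming every record occupies exactly four lines (no wrapped sequences). The function name split_fastq and its parameters are illustrative choices. The output names follow the new_1.fastq, new_2.fastq pattern used above.

```python
from itertools import islice


def split_fastq(path, records_per_file=100, prefix="new_"):
    """Split a 4-line-per-record fastq file into numbered chunks.

    Returns the list of file names that were written.
    """
    lines_per_file = records_per_file * 4  # each fastq record is 4 lines
    written = []
    with open(path) as fastq:
        while True:
            # take the next block of lines from where the last block ended
            chunk = list(islice(fastq, lines_per_file))
            if not chunk:  # end of input reached
                break
            name = "%s%d.fastq" % (prefix, len(written) + 1)
            with open(name, "w") as out:
                out.writelines(chunk)
            written.append(name)
    return written
```

Calling split_fastq('myfile.fastq') on a 1000-sequence file should produce ten files. Unlike the head/tail loop, the input is read only once, and a final file with fewer than 100 sequences is still written if the total is not a multiple of 100.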

Replace Mac Line Breaks


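Bash:
#Classic Mac OS text files end each line with a carriage return (\r) instead of the Unix line feed (\n).
#Use tr to translate every \r into \n and redirect (>) the result into a new file (newfile is a placeholder name).
 
tr '\r' '\n' < yourfile > newfile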
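
A Python version of the same replacement might look like the sketch below. The function name mac_to_unix and both path arguments are placeholders. It reads the whole file into memory, so it assumes the file fits there. Unlike the tr command, it also treats a Windows line break (\r\n) as a single break, so such lines do not end up followed by an empty line.

```python
def mac_to_unix(src_path, dst_path):
    """Copy src_path to dst_path with carriage returns turned into line feeds."""
    with open(src_path, "rb") as src:  # binary mode: no newline translation
        data = src.read()
    # handle \r\n first so those breaks do not become two line feeds
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    with open(dst_path, "wb") as dst:
        dst.write(data)
```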

Remove Line Breaks from Sequences

Here's how to join a sequence that is wrapped across several lines onto a single line.

Line Breaks:
>EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
CATCATCATCATCAT
CATCATCATCATCAT
CATCAT

No Line Breaks:
>EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
CATCATCATCATCATCATCATCATCATCATCATCAT

This approach will work for fasta files and will require some modification for fastq files.

Python:
myFasta = open('myfile.fasta','r') #open fasta file for reading
NewFile = open('sameline.fasta','w') #open new fasta file for writing
 
line = myFasta.readline() #read first line in fasta file
 
while line: #loop over lines in fasta file
    NewFile.write(line) #write header line to new file
    sequenceList = [] #initiate empty list for storing sequence lines
    line = myFasta.readline() #read next line from fasta file
    while line and not line.startswith('>'): #loop over sequence lines
        sequenceList.append(line.strip('\n')) #strip line break from line and append to sequenceList
        line = myFasta.readline() #read next line from fasta file
    NewFile.write('%s\n' % ''.join(sequenceList)) #write sequence to new file
 
#close files
myFasta.close()
NewFile.close()
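
For fastq files, one possible modification is sketched below. It is a sketch under stated assumptions, not a full fastq parser. It assumes each record has a header line, sequence lines up to a line starting with +, and then quality lines whose combined length equals the sequence length. The length count is needed because @ and + are valid quality characters, so a quality line can look like a header. The function name unwrap_fastq and its arguments are illustrative.

```python
def unwrap_fastq(in_path, out_path):
    """Rewrite a fastq file so that every record is exactly four lines."""
    with open(in_path) as fastq, open(out_path, "w") as out:
        header = fastq.readline()
        while header:
            if not header.strip():  # skip blank lines between records
                header = fastq.readline()
                continue
            seq_parts = []
            line = fastq.readline()
            # sequence lines run until the '+' separator line
            while line and not line.startswith("+"):
                seq_parts.append(line.strip())
                line = fastq.readline()
            sequence = "".join(seq_parts)
            plus_line = line.rstrip("\n")
            qual_parts = []
            qual_len = 0
            # quality lines run until they cover the whole sequence
            while qual_len < len(sequence):
                line = fastq.readline()
                if not line:  # truncated file: stop reading
                    break
                qual_parts.append(line.strip())
                qual_len += len(qual_parts[-1])
            out.write("%s\n%s\n%s\n%s\n" % (
                header.rstrip("\n"), sequence, plus_line, "".join(qual_parts)))
            header = fastq.readline()
```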