Editing Text with Text Wrangler

Here are two videos from Matthew Kweskin (Smithsonian Institution) that show some more advanced usage of Text Wrangler on a Mac. If your files are small enough to open in Text Wrangler, you can do advanced search and replace using regular expressions. Skip your favorite show tonight and watch these instead. I promise that you will be more productive if you can use regular expressions to edit text files.

Part 1: The Basics
Part 2: Advanced Search and Replace

Editing Large Text Files in the Shell

You can turn any text file or set of text files into a stream and print it to the screen using the command cat. This stream can be redirected (see I/O Redirection) and edited using the tools I will introduce here. Sometimes files are too big to be opened and manipulated in a text editor. For these you can use the stream editor sed, which processes its input line by line and can therefore handle very large files. Just pipe a stream into sed and redirect the output to a text file. I mainly use the substitution function s. Regular expressions, as demonstrated in the videos above, can be used for powerful search and replace operations. To use regular expressions in Unix you will also want to learn about proper quoting. The last tool, awk, is less a mere tool than an ancient programming language. Combine all of these ingredients, or some of them, and you can edit large text files (think sequence data) in the shell on the fly.

Example using sed:
cat foo.txt | sed 's/cat/dog/g' > foonew.txt
## replaces all instances of the word cat with dog in
## foo.txt and creates the edited document foonew.txt
sed 's/cat/dog/g' foo.txt > foonew.txt
## same as above but without invoking the command cat
## sed takes the file foo.txt as an argument
cat *.txt | sed 's/cat/dog/g' > foonew.out
## takes all text files in the directory and replaces all
## instances of cat with dog - the output is the combined
## and edited file foonew.out (the output name must not match
## *.txt, or the output file itself would be read as input)
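
The substitutions above replace every occurrence of the letters cat, including those inside longer words. Here is a minimal sketch of how a regular expression and the choice of quotes change that behaviour. The sample file and the variable names (workdir, old, new) are made up for illustration and are not part of the original examples.

```shell
# Hypothetical sample file, created only so the sketch is self-contained
workdir=$(mktemp -d)
printf 'cat\tcats\nconcatenate cat\n' > "$workdir/foo.txt"

# Single quotes: the shell passes the expression to sed untouched.
# The anchor ^ restricts the match to "cat" at the start of a line.
sed 's/^cat/dog/' "$workdir/foo.txt" > "$workdir/anchored.txt"

# Double quotes: the shell expands the variables before sed runs.
# Without an anchor, "cat" is also replaced inside "concatenate".
old='cat'
new='dog'
sed "s/${old}/${new}/g" "$workdir/foo.txt" > "$workdir/expanded.txt"

cat "$workdir/anchored.txt" "$workdir/expanded.txt"
```

Single quotes are the safer default for sed expressions. Double quotes are only needed when the shell should fill in a variable.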
I posted an example for search and replace using regular expressions in sed here, and searching for and replacing special characters like line-breaks and tabs with sed and perl here. Links to awk references can be found here and a script to combine fasta files in the shell using awk can be found here. Hopefully this will give you some ideas of what you can do with the tools I introduced.
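
To give a flavour of awk on sequence data, here is a small sketch that counts the sequences and the total number of residues in a FASTA file. It is not the fasta-combining script linked above. The file sample.fasta and the variable names are assumptions made up for this illustration.

```shell
# Hypothetical FASTA file, created only so the sketch is self-contained
workdir=$(mktemp -d)
printf '>seq1\nACGT\nAC\n>seq2\nGGG\n' > "$workdir/sample.fasta"

# Lines starting with ">" are headers; every other line is sequence.
summary=$(awk '
    /^>/ { nseq++; next }          # header line: count it, move on
         { nbases += length($0) }  # sequence line: add its length
    END  { print nseq, nbases }    # after the last line: print totals
' "$workdir/sample.fasta")

echo "$summary"
```

Like sed, awk reads its input line by line, so the same pattern should also work on files that are too large for a text editor.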

What else can you do with regular expressions?

You can use regular expressions in a variety of tools. For example, grep and egrep (equivalent to grep -E) can match patterns using regular expressions to find and extract lines from files (see the Unix commands page). Oh, and Perl and Python support regular expressions...
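
As a rough sketch of pattern matching with grep, the following pulls the header lines out of a FASTA file and counts the lines that match either of two alternatives. The file name and its contents are invented for this example.

```shell
# Hypothetical FASTA file, created only so the sketch is self-contained
workdir=$(mktemp -d)
printf '>seq1 Homo sapiens\nACGT\n>seq2 Mus musculus\nGGCC\n' > "$workdir/sample.fasta"

# Basic regular expression: extract every line that starts with ">"
headers=$(grep '^>' "$workdir/sample.fasta")

# Extended regular expression (-E): "|" means "either/or";
# -c prints the number of matching lines instead of the lines
count=$(grep -E -c 'sapiens|musculus' "$workdir/sample.fasta")

echo "$headers"
echo "$count"
```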

[Image: regular_expressions.png, courtesy of XKCD.]