FASTA-2-CSV

In bioinformatics, FASTA is a text-based file format for storing and presenting nucleotide or peptide sequences. Widely used for exchanging sequences between projects, the format also allows sequence names and comments to be included in a header line, which starts with '>' and precedes the sequence lines.
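
To make the header structure concrete, here is a minimal Ruby sketch that splits a header line into the sequence name and an optional comment. The helper name `parse_header` is hypothetical, and treating the first whitespace-delimited word after '>' as the name is a common convention rather than a strict rule of the format.

```ruby
# Hypothetical helper: split a FASTA header line into [name, comment].
# Assumes the common convention that the name is the first word after '>'.
def parse_header(line)
  raise ArgumentError, "not a FASTA header: #{line}" unless line.start_with?(">")
  name, comment = line[1..-1].strip.split(/\s+/, 2)
  [name, comment]  # comment is nil when the header holds only a name
end

record = <<~FASTA
  >seq1 Example protein, partial
  MKTAYIAKQR
  QISFVKSHFS
FASTA

header, *sequence_lines = record.lines(chomp: true)
name, comment = parse_header(header)
puts name                 # seq1
puts comment              # Example protein, partial
puts sequence_lines.join  # MKTAYIAKQRQISFVKSHFS
```

Note that a long sequence may be wrapped over several lines, which is why every converter below has to join the sequence lines back together.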

The format originated with William Pearson’s FASTA software package and has since become one of the key file formats in bioinformatics. Because many projects need to load information from FASTA files into an RDBMS, it is handy to convert *.fasta files into *.csv format first.

While searching the web for my own projects, I found a number of solutions suggested by the research community around the world. Here is a collection of them, together with my own solution.

My Perl Script

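#!/usr/local/bin/perl
#
# Execution: $ perl file.pl input.fasta output.csv
#
# Writes one CSV row per FASTA record: the header line (still starting
# with '>', spaces turned into commas) followed by the whole sequence
# joined onto a single line.

use strict;
use warnings;

die "Usage: perl $0 input.fasta output.csv\n" unless @ARGV == 2;

open my $INPUT_FILE,  "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open my $OUTPUT_FILE, ">", $ARGV[1] or die "Cannot open $ARGV[1]: $!";

my $line;                                    # record being assembled
while ( <$INPUT_FILE> ) {
    chomp;
    tr/ /,/s;                                # spaces -> commas, runs squeezed
    if ( /^>/ ) {
        # New header: flush the previous record, then start a new one
        print $OUTPUT_FILE "$line\n" if defined $line;
        $line = "$_,";
    }
    else {
        $line .= $_;                         # append a sequence line
    }
}
print $OUTPUT_FILE "$line\n" if defined $line;   # flush the last record

close $INPUT_FILE;
close $OUTPUT_FILE;
print "\n Done!\n";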
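
For comparison with the Ruby solutions below, this is a sketch of the same logic as the Perl script, wrapped in a hypothetical function `fasta_to_rows` so it can be tried on a string. Like the Perl version, it turns runs of spaces into single commas (Ruby's `String#tr_s` behaves like Perl's `tr/ /,/s`), keeps the leading '>', and joins wrapped sequence lines into one field.

```ruby
# Hypothetical Ruby port of the Perl script's logic.
def fasta_to_rows(text)
  rows = []
  line = nil                                # record being assembled
  text.each_line do |raw|
    current = raw.chomp.tr_s(" ", ",")      # same effect as Perl's tr/ /,/s
    if current.start_with?(">")
      rows << line unless line.nil?         # flush the previous record
      line = "#{current},"
    else
      line = "#{line}#{current}"            # append a sequence line
    end
  end
  rows << line unless line.nil?             # flush the last record
  rows
end

puts fasta_to_rows(">seq1 demo protein\nMKTA\nYIAK\n>seq2\nGATTACA\n")
# >seq1,demo,protein,MKTAYIAK
# >seq2,GATTACA
```

Because spaces become commas, a header with a three-word comment yields more columns than a header with no comment, so the rows are not guaranteed to have a fixed column count.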

Other Collected Solutions

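Tim Rayner’s One-Liner Solution from BioStar

perl -e '$/=">"; while(<>) { next if length == 1; @x=split /\n/; printf "$x[0],$x[1]\n" } ' < your_sequence_file.fasta

Note that this prints only the first line after each header, so it assumes every sequence is stored on a single line. Because each record is also used as the printf format string, a '%' in a header would be misinterpreted.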
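
The one-liner works by setting Perl's input record separator `$/` to '>', so each read returns a whole record instead of a single line. A rough Ruby equivalent of that trick is sketched below; it is not Tim Rayner's code, the function name `records_to_csv_lines` is hypothetical, and unlike the one-liner it joins all sequence lines, so wrapped sequences are kept whole.

```ruby
# Sketch of the record-separator trick: read '>'-terminated chunks, not lines.
def records_to_csv_lines(text)
  lines = text.each_line(">").map do |record|
    record = record.chomp(">")               # drop the trailing '>' separator
    header, *seq = record.split(/\r?\n/)     # header first, then sequence lines
    next if header.nil? || header.empty?     # skip the empty chunk before the first '>'
    "#{header},#{seq.join}"                  # join wrapped sequence lines
  end
  lines.compact
end

puts records_to_csv_lines(">seq1 demo\nMKTA\nYIAK\n>seq2\nGATTACA\n")
# seq1 demo,MKTAYIAK
# seq2,GATTACA
```

To use it as a filter, call `puts records_to_csv_lines(STDIN.read)`. As with the Perl version, a '>' appearing inside a header would split that record in two.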

Aleksandr Levchuk’s Ruby One-Liner from BioStar

ruby -e 'first_line = true; while line = STDIN.gets; line.chomp!; if line =~ /^>/; puts unless first_line; print line[1..-1]; print ","; else; print line; end; first_line = false; end; puts' < s001.fasta

Ruby Script by Aleksandr

Save the following as my_ruby_script, make it executable with chmod +x my_ruby_script, and then run it as: ./my_ruby_script < f001.fasta > f001.fasta.csv

#!/usr/bin/ruby

first_line = true

while line = STDIN.gets
  line.chomp!

  if line =~ /^>/
    puts unless first_line
    print line[1..-1]
    print ","  # <-- Change this to "\t" and it's a convert-fasta-to-tab
  else
    print line
  end

  first_line = false
end
puts
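
None of the solutions above quote their fields, so a header whose comment contains a comma (for example "Example protein, partial") produces an extra column when the file is loaded into a database. A minimal sketch using Ruby's `csv` library avoids this; the function name `fasta_to_csv` and the three-column layout (id, description, sequence) are assumptions for illustration. Passing `col_sep: "\t"` gives the tab-separated variant mentioned in the script's comment.

```ruby
require "csv"

# Hypothetical converter: one quoted CSV row per FASTA record,
# with the columns id, description and sequence.
def fasta_to_csv(text, col_sep: ",")
  CSV.generate(col_sep: col_sep) do |csv|
    csv << %w[id description sequence]              # header row for database import
    id = desc = nil
    seq = []
    text.each_line do |raw|
      line = raw.chomp
      if line.start_with?(">")
        csv << [id, desc, seq.join] unless id.nil?  # flush the previous record
        id, desc = line[1..-1].strip.split(/\s+/, 2)
        seq = []
      else
        seq << line.strip                           # collect wrapped sequence lines
      end
    end
    csv << [id, desc, seq.join] unless id.nil?      # flush the last record
  end
end

print fasta_to_csv(">seq1 Example protein, partial\nMKTA\nYIAK\n>seq2\nGATTACA\n")
# id,description,sequence
# seq1,"Example protein, partial",MKTAYIAK
# seq2,,GATTACA
```

The library only quotes a field when it contains the separator, a quote character or a line break, so ordinary sequences are written unquoted.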
