In bioinformatics, FASTA is a text-based file format for storing and presenting nucleotide or peptide sequences. It is a preferred way of exchanging sequences between projects, and it also allows sequence names and comments to be included in the header line.
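
To make the header layout concrete, here is a minimal Ruby sketch that splits a header line into a name and a comment. It assumes the common convention that the name is everything between the leading ">" and the first whitespace, and that the rest of the line is the comment. The helper name parse_header is hypothetical and used only for illustration.

```ruby
# Hypothetical helper: split a FASTA header line into [name, comment].
# Assumes the name runs from ">" up to the first whitespace.
def parse_header(header_line)
  name, comment = header_line.sub(/\A>/, "").strip.split(/\s+/, 2)
  [name.to_s, comment.to_s]   # a missing comment becomes an empty string
end

p parse_header(">sp|P12345 Example protein")
# prints ["sp|P12345", "Example protein"]
```

A header without a comment, such as ">seq2", yields the name and an empty string.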

While the format originated in William Pearson’s FASTA software package, it has since become one of the key file formats used in bioinformatics. In many projects the information in a FASTA file needs to be stored in an RDBMS, and for that it is handy to convert *.fasta into *.csv format.
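
As a rough illustration of what such a conversion involves, the Ruby sketch below joins the sequence lines of each record and writes one "header,sequence" row per record. Headers often contain commas, so it quotes fields in the style of RFC 4180. That quoting is my assumption about what a database import would expect. The helper names csv_field and fasta_to_csv_lines and the sample records are hypothetical.

```ruby
# Hypothetical helper: quote a CSV field if it contains a comma, quote or newline.
def csv_field(value)
  if value =~ /[",\r\n]/
    '"' + value.gsub('"', '""') + '"'
  else
    value
  end
end

# Hypothetical helper: turn FASTA text into an array of CSV lines.
def fasta_to_csv_lines(fasta_text)
  records = []
  fasta_text.each_line do |raw|
    line = raw.chomp
    next if line.empty?
    if line.start_with?(">")
      records << [line[1..-1], +""]   # new record: header without ">", empty sequence
    elsif !records.empty?
      records.last[1] << line         # append sequence data to the current record
    end
  end
  records.map { |header, seq| [csv_field(header), csv_field(seq)].join(",") }
end

# Made-up sample input, for illustration only.
sample = <<~FASTA
  >seq1 example protein, partial
  MKVLAA
  GHTR
  >seq2
  ACGT
FASTA

puts fasta_to_csv_lines(sample)
```

The first header contains a comma, so it comes out wrapped in double quotes, while the second row is written as is.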

For my own projects, I searched the net and found a number of solutions suggested by the research community around the world. Here I provide a collection of them together with my own solution.

My Perl Script

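# Execution: $ perl script.pl input.fasta output.csv

use strict;
use warnings;

open my $INPUT_FILE,  "<", $ARGV[0] or die $!;   # input FASTA file
open my $OUTPUT_FILE, ">", $ARGV[1] or die $!;   # set output file

my $line;
while ( <$INPUT_FILE> ) {
    chomp;                                       # drop the trailing newline
    tr/ /,/s;                                    # spaces become commas, runs are squeezed to one
    if ( /^>/ ) {
        print $OUTPUT_FILE "$line\n" if $line;   # write the previous record
        $line = "$_,";                           # start a new record with the header
    }
    else {
        $line .= $_;                             # append sequence data to the record
    }
}
print $OUTPUT_FILE "$line\n" if $line;           # write the last record

close $INPUT_FILE;
close $OUTPUT_FILE;
print "\n Done!\n";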

Other Collected Solutions

Tim Rayner’s One Liner Solution from BioStar

perl -e '$/=">"; while (<>) { next if length == 1; @x=split /\n/; printf "$x[0],$x[1]\n" } ' < your_sequence_file.fasta

Aleksandr Levchuk’s Ruby One-Liner from BioStar


ruby -e 'first_line = true; while line = STDIN.gets; line.chomp!; if line =~ /^>/; puts unless first_line; print line[1..-1]; print ","; else; print line; end; first_line = false; end; puts' < s001.fasta

Ruby Script by Aleksandr

Save the following as my_ruby_script, make it executable (chmod +x my_ruby_script), and then run it as: ./my_ruby_script < f001.fasta > f001.fasta.csv


#!/usr/bin/env ruby

first_line = true

while line = STDIN.gets
  line.chomp!

  if line =~ /^>/
    puts unless first_line
    print line[1..-1]
    print ","  # <-- Change this to "\t" and it's a convert-fasta-to-tab
  else
    print line
  end

  first_line = false
end

puts
