Building a DIAMOND DB using RefSeq Protein

The idea is to build a DIAMOND database from RefSeq protein (non-redundant) and later compare it to BLAST. The first step is to download the FASTA files, non-redundant only:

    wget -c ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/complete.nonredundant_protein*.faa.gz

At the moment, there are ~928 FASTA files taking ~26 GB of disk space. Also, I am interested in having taxonomic …
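Once the files are downloaded, one possible way to turn them into a DIAMOND database is sketched below. This is only a sketch under assumptions, not necessarily the approach used in the rest of this post: the database name `refseq_protein` and the helper `makedb_pipeline` are made up for illustration. It relies on `diamond makedb` reading FASTA from standard input when `--in` is omitted, so the compressed files can be streamed without unpacking them to disk. The helper only prints the pipeline, so it can be checked before anything heavy runs.

```shell
# Hypothetical names: DB_NAME and makedb_pipeline are illustrative only.
DB_NAME="refseq_protein"
PATTERN="complete.nonredundant_protein*.faa.gz"

# Compose the pipeline as a string so it can be inspected before running.
makedb_pipeline() {
    # zcat streams every downloaded archive as plain FASTA;
    # diamond makedb reads stdin when --in is omitted and writes "<name>.dmnd".
    printf 'zcat %s | diamond makedb -d %s\n' "$PATTERN" "$1"
}

# Dry run: print the command that would be executed.
makedb_pipeline "$DB_NAME"

# To actually build the database (needs DIAMOND installed and the files
# present in the current directory), evaluate the printed pipeline:
#   eval "$(makedb_pipeline "$DB_NAME")"
```

Streaming through `zcat` avoids keeping an uncompressed copy of the ~26 GB download around; the trade-off is that the build has to decompress everything again if it is repeated.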