ETE3 – Taxonomic queries with ncbiquery

Recently, while working with Ensembl Metazoa Release 47, I needed to query taxonomic information for a few species. Googling around, I stumbled upon a Biostars post and decided to give ETE3's ncbiquery a try.

I installed ETE3 version 3.1.1. The ncbiquery options are listed below:

ete3 ncbiquery --help

usage: ete3 ncbiquery [-h] [-o OUTPUT] [-v {0,1,2,3,4}] [--search SEARCH [SEARCH ...]] [--db DBFILE] [--taxdump_file TAXDUMPFILE] [--create] [--fuzzy FUZZY] [--tree] [--descendants] [--info] [--collapse_subspecies]
                      [--rank_limit RANK_LIMIT] [--full_lineage]

optional arguments:
  -h, --help            show this help message and exit

GENERAL OPTIONS:
  -o OUTPUT             Base output file name
  -v {0,1,2,3,4}        Verbosity level: 0=totally quite, 1=errors only, 2=warning+errors, 3=info+warnings+errors 4=debug

NCBI GENERAL OPTIONS:
  --search SEARCH [SEARCH ...]
                        A list of taxid or species names
  --db DBFILE           NCBI sqlite3 db file.
  --taxdump_file TAXDUMPFILE
                        Use local NCBI taxdump file instead of downloading from NCBI.
  --create              Create taxdump file and exit.
  --fuzzy FUZZY         EXPERIMENTAL: Tries a fuzzy (and SLOW) search for those species names that could not be translated into taxids. A float number must be provided indicating the minimum string similarity. Special sqlite
                        compilation is necessary.

NCBI OUTPUT OPTIONS:
  --tree                dump a pruned version of the NCBI taxonomy tree containing target species
  --descendants         dump the descendant taxa for each of the queries
  --info                dump NCBI taxonmy information for each target species into the specified file.
  --collapse_subspecies
                        When used, all nodes under the the species rank are collapsed, so all species and subspecies are seen as sister nodes
  --rank_limit RANK_LIMIT
                        When used, all nodes under the provided rank are discarded
  --full_lineage        When used, topology is not pruned to avoid one-child-nodes, so the complete lineage track leading from root to tips is kept.

If you run ete3 ncbiquery with no arguments and no local database exists yet, it fetches taxdump.tar.gz from NCBI and builds a SQLite database:

ete3 ncbiquery

NCBI database not present yet (first time used?)
Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...
Loading node names...
2260364 names loaded.
220237 synonyms loaded.
Loading nodes...
2260364 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /home/foo/.etetoolkit/taxa.sqlite ...
 2260000 generating entries... 
Uploading to /home/foo/.etetoolkit/taxa.sqlite

Inserting synonyms:      220000 
Inserting taxid merges:  55000 
Inserting taxids:       2260000 

# check the size of the resulting database directory
du -hs ~/.etetoolkit/

# 528M    /home/foo/.etetoolkit/

Now let's give it a try:

ete3 ncbiquery --search 'homo sapiens' --info

# Taxid Sci.Name        Rank    Named Lineage   Taxid Lineage
9606    Homo sapiens    species root,cellular organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,Bilateria,Deuterostomia,Chordata,Craniata,Vertebrata,Gnathostomata,Teleostomi,Euteleostomi,Sarcopterygii,Dipnotetrapodomorpha,Tetrapoda,Amniota,Mammalia,Theria,Eutheria,Boreoeutheria,Euarchontoglires,Primates,Haplorrhini,Simiiformes,Catarrhini,Hominoidea,Hominidae,Homininae,Homo,Homo sapiens        1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314146,9443,376913,314293,9526,314295,9604,207598,9605,9606

Asking for --info gives you five columns: Taxid, Sci.Name, Rank, Named Lineage and Taxid Lineage.
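If you want to work with these columns in a script, the --info output is easy to parse. The following is a minimal sketch, assuming the columns are tab-separated (as the spacing in the output suggests) and that the lineage fields are comma-separated. The function name parse_info is my own, not part of ETE3, and the sample row is copied from the output above.

```python
# Sample --info output for Homo sapiens, copied from the run above.
# Assumption: columns are separated by tabs.
NAMED = (
    "root,cellular organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,"
    "Bilateria,Deuterostomia,Chordata,Craniata,Vertebrata,Gnathostomata,"
    "Teleostomi,Euteleostomi,Sarcopterygii,Dipnotetrapodomorpha,Tetrapoda,"
    "Amniota,Mammalia,Theria,Eutheria,Boreoeutheria,Euarchontoglires,"
    "Primates,Haplorrhini,Simiiformes,Catarrhini,Hominoidea,Hominidae,"
    "Homininae,Homo,Homo sapiens"
)
TAXIDS = (
    "1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,"
    "117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,"
    "314146,9443,376913,314293,9526,314295,9604,207598,9605,9606"
)
SAMPLE = (
    "# Taxid\tSci.Name\tRank\tNamed Lineage\tTaxid Lineage\n"
    "9606\tHomo sapiens\tspecies\t" + NAMED + "\t" + TAXIDS + "\n"
)


def parse_info(text):
    """Parse `ete3 ncbiquery --info` output into a list of dicts.

    Hypothetical helper: skips blank lines and the '#' header line,
    and expects exactly five tab-separated fields per data row.
    """
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        taxid, sci_name, rank, named, taxids = line.split("\t")
        records.append({
            "taxid": int(taxid),
            "sci_name": sci_name,
            "rank": rank,
            "named_lineage": named.split(","),
            "taxid_lineage": [int(t) for t in taxids.split(",")],
        })
    return records


records = parse_info(SAMPLE)
human = records[0]
# Pair each lineage name with its taxid, root first.
for name, taxid in zip(human["named_lineage"], human["taxid_lineage"]):
    print(taxid, name)
```

In practice you would redirect the command's output to a file (or use -o) and pass the file contents to the function instead of the embedded sample.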

Another output type is --tree:

ete3 ncbiquery --search 'homo sapiens' --tree

(Homo sapiens neanderthalensis - 63221:1[&&NHX:taxid=63221:name=Homo sapiens neanderthalensis - 63221:rank=subspecies:sci_name=Homo sapiens neanderthalensis:named_lineage=root|cellular organisms|Eukaryota|Opisthokonta|Metazoa|Eumetazoa|Bilateria|Deuterostomia|Chordata|Craniata|Vertebrata|Gnathostomata|Teleostomi|Euteleostomi|Sarcopterygii|Dipnotetrapodomorpha|Tetrapoda|Amniota|Mammalia|Theria|Eutheria|Boreoeutheria|Euarchontoglires|Primates|Haplorrhini|Simiiformes|Catarrhini|Hominoidea|Hominidae|Homininae|Homo|Homo sapiens|Homo sapiens neanderthalensis],Homo sapiens subsp. 'Denisova' - 741158:1[&&NHX:taxid=741158:name=Homo sapiens subsp. 'Denisova' - 741158:rank=subspecies:sci_name=Homo sapiens subsp. 'Denisova':named_lineage=root|cellular organisms|Eukaryota|Opisthokonta|Metazoa|Eumetazoa|Bilateria|Deuterostomia|Chordata|Craniata|Vertebrata|Gnathostomata|Teleostomi|Euteleostomi|Sarcopterygii|Dipnotetrapodomorpha|Tetrapoda|Amniota|Mammalia|Theria|Eutheria|Boreoeutheria|Euarchontoglires|Primates|Haplorrhini|Simiiformes|Catarrhini|Hominoidea|Hominidae|Homininae|Homo|Homo sapiens|Homo sapiens subsp. 'Denisova']);

You can print the tree as text to make it easier to inspect (using ete3 view --ncbi --text):

ete3 ncbiquery --search 'homo sapiens' --tree | ete3 view --ncbi --text

   /-Homo sapiens neanderthalensis - 63221
--|
   \-Homo sapiens subsp. 'Denisova' - 741158

Another option is to list the descendants:

ete3 ncbiquery --search 'homo sapiens' --descendants

# Taxid Sci.Name        Rank    descendant_taxids       descendant_names
9606    Homo sapiens    species 63221|741158    Homo sapiens neanderthalensis|Homo sapiens subsp. 'Denisova'
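The descendant columns use a pipe as separator, so they can be turned into a taxid-to-name mapping with a few lines of Python. This is only a sketch under the same assumption as before (tab-separated columns); parse_descendants is a hypothetical helper name, and the sample row is the one shown above.

```python
# Sample --descendants output, copied from the run above.
# Assumption: columns are separated by tabs; descendants by '|'.
SAMPLE = (
    "# Taxid\tSci.Name\tRank\tdescendant_taxids\tdescendant_names\n"
    "9606\tHomo sapiens\tspecies\t63221|741158\t"
    "Homo sapiens neanderthalensis|Homo sapiens subsp. 'Denisova'\n"
)


def parse_descendants(text):
    """Map each queried taxid to a {descendant_taxid: name} dict.

    Hypothetical helper: skips blank lines and the '#' header line.
    A query with no descendants yields an empty dict.
    """
    result = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        taxid, _sci_name, _rank, d_taxids, d_names = line.split("\t")
        if d_taxids:
            ids = [int(t) for t in d_taxids.split("|")]
            result[int(taxid)] = dict(zip(ids, d_names.split("|")))
        else:
            result[int(taxid)] = {}
    return result


descendants = parse_descendants(SAMPLE)
for taxid, name in descendants[9606].items():
    print(taxid, name)
```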

You can search for multiple species and get a tree:

ete3 ncbiquery --search 'homo sapiens' 'pan troglodytes' 'mus musculus' 'ciona intestinalis' | ete3 view --ncbi --text

   /-Ciona intestinalis - 7719
  |
--|      /-Homo sapiens - 9606
  |   /-|
   \-|   \-Pan troglodytes - 9598
     |
      \-Mus musculus - 10090

In the same way, you can fetch the named lineage for every species you specify:

ete3 ncbiquery --search 'homo sapiens' 'pan troglodytes' 'mus musculus' 'ciona intestinalis' --info

# Taxid Sci.Name        Rank    Named Lineage   Taxid Lineage
7719    Ciona intestinalis      species root,cellular organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,Bilateria,Deuterostomia,Chordata,Tunicata,Ascidiacea,Enterogona,Phlebobranchia,Cionidae,Ciona,Ciona intestinalis       1,131567,2759,33154,33208,6072,33213,33511,7711,7712,7713,183770,7716,7717,7718,7719

9598    Pan troglodytes species root,cellular organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,Bilateria,Deuterostomia,Chordata,Craniata,Vertebrata,Gnathostomata,Teleostomi,Euteleostomi,Sarcopterygii,Dipnotetrapodomorpha,Tetrapoda,Amniota,Mammalia,Theria,Eutheria,Boreoeutheria,Euarchontoglires,Primates,Haplorrhini,Simiiformes,Catarrhini,Hominoidea,Hominidae,Homininae,Pan,Pan troglodytes      1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314146,9443,376913,314293,9526,314295,9604,207598,9596,9598

9606    Homo sapiens    species root,cellular organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,Bilateria,Deuterostomia,Chordata,Craniata,Vertebrata,Gnathostomata,Teleostomi,Euteleostomi,Sarcopterygii,Dipnotetrapodomorpha,Tetrapoda,Amniota,Mammalia,Theria,Eutheria,Boreoeutheria,Euarchontoglires,Primates,Haplorrhini,Simiiformes,Catarrhini,Hominoidea,Hominidae,Homininae,Homo,Homo sapiens        1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314146,9443,376913,314293,9526,314295,9604,207598,9605,9606

10090   Mus musculus    species root,cellular organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,Bilateria,Deuterostomia,Chordata,Craniata,Vertebrata,Gnathostomata,Teleostomi,Euteleostomi,Sarcopterygii,Dipnotetrapodomorpha,Tetrapoda,Amniota,Mammalia,Theria,Eutheria,Boreoeutheria,Euarchontoglires,Glires,Rodentia,Myomorpha,Muroidea,Muridae,Murinae,Mus,Mus,Mus musculus     1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314146,314147,9989,1963758,337687,10066,39107,10088,862507,10090
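The Taxid Lineage column also explains the shape of the tree shown earlier. Each lineage runs from the root to the species, so the last taxid two lineages share is the deepest node they have in common. Here is a small sketch of that idea using the four lineages above; the helper names common_lineage and last_common_ancestor are my own and not part of ETE3.

```python
# Taxid lineages copied from the --info output above (root first).
LINEAGES = {
    "Ciona intestinalis": (
        "1,131567,2759,33154,33208,6072,33213,33511,7711,7712,7713,"
        "183770,7716,7717,7718,7719"
    ),
    "Pan troglodytes": (
        "1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,"
        "117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,"
        "314146,9443,376913,314293,9526,314295,9604,207598,9596,9598"
    ),
    "Homo sapiens": (
        "1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,"
        "117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,"
        "314146,9443,376913,314293,9526,314295,9604,207598,9605,9606"
    ),
    "Mus musculus": (
        "1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,"
        "117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,"
        "314146,314147,9989,1963758,337687,10066,39107,10088,862507,10090"
    ),
}

# Convert each comma-separated string into a list of integer taxids.
PARSED = {
    name: [int(t) for t in lineage.split(",")]
    for name, lineage in LINEAGES.items()
}


def common_lineage(*lineages):
    """Return the leading taxids shared by all given root-to-tip lineages."""
    shared = []
    for column in zip(*lineages):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return shared


def last_common_ancestor(*species):
    """Return the deepest taxid shared by the lineages of the given species."""
    return common_lineage(*(PARSED[name] for name in species))[-1]


print(last_common_ancestor("Homo sapiens", "Pan troglodytes"))
print(last_common_ancestor("Homo sapiens", "Mus musculus"))
print(last_common_ancestor(*PARSED))
```

Looking the results up in the Named Lineage column, human and chimp meet at Homininae (207598), human and mouse at Euarchontoglires (314146), and all four species at Chordata (7711), which matches Ciona intestinalis sitting alone on its own branch of the tree.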
