European Nucleotide Archive (ENA) and REST – retrieving NGS data links.

From time to time I get a list of ids such as:


along with a request for analysis of the data.

Those ids are so-called “Run accession” ids. They usually point to NGS data that have been deposited in public databases such as SRA, ENA and others.

I prefer to download this kind of public data from ENA because, for me, it is faster than SRA (geographic reasons!).

One way of doing this is to search for each “Run accession” id individually and then copy the link to the raw data. Depending on the number of ids, individual searches can be impractical.

EMBL-EBI offers a very nice way to access ENA database content (and other databases as well) programmatically – via REST.

The command I use relies on “file reports” and looks like:

curl -X GET ""
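The same kind of request can be scripted for a whole list of ids. The sketch below is only an illustration: the endpoint URL, the `result=read_run` parameter and the field list are my assumptions about the ENA file report service (they are not copied from the command above), and the accession ids are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: ENA Portal API "file report" endpoint; adjust if your URL differs.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"
# Assumption: the columns we care about for downloading and checking files.
FIELDS = "run_accession,fastq_ftp,fastq_bytes,fastq_md5"


def filereport_url(accession, fields=FIELDS):
    """Build the file report URL for one run accession id."""
    params = {"accession": accession, "result": "read_run", "fields": fields}
    # Keep commas readable in the query string instead of encoding them as %2C.
    return ENA_FILEREPORT + "?" + urlencode(params, safe=",")


def fetch_filereport(accession):
    """Download the tab-separated file report for one run accession id."""
    with urlopen(filereport_url(accession), timeout=60) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder ids; replace with the real run accessions you were given.
    for accession in ["SRR0000001", "SRR0000002"]:
        print(filereport_url(accession))
```

Calling `fetch_filereport(accession)` inside the loop instead of printing the URL would return the report text for each id, one request per accession.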

The output of this command is (converted to a table for better visualization):

run_accession  fastq_ftp  fastq_bytes  fastq_md5  submitted_ftp  submitted_bytes  submitted_md5  sra_ftp  sra_bytes  sra_md5
;; 1634483165 210ef5979c83bec96de9634406d2d885;;; 1474798735 fb44f5b31fcb498dba0ffc72507008dc;
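To work with this output in a script rather than by eye, the report can be parsed into one record per file. This is a minimal sketch under two assumptions of mine: the report is tab-separated with a header row, and a run with several files (for example paired-end reads) lists them in a single field separated by `;`. The sample text uses made-up accessions, hosts and checksums.

```python
import csv
import io


def parse_filereport(text):
    """Return (run_accession, fastq_url, md5) tuples, one per FASTQ file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    files = []
    for row in reader:
        # Assumption: multiple files per run are joined with ';'.
        urls = [u for u in row["fastq_ftp"].split(";") if u]
        md5s = [m for m in row["fastq_md5"].split(";") if m]
        for url, md5 in zip(urls, md5s):
            files.append((row["run_accession"], url, md5))
    return files


# Hypothetical report text, only to show the expected shape of the input.
SAMPLE = (
    "run_accession\tfastq_ftp\tfastq_md5\n"
    "RUN0001\thost.example/RUN0001_1.fastq.gz;host.example/RUN0001_2.fastq.gz\t"
    "aaaa;bbbb\n"
)

if __name__ == "__main__":
    for accession, url, md5 in parse_filereport(SAMPLE):
        print(accession, url, md5)
```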

In this example, the download link is in the table field “fastq_ftp”. The md5 checksum for each file is also provided, so files can be checked for data integrity after the download.
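The integrity check itself is easy to script. The sketch below computes the md5 of a downloaded file in chunks (so large FASTQ files do not have to fit in memory) and compares it with the value from the “fastq_md5” field. The function names are my own, not part of any ENA tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Return True if the file's md5 matches the checksum from the report."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `is_intact("reads_1.fastq.gz", md5_from_report)` would return `False` for a truncated or corrupted download; both the file name and the variable are placeholders.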
