Fasterq-dump and Snakemake

A note on how to automate fetching public datasets from NCBI using the SRA Toolkit and Snakemake.

Besides the Snakefile, the setup uses three files: a config file, a rule file and a conda environment file.

First, the Snakefile:

configfile: "include/rules/config.yaml"

include: "include/rules/fasterqdump.rule"

rule all:
    input:
        expand("01_raw/done__{srr}_dump", srr=config['srr'])

The config file (include/rules/config.yaml):

srr:
  - SRR12345678
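To make the link between the config and `rule all` concrete, here is a minimal Python sketch of what the `expand(...)` call produces from the `srr` list. It only emulates `expand` with a list comprehension and does not import Snakemake. The `build_targets` helper and the accession check are illustrations added here, not part of the workflow, and the pattern assumes run accessions of the form SRR/ERR/DRR followed by digits.

```python
import re

# Assumption: run accessions look like SRR, ERR or DRR followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def build_targets(config):
    """Return the marker files that `rule all` asks for, one per accession."""
    accessions = config["srr"]
    bad = [a for a in accessions if not RUN_ACCESSION.fullmatch(a)]
    if bad:
        raise ValueError(f"not a run accession: {bad}")
    # Same list as expand("01_raw/done__{srr}_dump", srr=config["srr"])
    return [f"01_raw/done__{srr}_dump" for srr in accessions]


# Hypothetical in-memory copy of what Snakemake loads from config.yaml
config = {"srr": ["SRR12345678"]}
print(build_targets(config))
```

Adding more accessions to the `srr` list in config.yaml gives one more target per entry, so the Snakefile itself never needs to change.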

The rule file (include/rules/fasterqdump.rule):

rule prefetch:
    output:
        "01_raw/.prefetch/sra/{srr}.sra"
    params:
        "{srr} --max-size 50GB -O 01_raw"
    log:
        "01_raw/.prefetch/sra/{srr}.log"
    conda:
        "yamls/sra-tools.yaml"
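    # prefetch downloads into 01_raw (-O); the touch makes sure the file
    # that Snakemake tracks as {output} exists once the download succeeds.
    shell:
        """
        prefetch {params} > {log} 2>&1 && touch {output}
        """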

rule fastqdump:
    input:
        "01_raw/.prefetch/sra/{srr}.sra"
    output:
        touch("01_raw/done__{srr}_dump")
    params:
        # -S: split files, -O: output directory, -t: directory for temporary files
        args = "-S -O 01_raw/ -t 01_raw/",
        id_srr = "{srr}"
    log:
        "01_raw/{srr}.log"
    conda:
        "yamls/sra-tools.yaml"
    shell:
        """
        fasterq-dump {params.args} {params.id_srr} > {log} 2>&1
        """
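The two rules are chained through the `{srr}` wildcard: Snakemake matches a requested target against each rule's output pattern, reads the wildcard value from the match, and fills it into that rule's input. The sketch below imitates that matching step with a regular expression. It is a simplified illustration, not Snakemake's actual code; `match_output` is a hypothetical helper, and it does not handle a wildcard name that appears twice in one pattern.

```python
import re


def match_output(pattern, target):
    """Return the wildcard values if `target` fits `pattern`, else None."""
    # Split "a{name}b" into ["a", "name", "b"]; odd positions are wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # Snakemake's default wildcard regex is ".+"
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None


wildcards = match_output("01_raw/done__{srr}_dump", "01_raw/done__SRR12345678_dump")
print(wildcards)
# The input of rule fastqdump is then filled in with the same value,
# which is in turn the output that rule prefetch knows how to make.
print("01_raw/.prefetch/sra/{srr}.sra".format(**wildcards))
```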

The conda environment file (include/rules/yamls/sra-tools.yaml):

channels:
 - conda-forge
 - bioconda
dependencies:
 - sra-tools=2.10.9

To do a dry run (nothing is executed, the planned jobs are only printed):

snakemake -p -j 30 --use-conda -n

To run:

snakemake -p -j 30 --use-conda
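Since the workflow only tracks the `done__{srr}_dump` marker, it can be handy to check afterwards which FASTQ files actually landed in 01_raw. A small Python sketch, assuming the files are named after the accession (for example `SRR12345678_1.fastq`); the `fastq_files` helper is made up for this note and is not part of the workflow.

```python
from pathlib import Path


def fastq_files(outdir, srr):
    """Return the sorted names of FASTQ files in `outdir` that belong to `srr`."""
    return sorted(
        p.name
        for p in Path(outdir).glob(f"{srr}*.fastq")
        # keep SRR1.fastq and SRR1_1.fastq, but not files of SRR12
        if p.name == f"{srr}.fastq" or p.name.startswith(f"{srr}_")
    )


# Assumption: same accession list as in config.yaml, same output directory
for srr in ["SRR12345678"]:
    found = fastq_files("01_raw", srr)
    print(srr, found if found else "no FASTQ files found")
```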
