InterProScan and Snakemake

Following up on a previous post, “InterProScan and Docker”, here is a quick note on running InterProScan with Snakemake.

The commented Snakefile:

# Input FASTA files with protein sequences are expected at lib/<name>.pep (e.g. lib/foobar.pep)
PEPS, = glob_wildcards("lib/{pep}.pep")

# configfile path
configfile: "include/rules/config.yaml"

rule all:
    input:
        expand("02_interproscan/{pep}.tsv",pep=PEPS)

# get/install interproscan
rule install_interproscan:
    #https://interproscan-docs.readthedocs.io/en/latest/HowToDownload.html
    output:
        touch("02_interproscan/done__install_interproscan")
    params:
        "temp/"
    threads: 1
    log:
        "02_interproscan/logs/log__install_interproscan"
    shell:
        """
        wget -c ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.48-83.0/interproscan-5.48-83.0-64-bit.tar.gz -P {params} > {log} 2>&1 &&
        wget -c ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.48-83.0/interproscan-5.48-83.0-64-bit.tar.gz.md5 -P {params} >> {log} 2>&1 &&
        # Recommended: verify the checksum to confirm the download was successful.
        # After the cd, the log path is relative to {params}, hence ../{log}
        cd {params} &&
        md5sum -c interproscan-5.48-83.0-64-bit.tar.gz.md5 >> ../{log} 2>&1 &&
        tar -pxvzf interproscan-5.48-83.0-*-bit.tar.gz >> ../{log} 2>&1
        """

# setup interproscan
rule setup_interproscan:
    input:
        rules.install_interproscan.output
    output:
        touch("02_interproscan/done__setup_interproscan")
    threads: 1
    log:
        "02_interproscan/logs/log__setup_interproscan"
    shell:
        """
        cd temp/interproscan-5.48-83.0 && 
        python initial_setup.py > ../../{log} 2>&1
        """

# InterProScan does not accept stars in the sequence
rule cleanup_star_from_pep:
    input:
        "lib/{pep}.pep"
    output:
        "01_raw/{pep}.pep"
    threads: 1
    shell:
        """
        sed 's/[*]/X/g' {input} > {output}
        """

rule run_interproscan:
    input:
        out = rules.setup_interproscan.output,
        pep = "01_raw/{pep}.pep"
    output:
        "02_interproscan/{pep}.tsv"
    params:
        "-f tsv -dp --tempdir temp/ --goterms --pathways --appl Pfam"
    threads: 30
    log:
        "02_interproscan/logs/log__{pep}_interproscan"
    shell:
        """
        temp/interproscan-5.48-83.0/interproscan.sh \
        {params} \
        --minsize {config[minsize_interproscan]} \
        -i {input.pep} \
        -o {output} \
        --cpu {threads} > {log} 2>&1
        """

Set threads to suit your machine, and adjust --goterms, --pathways and --appl as needed; here --appl is restricted to Pfam. For the other available analyses, please check the InterProScan help page. The --minsize (-ms) value is read from the config file.
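To make it easier to see how the fixed params, the config value and the per-file wildcards end up on one command line, here is a minimal Python sketch that assembles the same call as the run_interproscan rule. The function name build_interproscan_cmd and its arguments are hypothetical and only for illustration; the flags are the ones used in the Snakefile above, and passing several analyses to --appl as a comma-separated list is an assumption to verify against the InterProScan help page.

```python
import shlex


def build_interproscan_cmd(pep, out, threads, minsize, appl=("Pfam",),
                           install_dir="temp/interproscan-5.48-83.0"):
    """Return the argument list that mirrors the run_interproscan shell block.

    Hypothetical helper: it is not part of Snakemake or InterProScan.
    """
    return [
        f"{install_dir}/interproscan.sh",
        "-f", "tsv",                  # output format
        "-dp",                        # same flag as in the rule's params
        "--tempdir", "temp/",
        "--goterms",
        "--pathways",
        "--appl", ",".join(appl),     # assumption: comma-separated analyses
        "--minsize", str(minsize),    # value taken from the config file
        "-i", pep,
        "-o", out,
        "--cpu", str(threads),
    ]


cmd = build_interproscan_cmd(
    "01_raw/foobar.pep", "02_interproscan/foobar.tsv", threads=30, minsize=100
)
# Print the command as a single shell-quoted line for inspection
print(shlex.join(cmd))
```

Printing the command this way is only a convenience for checking the options before a long run; the Snakefile remains the place where the command is actually executed.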

The configfile (inside include/rules/ directory):

minsize_interproscan: 100
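
Inside the Snakefile, the values loaded by configfile are available through the config dictionary. As a small sketch of how the value could be checked before it reaches the command line, the hypothetical helper below (get_minsize is not part of Snakemake) falls back to a default when the key is missing and rejects non-positive values. A plain dict stands in for Snakemake's config here.

```python
def get_minsize(config, default=100):
    """Return minsize_interproscan as a positive int, or `default` if unset.

    Hypothetical helper for illustration; `config` is any dict-like object.
    """
    value = config.get("minsize_interproscan", default)
    minsize = int(value)  # also accepts a quoted number such as "100"
    if minsize <= 0:
        raise ValueError(
            f"minsize_interproscan must be positive, got {minsize}"
        )
    return minsize


# Stand-in for the dictionary Snakemake fills from config.yaml
config = {"minsize_interproscan": 100}
print(get_minsize(config))
```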

To dry-run:

snakemake --cores 30 -n

and to run:

snakemake --cores 30
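
If the dry run lists fewer jobs than expected, it can help to check which targets rule all asks for. The sketch below approximates the glob_wildcards and expand calls from the Snakefile with pathlib. The function name preview_targets is hypothetical, and the sketch is a simplification: it only looks at files directly inside lib/, whereas glob_wildcards can also match names containing subdirectories.

```python
from pathlib import Path


def preview_targets(lib_dir="lib", out_dir="02_interproscan"):
    """List the .tsv targets that correspond to the .pep files in lib_dir.

    Hypothetical helper: a rough stand-in for
    glob_wildcards("lib/{pep}.pep") followed by expand(...).
    """
    # The file name without the .pep suffix plays the role of the {pep} wildcard
    peps = sorted(p.stem for p in Path(lib_dir).glob("*.pep"))
    return [f"{out_dir}/{pep}.tsv" for pep in peps]


if __name__ == "__main__":
    for target in preview_targets():
        print(target)
```

Run from the directory that contains the Snakefile, this prints one expected output path per input file, which can be compared with the job list from the dry run.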
