Parameters Optimization with Nextflow

Recently I used Nextflow to try out different parameter combinations for a tool, and it seemed worth sharing. We start by defining the parameter values:

    param1 = [50, 70, 90]
    param2 = [1, 2, 3, 4, 5]
    param3 = [0.1, 0.2, 0.3]

Then we declare the input, output and commands to run:

    in_data = Channel.fromPath('test')

    process test {
        input:
        path indata from in_data
        each p1 from param1
        …

Continue reading Parameters Optimization with Nextflow
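A minimal sketch of how the rest of such a parameter sweep could look, written in the same DSL1 style as the excerpt; the remaining each declarations, the output file name and the tool invocation ("my_tool" and its flags) are assumptions for illustration, not taken from the post:

    // hypothetical continuation of the process above (DSL1 style)
    process test {
        input:
        path indata from in_data
        each p1 from param1
        each p2 from param2
        each p3 from param3

        output:
        // one result file per parameter combination
        path "out_${p1}_${p2}_${p3}.txt" into results

        script:
        """
        my_tool --opt1 ${p1} --opt2 ${p2} --opt3 ${p3} ${indata} > out_${p1}_${p2}_${p3}.txt
        """
    }

With each, Nextflow expands the process over every combination of the three lists, so this example would launch 3 × 5 × 3 = 45 tasks from a single process definition.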

Fasterq-dump and Snakemake

A note on how to automate fetching public datasets from NCBI using the SRA Toolkit and Snakemake. Here we use a config file, a rule file and a conda environment file. First, the Snakefile:

    configfile: "include/rules/config.yaml"
    include: "include/rules/fasterqdump.rule"

    rule all:
        input:
            expand("01_raw/done__{srr}_dump", srr=config['srr'])

the config file (include/rules/config.yaml):

    srr:
        - SRR12345678

and the rule file (include/rules/fasterqdump.rule):

    rule prefetch:
        output:
            "01_raw/.prefetch/sra/{srr}.sra"
        …

Continue reading Fasterq-dump and Snakemake
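A rough sketch of how the rule file could be completed, assuming a conda environment file that provides sra-tools; the shell commands, thread count and environment file name are my assumptions, not quoted from the post:

    # hypothetical completion of include/rules/fasterqdump.rule
    rule prefetch:
        output:
            "01_raw/.prefetch/sra/{srr}.sra"
        conda:
            "include/envs/sratoolkit.yaml"
        shell:
            "prefetch {wildcards.srr} --output-file {output}"

    rule fasterqdump:
        input:
            "01_raw/.prefetch/sra/{srr}.sra"
        output:
            touch("01_raw/done__{srr}_dump")
        threads: 4
        conda:
            "include/envs/sratoolkit.yaml"
        shell:
            "fasterq-dump {input} -e {threads} -O 01_raw/"

The touch() marker in the fasterqdump output is what rule all collects, so each accession is dumped only once.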

InterProScan and Snakemake

Following up on a previous post, "InterProScan and Docker", here is a quick note on running InterProScan with Snakemake. The commented Snakefile:

    # Input fasta files with protein sequences should be at lib/foobar.pep
    PEPS, = glob_wildcards("lib/{pep}.pep")

    # configfile path
    configfile: "include/rules/config.yaml"

    rule all:
        input:
            expand("02_interproscan/{pep}.tsv", pep=PEPS)

    # get/install interproscan
    rule install_interproscan:
        # https://interproscan-docs.readthedocs.io/en/latest/HowToDownload.html
        input:
        output:
            touch("02_interproscan/done__install_interproscan")
        params:
            "temp/"
        …

Continue reading InterProScan and Snakemake
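A hypothetical rule for the actual run, to show how the pieces fit together; the extracted InterProScan version/path and the exact command-line flags are assumptions, not quoted from the post:

    # hypothetical rule producing 02_interproscan/{pep}.tsv
    rule interproscan:
        input:
            pep="lib/{pep}.pep",
            done="02_interproscan/done__install_interproscan"
        output:
            "02_interproscan/{pep}.tsv"
        threads: 8
        shell:
            "temp/interproscan-5.52-86.0/interproscan.sh "
            "-i {input.pep} -f tsv -o {output} -cpu {threads}"

With a rule like this in place, Snakemake resolves one TSV per .pep file found by glob_wildcards.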

Download Project from Basespace

Quick note on how to download data from BaseSpace. For Linux users, bulk downloads are not available through the web interface; the alternative is the BaseSpace Sequence Hub CLI. First, fetch the "bs" application:

    wget "https://api.bintray.com/content/basespace/BaseSpaceCLI-EarlyAccess-BIN/latest/\$latest/amd64-linux/bs?bt_package=latest" -O bs

Fix permissions:

    chmod +X ./bs
    chmod 755 ./bs

Then, authenticate at BaseSpace:

    ./bs auth

This command …

Continue reading Download Project from Basespace
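Once authenticated, downloading a whole project boils down to two commands; the subcommands below reflect typical bs CLI usage, but the project ID and output directory are placeholders:

    # list the projects visible to the authenticated account
    ./bs list projects
    # download one project (replace <ProjectID> with an ID from the listing)
    ./bs download project -i <ProjectID> -o ./my_project_data/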

Conda environment and projects

A very important aspect of reproducible bioinformatics is managing software, tools and environments properly. One interesting alternative for such a difficult task is Conda. As stated on Conda's website, "Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux." I have been using conda environments …

Continue reading Conda environment and projects
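As a sketch of what a per-project environment workflow can look like (the environment name and package list here are placeholders, not from the post):

    # create an isolated environment for the project
    conda create -n myproject python=3.9 snakemake
    conda activate myproject
    # record the exact environment next to the project files
    conda env export > environment.yaml
    # recreate it later, or on another machine
    conda env create -f environment.yaml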

Galaxy – Install tools from Workflow using Ephemeris

In order to run a workflow, you need the proper tools and versions installed. With the help of Ephemeris, this is easy to achieve. To install Ephemeris:

    pip3 install ephemeris --user

Then, generate a YAML tool list from a given workflow:

    workflow-to-tools -w galaxy_workflow.ga -o galaxy_workflow.ga.yaml

Get an API key from your admin session in Galaxy and: …

Continue reading Galaxy – Install tools from Workflow using Ephemeris
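The step the excerpt cuts off presumably feeds that YAML to Ephemeris' installer; a sketch of the usual shed-tools invocation, where the Galaxy URL and API key are placeholders:

    # install every tool listed in the YAML into the target Galaxy
    shed-tools install -g https://your.galaxy.example -a <API_KEY> -t galaxy_workflow.ga.yaml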

JupyterHub – Install

JupyterHub brings the power of notebooks to groups of users. This is a quick note on installing JupyterHub on Debian 10.

    # get venv
    sudo apt-get install python3-venv
    # start venv
    sudo python3 -m venv /opt/jupyterhub/
    # install packages
    sudo /opt/jupyterhub/bin/python3 -m pip install wheel
    sudo /opt/jupyterhub/bin/python3 -m pip install jupyterhub jupyterlab
    sudo /opt/jupyterhub/bin/python3 -m …

Continue reading JupyterHub – Install
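The excerpt stops mid-command, so the following is only a plausible continuation based on the common "JupyterHub in a venv" recipe; the proxy install, package names and config path are assumptions, not the post's next steps:

    # JupyterHub needs Node.js for its configurable-http-proxy
    sudo apt-get install nodejs npm
    sudo npm install -g configurable-http-proxy
    # generate a default configuration to edit later
    sudo mkdir -p /opt/jupyterhub/etc/jupyterhub/
    cd /opt/jupyterhub/etc/jupyterhub/
    sudo /opt/jupyterhub/bin/jupyterhub --generate-config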

Building a Diamond Db using Refseq Protein

The idea is to build a Diamond database using RefSeq protein (non-redundant) and later compare it to BLAST. The first thing to do is to download the FASTA files, non-redundant only:

    wget -c ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/complete.nonredundant_protein*.faa.gz

At the moment, there are ~928 FASTA files taking ~26 GB of disk space. Also, I am interested in having taxonomic …

Continue reading Building a Diamond Db using Refseq Protein
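Since the excerpt is cut off at the taxonomy part, here is only a sketch of how a taxonomy-aware Diamond database is typically built from these files; the concatenated file name and the NCBI taxonomy dump locations are assumptions:

    # concatenate the downloaded protein fasta files
    zcat complete.nonredundant_protein*.faa.gz > refseq_nr.faa
    # fetch the taxonomy mapping and dump files
    wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
    wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz && tar xzf taxdump.tar.gz
    # build the Diamond database with taxonomy support
    diamond makedb --in refseq_nr.faa --db refseq_nr \
        --taxonmap prot.accession2taxid.gz \
        --taxonnodes nodes.dmp --taxonnames names.dmp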