Pandas: pipe – Tablewise function application

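pipe is designed to help chain function calls on DataFrames and Series. Instead of nesting calls such as f(g(h(df))), you can write df.pipe(h).pipe(g).pipe(f), which reads in the order the functions are applied.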
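Here is a minimal sketch of what pipe does. It uses a tiny hypothetical DataFrame and hypothetical helper functions (add_one, scale, scale_first) and shows three behaviours:

- df.pipe(f) is equivalent to f(df).
- Extra positional and keyword arguments are forwarded to the function.
- A (callable, keyword) tuple tells pipe which keyword argument should receive the DataFrame when it is not the first parameter.

```python
import pandas as pd

# tiny hypothetical DataFrame, just for illustration
df = pd.DataFrame({"x": [1, 2, 3]})

# hypothetical helpers: each takes a DataFrame and returns a new one
def add_one(df):
    return df.assign(x=df["x"] + 1)

def scale(df, factor=1):
    return df.assign(x=df["x"] * factor)

# nested calls read inside-out ...
nested = scale(add_one(df), factor=10)

# ... while pipe reads top to bottom; extra kwargs are passed through
piped = df.pipe(add_one).pipe(scale, factor=10)

# when the DataFrame is not the first argument, pass a (callable, keyword) tuple
def scale_first(factor, data):
    return data.assign(x=data["x"] * factor)

piped_tuple = df.pipe(add_one).pipe((scale_first, "data"), 10)

print(piped["x"].tolist())  # [20, 30, 40]
```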

As a showcase, let's grab the Ensembl Genomes table and play with it. First, import the libraries:

# import libraries

import pandas as pd
import numpy as np
# show full cell contents instead of truncating long strings
pd.set_option('display.max_colwidth', None)


Now, get the Ensembl Genomes table:

# get ensembl genomes table

colnames = ["name","species","division","taxonomy_id","assembly","assembly_accession","genebuild","variation","pan_compara","peptide_compara","genome_alignments","other_alignments","core_db","species_id",""]

# skip the file's first (header) line and use colnames as column labels instead
df = pd.read_csv("", skiprows=1, sep="\t", names=colnames)
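To see what skiprows=1 and names=... do together, here is a sketch that runs the same call on a small in-memory, tab-separated snippet. The snippet content, its header line and the shortened column list are hypothetical stand-ins for the real file. The first line is skipped, and the labels passed through names are used instead.

```python
import io
import pandas as pd

# hypothetical tab-separated text; the first line plays the role of the file's own header
text = (
    "#name\tspecies\tdivision\ttaxonomy_id\n"
    "Fruit fly\tdrosophila_melanogaster\tEnsemblMetazoa\t7227\n"
    "Yeast\tsaccharomyces_cerevisiae\tEnsemblFungi\t4932\n"
)

# shortened, hypothetical column list (the real table has more columns)
cols = ["name", "species", "division", "taxonomy_id"]

# skiprows=1 drops the original header line; names= supplies our own labels
demo = pd.read_csv(io.StringIO(text), skiprows=1, sep="\t", names=cols)
print(demo)
```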


Let's define a few functions to work with the table. Each one takes a DataFrame and returns a DataFrame, so it can be used with pipe:

# filter for EnsemblMetazoa only and keep the 'species' and 'taxonomy_id' columns
def clean_table(df):
    df = df.loc[df['division'] == "EnsemblMetazoa", ['species', 'taxonomy_id']]
    # return the result so the next function in the pipe chain receives it
    return df

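# add a column 'short_name' that displays e.g. H.sapiens for homo_sapiens
def get_shorten_specie_name(df):
    # strip a leading underscore; regex=True is needed because pandas >= 2.0 treats the pattern literally by default
    species = df['species'].str.replace("^_", "", regex=True)
    name1 = species.str[0:1].str.upper()
    name2 = species.str.split("_").str[1]
    # assign() returns a new DataFrame instead of modifying a possible slice in place
    return df.assign(short_name=name1 + "." + name2)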
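The string logic in get_shorten_specie_name can be checked on its own. This sketch applies the same .str steps to a small hypothetical Series.

Note the regex=True argument. In pandas 2.0 and later, str.replace treats the pattern as a literal string by default. Without regex=True, "^_" would not match a leading underscore.

```python
import pandas as pd

# hypothetical species names, one with a leading underscore
species = pd.Series(["homo_sapiens", "_drosophila_melanogaster"])

# drop a leading underscore; regex=True makes "^_" an anchored pattern
cleaned = species.str.replace("^_", "", regex=True)

# first letter, upper-cased, then "." and the second underscore-separated token
short = cleaned.str[0:1].str.upper() + "." + cleaned.str.split("_").str[1]
print(short.tolist())  # ['H.sapiens', 'D.melanogaster']
```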

# add https link to NCBI taxonomy database
def add_taxon_id_link(df):
    # return a new DataFrame with the link column so pipe can pass it on
    return df.assign(taxonomy_id_link="" + df['taxonomy_id'].astype(str))

And now let's use pipe to chain clean_table, get_shorten_specie_name and add_taxon_id_link:
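The chaining code itself is not shown here, so the following is a self-contained sketch of how the three functions could be chained with pipe. It makes three assumptions:

- Each function returns the DataFrame it produces, because pipe passes each return value to the next call.
- A small hypothetical DataFrame stands in for the downloaded table.
- The NCBI link prefix stays an empty placeholder string (the hypothetical LINK_PREFIX), as in the text above.

```python
import pandas as pd

# hypothetical stand-in for the Ensembl Genomes table (only the columns used below)
df = pd.DataFrame({
    "species": ["drosophila_melanogaster", "saccharomyces_cerevisiae", "apis_mellifera"],
    "division": ["EnsemblMetazoa", "EnsemblFungi", "EnsemblMetazoa"],
    "taxonomy_id": [7227, 4932, 7460],
})

# hypothetical placeholder: the NCBI taxonomy URL prefix is left empty, as in the text
LINK_PREFIX = ""

def clean_table(df):
    # keep EnsemblMetazoa rows and the two columns of interest
    return df.loc[df["division"] == "EnsemblMetazoa", ["species", "taxonomy_id"]]

def get_shorten_specie_name(df):
    # e.g. drosophila_melanogaster -> D.melanogaster
    species = df["species"].str.replace("^_", "", regex=True)
    short = species.str[0:1].str.upper() + "." + species.str.split("_").str[1]
    return df.assign(short_name=short)

def add_taxon_id_link(df):
    # prefix each taxonomy id (as a string) with the placeholder link
    return df.assign(taxonomy_id_link=LINK_PREFIX + df["taxonomy_id"].astype(str))

# each step receives the previous step's return value
result = (
    df.pipe(clean_table)
      .pipe(get_shorten_specie_name)
      .pipe(add_taxon_id_link)
)
print(result.head(4))
```

The chained form lists the steps in the order they run. The nested equivalent, add_taxon_id_link(get_shorten_specie_name(clean_table(df))), has to be read from the inside out.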


And the first 4 rows of the resulting table:


