This post contains an adapted R script based on
prep_n_run_GOplot.pl from Trinity, the denovo transcriptome assembler, for the times R cannot read in the produced
prep_n_run_GOplot.pl is used to produce
GOplot visualizing differential expression, sorted by GO terms (see http://wencke.github.io/). It uses the
DE.subset.GOseq.enriched files to produce two other files,
EC.genelist used in
In most cases, this script words properly. However, sometimes when the text annotation contains special characters such as ,
EC.genelist do not read properly into R when
Based on the perl script, I adapted a R version,
prep_for_GOplot.R to produce
EC.genelist shown below.
This is a command-line version which will require the
argparse package in order to read in arguments from the command line.
# to make EC. genelist, has columns ID, logFC and adj.P.val # DE.subset equivalents are the row name, logFC and FDR suppressPackageStartupMessages(library(argparse)) parser <- ArgumentParser() parser$add_argument("-s", "--DE_subset", help = "the subset file from the two-way comparison for both", required=T) parser$add_argument("-e", "--DE_go", help = "the enriched go file from the same two-way comparison", required=T) parser$add_argument("--comparison_prefix", help = "the string to add to output files, should be the names of the two-way comparison", required=T) # parser$print_help() args <- parser$parse_args() subset.genelist<-read.table(args$DE_subset, sep = "\t", header=T, quote = "") subset.genelist$gene<-rownames(subset.genelist) make.ec.genelist<-data.frame("ID" = subset.genelist$gene, "logFC" = subset.genelist$logFC, "adj.P.val" = subset.genelist$FDR) write.csv(make.ec.genelist, paste0(args$comparison_prefix,".EC.genelist"),row.names = F) # to make EC.david, has columns Category, ID, Term, GEnes, adj_pval # DE.subset.Goseq.enriched equivalents are , , , , over_represented_FDR subset.david<-read.table(args$DE_go, sep = "\t", header=T, quote = "") ## get all go-terms without any information missing_go <- subset.david$category[is.na(subset.david$term)] print(paste("go terms being skipped due to no information", as.character(missing_go))) subset.david<-subset.david[!is.na(subset.david$term),] make.ec.david<-data.frame("Category" = subset.david$ontology, "ID" = subset.david$category, "Term" =subset.david$term, "Genes" = subset.david$gene_ids, "adj_pval" = subset.david$over_represented_FDR) write.csv(make.ec.david, paste0(args$comparison_prefix, ".EC.david"), row.names=F)
Run the script as
Rscript prep_for_GOplot.R --help to see the help page. The script takes in three arguments,
DE.subset.GOseq.enriched , and a prefix for the file names.
Afte running this script, you will have to run
GOplot manually rather than through
prep_n_run_GOplot.pl. There is example code on the package website and Trinity also has the script
GOplot.Rscript in the directory
In summary, I used the
argparse package to adapt parts of a perl script by porting it into R instead. The script can be run on the command line just like the other Trinity scripts.