Load required R packages
Sys.setenv(JAVA_HOME='C:/Program Files/Java/jdk-9.0.1') # for 64-bit version
library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
library(animation)
library(readr)
library(biomaRt)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following object is masked from 'package:biomaRt':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(pubmed.mineR)
Load the gene of interest
gene<-"BRCA1"
Create a excel workbook project with xlsx package
wb<-createWorkbook()
## create a bold font for column names of data tables
cs<-CellStyle(wb) + Font(wb, isBold = TRUE) + Border() + Alignment(h="ALIGN_CENTER")
Create a excel sheet within the workbook, we will name this sheet “Gene Info”, because we will put general gene info in this first “cover sheet”
sheet0<-createSheet(wb, sheetName = "Gene Info")
Now, we will use the biomaRt package to retrive basic gene info, such as genome coordinates
mart<-useMart("ensembl", dataset="hsapiens_gene_ensembl")
a<-mart@attributes$name
##the info we retrieve with biomaRt is fully customization, see biomaRt package manual for a full list of more than 800 gene features available
a.useful<-c(a[1], a[9:12], a[61], a[97])
gene.info<-getBM(a.useful, a[60], gene, mart)
Now we load our gene info onto the excel sheet we created
addDataFrame(gene.info, sheet0, colnamesStyle = cs)
On the same work sheet, we can add additional info, such as phenotype info
gene.phenotype<-getBM("phenotype_description", a[60], gene, mart)
addDataFrame(gene.phenotype, sheet0, startRow = 6, colnamesStyle = cs)
Let say this is all we are going to put on our first sheet. We are going to create a new work sheet for a new analysis, within the same workbook
sheet1<-createSheet(wb, sheetName = "Expression")
##Generate an expression figure according to stage
download.file("https://github.com/hackseq/2017_project_5/raw/master/data/exprData.csv.gz", "exprData.csv.gz")
exprData<-read.csv("exprData.csv.gz")
expr.gene<-subset(exprData, exprData$GeneId==gene)
expr.gene.graph<-as.data.frame(t(expr.gene))
expr.gene.graph<-mutate(expr.gene.graph, GENE=as.character(expr.gene.graph[1,1]))
expr.gene.graph<-expr.gene.graph[2:nrow(expr.gene.graph),]
We can save the expression table in worksheet1
addDataFrame(expr.gene.graph, sheet1, startRow = 1, colnamesStyle = cs)
Here our expression analysis can be virsualized with a figure
##Make an expression figure
gene.boxplot<-ggplot(expr.gene.graph, aes(x=GENE, y=as.numeric(as.character(expr.gene.graph$GENE)))) +
geom_boxplot() +
labs(x = "Gene",
y = "Expression (Log2 transformed)",
title = paste0(("Expression of "),gene))+
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(size=6))
One way of storing the figure in our work sheet is to save it to an external file first, then import the figure to our data Here, we are saving our expression figure on worksheet1, to avoid overlapping with our data table we load on this sheet earlier, we will load it onto column 6
ggsave(gene.boxplot, filename="Expression.figure.png")
## Saving 7 x 5 in image
## Warning in fun(x, ...): NAs introduits lors de la conversion automatique
## Warning in FUN(X[[i]], ...): NAs introduits lors de la conversion
## automatique
## Warning: Removed 350 rows containing non-finite values (stat_boxplot).
addPicture("Expression.figure.png", sheet1, scale=1, startRow=1, startColumn=6)
We can store unlimited number of data sheets or figure in a single workbook, on multiple worksheets. Create as many new sheets as needed and load the results data table or figure onto them.
sheet2<-createSheet(wb, sheetName = "CNV")
##import your CNV data results as table
CNV.info2<-"CNV.data"
addDataFrame(CNV.info2, sheet2, colnamesStyle = cs)
Sometimes, functions from packages will save images as .pdf as default format and .pdf can not be directly imported onto our work sheet. In such case, we can use the function from ‘animation’ package to convert pdf into .png file
im.convert("survival.pdf", output = "survival.png",extra.opts="-density 150")
addPicture("survival.png", sheet2, scale=0.5, startRow=12, startColumn=6)
Some functions do not have a built-in file output. We can use R default device image capture to store images
dev.off()
##some script making a figure to R imaging device
dev.copy2pdf()
file.rename("Rplot.pdf")
im.convert("Rplot.pdf", output = "Rplot.png",extra.opts="-density 150")
addPicture("Rplot.png", sheet2, scale=0.5, startRow=12, startColumn=6)
We can also store text results, such as text-mining results to our workbook Example of storing results of PubMed text-mining results on a worksheet
sheet3<-createSheet(wb, sheetName = "PubMed")
##grab abstracts for gene
##compile a abs database regarding "1000 genome breast cancer" search in pub-med
ABSdata<-readabs("pubmed_result_1000genomescancerbreast.txt")
## Warning in initialize(value, ...): NAs introduits lors de la conversion
## automatique
gene.abs<-getabs(ABSdata, gene, FALSE)
## [1] "14 abstracts BRCA1"
gene.PMID<-get_PMCIDS(gene.abs)
##extract sentence from abs for gene
gene.sentences<-Give_Sentences(gene, gene.abs)
addDataFrame(gene.PMID, sheet3, colnamesStyle = cs)
addDataFrame(unlist(gene.sentences), sheet3, startRow = 6, colnamesStyle = cs)
After all results are loaded onto our multi-sheet workbook, we can save our workbook as an excel sheet
saveWorkbook(wb, file = paste0("ALL_RESULTS.xlsx"))