Usage
To use EMASE in a new project
In Python scripts, we can load ‘emase’ as a module:
import emase
Or import individual components directly:
from emase import AlignmentMatrixFactory as AMF
from emase import AlignmentPropertyMatrix as APM
from emase import EMfactory
To run EMASE on the command line
Note: We assume EMASE is installed in its own conda virtual environment. First, activate that environment:
source activate emase
The first step of the pipeline is to process the reference genome:
prepare-emase -G ${REF_GENOME} -g ${REF_GTF} -o ${REF_DIR} -m --no-bowtie-index
‘prepare-emase’ generates the following files for the reference genome:
${REF_DIR}/emase.transcriptome.fa
${REF_DIR}/emase.transcriptome.info <== Used as ${TID_FILE} in the following steps
${REF_DIR}/emase.gene2transcripts.tsv <== Used as ${GROUP_FILE} in the following steps
Then build a pooled transcriptome and prepare required files for EMASE:
create-hybrid -G ${GENOME1},${GENOME2} -g ${GTF1},${GTF2} \
-s ${SUFFIX1},${SUFFIX2} -o ${EMASE_DIR}
Now the following files will be available:
${EMASE_DIR}/emase.pooled.transcriptome.fa
${EMASE_DIR}/emase.pooled.transcriptome.info <== Used as ${TINFO_FILE} in the next steps
${EMASE_DIR}/bowtie.transcriptome.1.ebwt
${EMASE_DIR}/bowtie.transcriptome.2.ebwt
${EMASE_DIR}/bowtie.transcriptome.3.ebwt
${EMASE_DIR}/bowtie.transcriptome.4.ebwt
${EMASE_DIR}/bowtie.transcriptome.rev.1.ebwt
${EMASE_DIR}/bowtie.transcriptome.rev.2.ebwt
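Before aligning reads, it can help to confirm that all of the files listed above are actually present. The following is a minimal sketch in Python; `missing_hybrid_files` is a hypothetical helper, not part of EMASE, and it assumes only the file names listed above.

```python
from pathlib import Path

# File names copied from the list above; nothing else is assumed
# about what create-hybrid writes.
HYBRID_FILES = [
    "emase.pooled.transcriptome.fa",
    "emase.pooled.transcriptome.info",
    "bowtie.transcriptome.1.ebwt",
    "bowtie.transcriptome.2.ebwt",
    "bowtie.transcriptome.3.ebwt",
    "bowtie.transcriptome.4.ebwt",
    "bowtie.transcriptome.rev.1.ebwt",
    "bowtie.transcriptome.rev.2.ebwt",
]


def missing_hybrid_files(emase_dir):
    """Return the expected file names that are absent from emase_dir.

    Hypothetical helper for illustration; not part of the EMASE package.
    """
    root = Path(emase_dir)
    return [name for name in HYBRID_FILES if not (root / name).is_file()]
```

An empty list from `missing_hybrid_files(emase_dir)` suggests the directory is ready for the alignment step.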
RNA-seq reads should be aligned against the pooled transcriptome:
bowtie -q -a --best --strata --sam -v 3 ${EMASE_DIR}/bowtie.transcriptome ${FASTQ} \
| samtools view -bS - > ${BAM_FILE}
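The same bowtie-to-samtools pipe can also be driven from Python, which may be convenient when looping over many FASTQ files. This is a sketch under assumptions: `alignment_commands` and `align_to_bam` are hypothetical helper names, and the arguments are exactly those of the command above.

```python
import subprocess


def alignment_commands(index_base, fastq):
    """Build the two argv lists for 'bowtie ... | samtools view -bS -'.

    Hypothetical helper; the flags are copied from the command above.
    """
    bowtie = ["bowtie", "-q", "-a", "--best", "--strata", "--sam",
              "-v", "3", index_base, fastq]
    samtools = ["samtools", "view", "-bS", "-"]
    return bowtie, samtools


def align_to_bam(index_base, fastq, bam_file):
    """Run the pipe and write the BAM output to bam_file."""
    bowtie, samtools = alignment_commands(index_base, fastq)
    with open(bam_file, "wb") as out:
        aligner = subprocess.Popen(bowtie, stdout=subprocess.PIPE)
        converter = subprocess.Popen(samtools, stdin=aligner.stdout, stdout=out)
        # Close our copy of the pipe so bowtie sees SIGPIPE if samtools exits early.
        aligner.stdout.close()
        converter_rc = converter.wait()
        aligner_rc = aligner.wait()
    if aligner_rc != 0 or converter_rc != 0:
        raise RuntimeError(
            f"alignment failed (bowtie={aligner_rc}, samtools={converter_rc})"
        )
```

Calling `align_to_bam` requires bowtie and samtools on the `PATH`; `alignment_commands` alone only builds the argument lists and can be inspected without running anything.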
Before running EMASE, we need to convert the BAM file into the EMASE format:
bam-to-emase -a ${BAM_FILE} -i ${TID_FILE} -s ${SUFFIX1},${SUFFIX2} -o ${EMASE_FILE}
For paired-end data, run the steps up to this point on the R1 and R2 reads independently, then extract their common alignments:
get-common-alignments -i ${EMASE_FILE_R1},${EMASE_FILE_R2} -o ${EMASE_FILE}
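For paired-end data, the order of operations is two independent conversions followed by one merge. A minimal Python sketch of that sequence follows; `paired_end_commands` is a hypothetical helper, every file name in the usage lines is a placeholder, and only the flags shown in the commands above are used.

```python
import shlex


def paired_end_commands(bam_r1, bam_r2, tid_file, suffixes,
                        emase_r1, emase_r2, emase_out):
    """Build argv lists for the paired-end conversion and merge steps.

    Hypothetical helper for illustration; not part of the EMASE package.
    """
    suffix_arg = ",".join(suffixes)
    # One bam-to-emase call per read end, run independently.
    convert = [
        ["bam-to-emase", "-a", bam, "-i", tid_file, "-s", suffix_arg, "-o", out]
        for bam, out in ((bam_r1, emase_r1), (bam_r2, emase_r2))
    ]
    # Then keep only the alignments common to both ends.
    merge = ["get-common-alignments", "-i", f"{emase_r1},{emase_r2}",
             "-o", emase_out]
    return convert + [merge]


# Placeholder file names and suffixes, chosen only for this example.
for argv in paired_end_commands("R1.bam", "R2.bam", "emase.transcriptome.info",
                                ["A", "B"], "R1.emase", "R2.emase", "common.emase"):
    print(shlex.join(argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` in order, so that a failure in either conversion stops the run before the merge.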
Finally, to run EMASE:
run-emase -i ${EMASE_FILE} -g ${GROUP_FILE} -L ${TINFO_FILE} -M ${MODEL} -o ${OUTBASE} \
-r ${READLEN} -p ${PSEUDOCOUNT} -m ${MAX_ITERS} -t ${TOLERANCE}
‘run-emase’ outputs the following files:
${OUTBASE}.isoforms.expected_read_counts
${OUTBASE}.isoforms.tpm
${OUTBASE}.genes.expected_read_counts
${OUTBASE}.genes.tpm
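The final step and its outputs can be sketched in Python as follows. `run_emase_command` and `expected_outputs` are hypothetical helpers; the flags and output suffixes are taken from the text above, and the parameter values in the usage lines are placeholders, not recommended settings.

```python
def run_emase_command(emase_file, group_file, tinfo_file, model, outbase,
                      read_len, pseudocount, max_iters, tolerance):
    """Build the argv list for run-emase using the flags shown above.

    Hypothetical helper for illustration; not part of the EMASE package.
    """
    return [
        "run-emase", "-i", emase_file, "-g", group_file, "-L", tinfo_file,
        "-M", str(model), "-o", outbase, "-r", str(read_len),
        "-p", str(pseudocount), "-m", str(max_iters), "-t", str(tolerance),
    ]


def expected_outputs(outbase):
    """List the four output paths that run-emase is documented to write."""
    return [
        f"{outbase}.{level}.{kind}"
        for level in ("isoforms", "genes")
        for kind in ("expected_read_counts", "tpm")
    ]


# Placeholder arguments, chosen only to show the shape of the call.
argv = run_emase_command("common.emase", "emase.gene2transcripts.tsv",
                         "emase.pooled.transcriptome.info", 1, "sample1",
                         100, 0.0, 999, 0.0001)
print(" ".join(argv))
for path in expected_outputs("sample1"):
    print(path)
```

After running the command (for example with `subprocess.run(argv, check=True)`), checking that every path from `expected_outputs` exists is a quick way to confirm the run completed.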