DVGfinder_v3
DVGfinder is a tool that integrates the two most widely used defective viral genome (DVG) search algorithms, ViReMa-a and DI-tector, into a single workflow, making the analysis of a sample easier and more intuitive.
In addition, DVGfinder_v3 applies a Gradient Boosting classifier to reduce the number of false positives introduced by the search algorithms, and generates an HTML report with interactive tables and plots for a first exploration of the results.
Built with
DVG search algorithms
- ViReMa-a (0.23):
Routh A, Johnson JE. Discovery of functional genomic motifs in viruses with ViReMa-a Virus Recombination Mapper-for analysis of next-generation sequencing data. Nucleic Acids Res. 2014 Jan;42(2):e11. doi: 10.1093/nar/gkt916. Epub 2013 Oct 16. PMID: 24137010; PMCID: PMC3902915.
- DI-tector_06.py:
Beauclair G, Mura M, Combredet C, Tangy F, Jouvenet N, Komarova AV. DI-tector: defective interfering viral genomes’ detector for next-generation sequencing data. RNA. 2018 Oct;24(10):1285-1296. doi: 10.1261/rna.066910.118. Epub 2018 Jul 16. PMID: 30012569; PMCID: PMC6140465.
Getting Started
To get a local copy up and running, follow these steps.
Prerequisites
DVGfinder uses the ViReMa-a (v0.23) and DI-tector_06.py programs.
These third-party scripts are included in the ExternalNeeds directory, so you only need to follow the steps below to run DVGfinder.
Installation
- Clone the repo in the directory of your choice
git clone https://github.com/MJmaolu/DVGfinder.git
- Go to the DVGfinder directory
cd DVGfinder
- Give execute permission to all the scripts in the DVGfinder directory
chmod -R +x .
- Create a new conda environment with all the dependencies needed to run DVGfinder
conda env create -f dvgfinder_env.yaml
- Activate DVGfinder environment
conda activate dvgfinder_env
Usage
python3 DVGfinder_v3.py -fq path_to_fastq_file [-r path_to_fasta_virus_reference] [-m margin] [-t threshold] [-n number_processes]
-r: The FASTA file of the viral reference and its bwa and bowtie index files must all be in the ExternalNeeds/references directory.
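If you want to launch DVGfinder from another Python script, the usage line above can be assembled programmatically. The following is only a minimal sketch: the helper name `build_dvgfinder_command` is hypothetical and not part of DVGfinder, and it uses only the flags shown in the usage line (`-fq`, `-r`, `-m`, `-t`, `-n`).

```python
import shlex


def build_dvgfinder_command(fastq, reference=None, margin=None,
                            threshold=None, processes=None):
    """Assemble the argument list for DVGfinder_v3.py.

    Hypothetical helper: only the flags from the usage line are used,
    and an optional flag is added only when a value is given.
    """
    command = ["python3", "DVGfinder_v3.py", "-fq", str(fastq)]
    optional_flags = [
        ("-r", reference),
        ("-m", margin),
        ("-t", threshold),
        ("-n", processes),
    ]
    for flag, value in optional_flags:
        if value is not None:
            command += [flag, str(value)]
    return command


if __name__ == "__main__":
    cmd = build_dvgfinder_command(
        "tumvas72_N100K_l100/tumvas72_N100K_l100.fq",
        reference="ExternalNeeds/references/TuMV-AS.fasta",
        processes=4,  # illustrative value, not a recommendation
    )
    # Print the shell-quoted command for inspection.
    print(shlex.join(cmd))
```

To actually run the analysis, the returned list could be passed to `subprocess.run(cmd, check=True)` from inside the DVGfinder directory with the `dvgfinder_env` environment active.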
Tutorial
You can explore an example of results in the directory ‘tumvas72_N100K_l100’.
To test the program, follow these steps:
- Activate the environment
- Run DVGfinder on the example sample
python3 DVGfinder_v3.py -fq tumvas72_N100K_l100/tumvas72_N100K_l100.fq -r ExternalNeeds/references/TuMV-AS.fasta -t probability_threshold_to_filter_as_realsDVGs -n number_of_process
- Wait for the run to finish. Your results will appear in the ‘FinalReports’ directory, and an HTML report will open in your default browser.
Link to an example HTML report
About the HTML report:
- The results are displayed at three levels (tabs at the top): ALL, CONSENSUS and FILTERED ML.
- CONSENSUS and FILTERED ML only appear if both search algorithms have identified DVGs in the sample.
- The same information displayed in the interactive tables is written in CSV files.
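Because the tables are also written as CSV files, they can be post-processed outside the report. The sketch below shows the general idea of keeping only rows above a probability threshold, similar in spirit to the `-t` option. It is an illustration under assumptions: the name of the probability column, the comma delimiter and the use of `>=` for the comparison are all hypothetical, so check the header of your own CSV file first.

```python
import csv


def filter_by_probability(csv_path, threshold, probability_column):
    """Return the rows whose probability is at least `threshold`.

    `probability_column` is supplied by the caller because the actual
    column name in the DVGfinder CSV files is not assumed here.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))  # assumes a comma-separated file
    return [
        row for row in rows
        if float(row[probability_column]) >= threshold
    ]
```

A call would look like `filter_by_probability("path/to/report.csv", 0.5, "name_of_probability_column")`, where both the path and the column name are placeholders.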
Labeled Dataset
The dataset used to train the classifier is available as ‘630N5Ml100_v2_metrics_labeledDataset.csv’.
For this version of DVGfinder we used a Gradient Boosting Classifier to generate the model, but we encourage you to work directly with the data and try to improve it.
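A reasonable first step before trying another model is to look at the columns of the labeled dataset and at how balanced the classes are. The sketch below uses only the Python standard library. The function names are hypothetical, the file is assumed to be comma-separated, and the label column name is passed in by you, since it is not stated here.

```python
import csv
from collections import Counter
from pathlib import Path


def read_header(csv_path):
    """Return the list of column names in the first line of the CSV."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle))


def count_labels(csv_path, label_column):
    """Count how many rows carry each value of `label_column`."""
    with open(csv_path, newline="") as handle:
        return Counter(row[label_column] for row in csv.DictReader(handle))


if __name__ == "__main__":
    dataset = Path("630N5Ml100_v2_metrics_labeledDataset.csv")
    if dataset.exists():
        # Inspect the header first to find the label column.
        print(read_header(dataset))
```

Once the label column is identified from the header, `count_labels(dataset, "<label column>")` gives the class distribution, which helps in choosing an evaluation metric for any alternative classifier.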
Reference
Preprint version:
- Olmo-Uceda, M.J.; Muñoz-Sánchez, J.C.; Lasso-Giraldo, W.; Arnau, V.; Díaz-Villanueva, W.; Elena, S.F. DVGfinder: A Metasearch Engine for Identifying Defective Viral Genomes in RNA-Seq Data. Preprints 2022, 2022030110 doi: 10.20944/preprints202203.0110.v1.
Contact
María José Olmo-Uceda - mariajose.olmo@csic.es
PhD student
EvolSysVir Group, I2SysBio (CSIC-UV)
Project Link: https://github.com/MJmaolu/DVGfinder
Page Link: https://mjmaolu.github.io/DVGfinder/
Under Construction
Any suggestions will be welcome :hugs: