DVGfinder_v3

DVGfinder is a tool that integrates the two most widely used DVG (defective viral genome) search algorithms, ViReMa-a and DI-tector, in a single workflow, making the analysis of a sample easier and more intuitive. DVGfinder_v3 also implements a Gradient Boosting classifier that aims to reduce the number of false positives introduced by the search algorithms, and it generates an HTML report with interactive tables and plots that facilitate a first exploration of the results.

Table of Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Tutorial
  5. Labeled Dataset
  6. Reference
  7. Contact

Built with

DVGs search algorithms

Routh A, Johnson JE. Discovery of functional genomic motifs in viruses with ViReMa-a Virus Recombination Mapper-for analysis of next-generation sequencing data. Nucleic Acids Res. 2014 Jan;42(2):e11. doi: 10.1093/nar/gkt916. Epub 2013 Oct 16. PMID: 24137010; PMCID: PMC3902915.

Beauclair G, Mura M, Combredet C, Tangy F, Jouvenet N, Komarova AV. DI-tector: defective interfering viral genomes’ detector for next-generation sequencing data. RNA. 2018 Oct;24(10):1285-1296. doi: 10.1261/rna.066910.118. Epub 2018 Jul 16. PMID: 30012569; PMCID: PMC6140465.

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

DVGfinder uses the ViReMa-a (v0.23) and DI-tector_06.py programs.

These third-party scripts are included in the ExternalNeeds directory, so you only have to follow the next steps to run DVGfinder.

Installation

  1. Clone the repo in the directory of your choice
    git clone https://github.com/MJmaolu/DVGfinder.git
    
  2. Go to the DVGfinder directory
    cd DVGfinder
    
  3. Give execution permission to all the scripts in the DVGfinder directory
    chmod -R +x .
    
  4. Create a new conda environment with all the dependencies needed to run DVGfinder
    conda env create -f dvgfinder_env.yaml
    
  5. Activate DVGfinder environment
    conda activate dvgfinder_env
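
Before running DVGfinder, it can help to confirm that the installation steps above took effect. The following is a minimal sketch, not part of DVGfinder: the function names `missing_tools` and `environment_is_active` are hypothetical helpers. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated.

```python
import os
import shutil

# Command-line tools used by the installation steps (git clone, conda env create).
REQUIRED_TOOLS = ("git", "conda")
# Environment name taken from the `conda activate dvgfinder_env` step.
EXPECTED_ENV = "dvgfinder_env"


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def environment_is_active(environ=None, expected=EXPECTED_ENV):
    """Return True if conda reports `expected` as the active environment."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == expected


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("dvgfinder_env active:", environment_is_active())
```

If the second line reports `False`, the environment was probably not activated in the current shell.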
    

Usage

    python3 DVGfinder_v3.py -fq path_to_fastq_file [-r path_to_fasta_virus_reference] [-m margin] [-t threshold] [-n number_processes]

-r The FASTA file of the viral reference and its bwa and bowtie index files should all be in the ExternalNeeds/references directory.
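
A quick way to check that the index files sit next to the reference is sketched below. It is an illustration under stated assumptions, not DVGfinder code: the suffixes are the standard ones written by `bwa index` and by `bowtie-build` (Bowtie 1, small indexes), and the index prefixes DVGfinder expects are an assumption, so both are passed in explicitly. `missing_index_files` is a hypothetical helper name.

```python
from pathlib import Path

# Suffixes written by `bwa index`.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")
# Suffixes written by `bowtie-build` (Bowtie 1, small index).
BOWTIE_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(bwa_prefix, bowtie_prefix):
    """Return the expected index files that do not exist on disk.

    The prefixes are assumptions: adjust them to however the indexes
    in ExternalNeeds/references were actually built.
    """
    expected = [Path(f"{bwa_prefix}{suffix}") for suffix in BWA_SUFFIXES]
    expected += [Path(f"{bowtie_prefix}{suffix}") for suffix in BOWTIE_SUFFIXES]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    # Assumes both indexes use the FASTA path itself as prefix.
    reference = "ExternalNeeds/references/TuMV-AS.fasta"
    for path in missing_index_files(reference, reference):
        print("missing:", path)
```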

Tutorial

You can explore an example of results in the directory ‘tumvas72_N100K_l100’.

To test the program, follow these steps:

  1. Activate the environment

  2. Run DVGfinder on the example sample

    python3 DVGfinder_v3.py -fq tumvas72_N100K_l100/tumvas72_N100K_l100.fq -r ExternalNeeds/references/TuMV-AS.fasta -t probability_threshold_to_filter_as_realsDVGs -n number_of_process

  3. Wait for the run to finish; your results will appear in the ‘FinalReports’ directory. In addition, an HTML report will open in your default browser.
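
The tutorial run can also be launched from a script. The sketch below only assembles the argument list from the flags shown in the Usage section. `build_dvgfinder_command` is a hypothetical helper, and the threshold and process count in the example are illustrative values, not recommendations from the authors.

```python
def build_dvgfinder_command(fastq, reference=None, margin=None,
                            threshold=None, n_processes=None):
    """Assemble the DVGfinder_v3.py argument list.

    Optional flags (-r, -m, -t, -n) are added only when a value is given.
    """
    command = ["python3", "DVGfinder_v3.py", "-fq", str(fastq)]
    optional = (
        ("-r", reference),
        ("-m", margin),
        ("-t", threshold),
        ("-n", n_processes),
    )
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    return command


if __name__ == "__main__":
    command = build_dvgfinder_command(
        "tumvas72_N100K_l100/tumvas72_N100K_l100.fq",
        reference="ExternalNeeds/references/TuMV-AS.fasta",
        threshold=0.5,   # illustrative value
        n_processes=4,   # illustrative value
    )
    print(" ".join(command))
```

To actually start the run, pass the list to `subprocess.run(command, check=True)` from inside the DVGfinder directory with the environment activated.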

Link to an example HTML report

About the HTML report:

Labeled Dataset

The dataset used to generate the classifier is available as ‘630N5Ml100_v2_metrics_labeledDataset.csv’.

For this version of DVGfinder I used a Gradient Boosting Classifier algorithm to generate the model, but I encourage you to play directly with the data and try to improve it.
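
As a first step in exploring the dataset, it may be useful to list its columns and check how balanced the labels are. The sketch below uses only the standard library. It assumes a comma-separated file with a header row; the name of the label column is not documented here, so it is passed in as a parameter. `dataset_columns` and `label_counts` are hypothetical helper names.

```python
import csv
from collections import Counter


def dataset_columns(csv_path):
    """Return the column names from the header row of the CSV file."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle).fieldnames or [])


def label_counts(csv_path, label_column):
    """Count how many rows carry each value of `label_column`.

    `label_column` is an assumption supplied by the caller; check
    dataset_columns() first to find the real name.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if label_column not in (reader.fieldnames or []):
            raise KeyError(f"column not found: {label_column}")
        return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    print(dataset_columns("630N5Ml100_v2_metrics_labeledDataset.csv"))
```

A strongly imbalanced label distribution would be worth keeping in mind when training or evaluating an alternative model.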

Reference

Preprint version:

Contact

María José Olmo-Uceda - mariajose.olmo@csic.es

PhD student

EvolSysVir Group, I2SysBio (CSIC-UV)


Project Link: https://github.com/MJmaolu/DVGfinder

Page Link: https://mjmaolu.github.io/DVGfinder/

Under Construction

Any suggestions are welcome 🤗