Commit ad1eeb2e authored by Theo Serralta

Add new README

parent 803c9cd2
# Genomic CNV and SV Detection with GPU Acceleration
This project performs copy number variation (CNV) and structural variant (SV) detection on genomic data, leveraging
GPU acceleration to enhance performance for large datasets. It includes calculations of mappability, GC content, depth,
and normalization, followed by variant detection and result output in various formats, including VCF for SVs.
## Features
- **Copy Number Variation (CNV) Analysis**: Analyzes depth of coverage across genomic windows to detect CNVs.
- **Structural Variant (SV) Detection**: Identifies SVs (e.g., deletions, inversions, translocations) using paired-end
and split-read alignments.
- **GPU Acceleration**: Runs the mappability, GC content, depth, and normalization calculations on a CUDA-enabled
GPU to speed them up.
- **Customizable Parameters**: Adjustable settings for window size, step size, and z-score thresholds.
### Author:
**SERRALTA Theo**
### Collaborators:
**DUFFOURD Yannis**
### Laboratory:
**GAD**
### Date:
**28/09/2023**
## Installation
1. Ensure Python 3 and CUDA are installed, and that a CUDA-compatible GPU is available.
2. Install the necessary Python packages:
```bash
pip install numpy pysam pycuda pandas
```
3. Clone this repository.
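To confirm that the packages from step 2 are visible to your Python interpreter, a small check like the one below can help. It is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of this project. It only checks that each package can be located, not that CUDA or the GPU actually works.

```python
import importlib.util

# Packages listed in the installation step of this README.
REQUIRED = ["numpy", "pysam", "pycuda", "pandas"]


def missing_packages(names):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages can be located.")
```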
## Usage
### Platform
This software currently runs only on the CCUB (Computing Center of the University of Burgundy) cluster; the paths in the commands below are specific to that environment.
### Directory
Navigate to the working directory:
```bash
cd /work/gad/shared/analyse/test/cnvGPU/test_scalability/
```
### Recommended Execution with qsub
Submit the job with `qsub`, passing the input and output paths as variables through `-v`:
```bash
qsub -v NUM_CHR=<ALL_or_num_chr>,INPUTFILE=</path/to/the/input/bam/file>,LOGFILE=</path/to/the/log/file>,OUTPUT=</path/to/the/output/file>,OUTPUT_PAIRS=</path/to/the/output_pairs/file>,OUTPUT_SPLITS=</path/to/the/output_splits/file> ./wrapper_cnvGPU.sh
```
Example:
```bash
qsub -pe smp 1 -v NUM_CHR=ALL,INPUTFILE=/work/gad/shared/analyse/test/cnvGPU/test_scalability/dijen1000.bam,OUTPUT=exemple.out.tsv,OUTPUT_PAIRS=exemple.out_pairs.tsv,OUTPUT_SPLITS=exemple.out_splits.tsv,LOGFILE=exemple.log ./wrappers/wrapper_cnvGPU.sh
```
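When submitting many samples, assembling the long `-v` list by hand is error-prone. The sketch below builds the same command from a dictionary. It is an illustration, not part of this project: the helper name `build_qsub_command` is hypothetical, the default wrapper path is copied from the example above, and values containing commas are not handled.

```python
import shlex


def build_qsub_command(variables, wrapper="./wrappers/wrapper_cnvGPU.sh", slots=1):
    """Assemble a qsub argument list that passes `variables` through -v."""
    # qsub takes -v as a single comma-separated list of NAME=value pairs.
    var_list = ",".join(f"{name}={value}" for name, value in variables.items())
    return ["qsub", "-pe", "smp", str(slots), "-v", var_list, wrapper]


# Values taken from the example in this README.
job = {
    "NUM_CHR": "ALL",
    "INPUTFILE": "/work/gad/shared/analyse/test/cnvGPU/test_scalability/dijen1000.bam",
    "OUTPUT": "exemple.out.tsv",
    "OUTPUT_PAIRS": "exemple.out_pairs.tsv",
    "OUTPUT_SPLITS": "exemple.out_splits.tsv",
    "LOGFILE": "exemple.log",
}

command = build_qsub_command(job)
# Print the command for inspection; pass `command` to subprocess.run to submit it.
print(shlex.join(command))
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting issues.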
### Modifying Parameters
The following parameters can be changed in the wrapper script:
- `window_size` (`-w`): default `100`
- `step_size` (`-s`): default `10`
- `zscore_threshold` (`-z`): default `1.5`
- `lengthFilter` (`-l`): default `200`
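To give an intuition for how the window size, step size, and z-score threshold interact, here is a simplified CPU-only sketch. It is not the implementation used by this tool: it assumes a plain z-score of mean window depth against all windows, skips the mappability and GC normalization steps, and the function names are illustrative.

```python
from statistics import mean, pstdev


def window_means(depth, window_size=100, step_size=10):
    """Return (start, mean depth) for windows starting every `step_size` positions."""
    starts = range(0, len(depth) - window_size + 1, step_size)
    return [(start, mean(depth[start:start + window_size])) for start in starts]


def flag_windows(depth, window_size=100, step_size=10, zscore_threshold=1.5):
    """Return (start, z-score) for windows whose |z| reaches the threshold."""
    windows = window_means(depth, window_size, step_size)
    values = [value for _, value in windows]
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Perfectly flat coverage: no window deviates from the others.
        return []
    flagged = []
    for start, value in windows:
        z = (value - mu) / sigma
        if abs(z) >= zscore_threshold:
            flagged.append((start, round(z, 2)))
    return flagged


# Synthetic per-base depth: baseline 30x with a 200 bp region at 60x.
depth = [30] * 500 + [60] * 200 + [30] * 500
flagged = flag_windows(depth)
print(flagged)
```

A smaller step size produces more overlapping windows and finer breakpoint resolution, while a higher threshold flags only windows that mostly overlap the altered region.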
### Direct Execution without Wrapper
Alternatively, you can execute the program directly with Singularity:
```bash
singularity exec --nv -e /work/gad/shared/bin/singularity_images/pycuda/pycuda_sam.1.1.sif python3 /work/gad/shared/analyse/test/cnvGPU/test_scalability/cnv_sv_caller_gpu.py -b <input_bamfile> -c <int or "ALL"> -w <int> -s <int> -z <float> -l <int> -o <output_cnv_file_vcf> -p <output_pairs_file> -m <output_splits_file> -e <logfile>
```
Example:
```bash
singularity exec --nv -e /work/gad/shared/bin/singularity_images/pycuda/pycuda_sam.1.1.sif python3 /work/gad/shared/analyse/test/cnvGPU/test_scalability/cnv_sv_caller_gpu.py -b example.bam -c ALL -w 100 -s 10 -z 1.5 -l 200 -o example_cnv.vcf -p example_pairs.tsv -m example_splits.tsv -e example.log
```
## Output Files
- **VCF File**: Contains structural variant calls with relevant information on chromosome, position, variant type,
copy number, etc.
- **Paired-Read Events**: Details abnormal paired-end read alignments indicating possible structural variations.
- **Split-Read Events**: Lists split-read alignments for further variant investigation.
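For a quick look at a VCF output without extra dependencies, the fixed VCF columns can be read with the standard library, as sketched below. The sample records are made up for illustration and are not output of this tool; `SVTYPE` and `END` are INFO keys defined by the VCF specification, and the INFO fields this tool actually writes may differ.

```python
import io


def read_vcf_records(handle):
    """Yield one dict per VCF data line, using the eight fixed VCF columns."""
    columns = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.split("\t")
        record = dict(zip(columns, fields[:8]))
        record["POS"] = int(record["POS"])
        # INFO is a semicolon-separated list of KEY=value pairs or bare flags.
        info = {}
        for item in record["INFO"].split(";"):
            if item in ("", "."):
                continue
            key, _, value = item.partition("=")
            info[key] = value if value else True
        record["INFO"] = info
        yield record


# Hypothetical records, for illustration only.
sample = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2000\n"
    "chr2\t5000\t.\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=7500;IMPRECISE\n"
)
records = list(read_vcf_records(sample))
for record in records:
    print(record["CHROM"], record["POS"], record["INFO"].get("SVTYPE"))
```

For real files, replace `sample` with `open("example_cnv.vcf")`; `pysam.VariantFile` is the more robust option when the file has a complete header.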
## Dependencies
- Python 3.x
- CUDA-compatible GPU
- [NumPy](https://numpy.org/), [pysam](https://pysam.readthedocs.io/), [PyCUDA](https://documen.tician.de/pycuda/),
[pandas](https://pandas.pydata.org/)
## License
This project is licensed under the MIT License.
## Acknowledgments
This tool was developed to support high-performance genomic analyses, using GPU acceleration to make CNV and SV
detection feasible on large datasets.