# Genomic CNV and SV Detection with GPU Acceleration

This project detects copy number variations (CNVs) and structural variants (SVs) in genomic data. It uses GPU
acceleration to speed up the processing of large datasets. It calculates mappability, GC content, and depth, performs
normalization, then detects variants and writes the results in several formats, including VCF for SVs.

## Features

- **Copy Number Variation (CNV) Analysis**: Analyzes depth of coverage across genomic windows to detect CNVs.
- **Structural Variant (SV) Detection**: Identifies SVs (e.g., deletions, inversions, translocations) from paired-end
  and split-read alignments.
- **GPU Acceleration**: Uses CUDA-enabled GPUs to speed up the mappability, GC content, depth, and normalization
  calculations.
- **Customizable Parameters**: Adjustable window size, step size, z-score threshold, and length filter.

### Author:
**SERRALTA Theo**

### Collaborators:
**DUFFOURD Yannis**

### Laboratory:
**GAD**

### Date:
**28/09/2023**

## Installation

1. Ensure that Python 3 and CUDA are installed.
2. Install the required Python packages:

   ```bash
   pip install numpy pysam pycuda pandas
   ```

3. Clone this repository.

## Usage

### Platform
This software is currently designed to run only on the CCUB (Computing Center of the University of Burgundy).
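
### Windowed Depth and Z-Scores (Illustrative Sketch)

The CNV analysis listed under Features works on depth of coverage over genomic windows and is tuned by the `-w`, `-s`, and `-z` options. The CPU-only Python sketch below illustrates that idea under stated assumptions. Windows of `window_size` bases start every `step_size` bases, and a trailing partial window is dropped. Depth is averaged per window, and a window is flagged when the absolute z-score of its depth reaches the threshold. The function names and the DEL/DUP labels are hypothetical. The GC-content and mappability normalization performed by the tool is omitted, and this is not the GPU code in `cnv_sv_caller_gpu.py`.

```python
from statistics import mean, pstdev

# Hypothetical CPU-only sketch; NOT the GPU implementation in cnv_sv_caller_gpu.py.


def window_intervals(chrom_length, window_size=100, step_size=10):
    """Half-open (start, end) windows of `window_size` bases, one every `step_size` bases.

    Assumption: a trailing window shorter than `window_size` is dropped.
    """
    return [(start, start + window_size)
            for start in range(0, chrom_length - window_size + 1, step_size)]


def window_depths(per_base_depth, windows):
    """Mean depth of coverage inside each window."""
    return [mean(per_base_depth[start:end]) for start, end in windows]


def flag_cnv_windows(depths, z_threshold=1.5):
    """Flag windows whose depth z-score is at least `z_threshold` in absolute value.

    Returns (window_index, "DUP" or "DEL", z rounded to 2 decimals).
    No GC-content or mappability correction is applied here.
    """
    mu = mean(depths)
    sigma = pstdev(depths)
    if sigma == 0:  # perfectly flat coverage: nothing stands out
        return []
    calls = []
    for index, depth in enumerate(depths):
        z = (depth - mu) / sigma
        if z >= z_threshold:
            calls.append((index, "DUP", round(z, 2)))
        elif z <= -z_threshold:
            calls.append((index, "DEL", round(z, 2)))
    return calls


# Toy 1 kb chromosome at depth 30, with a loss at 400-499 and a gain at 700-799.
depth = [30] * 1000
depth[400:500] = [0] * 100
depth[700:800] = [60] * 100

# Non-overlapping windows (step = window size) keep the toy output easy to read.
windows = window_intervals(len(depth), window_size=100, step_size=100)
calls = flag_cnv_windows(window_depths(depth, windows), z_threshold=1.5)
for index, kind, z in calls:
    start, end = windows[index]
    print(f"{start}-{end}\t{kind}\tz={z}")  # 400-500 DEL z=-2.24, then 700-800 DUP z=2.24
```

The toy run uses non-overlapping windows so that each event maps to one window. With the default `-s 10`, neighbouring 100-base windows overlap by 90 bases, so under this tiling assumption a single event would be flagged by several consecutive windows.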

### Directory
Navigate to the project directory:

```bash
cd /work/gad/shared/analyse/test/cnvGPU/test_scalability/
```

### Recommended Execution with qsub

Submit the job with `qsub`:

```bash
qsub -v NUM_CHR=<ALL_or_num_chr>,INPUTFILE=</path/to/the/input/bam/file>,LOGFILE=</path/to/the/log/file>,OUTPUT=</path/to/the/output/file>,OUTPUT_PAIRS=</path/to/the/output_pairs/file>,OUTPUT_SPLITS=</path/to/the/output_splits/file> ./wrapper_cnvGPU.sh
```

Example:

```bash
qsub -pe smp 1 -v NUM_CHR=ALL,INPUTFILE=/work/gad/shared/analyse/test/cnvGPU/test_scalability/dijen1000.bam,OUTPUT=example.out.tsv,OUTPUT_PAIRS=example.out_pairs.tsv,OUTPUT_SPLITS=example.out_splits.tsv,LOGFILE=example.log ./wrappers/wrapper_cnvGPU.sh
```

### Modifying Parameters

The following parameters can be changed in the wrapper script:

- `window_size` (`-w`): default `100`
- `step_size` (`-s`): default `10`
- `zscore_threshold` (`-z`): default `1.5`
- `lengthFilter` (`-l`): default `200`

### Direct Execution without Wrapper

Alternatively, run the program directly with Singularity:

```bash
singularity exec --nv -e /work/gad/shared/bin/singularity_images/pycuda/pycuda_sam.1.1.sif python3 /work/gad/shared/analyse/test/cnvGPU/test_scalability/cnv_sv_caller_gpu.py -b <input_bamfile> -c <int or "ALL"> -w <int> -s <int> -z <float> -l <int> -o <output_cnv_file_vcf> -p <output_pairs_file> -m <output_splits_file> -e <logfile>
```

Example:

```bash
singularity exec --nv -e /work/gad/shared/bin/singularity_images/pycuda/pycuda_sam.1.1.sif python3 /work/gad/shared/analyse/test/cnvGPU/test_scalability/cnv_sv_caller_gpu.py -b example.bam -c ALL -w 100 -s 10 -z 1.5 -l 200 -o example_cnv.vcf -p example_pairs.tsv -m example_splits.tsv -e example.log
```

## Output Files

- **VCF File**: Structural variant calls, with the chromosome, position, variant type, copy number, and related
  information for each call.
- **Paired-Read Events**: Abnormal paired-end read alignments that may indicate structural variants.
- **Split-Read Events**: Split-read alignments, listed for further variant investigation.

## Dependencies

- Python 3.x
- CUDA-compatible GPU
- [NumPy](https://numpy.org/), [pysam](https://pysam.readthedocs.io/), [PyCUDA](https://documen.tician.de/pycuda/),
  [pandas](https://pandas.pydata.org/)

## License

This project is licensed under the MIT License.

## Acknowledgments

This tool was developed to support high-performance genomic analyses, using GPU acceleration to make CNV and SV
detection feasible on large datasets.
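
## Illustrative Sketch: Discordant Read Pairs

The Paired-Read Events output lists abnormal paired-end alignments, but this README does not state which criteria `cnv_sv_caller_gpu.py` applies. As a hedged illustration only, the standard-library sketch below applies three common discordant-pair heuristics. Mates on different chromosomes suggest a translocation. Mates on the same strand suggest an inversion. Mates farther apart than a cutoff suggest a deletion. The `ReadPair` type, the `classify_pair` function, and the 1,000-base cutoff are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """Hypothetical, pysam-free view of one mapped read and its mate."""
    chrom: str
    pos: int            # 0-based leftmost position of this read
    reverse: bool       # this read maps to the reverse strand
    mate_chrom: str
    mate_pos: int       # 0-based leftmost position of the mate
    mate_reverse: bool  # the mate maps to the reverse strand


def classify_pair(pair: ReadPair, max_distance: int = 1000) -> Optional[str]:
    """Return a tentative SV signal ("TRA", "INV", "DEL"), or None for a normal-looking pair.

    The rules are illustrative assumptions, checked in this order.
    """
    if pair.chrom != pair.mate_chrom:
        return "TRA"  # mates on different chromosomes
    if pair.reverse == pair.mate_reverse:
        return "INV"  # both mates on the same strand
    # Distance between mate start positions: a rough proxy for insert size.
    if abs(pair.mate_pos - pair.pos) > max_distance:
        return "DEL"
    return None


pairs = [
    ReadPair("chr1", 1000, False, "chr1", 1300, True),   # normal pair
    ReadPair("chr1", 1000, False, "chr1", 9000, True),   # mates too far apart
    ReadPair("chr1", 1000, False, "chr1", 1300, False),  # same strand
    ReadPair("chr1", 1000, False, "chr5", 5000, True),   # different chromosomes
]
print([classify_pair(p) for p in pairs])  # [None, 'DEL', 'INV', 'TRA']
```

With pysam, the `ReadPair` fields would map onto `AlignedSegment` attributes such as `reference_name`, `reference_start`, `is_reverse`, `next_reference_name`, `next_reference_start`, and `mate_is_reverse`. Production callers usually also check the expected forward/reverse orientation and derive the distance cutoff from the library's insert-size distribution.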