# GAD STR expansion detection pipeline

 - Author:  Anne-Sophie Denommé-Pichon
 - Version: 0.0.1
 - Licence: AGPLv3
 - Description: this pipeline genotypes short tandem repeats (STRs) at the specified locus in short-read genomes using ExpansionHunter, Tredparse and GangSTR. It collects the genotypes called by each tool and identifies STR expansions with 3 outlier detection methods that highlight abnormal repeat counts.

## Setup

 - Fill in the configuration file `config.sh`. An example is provided in the repository.
 - Create `samples.list`, which lists the BAM file names without the `.bam` extension. An example is provided in the repository.
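
If the BAM files sit in a single directory, `samples.list` can be generated rather than typed by hand. The sketch below is only an illustration: the helper name `write_samples_list` and the one-name-per-line layout are assumptions, so check the example `samples.list` in the repository for the expected format.

```python
from pathlib import Path


def write_samples_list(bam_dir, out_path="samples.list"):
    """Write the BAM file names found in bam_dir, without the .bam extension.

    Hypothetical helper, not part of the pipeline. Assumes one sample
    name per line.
    """
    # Path.stem drops only the final suffix: "sampleA.bam" -> "sampleA"
    names = sorted(p.stem for p in Path(bam_dir).glob("*.bam"))
    Path(out_path).write_text("".join(f"{name}\n" for name in names))
    return names
```

For example, `write_samples_list("/path/to/bams")` writes `samples.list` in the current directory and returns the list of sample names.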

## Usage

For now, the scripts have to be launched from the directory where the repository was cloned.

### Calling STRs

Launch `launch_pipeline.sh`:

```sh
nohup ./launch_pipeline.sh samples.list &
```

Dependencies:
 - `config.sh`
 - `samples.list`
 - `pipeline.sh`
 - `wrapper_delete.sh`
 - `wrapper_ehdn.sh`
 - `wrapper_expansionhunter.sh`
 - `wrapper_gangstr.sh`
 - `wrapper_transfer.sh`
 - `wrapper_tredparse.sh`

### Identifying outliers 

To highlight abnormal repeat counts, the pipeline identifies outliers using 3 methods. A repeat count at a given locus is flagged if it is:
 1. above the normal range (in the gray zone or the pathological zone),
 2. above the 99th percentile, or
 3. at least 4 standard deviations above the mean (Z-score ≥ 4).
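
The three criteria can be sketched as follows. This is a minimal illustration, not the code in `str_outliers.py`: the function names, the `normal_max` parameter (the upper bound of the normal range, which presumably comes from a file such as `patho.csv`), the linear-interpolation percentile and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in [0, 100])."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def flag_outliers(counts, normal_max, z_cutoff=4.0, pct=99):
    """Flag samples whose repeat count at one locus looks abnormal.

    counts: dict mapping sample name -> repeat count (hypothetical layout).
    normal_max: largest repeat count still considered normal (assumption).
    Returns a dict mapping each flagged sample to the criteria it met.
    """
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    threshold = percentile(values, pct)

    flagged = {}
    for sample, n in counts.items():
        reasons = []
        if n > normal_max:  # method 1: gray or pathological zone
            reasons.append("above_normal")
        if n > threshold:  # method 2: above the 99th percentile
            reasons.append("above_percentile")
        if sigma > 0 and (n - mu) / sigma >= z_cutoff:  # method 3: Z-score
            reasons.append("zscore")
        if reasons:
            flagged[sample] = reasons
    return flagged
```

With 99 samples at 20 repeats, one sample at 80 repeats and `normal_max=35`, only the sample at 80 repeats is flagged, and it meets all three criteria.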

Launch `launch_results.sh`:

```sh
nohup ./launch_results.sh samples.list &
```

Dependencies:
 - `config.sh`
 - `samples.list`
 - `patho.csv`
 - `getResults.py`
 - `launch_str_outliers.sh`
 - `str_outliers.py`

## Future work

Another tool, ExpansionHunter DeNovo, will be added to the pipeline.