CLI Tutorial

Viola provides command-line tools for several of its features.

Currently, the following functions are available:

  • VCF to BEDPE conversion

  • Feature matrix generation for SV signature analysis

  • SV signature extraction

Preparation

Several test VCF files are used in this tutorial.

Before going to the next step, download these files into your working directory.

$ curl -O 'https://raw.githubusercontent.com/dermasugita/ViolaDocs/main/docs/html/_static/tutorial.manta.vcf'
$ mkdir -p signature_analysis/vcf
$ curl https://raw.githubusercontent.com/dermasugita/ViolaDocs/main/docs/html/_static/signature_analysis/definitions.txt > signature_analysis/definitions.txt
$ for i in $(seq 1 3); do curl "https://raw.githubusercontent.com/dermasugita/ViolaDocs/main/docs/html/_static/signature_analysis/vcf/manta${i}.vcf" > signature_analysis/vcf/manta${i}.vcf; done

Overview

In the Python environment where Viola is installed (pyenv, virtualenv, pipenv, etc.), run viola without arguments.

$ viola

Usage: viola [OPTIONS] COMMAND [ARGS]...

Options:
--help  Show this message and exit.

Commands:
extract-signature
generate-feature-matrix  Generate feature matrix from VCF or BEDPE files.
vcf2bedpe                Convert a VCF file into a BEDPE file.

The viola command itself takes no option other than --help, so in practice the syntax is viola COMMAND [ARGS].

VCF to BEDPE

You can convert a VCF file into a BEDPE file with viola vcf2bedpe.

$ viola vcf2bedpe -h

Usage: viola vcf2bedpe [OPTIONS] [VCF]

Convert a VCF file into a BEDPE file.

A VCF argument is the path to the input VCF file.

Options:
--version        Show the version and exit.
--caller TEXT    The name of SV caller by which the input VCF was generated.
                [manta, delly, lumpy, gridss] could be acceptable (default,
                manta).

-i, --info TEXT  The names of INFO fields to return. To specify multiple
                INFO, separate them by commas. ex. --info SVTYPE,SVLEN,END

-f, --filter     If specified, FILTER field of the VCF files is included in
                output BEDPE.

-m, --format     If specified, FORMAT field of the VCF files is included in
                output BEDPE.

-h, --help       Show this message and exit.

Now let’s run vcf2bedpe on the example VCF file you downloaded (see Preparation).

$ viola vcf2bedpe --caller manta tutorial.manta.vcf

chrom1     start1       end1 chrom2     start2       end2     name score strand1 strand2
chr1   82550460   82550461   chr1   82554225   82554226    test1  None       +       -
chr1   22814216   22814217   chr1   92581131   92581132    test2  None       -       -
chr1   60567905   60567906   chr1   60675940   60675941    test3  None       +       -
chr1   69583189   69583190   chr1   69590947   69590948    test4  None       +       -
chr11  104534876  104534877  chr11  104536573  104536574    test5  None       +       -
chr11  111134696  111134697  chr17   26470494   26470495  test6_1  None       +       -
chr17   26470494   26470495  chr11  111134696  111134697  test6_2  None       -       +

By default, the result is written to stdout.
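vcf2bedpe takes one VCF per call. To convert several files, for example the three in signature_analysis/vcf, you can wrap the command in a script. The following is a minimal Python sketch, assuming viola is on your PATH; vcf2bedpe_command and convert_dir are hypothetical helpers, and only the options listed in the help above are used. Each call's stdout is saved to a .bedpe file, just like a shell redirect such as viola vcf2bedpe --caller manta tutorial.manta.vcf > tutorial.manta.bedpe.

```python
import subprocess
from pathlib import Path


def vcf2bedpe_command(vcf_path, caller="manta", extra_args=()):
    """Build the argument list for `viola vcf2bedpe` (hypothetical helper)."""
    # Options such as "--filter" go before the VCF path,
    # following the usage line: viola vcf2bedpe [OPTIONS] [VCF]
    return ["viola", "vcf2bedpe", "--caller", caller, *extra_args, str(vcf_path)]


def convert_dir(vcf_dir, out_dir, caller="manta", extra_args=()):
    """Convert every *.vcf in vcf_dir and save each stdout as out_dir/<name>.bedpe."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for vcf in sorted(Path(vcf_dir).glob("*.vcf")):
        target = out / (vcf.stem + ".bedpe")
        with open(target, "w") as fh:
            # Equivalent to `viola vcf2bedpe ... > target` in a shell;
            # check=True raises CalledProcessError if viola exits non-zero.
            subprocess.run(vcf2bedpe_command(vcf, caller, extra_args),
                           stdout=fh, check=True)
        written.append(target)
    return written
```

For example, convert_dir("signature_analysis/vcf", "signature_analysis/bedpe", extra_args=["--filter"]) would write manta1.bedpe, manta2.bedpe and manta3.bedpe.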

You can also add other VCF fields to the output: FILTER (--filter), INFO (--info), and FORMAT (--format).

$ viola vcf2bedpe --caller manta --filter tutorial.manta.vcf

chrom1     start1       end1 chrom2     start2       end2     name score strand1 strand2  MinSomaticScore   PASS
chr1   82550460   82550461   chr1   82554225   82554226    test1  None       +       -             True  False
chr1   22814216   22814217   chr1   92581131   92581132    test2  None       -       -             True  False
chr1   60567905   60567906   chr1   60675940   60675941    test3  None       +       -             True  False
chr1   69583189   69583190   chr1   69590947   69590948    test4  None       +       -            False   True
chr11  104534876  104534877  chr11  104536573  104536574    test5  None       +       -            False   True
chr11  111134696  111134697  chr17   26470494   26470495  test6_1  None       +       -             True  False
chr17   26470494   26470495  chr11  111134696  111134697  test6_2  None       -       +             True  False
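With --filter, each FILTER value becomes its own True/False column (here MinSomaticScore and PASS), and PASS is True only for records that passed all filters. If you only want passing calls, you can post-process the table. The minimal Python sketch below assumes the layout shown above (whitespace-separated columns with no spaces inside values); parse_table and pass_only are hypothetical helpers, not part of Viola.

```python
# The --filter output shown above, copied as plain text.
FILTER_OUTPUT = """\
chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2 MinSomaticScore PASS
chr1 82550460 82550461 chr1 82554225 82554226 test1 None + - True False
chr1 22814216 22814217 chr1 92581131 92581132 test2 None - - True False
chr1 60567905 60567906 chr1 60675940 60675941 test3 None + - True False
chr1 69583189 69583190 chr1 69590947 69590948 test4 None + - False True
chr11 104534876 104534877 chr11 104536573 104536574 test5 None + - False True
chr11 111134696 111134697 chr17 26470494 26470495 test6_1 None + - True False
chr17 26470494 26470495 chr11 111134696 111134697 test6_2 None - + True False
"""


def parse_table(text):
    """Split a whitespace-separated table (header line + rows) into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def pass_only(rows):
    """Keep records whose PASS column reads True, i.e. no failed filters."""
    return [row for row in rows if row.get("PASS") == "True"]


for row in pass_only(parse_table(FILTER_OUTPUT)):
    print(row["name"], row["chrom1"], row["start1"], row["chrom2"], row["start2"])
```

If you redirected the output to a file, pass that file's contents to parse_table instead of the embedded string.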

$ viola vcf2bedpe --caller manta --info SVTYPE,SVLEN tutorial.manta.vcf

chrom1     start1       end1 chrom2     start2       end2     name score strand1 strand2 svtype_0   svlen_0
chr1   82550460   82550461   chr1   82554225   82554226    test1  None       +       -      DEL     -3764
chr1   22814216   22814217   chr1   92581131   92581132    test2  None       -       -      INV  69766915
chr1   60567905   60567906   chr1   60675940   60675941    test3  None       +       -      DEL   -108034
chr1   69583189   69583190   chr1   69590947   69590948    test4  None       +       -      DEL     -7757
chr11  104534876  104534877  chr11  104536573  104536574    test5  None       +       -      DEL     -1696
chr11  111134696  111134697  chr17   26470494   26470495  test6_1  None       +       -      BND         0
chr17   26470494   26470495  chr11  111134696  111134697  test6_2  None       -       +      BND         0

$ viola vcf2bedpe --caller manta --format tutorial.manta.vcf

chrom1     start1       end1 chrom2     start2       end2     name score strand1 strand2  sample1_N_PR_0  sample1_N_PR_1  sample1_N_SR_0  sample1_N_SR_1  sample1_T_PR_0  sample1_T_PR_1  sample1_T_SR_0  sample1_T_SR_1
chr1   82550460   82550461   chr1   82554225   82554226    test1  None       +       -            21.0             0.0            10.0             0.0            43.0             4.0            15.0             3.0
chr1   22814216   22814217   chr1   92581131   92581132    test2  None       -       -            24.0             0.0             NaN             NaN            35.0             5.0             NaN             NaN
chr1   60567905   60567906   chr1   60675940   60675941    test3  None       +       -            23.0             0.0             NaN             NaN            44.0             6.0             NaN             NaN
chr1   69583189   69583190   chr1   69590947   69590948    test4  None       +       -            21.0             0.0             NaN             NaN            20.0            12.0             NaN             NaN
chr11  104534876  104534877  chr11  104536573  104536574    test5  None       +       -            22.0             0.0             NaN             NaN            57.0            14.0             NaN             NaN
chr11  111134696  111134697  chr17   26470494   26470495  test6_1  None       +       -            12.0             0.0             NaN             NaN            45.0             5.0             NaN             NaN
chr17   26470494   26470495  chr11  111134696  111134697  test6_2  None       -       +            12.0             0.0             NaN             NaN            45.0             5.0             NaN             NaN
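Judging from the header above, each FORMAT field becomes one column per sample and per value, named like sample1_T_PR_1 (sample name, FORMAT key, value index). In Manta output, PR and SR hold read counts for the reference and alternative alleles in that order, so index 1 is the alternative-allele support, and NaN marks a field that is absent for that record. The minimal Python sketch below totals the alternative-allele support of one sample for a record; split_format_column and alt_support are hypothetical helpers, and the split assumes the FORMAT key itself contains no underscore.

```python
import math

# Header and two rows copied from the --format output above.
HEADER = ("chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2 "
          "sample1_N_PR_0 sample1_N_PR_1 sample1_N_SR_0 sample1_N_SR_1 "
          "sample1_T_PR_0 sample1_T_PR_1 sample1_T_SR_0 sample1_T_SR_1").split()
TEST1 = dict(zip(HEADER, ("chr1 82550460 82550461 chr1 82554225 82554226 test1 None + - "
                          "21.0 0.0 10.0 0.0 43.0 4.0 15.0 3.0").split()))
TEST2 = dict(zip(HEADER, ("chr1 22814216 22814217 chr1 92581131 92581132 test2 None - - "
                          "24.0 0.0 NaN NaN 35.0 5.0 NaN NaN").split()))


def split_format_column(col):
    """Split e.g. 'sample1_T_PR_1' into ('sample1_T', 'PR', 1); None for BEDPE columns."""
    parts = col.rsplit("_", 2)
    if len(parts) != 3 or not parts[2].isdigit():
        return None  # e.g. chrom1, name, strand2
    sample, field, index = parts
    return sample, field, int(index)


def alt_support(record, sample, fields=("PR", "SR")):
    """Sum the alt-allele (index 1) counts of the given FORMAT fields for one sample."""
    total = 0.0
    for col, raw in record.items():
        parsed = split_format_column(col)
        if parsed is None:
            continue
        s, field, index = parsed
        if s == sample and field in fields and index == 1:
            value = float(raw)
            if not math.isnan(value):  # NaN: field absent for this record
                total += value
    return total


print(alt_support(TEST1, "sample1_T"), alt_support(TEST2, "sample1_T"))
```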

Feature Matrix Generation

A simple feature matrix can be generated with viola generate-feature-matrix.

$ viola generate-feature-matrix -h

Usage: viola generate-feature-matrix [OPTIONS] OUTPUT

Generate feature matrix from VCF or BEDPE files.

Options:
--version                       Show the version and exit.
--input-dir TEXT                The directory of input files. When
                                specified, the --files argument is disabled.

--input-files TEXT              The input files separeted by comma. When
                                specified, the --dir argument is disabled.

--input-files-id TEXT           The sample ID of input files separeted by
                                comma.

--format [vcf|bedpe]            File format. vcf or bedpe.
--caller [manta|delly|lumpy|gridss]
                                The name of SV caller by which the input VCF
                                was generated. This option can be specified
                                when --format=vcf.

--svtype-col-name TEXT          Name of the column of BEDPE files that
                                indicate SV type. If not specified, SV type
                                will be infered. This option can be
                                specified when --format=bedpe

--as-breakpoint                 Convert SVTYPE=BND records into breakpoint-
                                wise SV records and infer its SVTYPE. This
                                option is used when --format=vcf

--definitions TEXT              Path to the definition file of custom SV
                                class.

-h, --help                      Show this message and exit.
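Instead of --input-dir, you can list the input files explicitly with --input-files and give them sample IDs with --input-files-id. The minimal Python sketch below only builds the argument list, using the options in the help above; feature_matrix_command is a hypothetical helper, and it assumes the IDs are matched to the files by position, which the help text does not state explicitly. Commas inside file names would break the comma-separated lists.

```python
from pathlib import Path


def feature_matrix_command(vcf_files, output, definitions, caller="manta"):
    """Build a `viola generate-feature-matrix` call using --input-files (hypothetical helper)."""
    files = [str(f) for f in vcf_files]
    # Use each file name without its extension as the sample ID, e.g. manta1.vcf -> manta1.
    ids = [Path(f).stem for f in files]
    return [
        "viola", "generate-feature-matrix",
        "--input-files", ",".join(files),   # comma-separated, per the help text
        "--input-files-id", ",".join(ids),  # assumed to follow the same order as the files
        "--format", "vcf",
        "--caller", caller,
        "--as-breakpoint",
        "--definitions", str(definitions),
        str(output),                        # OUTPUT is the positional argument
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True).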

To run this command, you need a custom SV definition file. You should already have downloaded definitions.txt in Preparation.

These are the contents of definitions.txt. A detailed description of the syntax is available here.

name "smaller DEL"
0 SVTYPE == DEL
1 SVLEN > -100
logic 0 & 1

name "larger DEL"
0 SVTYPE == DEL
logic 0

name "smaller DUP"
0 SVTYPE == DUP
1 SVLEN < 100
logic 0 & 1

name "larger DUP"
0 SVTYPE == DUP
logic 0

name "smaller INV"
0 SVTYPE == INV
1 SVLEN < 100
logic 0 & 1

name "larger INV"
0 SVTYPE == INV
logic 0

name "translocation"
0 SVTYPE == TRA
logic 0
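The classes appear to be checked from top to bottom, with each SV assigned to the first class whose logic is satisfied: that would explain why "larger DEL" needs only SVTYPE == DEL, and SVs matching no class are presumably what the others column in the output counts. Deletions have negative SVLEN (see the --info output above), so SVLEN > -100 selects deletions shorter than 100 bp. The minimal Python sketch below re-implements this reading of the format for illustration only; it is not Viola's parser, it supports only the ==, > and < operators and the & and | connectives seen in definitions.txt, and all function names are hypothetical.

```python
import operator
import re

# Contents of definitions.txt as shown above.
DEFINITIONS = '''\
name "smaller DEL"
0 SVTYPE == DEL
1 SVLEN > -100
logic 0 & 1

name "larger DEL"
0 SVTYPE == DEL
logic 0

name "smaller DUP"
0 SVTYPE == DUP
1 SVLEN < 100
logic 0 & 1

name "larger DUP"
0 SVTYPE == DUP
logic 0

name "smaller INV"
0 SVTYPE == INV
1 SVLEN < 100
logic 0 & 1

name "larger INV"
0 SVTYPE == INV
logic 0

name "translocation"
0 SVTYPE == TRA
logic 0
'''

# Only the numeric comparisons that appear in definitions.txt.
COMPARE = {">": operator.gt, "<": operator.lt}


def parse_definitions(text):
    """Turn blank-line-separated class blocks into dicts with name/conditions/logic."""
    classes = []
    for block in re.split(r"\n\s*\n", text.strip()):
        cls = {"conditions": {}}
        for line in block.splitlines():
            key, rest = line.split(maxsplit=1)
            if key == "name":
                cls["name"] = rest.strip().strip('"')
            elif key == "logic":
                cls["logic"] = rest
            else:  # numbered condition: <id> <FIELD> <operator> <value>
                field, op, value = rest.split()
                cls["conditions"][key] = (field, op, value)
        classes.append(cls)
    return classes


def check(condition, record):
    """Evaluate one condition against a record such as {'SVTYPE': 'DEL', 'SVLEN': -3764}."""
    field, op, value = condition
    if field not in record:
        return False
    if op == "==":
        return str(record[field]) == value
    return COMPARE[op](float(record[field]), float(value))


def evaluate_logic(logic, results):
    """Evaluate e.g. '0 & 1' given {'0': True, '1': False}; only &, | and () allowed."""
    if re.sub(r"[\d\s&|()]", "", logic):
        raise ValueError(f"unsupported logic expression: {logic!r}")
    words = {"&": "and", "|": "or", "(": "(", ")": ")"}
    tokens = re.findall(r"\d+|[&|()]", logic)
    expr = " ".join(str(results[t]) if t.isdigit() else words[t] for t in tokens)
    # expr now contains only True/False, and/or and parentheses.
    return eval(expr, {"__builtins__": {}}, {})


def classify(record, classes):
    """Return the first class whose logic holds, or 'others' if none does."""
    for cls in classes:
        results = {cid: check(cond, record) for cid, cond in cls["conditions"].items()}
        if evaluate_logic(cls["logic"], results):
            return cls["name"]
    return "others"
```

For example, classify({"SVTYPE": "DEL", "SVLEN": -3764}, parse_definitions(DEFINITIONS)) gives "larger DEL". The record dicts are hypothetical; with --as-breakpoint, Viola converts SVTYPE=BND records and infers their SV type before counting, which this sketch does not attempt.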

The following command generates a feature matrix from the three test VCF files:

$ viola generate-feature-matrix --input-dir signature_analysis/vcf --format vcf --caller manta --as-breakpoint --definitions signature_analysis/definitions.txt output.tsv

######### output.tsv ##########
patients        smaller DEL     larger DEL      smaller DUP     larger DUP      smaller INV     larger INV      translocation   others
manta1  2       3       1       0       2       1       2       0
manta3  2       1       0       0       0       1       0       0
manta2  2       2       1       2       2       1       2       0

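To use the matrix in your own code, for instance before signature extraction, you can load output.tsv directly. The minimal Python sketch below assumes that the file is tab-separated with sample names in the first (patients) column, as the layout above suggests; read_feature_matrix and to_proportions are hypothetical helpers. Converting counts to per-sample proportions is shown only as a common preprocessing step, not as something Viola does for you. Note that the rows are not sorted by sample name (manta1, manta3, manta2), so sort them yourself if order matters.

```python
import csv


def read_feature_matrix(lines):
    """Read a tab-separated feature matrix into {sample: {sv_class: count}}.

    `lines` is any iterable of text lines, such as an open file.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    sv_classes = header[1:]  # the first column holds the sample names
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # int(float(...)) also accepts counts written as e.g. "2.0"
        matrix[row[0]] = {c: int(float(v)) for c, v in zip(sv_classes, row[1:])}
    return matrix


def to_proportions(matrix):
    """Divide each sample's counts by its total; a sample with no SVs stays all zero."""
    result = {}
    for sample, counts in matrix.items():
        total = sum(counts.values())
        result[sample] = {c: (n / total if total else 0.0) for c, n in counts.items()}
    return result
```

Usage: with open("output.tsv", newline="") as fh: matrix = read_feature_matrix(fh), then to_proportions(matrix).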
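Now you have a simple feature matrix: one row per sample (named after its VCF file) and one column per SV class in definitions.txt, plus an others column.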