viola.Vcf

class Vcf(df_svpos, df_filters, odict_df_info, df_formats, odict_df_headers={}, metadata=None, patient_name=None)

Relational database-like object containing the SV position DataFrame, FILTER DataFrame, INFO DataFrames, FORMAT DataFrame, and HEADER DataFrames. An instance of this class holds the same information as a VCF file.

Variables
  • sv_count (int) – Number of SV records

  • table_list – List of names of all tables included in the object

  • ids – List of all SV ids.

  • patient_name – Patient name.

  • contigs – List of the contigs (chromosomes)
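
The variables listed above can be read directly from an instance. The following is a minimal sketch of a helper that gathers them into a dictionary. The helper name summarize_vcf is hypothetical, and the example runs against a stand-in object with made-up values (including the table names) rather than a real viola.Vcf, so it only illustrates attribute access.

```python
from types import SimpleNamespace


def summarize_vcf(vcf):
    """Collect the documented summary attributes of a Vcf-like object."""
    return {
        "patient_name": vcf.patient_name,
        "sv_count": vcf.sv_count,
        "n_tables": len(vcf.table_list),
        "n_contigs": len(vcf.contigs),
        "first_ids": list(vcf.ids)[:3],
    }


# Stand-in object, NOT a real viola.Vcf; all values are invented for illustration.
fake_vcf = SimpleNamespace(
    patient_name="patient1",
    sv_count=2,
    table_list=["positions", "filters"],  # hypothetical table names
    contigs=["chr1", "chr2"],
    ids=["sv1", "sv2"],
)

summary = summarize_vcf(fake_vcf)
print(summary)
```

With a real Vcf instance, the same helper would be called as summarize_vcf(vcf).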

Parameters
  • df_svpos (DataFrame) – DataFrame containing information such as position, strand, svtype, etc. Columns should be as follows: [‘id’, ‘chrom1’, ‘pos1’, ‘chrom2’, ‘pos2’, ‘strand1’, ‘strand2’, ‘ref’, ‘alt’, ‘qual’, ‘svtype’]. The primary key is ‘id’. ‘chrom1’ and ‘chrom2’ are foreign keys referencing the contigs_meta table.

  • df_filters (DataFrame) – DataFrame containing FILTER information, which is located in the 7th column of the VCF file. Columns of the input DataFrame should be as follows: [‘id’, ‘filter’]. The primary key is the combination of (‘id’, ‘filter’). These columns are foreign keys referencing the df_svpos and filters_meta tables, respectively.

  • odict_df_info (dict[str, DataFrame]) – OrderedDict of DataFrames that contain additional information on SV records (equivalent to the INFO field of a VCF file). Each item of the dictionary contains a single INFO. The dictionary key is the name of the INFO and should be in lowercase. Columns of each DataFrame should be as follows: [‘id’, ‘value_idx’, ‘infoname’]. The ‘value_idx’ column contains 0-origin indices of INFO values. This is important when one SV record has multiple values for an INFO (e.g. cipos). The primary key is the combination of (‘id’, ‘value_idx’), and ‘id’ is a foreign key referencing the df_svpos table.

  • df_formats (DataFrame) – DataFrame containing FORMAT information of the VCF file. Columns of the DataFrame should be as follows: [‘id’, ‘sample’, ‘format’, ‘value_idx’, ‘value’]. The primary key is the combination of (‘id’, ‘sample’, ‘format’). These columns are foreign keys referencing the df_svpos, samples_meta, and format_meta tables, respectively.
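
The table layouts described above can be sketched with plain pandas. The example below builds one small table per parameter and checks the stated key constraints. It does not call viola. All record values, the sample name, and the FORMAT name are made up. Naming the third INFO column after the INFO itself (‘cipos’) is an assumption about how the ‘infoname’ placeholder is meant to be read, and the helper is_unique_key is hypothetical.

```python
from collections import OrderedDict

import pandas as pd

# Two made-up SV records; every value here is illustrative only.
df_svpos = pd.DataFrame({
    "id": ["sv1", "sv2"],
    "chrom1": ["chr1", "chr2"],
    "pos1": [1000, 5000],
    "chrom2": ["chr1", "chr2"],
    "pos2": [2000, 9000],
    "strand1": ["+", "-"],
    "strand2": ["-", "+"],
    "ref": ["N", "N"],
    "alt": ["<DEL>", "<DUP>"],
    "qual": [None, None],
    "svtype": ["DEL", "DUP"],
})

df_filters = pd.DataFrame({"id": ["sv1", "sv2"], "filter": ["PASS", "PASS"]})

# One INFO table per key; cipos carries two values per record,
# so value_idx takes the values 0 and 1 for the same id.
odict_df_info = OrderedDict()
odict_df_info["cipos"] = pd.DataFrame({
    "id": ["sv1", "sv1"],
    "value_idx": [0, 1],
    "cipos": [-10, 10],  # assumed column name, taken from the INFO name
})

df_formats = pd.DataFrame({
    "id": ["sv1", "sv2"],
    "sample": ["tumor", "tumor"],  # hypothetical sample name
    "format": ["pr", "pr"],        # hypothetical FORMAT name
    "value_idx": [0, 0],
    "value": [20, 31],
})


def is_unique_key(df, cols):
    """Return True if the given columns identify each row uniquely."""
    return not df.duplicated(subset=cols).any()


checks = {
    "svpos": is_unique_key(df_svpos, ["id"]),
    "filters": is_unique_key(df_filters, ["id", "filter"]),
    "cipos": is_unique_key(odict_df_info["cipos"], ["id", "value_idx"]),
    "formats": is_unique_key(df_formats, ["id", "sample", "format"]),
}

# Foreign-key check: every id in df_filters must exist in df_svpos.
fk_ok = set(df_filters["id"]) <= set(df_svpos["id"])
print(checks, fk_ok)
```

Checks like these are cheap to run before passing the tables to the constructor, since a duplicated key would otherwise only surface later as unexpected rows.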

__init__(df_svpos, df_filters, odict_df_info, df_formats, odict_df_headers={}, metadata=None, patient_name=None)

Methods

__init__(df_svpos, df_filters, ...[, ...])

add_info_table(table_name, df)

Add a new INFO table to self.

annotate_bed(bed, annotation[, suffix])

Annotate SV breakpoints using Bed class object.

append_filters(base_df[, left_on])

Append filters to the right of the base_df, based on the SV id columns.

append_formats(base_df[, left_on])

Append formats to the right of the base_df, based on the SV id columns.

append_infos(base_df, ls_tablenames[, ...])

Append INFO tables to the right of the base_df, based on the SV id columns.
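
Conceptually, appending an INFO table to a base table amounts to turning the long (id, value_idx, value) layout into one column per value and joining it on the SV id. The sketch below shows that idea with plain pandas. It is not viola's implementation. The left join, the pivot step, and the resulting column names cipos_0 and cipos_1 are assumptions made for illustration, and the data is made up.

```python
import pandas as pd

# Base table keyed by SV id (illustrative values).
base_df = pd.DataFrame({"id": ["sv1", "sv2"], "svtype": ["DEL", "DUP"]})

# Long-format INFO table: sv1 has two cipos values, sv2 has none.
df_cipos = pd.DataFrame({
    "id": ["sv1", "sv1"],
    "value_idx": [0, 1],
    "cipos": [-10, 10],
})

# Long -> wide: one column per value_idx, indexed by SV id.
wide = df_cipos.pivot(index="id", columns="value_idx", values="cipos")
wide.columns = [f"cipos_{i}" for i in wide.columns]  # assumed naming scheme

# Left join keeps every row of base_df; records without the INFO get NaN.
result = base_df.merge(wide, how="left", left_on="id", right_index=True)
print(result)
```

A left join is used here so that SV records lacking the INFO stay in the output with missing values instead of being dropped.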

as_bedpe()

Convert Vcf object into Bedpe object.

breakend2breakpoint()

Convert a Vcf object into a breakpoint-based Vcf object by integrating the paired breakends (BND) and inferring their SVTYPE.

calculate_info(operation, name)

Calculate values of INFO tables according to the 'operation' argument and add a new INFO table as the result.

change_repr_config(key, value)

classify_manual_svtype(definitions, ...[, ...])

Classify SV records by user-defined criteria.

copy()

Return a copy of the instance.

drop_by_id(svid)

Remove the SV records specified in the "svid" argument.

filter(ls_query, query_logic)

Filter the Vcf object by a list of queries.

filter_by_id(arrlike_id)

Filter the Vcf object according to a list of SV ids.

get_feature_count_as_series(feature, ls_order)

Return counts of unique values as a pd.Series for the INFO specified in the "feature" argument.
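
The counting step described above can be pictured with plain pandas. The sketch below does not call viola. The table contents are made up, and it is an assumption that ls_order fixes the order of the returned index, with categories absent from the data counted as zero.

```python
import pandas as pd

# Illustrative INFO table for svtype, one value per SV record.
df_svtype = pd.DataFrame({
    "id": ["sv1", "sv2", "sv3", "sv4"],
    "value_idx": [0, 0, 0, 0],
    "svtype": ["DEL", "DUP", "DEL", "INV"],
})

# Desired order of categories in the output (assumed meaning of ls_order).
ls_order = ["DEL", "DUP", "INV", "TRA"]

# Count unique values, then reorder; categories not observed get a count of 0.
counts = df_svtype["svtype"].value_counts().reindex(ls_order, fill_value=0)
print(counts)
```

Fixing the order this way keeps counts from different samples aligned when they are later combined into one table.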

get_ids()

Return all SV ids as a set.

get_info(info_name)

Return the INFO specified in the argument as a pandas DataFrame object.

get_microhomology(fasta[, max_homlen])

Infer microhomology length and sequence at each breakpoint.

get_table(table_name)

Return the table specified in the argument as a pandas DataFrame object.

integrate(merged_vcf, priority)

Return an integrated Vcf object.

is_reciprocal()

merge(ls_vcf, ls_caller_names, threshold[, ...])

Return a merged or integrated Vcf object from multiple callers' Vcf objects in ls_vcf.

remove_info_table(table_name)

Remove an INFO table from self.

replace_svid(to_replace, value)

Rename the specified SV ID.

replace_table(table_name, table)

Replace an existing table with a new table.

set_value_for_info_by_id(table_name, sv_id, ...)

Set a value in the specified INFO table by sv_id.

to_bedpe(file_or_buf[, custom_infonames, ...])

Return a BEDPE file. Accepts file_or_buf, custom_infonames=[], add_filters, add_formats, and confidence_intervals (bool, default False).

to_bedpe_like([custom_infonames])

Return a DataFrame in bedpe-like format.

to_vcf(path_or_buf)

Return a VCF-formatted string.

to_vcf_like()

Return a VCF-formatted DataFrame.

view(custom_infonames, return_as_dataframe)

Quick view function of the Vcf object.

Attributes

contigs

Return a list of contigs (chromosomes) listed in the header of the VCF file.

ids

Return all SV ids as a list.

idx

patient_name

Return the name of the patient.

repr_config

Return the current configuration of the __repr__() function.

sv_count

Return the number of SV records.

table_list

Return a list of names of all tables in the object.