Filtering Annotations¶
Jannovar is first and foremost a program for annotating variants. It annotates variants with their predicted molecular impact and with their compatibility with different modes of inheritance.
These annotations can in turn be used for filtering variants, i.e., including or excluding variants based on some criteria.
Filtering functionality is not included in Jannovar itself, but filtering can easily be performed with bcftools (or even grep if you are brave).
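For the brave, the grep route mentioned above could look roughly like the following sketch. It assumes an uncompressed VCF in which the tag is written as `INHERITANCE=AR` with a single value. The file name `demo.vcf` and its shortened records are made up for illustration. A plain text match knows nothing about VCF structure, so treat this as a quick check rather than a robust filter.

```shell
# Build a tiny, hypothetical VCF (tab-separated) for demonstration only.
printf '%s\n' \
  '##fileformat=VCFv4.1' \
  '##INFO=<ID=INHERITANCE,Number=.,Type=String,Description="Mode of Inheritance">' \
  > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf '1\t866511\t.\tC\tCCCCT\t258.62\t.\tANN=CCCCT|coding_transcript_intron_variant;INHERITANCE=AR\n' >> demo.vcf
printf '1\t879317\t.\tC\tT\t150.77\t.\tANN=T|missense_variant\n' >> demo.vcf

# Keep all header lines (starting with '#') plus records carrying INHERITANCE=AR.
grep -E '^#|INHERITANCE=AR(;|[[:space:]]|$)' demo.vcf > demo.ar.vcf

# Invert: keep header lines and drop the records carrying INHERITANCE=AR.
grep -E -v '^[^#].*INHERITANCE=AR(;|[[:space:]]|$)' demo.vcf > demo.no_ar.vcf
```

Both output files keep the complete header, so they remain readable by downstream VCF tools. A record listing several modes (for example `INHERITANCE=AD,AR`) would not be matched by this pattern.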
Variant Filtration with BCFtools¶
Given an annotated VCF file, we can use the bcftools view command to filter the variants so that the output will
- contain only variants with a given annotation, or
- contain no variant with a given annotation.
We will filter the following annotated VCF file, small.vcf:
##INFO=<ID=INHERITANCE,Number=.,Type=String,Description="Mode of Inheritance">
##contig=<ID=1,length=249250621>
##jannovarCommand=annotate-v -d data/hg19_refseq.ser -i small.vcf -o small.jv.vcf
##jannovarVersion=0.17
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT individual
1 866511 rs60722469 C CCCCT 258.62 . ANN=CCCCT|coding_transcript_intron_variant|LOW|SAMD11|148398|transcript|NM_152486.2|Coding|4/13|c.305+42_305+43insCCCT|p.(%3D)|386/18841|306/2046|102/682||;INHERITANCE=AR GT:AD:DP:GQ:PL 1/1:6,5:11:14.79:300,15,0
1 879317 rs7523549 C T 150.77 . ANN=T|missense_variant|MODERATE|SAMD11|148398|transcript|XM_005244727.1|Coding|9/9|c.799C>T|p.(Arg267Cys)|1155/19962|799/1188|267/396|| GT:AD:DP:GQ:PL 0/1:14,7:21:99:181,0,367
1 879482 . G C 484.52 . ANN=C|missense_variant|MODERATE|SAMD11|148398|transcript|XM_005244727.1|Coding|9/9|c.964G>C|p.(Asp322His)|1320/19962|964/1188|322/396|| GT:AD:DP:GQ:PL 0/1:28,20:48:99:515,0,794
For example, we can limit the variants to those compatible with the autosomal recessive (AR) mode of inheritance.
$ bcftools view -i 'INHERITANCE[*] = "AR"' small.vcf
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=249250621>
##INFO=<ID=INHERITANCE,Number=.,Type=String,Description="Mode of Inheritance">
##bcftools_viewVersion=1.2+htslib-1.2.1
##bcftools_viewCommand=view -i 'INHERITANCE[*] = "AR"' small.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT individual
1 866511 rs60722469 C CCCCT 258.62 . ANN=CCCCT|coding_transcript_intron_variant|LOW|SAMD11|148398|transcript|NM_152486.2|Coding|4/13|c.305+42_305+43insCCCT|p.(%3D)|386/18841|306/2046|102/682||;INHERITANCE=AR GT:AD:DP:GQ:PL 1/1:6,5:11:14.79:300,15,0
Similarly, we can remove all variants compatible with this mode of inheritance.
$ bcftools view -e 'INHERITANCE[*] = "AR"' small.vcf
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=249250621>
##INFO=<ID=INHERITANCE,Number=.,Type=String,Description="Mode of Inheritance">
##bcftools_viewVersion=1.2+htslib-1.2.1
##bcftools_viewCommand=view -e 'INHERITANCE[*] = "AR"' small.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT individual
1 879317 rs7523549 C T 150.77 . ANN=T|missense_variant|MODERATE|SAMD11|148398|transcript|XM_005244727.1|Coding|9/9|c.799C>T|p.(Arg267Cys)|1155/19962|799/1188|267/396|| GT:AD:DP:GQ:PL 0/1:14,7:21:99:181,0,367
1 879482 . G C 484.52 . ANN=C|missense_variant|MODERATE|SAMD11|148398|transcript|XM_005244727.1|Coding|9/9|c.964G>C|p.(Asp322His)|1320/19962|964/1188|322/396|| GT:AD:DP:GQ:PL 0/1:28,20:48:99:515,0,794
The following shows how to limit the variants to those having a missense functional impact.
$ bcftools view -i 'ANN ~ "missense"' small.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=ANN,Number=1,Type=String,Description="Functional annotations:'Allele|Annotation|Annotation_Impact|Gene_Name|Gene_ID|Feature_Type|Feature_ID|Transcript_BioType|Rank|HGVS.c|HGVS.p|cDNA.pos / cDNA.length|CDS.pos / CDS.length|AA.pos / AA.length|Distance|ERRORS / WARNINGS / INFO'">
##INFO=<ID=INHERITANCE,Number=.,Type=String,Description="Mode of Inheritance">
##contig=<ID=1,length=249250621>
##jannovarCommand=annotate-v -d data/hg19_refseq.ser -i small.vcf -o small.jv.vcf
##jannovarVersion=0.17
##bcftools_viewVersion=1.2+htslib-1.2.1
##bcftools_viewCommand=view -i 'ANN ~ "missense"' small.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT individual
1 879317 rs7523549 C T 150.77 . ANN=T|missense_variant|MODERATE|SAMD11|148398|transcript|XM_005244727.1|Coding|9/9|c.799C>T|p.(Arg267Cys)|1155/19962|799/1188|267/396|| GT:AD:DP:GQ:PL 0/1:14,7:21:99:181,0,367
1 879482 . G C 484.52 . ANN=C|missense_variant|MODERATE|SAMD11|148398|transcript|XM_005244727.1|Coding|9/9|c.964G>C|p.(Asp322His)|1320/19962|964/1188|322/396|| GT:AD:DP:GQ:PL 0/1:28,20:48:99:515,0,794
Similarly, we could use -e instead of -i to invert the selection.
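The `ANN ~ "missense"` expression is a pattern match against the ANN string as a whole. When an exact comparison against the Annotation subfield (the second `|`-separated entry, per the ANN header description) is preferred, something along the lines of the awk sketch below may help. It is not part of Jannovar or bcftools. The file name `ann_demo.vcf` and its shortened records are hypothetical, and the sketch assumes an uncompressed VCF and only inspects the first annotation entry of each record.

```shell
# Hypothetical, shortened VCF (tab-separated); INFO is column 8.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > ann_demo.vcf
printf '1\t866511\t.\tC\tCCCCT\t258.62\t.\tANN=CCCCT|coding_transcript_intron_variant|LOW|SAMD11;INHERITANCE=AR\n' >> ann_demo.vcf
printf '1\t879317\t.\tC\tT\t150.77\t.\tANN=T|missense_variant|MODERATE|SAMD11\n' >> ann_demo.vcf

# Print header lines unchanged. For each record, locate the ANN entry in INFO
# and compare its second '|'-separated subfield (the Annotation) exactly.
awk -F '\t' -v effect='missense_variant' '
  /^#/ { print; next }
  {
    n = split($8, info, ";")
    for (i = 1; i <= n; i++) {
      if (info[i] ~ /^ANN=/) {
        split(substr(info[i], 5), ann, "|")
        if (ann[2] == effect) { print; break }
      }
    }
  }
' ann_demo.vcf > ann_demo.missense.vcf
```

Changing the comparison to `ann[2] != effect` would invert the selection, in the same spirit as swapping -i for -e.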