Annotating CSV Files
Sometimes, it is useful to annotate a whole CSV file where you have the chromosome, position, reference allele, and alternative allele in different columns.
You can do this using the annotate-csv command of Jannovar.
You have to pass the path to an annotation database file and the input file, together with the numbers of the columns that hold the chromosomal change. Jannovar then appends the HGVS annotation and the effect to the end of each line, in the given CSV format, and prints the result to standard output.
Suppose we have a tab-separated file with a header, named input.tsv:
contig position reference alt
chr1 12345 C A
chr1 12346 C A
Now we run Jannovar with this command and get this output:
# java -jar jannovar-cli-0.24.jar annotate-csv -d data/hg19_refseq.ser --input input.tsv -c 1 -p 2 -r 3 -a 4 --header --type TDF
[...]
contig position reference alt HGVS FunctionalClass
chr1 12345 C A DDX11L1:NR_046018.2:n.354+118C>A: NON_CODING_TRANSCRIPT_INTRON_VARIANT
chr1 12346 C A DDX11L1:NR_046018.2:n.354+119C>A: NON_CODING_TRANSCRIPT_INTRON_VARIANT
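To make the column options concrete, here is a minimal Python sketch of how 1-based column numbers such as -c 1 -p 2 -r 3 -a 4 pick the four fields out of each row. It is not Jannovar's implementation; the function name select_columns and its parameters are hypothetical and exist only for illustration.

```python
import csv
import io


def select_columns(text, chrom_col, pos_col, ref_col, alt_col, has_header=True):
    """Return (chromosome, position, ref, alt) tuples from tab-separated text.

    Column arguments are 1-based, mirroring the column numbers passed on the
    command line. This is an illustrative helper, not part of Jannovar.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    result = []
    for row in reader:
        if not row:
            continue  # ignore empty lines
        result.append((row[chrom_col - 1], int(row[pos_col - 1]),
                       row[ref_col - 1], row[alt_col - 1]))
    return result


# The example file from above, written out with explicit tab characters.
INPUT_TSV = (
    "contig\tposition\treference\talt\n"
    "chr1\t12345\tC\tA\n"
    "chr1\t12346\tC\tA\n"
)

changes = select_columns(INPUT_TSV, 1, 2, 3, 4)
print(changes)
```

If the columns were stored in a different order, only the four numbers would change; the rows themselves would not need to be rearranged.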
The format for the chromosomal change is as follows:
{CHROMOSOME} {POSITION} {REF} {ALT}
- CHROMOSOME
- name of the chromosome or contig
- POSITION
- position of the first changed base on the chromosome; in the case of insertions, the first base after the insertion; the first base on the chromosome has position 1
- REF
- the reference bases
- ALT
- the alternative bases
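The four fields described above can be modelled as a small record. The following Python sketch parses one whitespace-separated change and checks the 1-based position convention. The names ChromosomalChange and parse_change are hypothetical, and the sketch assumes that both REF and ALT are non-empty strings; it does not reproduce Jannovar's own parsing.

```python
from typing import NamedTuple


class ChromosomalChange(NamedTuple):
    """One change in the {CHROMOSOME} {POSITION} {REF} {ALT} layout."""
    chromosome: str
    position: int  # 1-based: the first base on the chromosome is 1
    ref: str
    alt: str


def parse_change(line: str) -> ChromosomalChange:
    """Parse a whitespace-separated change; illustrative only."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    chromosome, position, ref, alt = fields
    pos = int(position)
    if pos < 1:
        raise ValueError(f"positions are 1-based, got {pos}")
    return ChromosomalChange(chromosome, pos, ref, alt)


print(parse_change("chr1 12345 C A"))
```

Rejecting position 0 early catches input that was exported with 0-based coordinates before it reaches the annotation step.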
Right now, columns can only be selected by number, not by header name. This might be extended in the future.
Possible CSV file types are:
- Default
- Standard comma separated format, as for RFC4180 but allowing empty lines.
- TDF
- Tab-delimited format.
- RFC4180
- Comma separated format as defined by RFC4180.
- Excel
- Excel file format (using a comma as the value delimiter). Note that the actual value delimiter used by Excel is locale dependent; it might be necessary to customize this format to accommodate your regional settings.
- MySQL
- Default MySQL format. This is a tab-delimited format with an LF character as the line separator. Values are not quoted and special characters are escaped with \. The default NULL string is \\N.
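As a rough illustration of how the file types differ, the sketch below approximates them with options for Python's csv module. The FILE_TYPES table and the read_rows helper are hypothetical. Only the delimiters, the MySQL quoting and escaping, and the empty-line difference between Default and RFC4180 come from the descriptions above; the empty-line behaviour of the other types is an assumption. NULL handling is left out.

```python
import csv
import io

# Approximate stand-ins for the listed file types (assumptions, not
# Jannovar's actual configuration).
FILE_TYPES = {
    "Default": {"delimiter": ",", "skip_empty": True},
    "RFC4180": {"delimiter": ",", "skip_empty": False},
    "Excel": {"delimiter": ",", "skip_empty": False},
    "TDF": {"delimiter": "\t", "skip_empty": False},
    "MySQL": {"delimiter": "\t", "skip_empty": False,
              "quoting": csv.QUOTE_NONE, "escapechar": "\\"},
}


def read_rows(text, file_type):
    """Split text into rows using the options for the given file type."""
    options = dict(FILE_TYPES[file_type])
    skip_empty = options.pop("skip_empty")
    reader = csv.reader(io.StringIO(text, newline=""), **options)
    # csv.reader yields an empty list for a blank line.
    return [row for row in reader if row or not skip_empty]


print(read_rows("chr1\t12345\tC\tA\n", "TDF"))
```

With this table, a blank line in the input is dropped for Default but kept as an empty row for RFC4180, which is the one difference between the two that the descriptions state.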