1. What information is available in the dbInDel?

dbInDel is the first database presenting bona fide InDels in putative enhancers for 254 human and 21 murine cell line/tissue samples, which can be visualized interactively and downloaded. Along with the detected InDels, the database predicts transcription factors potentially recruited by binding motifs that the InDels create. In addition, dbInDel incorporates annotation of InDel-assigned genes, together with their expression profiles in normal and cancer cells/tissues as well as survival analysis. The discovery of enhancer-associated InDels provides a foundation for investigating the functional contributions of this class of variants across a broad spectrum of human diseases. More importantly, enhancer-associated InDels may reveal new ways to implicate potential targets for therapies and diagnostic approaches that empower precision medicine.

2. What samples are included in the dbInDel?

dbInDel catalogs the enhancer-associated InDels detected in 254 human and 21 murine cell line/tissue samples. Among the 254 human samples, 242 are cancer cells/tissues. Detailed sample information can be found on the Browse page, and statistical summaries of samples and InDels are displayed on the Statistics page.

3. What are the unique features of the dbInDel?

Although only a few driver noncoding mutations have been identified and validated (Horn et al., 2013; Huang et al., 2013; Mansour et al., 2014; Rahman et al., 2017), these findings strongly suggest that somatic noncoding mutations which contribute to tumor biology also occur in gene regulatory elements. Identification of InDels from sequencing reads of H3K27ac ChIP-Seq, which are generated predominantly from active regulatory domains, provides a direct link between the variants and their putative function in regulating gene expression. Importantly, dbInDel further analyzes binding motifs caused by InDels and predicts transcription factors potentially recruited, thereby providing mechanistic predictions of the functions of these enhancer-associated variants.

4. How to use the dbInDel?

4.1 Quick Search

On the home page of the dbInDel, users can type in a gene ID or symbol to perform a quick search for the corresponding InDel-assigned gene of interest. The quick search result is shown on the Search page.

4.2 Browse

On the Browse page, users can click or enter search terms to filter the samples included in the database. The left panel of the Browse page is organized for sample filtering based on species, biosample types and cancer types. The “Search” box can also be used to locate samples of interest. Clicking a sample name displays its InDel information, sorted by coverage-biasing InDels, as an interactive table. Clicking an InDel then opens a new tab showing sample information and a general description of the InDel-assigned gene, including: i) cross-links to a drug-gene interaction database; ii) potential roles in human cancers, integrated from the COSMIC database, a recent TCGA Pan-Cancer analysis entitled “Comprehensive Characterization of Cancer Driver Genes and Mutations”, and PubMed ID cross-links to GWAS studies. Notably, users can interactively visualize the enhancers and InDels based on sequencing reads of H3K27ac ChIP-Seq, together with the transcription factors potentially recruited because of binding motifs caused by InDels. In addition, gene expression profiles in normal and cancer cells/tissues and survival plots are displayed.

Example 1: Browse enhancer-associated InDels of TAL1 in Jurkat cell line

1.1 Users can click Browse → Human → Cell line → Blood, or Browse → Cancer Type → Leukemia, or simply enter "Jurkat" in the "Search" box. After selecting "Jurkat", all InDels for this cell line will be displayed.

Users can also filter insertions and deletions by entering "ins" or "del" in the 'In-Del' tab of the table.

1.2 After selecting or searching “TAL1”, users can see the following plot showing the enhancer-assigned gene TAL1, the location of the InDel and transcription factors which are recruited because of the InDel.

When zoomed in, the sequence of the reference genome (hg19/mm10) is displayed. In the “Mapping” frame, H3K27ac ChIP-Seq reads in bins at the TAL1 locus are shown and the InDel is marked in orange.

1.3 In the “Expression” frame, TAL1 gene expression profiles are shown across paired normal and tumor samples, and cancer cell lines.

1.4 In the “Survival” frame, Kaplan-Meier survival plots of TAL1 expression in 28 different tumor types are displayed.

4.3 Search

Users can also enter a gene symbol or a genomic region to find the corresponding InDels and genes of interest. On the result page, users can search further within each tab.

Moreover, users can select a cancer type of interest on the Search page, or enter a name in the "Cancer_type" tab on the result page.

4.4 Download

In the left panel of “Download”, samples can be filtered or searched by sample names. The table of InDels, “.bw” file and enhancers of the sample can be downloaded in the right panel.

4.5 Data Sources

A one-to-one mapping of each sample to PubMed ID(s) and GSM ID(s) is provided in Data Sources.

4.6 Statistics

The distribution of InDels across different cancer types is displayed.

5. Flowchart to detect and annotate enhancer-associated InDels

Genome sequencing has proven valuable for the identification of numerous somatic variants in coding DNA. However, small InDel variants in the non-coding genome, especially in enhancers, that may play functional roles in tumorigenesis and other disease settings are poorly understood. Moreover, current whole-genome analysis discards most of this class of variants because of computational complexity and burden.

Notably, enhancer-associated noncoding variants can be computationally identified from H3K27ac ChIP-Seq data in cells/tissues. The detailed process used in the dbInDel to detect enhancer-associated InDels, summarized below in a flowchart, is based on a seminal study with modifications (Abraham et al., 2017).

6. Abbreviations of cancer types

7. The rule used for an InDel variant naming

“chr1:47704968-47704968 ins CCGTTTCCTAAC” means the variant is on chromosome 1, at position 47704968, with an insertion of “CCGTTTCCTAAC”.

“chr1:47697695-47697695 del C” means the variant is on chromosome 1, at position 47697695, with a deletion of “C”.
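
For users who process downloaded InDel tables with scripts, the naming rule above can be split into its parts programmatically. The following is a minimal Python sketch, not part of dbInDel itself: the function name `parse_indel_name`, the `InDel` record and the regular expression are illustrative assumptions based only on the two example names shown above, so names that follow a different layout would be rejected.

```python
import re
from typing import NamedTuple


class InDel(NamedTuple):
    """Hypothetical record for one parsed InDel name (not a dbInDel type)."""
    chrom: str
    start: int
    end: int
    kind: str   # "ins" or "del"
    bases: str  # inserted or deleted sequence


# Assumed layout, inferred from the examples: "<chrom>:<start>-<end> <ins|del> <bases>"
_INDEL_NAME = re.compile(
    r"^(chr[0-9A-Za-z_]+):(\d+)-(\d+)\s+(ins|del)\s+([ACGTNacgtn]+)$"
)


def parse_indel_name(name: str) -> InDel:
    """Split an InDel name into chromosome, coordinates, type and bases."""
    match = _INDEL_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised InDel name: {name!r}")
    chrom, start, end, kind, bases = match.groups()
    return InDel(chrom, int(start), int(end), kind, bases.upper())
```

For example, `parse_indel_name("chr1:47697695-47697695 del C")` yields a record whose `kind` is `"del"` and whose `bases` is `"C"`; a string that does not fit the assumed layout raises `ValueError`.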

8. Contact us

If you have any questions regarding the usage of dbInDel or the interpretation of results, please do not hesitate to contact us. We always welcome suggestions on how to improve the database. Please send feedback to Dr. Moli Huang (huangml at suda dot edu dot cn). Thank you.

*Your information will not be used for any purpose.