Molecular communication: the information exchange inside our cells
Engineers have developed methods to transmit information efficiently and reliably so that two people, like Alice and Bob, can carry out conversations using various communication technologies. Using these tools, students can acquire knowledge that was traditionally exchanged between a teacher and a student face-to-face.
Communication is also essential in music: an orchestra conductor and the musicians make eye contact to create a rich, harmonious sound together.
Similar mechanisms exist inside our cells to regulate gene expression and function. In cells, communication between a gene and the sequences that regulate it occurs through physical contact. These physical contacts between regulatory elements called enhancers (enh) and gene promoters (prom) may activate gene expression. Our goal is to understand how this communication occurs within the nucleus.
We leverage 3D genome mapping technologies to capture the dynamic chromatin interactions and study gene regulation in the context of developmental biology. Algorithms on graphs allow us to interpret the data.
Dynamic chromatin interactions
Various protein factors, each with a distinct role, organize our genome within the nucleus. CTCF is a zinc finger protein that binds directly to DNA and forms structural loops with cohesin, a ring-like extruder. This process occurs over time, and quantifying its dynamics will help us understand the biophysical properties of these loops. Cohesin has also been found to link a distal enhancer element to its target gene promoter (TSS) and potentially to help RNA Polymerase II (RNAPII) transcribe.
CTCF, cohesin, RNAPII, and other protein factors cooperate to form dynamic chromatin interactions, which may regulate gene expression. We aim to study the molecular mechanisms by which protein factors orchestrate transcriptional activities.
Our ultimate goal is to answer a fundamental question in biology: how does one cell type become another? For example, embryonic stem cells form three germ layers (ectoderm, mesoderm, and endoderm), each of which differentiates into specific cell types ranging from neurons to cardiomyocytes. It is known that changes in the transcriptional landscape accompany differentiation in early development, yet the precise mechanisms controlling these transcriptional changes remain elusive.
By applying 3D genome mapping technologies to mouse embryonic stem cells, we can identify which chromatin interactions are correlated with gene expression and potentially "instruct" cells to become a specific cell type. In addition, we use hybrid mouse strains to determine whether traits are inherited from the maternal or the paternal genome.
Algorithms on graphs
Interpreting the data generated from high-throughput sequencing assays requires sophisticated algorithms and computational tools.
In particular, Hi-C, ChIA-PET, and their variants are based on proximity ligation and yield only pairwise interactions. The interaction frequency between any two of the genomic loci A, B, C, and D is measured by the number of sequencing reads connecting them. These genome-wide interaction maps can be converted into a weighted graph, where nodes (vertices) are genomic loci and each edge is weighted by the interaction frequency between the two loci it connects.
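To make the graph representation concrete, here is a minimal Python sketch. It assumes the sequencing reads have already been reduced to pairs of locus labels; the function name `build_interaction_graph` and the toy reads are hypothetical and are not taken from any particular Hi-C or ChIA-PET pipeline.

```python
from collections import Counter, defaultdict


def build_interaction_graph(read_pairs):
    """Build a weighted, undirected graph from pairwise contacts.

    read_pairs: iterable of (locus_a, locus_b) tuples, one per read.
    Returns a dict {locus: {neighbor: read_count}}.
    """
    edge_counts = Counter()
    for a, b in read_pairs:
        if a == b:
            continue  # ignore reads that connect a locus to itself
        # Sort the pair so (A, B) and (B, A) count as the same edge.
        edge_counts[tuple(sorted((a, b)))] += 1

    graph = defaultdict(dict)
    for (a, b), weight in edge_counts.items():
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


# Toy data (made up for illustration): five reads over loci A-D.
reads = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("A", "B")]
graph = build_interaction_graph(reads)
print(graph["A"])
```

In this toy example, three reads connect A and B, so that edge carries weight 3, while A and D share no read and therefore no edge. Real interaction maps would additionally need binning and normalization, which this sketch omits.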
By contrast, ligation-free methods such as ChIA-Drop, SPRITE, and GAM provide multi-way interactions that may involve two or more loci at once. These data naturally lend themselves to hypergraphs, where a single hyperedge can connect two or more vertices.
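A hypergraph can be sketched in a few lines of Python as well. The example below assumes each observed chromatin complex is given as a tuple of locus labels; `build_hypergraph`, `pairwise_projection`, and the toy complexes are hypothetical names and data chosen for illustration, not the output format of ChIA-Drop, SPRITE, or GAM.

```python
from collections import defaultdict
from itertools import combinations


def build_hypergraph(complexes):
    """Represent multi-way contacts as a hypergraph.

    complexes: iterable of tuples of loci observed together.
    Returns (hyperedges, incidence), where hyperedges is a list of
    frozensets and incidence maps each locus to the indices of the
    hyperedges that contain it.
    """
    # A hyperedge needs at least two distinct loci.
    hyperedges = [frozenset(c) for c in complexes if len(set(c)) >= 2]
    incidence = defaultdict(set)
    for idx, edge in enumerate(hyperedges):
        for locus in edge:
            incidence[locus].add(idx)
    return hyperedges, dict(incidence)


def pairwise_projection(hyperedges):
    """Collapse hyperedges into pairwise counts (loses multi-way information)."""
    pairs = defaultdict(int)
    for edge in hyperedges:
        for a, b in combinations(sorted(edge), 2):
            pairs[(a, b)] += 1
    return dict(pairs)


# Toy data (made up for illustration): four complexes over loci A-D.
complexes = [("A", "B", "C"), ("B", "D"), ("A", "B", "C", "D"), ("C",)]
hyperedges, incidence = build_hypergraph(complexes)
pairs = pairwise_projection(hyperedges)
print(len(hyperedges), incidence["B"], pairs[("A", "B")])
```

The projection shows what a pairwise view discards: after collapsing, the counts no longer reveal whether A, B, and C were ever observed together in a single complex, which is exactly the information a hyperedge preserves.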
We and others borrow this framework to develop algorithms that help us visualize, interpret, and predict the 3D genome mapping data.