There has been extraordinary progress in molecular biology during the 50-year span that began with the discovery of the DNA double helix and culminated with the nearly complete specification of our genetic inheritance at the level of DNA sequence. The bulk of the eukaryotic genome is packaged into nucleosome particles, each of which comprises an octamer with two copies of each of four core histones (H2A, H2B, H3, and H4), which wrap nearly two turns of DNA. Nucleosomes can be differentiated both by numerous post-translational histone modifications and by incorporation of histone variants, which can replace canonical histones to form nucleosomes with special roles and properties. In contrast to our understanding of genomes, the inheritance of differences in gene expression between cells and tissues and how they are mediated by histones and other chromatin proteins is poorly understood. To better understand inheritance that does not depend on DNA sequence, we apply genomic tools to the study of proteins of the epigenome: histones, transcription factors, nucleosome remodelers, and RNA polymerase II (RNAPII).
We have introduced genomic tools to probe the dynamic structure of the chromatin landscape and explore its relationship to gene regulation and centromere function. These tools include salt fractionation, a method to extract classically ‘active’ chromatin; CATCH-IT, a metabolic labeling strategy to directly measure nucleosome turnover; INTACT, a cell-type-specific nuclear purification method to determine chromatin differences between tissues; ORGANIC, a method for mapping native chromatin at base-pair resolution; 3'NT, a method for determining the last base added onto a nascent chain within the active site of RNAPII; TMP-seq, a method to map DNA torsion genome-wide; MNase X-ChIP-seq, a high-resolution cross-linked chromatin immunoprecipitation (ChIP) protocol for large insoluble complexes; ChEC-seq, an in situ alternative to ChIP; MINCE-seq, a metabolic labeling strategy for observing changes in nucleosomes and transcription factors (TFs) during DNA replication; H3Q85C chemical cleavage for precise genome-wide mapping of single nucleosomes and linkers in vivo; and CUT&RUN, an in situ alternative to ChIP using antibody-targeted tethered MNase to release DNA adjacent to proteins of interest, which has low background and high resolution, requires about one-tenth the sequencing depth of ChIP, and permits profiling with low cell numbers.
Transcription through Nucleosomes
We have used 3'NT to address a long-standing question in the transcription field: How do RNA polymerases overcome nucleosome barriers in vivo? By comprehensively mapping the positions of elongating and arrested RNAPII using 3'NT, we found that nucleosomes are barriers to RNAPII elongation at essentially all genes, with the nucleosome downstream of the transcriptional start site the strongest barrier. The histone variant H2A.Z is enriched in this nucleosome, and we found that it acts to reduce nucleosome barrier strength. One potential mechanism for overcoming the nucleosome barrier to transcription is to mobilize nucleosomes by ATP-dependent remodelers. Using MNase X-ChIP-seq, we found that the Chd1 remodeler is recruited to promoters of mouse genes, where it causes nucleosomes to turn over during transcription and allows RNAPII to escape into the gene body. Another potential mechanism for overcoming the nucleosome barrier to transcription is the DNA torsional stress created by RNA polymerase transit, which can unwrap and destabilize nucleosomes. Using TMP-seq, we found that inhibiting topoisomerases results in both increased torsion measured at high resolution and increased turnover of nucleosomes, confirming this mechanism in vivo. We applied H4S47C-anchored cleavage mapping to identify asymmetric nucleosomes flanking budding yeast promoters that are evidently intermediates in nucleosome remodeling. We also found that compounds used in standard chemotherapy that intercalate between the bases and potentially generate torsional stress also enhance nucleosome turnover associated with transcription, suggesting a chromatin-based mechanism for cell killing by these drugs. Taken together, our findings provide a mechanistic framework for transcription through a nucleosome in vivo.
Nucleosomes are disrupted during transcription and other active processes, but the structural intermediates during nucleosome disruption in vivo are unknown. To identify intermediates, we mapped subnucleosomal protections in Drosophila cells using micrococcal nuclease (MNase) digestion followed by sequencing. At the first nucleosome position downstream of the transcription start site, we identified unwrapped intermediates, including hexasomes that lack either proximal or distal contacts. Inhibiting topoisomerases or depleting histone chaperones increased unwrapping, whereas inhibiting release of paused RNAPII or reducing RNAPII elongation decreased unwrapping. Our results indicate that positive torsion generated by elongating RNAPII causes transient loss of histone-DNA contacts. Using this mapping approach, we found that nucleosomes flanking human CTCF insulation sites are similarly disrupted. We also identified diagnostic subnucleosomal particle remnants in cell-free human DNA data as a relic of transcribed genes from apoptosing cells. Thus, identification of subnucleosomal fragments from nuclease protection data represents a general strategy for structural epigenomics.
We have used MINCE-seq to characterize the genome-wide location of nucleosomes and other chromatin proteins behind replication forks at high temporal and spatial resolution. We found that the characteristic chromatin landscape at Drosophila promoters and enhancers is lost upon replication. The most conspicuous changes are at promoters that have high levels of RNAPII stalling and DNA accessibility and show specific enrichment for the BRM remodeler. Enhancer chromatin is also disrupted during replication, suggesting a role for TF competition in nucleosome re-establishment. Thus, the characteristic nucleosome landscape emerges from a uniformly packaged genome by the action of TFs, RNAPII, and remodelers minutes after replication fork passage. MINCE-seq thus provides a first glimpse into the dynamic processes that establish and maintain the chromatin landscape every cell generation.
A class of histone variants in which we have a long-standing interest mediates chromosome segregation. Centromere-specific histone H3 variants, called cenH3, CENP-A (in mammals), or Cse4 (in yeast), mark the location of the kinetochore, which attaches to microtubules to segregate chromosomes in mitosis and meiosis. We previously showed that cenH3 nucleosomes of budding yeast wrap DNA to form positive supercoils, in contrast to conventional nucleosomes, which form negative supercoils. More recently, we precisely characterized this nucleosome in vivo and in vitro. We used ORGANIC and V-plot analysis to show that the ~120 bp budding yeast centromere consists of a particle containing cenH3 and H2A wrapped by the ~90 percent AT-rich ~80 bp central DNA segment (CDEII). This supports a hemisome model in which a core containing one each of the four histones is wrapped by CDEII. We also produced stable cenH3-H4-H2A-H2B hemisomes in vitro by reconstitution with a 78 bp CDEII DNA duplex. To precisely delineate the organization of the particle wrapped by CDEII, we applied H4S47C-anchored cleavage mapping, which converts histone H4 into a cleavage reagent, thus revealing the precise position of histone H4 in every nucleosome in the genome. We found that a single core structure is compatible with centromere cleavage patterns and distances; in this structure, oppositely oriented cenH3-H4-H2A-H2B hemisomes occupy one of two rotationally phased positions on each of the 16 yeast centromeres at similar frequencies within the population.
In contrast, H4S47C-anchored cleavage mapping of centromeric nucleosomes in fission yeast indicates that these nucleosomes have two H4 molecules, suggesting conventional octameric nucleosomes. In fission yeast, cenH3 nucleosomes inhabit the central domain that is nearly devoid of H3 nucleosomes, but they are not consistently positioned and are variably spaced, with no preferred positions for assembly of inner kinetochore proteins.
In repeat-based centromeres of plants and animals, satellite sequences position the cenH3 nucleosomes, which are translationally and rotationally phased, possibly contributing to kinetochore stability. Applying new experimental and computational tools, we have begun to elucidate the molecular organization of animal and plant centromeres embedded in homogeneous satellite repeats, which have proven intractable to current mapping strategies. Using hierarchical clustering of sequences immunoprecipitated by kinetochore proteins of the constitutive centromere-associated network (CCAN), we found that a unique chromatin complex occupies young dimeric α-satellite arrays that dominate functional human centromeres, with two positioned CENP-A (cenH3) nucleosomes that each protect about 100 bp separated by a linker that contains the CENP-B box, bound by CENP-B. CENP-T appears to occupy this linker, forming a complex with two CENP-A nucleosomes and CENP-C.
However, the low-salt conditions typically used in ChIP leave more than 80% of kinetochore proteins insoluble, raising the possibility that we were looking at a structurally distinct fraction of CCAN complexes. Indeed, CCAN complexes extracted by native ChIP in high salt yielded heterogeneous larger fragments of ~100-450 bp. By combining classical salt fractionation of chromatin with CUT&RUN (CUT&RUN.Salt), in which MNase is tethered and does not nibble or cut particle fragments internally, we observed primarily fragments of ~160-185 bp with a smaller peak at ~340 bp, regardless of salt solubility. Low-salt fragments were less enriched for CENP-B, suggesting that CENP-B contributes to stability, and both CENP-B box density and match to the CENP-B box consensus sequence correlated with the efficiency of CCAN formation on α-satellite dimeric arrays. Surprisingly, we found a diversity of CCAN structures on neighboring dimers that diverged by as little as 5%, with sharply different occupancies on some adjacent monomers, and with differences in orientation of the complex relative to the CENP-B box.
Despite the conserved function of centromeres, necessary at every cell division, centromere sequences are not conserved and show a surprising diversity in both sequence and organization. Centromeres range from the point centromeres of budding yeast, with a single cenH3 nucleosome and a consensus sequence on each chromosome, to the regional centromeres of fission yeast and many other single-cell eukaryotes that encompass a few kilobases of often unique sequence, to the tandem repeat centromeres of most animals and plants that span megabases of DNA, to holocentric centromeres, in which attachment to the spindle apparatus spans the entire length of the chromosome. Even sibling species such as Drosophila melanogaster and D. simulans have remarkably different centromere sequences. In D. melanogaster, centromeres comprise a few families of short 5- and 10-bp repeats, whereas D. simulans centromeres comprise a complex family of diverged 500-bp repeats. Comparison of the quantity of each repeat found across related Drosophila species indicates that individual centromere repeat families are expanding in the lineages where they serve as centromeres. This is consistent with the centromere drive model, which predicts that tandem repeats will compete in female meiosis for inclusion in the egg, the only one of four meiotic products that will be passed on. If an expanded satellite array can attract more CENP-A nucleosomes, it may become favorably oriented to be passed into the egg. The recent expansion of 5- and 10-bp repeats in D. melanogaster suggests that rotational phasing of nucleosomes may be advantageous. Because a complete turn of the DNA double helix comprises about ten base pairs, a 10-bp repeat will always present the same face to the histone octamer, and the AA dinucleotides in each repeat reduce the energy of wrapping and stabilize the nucleosome. Such rotational phasing has been observed in vivo in rice.
Rotational phasing of nucleosomes may stabilize centromeres against the pulling forces of the spindle in anaphase and thereby favor their own inclusion in the egg.
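The rotational-phasing argument can be checked with a little arithmetic. The sketch below is illustrative only: the function name `rotational_phases` and its parameters are hypothetical, and the helical repeat defaults to the round figure of ten base pairs per turn used above. It computes the angle around the helix at which the first base of each successive repeat copy faces.

```python
def rotational_phases(repeat_len, n_copies, helical_repeat=10):
    """Return the helical angle (degrees) of the first base of each repeat copy.

    helical_repeat is the number of base pairs per turn of the double helix;
    the default of 10 is the round figure used in the text (an assumption,
    not a measured value).
    """
    phases = []
    for copy in range(n_copies):
        position = copy * repeat_len            # offset of this copy in bp
        fraction = (position % helical_repeat) / helical_repeat
        phases.append(round(fraction * 360.0, 1))
    return phases


# A 10-bp repeat returns to the same face of the helix in every copy.
print(rotational_phases(10, 4))   # [0.0, 0.0, 0.0, 0.0]
# A 5-bp repeat alternates between two opposite faces.
print(rotational_phases(5, 4))    # [0.0, 180.0, 0.0, 180.0]
# A 7-bp repeat drifts around the helix from copy to copy.
print(rotational_phases(7, 4))
```

Under these assumptions, only repeat lengths that are multiples of the helical repeat keep every copy at the same angle, which is the sense in which a 10-bp repeat always presents the same face to the histone octamer.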
Independent of centromere drive, centromeres appear to be selected for their content of non-B form DNA. Non-B form DNA regions can be detected by Permanganate/S1 nuclease sequencing, and have been found to be abundant in both human α-satellite and mouse major and minor satellites from activated B-cells. Cruciform extrusion is a form of non-B DNA that is promoted by short (<10 bp) dyad symmetries, which are widespread at centromeres throughout the eukaryotic domain, including satellite centromeres of primates, mouse, horse, chicken, stickleback, and plants; regional centromeres of fission yeast; and point centromeres of budding yeasts.
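To make the notion of a short dyad symmetry concrete, here is a minimal sketch of an exact-match scan for inverted repeats, in which a short arm is followed, after an optional spacer, by its reverse complement. It is not the analysis used in the work described here. The function names and the `min_arm` and `max_spacer` defaults are hypothetical choices for illustration; only the upper arm bound of 9 bp follows from the "<10 bp" figure above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_dyad_symmetries(seq, min_arm=4, max_arm=9, max_spacer=4):
    """Return (start, arm_length, spacer_length) for each exact inverted repeat.

    min_arm and max_spacer are illustrative assumptions; max_arm=9 reflects
    the "short (<10 bp)" dyad symmetries mentioned in the text.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for arm in range(min_arm, max_arm + 1):
            left = seq[start:start + arm]
            if len(left) < arm:
                break                            # ran off the end of the sequence
            target = reverse_complement(left)
            for spacer in range(max_spacer + 1):
                right_start = start + arm + spacer
                if seq[right_start:right_start + arm] == target:
                    hits.append((start, arm, spacer))
    return hits


# Toy sequence: arm GGATC, 3-bp spacer AAA, then the reverse complement GATCC.
example = "TTGGATCAAAGATCCTT"
print(find_dyad_symmetries(example))
```

Each reported hit is a candidate stem-and-loop: the two arms could pair within a strand to form the stem of a cruciform, with the spacer as the loop. Whether extrusion actually occurs depends on factors such as supercoiling that this scan does not model.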
Although the mechanism whereby non-B DNA forms at centromeres is unknown, we have proposed two hypotheses. One is that the 4-way junctions of cruciforms are bound by Holliday junction binding activity of HJURP and its Scm3 ortholog in yeast, whereupon HJURP would load CENP-A/H4. Alternatively, non-B DNA might result from transcriptional initiation, where melting of DNA is required for engagement of Pol II, and from Pol II elongation, which moves the denaturation bubble forward. These hypotheses are not mutually exclusive, and in both cases the enigmatic CENP-B sequence-specific DNA binding protein may play a role. Centromeres that are not predicted to be enriched for non-B form DNA, such as our own, have DNA-binding proteins like CENP-B or Reb1 that induce sharp bends in DNA, which may serve the same function of initiating non-B form DNA to relieve the stress of accommodating a 60° bend in the DNA. Given the enrichment of non-B form DNA at centromeres throughout the eukaryotic domain, it seems likely that this feature of centromeres can provide a basis for centromere specification despite the lack of primary sequence conservation.