A map of human PRDM9 binding provides evidence for novel behaviors of PRDM9 and other zinc-finger proteins in meiosis

Version of Record

Accepted for publication after peer review and revision.

Download
Cite
Share
CommentOpen annotations (there are currently 0 annotations on this page).

Version of Record published: November 28, 2017 (This version)
Accepted Manuscript published: October 26, 2017 (Go to version)
Accepted: October 24, 2017
Received: May 5, 2017

1. Related to
ZCWPW1 is recruited to recombination hotspots by PRDM9 and is essential for meiotic double strand break repair

Daniel Wells, Emmanuelle Bitoun ... Simon R Myers

Research Article Sep 16, 2020
Further reading

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Appendix 1
Appendix 2
Data availability
References
Article and author information
Metrics

Abstract

PRDM9 binding localizes almost all meiotic recombination sites in humans and mice. However, most PRDM9-bound loci do not become recombination hotspots. To explore factors that affect binding and subsequent recombination outcomes, we mapped human PRDM9 binding sites in a transfected human cell line and measured PRDM9-induced histone modifications. These data reveal varied DNA-binding modalities of PRDM9. We also find that human PRDM9 frequently binds promoters, despite their low recombination rates, and it can activate expression of a small number of genes including CTCFL and VCX. Furthermore, we identify specific sequence motifs that predict consistent, localized meiotic recombination suppression around a subset of PRDM9 binding sites. These motifs strongly associate with KRAB-ZNF protein binding, TRIM28 recruitment, and specific histone modifications. Finally, we demonstrate that, in addition to binding DNA, PRDM9's zinc fingers also mediate its multimerization, and we show that a pair of highly diverged alleles preferentially form homo-multimers.

https://doi.org/10.7554/eLife.28383.001

eLife digest

Human cells have two copies of each chromosome: one from the mother, and one from the father. When cells divide to form sex cells, such as sperm or egg cells, the maternal and paternal chromosomes line up next to each other and swap some of their DNA. This process, known as genetic recombination, creates different versions of genes and ensures that we are all unique – or genetically diverse.

Recombination is a complex process that is largely controlled by a protein called PRDM9. This protein binds DNA at particular spots on the chromosome and directs other proteins to carry out recombination nearby. However, not all of PRDM9’s binding sites are known, and not all regions that PRDM9 binds to undergo recombination. Until now, it was not understood why this is the case at fine scales.

To investigate this further, Altemose et al. activated the human version of PRDM9 in human kidney cells grown in the laboratory. The results showed that PRDM9 often bound near the start sites of genes, although these regions rarely undergo recombination in humans. When PRDM9 bound near these sites, it sometimes turned the gene on, which suggests that it may also help to regulate the activity of genes.

Moreover, a specific group of DNA-binding proteins, called KRAB-ZNF proteins, appear to suppress recombination wherever they bind, which explains why some PRDM9 binding sites do not recombine. Lastly, Altemose et al. discovered that the part of PRDM9 that binds to DNA can also bind to other copies of PRDM9 proteins. This self-binding ability might play a role in bringing together the maternal and paternal chromosomes at the correct spots during recombination.

Together, these results shed new light on the recombination process, which is a driving force in the formation of new species and essential for fertility. A next step will be to study these results further in tissues of the reproductive organs. This will provide a better understanding of the forces that shape human evolution.

https://doi.org/10.7554/eLife.28383.002

Introduction

In humans and mice, PRDM9 determines the locations of meiotic recombination hotspots (Baudat et al., 2010; Myers et al., 2010; Parvanov et al., 2010). PRDM9 is expressed early in meiotic prophase (Sun et al., 2015), during which its C2H2 Zinc-Finger (ZF) domain binds DNA at particular motifs and its PR/SET domain trimethylates surrounding histone H3 proteins at lysine 4 (H3K4me3; Hayashi et al., 2005) and at lysine 36 (H3K36me3; Wu et al., 2013; Eram et al., 2014; Powers et al., 2016; Davies et al., 2016; Grey et al., 2017; Yamada et al., 2017). At a subset of PRDM9 binding sites, SPO11 is recruited to form Double Strand Breaks (DSBs) (Neale and Keeney, 2006; Smagulova et al., 2011). These DSBs undergo end resection and the resulting single-stranded DNA ends are decorated with the meiosis-specific protein DMC1 (Neale and Keeney, 2006).

In vivo experiments to date have mapped the locations of intermediate events in recombination by performing Chromatin ImmunoPrecipitation with high-throughput sequencing (ChIP-seq) against the H3K4me3 mark and the DMC1 mark in testis tissue from mice and humans (Baker et al., 2014; Smagulova et al., 2011; Brick et al., 2012; Pratto et al., 2014; Davies et al., 2016), or by sequencing DNA fragments that remain attached to SPO11 after DSB formation in mice (Lange et al., 2016). Recent studies have also published direct PRDM9 ChIP-seq results using a custom antibody in mouse testes (Baker et al., 2015a; Walker et al., 2015; Grey et al., 2017). To study the DNA-binding properties of mouse PRDM9, one study sequenced genomic DNA fragments bound in vitro by recombinant proteins containing only the PRDM9 ZF array (Walker et al., 2015). In humans, recombination hotspots identified by DMC1 mapping and by Linkage Disequilibrium (LD) mapping have enabled the discovery of human PRDM9 binding motifs (Myers et al., 2008, 2010; Hinch et al., 2011; Pratto et al., 2014; Davies et al., 2016). However, these published motifs are neither sufficient nor necessary to predict genome-wide PRDM9 binding, DSB formation, or recombination events (Myers et al., 2010; Pratto et al., 2014), and it has been suggested that binding might be influenced by chromatin features in cis (Walker et al., 2015). Moreover, not all PRDM9 binding sites become hotspots (Baker et al., 2014; Grey et al., 2017), and the reasons for this remain unclear. In particular, apart from PRDM9 motifs themselves, there are no specific DNA sequence features that have been shown to modulate recombination rate in cis in mammals.

The H3K4me3 mark has been associated with meiotic recombination initiation in budding yeast (Borde et al., 2009), which lack PRDM9, as well as in PRDM9 knockout mice (Brick et al., 2012). Recent work has suggested that this histone mark is bound by CXXC1, a protein that also binds to PRDM9’s KRAB domain and to the axis-associated protein IHO1 (Imai et al., 2017). Because the H3K4me3 mark is also found at active gene promoters (Santos-Rosa et al., 2002), PRDM9 has been hypothesized to play a role in meiotic gene regulation, in addition to its role in initiating recombination (Hayashi et al., 2005; Mihola et al., 2009). In fact, PRDM9 was shown to activate transcription in a reporter gene assay (Hayashi et al., 2005), and its SET domain has been shown to de-repress a subset of genes when tethered to their promoters (Cano-Rodriguez et al., 2016). However, recent experiments demonstrate full fertility in transgenic mice with completely remodeled PRDM9 binding landscapes (Baker et al., 2014; Davies et al., 2016), suggesting that PRDM9 has no essential role in gene activation. This does not preclude the possibility that PRDM9 may play a secondary gene regulatory role in meiosis. PRDM9 has also been shown to bind to itself and form multimers in transfected cells, while maintaining its ability to bind DNA and trimethylate histones (Baker et al., 2015b). However, it is not known which domains of PRDM9 mediate this multimer formation activity nor whether PRDM9 allelic variation impacts multimerization.

To investigate the properties of PRDM9’s zinc-fingers in humans as they relate to the questions posed above, we expressed several engineered versions of PRDM9 in a mitotic human cell line (HEK293T), then performed various high-throughput sequencing experiments. While this approach cannot reproduce cell-type-specific phenomena found only in spermatocytes and oocytes, it nevertheless enables us to infer some of the fundamental rules governing the behavior of PRDM9 in the nucleus. Indeed, as we describe below, this system replicates many of the key properties of PRDM9 binding in vivo. In these cells, we performed ChIP-seq against human PRDM9, H3K4me3, H3K36me3, and chimp PRDM9, as well as ATAC-seq (Assay for Transposase-Accessible Chromatin with high-throughput sequencing) to examine nucleosome positioning and DNA accessibility, and RNA-seq to examine gene expression. Importantly, by comparing data from transfected and untransfected cells (in which there is weak endogenous PRDM9 expression), we can observe the same genomic sites with and without the effects of PRDM9 overexpression. This approach also allows us to rapidly engineer and test various different alleles and truncations of PRDM9 to explore the properties of its individual domains. Further, our results are complemented by previously published data on LD-based recombination hotspots (Frazer et al., 2007), DSB hotspots decorated by DMC1 (Pratto et al., 2014), H3K4me3 in human testes (Pratto et al., 2014), and histone modifications across human cell types (Kundaje et al., 2015), which we jointly analyze to understand the regulation of recombination outcomes downstream of PRDM9 binding. As described below, our results implicate a widespread role for other zinc-finger genes in suppressing, rather than activating, meiotic recombination in humans.

Results

A map of direct PRDM9 binding in the human genome

We performed ChIP-seq in HEK293T cells transfected with the human PRDM9 reference allele (the ‘B’ allele) containing an N-terminal YFP tag that was targeted for immunoprecipitation. To identify regions bound by PRDM9, we modeled binding enrichment relative to a measure of local background coverage at each position in the genome (detailed in Appendix 1), which accounts for local differences in sequencing coverage, including differences attributable to the known aneuploidy of this cell line (Graham et al., 1977; Bylund et al., 2004; Lin et al., 2014). This yielded 170,198 PRDM9 binding peaks across the genome (p<10⁻⁶), demonstrating that PRDM9 can bind with some affinity to many sites outside of recombination hotspots, which number in the tens of thousands (Myers et al., 2005; Pratto et al., 2014). This large number of peaks likely results from the high expression level of PRDM9 in this system, providing sensitivity to detect even weak binding interactions, although it may be attributable in part to the chromatin organization of this cell type.

We compared our ChIP-seq data with a set of 18,343 published in vivo human DSB hotspot peaks from DMC1 ChIP-seq experiments in testis samples (Pratto et al., 2014). We found evidence for binding at 74% of DSB hotspots (at p<10⁻³) after correcting for chance overlaps (see Materials and methods). The proportion bound in our system is greater (up to 82%) at DSB hotspots >15 Mb from telomeres, which show elevated recombination rates in human males (Dib et al., 1996; Pratto et al., 2014; Figure 1—figure supplement 1a). Overlap probabilities increase with both PRDM9 binding strength and DMC1 heat (Figure 1b; Figure 1—figure supplement 1b). Furthermore, at PRDM9 binding sites, we observed peaks in LD-based recombination rates (HapMap CEU map, Frazer et al., 2007), which increase with PRDM9 binding strength (Figure 1c–d), as does DMC1 enrichment (Figure 1—figure supplement 2c). Therefore, despite cell-type differences between our HEK293T expression system and the chromatin environment of early meiotic cells, our binding peaks capture the majority of biologically relevant recombination hotspots and reveal many additional non-hotspot sites bound by PRDM9 in HEK293T cells.

Figure 1 with 4 supplements see all

Download asset Open asset

Comparison of seven distinct motifs bound by human PRDM9 (B allele).

(a) Seven motif logos produced by our algorithm (applied to the top 5,000 PRDM9 binding peaks ranked by enrichment, after filtering out repeat-masked sequences) were aligned to each other and to an in silico binding prediction (Myers et al., 2010; Persikov et al., 2009; Persikov and Singh, 2014, maximizing alignment of the most information-rich bases. The position of the published hotspot 13-mer is indicated by the gray box overlapping the in silico motif (Myers et al., 2008). On the right is the percentage of the top 1,000 peaks (ranked by enrichment without further filtering) containing each motif type. Zinc-finger residues at 3 DNA-contacting positions (labeled −1, 3, 6) are illustrated below each ZF position, classified by polarity, charge, and presence of aromatic side chains. ZFs 5 and 6 lack positively charged amino acids and contain aromatic tryptophan residues, and they coincide with a variably spaced motif region (indicated by vertical dotted lines). Motif 4 is truncated here. (b) H3K4me3 ChIP-seq data from PRDM9-transfected HEK293T cells (this study) and H3K4me3/DMC1 data from testes (Pratto et al., 2014) were force-called to provide a p-value for enrichment of each sample in a 1 kb window centered on each PRDM9 peak (filtered to remove coverage outliers and those overlapping H3K4me3 peaks in untransfected cells). PRDM9 enrichment values are unitless (equal to the estimated signal divided by background, minus 1 and set to 0 if negative, at the base with the smallest p-value within each peak). Peaks were split into deciles according to their PRDM9 enrichment values, and the proportion of peaks with a force-called H3K4me3 or DMC1 p-value <0.05 is plotted within each decile. (c) Peaks were stratified into quartiles based on increasing PRDM9 enrichment (light green to dark green) after filtering out promoters. Mean recombination rates (from the HapMap LD-based recombination map, Frazer et al., 2007) at each base in the 20 kb region centered on each bound motif are plotted for each quartile, with smoothing (ksmooth, bandwidth 25). (d) *Left plot*: Peak enrichment quartiles (filtered to remove promoters as in c) were separated by motif type (Motifs 2, 3, and 5 were combined due to low abundance), and the mean HapMap CEU recombination rate overlapping peak centers was plotted against median PRDM9 enrichment in each quartile, with lines of best fit added for Motif 7 (pink) versus all other motifs. *Right plot*: Fold enrichment of each motif in AB-only DMC1 peaks versus AA-only DMC1 peaks (Pratto et al., 2014). Error bars indicate two standard errors of the mean (left plot) or 95% bootstrap confidence intervals (right plot).

https://doi.org/10.7554/eLife.28383.003

Figure 1—source data 1 List of all ChIP-seq samples.: https://doi.org/10.7554/eLife.28383.008
Download elife-28383-fig1-data1-v2.xlsx
Figure 1—source data 2 PWMs for all motifs, in MEME format.: https://doi.org/10.7554/eLife.28383.009
Download elife-28383-fig1-data2-v2.txt

PRDM9 can bind multiple motifs with different internal spacings

Next, we leveraged the large number and high resolution of our ChIP-seq peaks to search for sequence motifs at PRDM9 binding sites using a Bayesian de novo motif-finding algorithm (described in Davies et al., 2016 and in Materials and methods). Rather than yielding a single motif described by a position weight matrix (PWM), this algorithm allows binding sites to be described by a mixture of multiple motifs enriched in peak centers. The algorithm identified seven non-degenerate motifs, representing distinct PRDM9 binding modes. These explain 75% of the strongest 1000 binding peaks, falling to 53% of all peaks (Figure 1a). The remaining peaks contain mostly degenerate, GC-rich sequences (Figure 1—figure supplement 3), similar to DMC1 hotspots in transgenic mice containing this same human PRDM9 allele (Davies et al., 2016) and interpretable as binding to clusters of individually weaker motif matches in mostly GC-rich regions.

While each of the seven motifs has a close internal match to the published 13-mer found in human recombination hotspots (Myers et al., 2008), allowing for multiple binding modalities revealed that the zinc fingers predicted to bind upstream of this 13-mer (ZFs 1–6) can show comparably high sequence specificity (Figure 1a). We aligned our seven motifs to each other and to an in-silico motif prediction (based on the zinc-finger domain’s amino acid sequence alone; Myers et al., 2010; Persikov et al., 2009; Persikov and Singh, 2014), revealing differences across motifs driven mainly by variable internal spacings (Figure 1a) alongside smaller differences in base-pair preferences (e.g. Motif 5). The region corresponding to ZF5 and ZF6 is predicted to span 6 bp, but in Motifs 4–7 this region spans only 2 bp, and in Motif 1 it spans only 5 bp. Interestingly, we only observed these three particular spacings, and the expected 6 bp binding footprint is observed only for Motifs 2 and 3, which explain a relatively small proportion of peaks (6%). This alternative spacing cannot be captured in a single motif, possibly explaining why ZFs 1–6 have shown weak sequence specificity in previously published hotspot motifs (Myers et al., 2008, 2010; Hinch et al., 2011; Pratto et al., 2014).

Alternative spacing within motifs could explain how long zinc-finger arrays like PRDM9’s are able to consecutively bind DNA despite theoretical physical constraints (Persikov and Singh, 2011), similar to multivalent CTCF binding (Nakahashi et al., 2013). Our results are also consistent with recent findings that truncated mouse PRDM9 alleles can stably bind discontinuous submotifs, though at reduced specificities, with subsets of zinc fingers (Striedner et al., 2017). ZF5 and ZF6, which overlap the variably spaced region, have large, aromatic tryptophan residues at the DNA-contacting ‘−1’ position (Figure 1a). They also lack the positively charged DNA-contacting residues found in the most sequence-specific zinc fingers in the array (consistent with an electrostatic attraction to the negatively charged DNA). We speculate that these bulky, uncharged middle zinc fingers might fail to bind DNA strongly and may act more like a linker between the more strongly binding zinc fingers found upstream and downstream.

Interestingly, we observed a lower mean LD-based recombination rate (Frazer et al., 2007) around Motif 7 peaks, not explained by differences in PRDM9 binding enrichment, promoter overlap, repeat overlap, or H3K4me3 enrichment (Figure 1d, Figure 1—figure supplement 4). We hypothesized that Motif 7 might be favorably bound by the B allele and thus underrepresented in LD-based recombination maps, which are dominated by historical recombination events initiated by the more common A allele of PRDM9, which differs at a single DNA-contacting amino acid in ZF5 (Baudat et al., 2010). To test this hypothesis, we searched for our seven motifs in DSB hotspots unique to an individual with an A/B PRDM9 genotype, then compared these to DSB hotspots found in homozygous A/A individuals (Pratto et al., 2014). We found that Motif 7 is two-fold enriched in A/B-only hotspots relative to A/A hotspots, while all other motifs are found in more similar proportions between the two sets (Figure 1d). Motif 7 also resembles, but extends, a motif previously identified in A/B-only hotspots (Pratto et al., 2014). We conclude that the B allele must bind Motif 7 with greater affinity than does the A allele, demonstrating distinguishable binding preferences between these highly similar PRDM9 alleles.

PRDM9 deposits H3K4me3 essentially everywhere it binds

We investigated the histone methylation activity of PRDM9 by performing ChIP-seq against the H3K4me3 mark in transfected and untransfected cells. After subtracting sites overlapping ‘pre-existing’ H3K4me3 peaks (those present in untransfected cells), we found that 95% of PRDM9 binding peaks show H3K4me3 following transfection (p<0.01), and this proportion increases to 100% with increasing PRDM9 binding enrichment (see Figure 1b). That is, PRDM9 makes the H3K4me3 mark essentially everywhere it binds, regardless of the pre-existing chromatin substrate, with H3K4me3 signal strength increasing with PRDM9 binding strength ( $r$ =0.48, Figure 1—figure supplement 1c, Figure 1—figure supplement 2). As observed in mice (Davies et al., 2016; Powers et al., 2016; Grey et al., 2017), we also observe localized H3K36me3 deposition at bound sites (see Figure 1—figure supplement 1d).

Apart from depositing H3K4me3/H3K36me3 locally around its binding sites, PRDM9 has been shown to phase surrounding nucleosomes in vivo in mice (Baker et al., 2014). To investigate this behavior in transfected HEK293T cells, we performed ATAC-seq and found that full-length PRDM9 appears to phase surrounding nucleosomes even in this completely different cell type and expression system (see Figure 2—figure supplement 3a). However, when we transfected a truncated version of PRDM9 including only the zinc-finger domain, we saw no evidence of nucleosome phasing around PRDM9 binding sites (see Figure 2—figure supplement 3b). Instead, its ATAC-seq coverage pattern appears similar to that of unstransfected cells or of cells transfected with a truncated version of PRDM9 excluding the zinc-finger domain (Figure 2—figure supplement 3c,d). We confirmed that this ‘ZF only’ truncated protein localizes to the nucleus (see Figure 2—figure supplement 4), and previous studies have shown that PRDM9’s ZF array is sufficient to bind DNA (Walker et al., 2015; Striedner et al., 2017). This suggests that PRDM9’s nucleosome phasing behavior stems not only from the binding of its ZF array to DNA, but may involve steric effects of the non-ZF region or require histone methylation.

Human PRDM9 frequently binds promoters

A study in mice has shown that, in the absence of PRDM9, DSBs localize to active promoters marked with H3K4me3, suggesting that PRDM9 may serve to provide alternative H3K4me3 sites to compete with and direct recombination away from promoters (Brick et al., 2012). However, our ChIP-seq data revealed that, surprisingly, of the 12,982 protein-coding genes with H3K4me3 surrounding their Transcription Start Site (TSS) in our untransfected cells (p<10⁻⁵), 81% have a PRDM9 binding peak center within 500 bp of the TSS, compared to only 6% expected by chance overlap (yielding a corrected overlap fraction of 79%). At promoters with little or no prior H3K4me3, the proportion bound by PRDM9 decreases to 15% (corrected for chance overlaps, Figure 2a), though this difference could potentially be explained by increasing power to detect weak binding events at more active genes. If we concentrate only on the strongest quartile of PRDM9 binding enrichment at promoters, we see that roughly 10% of promoters are strongly bound, regardless of H3K4me3 enrichment (Figure 2a).

Figure 2 with 4 supplements see all

Download asset Open asset

Human PRDM9 can bind promoters, though recombination is suppressed.

(a) The chance-corrected proportion of protein-coding genes that have a PRDM9 peak center occurring within 500 bp of the TSS, stratified by different PRDM9 enrichment value thresholds (shades of green, with thresholds listed), in each quartile of force-called H3K4me3 enrichment surrounding the TSS in untransfected cells. The power to detect weaker binding events increases at more active promoters (as measured by H3K4me3), though strong PRDM9 binding events appear at roughly 10% of all promoters regardless of activity. (b) Barplot illustrating the proportion of promoter or non-promoter PRDM9 peaks assigned to each of the 7 motifs (or no motif, in gray). Motif 7 appears 2-fold enriched in promoter peaks. (c) Mean HapMap CEU recombination rates are reported for promoter (pink squares) and non-promoter (gray circles) human PRDM9 peaks split into quartiles of PRDM9 enrichment (filtered not to overlap repeats or occur within 15 Mb of a telomere; error bars represent two standard errors of the mean). Both median enrichment values and recombination rates are greater for non-promoter peaks, even in overlapping ranges of PRDM9 enrichment. (d) Mean recombination rate in 20 kb windows centered on bound motifs, for promoter (pink) and non-promoter (gray) peaks further filtered only to include peaks with PRDM9 enrichment values between 1 and 2 (smoothing: ksmooth bandwidth 200). (e) Mean H3K36me3 enrichment in transfected cells divided by mean H3K36me3 enrichment in untransfected cells at 36,000 non-promoter PRDM9 binding sites split into quartiles of PRDM9 enrichment (shades of purple). (f) same as e but for 10,000 promoter PRDM9 binding sites split into quartiles of PRDM9 enrichment. (g) The absolute mean enrichment values used to generate plots e and f, split into transfected (solid) and untransfected (dotted) samples at promoter (pink) and non-promoter (gray) PRDM9 binding sites in the top quartile of PRDM9 enrichment. There is a depletion of H3K36me3 coverage surrounding promoters in untransfected cells, but the magnitude of this depletion decreases in transfected cells. (h) At 4,000 protein-coding genes with a strong PRDM9 binding peak within 500 bp of the TSS (PRDM9 enrichment >2 and <10), we show the relationship between force-called H3K4me3 enrichment and force-called H3K36me3 enrichment in the 1 kb surrounding each TSS, for both transfected and untransfected cells (solid and dotted lines). Error bars indicate two standard errors of the mean H3K36me3 enrichment within each quintile of H3K4me3 enrichment. H3K36me3 enrichment increases in transfected cells at all strongly bound promoters, but this effect diminishes almost to 0 as promoter activity increases (which forces H3K36me3 close to 0 in all cells). This effect cannot be accounted for by the modest decrease in PRDM9 enrichment at more active promoters (mean PRDM9 enrichment decreases from 4.3 in the first H3K4me3 quintile to 3.1 in the fifth quintile).

https://doi.org/10.7554/eLife.28383.010

Previous datasets in humans have been unable to detect this affinity for promoters because they relied on H3K4me3, DMC1, or LD mapping as proxies for inferring PRDM9 binding sites (Baker et al., 2015b; Pratto et al., 2014; Myers et al., 2010). Since active promoters contain PRDM9-independent H3K4me3 peaks, they are filtered out from H3K4me3 analyses, and since DSBs are suppressed at promoters (at least in the presence of PRDM9, as shown by Brick et al., 2012), promoters are underrepresented in DMC1 and LD-based recombination hotspots. One recent study mapped binding of the human PRDM9 B allele in HEK293T cells by ChIP-exo, yielding a conservative set of 839 peaks after stringent filtering (Imbeault et al., 2017). Of these 839 peaks, 87% overlap our 170,198 peaks, and they are similarly enriched in promoters (18% occur within 500 bp of a TSS, versus 6% when shifted 5 kb, compared to 15% and 7% with our peaks, respectively).

To exclude the possibility that PRDM9 binding peaks observed at promoters were false positives (Jain et al., 2015), we performed two ChIP-seq replicates on cells transfected with a PRDM9 construct in which we replaced the human ZF domain with the ZF domain from the chimpanzee w11a allele, which is not predicted to bind the GC-rich DNA commonly found at promoters (Auton et al., 2012; Schwartz et al., 2014). We found that the chimp allele binds a T-rich motif (Figure 2—figure supplement 1c), and only 5% of chimp PRDM9 peaks occur within 500 bp of a human peak center, below the 8% expected by chance (Figure 2—figure supplement 1b). In contrast to results for human PRDM9, only 3% of promoters fall within 500 bp of a chimp PRDM9 peak, versus 9% expected by chance overlap, confirming that the promoter peaks we observe for the human allele are unlikely to be ChIP-seq artifacts.

Furthermore, motif identification at human PRDM9’s promoter binding sites identified the expected binding motifs at similar frequencies to non-promoter peaks, except for a twofold enrichment of Motif 7 (Figure 2b). Interestingly, Motif 7 is also the B-allele-enriched motif, so PRDM9’s promoter affinity might also differ between common human alleles. We suggest that these GC-rich motifs, together with accessible chromatin, enable human PRDM9 to consistently bind to promoter regions in HEK293T cells (Figure 2a). Notably, however, PRDM9 peaks in promoters tend to have lower mean enrichment estimates across a range of motif FIMO scores (Figure 2—figure supplement 2d). It is also worth noting that in vivo mapping of PRDM9 binding will be required to confirm that promoter binding occurs in meiotic cells, although it is difficult to understand how this sequence-dependent binding could be cell-type-specific across all promoters.

PRDM9-induced H3K36me3 is depleted at promoters

Although there is widespread binding of human PRDM9 to promoters in HEK293T cells, we observe little to no elevation in local recombination rate or testis DMC1 enrichment at these binding sites (Figure 2c,d, Figure 2—figure supplement 2e,f). In the absence of PRDM9, DSBs localize to promoters in mice (Brick et al., 2012), but in light of our results, it remains difficult to explain how recombination might be suppressed at promoters despite direct PRDM9 binding. A second mark, H3K36me3, is also deposited by PRDM9 at many of its binding sites in vivo (Powers et al., 2016), and it shows a similar pattern to H3K4me3 around DSB sites in mice (Yamada et al., 2017).

At both non-promoter and promoter PRDM9 peaks, we observed a similar enrichment of H3K36me3 in transfected relative to untransfected cells (Figure 2e,f), confirming that PRDM9 indeed binds these sites. However, a very strong depletion of H3K36me3 around promoters in untransfected cells means that absolute levels of H3K36me3 remain low in promoters, relative to non-promoter binding sites (Figure 2g). Interestingly, the amount of H3K36me3 deposited by PRDM9 at promoters negatively correlates with the amount of H3K4me3 enrichment at those promoters in untransfected cells, and this cannot be explained by differential PRDM9 binding (Figure 2h). This suggests that at highly active promoters, PRDM9 is less able to deposit H3K36me3, or this mark is actively removed. This difference between promoter and non-promoter binding sites could in principle explain the lack of recombination at promoters, if the simultaneous presence of both H3K36me3 and H3K4me3 influences recombination initiation, as has been suggested by Powers et al. (2016) and shown to be consistent with DSB data by Yamada et al. (2017). In humanized mice, in vivo DSB hotspot sites favor motif positions with lower PRDM9-independent H3K4me3 levels than genomic background (Davies et al., 2016), and this seems highly concordant with our human results.

PRDM9 can activate transcription of some genes, including VCX and CTCFL

We have shown that human PRDM9 binds promoters and deposits the H3K4me3 mark wherever it binds in HEK293T cells, which raises the possibility that PRDM9 may affect gene expression, given that H3K4me3 is highly enriched at active promoters (Santos-Rosa et al., 2002). Tethering PRDM9’s SET domain to other promoter-binding proteins has been shown to de-repress gene expression in a context-dependent manner (Cano-Rodriguez et al., 2016), leading us to hypothesize that full-length human PRDM9 might also be able to activate gene expression. We therefore performed RNA-seq in cells transfected with human PRDM9, along with control samples that were either untransfected, transfected with the chimp allele, or transfected with a construct containing only the human zinc-finger domain (and incapable of H3K4me3 deposition; referred to as ‘ZF only’; all constructs illustrated in Figure 5a).

Seven transcripts showed overwhelming evidence of being differentially expressed in cells transfected with the human allele versus all other samples, with all seven being upregulated by PRDM9 presence. Five overlap known genes: MEG3, ONECUT3, LGALS1, VCX, and CTCFL. Interestingly, the latter two genes are normally expressed only in spermatogenesis (Lahn and Page, 2000; Sleutels et al., 2012). We validated expression induction at these two genes using qPCR (Figure 3).

Figure 3 with 1 supplement see all

Download asset Open asset

Spermatogenesis-specific genes *VCX* and *CTCFL* are activated by human PRDM9 in HEK293T cells.

(a) left: Bar plots showing the log₂ fold change relative to untransfected HEK293T cells in computed FPKM values (fragments per kilobase of transcript per million mapped RNA-seq reads) for HEK293T cells transfected with the human allele, the chimp allele, or a construct containing only the human Zinc-Finger domain, for *CTCFL* and *VCX*, with *CTCF* as a negative control. Error bars conservatively represent maximum ranges of the ratios given confidence intervals for FPKM values computed by cufflinks (Trapnell et al., 2012). Asterisks indicate significant differential gene expression, as reported by CuffDiff (p<0.0001). right: qPCR validation results for the same genes from 3 independent biological replicates. Y-axis values are log $_{2}$ ratios of $Δ Δ$ C $_{t}$ values for each gene relative to the untransfected sample (normalized to the *TBP* housekeeping gene; see Materials and methods). Error bars represent 95% confidence intervals from 3 biological replicates (t distribution; gray points represent individual replicate values), and asterisks indicate p<0.001 (one-tailed t test). (b) A browser screenshot (Zhou et al., 2011) from Chr20 near the promoter region of *CTCFL* with custom tracks indicating ChIP-seq and RNA-seq raw read coverage data. Human PRDM9, but not chimp PRDM9, (green) binds a G-rich repeat near the TSS, adding an H3K4me3 mark (light red) where none is present in untransfected cells. RNA-seq coverage (blue) spikes in the coding regions in transfected cells, while it is nearly flat in untransfected cells or chimp-transfected cells. Testis H3K4me3 coverage (dark red, from Pratto et al., 2014) peaks at a slightly different locus, corresponding to an alternative TSS.

https://doi.org/10.7554/eLife.28383.015

Figure 3—source data 1 qPCR primers, Ct values, and calculations.: https://doi.org/10.7554/eLife.28383.017
Download elife-28383-fig3-data1-v2.xlsx
Figure 3—source data 2 PRDM9-bound genes with differential expression.: https://doi.org/10.7554/eLife.28383.018
Download elife-28383-fig3-data2-v2.xlsx

CTCFL is a variant of chromatin regulator CTCF, and in mice it has been shown to be expressed exclusively in pre-leptotene spermatocytes (Sleutels et al., 2012). Male knockout mice show greatly reduced fertility due to meiotic arrest (Sleutels et al., 2012), and variants at CTCFL influence genome-wide recombination rates in human males (Kong et al., 2014). CTCFL may be involved in organizing the meiotic chromatin landscape and regulating the transcription of meiotic genes (Sleutels et al., 2012). We found that CTCFL RNA levels increase 28-fold after transfection with the human allele, from a nearly undetectable baseline transcription level (Figure 3; we note this may underestimate the true relative expression level given that transfection efficiency is not 100%). PRDM9 binds strongly to a GC-rich repeat near the CTCFL TSS and deposits H3K4me3, which is absent in untransfected cells (Figure 3). The chimp PRDM9 allele, in contrast, does not bind near the TSS and does not show elevated transcript levels after transfection (Figure 3).

VCX encodes a small, highly charged protein of unknown function and has been previously studied for its involvement in PRDM9-related non-homologous recombination events and X-linked ichthyosis (Myers et al., 2008; Van Esch et al., 2005). We found that PRDM9 does not in fact bind near the annotated VCX TSS, but instead in the middle of the gene and very strongly at a minisatellite array of PRDM9 binding motifs (Myers et al., 2008) near the terminus of the gene (Figure 3—figure supplement 1). PRDM9 adds the H3K4me3 mark throughout the gene’s coding regions in a pattern similar to that seen in testes (Figure 3—figure supplement 1). RNA-seq coverage suggests normal splicing, but use of an alternative promoter that excludes the first, untranslated exon (Figure 3—figure supplement 1).

We note that this result does not establish whether human PRDM9 is necessary or sufficient for CTCFL and VCX expression in vivo, but still PRDM9 is demonstrably able to trigger the transcription of these genes in a way that depends on the binding of its zinc fingers. Previous work has shown that Prdm9 expression begins in pre-leptotene cells in mice (Sun et al., 2015), concurrent with Ctcfl expression (Sleutels et al., 2012) and thus supports the possibility that PRDM9 may promote CTCFL transcription in vivo. The failure of the chimp allele to bind to or activate the expression of human CTCFL further suggests that this behavior may not be essential across organisms, although the chimp allele might in principle still bind the CTCFL promoter in the chimp genome. Similarly, there is no evidence that human PRDM9 alleles with very different binding preferences, such as the C allele, would bind the same promoter. Also notably, the motif bound at the CTCFL promoter is Motif 7, so the A and B alleles may bind this locus with different affinities.

43 additional genes showed weaker evidence of being activated by human PRDM9 binding near their annotated transcription start sites, with 41 showing increases, as opposed to decreases, in expression (Figure 3—source data 2). We lack power to detect small changes in gene expression, especially decreases in expression (Trapnell et al., 2012). Nonetheless, it is likely that effects of similar magnitude to CTCFL and VCX are quite rare. Our data do make it clear that PRDM9 binding and histone trimethylation near a promoter can trigger or enhance gene expression in some cases. Furthermore, this effect on gene expression is not likely to result from PRDM9 binding alone but from its trimethylation activity, given that transfection with the zinc fingers alone does not trigger expression. Further work will need to establish if promoter-binding PRDM9 alleles are able to regulate gene expression in vivo, whether as an accidental side effect of binding or specifically functional, though this work may remain challenging in humans.

Analysis of THE1B repeats reveals non-PRDM9 motifs influencing recombination

Although our seven motifs (Figure 1a) improve our understanding of PRDM9 binding, even the top-scoring 0.1% of motif matches genome-wide have only a 50% chance of overlapping an actual PRDM9 binding peak (see Figure 2—figure supplement 2a). Moreover, at best we only observe a 55% correlation between H3K4me3 and DMC1 enrichment values from testis data surrounding our PRDM9 binding sites (Figure 1—figure supplement 2f). Therefore, other influences such as wider sequence and chromatin contexts must impact both binding and downstream recombination outcomes. The only specific known mammalian sequence feature so far identified as influencing either PRDM9 binding, or downstream recombination events, is the PRDM9 binding motif itself. Thus, it is uncertain which factors prevent or promote hotspot occurrence, whether these act in cis or trans, and what these might be. A powerful approach to identify factors that might influence PRDM9 binding and subsequent hotspot formation is to search for sequence motifs predicting these outcomes. Identified motifs are likely to have a causal influence, so they can help address whether particular histone modifications associated with those motifs have a genuinely causal role themselves.

We hypothesized that sequence motifs unrelated to PRDM9 binding might have strong local effects on recombination outcomes, but these motifs might evade detection if they operate only at a minority of recombination hotspots. To attempt to overcome this and control for the effects of local genetic context, we focused on hotspots centering within one family of retrotransposon elements, called THE1B repeats, which are the most strongly hotspot-enriched among all human repeats (Myers et al., 2008). PRDM9 binds directly to a subset of THE1B repeat copies containing matches to its target motif (Figure 4a), in a known region of the repeat (Myers et al., 2008, see Appendix 2), and THE1B-centered hotspots contribute a substantial fraction of all human A- and B-allele controlled recombination (4.6% measured by DMC1 mapping; Pratto et al., 2014). We analyzed over 20,000 THE1B repeats throughout the human genome, which share highly similar sequences perturbed by random mutations. These mutations allowed us to precisely dissect the impact of particular sequence motifs on PRDM9 binding, and on downstream DSB formation (as measured by DMC1 mapping, from Pratto et al., 2014) and crossover activity (as measured by LD mapping, from Frazer et al., 2007). We used conditional association testing to identify collections of motifs that independently correlate with PRDM9 binding or recombination (see Appendix 2).

Figure 4 with 2 supplements see all

Download asset Open asset

Influences on recombination in cis downstream of PRDM9 binding.

(a) Analysis of THE1B repeats shows the positions along the THE1B consensus (bottom, gray) of motifs influencing PRDM9 binding (top row), motifs influencing recombination hotspot occurrence at bound sites (middle two rows), and motifs influencing H3K4me3/H3K9me3 in testes and somatic cells (bottom row). Rectangle widths show motif size, and heights show log-odds-ratio or effect size (two standard errors delineated). Rectangles below the lines have negative effects. Motifs associated with PRDM9 binding are in red; others in blue. Binding motifs for labeled proteins are at the plot base. (b) Left plot shows LD-based recombination rates around the centers of THE1B repeats containing different approximate matches to the PRDM9 binding motif CCTCCC[CT]AGCCA[CT] (colors) and the motif ATCCATG (lines dotted if present). Right plot is the same but shows mean H3K4me3 in testes (from Pratto et al., 2014). ATCCATG presence reduces recombination and increases H3K4me3. (c) Impact of ATCCATG presence (+) or absence (-) on normalized enrichment values around the centers of THE1B repeats, of H3K4me3 and H3K9me3 in different cells (labeled pairs of color bars, normalized to equal 1 at edges). H3K9me3 shows the strongest signal increase. (d) Predicted non-PRDM9 H3K9me3/H3K4me3 versus probability DMC1-based or LD-based hotspots occur at PRDM9-bound sites. For the x-axis repeats were binned according to an additive DNA-based score, using the bottom row of part A and the combination of motifs they contained. (e) Estimated impact on whether a hotspot occurs of co-binding by individual KRAB-ZNF proteins (labels; Imbeault et al., 2017) near a PRDM9 binding peak (genome-wide, not only within THE1B repeats, after filtering out promoter regions). For each KRAB-ZNF protein, a GLM was used to estimate the impact of KRAB-ZNF binding (binary regressor) on hotspot probability. We show the estimated log $_{2}$ -odds, with 95% CIs. Colors indicate H3K9me3 enrichment increase at co-bound sites. Horizontal line shows the results for TRIM28. Features below the horizontal dotted line have a negative estimated impact on downstream recombination.

https://doi.org/10.7554/eLife.28383.019

Figure 4—source data 1 Detailed information on all THE1B motifs.: https://doi.org/10.7554/eLife.28383.022
Download elife-28383-fig4-data1-v2.xlsx

Seventeen distinct motifs (Figure 4a) were found to influence PRDM9 binding to THE1B copies in HEK293T cells (Figure 4—source data 1). All map within the predicted PRDM9 binding region and span the entire region, confirming that all of PRDM9’s zinc fingers are involved in binding. Motifs promoting PRDM9 binding associated with higher H3K4me3 enrichment in testes (data from Pratto et al., 2014) and with increasing LD/DMC1 hotspot probability, so the same motifs must operate in vivo (Figure 4a; detailed in Appendix 2). Importantly for the results described below, binding of PRDM9 does not associate strongly with any sequence motifs outside the directly bound region, so it might act as a local ‘pioneer’ protein at least on this background, despite results in mice (Grey et al., 2017).

We then independently tested for the presence of motifs influencing recombination hotspot formation conditional on presence of a PRDM9 binding site in HEK293T cells. We identified an initial seven such motifs (Figure 4a; detailed in Appendix 2; Figure 4—source data 1). Only three of these map within the PRDM9 binding region and correspond to stronger/weaker PRDM9 enrichment. The remaining four motifs show no association whatsoever with PRDM9 binding in HEK293T cells, and map well outside the PRDM9 binding motif (Figure 4a). We refer to these as ‘non-PRDM9 recombination-influencing motifs’. The strongest signal is for the motif ATCCATG (joint p=2.8×10^-9 for LD-hotspots, OR = 0.32), whose presence within a THE1B repeat produces a 2.5-fold reduction in the surrounding recombination rate at PRDM9-bound THE1B repeats (Figure 4b). ATCCATG presence also reduces the local recombination rate around THE1B repeats not bound by PRDM9, implying a more general, PRDM9-independent mechanism of recombination suppression (Figure 4b). Notably, this suppression extends beyond the boundaries of the THE1B repeat itself.

We observed strong testis H3K4me3 enrichment at THE1B repeats containing PRDM9 binding motifs regardless of whether ‘ATCCATG’ was present, and after conditioning on the strength of the PRDM9 motif match (Figure 4b). Therefore, this motif must suppress recombination downstream of PRDM9 binding in vivo. In fact, presence of the modifier motif ATCCATG actually modestly increased the testis H3K4me3 signal, even at THE1B copies not containing a PRDM9 motif and not bound by PRDM9 in HEK293T cells (Figure 4b), which we return to below. Similar results were observed for the other three non-PRDM9 recombination-influencing motifs.

Recombination-influencing motifs associate with H3K9me3 and H3K4me3 across many cell types

We hypothesized that the recombination-influencing motifs described above might be bound by chromatin-modifying proteins. To examine this possibility, we independently searched for motifs that could predict chromatin states within THE1B elements. Specifically, we searched de novo for motifs associated with 15 previously identified chromatin states, and individual histone modifications, across each of 125 somatic cell types (Kundaje et al., 2015). Strikingly, we observed that the motif ATCCATG (independently identified above as the strongest non-PRDM9 recombination-influencing motif) is also the strongest single predictor of the ‘heterochromatin’ state, marked by enriched H3K9me3. THE1B repeats containing ATCCATG are heterochromatin-enriched in over half of cell types, especially in embryonic stem cells, and exhibit a strong localized increase in H3K9me3 (Figure 4c). More surprisingly, we also observed a weak, but significant, localized increase in H3K4me3 signal (p=7.5×10⁻¹³; Figure 4c). We also saw the same weak H3K4me3 peak in testes, after restricting analysis to THE1B repeats not bound by PRDM9 (Figure 4b,c), indicating this modification operates fully independently of PRDM9. This weak increase might reflect genuine partial co-occurrence of H3K9me3 and H3K4me3 at the same locus (but possibly on different alleles, or in different cells), or in theory it could be explained by non-specificity of experimental antibodies for these two histone modifications.

We reasoned that we might more generally exploit the subtle H3K4me3 signal elevation (whatever its underlying cause) as a potential marker also of H3K9me3 elevation in germline tissues by examining H3K4me3 in testes (Pratto et al., 2014). We performed de novo motif finding to identify PRDM9-independent 7-mers associated with testis H3K4me3 in THE1B repeats definitively not bound by PRDM9 (detailed in Appendix 2). This identified eighteen motifs significantly associated with non-PRDM9 H3K4me3 (after Bonferroni correction, Figure 4a). The motif ATCCATG remained the most strongly associated (p<10⁻²⁵), with eight other motifs clustered around it (Figure 4a). Confirming that these motifs also predict H3K9me3 levels, we observed almost perfect positive correlation (r = 0.93) between H3K4me3 signal strength in testes and H3K9me3 (as well as H3K4me3) in particular ROADMAP ESC lines (Figure 4—figure supplement 1c). Therefore, these 18 motifs predict both H3K9me3 and H3K4me3, broadly observable across somatic cells and (at least for the latter mark) testes also, and so we refer to this set as ‘non-PRDM9 H3K9me3/H3K4me3 motifs.’

In addition to the top-scoring motif, ATCCATG, many or all of the remaining 17 non-PRDM9 H3K9me3/H3K4me3 motifs evidently impact meiotic recombination (Figure 4—source data 1; p<0.00036 for effect size correlation). All four of the non-PRDM9 recombination-influencing motifs we found overlap at least one of these 18 independently derived non-PRDM9 H3K9me3/H3K4me3 motifs (Figure 4a; note that power differences account for the smaller size of the former motif set). Summing these 18 motif influences to produce a score for each THE1B repeat using only its DNA sequence, we see more than a threefold difference in the probability of observing a recombination hotspot across PRDM9-bound THE1B copies between the top and bottom 10% quantiles of the score (Figure 4d). Given that we are only able to examine the region within each 1–2 kb recombination hotspot corresponding to the 354 bases of the THE1B element, this likely underestimates the true impact of local sequence on whether hotspots occur or not.

Notably, our testing for association with other histone-defined chromatin states (e.g. states enriched for H3K27me3) in ROADMAP-studied cell types identified many more sequence motifs. These included the known binding targets of two proteins, DUX4 and ZBTB33, that were previously shown to bind to THE1B elements, with DUX4 showing strong expression in testes (Young et al., 2013; Wang et al., 2012). However, only those motifs associated with heterochromatin and H3K9me3/H3K4me3 overlapped our non-PRDM9 recombination-influencing motifs. Thus, only a particular subset of chromatin modifications correspond to suppressed recombination, in THE1B repeats at least.

Overall, this analysis of thousands of human hotspots reveals that in cis, it is not simply PRDM9 binding that influences whether hotspots occur. Multiple sequence motifs exist that do not prevent PRDM9 binding, but instead modify the average amount of recombination that occurs downstream of binding, over two-fold for a single motif (ATCCATG). Given this diversity even within THE1B-centered hotspots, completely different motifs might operate to modulate recombination activity in other hotspots, either centered in different repeats or in non-repeat DNA. In contrast to this complexity, examination of histone modifications reveals a common signature across recombination-influencing motifs, with strong alterations in the specific histone mark H3K9me3 and weaker signals for H3K4me3. This suggests that the mechanism of action across motifs might share fundamental similarities. Both H3K4me3 and H3K9me3 marks correlate negatively with recombination across all human hotspots (Figure 4d; Figure 4—figure supplement 1b), and reduced levels of non-PRDM9 H3K4me3 within hotspots has been observed in mice (Brick et al., 2012; Davies et al., 2016).

KRAB-ZNF binding and TRIM28 recruitment predict low recombination near PRDM9 binding sites

The large class of human KRAB-ZNF genes represent an obvious set of motif-binding candidates that might explain H3K9me3 deposition within THE1B repeats and more broadly. In many such genes, the KRAB domain recruits TRIM28, which in turn recruits histone-modifying proteins including SETDB1, which lead to H3K9me3 deposition on nearby nucleosomes (Schultz et al., 2002; Imbeault et al., 2017). We therefore examined recent data measuring genome-wide binding of 222 KRAB-ZNF proteins in humans, and sites where TRIM28 is present in embryonic stem cells, for overlap with THE1B repeats (Imbeault et al., 2017; Appendix 2). Notably, although PRDM9 is a KRAB-ZNF protein, its KRAB domain does not interact with TRIM28 (Imai et al., 2017). We identified three KRAB-ZNF proteins (ZNF100, ZNF430 and ZNF766), as well as TRIM28, that are enriched for binding in THE1B repeats and also associate genome-wide with H3K9me3 deposition. We identified binding motifs for each of these four proteins within THE1B repeats. Strikingly, ATCCATG overlapped the second most significant motif for TRIM28 recruitment, and additional motif analysis for TRIM28 revealed a large (51 bp) motif, fully spanning a cluster of eight motifs associated with H3K9me3/H3K4me3 and recombination rate (Figure 4a), and presumably representing the binding target of one or more KRAB-ZNF protein(s) whose binding targets have not yet been experimentally characterized. The three ZNF proteins also all bind sites overlapping those implicated in impacting H3K9me3/H3K4me3 and meiotic recombination, two in the same region as the TRIM28 motif, but with differing sequence specificity (Figure 4a). Thus, while binding maps are not yet available for every human KRAB-ZNF protein, those that bind THE1B repeats consistently operate to reduce recombination, and TRIM28 recruitment can explain the strongest signals we see.

Across all our PRDM9 binding peaks (not only those in THE1B elements), 36.5% fall within 500 bp of a binding site of at least one of the KRAB-ZNF proteins with available data (Imbeault et al., 2017), suggesting that such repression might be important in regulating recombination more generally. To test this, we individually analyzed the KRAB-ZNF proteins with at least 30 instances of a KRAB-ZNF binding peak occurring near a PRDM9 binding peak (after excluding DNase HS regions and promoters, which are often bound by multiple different proteins), for their effect on whether a hotspot occurs at these PRDM9 binding peaks (Appendix 2). This revealed a universal negative trend (Figure 4e) typified by a twofold reduction in recombination locally at TRIM28-marked sites genome-wide, with every gene except one (ZNF282, which was non-significant) inferred to reduce hotspot odds. Binding of almost all KRAB-ZNF genes tested correlated positively with H3K9me3, and those genes with strongest H3K9me3 enrichment showed the strongest suppression of recombination locally (Figure 4e).

Together, our results indicate a mechanism of cis recombination repression affecting thousands of human PRDM9 binding sites. Binding of KRAB-ZNF proteins to specific sequence motifs within or nearby the PRDM9 binding site, followed by TRIM28 recruitment and H3K9me3 deposition, universally acts to strongly repress local recombination. Perhaps surprisingly, this can occur without preventing PRDM9 binding or H3K4me3 deposition. We suggest that this is the mechanism at play for the recombination-suppressing, H3K9me3-promoting ATCCATG motif, which we suspect is bound by a KRAB-ZNF protein whose binding sites have not yet been mapped. Many KRAB-ZNF genes bind to specific sets of retrotransposon repeats (THE1B repeats represent one example), so this repressive mechanism is likely to act to reduce recombination around many particular repeats.

Genome-wide broad-scale rates vary independently of PRDM9 binding

Finally, we used our THE1B dataset to examine the relationship between PRDM9 binding and broad-scale recombination rates genome-wide while controlling for local genetic context. To do so, we partitioned THE1B repeats into quintiles of increasing recombination rate in the surrounding 1 Mb in males (independently measured by Kong et al., 2002). We observed that DMC1 enrichment increases >10-fold with surrounding recombination rate across both telomeric and non-telomeric regions, but H3K4me3 enrichment in testes, a proxy for meiotic PRDM9 binding, shows no association whatsoever (Figure 4—figure supplement 2). Therefore, in broad ‘hotter’ regions, double-strand breaks and crossovers occur at much higher frequencies, completely independently of the local sequence (which is similar in THE1B repeats genome-wide) or the local level of PRDM9 binding. This proves that, at least in human males, megabase-scale recombination rates throughout the genome are not associated with PRDM9’s ability to bind and deposit H3K4me3, consistent with previous observations in the specific case of elevated human male recombination in telomeres (Pratto et al., 2014).

Multimer formation is mediated primarily by the ZF array

Our results thus far have added to the already complex array of evolutionary forces buffeting PRDM9, relating to its ability to influence gene expression or to the co-binding of other zinc-finger proteins near its binding sites. Another dimension of evolutionary constraint may arise from PRDM9’s ability to bind to itself and form functional multimers. Previous work has shown that PRDM9 as a whole can multimerize and that hetero-multimers of the human A and C alleles can bind the sequence targets of either allele and trimethylate surrounding histones (Baker et al., 2015b). However, it remains unknown which PRDM9 domain is responsible for this observed multimerization behavior. We sought to determine whether multimerization might involve PRDM9’s ZF domain in any way, given other examples of ZF domains mediating protein-protein interactions (McCarty et al., 2003; Lee et al., 2007). To do so, we co-expressed PRDM9 constructs with different ZF domain properties and performed co-ImmunoPrecipitation (co-IP) experiments, thus extending our study from PRDM9’s DNA-binding properties to its protein binding properties.

First, to confirm the ability of the PRDM9 alleles we study here to form multimers (Baker et al., 2015b), we performed co-IP experiments with full-length human B-allele PRDM9 constructs differentially tagged with HA and V5 epitopes and co-transfected into HEK293T cells. Following IP against the HA-tagged construct, we detected the V5-tagged construct very robustly; and conversely (Figure 5—figure supplement 1). This is consistent with human PRDM9 binding strongly to itself, as demonstrated previously in HEK293 cells (Baker et al., 2015b).

Figure 5 with 3 supplements see all

Download asset Open asset

PRDM9 multimer formation is mediated by the ZF domain in an allele-biased manner.

(a) Overview of the different C-terminally tagged PRDM9 constructs used. Both an HA and a V5 version of each construct were generated for co-IP experiments. (b) Barplot showing the relative intensity of western blot co-IP bands normalized to input bands (from 50-μg of total lysate protein) for each combination of bait and prey constructs. Whenever both bait and prey contain the zinc-finger domain (green bars), the co-IP signal is much stronger than when either or both constructs lack a ZF domain (orange bars). See Figure 5—figure supplements 1 and 2 for complete westerns with mock controls. (c) Barplot showing the results of competitive co-IP experiments performed in cells transfected with both Human and Chimp as prey (with the same epitope tag) and either Human or Chimp as bait (with a complementary epitope tag). Bars indicate the relative co-IP band intensity for Human and Chimp prey constructs when pulled down with either Chimp or Human bait. When Human is used as bait, more Human prey is pulled down; when Chimp is used as bait, more Chimp prey is pulled down (and this holds for both directions of HA/V5 tagging).

https://doi.org/10.7554/eLife.28383.023

To narrow the PRDM9 domain(s) responsible for this self-binding behavior, we split the full-length human B-allele PRDM9 cDNA into two pieces: one containing only the C-terminal Zinc-Finger domain (the ‘ZFonly’ construct), and one containing everything else (the ‘noZF’ construct; illustrated in Figure 5a). We co-transfected these constructs and full-length PRDM9 into HEK293T cells in various combinations. The full-length human construct and the ZFonly construct localized to the nucleus, but the noZF construct localized throughout the cell, confirming a dominant role for the ZF domain in nuclear localization (Figure 2—figure supplement 4, Collin et al., 2013; Wang et al., 2014).

Interestingly, the ZF domain alone appears to be responsible for most of PRDM9’s self-binding activity (Figure 5b). Following co-transfection of noZF-HA and noZF-V5, and despite very high expression levels visible in the input, only a very faint co-IP band is visible in the absence of the ZF array. Because the mock control lane is clean (Figure 5—figure supplement 2a), this band likely reflects a real but weak self-binding capability mediated by the non-ZF portion of PRDM9 (though we cannot rule out a role for the ‘early zinc finger’). In complete contrast, we saw an intense co-IP band when co-transfecting ZFonly-HA with ZFonly-V5. Therefore, the zinc-finger domain of one PRDM9 protein can bind strongly to the zinc-finger domain of another, while the rest of the protein interacts more weakly.

We confirmed this result by co-transfecting full-length, V5-tagged human PRDM9 with either noZF-HA or ZFonly-HA, revealing that the ZFonly construct is sufficient to bind and pull down the full-length construct. This finding replicated in a repeat experiment, and when reversing the direction of the IP-western experiment (Figure 5—figure supplement 2b). No co-IP band is seen in a negative control experiment in which we co-transfected the noZF construct with the ZFonly construct (Figure 5b), ruling out an interaction between the ZF domain and the rest of PRDM9 or any interaction between the epitope tags used. Our results remained unchanged following complete DNA digestion by benzonase in the ZFonly-ZFonly co-IP experiment (Figure 5—figure supplement 3a), implying that DNA is not required for the observed interaction between ZF domains.

Hetero-multimers of divergent ZF arrays form less efficiently

Finally, to examine the specificity of ZF array binding, we replaced the final exon containing the human ZF array with a synthesized cDNA matching the final exon of the chimpanzee reference PRDM9 allele (w11a) containing 18 zinc fingers (compared to 12 in the human allele, allowing us to resolve them as two distinct bands), and with different DNA-binding preferences. We refer to the resulting tagged constructs as Chimp-HA and Chimp-V5 (Figure 5a). To test the relative efficiency of homo- versus hetero-multimerization, we performed direct competition experiments. We transfected cells with three constructs: for example, Chimp-V5 plus Chimp-HA plus Human-HA. In this case Chimp-V5 would be the ‘bait’ pulled down by IP with anti-V5, and Chimp-HA and Human-HA would be the co-IP ‘prey’ detected by western blotting with anti-HA (we replicated by reversing the tags). The results show that Chimp PRDM9 pulls down Chimp PRDM9 more than twofold more efficiently than it pulls down Human PRDM9. Similarly, Human PRDM9 pulls down Human PRDM9 more than twofold more efficiently than it pulls down Chimp PRDM9 (Figure 5c). Thus, PRDM9 preferentially forms homo-multimers rather than hetero-multimers, at least for ZF arrays as highly diverged as Human and Chimp. These findings replicated after completely digesting DNA with benzonase (Figure 5—figure supplement 3). Because chimp and human PRDM9 ChIP-seq peaks almost never overlap (Figure 2—figure supplement 1b), we can rule out the possibility that heteromultimer formation between these two alleles results from co-binding to short DNA fragments that may be protected from benzonase digestion by PRDM9. That is, these results also confirm that PRDM9 multimer formation must be mediated by protein-protein interactions, not by protein-DNA interactions, though we still cannot formally rule out a role for DNA in enhancing this protein-protein interaction.

Discussion

The extremely rapid evolution of PRDM9’s zinc fingers, both within and between species, is one of the most striking features of this remarkable protein. Our results imply that over and above their role in positioning recombination sites and a role in chromosome synapsis (Davies et al., 2016), several other factors might influence this evolution. We showed here that PRDM9’s zinc-finger domain can impact its ability to form multimers, its ability to activate gene expression, and its ability to initiate recombination, in particular if it binds near promoters or near targets of other zinc-finger proteins.

PRDM9’s zinc-finger array has been regarded primarily as a DNA-binding domain with no other demonstrated functions, although studies of other zinc-finger proteins have shown that ZF domains can participate in highly specific protein-protein interactions, including with each other (McCarty et al., 2003; Lee et al., 2007). The mammalian gene with the most similar ZF-array to PRDM9 is ZNF133, whose zinc fingers have an almost identical consensus sequence, apart from at DNA-contacting bases, to PRDM9. ZNF133 has been shown to interact with PIAS1 (which interestingly is recruited to DNA damage sites; Galanty et al., 2009) via its zinc fingers, which can simultaneously bind its protein and DNA targets (Lee et al., 2007). Thus, it seems credible that multimerization interactions involving PRDM9 might involve its zinc fingers, and it further seems plausible that PRDM9’s zinc-finger domain might be able to mediate interactions with other proteins. Currently, we can only speculate about what function PRDM9 multimerization might serve if it occurs in meiosis. If biased multimerization occurs in vivo between different PRDM9 alleles (mediated by their variable zinc-finger domains), it could have important meiotic impacts in PRDM9 heterozygotes, although further study is needed, for example to determine if hetero-multimers form less efficiently between the human A, B and C alleles. Together with binding affinity differences, variable hetero-multimerization might impact PRDM9 dominance patterns, and dominance over less advantageous existing alleles could further increase the evolutionary advantage enjoyed by some newly arising alleles (Baker et al., 2015b) or potentially play a role in the dosage sensitivity of PRDM9 in causing hybrid infertility in mice (Flachs et al., 2012; Ségurel et al., 2011). One intriguing hypothesis is that multimer formation may play some role in PRDM9-mediated homologue pairing, which we previously identified as a potential mechanism to explain the role of PRDM9 in fertility and speciation in mice (Davies et al., 2016). In this case, a preference for homo-multimer formation would have obvious advantages.

Our results also highlight the key impact of zinc-finger variation on PRDM9 binding at both fine and broad scales. We observed no fewer than seven different modes of human PRDM9 binding with different internal spacings between several DNA-contacting zinc fingers (Figure 1a), a pattern not detected in previous studies. Binding is strongly impacted by all zinc fingers—as we observed in THE1B repeats and has been previously shown for mouse alleles (Billings et al., 2013)—and involves extensive sequence specificity not captured by a single shared motif. However, the chimpanzee w11a PRDM9 allele binds differently not only at fine scales but also broad scales (Figure 2—figure supplement 1) and avoids promoters. Similarly, a recent study in mice (Grey et al., 2017) found that two mouse PRDM9 alleles do not directly bind at promoters. When Spo11 was present to form DSBs, additional PRDM9 peaks appeared at a small number of promoters—hypothesized as due to indirect recruitment (Grey et al., 2017). An earlier study in mice with AT-rich PRDM9 binding motifs suggested that PRDM9 may direct recombination away from promoters by depositing competitive H3K4me3 marks (Brick et al., 2012).

In contrast to these alleles in chimp and mouse, we observed human PRDM9 directly binding to many promoter regions, previously unobserved due to filtering of PRDM9-independent H3K4me3 peaks and the evident suppression of DSB formation at these sites (Pratto et al., 2014; Baker et al., 2015b). Given the similarity of promoter composition and organization across cell types, the human A/B alleles likely bind to promoters in vivo as well, although we cannot exclude the possibility that such binding is prevented somehow, and further study will need to determine the promoter affinities of other human PRDM9 alleles. Our results imply that the suppression of recombination at promoters (including those that we show are bound by PRDM9) cannot simply be due to PRDM9 binding away from promoters. Interestingly, PRDM9 deposits less H3K36me3 at promoters compared to non-promoters, particularly at promoters with higher levels of PRDM9-independent H3K4me3 (Figure 2). We speculate that, if the co-occurrence of the H3K4me3 and H3K36me3 marks is essential for recombination initiation (as suggested by Powers et al., 2016; Yamada et al., 2017), then the relative lack of H3K36me3 at PRDM9-bound promoters could explain why these binding sites fail to initiate recombination. Of course, this does not explain why recombination tends toward promoters in the absence of PRDM9, be it in knockout mice (Brick et al., 2012) or lineages that have lost PRDM9 (Baker et al., 2017), such as dogs (Auton et al., 2013). Together with the discovery of a fertile woman with two nonfunctional copies of PRDM9 (Narasimhan et al., 2016), these results highlight the unresolved complexity surrounding PRDM9’s role in meiosis.

Adding to this complexity is our finding that PRDM9 can influence the transcriptional activity of a subset of bound genes, such as the spermatogenesis-specific CTCFL and VCX genes, in transfected HEK293T cells. Speculatively, this pleiotropic effect may even help to explain why a single PRDM9 allele predominates in many human populations. That is, while a multitude of alleles may function equally well in specifying sites of meiotic recombination initiation, perhaps a subset can positively affect fertility by binding to and enhancing the expression of meiotic genes such as CTCFL, and these alleles are consequently driven to high frequency by positive selection. We also observed that a predicted submotif shared by many western chimp PRDM9 alleles (Schwartz et al., 2014) corresponds precisely to a group of chimp zinc fingers with the strongest influence on binding targets (Figure 2—figure supplement 1c), similar to the prior observation of a group of ‘C-type’ human PRDM9 alleles that are diverse overall, but again overlap in the region identified to most strongly influence binding (Hinch et al., 2011; Berg et al., 2011; Pratto et al., 2014). This apparent sharing of binding specificities between alleles could potentially be driven by PRDM9’s effects on transcription, its propensity to form multimers, and/or its ability to bind symmetrically to homologous chromosomes in heterozygotes (Davies et al., 2016). Further work will need to explore the extent to which these behaviors are functionally important in vivo.

Aside from recombination suppression at promoters, our results shed light on an additional level of recombination regulation occurring downstream of PRDM9 binding. Sequence-specific binding by the large collection of KRAB-ZNF genes is associated with localized recombination suppression at scales >1 kb, without suppressing nearby PRDM9 binding, or H3K4me3 deposition, either in transfected cells (this study) or in testes (Pratto et al., 2014, Figure 4e). This implies that hundreds of motifs exist that mark sites of local recombination suppression. In contrast, we observe no impact of the presence/absence of binding sites for proteins such as DUX4 (Young et al., 2013) on recombination, despite our observing clear effects of the DUX4 binding motif on local chromatin marks (Figure 4—source data 1). Instead, perhaps only certain chromatin modifications suppress recombination. At their binding sites, many KRAB-ZNF proteins recruit TRIM28 which in turn recruits histone remodeling proteins including SETDB1 and HP1, depositing the H3K9me3 modification (Schultz et al., 2002; Imbeault et al., 2017), which has been associated with suppression of meiotic recombination in mice (Buard et al., 2009; Walker et al., 2015; Yamada et al., 2017). It has been suggested that KRAB-ZNF-induced heterochromatin may serve to stabilize repetitive sequences by preventing non-allelic homologous recombination (NAHR) (Vogel et al., 2006; Iyengar et al., 2011). Furthermore, PRDM9 has been shown to interact with both readers and writers of H3K9me3 (Parvanov et al., 2017). Interestingly, we also saw a weak increase in H3K4me3 signal whenever H3K9me3 increased, and this signal is also observed in testes, implying the motifs we find can impact chromatin modifications in this tissue, and—unlike PRDM9—in many somatic cell types also.

Most KRAB-ZNF proteins bind repeats, and they constitute the largest family of transcription factors in mammals, with rapid evolution (Imbeault et al., 2017). Evidence suggests that the KRAB domain may have first evolved in an ancient ancestor of PRDM9 and then spread (Birtle and Ponting, 2006), so it is interesting that these partial descendants of PRDM9 appear to disrupt meiotic recombination. In general, KRAB-ZNF genes appear to emerge concomitantly with the spread of particular transposon families, and they play a role in repressing transposon activity (Imbeault et al., 2017; Jacobs et al., 2014; Wolf et al., 2015; Rowe et al., 2013). Paradoxically though, they often remain active long after their targets lose transpositional activity (Imbeault et al., 2017). Our results suggest that one possible reason might be an adaptive role for KRAB-ZNF genes in specifically suppressing meiotic recombination in and around repeats, which otherwise could be prone to mediating deleterious genomic rearrangements (as proposed by Zamudio et al., 2015 regarding DNA methylation at transposons). If so, evolution of PRDM9 to bind new repeats might, in turn, lead to co-evolution of ZNF genes to suppress meiotic recombination at a subset of those repeats. We note that the meiotic effects of KRAB-ZNF proteins might be apparent even if they are not expressed in meiotic cells, as their chromatin marks might be transmitted epigenetically from precursor cells (Rowe et al., 2013). However, previous work has shown that KRAB-ZNF co-repressors are essential for normal gametogenesis in mice. Namely, the H3K9me3 methyltransferase SETDB1 is required to silence endogenous retroviruses in mouse primordial germ cells (Liu et al., 2014), and germline knockout of TRIM28 leads to sterility (Weber et al., 2002). Further study will need to determine which, if any, KRAB-ZNF proteins are active in human meiotic cells.

Another consequence of KRAB-ZNF-mediated meiotic recombination suppression is that not only PRDM9 binding sites, but potentially many other sites within hotspots, are predicted to cause DSB initiation asymmetry, and thus are likely to be subject to biased transmission—as seen previously for PRDM9 motifs and GC-biased gene conversion in hotspots (Boulton et al., 1997; Coop and Myers, 2007; Myers et al., 2010; Baker et al., 2015a; Smagulova et al., 2016; Davies et al., 2016). Unlike self-destructive drive at PRDM9 motifs, such drive would bias the evolution of features with broad impacts across cell types, towards increased KRAB-ZNF binding and hence constitutive silencing of hotspot regions, even if this silencing is selectively disadvantageous. Recent work by (Yamada et al., 2017) has demonstrated that as many as a third of meiotic DSBs occur within repetitive sequences in B6 mice, although DSB frequencies vary substantially among different classes of repeats, with most classes being depleted for DSBs. The authors hypothesize that PRDM9 may evolve to target transposons for meiotic recombination so that the effects of hotspot death will rapidly inactivate them by driving mutations or deletions of the PRDM9 binding site to fixation (and this advantage might compensate for the risk of NAHR at those repeats; Yamada et al., 2017). Our work suggests that PRDM9 binding to transposable elements might also inactivate them in a second way: by accelerating their evolution towards constitutive silencing by KRAB-ZNF proteins. In this model, hotspot self-destructive drive would be mirrored by the rapid accumulation of new KRAB-ZNF binding sites within PRDM9-bound transposable elements—a prediction that should be examined empirically by future studies. On the other hand, given strong DSB suppression at promoters, nearby PRDM9 binding sites might be immune from the effects of hotspot death, which would otherwise act to abolish its binding and drive potentially deleterious mutations—including any which might weaken the promoter—to fixation in these regions. Indeed, the potentially destructive or repressive effects of hotspot death could explain why meiotic recombination is directed away from functional elements like promoters, and towards deleterious elements like transposons, at least in humans and mice.

Share this article

Cite this article

Comparison of seven distinct motifs bound by human PRDM9 (B allele).

Figure 1—source data 1

Figure 1—source data 2

Human PRDM9 can bind promoters, though recombination is suppressed.

Spermatogenesis-specific genes VCX and CTCFL are activated by human PRDM9 in HEK293T cells.

Figure 3—source data 1

Figure 3—source data 2

Influences on recombination in cis downstream of PRDM9 binding.

Figure 4—source data 1

PRDM9 multimer formation is mediated by the ZF domain in an allele-biased manner.

Author details

Nicolas Altemose

Present address

Contribution

For correspondence

Competing interests

Nudrat Noor

Present address

Contribution

Competing interests

Emmanuelle Bitoun

Contribution

Competing interests

Afidalina Tumian

Present address

Contribution

Competing interests

Michael Imbeault

Present address

Contribution

Competing interests

J Ross Chapman

Contribution

Competing interests

A Radu Aricescu

Present address

Contribution

Competing interests

Simon R Myers

Contribution

For correspondence

Competing interests

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism

Further reading