Three-dimensional genome reorganization foreshadows zygotic genome activation in Drosophila
Drosophila stock maintenance
PCNA::EGFP flies used for interphase-staged Pico-C were generously provided by S. A. Blythe and E. Wieschaus (Princeton University). The sfGFP-Zelda and GAF-sfGFP flies were generated as described previously11,60. Jabba-Trap was expressed from the maternal α-tubulin 67 promoter, kindly provided by S. A. Blythe61. Flies were maintained on standard cornmeal-agar food and raised at 25 °C.
Embryo fixation and collection
A full step-by-step protocol covering embryo fixation can be accessed at protocols.io (https://doi.org/10.17504/protocols.io.rm7vz9ro5gx1/v1). Briefly, PCNA::EGFP flies were allowed to lay eggs on 0.4% acetic acid agar plates at 25 °C for 1 h pre-collection (NC12–NC14) or 2 h pre-collection (NC9–NC11). After egg laying, plates were incubated for 30 min (NC9–NC11) or 1–1.5 h (NC12–NC14) before fixation. Embryo fixation was performed following the published Micro-C protocol57, with the exception that EGS was used as the sole secondary crosslinker. Embryos were first crosslinked in 1% formaldehyde, quenched with Tris-HCl (final 0.75 M), washed in phosphate-buffered saline with Triton X-100 (0.5% final concentration) and then crosslinked in 3 mM EGS for 45 min at room temperature. After a second quench, embryos were washed, staged and sorted. Interphase embryos showed clear nuclear PCNA signals and NC was assigned based on nuclear density. Sorted embryos were snap-frozen in liquid nitrogen and stored at −80 °C.
Pico-C
The complete step-by-step protocol, including the micrococcal nuclease digestion test, is available at protocols.io (https://doi.org/10.17504/protocols.io.kqdg31nm1l25/v1). Pico-C libraries were generated following the published Micro-C protocol57,62 with several key modifications. Nuclei were immobilized on concanavalin A-coated magnetic beads to increase DNA recovery and eliminate centrifugation steps before DNA extraction. Approximately 10 embryos (~60,000 nuclei) were used for the NC14 libraries, with input doubled at each earlier stage (up to ~300 embryos for NC9). After nuclear extraction, samples were resuspended in 900 µl of MB1 and 100 µl of 10× binding buffer (200 mM Hepes-KOH, pH 8, 100 mM KCl, 10 mM CaCl2, 10 mM MnCl2 and 5 mM spermidine) along with 10 µl of pre-washed concanavalin A-coated beads. The mixture was rotated for 10 min at room temperature to allow nuclei to bind the beads.
Bound nuclei were digested with micrococcal nuclease at 37 °C for 10 min with shaking (950 rpm). The reaction was stopped by adding EGTA ((ethylenebis(oxonitrilo))tetra-acetate, final 20 mM concentration) and samples were washed 3× with cold MB2 buffer (50 mM NaCl, 10 mM Tris-HCl, pH 7.5, and 10 mM MgCl2) while kept on the magnet. End-repair and phosphorylation were performed as previously described62 using T4 PNK and Klenow fragment.
Biotinylation was extended to 4 h to improve ligation efficiency and reduce downstream PCR requirements. At the 2-h mark, 1.5 µl of 100 mM ATP and 50 U of Klenow fragment were added and incubation continued for the remaining 2 h at 25 °C. The reaction was stopped with 0.5 M EDTA and samples were washed twice with 500 µl of cold MB3 buffer (50 mM Tris-HCl, pH 7.5, and 10 mM MgCl2).
Proximity ligation was carried out using T4 DNA ligase (10,000 U total, added in two steps) for 5 h at room temperature. Unligated ends were treated with exonuclease III to remove free biotin-dNTPs. Crosslinks were reversed by overnight incubation at 65 °C in the presence of proteinase K and sodium dodecylsulfate (SDS). DNA was purified by extraction with phenol and chloroform. To maximize yield from low-input samples, total DNA was carried forward into the final library construction phase and size selected for dinucleosomal fragments (350–500 bp) after PCR amplification using Ampure XP beads. Final libraries were eluted in 10 mM Tris-HCl, pH 8, and quantified by Qubit before sequencing.
Embryo injections
Injections were performed as previously described18. Briefly, α-amanitin (0.5 mg ml−1 in water) or water alone was injected evenly into a PCNA::EGFP embryo. Embryos older than NC8 after injection were discarded. The remaining embryos were monitored until NC14, then either fixed and staged for Pico-C or pooled (10 embryos per batch) in 30 µl of TRIzol and flash-frozen for RNA extraction. All embryos were hand sorted to ensure correct staging. A subset of embryos was allowed to develop, revealing consistent arrest in amanitin-injected embryos, whereas controls developed normally as shown before18.
RNA-seq in embryos
For each replicate, 40 embryos were pooled and homogenized with a metal pestle. RNA was extracted by chloroform phase separation and purified from the aqueous layer using the NEB Monarch RNA Cleanup Kit (NEB, cat. no. T2030). All samples showed RNA integrity no. >7 on a Bioanalyzer. Libraries were prepared with the Watchmaker mRNA Enrichment and Library Prep Kits according to the manufacturer’s protocol, dual indexed, pooled and sequenced on an Illumina NextSeq 2000 to a depth of ≥80 million paired-end, 60-bp reads per sample.
Single-embryo ATAC–seq on embryos depleted for Zelda and GAF
Single stage 5 (NC14) embryos of the genotype sfGFP-Zld;Jabba-Trap/CyO;GAF-sfGFP and sfGFP-Zld; Sp/CyO;GAF-sfGFP controls, identified based on morphology, were harvested and processed for ATAC–seq as described previously11. Libraries were prepared from four replicates of both experimental and control embryos and paired-end sequencing was performed on a NovaSeq X Plus.
Immunohistochemistry of embryos depleted for Zelda and GAF
The sfGFP-Zelda;Jabba-Trap/CyO;sfGFP-GAF and sfGFP-Zelda;Sp/CyO;sfGFP-GAF control flies were put in cages, allowed to lay for 1.5 h and the embryos were aged for an additional hour at 25 °C. Drosophila embryo collection, fixation and staining were performed as previously described11. The following antibodies were used: rabbit anti-GFP (Abcam, cat. no. ab290, 1:500); DyLight 488-conjugated goat anti-rabbit (Thermo Fisher Scientific, cat. no. 35552, 1:1,000); and DAPI (Thermo Fisher Scientific, cat. no. D1306, 1:1,000). Images were taken using a Nikon Ti-2e Epifluorescent microscope or Nikon A1R+ confocal and processed by Fiji or ImageJ63 and Adobe Photoshop software.
Pico-C data processing
Pico-C reads were mapped to the dm6 genome using BWA-MEM v0.7.17 (ref. 64). Aligned reads were then processed with Pairtools v1.0.3 (ref. 65) using the parse2 function to rescue complex walks and any reads below a mapping quality of 3 were removed. The resulting pairs files were sorted and converted to FAN-C v0.9.18 (ref. 66) pairs format. Duplicates were removed and quality control metrics were extracted.
We then generated FAN-C .hic files. Principal component analysis (PCA) was performed on these .hic files using FAN-C, with a sample size of 100,000 interactions, focusing on distances between 100 kb and 1 Mb to assess variation and clustering across the samples. As biological replicates clustered well across all stages (Extended Data Fig. 1a), the .hic files of replicates were merged. The merged files were binned at different resolutions and normalized using iterative correction. We then used a customized Python script to determine the bin size, where 80% of bins had >1,000 reads, as done previously67, to estimate the optimal resolution of each stage. Finally, the merged FAN-C .hic files were converted to balanced Cooler .mcool files using the cooler zoomify utility from Cooler v0.9.3 (ref. 68), with the options –balance –balance-args ‘–mad-max 0 –max-iters 1000’ to perform iterative matrix balancing.
Pico-C, Micro-C and Hi-C coverage tracks
Genome-wide coverage tracks from Pico-C data were generated using cooltools v0.5.3 (ref. 69). Full or downsampled matrices (–count 100000000 using random sample) were used to compute coverage at multiple resolutions with cooltools coverage, producing raw TSV and BigWig outputs. Raw BigWigs were normalized to counts per million (CPM) using wiggletools and converted to the BigWig format via bedGraphToBigWig.
For quantifying signal across chromatin states, per-bin read counts were extracted from 1-kb cooler matrices, intersected with state annotations and scaled by overlap size to obtain total counts per state. Counts were then normalized by region length and total mapped reads to derive bins per million coverage, analogous to transcript per million normalization in RNA-seq.
Loop and boundary identification
Boundaries were detected from equally downsampled .hic files using FAN-C (500-bp bins, 16-kb window). Boundaries with insulation >0.45 and outside centromeric or low-mappability regions were retained. Stage-specific comparisons were performed in R using GenomicRanges v1.56 (ref. 70) and ComplexUpset v1.3.3 (ref. 71)with a 5-kb tolerance. All boundary calls are provided in Supplementary Data 1.
Loops were identified using Mustache v1.3.2 (ref. 27) at 250-bp, 1-kb, 2-kb, 4-kb and 16-kb resolutions to capture looping dynamics across distances. Loops with P > 0.01 or in low-mappability or low-centromeric regions were removed. Resolution-specific calls were merged iteratively with Bedtools v2.29.2 (ref. 72) (pairToPair -slop 1000) to create a unified loop set. Stage-specific loop overlap (NC9–NC14) was assessed with hicVennDiagram v1.2 (ref. 73) applying the same overlap thresholds used for boundaries. All loop calls are provided in Supplementary Data 2.
Compartment analysis
To identify chromatin compartments from Pico-C data, we used CALDER2 v0.7 (ref. 35).
NC14 intrachromosomal contact matrices were generated with fanc dump at multiple resolutions (1–50 kb) and converted to tab-delimited format for CALDER2. A log2(input-normalized H3K36me3)42,74 BigWig track was used to define the correct A or B orientation. A 3-kb resolution was chosen because it yielded the strongest correlation (≥0.4) with H3K36me3. Subcompartments were simplified to A.1, A.2, B.1 and B.2 for downstream analyses, excluding previously characterized centromeric regions16. Compartment calls are provided in Supplementary Data 6.
Aggregate analysis
Aggregate analysis for loops calls, boundaries and compartments was done by first calculating the expected interaction frequencies using our Pico-C matrices with cooltools expected-cis. Pileup analysis was then carried out using coolpup.py v1.1.0 (ref. 75).
For boundary local pileups, the score in the right corner reflects the average insulation—calculated by dividing the signals in opposing corners of the aggregate as described in the documentation. Compartment pileups were rescaled using the parameters –rescale –rescale_flank 1 –rescale_size 99 to normalize feature size and flanking regions.
Clustering of boundaries and loop anchors
The signal of various epigenetic marks and transcription was extracted for a 5-kb window around the center of individual boundaries and loop anchors (anchors within 1 kb of each other were merged and treated as a single anchor). An enrichment score for each region was calculated by dividing the average signal of the center (−500 bp to +500 bp) by the average signal from the flanking region (±1 kb to ±2.25 kb). Hierarchical clustering of boundaries and loop anchors was performed on the enrichment scores using Euclidean distance and Ward’s method. The number of clusters was selected empirically as the largest number that produced epigenetically distinct groups. For visualization purposes, UMAP dimensionality reduction was applied to the region by enrichment score matrix. Loops >500 kb were excluded from this analysis, because they tend to show higher noise and weaker signal due to distance-dependent decay (16.1% of total loops). Cluster results are provided in Supplementary Data 3.
ChromHMM input data processing
Public datasets were processed using standardized pipelines specific to each data type.
For ChIP–seq data, adapter trimming and deduplication were performed with fastp v0.23.1, followed by alignment to dm6 using Bowtie2 v2.5.4 (ref. 76) (–no-unal, –no-mixed and –no-discordant). SAM files were converted and sorted with Sambamba v1.0.1 (ref. 77) and reads with mapping quality <20 or overlapping black-listed regions were removed. ATAC–seq data were processed identically, with additional use of deepTools v3.5.678 alignmentSieve (–ATACshift). Open chromatin reads (<100 bp) and nucleosomal fragments (180–250 bp) were separated. RNA-seq data were aligned using STAR v2.7.10a79. The resulting BAM files were indexed using Sambamba.
Quality control checks for all datasets were performed using Samtools v1.18 (ref. 80) and FastQC v0.12.1 (ref. 81).
ChromHMM
ChromHMM v1.24 (ref. 29) was used for chromatin state analysis. Datasets were grouped into five categories: pre-minZGA (NC1–NC8), pre-majZGA(a) (NC9–NC11), pre-majZGA(b) (NC12–NC13), ZGA (NC14) and Zelda−, which included Zld knockdown data from NC12–NC14. For pre-majZGA stages, where data were available across both NC9–NC11 and NC12–NC13, shared datasets were used; otherwise, stage-specific data were included where available. Input controls were specified in the cell-marks file and, for RNA signal, flowthrough maternal RNA-seq data7 were used to minimize maternal transcript influence.
Deduplicated BAM files were used for binarization and model training (with LearnModel options -b 100 -r 400). Models with varying state numbers were tested and a 20-state model was selected based on log(likelihood). States were annotated with ChIPseeker v1.40 (ref. 82) using enrichment across gene bodies, chromosomal distribution and ChromHMM emission probabilities, and ordered by developmental enrichment. Gene ontology enrichment was performed with clusterProfiler v4.12 (ref. 83) using enrichGO (BP ontology, Padj < 0.05), with terms simplified and visualized as dot plots highlighting key biological processes. De novo motif enrichment was assessed using HOMER v4.11 (ref. 30) (findMotifsGenome.pl) on regions showing peak enrichment for each state. ChromHMM tracks are provided in Supplementary Data 5.
Metaloci analysis
ChIP–seq and RNA-seq data were processed per replicate using deepTools bamCompare to generate log2(ratio BigWigs) when inputs are available. For RNA-seq, flowthrough was used as input to control for maternal effects. Replicates were merged using bigwigAverage. The dm6 genome was divided into 200-kb windows (100-kb step), excluding bins with >5% unmappability and/or centromeric overlap.
Pico-C and genomic signals were integrated using an adapted METAloci framework40. Spatial autocorrelation was applied to assess correlations between chromatin interactions and genomic features. Stage comparisons (NC12 versus NC14) used two-sided Wilcoxon’s tests (*P < 0.05, **P < 0.01, ***P < 0.001) and violin plots show global Moran’s distributions with mean values highlighted.
RNA-seq analysis (α-amanitin-treated versus water-treated embryos)
Adapters and low-quality bases were trimmed using fastp v0.24.0 (ref. 84) with –detect_adapter_for_pe and reads were aligned to the dm6 genome using STAR v2.7.11b (–outFilterMismatchNoverLmax 0.04, –outFilterMultimapNmax 1 and –outSAMtype BAM SortedByCoordinate). Only uniquely mapping reads were retained. Gene-level quantification was performed with featureCounts v2.0.8 (ref. 85) in paired-end mode (-p).
Differential expression analysis between α-amanitin-treated and water-treated embryos was performed using DESeq2 v1.44.0 (ref. 86). Genes with ≥10 counts in at least 2 samples were retained.
A set of stable maternal transcripts based on prior annotations8 was used for normalization (≥1,000 counts in ≥2 samples, |log2(fold-change)| < 1, log(counts per min) > 1). Variance-stabilized counts were used for visualization and significance was defined as Padj < 0.05.
Scale-factor-normalized BigWig tracks were generated with deepTools (bamCoverage, bin size = 1) and averaged across replicates using bigwigAverage.
Chromosight analysis
Chromosight v1.6.3 (ref. 44) was used to quantify boundary and loop strengths using –subsample to have equally sampled contact maps from α-amanitin-injected, water-injected and Zelda or GAF DKD embryos and matched controls. For boundary analysis, scores were computed at 500-bp resolution using –pattern borders. Loop strengths were quantified using NC14 loop BEDPE coordinates across multiple resolutions (1–8 kb) to capture interactions at varying genomic distances. The highest-resolution score was retained per loop. Loop scores were analyzed at the anchor level by assigning each loop anchor independently to predefined merged loop-anchor cluster regions.
Scores were compared between samples and controls in R using unpaired Wilcoxon’s rank-sum tests (***P < 0.001, **P < 0.01, *P < 0.05). Scores were grouped by boundary or loop–anchor cluster, with anchors participating in multiple loops linked to more than one score. To track feature score trends, normalized score differences (−1 to 1) were calculated per locus-positive values indicating stronger signals in controls and negative in experimental samples. Outliers were removed using the interquartile range method and mean and median differences were summarized per cluster.
Model training and evaluation
We adapted the second-stage Orca model46, composed of a hierarchical sequence encoder and a multilevel cascading decoder, to predict the chromatin structures at 125-bp, 250-bp, 500-bp and 1-kb levels from 250-kb DNA sequences. NC12 and NC14 Pico-C data were used and genomic sequences were retrieved from the dm6 reference genome. All chromosomes except chr2L:0–19 Mb were used for training, with this region reserved for validation.
We adopted the same training strategy as Orca46. The training process with stochastic gradient descent took about 1,500,000 steps (250-kb sequence with batch size 8 and learning rate 0.01 with momentum 0.98) on a server with 4 NVIDIA Tesla v100 (32 GB) graphics processing units. Model performance on the holdout region was assessed by concatenating and flattening 1-kb prediction matrices and calculating Pearson’s correlation between predicted and observed Pico-C contact maps.
Multiplexed in silico mutagenesis
We employed the same in silico mutagenesis screening approach utilized in Orca46 at 1-kb resolution to identify individual motifs critical for genome interaction. The disruption impact on local genome interactions is measured by 250-kb structural impact score, which is the average absolute log(fold-change) of interactions between the disruption position and all other positions in the 250-kb window.
For motif enrichment analysis, Drosophila nonredundant motifs were downloaded from the JASPAR 2024 database87. Motif matches for each 10-bp site were scanned for after extending by 10-bp flanking sequence on each side, and a maximum log(odds score) over the 30-bp window for each motif was calculated. To quantify the enrichment of motifs, two-sided Student’s t-test (without assuming equal variance) was performed to compare the motif log(odds scores) of the sites with impact score >0.02 for NC14 and >0.01 for NC12 against the background of 100,000 sites randomly drawn among all 10-bp sites screened. Fold enrichment was also computed on the same sites with a motif log(odds threshold) of 10.
Motif-specific mutagenesis
To assess how specific motifs influence global genome architecture, we analyzed boundary regions following genome-wide mutagenesis. The validation region was tiled into 250-kb windows (125-kb step) and accessible motif instances were defined by intersecting ATAC–seq peaks with motif matches. Each site was extended to 14 bp and randomly mutated to ensure consistent sequence length.
Boundaries identified in the original Pico-C data were used for pileup analysis (log(fold-change) over background within 20 kb). Boundary strength was quantified by the insulation score, defined as the difference between the sums of the diagonal versus off-diagonal quadrants of the interaction matrix centered on each boundary. The insulation change was computed as the difference in the score before and after mutation.
Five motifs—Zelda, GAF, M1BP, Dref and sqz—were tested individually and in combination. Pairwise effects were assessed by comparing insulation changes at shared boundaries and combined effects were evaluated by mutating all accessible instances of the five motifs within each window and comparing total insulation changes to the sum of individual perturbations.
Structural impact score distribution
Mean structural impact scores were calculated across ChromHMM state regions by averaging scores within annotated regions. Promoter-like states (7 and 19) were further subdivided based on overlap with TSSs. Differences across states were tested using ANOVA (P < 2 × 10−¹⁶) and confirmed with a nonparametric Kruskal–Wallis test (P < 2.2 × 10−¹⁶). Pairwise comparisons were performed using Tukey’s honest significant difference (HSD) test and the results were visualized with mean scores ± 95% confidence intervals and a compact letter display indicating statistically distinct groups.
Effect size distribution
To quantify the effect size of factors identified as significant in our machine learning analysis, we first gathered the predicted effect data from in silico mutagenesis for each factor and a corresponding set of random regions as controls. The effect size for each factor was determined by calculating the mean difference between the factor-specific predictions and the control values.
State local enrichment analysis at loop anchors and boundaries
Chromatin state enrichment within loop anchors and boundaries was assessed across developmental stages. Loop and boundary coordinates were intersected with stage-specific ChromHMM annotations and local background regions were defined by extending each feature ±150 kb. Enrichment was calculated as the observed/expected ratio of each state within loops or boundaries relative to their local background, log2(transformed).
ATAC–seq analysis
Paired-end ATAC–seq reads were processed as above. BAM files were filtered (MAPQ ≥ 10), deduplicated and merged with sambamba. Tn5 insertion bias was corrected using deepTools alignmentSieve –ATACshift. Coverage tracks were generated with bamCoverage (RPGC normalization, 5-bp bins, blacklist excluded).
Accessible regions were identified with HMMRATAC v1.2.10 (ref. 88). We note that, in DKD embryos, HMMRATAC detected an increased number of peaks, predominantly corresponding to accessible regions in controls. HOMER de novo motif enrichment was performed on regions that lost accessibility in DKD embryos. Called peaks are provided in Supplementary Data 7.
Data visualization
Chromatin contact data were visualized using HiGlass v0.4.7 (ref. 89). pyGenomeTracks v3.8 (ref. 90) was used to display chromatin contact data in Figs. 1 and 3 and Extended Data Figs. 1 and 6, using a natural log with scale factor set to 100. All other panels displaying chromatin contact data were plotted with CoolBox v0.3.8 (ref. 91) using a log10 scale.
Statistics and reproducibility
Statistical analyses were performed in R (v4.4.0) and data visualization was carried out using the ggplot2 package (v3.5.1). Box plots were defined with boxes spanning from the first quartile (25th percentile) to the third quartile (75th percentile), with the median represented by a horizontal line within the box. Whiskers extend to the most extreme data points within 1.5× the interquartile range from the box. No collected data were excluded and data collection and analysis blinding were not applicable to this study. All statistical tests were two sided.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
First Appeared on
Source link