Analysis and Modeling of Early Estradiolinduced GREB1 Single Allele Gene Transcription at the Population Level
Article Information
Seyyed Mahmood Ghasemi ^{1,2}, Pankaj K. Singh^{2,3}, Hannah L. Johnson^{2,4}, Ayse Koksoy^{2,3}, Michael A Mancini^{2,3,4,5,6}, Fabio Stossi ^{2,4,5 }and Robert Azencott^{1,*}
^{1}Department of Mathematics, University of Houston, Houston, TX, USA
^{2}GCC Center for Advanced Microscopy and Image Informatics, Houston, TX, USA
^{3}Center for Translational Cancer Research, Institute of Biosciences and Technology, Texas A&M University, Houston, TX
^{4}Integrated Microscopy Core, Baylor College of Medicine, Houston, TX, USA
^{5}Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
^{6}Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, TX, USA
^{*}Corresponding author: Robert Azencott, Dept of Mathematics, University of Houston, Houston, TX, USA,
Received: 4 September 2023; Accepted: 11 September 2023; Published: 10 June 2024
Citation: S.Mahmood Ghasemi, Pankaj Singh K, Hannah Johnson L, Ayse Koksoy, Michael Mancini A, Fabio Stossi and Robert Azencott. Analysis and Modeling of Early Estradiolinduced GREB1 Single Allele Gene Transcription at the Population Level. Journal of Bioiformatics and Systems Biology. 7 (2024): 108128.
Share at FacebookAbstract
Single molecule fluorescence in situ hybridization (smFISH) can be used to visualize transcriptional activation at the single allele level. We and others have applied this approach to better understand the mechanisms of activation by steroid nuclear receptors. However, there is limited understanding of the interconnection between the activation of target gene alleles inside the same nucleus and within large cell populations. Using the transcriptional coactivator GREB1 gene as an early estrogen receptor (ER) response target, we applied smFISH to track E2activated GREB1 allelic transcription over early time points to evaluate potential dependencies between alleles within the same nucleus. We compared two types of experiments where we altered the initial status of GREB1 basal transcription by treating cells with and without the elongation inhibitor flavopiridol (FV). E2 stimulation changed the frequencies of active GREB1 alleles in the cell population, and this was independent of FV pretreatment. In FV treated cells, the response time to hormone was delayed, albeit still reaching at 90 minutes the same levels as in cells not treated by FV. We show that the joint frequencies of GREB1 activated alleles observed at the cell population level imply significant dependency between pairs of alleles within the same nucleus. We identify probabilistic models of joint alleles activations by applying a principle of maximum entropy. For pairs of alleles, we have then quantified statistical dependency between their GREB1 activations by computing their mutual information. To further analyze the time course of GREB1 activation observable at the population level, we have introduced a stochastic model compatible with allelic statistical dependencies, and we have fitted this model to our data by intensive simulations. This provided estimates of the average lifetime for degradation of GREB1 introns and of the mean time between two successive transcription rounds. Our approach informs on how to extract information on single allele regulation by the estrogen receptor from within a large population of cells, and should be applicable to many other genes.
Keywords
Gene Transcription, GREB1, smFISH, Single Allele
Gene Transcription articles; GREB1 articles; smFISH articles
Article Details
1. Introduction
Timedependent modulation of gene transcription is necessary for a cell to respond to stimuli in a dynamic and reversible manner. The difficulty in dissecting the complex mechanisms of biological responses is enhanced by the fact that gene transcription in individual cells within a population appears to be vastly heterogeneous [18].
We and others have used single molecule RNA FISH (smFISH) to study the effect of stimuli on gene transcription by population analysis of fixed samples [3,7], which facilitates the capture of a large number of events, their spatial location, and the nascent RNA from individual alleles; however, this is at the expense of time dynamics. Specifically, we have focused on the estrogen receptor (ER), a wellestablished model for transcriptional response to hormones (i.e., 17bestradiol, E2), using GREB1 as a prototypical early ER target gene [2,8,9]. In our previous study [8], we identified that GREB1 responded to E2 in a cell and alleledependent manner, and that the frequency of allele activation was tunable by specific epigenetic inhibitors, indicating that the cell has mechanisms in place to control the frequency of allelic responses to stimuli.
To complement the population analysis, several time dynamic studies have been performed. The method of choice has been engineering one or more copies of a gene to contain repeat sequences (e.g., MS2, PP7) that are recognized by specific fluorescent proteins [1012]. These live studies proposed that gene transcription occurs in stochastic bursts, where a gene is ON for a short period of time, followed by longer periods of inactivity. Live cell experiments of engineered GREB1 alleles [2], and another prototypical E2target, TFF1 [1], have shown a highlydynamic response to hormone in individual cells. From these pivotal studies, the following observations have been made: 1) E2 regulates the frequency of bursting by reducing the promoter OFF times; 2) extrinsic noise governs the cellbycell heterogeneity in response; 3) there is some correlation between alleles in the same nucleus; and, 4) at the level of individual cells, live imaging experiments can be accurately fitted by a stochastic model driven by a twostates promoter.
To model the stochastic gene transcription bursts observed in individual cells, several papers [1,2,1320] have introduced and simulated engineered gene promoter models randomly switching between one active state and several inactive states. The most popular have been two states models where the gene promoter is either ON or OFF. For instance, in Fritzch et al [2]., &"twostates&" models were fitted to GREB1 transcription bursts observed live in single cells, with model parameters exhibiting quite strong fluctuations from celltocell, which encouraged our present populationlevel study of endogenous GREB1 gene expression.
Here, we sought to study and model the initial phases of hormone stimulation by using smFISH on fixed MCF7 breast cancer cells in culture. In our experiments, smFISH images are acquired every 15 minutes, at times T_{0 }= 0, T_{1 }= 15 min,...,T_{6} = 90 min. At each time T = T_{j} a large cell population pop(T) of N(T) cells is imaged, with N(T) ranging from 400 to 1000. We compared two types of initial conditions:
 in (FV+E2)experiments, a flavopiridol (FV) pretreatment of cells started two hours before T_{0 }to block ongoing transcriptional elongation [21] until FV release at T_{0}, which synchronizes the initiation step of the transcriptional cycle.
 in E2experiments, cells were maintained in “native” state at T = T_{0}, so that the transcription cycle was random before T_{0} At T_{0} each experiment started from seven distinct initial cell populations {init(0), ..., init(7)}. Each population init(j) evolved separately from time T_{0} until T_{j} , and the state of pop(T_{j}) was only imaged at time T_{j} . This approach does not enable tracking the same cells and active alleles across time. At each time T = T_{j}, image analysis of smFISH data provided GREB1 transcription statistics across all cells of pop(T_{j}). For k=0,1,2,3,4 we computed the frequencies Q_{k}(T) of nuclei exhibiting “k” detectable nascent mRNAs (“active alleles”). The frequencies Q_{k}(T) aggregate transcription activities over the N(T) cells of pop(T). As seen in [1,2,1320], transcription bursts of these N(T) individual cells are random and clearly non synchronous, so that our transcription data which aggregate GREB1 transcriptions of individual cells across a large population pop(T) provide populationlevel activation frequencies which evolve smoothly in time with no significant bursts. To model the smooth time dynamics of the frequencies Q_{k}(T), we introduced a “populationlevel” stochastic transcription model involving four key parameters:
1) mean waiting time “A” between successive productive transcription rounds;
2) mean lifetime “L” of nascent mRNA;
3) mean elongation time “MTD” to complete one mRNA; and,
4) the minimal number VTH of RNA molecules enabling fluorescence detection.
Parameters estimation for our populationlevel model was implemented by massive stochastic simulations to reach a good fit to smFISH imaging data across biological replicate experiments.
A key step was to show that at each time T = T_{k }, the populationlevel transcription activation frequencies indicated statistical dependencies between pairs (AL_{1}, AL_{2}) of GREB1 alleles within individual nuclei. At each time T = T_{j}, we applied a maximum entropy principle to fit a probabilistic model to the frequencies of joint allele activations observed at the population level. This enabled the quantification of dependencies between pairs of alleles by computing their Mutual Information. While the present study focused on GREB1 gene transcription, we expect that the algorithmic modeling and data analysis techniques that we developed will be applicable for other hormoneinduced genes transcription.
2. Results
2.1 Time course analysis of early E2induced GREB1 gene transcription by smFISH before and after treatment by flavopiridol
As we have shown previously [8], E2 activates GREB1 gene transcription in a cell and alleledependent manner, as measured by smFISH using spectrally separated exon and intron probe sets. Here, we sought to focus on the initial phase of hormonal stimulation (first 90 minutes) by measuring cell population pop(T) responses at 15 minutes intervals (T_{0 }= 0, T_{1} = 15 min, …, T_{6} = 90 min) under two types of cell state initial conditions and three independent biological replicates per condition type. The first initial condition was to consider the initial state of gene transcription as random, i.e. all individual alleles already have their own past “history” with RNA polymerases II located at random phases during either initiation or elongation, which is represented by cells grown for 48 hours in hormonedepleted media. Cell population transcriptional data after this initial cell state are displayed by RED curves and labeled “E2curves” in every Figure. The second initial condition was designed to arrest transcription elongation by using the reversible CDK9 inhibitor, flavopiridol (FV), for 2 hours before E2 treatment at T_{0}, causing elongation to stop and RNA polymerase II to stall at gene promoters [21]. We then released the FV block by three washes, and stimulated GREB1 gene transcription by hormone treatment (E2, 10nM); in the Figures the corresponding population data are displayed by BLUE curves and labeled “FV+E2”.
For each independent biological replicate, the analyzed pop(T) ranged from approximately 400 to 1000 cells, captured by high resolution (60x/1.42NA) epifluorescence deconvolution microscopy (representative images are in Figure 1A). Our temporal resolution has limitations as GREB1 probe visualization requires ~20 individually labeled fluorescent oligo probes (out of 48) to bind to nascent RNA, which, based on their location on the gene, occurs when the GREB1 is ~6070% transcribed; for sequences of oligos refer to [8,22]. On average, the speed of RNA Polymerase II in mammalian cells (i.e., ~22.5Kb/min, [21,23,24]), suggests that detection of newly made, partial RNAs could occur only after ~30 minutes of E2 induction.
As smFISH experiments do not directly follow the activation of individual alleles live, the best proxy is to analyze a time series that evaluates transcriptional events across the population pop(T) of size N(T) using several statistical approaches describing the set of observable active alleles. In each nucleus, we used custom automated image analysis (described in detail in the Methods section), to identify the number k = 0,1,2,3,4 of active GREB1 alleles. From this we derived the total number N_{k}(T) of nuclei exhibiting k active alleles, thus yielding the five frequencies Q_{k}(T) = N_{k}(T)/N(T) The behavior of the cell population at each time point is then represented by the five GREB1 activation frequencies {Q_{0}(T),Q_{1}(T),Q_{2}(T),Q_{3}(T),Q_{4}(T)},which naturally add up to 1 (Figure 1B). These frequencies were calculated across more than 400 cells per replicate, with standard error margins of the order of 2.5% (see Supplemental Materials). The effects of flavopiridol block/release on E2induced gene transcription (Figure 1B, blue curves) appear to result in: 1) a significant increase in Q_{k}(T) at all the time points T< 60 min; and, 2) a corresponding decrease in all the other Q_{k}(T), collectively indicating that the FV pretreatment was effective. More noteworthy are the two following observations: 1) the celltocell and allelebyallele variation in responses is maintained if transcription elongation is manipulated indicating that this part of the transcription cycle is not controlling synchronicity of responses to hormone; and, 2) the final frequencies of activation at time points T larger than 60min are virtually identical whether we used FV pretreatment or not, indicating that the E2 response “catches up” independently of the starting conditions, so that a random cell state starting condition (i.e. no FV pretreatment) does not offer an advantage in term of response to hormone over time.
Figure 1: GREB1 smFISH time course analysis in MCF7 breast cancer cells with and without flavopiridol block/release. A). MCF7 cells were treated for the indicated times with 10nM E2 and GREB1 smFISH was performed at each time point. Images are at 60x/1.42, deconvolved and max projected. Red spots represent intronic and green spots exonic probe sets. Samples labeled as FV+E2, were pretreated with 1µM flavopiridol (FV) for 2 hours, followed by three washes and E2 treatment. Scale bar: 10µm. B). The time courses of five frequencies {Q_{0}(T),Q_{1}(T),Q_{2}(T),Q_{3}(T),Q_{4}(T)} of active GREB1 alleles/cell are shown as follows. The red curves display the frequencies Q_{k}(T) after averaging over three E2 experiments. The blue curves display the Q_{k}(T) after averaging over three FV+E2 experiments. The vertical bars display the dispersion of Q_{k}(T) values over three similar independent experiments. Note that the dispersion of the Q_{k}(T) values across 3 experiments is much larger than the standard error of estimation affecting Q_{k}(T) in each experiment. At the end of all experiments (T= 90min), all Q_{k}(T) stabilize to a value ≈ 20%.
2.1 Allelic activations by E2induced GREB1 transcription exhibit significant statistical dependency.
We explored whether GREB1 activation occurred independently at the four alleles present in each of the aneuploid MCF7 nuclei by estimating the probabilities of joint activation for pairs of alleles AL_{1}, AL_{2} At time T, in any nucleus NUC_{n}, each allele can either be ON or OFF, therefore yielding 16 distinct possible joint activation states {S_{0},S_{1},S_{2},.., S_{15}} for the 4 alleles AL_{1},AL_{2},AL_{3},AL_{4}._{. }Denote prob _{n}(S_{k}) the probability that the four alleles in NUC_{n} are in the joint state . The 16 probabilities prob _{n}(S_{0}),prob _{n}(S_{1}),…,prob _{n}(S_{15}) add up to 1, and depend on time T. Due to population heterogeneity, prob _{n}(S_{k}) will depend on many cell extrinsic and intrinsic factors specific to each NUC_{n} ; hence, our image data could only record averages F_{T}(S_{k}) of the probabilities prob _{n}(S_{k}) over all nuclei NUC_{n }of pop(T). Concretely, since image resolution did not enable allele matching between distinct cells, the probabilities F_{T}(S_{k}) were not directly computable from image analysis, which could only provide the five observed frequencies Q_{0}(T),Q_{1}(T),Q_{2}(T),Q_{3}(T),Q_{4}(T) shown in Figure 1B. For each time point T, the algorithmic challenge was hence to compute 16 unknown probabilities F_{T}(S_{0}),…., F_{T}(S_{15}) starting only from the 5 observed frequencies Q_{0}(T),Q_{1}(T),Q_{2}(T),Q_{3}(T),Q_{4}(T). We identified the five explicit linear relations (see Methods Equation 1) expressing the Q_{k}(T) in terms of the F_{T}(S_{k}), but this only provided five explicit linear constraints on our 15 unknowns F_{T}(S_{k}). To handle this estimation problem, a natural first approach was to assume that under the probability distribution , one had statistical independence of activations among the four GREB1 alleles. In practical terms, independence means that there is no quantifiable mechanism through which any allele interferes or influences activation potential in other alleles in the same nucleus. We have proved (see Methods Equation 2) that, if the probability of joint activations involved statistical independence between alleles activations, the observed frequencies Q_{0}(T),Q_{1}(T),Q_{2}(T),Q_{3}(T),Q_{4}(T) would have to verify extremely restrictive polynomial constraints (see Methods MM5 and Equation 2).
All our experiments revealed that these polynomial constraints were never satisfied by the observed Q_{0}(T),Q_{1}(T),Q_{2}(T),Q_{3}(T),Q_{4}(T). Thus, to evaluate the F_{T}(S_{k}) from frequencies of joint alleles activations observed at population level, we had to reject the hypothesis of statistical independence between activations of the four distinct alleles within a nucleus.
We have confirmed this theoretical result, by comparing, at each time point, the experimental activation frequencies Q_{0}(T),Q_{1}(T),Q_{2}(T),Q_{3}(T),Q_{4}(T) with the analogous activation frequencies generated by joint probability models F_{T} based on independence between allele activations. Indeed, independence would imply that each probability model F_{T} is fully determined once one specifies activation frequencies at time T for each single allele AL_{1},AL_{2},AL_{3},AL_{4}. For each time T, we have generated 10^{6} such models F_{T} based on the hypothesis of independence between alleles activations and computed the associated virtual activations frequencies [virQ_{0}(T),virQ_{1}(T),virQ_{2}(T),virQ_{3}(T), virQ_{4}(T)] to compare them to the observed [Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T)], Our results clearly show that none of these sets of virtual frequencies [virQ_{0}(T),virQ_{1}(T),virQ_{2}(T),virQ_{3}(T),virQ_{4}(T)] could match the experimentally observed [Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T)]. This was visualized by scatter plots, each one displaying a pair [Q_{i}(T), Q_{j}(T)] observed over time for each experiment. For example, Figure 2 separately displays scatter plots for the two pairs [Q_{0}(T), Q_{1}(T)] and [Q_{0}(T), Q_{4}(T)]. The red dots represent these pairs of frequencies observed via smFISH or E2 experiments. The blue dots similarly display the same pairs for FV+ E2 experiments. Arrows indicate the observations of frequencies [Q_{0}(T), Q_{1}(T)] and [Q_{0}(T), Q_{4}(T)]. at successive times T for an individual experiment. The 10^{6 }green dots represent the virtual pairs [virQ_{0}(T), virQ_{1}(T)] or [virQ_{0}(T), virQ_{4}(T)]. associated to 10^{6} virtual joint probability models F_{T} based upon the independence hypothesis. As shown in Figure 2, both blue and red dots are positioned well away from green dots, thus confirming that, for population level modeling, one must reject the hypothesis of independence between alleles.
Figure 2: Statistical dependency between GREB1 alleles activations. Scatter plot representation of two frequency pairs [Q_{0}(T), Q_{1}(T)] on the left and [Q_{0}(T), Q_{4}(T)] on the right. The blue (FV+E2) and red (E2) curves display the real data time evolutions for the pair of frequencies observed in two experiments. Arrows indicate the time direction. On the left panel, each of the 10^{6} green dots represent one pair of frequencies [q_{0}(T), q_{1}(T)] generated by at least one model with independent alleles. Note that the green dots always remain distinct from the experimental red and blue dots. Similar graphs were obtained for any pair Q_{i}(T), Q_{j}(T) with i ≠ j. This indicates that probabilistic modeling of transcription activities aggregated at cell population level requires assuming some dependency between alleles activations.
As just showed above, to be compatible with experimental data, the unknown average frequencies F_{T}(S_{0}),…, F_{T}(S_{15}) of jointly activated alleles across cell population pop(T) must exhibit dependencies between alleles activations. Each one of the 5 observed frequencies Q_{k}(T) is an explicit linear combination of the 16 unknowns F_{T}(S_{0}),…, F_{T}(S_{15}) (see Methods, Equation1). Since there was no probabilistic model F_{T} achieving zero dependencies between alleles activations and also verifying these 5 linear constraints, we decided to seek a model F_{T} compatible with these 5 constraints and minimizing dependencies between alleles activations. For fitting a joint probability distribution F_{T} to data under linear constraints, a generic principle is that minimizing dependencies is approximately equivalent to maximizing the entropy Ent(F_{T}) of the probability model F_{T} under the same linear constraints (see Methods MM6). This maximum entropy principle is well established in the physics of gases or of spinglass magnets arrays, and has also successfully been used to model images by Gibbs distributions [25,26]. Here we have applied this principle to theoretically compute the unique joint probability F_{T} of activated alleles which has maximum entropy among all probabilities compatible with the five observed frequencies Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T). We then proved that this maxentropy probability model F_{T} has full symmetry, meaning that arbitrary permutations of the four alleles do not change the frequencies of their joint state of activations. For instance, due to average probability modeling of the whole population pop(T), each of the four alleles has the same activation probability P_{AL}_{1 }(T) at time T. We have thus obtained explicit formulas expressing each one of the 16 unknowns F_{T}(S_{0}),…, F_{T}(S_{15}) in terms of the five observed frequencies Q_{k}(T) (see Methods, equation 1). These formulas also gave us explicit expressions for two key probabilities (see Methods, Equation 3), namely: 1) for each single allele AL_{1}, the probability P_{AL}_{1} (T) that AL_{1} will be active at time T; and 2) for each pair of alleles AL_{1}, AL_{2} in the same nucleus, the probability P_{AL1,AL2}(T) that AL_{1 }and AL_{2} will be simultaneously active at time T. The probabilities P_{AL}_{1} (T) and P_{AL}_{1} _{AL}_{2 }(T) do not change if we replace AL_{1}, AL_{2} by any other two alleles AL_{i}, AL_{j} within the same nucleus. This is due to the full symmetry of the joint probabilities F_{T }(S_{k}), a property which was derived from the maximum entropy principle.
In Figure 3 A, we display the evolution of P_{AL}_{1} (T) over time. For FV+ E2 experiments (blue curve), the initial P_{AL}_{1} (0) is nearly 0 and P_{AL}_{1} (T) remains practically equal to zero until T=30min since GREB1 transcription duration is of the order of 40 min and GREB1 transcriptions are nearly blocked by FV before T = 0. For E2 experiments (red curves), P_{AL}_{1} (0) is naturally higher than for FV+E2 experiments (blue curves) due to some GREB1 transcription activity at low level before infusion of E2 at T=0. For all experiments, P_{AL}_{1} (T) increases steadily with T and reaches a maximum ranging from 40% to 60% at T= 90min.
For each pair of alleles (AL_{1}, AL_{2}) in the same nucleus, their joint random activation states at time T can have only one of four possible configurations: active/active, active/inactive, inactive/active, or inactive/inactive. We have explicitly computed the probabilities of these four configurations in terms of the observed frequencies Q_{k}(T) The probability P_{AL}_{1} _{AL}_{2 }(T) that the two given alleles AL_{1} and AL_{2} are simultaneously active at time T was then plotted for all experiments (see Figure 3B). For FV+ E2 experiments (blue curves), P_{AL}_{1} _{AL}_{2 }(T) remains quite low until T=30min since most polymerase elongations started after time T=0 is still too incomplete at T=30min to be reliably detectable. For all experiments, P_{AL}_{1} _{AL}_{2 }(T) increases with T, and reaches a maximum ranging from 25% to 45% at T = 90 min.
Figure 3: Time course for GREB1 activation probabilities P_{AL}_{1} (T) and P_{AL}_{1} _{AL}_{2 }(T). In (A), P_{AL}_{1} (T) is the computed probability that a given single allele AL_{1} will be GREB1 activated at time T, and in (B), P_{AL}_{1} _{AL}_{2 }(T) is the joint probability that a given pair of alleles AL_{1}, AL_{2} will both be activated at time T. The time courses of P_{AL}_{1} (T) and P_{AL}_{1} _{AL}_{2 }(T) between T=15min and T=90min are displayed by red curves for three E2 experiments and by blue curves for three (FV+E2) experiments. The vertical bars display the standard errors on these probabilities.
Table 1a: Activation probability P_{AL}_{1} (T) displayed in % for single allele AL_{1}
E2 exp 1 
E2 exp 2 
E2 exp 3 
FV+E2 exp 4 
FV+E2 exp 5 
FV+E2 exp 6 

T = 0 
8.8 +/ 0.9 
25.2 +/ 1.6 
NA 
3.0 +/ 0.6 
3.0 +/ 0.7 
NA 
T = 15 min 
13.4 +/ 1.1 
31.7 +/ 1.7 
25.8 +/ 1.9 
4.5 +/ 0.6 
3.5 +/ 0.4 
3.0 +/ 0.4 
T = 30 min 
21.8 +/ 1.4 
39.5 +/ 1.7 
30.0 +/ 2.4 
8.8 +/ 0.9 
4.8 +/ 0.5 
6.0 +/ 0.5 
T = 45 min 
32.2 +/ 1.8 
48.0 +/ 2.1 
36.5 +/ 2.4 
22.3 +/ 1.6 
14.6 +/ 1.2 
14.0 +/ 0.8 
T = 60 min 
41.0 +/ 1.8 
54.0 +/ 2.1 
38.5 +/ 2.0 
43.2 +/ 1.5 
36.6 +/ 1.4 
21.8 +/ 0.9 
T = 75 min 
46.3 +/ 2.0 
59.0 +/ 2.0 
38.3 +/ 2.1 
56.5 +/ 1.4 
51.5 +/ 1.6 
29.5 +/ 0.9 
T = 90 min 
51.0 +/ 1.9 
63.6 +/ 1.9 
40.8 +/ 2.2 
62.9 +/ 1.3 
53.5 +/ 1.4 
36.5 +/ 1.1 
Table 1b: Joint activation probability P_{AL}_{1} _{AL}_{2 }(T) displayed in % for two alleles AL_{1}, AL_{2}
E2 exp 1 
E2 exp 2 
E2 exp 3 
FV+E2 exp 4 
FV+E2 exp 5 
FV+E2 exp 6 

T= 0 
2.8 +/ 0.6 
11.4 +/ 1.3 
NA 
0.5 +/ 0.2 
0.3 +/ 0.2 
NA 
T= 15 min 
4.6 +/ 0.7 
16.0 +/ 1.5 
12.6 +/ 1.6 
0.7 +/ 0.2 
0.3 +/ 0.1 
0.2 +/ 0.1 
T= 30 min 
9.6 +/ 1.1 
23.3 +/ 1.7 
16.7 +/ 2.2 
3.0 +/ 0.6 
0.9 +/ 0.2 
1.2 +/ 0.2 
T= 45 min 
17.2 +/ 1.6 
32.7 +/ 2.2 
21.0 +/ 2.3 
12.7 +/ 1.4 
8.1 +/ 1.0 
4.7 +/ 0.5 
T= 60 min 
24.5 +/ 1.7 
38.0 +/ 2.3 
21.0 +/ 1.9 
29.0 +/ 1.4 
24.3 +/ 1.4 
8.5 +/ 0.7 
T= 75 min 
30.2 +/ 2.0 
41.8 +/ 2.2 
20.8 +/ 1.9 
40.5 +/ 1.5 
36.2 +/ 1.7 
13.3 +/ 0.7 
T= 90 min 
35.0 +/ 2.0 
47.2 +/ 2.3 
23.1 +/ 2.1 
47.4 +/ 1.5 
39.3 +/ 1.4 
18.3 +/ 1.0 
2.2 Statistical Validation of Dependency between GREB1 alleles activations after E2 treatment
For our probabilistic model F_{T} fitted by maximum entropy to actual GREB1 activation frequencies aggregated at the cell population level, our explicit computation of the probabilities P_{AL}_{1} (T) and P_{AL}_{1} _{AL}_{2 }(T) enabled us to test whether the activation of alleles AL_{1}, AL_{2} is statistically dependent of each other, and to quantify their statistical dependency. Indeed at time T, statistical independence for the activation of alleles AL_{1}, AL_{2} would classically imply the equality P_{AL}_{1} _{AL}_{2 }(T) > P_{AL}_{1} (T) × P_{AL}_{2 }(T), In our experiments this equality is significantly not satisfied, as validated by our detailed analysis of estimation errors on P_{AL}_{1} _{AL}_{2 }(T), P_{AL}_{1} (T), P_{AL}_{2 }(T) (see Methods MM5,MM6).
At time T, the conditional probability that {AL_{2} is active} given that {AL_{1} is active} is classically computed by prob_{T}(AL_{2} active AL_{1} active) = P_{AL}_{1}_{ AL}_{2}(T) / P_{AL}_{1} (T), For all our 6 experiments (see Figure 4), we have P_{AL}_{1} _{AL}_{2 }(T) > P_{AL}_{1} (T) × P_{AL}_{2 }(T) at all time points T ≥ 15min. This forces the conditional probability prob_{T}(AL_{2} active AL_{1} active) to be always larger than the unconditioned probability P_{AL}_{2}(T) = prob_{T}(AL_{2} active. This is a clear indicator of statistical dependency between the activations of AL_{1} and AL_{2}. Indeed one can quantify the level of dependency between activations of and by comparing the dependency ratio dep(T) = P_{AL}_{1}_{ AL}_{2}(T) / P_{AL}_{1} (T) × P_{AL}_{2}(T) to the baseline value 1. Our detailed statistical study of estimation errors on dep(T) (see Methods section MM6) shows that the inequality dep(T) > 1 is significantly valid at the 95% confidence level for all our E2 experiments as soon as T ≥ 15min, and for all our FV+E2 experiments when T ≥ 30 min. In this time range this proves significant statistical dependency between the activations of any alleles pair AL_{1}, AL_{2}. In fact, the dependency ratio dep(T) remains larger than 1.15 at all analyzed time points. Hence, when AL_{1} is active at time T, the conditional probability that AL_{2} is also active is at least 15% higher than the unconditioned activation probability for AL_{2}. Note that for initial times T = 0 or T = 15min, the probabilities P_{AL}_{1}_{ AL}_{2 }(T) and P_{AL}_{1} (T) × P_{AL}_{2 }(T) are typically too small for reliable estimation of the ratio dep(T).
Figure 4: For pairs of alleles AL_{1}, AL_{2}, the dependency ratio dep(T) is significantly larger than 1 at confidence level 95%, which indicates significant statistical dependency between GREB1 activations of and . We display the time course of dep(T) at all T ≥ 15min for three E2 experiments (see A), and at all T ≥30min for three FV+E2 experiments (see B). The vertical bars display the standard errors of estimation on dep(T). Note that the ratio dep(T) remains larger than 1.15 in these time ranges.
The probabilistic dependency between the activation states of two alleles can also be quantified by their Mutual Information MutInf_{AL}_{1}_{,AL}_{2}(T) (see formulas in Methods section MM6). Recall that MutInf_{AL}_{1}_{,AL}_{2}(T) ≥ 0 evaluates how knowing that is active at time T improves the accuracy of predicting whether AL_{2} is active at time T. Complete independence of AL_{1} and AL_{2} would imply MutInf_{AL}_{1}_{,AL}_{2}(T) = 0, so strictly positive values of MutInf_{AL}_{1}_{,AL}_{2}(T) indicate dependency between the activation of AL_{1} and AL_{2}. Since the population average probability of jointly activated alleles has full symmetry, all allele pairs AL_{i}, AL_{j} must have mutual information identical to MutInf_{AL}_{1}_{,AL}_{2}(T). We have computed MutInf_{AL}_{1}_{,AL}_{2}(T) for all experiments, and all T. As detailed in Methods section MM6, the generic formula giving MutInf_{AL}_{1}_{,AL}_{2}(T) involves terms such as PP_{AL}_{1} (T) log(P_{AL}_{1} (T)) and P_{AL}_{1}_{ AL}_{2}(T)log(P_{AL}_{1}_{ AL}_{2}(T)) for which the estimation errors become high when the activation probabilities P_{AL}_{1} (T) and P_{AL}_{1}_{ AL}_{2}(T) are very small. For (FV+E2) experiments, and for T ≤ 30 min, both P_{AL}_{1} (T) and P_{AL}_{1}_{ AL}_{2 }(T) are very close to 0, so that the natural estimates of MutInf_{AL}_{1}_{,AL}_{2}(T) become statistically reliable only for T ≥ 45min. For E2 experiments, since GREB1 transcription activity starts before T=0, one can reliably estimate MutInf_{AL}_{1}_{,AL}_{2}(T) as soon as T ≥ 15 min.
In Figure 5, we display the time course of mutual information MutInf_{AL}_{1,AL2}(T) for each one of our experiments.
For FV+E2 experiments as well as for E2 experiments, and for 45min ≤ T ≤ 90min, the values of MutInf_{AL}_{1,AL2}(T) roughly range from 0.04 to 0.09. For our ranges of mutual information values m = MutInf_{AL}_{1,AL2}(T), the dependency ratio dep(T) can be roughly approximated by (1 + sqrt[m / (1  p)]) where p = P_{AL}_{1} (T). This mathematical approximation valid for small “m” explains why small mutual information values reflect much more sizeable positive values for the difference [dep(T)  1]. The estimation errors indicate that for 45min ≤ T ≤ 90min and for all our experiments, the mutual information MutInf_{AL}_{1,AL2}(T) is significantly positive at the 95% confidence level. This confirms that, at the population level, we detect significant statistical dependency between jointly activated allele pairs within the same nucleus.
Figure 5: Mutual Information between pairs of alleles. For each one of our experiments, and at all time points T, we have computed the Mutual Information MutInf_{AL}_{1,AL2}(T) between the stochastic GREB1 activations of any given pair of alleles AL_{1}, AL_{2} within the same nucleus. We display the time course of MutInf_{AL}_{1,AL2}(T) for three E2 experiments and T ≥ 15 min (see A), as well as for three FV+E2 experiments and T ≥ 30min (see B). The vertical bars display the standard errors on MutInf_{AL}_{1,AL2}(T) and show that in these time ranges, the mutual information is always significantly positive with confidence level 95%, which indicates a statistically significant level of dependency between GREB1 activations for pairs of alleles AL_{1}, AL_{2}. At times T = 0, T = 15min for FV+E2 experiments, and at for E2 experiments, the joint probabilities AL_{1}, AL_{2} are too small to accurately compute MutInf_{AL}_{1,AL2}(T).
2.3 Population level stochastic model to emulate time course of GREB1 allele activation frequencies
We next sought to fit a population level stochastic model dedicated to emulating the dynamics of GREB1 transcriptional frequencies computed across large cell populations. Several papers (see 2,18,20) have modeled the dynamics of random gene transcription bursts observed in live single cells by stochastic &"twostates” promoter models, in which gene promoters are viewed as stochastic automatons randomly cycling through an ONstate and an OFFstate. In these studies, the parameters of twostates promoter models are separately fitted to each single cell continuously observed at very short time intervals. As explicitly pointed out by (2), the estimated parameters of these single cell models vary quite strongly (up to 20%) from cell to cell, due to heterogeneities in cells biology and/or in their local chemical environment. In our smFISH experiments, the frequency Q_{k}(T) of nuclei exhibiting &"k&" GREB1 activated alleles at time T is estimated by averaging across several hundred cells of pop(T). Since random gene transcriptions bursts are highly decorrelated from cell to cell and have short duration, averaging joint activations frequencies across pop(T) essentially smooths out the random GREB1 transcription bursts occurring in single cells. We have verified this intuitive point by simultaneous simulation of N=400 &"twostates” promoter models for GREB1 transcription, followed by averaging at each time T the GREB1 transcription bursts occurring at time T among these N simulations. In our experiments, which involve large cell populations, the observed frequencies Q_{k}(T) indeed have rather smooth time evolutions, as well as the probabilities P_{AL}_{1 }(T) and P_{AL}_{1}_{ AL}_{2}(T) derived from the frequencies Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T)
To emulate the time course of GREB1 transcription frequencies observed across each cell population pop(T), we introduce a population level stochastic model, where successive ER DNA binding occurs randomly after exponentially distributed waiting times that can be followed by coregulator recruitment and transcription initiation. At these random transcription initiation times, GREB1 mRNA elongation proceeds with fixed Mean Transcription Duration (MTD). Several studies (21,23,24) indicate that gene transcription occurs at a roughly constant speed of 2 to 2.5 kb/min, which results in an MTD ≈ 44min for GREB1. Random time durations between successive rounds of GREB1 mRNA elongation are assumed to be independent of each other, and to have the same exponential density with mean value A, which is a model parameter. Such random time gaps are characteristic of Poisson stochastic processes.
We denote &"nas&" any complete nascent GREB1 mRNA, and &"exonas&" / &"intnas&" the and intronic parts of nas. The (random) lifetimes of exonas and intnas are assumed to have exponential decay. The mean halflife of exonas has been empirically calculated via actinomycin D pulsechase experiments and is approximately 3 hours. As our experiments last 90 minutes, the exonas decay does not significantly affect nas visibility during this time. However, the intronic component intnas was calculated to have a mean lifetime min, which does directly affect the lifetimes of completed nascent mRNAs. The random lifetime of any complete nascent GREB1 mRNA (from completion to nearly full decay) is assumed to have an exponential density with unknown mean value L.
Our population level model is thus determined by 3 unknown parameters {A^{+}, A, L, MTD}. Since analysis of our smFISH images suggest that the smallest nascent mRNA spots may not be reliably detected, we introduce another unknown parameter, the Visibility Threshold VTH such that nascent mRNA spots are detectable on our images only if they contain at least VTH molecules. For any plausible values of {A^{+}, A, L, MTD, VTH}, this model enables rapid simulations generating frequency F_{AL}_{1 }(T) of activation at time T for a single allele. Quality of fit is evaluated by the differences F_{AL}_{1 }(T)  P_{AL}_{1 }(T) over a range of time points T, where the probability P_{AL1}(T)is derived as above from the frequencies Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T) computed via analysis of smFISH images.
We point out that our population level stochastic model does not attempt to model GREB1 transcriptions in single cells. Indeed, our model aims only to emulate the allele activation frequencies resulting from the aggregation (at cell population level) of the random allele activations generated by several hundreds of independent twostates stochastic models of GREB1 transcription activities, with twostates model parameters varying slightly from cell to cell.
2.4 Estimation of parameters for our population level model by intensive simulations
To fit the population level model to GREB1 transcription data provided by each FV+E2 experiment, we had to estimate the four parameters {A^{+}, A, L, MTD, VTH} which were expected to belong to naturally predefined ranges (see Methods, section MM8). But for E2 experiments with no flavopiridol pretreatment, spontaneous GREB1 transcription events at low rates can start long before E2 treatment at T = 0. Some of the alleles activated within the last hour before T = 0 will generate incomplete nascent mRNAs that will not be detectable at T = 0 and will only be detected by image analysis after T = 15 min or T = 30 min. Taking into account these nascent mRNAs whose transcription started before T=0 complicates the data analysis for E2 experiments with no FV pretreatment, and requires introducing a new parameter, namely the mean value A^{+} of waiting times between successive GREB1 transcription rounds before E2 treatment. As shown in [1,2], E2 treatment increases the frequency of genes transcriptions, so we should assume that A^{+} > A. Thus, fitting population level data for E2 experiments with no FV pretreatment requires the estimation of five parameters {A^{+}, A, L, MTD, VTH}.
For the unknown values of the parameters {A^{+}, A, L, MTD} natural ranges were identified from existing literature (see Methods, section MM8, MM9). For the small integer VTH, the potential range of values was evaluated by analyzing the rough number of molecules within mRNA smFISH spots detected in the images.
To actually fit the parameters of our population level models to each GREB1 experiment, we performed intensive simulations of this stochastic model to systematically explore the full discretized ranges of the 5 parameters {A^{+}, A, L, MTD, VTH}. For each combination of parameters values, and each time point T, these simulations yielded estimates for the activation frequency F_{AL}_{1 }(T) of single alleles across a large population of virtual cells. We only retained the model parameters with good fit to data, i.e., ensuring that F_{AL}_{1 }(T)  P_{AL}_{1 }(T) ≤ 3% where the probability P_{AL}_{1}(T) of single allele activation was derived as above from image analysis. We selected final parameters values by enforcing parameter stability across all 6 experiments for L,MTD,VTH. Tables 2 and 3 display the best fit parameter values for three experiments with no FV pretreatment, and for three FV pretreated experiments. Our best model parameters achieved a quality of fit ≈ 3% for each experiment, in good compatibility with the margins of error on the P_{AL}_{1 }(T) derived from image data.
Table 2: E2 experiments: Parameters estimates for three Population Level Models:
Parameter 
E2 exp. 1 
E2 exp. 2 
E2 exp. 3 
MTD = Mean Transcription Duration 
44 ± 2 min 
44 ± 2 min 
43 ± 2 min 
L = mean lifetime of nascent mRNA 
21 ± 1 min 
21 ± 1 min 
20.5 ± 1 min 
A = mean time between transcriptions (after E2 treatment) 
18 ± 1 min 
15 ± 1 min 
22 ± 1 min 
A^{+} = mean time between transcriptions (before E2 treatment) 
36 ± 2 min 
22 ± 1 min 
29 ± 1 min 
VTH = smallest # of RNA molecules to detect mRNA spots 
2 molecules 
2 molecules 
2 molecules 
Table 3: FV+E2 experiments: Parameters estimates for three Population Level Models
Parameter 
FV+E2 exp. 4 
FV+E2 exp. 5 
FV+E2 exp. 6 
MTD = Mean Transcription Duration 
44 ±1 min 
44 ± 1 min 
44 ± 2 min 
L = mean lifetime of nascent mRNA 
21 ± 1 min 
20.5 ± 1 min 
20.5 ± 1 min 
A = mean time between transcriptions (after E2 treatment) 
14.5 ± 1 min 
16 ± 1 min 
23.5 ± 1 min 
VTH = smallest # mRNA molecules to detect mRNA spots 
2 molecules 
2 molecules 
2 molecules 
These two tables exhibit good stability across all 6 experiments for the mean transcription duration MTD ≈ 44 min, the mean lifetime L ≈ 21 min of nascent mRNAs, and the minimum number VTH = 2 molecules of mRNA necessary to detect a nascent mRNA spot. The mean waiting time A between successive GREB1 transcription cycles after E2 treatment had a wider range between 15 min and 23 min among all 6 experiments. We have not identified the main factors influencing the waiting times A, but cell population heterogeneity is likely to strongly impact the variations in A observed from one experiment to the next. Indeed in (2), the authors mentioned that for their twostates model focused on GREB1 transcription observed on separate single cells, the estimated model parameters (such as our parameter A) did strongly vary from cell to cell, with relative variations up to 20%. It is also possible that A may not remain strictly constant in time during E2 induction.
3. Discussion:
Stimuluscontrolled gene transcription is one of the essential ways a cell senses and responds to environmental changes. While this process has been heavily studied in multiple models, a full understanding of how the regulation of events leading to gene transcription unfolds is constantly evolving. From numerous studies across species and models (18, 11, 14, 16, 18,19), it appears that cells respond to stimuli in a very heterogeneous manner and, even within the same nucleus, different copies of the same target gene respond asynchronously. Is this because of fully stochastic biological reactions or given the evolutionary development of regulated mechanistic steps in gene expression, is regulated allelic activation a way to finelytune individual cell responses to external causes. In our earlier study (8), we suggested cells can use an epigenetic mechanism to control the frequency of active alleles in the nucleus. Here, we focused on the same biological system, the hormone (E2) stimulated GREB1 gene (2,8,9) in MCF7 breast cancer cells to ask a few additional basic questions: 1) can we synchronize the response of individual alleles by altering transcription elongation? 2) can we determine if alleles in the same nucleus are acting independently or not? 3) can we develop a simplified model to emulate, at the cell population level, the first phases of hormonal response over time, with stability of the model parameters across independent biological replicates?
To address these questions, we compared GREB1 transcription activity in large cell populations under two types of initial conditions: 1) FV+E2 experiments where prior to E2 treatment of our cell populations at T = 0, transcription elongation was synchronized and then restarted by addition and washout of the reversible CDK9 inhibitor, flavopiridol (FV). 2) E2 experiments where at T = 0, cell populations are still in their natural random state after several hours without hormone treatment and with their transcription cycles left untouched. Our experimental data strongly indicate that “synchronizing” RNA Polymerase II at the elongation step is not sufficient to synchronize hormonal responses at the cellbycell or allelebyallele levels. Indeed, for the two types of initial cell population conditions, at the end time point (90 minutes post E2), identical values are reached by key characteristics such as the activation probability of each single allele and the joint activation frequencies for pairs or triplets of alleles.
At each time T, our detailed analysis of observed frequencies for alleles jointly activated by GREB1 nascent mRNA spots demonstrated a significant statistical dependency between pairs of activated alleles within the same nuclei. This led us to apply, at each time T, a principle of maximum entropy under constraints to compute a probability distribution F_{T} for the joint activation states of the four alleles within typical nuclei. We then used the joint probability F_{T} to compute the mutual information MutInf _{AL}_{1}_{, AL}_{2} (T) between GREB1 activations of alleles AL_{1} and AL_{2} in order to quantify the dependency between pairs of alleles. A detailed error analysis for the estimated MutInf _{AL}_{1}_{, AL}_{2} (T) showed that this mutual information had statistically significant positivity for all T ≥ 30 min, a clear indicator of moderate but significant dependency between activations for pairs of alleles. An interesting still open question is to identify biochemical factors enabling these dependencies, such as extrinsic chemical factors that can jointly affect all 4 alleles in each nucleus. Other celllinked factors affecting GREB1 transcription of all 4 alleles were also invoked in (2) to explain the high variation of transcription model parameters fitted separately to single cells.
The probability distributions F_{T} were computed at each fixed time T from population level frequencies of joint alleles activations. To emulate the time dynamics of these probabilities F_{T} across time, we introduced a &"population level&" stochastic model, where random initializations of GREB1 transcriptions are driven by a Poisson process, and are always followed by actual elongation. We were led to introduce this population level model instead of the popular twostates models used for single cell transcription data (1,2) because averaging GREB1 transcription activity across large cell populations strongly smooths out the random transcription bursts occurring independently among individual cells. Since our smFISH image acquisition modalities do not enable the monitoring across time of single cells GREB1 transcription activity, we designed our population level model to roughly emulate the superposition of several hundreds of independent twostates models of single cells gene transcription dynamics. For each one of our experiments (three FV+E2 experiments and three E2 only experiments) the parameters of our population level model were fitted to experimental data by intensive simulations exploring a very large set of combined parameter values. After this fitting of model parameters to data, the quality of fit was quite precise (less than 3% error on emulated frequencies of GREB1 activations), and across all 6 experiments we achieved good stability for the estimates of the three main parameters, namely the mean elongation duration MTD ≈ 44 min, the mean lifetime L ≈ 21min of nascent mRNAs, and the number VTH = 2 of mRNA molecules necessary for reliable detection of a nascent mRNA spot. The estimated mean waiting time A between successive GREB1 transcription after E2 treatment had a wider range (15min to 23min) among our 6 experiments. This variation is quite compatible with the 20% variations range reported in (2) for the parameters of twostates models fitted separately to single cells.
We expect our innovative modeling approach for hormoneregulated target gene activity observed at population level to be applicable for many other genes and stimuli, a point we intend to validate through further experiments. An interesting and open challenge is to concretely identify the main celldependent factors simultaneously impacting transcriptional responses at individual alleles within the cell nucleus.
4. Materials and methods
4.1 Cell culture, materials and treatments: MCF7 cells were obtained from BCM Cell Culture Core, which routinely validates their identity by genotyping; cultures are constantly tested for mycoplasma contamination as determined by DAPI staining. MCF7 were maintained in MEM plus 10%FBS media, as recommended by ATCC, except phenol red free and kept in culture for less than 60 days before thawing a fresh vial. Three days prior to experiments, cells were plated on round polyLlysine coated coverslips in media containing 5% charcoaldextran stripped and dialyzed FBScontaining media. Treatments with 17βestradiol (E2, Sigma) were performed as in [8]. For flavopiridol (FV+E2) experiments, cells were treated with FV 1µM for 2 hours, then removed and cells were washed 3x with media prior to E2 1nM treatment for the indicated times.
4.2 Single molecule RNA FISH (smFISH): GREB1 smFISH was performed as described in detailed protocols [8, 22]. Briefly, cells were fixed with 4% paraformaldehyde in PBS, on ice for 20 min. After a PBS wash, cells were left in 70% ethanol for a minimum of 4 hours prior to hybridization (o/n, 37C) with the previously validated GREB1 probe sets (LGC Biosearch Technologies) covering introns (Atto647N) and exons (Quasar570) of the GREB1 gene.
4.3 Imaging: High resolution imaging for smFISH was performed on a Cytivia DVLive epifluorescence image restoration microscope with an Olympus PlanApo 60×/42NA objective and a 1.9k × 1.9k sCMOS camera. Z stacks (0.25 µm) covering the whole nucleus (∼10 µm) were acquired before applying a conservative restorative algorithm for quantitative image deconvolution. Ten or more random fields of view (FOVs) were acquired for each time point.
4.4 Image analysis: Each FOV has three fluorescence channels (DAPI, Q570 (exons) and A647N (introns)) in a 3Dimage of size ≃ 1780 x 1780 x 25. Each 3D image channel was projected on its maximum intensity horizontal layer and then analyzed as a 2D image. In the DAPI channel, we first detect and identify cell nuclei. The main steps are: contrast thresholding, connected components detection, elimination of holes, and size filtering. After dilating the detected nuclear mask, we estimate cytoplasm boundaries by the &"watershed&" segmentation algorithm.
Classical image segmentation techniques are applied in the two other channels to separately detect exonic and intronic spots. Contrast analysis is implemented by OTSU thresholding [27] for the exonic channel (Q570), and by maxentropy thresholding [28] for the intronic channel (A647N). We tested moderate variations of these thresholds to define accuracy margins for these two key spots detections. Nascent mRNAs spots are detected within each nucleus by identifying all pairs (exonic spot + intronic spot) having nonempty intersection. At each time point, the number of active alleles detected per nucleus ranges from 0 to 4 since each cell has 4 GREB1 alleles [29]. Segmentation errors due to local cell packing or nuclei overlaps may occasionally generate spurious detections of more than 4 activation spots in a very small percentage of nuclei, which are then automatically discarded. Mature RNAs are identified as exonic spots (Q570) located within the cytoplasmic mask.
4.5 Probability distribution of joint alleles activations at fixed time T:
A key modeling step was, at each fixed time T, to estimate the probabilities of joint alleles activation for pairs, triplets, quadruplets of alleles within a cell. We could only aim to estimate averages over pop(T) for each one of these joint probabilities, since in our experimental setup, distinct alleles are not identifiable. At time T, in any given cell, each allele AL_{j} (j = 1, 2, 3, 4) can either be active (ON = state &"1&"), or not (OFF = state &"0&"). The joint state of the four alleles AL_{1}, AL_{2}, AL_{3}, AL_{4} is then described by a fourdigit binary code, with 2^{4} = 16 possible joint states labeled S_{0 } = 0000, S_{1 } = 0001, S_{2} = 0010, S_{3 }= 0011, S_{4 }= 0100, …, S_{14 } = 1110, S_{15 }= 1111
At time T, the cell population pop(T) contains N = N (T) nuclei, denoted NUC_{1},…, NUC_{N}. For each nucleus NUC_{N}, the current joint state for the four alleles will be equal to S_{k} , with some unknown probability prob_{n}(S_{k}). The 16 probabilities prob_{n} (S_{0}), prob_{n} (S_{1}) ,…, prob_{n} (S_{15}) add up to 1, and depend on the time point T. But due to biochemical heterogeneity of the cells in the population at time T, the prob_{n} (S_{k}) may also depend on unknown biochemical factors specific to nucleus NUCn. Since the observed frequencies Q_{k} (T) of nuclei exhibiting k activated alleles at time T are computed across the whole of pop(T), they only provide reliable information on the average F_{T }(S_{k}) of the probabilities prob_{n} (S_{k}) over all nuclei indices n = 1…N. More precisely, for each possible joint state S_{k} _{ }of the four alleles, we want to estimate the average probability.
In a perfectly homogeneous cell population pop(T), the probabilities prob_{n} (S_{k}) would not depend on the nucleus NUCn at all and would hence also be equal to the average probability F_{T} (S_{k}). This ideal situation is blurred by the significant biological diversity of GREB1 transcriptional response from cell to cell, a point well documented in (2).
The frequencies F_{T} (S_{k}) are not directly observable, since smFISH images do not enable specific matching of alleles AL_{1}, AL_{2}, AL_{3}, AL_{4} from cell to cell. But as mathematically detailed in Methods and in Suppl. Materials 1 the observed activation frequencies Q_{k }(T) are linked to the unknown frequencies:
F_{T} (S_{0}) = F_{T} (0000), F_{T} (S_{1}) = F_{T} (0001) , …, F_{T} (S_{15}) = F_{T} (1111) by the following five linear relations:
Equation 1
Q_{0} (T) = F_{T }(0000)
Q_{1 }(T) = F_{T} (1000) + F_{T }(0100) + F_{T} (0010) + F_{T} (0001)
Q_{2} (T) = F_{T} (1100) + F_{T }(1010) + F_{T} (1001) + F_{T} (0110) + F_{T} (0101) + F_{T }(0011)
Q_{3} (T) = F_{T} (1110) + F_{T }(0111) + F_{T} (1011) + F_{T} (1101)
Q_{4 }(T) = F_{T} (1111)
Since these five linear relations cannot determine the 16 unknown probabilities F_{T }(S_{k}), we first checked if we could assume independence of alleles activation. Under the probability F_{T} of joint alleles activations, denote f_{j }(T) the probability that the specific allele AL_{j} is activated. Assume temporarily that under F_{T} the activations of each allele AL_{1}, AL_{2}, AL_{3}, AL_{4} are statistically independent of each other (i.e. there are no mechanisms through which alleles interfere / influence each other’s activations). Independence implies that each unknown joint probability F_{T} (S_{k}) can be expressed directly in terms of f_{1 }(T), f_{2 }(T), f_{3 }(T), f_{4 }(T) by simple product formulas such as
F_{T}(0100) = (1  f_{1 }(T)) f_{2 }(T) (1  f_{3}(T)) f_{4}(T), F_{T} (1110) = f_{1}(T) f_{2}(T) f_{3}(T) (1  f_{4} (T)), …, etc.
Combining these product formulas with equation 1 we have formally proved (see Suppl. Materials 2) that statistical independence of single allele activations under the joint probability F_{T} forces the following polynomial equation of degree 4
Q_{0 }(T) z^{4}  Q_{1 }(T) z^{3} + Q_{2} (T) z^{2}  Q_{3 }(T) z + Q_{4 }(T) = 0 Equation 2
to have four positive and real valued solutions z_{1},z_{2},z_{3},z_{4}, We also showed that f_{1}(T), f_{2}(T), f_{3}(T), f_{4}(T) are then given by f_{j}(T) = z_{j} / (1 + z_{j })for j = 1,2,3,4.
Requiring a polynomial of degree 4 to have four positive and real valued roots z_{1}, z_{2}, z_{3}, z_{4} imposes very restrictive polynomial constraints on the polynomial coefficients Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T).
In Suppl. Materials 2, we give examples of these polynomial constraints. Our experiments showed very clearly that these constraints are never satisfied by the observed Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T), and hence that, at the level of cell populations averages, one had to reject the hypothesis of statistical independence between activations of distinct alleles.
Maximum entropy model and dependency between alleles activations: Because full independence of the four alleles is not compatible with our experimental data, we computed, for each T, the joint probability F_{T} which minimizes dependency between alleles activation, and is still compatible with the observed Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T) via the linear relations imposed by Equation 1. For probability distributions verifying a set of linear constraints, a fairly generic principle is that minimizing dependencies is roughly equivalent to maximizing entropy. Recall that for any probability F = [F_{0},…,F_{15}] on the set S = [S_{0},…,S_{15}] of joint alleles activation, the entropy Ent(F) ≥ 0 of F is given by Ent (F) =  F_{0 }log (F_{0})  F_{1 }log (F_{1})  … F_{15 }log (F_{15}).
In Suppl. Materials 3, we apply this principle to compute the unique probability of joint alleles activation frequencies, denoted F_{T} = [F_{T}(S_{0}),…, F_{T}(S_{15})] which has maximum entropy among all probabilities compatible with the observed frequencies Q_{0}(T), Q_{1}(T), Q_{2}(T), Q_{3}(T), Q_{4}(T) via the five linear relations of Equation 1. Our formulas show that the maxiTmum entropy joint probability F_{T} must have full symmetry, meaning that permutations of the alleles AL_{1}, AL_{2}, AL_{3}, AL_{4} do not change the frequencies of their joint activation states. Indeed, we have the explicit formulas
F_{T}(0000) = Q_{0} (T)
F_{T}(1000) = F_{T}(0100) = F_{T}(0010) = F_{T}(0001) = Q_{1} (T) / 4
F_{T}(1100) = F_{T}(1010) = F_{T}(1001) = F_{T}(0110) = F_{T}(0101) = F_{T}(0011) = Q_{2} (T) / 6
F_{T}(1110) = F_{T}(0111) = F_{T}(1011) = F_{T}(1101) Q_{3} (T) / 4
F_{T} (1111) = Q_{4} (T)
Since in our smFISH experimental images it is not possible to match specific alleles from cell to cell, full symmetry for the average probability F_{T} of joint alleles activations is a natural feature for compatibility with our image data.
Fix any single allele AL_{1}. Due to the full symmetry of F_{T}, the frequency
P_{AL}_{1} (T) = F_{T}{allele AL_{1} is active at time T}
takes the same value for all four alleles, namely (see Suppl. Materials 3),
P_{AL}_{1} (T) = Q_{1}(T)/4+ Q_{2 }(T) /2 + 3Q_{3 }(T) /4 + Q_{4} (T) Equation 3
Consider now two alleles AL_{1} and AL_{2} in the same nucleus. At time T, the (random) state of AL_{1} is either &"active&" or &"inactive&", which we denote by AL_{1} = 1 or AL_{1} = 0. Same remark for the random state of AL_{2}. The mutual information MutInf_{AL}_{1}_{,}_{AL}_{2}(T) between the binary valued random states of AL_{1} and AL_{2} is given by the generic formula MutInf _{AL}_{1}_{,}_{AL}_{2 }(T) = MutInf _{AL}_{2,}_{ AL}_{1 }(T) = Ent(AL_{1}) + Ent(AL_{2})  Ent(AL_{1},AL_{2}) ≥ 0
where &"int&" denotes the entropy of a probability distribution. Higher values of MutInf _{AL}_{1}_{,AL}_{2} (T) indicate higher dependency between the random activation states of AL_{1}, AL_{2} under the probability F_{T}. The maximum possible value of MutInf _{AL}_{1}_{,AL}_{2} (T) is 0.69, which can only be reached if there is a fully deterministic relation between the random activations of alleles AL_{1} and AL_{2}. But MutInf _{AL}_{1}_{,AL}_{2} (T) values higher than 0.02 already indicate some significant level of dependency between and . Conversely, values of MutInf _{AL}_{1}_{,AL}_{2} (T) extremely close to 0 reflect near independence between the activation states of AL_{1} and AL_{2}. At time T, the pair (AL_{1}, AL_{2}) has 4 possible joint states (00), (01)(10), (11). Their frequencies f_{T}00, f_{T}01, f_{T}10, f_{T}11, are easily derived from the explicit expression (see equation 3) of the probability F_{T}, which yields
The probability P_{AL}_{1},_{AL}_{2}(T) = f_{T}11 that AL_{1} and AL_{2} are simultaneously active at time T is hence given by
Then the joint entropy Ent(AL_{1}, AL_{2}) is given by the following Equation 6
Ent(AL_{1}, AL_{2}) =  f_{T}00log(f_{T}00)  f_{T}01log(f_{T}01)  f_{T}10log(f_{T}10)  f_{T}11log(f_{T}11)
By definition of entropy, one has also Equation 7:
Ent(AL_{1}) = Ent(AL_{2}) =  P_{AL}_{1} (T) log(P_{AL}_{1} (T))  (1  P_{AL}_{1} (T)) log(1  P_{AL}_{1} (T))
The mutual information MutInf _{AL}_{1,AL2 }(T) between the random activations of and can then be computed by
MutInf _{AL}_{1,AL2 }(T) = 2 Ent(AL1)  Ent (AL_{1},AL_{2}) Equation 8
The equations 4,5,6,7,8 clearly express MutInf _{AL}_{1}_{, AL}_{2} (T) in terms of the 5 observed activations frequencies Q_{0}(0), Q_{1}(0), Q_{2}(0), Q_{3}(0), Q_{4}(0). Due to the full symmetry of joint probability F_{T}, the mutual information MutInf _{AL}_{1}_{, AL}_{2} (T) will be the same for all pairs of alleles (AL_{i}, AL_{j}) and this common value quantifies the average amount of activation dependency between pairs of alleles at time T.
4.6 Modeling the time course of GREB1 transcription frequencies observed at population level:
Since genes transcriptions are strongly decorrelated from cell to cell, the random transcription bursts occurring among (for instance) the 400 cells of population pop(T) will be highly desynchronized. Hence, averaging the random bursts that actual occur at fixed time T will clearly smooth out the impact of random bursts on the allele activation frequencies Q_{k}(T), observed across pop(T). We have validated this point by simulations of 400 twostates stochastic models of GREB1 transcription in single cells, and averaging at each time T the nascent mRNA outputs of these independent 400 models. As expected, in population averaged transcription activity, short transcription bursts were essentially no longer identifiable. So, to emulate the time course of our population averaged GREB1 transcription data, we have introduced a population level stochastic model.
In this simplified model successive GREB1 transcriptions are initialized at random times t_{1} < t_{2 }< … < t_{n}. Each initialized transcription launches the elongation of a GREB1 mRNA molecule at fixed linear speed and is completed after a fixed Mean Transcription Duration MTD which for GREB1 is ≈ 44 min. The time intervals (t_{k+1 } t_{k }are random and assumed to be independent and to have the same exponential density with unknown mean value A. The successive occurrences t_{k} of transcription initializations define then a Poisson stochastic process. The complete nascent mRNA &"nas_{k}&" generated by GREB1 transcription started at time t_{k} will then become fully visible at time (t_{k} + MTD). Both exonic and intronic parts of nas_{k}, denoted exonas_{k} and intnas_{k}, are naturally assumed to have exponential decay. The mean halflife of exonas_{k} is about 3 hrs, and hence does not impact the visibility of nas_{k} during the timecourse 90min of E2 treatment. However, the mean lifetime of intronic intnas_{k} is known to be shorter than ≈ 35 min, and hence directly affects the first visibility time of nas_{k} . The random lifetime of intnas_{k} from completion of to nearly full decay of intnas_{k} is assumed to have an exponential density with unknown fixed mean {Our population level stochastic model thus has only 3 key parameters {A, L, MTD}. But to take into account the limits imposed by image resolution, we introduce another integer valued parameter, the unknown Visibility Threshold VTH such that nascent mRNA spots are detectable only if they contain at least VTH molecules (after simulations outlined below, we obtained the estimate VTH = 2).
Our population level model is easy to simulate, and we have fitted its 4 parameters {A^{+}, A, L, MTD, VTH} to each FV+E2 experiment by intensive simulations as outlined below. For E2 experiments with no FV pretreatments, we must take account of actual GREB1 transcription activity occurring at low frequency in our cell populations during the last hour before the time T= 0 of E2 treatment. This requires the introduction of another parameter, namely the mean waiting time A^{+} between successive transcription initiations occurring before time T = 0.
4.7 Simulations and model fitting:
To fit our stochastic population level model to experimental data we performed intensive simulations to select the parameters {A^{+}, A, L, MTD, VTH} providing the best quality of fit to our smFISH image data. These parameters were constrained to have naturally predefined ranges:
 mean transcription duration MTD: 40min < MTD < 50 min since RNA Polymerase II speed and GREB1 length ≈110 kb
 mean lifetime L of nascent mRNAs: 5min < L < 35 min, based on actinomycin D pulsechase experiments
 mean waiting time A between rounds of transcription after T = 0: 5min < A < 35 min, based upon the on/off time ranges evaluated in (2)
 mean waiting time A^{+} > A between rounds of transcription before = 0: A^{+} < 60 min, based upon analysis of initial activations frequencies Q_{0}(0), Q_{1}(0), Q_{2}(0), Q_{3}(0), Q_{4}(0). Note: A^{+} is used only for experiments with no FV pretreatment.
The minimum number VTH of molecules needed for reliable detection of nascent mRNA spots had to be crudely precalibrated by image analysis. As detailed in section MM9 below, and similarly to (2), we calculated the integrated intensities of mature, cytoplasmic mRNA spots to roughly evaluate the number of molecules per detected nascent mRNA spot. This yielded a rough preliminary range 1 ≤ VTH ≤ 10 molecules per detectable spot.
These parameters ranges were discretized into finite grids with accuracies of 0.5 min to 1 min for all time variables. This gave us a multigrid of roughly 10^{6} possible parameter vectors PAR = {A^{+}, A, L, MTD, VTH}. For each potential vector PAR, we performed a first set of 1000 simulations of our population level model. Each such simulation outputs a random number NAS(T) of nascent mRNAs present at time T on a single virtual allele AL_{1}. Among the 1000 simulated NAS (T), we compute the percentage F_{AL}_{1 }(T) of integers NAS (T) which are larger than the visualization threshold VTH. Then F_{AL}_{1 }(T) is the model generated frequency of detectable single allele activations. For each experiment and each vector PAR we then evaluate the quality of fit between model and data by the distance dist (model, data) = max_{over all T }F_{AL}_{1 }(T)  P_{AL}_{1 }(T) 
For the three FV+E2 experiments, the early values P_{AL}_{1}(0), P_{AL}_{1}(15min), P_{AL}_{1 }(30min) were practically 0 up to errors of estimations (0.03), and the simulated F_{AL}_{1}(0), F_{AL}_{1}(15min), F_{AL}_{1 }(30min) were identically zero since MTD was known to be of the order of 44 min. Therefore, for FV+E2 experiments, the distance between model and data was actually replaced by dist (model, data) = max_{over all T ≥ 45min }F_{AL}_{1}(T)  P_{AL}_{1}(T) 
For each experiment, the best choices for the model parameters vector PAR are obtained by optimizing the quality of fit, i.e. by minimizing the distance dist (model, data).
We implemented the stochastic simulations of our population level model by the following algorithm. We first generate the times t_{k} by standard simulation of the sequence of independent random waiting times (t_{k}_{+1 } t_{k}) having the same exponential density with mean A. Elongation of the nascent mRNA nas_{k} begins at t_{k} and is completed at time (t_{k} + MTD).
The random lifetime U_{k} of nas_{k} is provided by a separately simulated sequence of independent random lifetimes U_{k} having the same exponential density with mean L. Then at time T, the number of present at time T is determined by the number of such that t_{k} + MTD < T < t_{k} + MTD + U_{k}. This simulation algorithm is naturally faster than the Gillespie algorithm used for more complex stochastic models. Our Python simulation code is accessible in the publicly available software package on GitHub (https://github.com/smahmoodghasemi/BCM). This &"brute force&" approach to model fitting required intensive computing and was implemented on the &"Sabine&" multicore computing center at University of Houston. Once the simulations have been completed for models, this large set of simulations outputs can be reused as a fixed massive lookup table for all our past or future experiments. After these simulations were successfully completed, we also outlined a more efficient computing approach which could be used in similar explorations for other genes. Namely, one can implement a multiscale exploration starting with a cruder grid of parameters vectors, and then focus on finer mesh grids tightly centered around promising first level estimates.
4.8 Estimation of number of molecules within detected nascent mRNA spots:
For each nascent mRNA spot &"nas&" detected within the nuclei present in an image J, we compute the integrated exonic intensity EXO (nas) as the sum of image intensities EXO(x) over all exonic pixels x of &"nas&". For each mature mRNA spot &"mat&" detected within the cytoplasm of all cells present in image J, we also compute the integrated exonic intensity EXO(mat) as the sum of image intensities EXO(x) over all pixels x of &"mat&". We then compute the median MED of these integrated intensities EXO(mat) over all mature RNA spots detected in image J.
In the spirit of an approach explored in [2], the number MOL(nas) of RNA molecules present within any detected nascent mRNA spot &"nas&" is crudely estimated by the ratio MOL(nas) = EXO (nas) / MED. This can only be a very rough calibration of MOL(nas) since the observed EXO(mat) values have a high dispersion around their median MED, even at fixed time T. Nevertheless, to evaluate reasonable ranges for our model parameter VTH (Visualization Threshold) on any given image J, we have computed the low quantiles for the histogram of all MOL(nas) values extracted by computer analysis of image J. These low quantiles identified a potential range of 2 to 5 for the minimum number VTH of RNA molecules necessary to detect a nascent mRNA spot in our images. Since the error margins for these estimates were likely to be high, we simply assigned a much wider potential range of 1 to 10 for the unknown parameter VTH. After fitting of our population level to experimental data, our results reported above showed that the best estimate of VTH was always VTH = 2.
4.9.1 Estimation errors for frequencies Q_{k}(T) and probabilities P_{AL}_{1}(T), P_{AL}_{1}_{,AL}_{2}(T) :
Tables given above and Table 6 in Supplemental Materials display the computed estimation errors for P_{AL}_{1}(T), dep(T),MutInf_{AL}_{1}_{,AL}_{2}_{(T),} P_{AL}_{1},_{AL}_{2}(T). Here we outline how these errors are computed. Let N(T) = # cells in population pop(T). Use shorter notations Q_{k} for Q_{k}(T), P_{AL}_{1}, for P_{AL}_{1} (T), P_{AL}_{1,}_{AL}_{2} for P_{AL}_{1}_{,AL}_{2}(T).
Let ΔQ_{k}, ΔP_{AL}_{1}, ΔP_{AL}_{1}, _{AL}_{2} be the corresponding random errors of estimation. The 5x5 covariance matrix cov_{Q} of the 5 errors [ΔQ_{0,} ΔQ_{1},…, ΔQ_{4}] is classically given by (i, j)
cov_{Q} (i, j) = cov ([ΔQ_{i}, ΔQ_{j }) =  Q_{i},Q_{j} / N (T) for all i ≠ j
cov_{Q} (i, i) = var(ΔQ_{i}) = Q_{i } (1  Q_{i }) / N(T)
Denote Q the column vector [Q_{0}; Q_{1 }; …; Q_{4}] and for any matrix H denote H^{tr} the transpose of H. Due to Equations 3 and 5, the formulas for P_{AL}_{1} and P_{AL}_{1} _{AL}_{2} can be rewritten in matrix form as
P_{AL}_{1 = }u * Q and P_{AL}_{1} _{AL}_{2 = v * Q} Equation 9
Where u = [0,1/4,1/2,3/4,1] and v = [0,0,1/6,1/2,1]
Known statistical formulas then give the variances of ΔP_{AL}_{1} and ΔP_{AL}_{1}, _{AL}_{2} as var(ΔP_{AL1}) = u * cov_{Q} * u^{tr}
and var(ΔP_{AL}_{1}, _{AL2}) = v * cov_{Q} * v^{tr}
4.9.2 Estimation errors for the dependency ratio dep(T) = P_{ AL}_{1,}_{ AL}_{2} / P _{AL}_{1}× P_{ AL}_{2 }
Due to Equation 9 we have dep(T) = v* Q/(u * Q)^{2} which is a nonlinear function K (Q) of Q. The variance of the random error Δdep(T) in the estimation of dep(T) is then classically given by var[Δdep(T)] = w * covQ * w^{tr} where w is the gradient of K (Q) with respect to Q. This gradient is given by w = (1/u * Q)^{2} * v  2[v * Q/(u * Q)^{3}]* u which completes the computation of var [Δdep(T)].
4.9.3 Estimation errors for the mutual information MutInf _{AL}_{1}_{, AL}_{2 }_{(T)}
At any given time T, the pair of alleles (AL_{1}, AL_{2}) can be in one of their four joint activation states (00), (01), (10), (11) with corresponding joint probabilities given above by Equation 4. These formulas can be rewritten in matrix form as
f_{T}00 = V00 * Q, f_{T}01 = V01 * Q, f_{T }10 = V10 * Q, f_{T}11 = V11 * Q
where the line vectors are given by
V00 = [1,1/2,1/6,0,0], V01 = V10 = [0,1/4,1/3, 1/4, 0], V11 = [0,0,1/6,1/2,1],
The entropies E = Ent (AL_{1}) = Ent (AL_{2}) and E_{12} = Ent (AL_{1}, AL_{2}) are given by
E =  P_{AL}_{1} (T) log (P_{AL}_{1} (T))  (1  P_{AL}_{1} (T)) log (1  P_{AL}_{1} (T))
E_{12 } =  f_{T }00 log (f_{T}00)  f_{T }01 log (f_{T}01)  f_{T }10 log(f_{T}10)  f_{T }11 log (f_{T}11)
which can be rewritten as
E =  (u * Q) log (u * Q)  (1  u * Q) log ((1  u * Q)
E_{12 }=  [(V11 * Q) log (V11 * Q) + (V00 * Q) log (V00* Q) + 2(V10 * Q) log (V10 * Q)]
The information M = MutInf _{AL}_{1}_{ AL}_{2 }_{(T)} is then given by M = 2E  E_{12} which is a non linear function M = G (Q). The random estimation error ΔM on M has variance var (ΔM) which can be computed as above using the gradient grad (G) of the function G (Q) with respect to Q. We have the classical formula
var (ΔM) = grad (G) * COV_{Q} * grad (G)^{tr}
Since grad (G) = 2grad (E)  grad (E_{12}), we compute the gradients of and from the preceding formulas to get
grad (E) = [ log (u * Q) + log (1  u * Q) ] * u
grad (E12) =  [1 + log(V11 * Q)] * V11  [1+ log(V00 * Q)] * V00  2 [1+ log(V10 * Q)] * V10
This clearly completes the computation of var(ΔM).
Acknowledgments
Imaging for this project was supported by the Integrated Microscopy Core at Baylor College of Medicine and the Center for Advanced Microscopy and Image Informatics (CAMII) with funding from NIH (DK56338, CA125123, ES030285), and CPRIT (RP150578, RP170719), the Dan L. Duncan Comprehensive Cancer Center, and the John S. Dunn Gulf Coast Consortium for Chemical Genomics. Modeling and computing research for this project at University of Houston were supported by CAMII and Baylor College of Medicine with funding from CPRIT (RP170719). Intensive Computer Simulations for this project were implemented at the SABINE Cluster of RCDC (Research Computing Data Core, University of Houston) under a CPUGPU computing time allocation to the University of Houston Department of Mathematics.
References
 Rodriguez J, Ren G, Day CR, et al. Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell 176 (2019): 213226.
 Fritzsch C, Baumgärtner S, Kuban M, et al. Estrogendependent control and celltocell variability of transcriptional bursting. Mol Syst Biol 14 (2018): 7678.
 Raj A, Peskin CS, Tranchina D, et al. Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4 (2006): 17071719.
 Sepúlveda LA, Xu H, Zhang J, et al. Measurement of gene regulation in individual cells reveals rapid switching between promoter states. Sci 351 (2016): 12181222.
 Bohrer CH, Larson DR. The Stochastic Genome and Its Role in Gene Expression. Cold Spring Harb Perspect Biol 13 (2021): 040386.
 Bahar Halpern K, Tanami S, Landen S, et al. Bursty gene expression in the intact mammalian liver. Mol Cell 58 (2015): 147156.
 Battich N, Stoeger T, Pelkmans L. Control of Transcript Variability in Single Mammalian Cells. Cell 163 (2015): 1596610.
 Stossi F, Dandekar RD, Mancini MG, et al. Estrogeninduced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Res 48 (2020): 18001810.
 Nwachukwu JC, Srinivasan S, Zheng Y, et al. Predictive features of ligandspecific signaling through the estrogen receptor. Mol Syst Biol 12 (2014): 864.
 Vera M, Tutucci E, Singer RH. Imaging Single mRNA Molecules in Mammalian Cells Using an Optimized MS2MCP System. Methods Mol Biol 2038 (2019): 320.
 Hocine S, Raymond P, Zenklusen D, et al. Singlemolecule analysis of gene expression using twocolor RNA labeling in live yeast. Nat Methods 10 (2013): 119121.
 Schmidt A, Gao G, Little SR, et al. Following the messenger: Recent innovations in live cell single molecule fluorescence imaging. Wiley Interdiscip Rev RNA 11 (2020): 1587.
 Paulsson J. Prime movers of noisy gene expression. Nat Genet 37 (2005): 925936.
 Suter DM, Molina N, Naef F, et al. Origins and consequences of transcriptional discontinuity. Curr Opin Cell Biol 23 (2011): 657662.
 Harper C v, Finkenstädt B, Woodcock DJ, et al. Dynamic analysis of stochastic transcription cycles. PLoS Biol (2011): 9(4).
 Dar RD, Razooky BS, Singh A, et al. Transcriptional burst frequency and burst size are equally modulated across the human genome. Proc Natl Acad Sci U S A 109 (2012): 1745417459.
 Larson DR, Fritzsch C, Sun L, et al. Direct observation of frequency modulated transcription in single cells using light activation. Elife 2 (2013): 00750.
 Zoller B, Nicolas D, Molina N, et al. Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol Syst Biol 11(2015): 823.
 Chubb JR, Trcek T, Shenoy SM, et al. Transcriptional pulsing of a developmental gene. Curr Biol 16 (2006): 10181025.
 Ball AD, Adames NR, Reischmann N, et al. Measurement and modeling of transcriptional noise in the cell cycle regulatory network. Cell Cycle 12 (2019): 118.
 Jonkers I, Kwak H, Lis JT. Genomewide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons. Elife 3 (2014): 024027.
 Mistry RM, Singh PK, Mancini MG, et al. Single Cell Analysis Of Transcriptionally Active Alleles By Single Molecule FISH. J Vis Exp 163 (2020): 115.
 Ardehali MB, Lis JT. Tracking rates of transcription and splicing in vivo. Nat Struct Mol Biol 16 (2009): 11231134.
 Danko CG, Hah N, Luo X, et al. Signaling pathways differentially affect RNA polymerase II initiation, pausing, and elongation rate in cells. Mol Cell 50 (2013): 212222.
 Guyon X. Random fields on a network: modeling, statistics, and applications. Springer (1995).
 Chalmond B. Modeling and Inverse Problems in Imaging Analysis. Springer New York 155 (2003).
 Otsu N. A Threshold Selection Method from GrayLevel Histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1979): 6266.
 Wong AKC, Sahoo PK. A graylevel threshold selection method based on maximum entropy principle. IEEE Transactions on Systems, Man, and Cybernetics 19 (1989): 866871.
 Kocanova S, Kerr EA, Rafique S, et al. Activation of estrogen responsive genes does not require their nuclear colocalization. PLoS Genet 6 (2010): 1000922.
E2 experiment #1
T in minutes 
T = 0 
T = 15 
T = 30 
T = 45 
T = 60 
T = 75 
T = 90 
77 
67.3 
53.5 
39.6 
30 
26 
22 

15 
17.8 
19.8 
19.8 
18 
17 
16 

5 
9.9 
15.8 
19.8 
21 
19 
18 

2 
4 
7.9 
13.9 
20 
22 
24 

1 
1 
3 
6.9 
11 
16 
20 
E2 experiment #2
T in minutes 
T = 0 
T = 15 
T = 30 
T = 45 
T = 60 
T = 75 
T = 90 
45.5 
37.6 
31 
26 
19.8 
14 
11.9 

25.7 
23.8 
20 
15 
13.9 
13 
9.9 

14.9 
18.8 
20 
19 
19.8 
20 
18.8 

9.9 
13.9 
18 
21 
23.8 
29 
30.7 

4 
5.9 
11 
19 
22.8 
24 
28.7 
FV+E2 experiment #4
T in minutes 
T = 0 
T = 15 
T = 30 
T = 45 
T = 60 
T = 75 
T = 90 
91 
85.9 
78 
59 
32.3 
18 
13.9 

6 
10.1 
13 
15 
15.2 
13 
9.9 

3 
4 
6 
10 
16.2 
18 
16.8 

0 
0 
2 
10 
20.2 
27 
29.7 

0 
0 
1 
6 
16.2 
24 
29.7 
FV+E2 experiment #5
T in minutes 
T = 0 
T = 15 
T = 30 
T = 45 
T = 60 
T = 75 
T = 90 
90.1 
88 
84.7 
71.7 
41.6 
23 
23.8 

7.9 
10 
12.2 
12.1 
13.9 
14 
11.9 

2 
2 
2 
6.1 
14.9 
19 
15.8 

0 
0 
1 
6.1 
15.8 
22 
23.8 

0 
0 
0 
4 
13.9 
22 
24.8 