All data in Cyclebase are normalized in the same manner.
Expression values: The expression values were first log2 transformed. To center each profile at zero, its mean expression value over all time points was subtracted. To make the y-axes comparable between experiments, the values were then scaled so that the standard deviation across the entire experiment is one.
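The three steps above can be sketched as follows. This is a minimal illustration, not the Cyclebase code: the function name and the gene-by-time-point array layout are assumptions, and "standard deviation across the entire experiment" is read here as a single standard deviation computed over all centered values of the experiment.

```python
import numpy as np

def normalize_expression(raw, already_log2=False):
    """Normalize one experiment (hypothetical helper).

    raw: 2-D array, rows = genes, columns = time points.
    """
    values = np.asarray(raw, dtype=float)
    # Step 1: log2 transform (skipped if the data are already log-ratios).
    log_values = values if already_log2 else np.log2(values)
    # Step 2: center each gene's profile at zero over its time points.
    centered = log_values - log_values.mean(axis=1, keepdims=True)
    # Step 3: scale so the standard deviation over the whole experiment is one
    # (assumption: one global standard deviation, not one per gene).
    return centered / centered.std()

example = normalize_expression([[1.0, 2.0, 4.0], [8.0, 8.0, 8.0], [2.0, 1.0, 2.0]])
```

After this step every profile has mean zero, and the experiment as a whole has unit standard deviation.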
Time-scale: The original time-scale, in minutes from the start of the experiment, was first normalized by the interdivision time (the time it takes to complete one cell cycle). This gives a time-scale in percent of a cell cycle. Since different experimental methods release cells from different points in the cycle, each experiment was then shifted so that zero always corresponds to the time of cytokinesis (M/G1 transition).
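As an illustration of the time-scale conversion, the sketch below assumes that normalization means dividing by the interdivision time and that the shift is given as the time of cytokinesis in minutes from the experiment start. The function and parameter names are hypothetical.

```python
def to_cell_cycle_percent(minutes, interdivision_time, cytokinesis_minute):
    """Convert sampling times to percent of a cell cycle, with 0 at cytokinesis.

    minutes: sampling times in minutes from the experiment start.
    interdivision_time: duration of one cell cycle in minutes.
    cytokinesis_minute: minute at which M/G1 occurs (hypothetical parameter).
    """
    return [100.0 * (t - cytokinesis_minute) / interdivision_time for t in minutes]

# A 60-minute cell cycle with cytokinesis 30 minutes after release.
percent_times = to_cell_cycle_percent([0, 30, 60, 120], 60, 30)
```

Values are not wrapped to the 0 to 100 range here, so a time course covering several cycles keeps its time points in order.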
The rank orders each gene of an organism by a score we have assigned based on its pattern of expression and magnitude of regulation. Those genes with the highest periodicity and that are most regulated are given the best ranks (lowest number).
The rank combines the P-value for regulation and the P-value for periodicity. First, a P-total value is calculated by multiplying P(per) by P(reg). Because one very small P-value can dominate this product, P-total alone could favor a gene that satisfies only one of the two criteria; the combined score therefore penalizes genes that are not both regulated and periodic:
This combined score is sorted and the genes are given their rank based on this order.
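The sorting step can be sketched as below. The combined score itself is taken as given; the sketch assumes that a smaller combined score is better, in line with the best genes receiving the lowest rank numbers. The function name and gene identifiers are hypothetical.

```python
def assign_ranks(combined_scores):
    """Map each gene to its rank (1 = best).

    combined_scores: dict of gene identifier -> combined score.
    Assumption: smaller scores are better, as for P-values.
    """
    ordered = sorted(combined_scores, key=combined_scores.get)
    return {gene: position for position, gene in enumerate(ordered, start=1)}

ranks = assign_ranks({"geneA": 1e-3, "geneB": 1e-9, "geneC": 0.5})
```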
To calculate the total rank for a single gene across all available experiments, the total P-value for regulation and total P-value for periodicity are used (in the combined score equation) instead of just the single experiment P-values.
The P-value for periodicity is the chance of observing as great a periodicity by random shuffling of the individual time-point values of the expression profile. A small P(per) value therefore implies a highly periodic pattern of expression.
In order to calculate the P-value for periodicity, a Fourier score was obtained for each profile. This Fourier score is defined as:
Next, 1,000,000 artificial profiles were generated from random shuffling of the data within the original profile. The fraction of random profiles whose Fourier scores were greater than or equal to the gene's real Fourier score was then normalized to create the final P-value for periodicity.
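The shuffling procedure can be sketched as follows. Several details are assumptions made for illustration only: the Fourier score is taken to be the magnitude of the profile's Fourier component at the cell-cycle period, the final normalization is replaced by a simple add-one estimate, and far fewer than 1,000,000 shuffles are used by default to keep the example fast.

```python
import numpy as np

def fourier_score(times, values, period):
    # Assumed score: magnitude of the Fourier component at the given period.
    omega = 2.0 * np.pi / period
    return np.hypot(np.sum(np.sin(omega * times) * values),
                    np.sum(np.cos(omega * times) * values))

def periodicity_p_value(times, values, period, n_shuffles=10_000, seed=0):
    """Fraction of shuffled profiles scoring at least as high as the real one."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = fourier_score(times, values, period)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffle the time-point values within the original profile.
        if fourier_score(times, rng.permutation(values), period) >= observed:
            hits += 1
    # Add-one estimate (assumption) so the P-value is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

times = np.arange(0, 200, 10)                 # percent of cell cycle, two cycles
periodic = np.sin(2.0 * np.pi * times / 100)  # a perfectly periodic profile
p_per = periodicity_p_value(times, periodic, period=100, n_shuffles=2000)
```

With this estimate the smallest attainable P-value is 1 / (n_shuffles + 1), which is one reason a large number of shuffles is needed to resolve strongly periodic genes.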
The total P(per) value for a single gene across all available experiments is computed by multiplying all of the P(per)-values for each experiment.
The P-value for regulation estimates the chance that a magnitude of regulation at least as large as the one observed would occur at random. A small P(reg) value therefore implies a strongly regulated gene.
In order to calculate the P-value for regulation, the standard deviation was obtained for the log-ratio profile. Next, 1,000,000 random profiles were generated from the global distribution (entire experiment). The fraction of random profiles whose standard deviations were greater than or equal to the gene's standard deviation was calculated. This fraction was then normalized to create the final P-value for regulation.
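A minimal sketch of this calculation is given below. It assumes that a random profile is built by drawing, with replacement, as many values as the gene's profile has time points from the pool of all values in the experiment, and it replaces the final normalization with an add-one estimate. The function name and the reduced number of random profiles are also assumptions.

```python
import numpy as np

def regulation_p_value(profile, experiment_values, n_random=10_000, seed=0):
    """Fraction of random profiles whose standard deviation is at least the gene's."""
    rng = np.random.default_rng(seed)
    profile = np.asarray(profile, dtype=float)
    pool = np.asarray(experiment_values, dtype=float).ravel()
    observed_sd = profile.std()
    # Random profiles drawn from the global distribution of the experiment.
    random_profiles = rng.choice(pool, size=(n_random, profile.size), replace=True)
    hits = int(np.sum(random_profiles.std(axis=1) >= observed_sd))
    # Add-one estimate (assumption) so the P-value is never exactly zero.
    return (hits + 1) / (n_random + 1)

experiment = np.random.default_rng(1).normal(0.0, 1.0, size=(200, 12))
strong = np.array([-5.0, 5.0] * 6)   # large swings relative to the experiment
flat = np.zeros(12)                  # no regulation at all
p_strong = regulation_p_value(strong, experiment, n_random=2000)
p_flat = regulation_p_value(flat, experiment, n_random=2000)
```

In this example the strongly swinging profile receives a very small P(reg), while the flat profile receives a P(reg) of one.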
The total P(reg) value for a single gene across all available experiments is computed by multiplying all of the P(reg)-values for each experiment.
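The same product rule is used for the total P(per) and the total P(reg). A short sketch, with a hypothetical function name and made-up P-values:

```python
import math

def total_p_value(per_experiment_p_values):
    """Multiply the P-values a gene obtained in each experiment."""
    return math.prod(per_experiment_p_values)

# Made-up P-values for one gene in three experiments.
total_p_reg = total_p_value([0.01, 0.5, 0.2])
```

When many experiments are combined the product can become extremely small, so summing logarithms of the P-values is a common alternative to multiplying them directly.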
The peaktime describes when in the cell cycle a gene is maximally expressed. Peaktime is calculated as a percentage, with both 0 and 100 representing the M/G1 transition in the cell cycle. These percentages are displayed as discrete phases or transitions of the cell cycle.
A peaktime for a single expression profile first requires that a sine wave be fitted to the profile. The algorithm scans through all possible offsets and selects the sine wave that has the best correlation with the observed expression profile. The peaktime is then computed as the peak of this sine wave.
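The offset scan can be illustrated as follows. The sketch assumes that time is already expressed in percent of the cell cycle, that the wave has a period of exactly 100 percent, that offsets are scanned in 1 percent steps, and that the Pearson correlation is the measure of fit; none of these details is stated above. The wave is written as a cosine so that its offset is directly its peak position.

```python
import numpy as np

def peaktime_from_sine_fit(percent_times, values, n_offsets=100):
    """Return (peaktime in percent, best correlation) for one profile."""
    t = np.asarray(percent_times, dtype=float)
    x = np.asarray(values, dtype=float)
    best_offset, best_corr = 0.0, -np.inf
    for offset in np.linspace(0.0, 100.0, n_offsets, endpoint=False):
        # A wave with period 100 whose maximum lies at t == offset (mod 100).
        wave = np.cos(2.0 * np.pi * (t - offset) / 100.0)
        corr = np.corrcoef(wave, x)[0, 1]
        if corr > best_corr:
            best_offset, best_corr = offset, corr
    return best_offset, best_corr

t = np.arange(0, 200, 10)
profile = np.cos(2.0 * np.pi * (t - 30) / 100.0)   # synthetic profile peaking at 30%
peak, corr = peaktime_from_sine_fit(t, profile)
```

The best correlation is returned along with the peaktime because it indicates how well a periodic wave describes the profile at all.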
To compute a peaktime for a single gene across all available experiments, the time scale of each experiment was shifted so that time is represented as a fraction of the cell cycle, expressed in percent. In this scale, both 0 and 100 correspond to the M/G1 transition. Because experiments in which the profile is only weakly periodic produce unreliable peaktimes, the combined peaktime is weighted to take this into account.
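The weighting scheme is not specified here, so the following is only one plausible way to combine peaktimes: a weighted circular mean, which respects the fact that 0 and 100 denote the same point in the cycle. The function name and the weights (for example, some measure of how periodic each profile is) are hypothetical.

```python
import math

def combined_peaktime(peaktimes, weights):
    """Weighted circular mean of peaktimes given in percent of the cell cycle."""
    # Treat each peaktime as an angle on a circle and average the weighted vectors.
    x = sum(w * math.cos(2.0 * math.pi * p / 100.0) for p, w in zip(peaktimes, weights))
    y = sum(w * math.sin(2.0 * math.pi * p / 100.0) for p, w in zip(peaktimes, weights))
    return (math.atan2(y, x) / (2.0 * math.pi) * 100.0) % 100.0

# Peaktimes of 95 and 5 average to the M/G1 transition, not to 50.
near_transition = combined_peaktime([95.0, 5.0], [1.0, 1.0])
# The more heavily weighted experiment pulls the result toward its peaktime.
weighted = combined_peaktime([20.0, 40.0], [3.0, 1.0])
```

A plain arithmetic mean would place the average of 95 and 5 at 50, the opposite side of the cell cycle, which is why a circular mean is used in this sketch.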
In certain cases the peaktime will be marked as uncertain. There are several reasons this uncertainty can occur:
The Gene Feature annotations that Cyclebase displays include degradation signals, overexpression phenotypes, CDK substrates, and siRNA knockdown phenotypes. Each annotation was manually curated from different sources. References describing the experimental and computational sources are included below:
For more detailed information about the analysis methodology and results, please see: