Cyclebase
 
 
 
Data Formats
 
 
 
Cyclebase offers several types of data for download. To find out how to locate and download data, please consult the FAQ. Cyclebase distributes expression-profile data only if the original authors have given their consent. Use of processed data requires that you cite the original authors as well as Cyclebase. All data have been processed as described here. Each exported type is offered in two formats:
 
Tab-Delimited - data are easily parsed programmatically, much like other simple text formats such as FASTA, SwissProt, and GenBank. Information about the tab-delimited formats can be found below.
 
XML - most modern programming languages have libraries for reading and writing XML files. These libraries usually handle the parsing of documents for the programmer, leaving them free to concentrate on using the data.

Cyclebase provides an XSD (XML Schema Definition) file for each exported format. All XML files exported from Cyclebase are guaranteed to conform to their corresponding XSD. Information about the available XML formats and their XSD files can be found below.

Important Note: It is bad practice to parse XML files line by line. Although XML exported from Cyclebase may look as if it allows a line-by-line approach, a valid XML document can be written on a single line. Line breaks are added only to make the files more human-readable, and their presence is not guaranteed. The take-home message is: use the libraries available for your favorite programming language to parse XML files. More information on available libraries for some of the most popular programming languages can be found in the FAQ.
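
To illustrate the note above, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The element and attribute names (genes, gene, name, rank) and the sample values are made up for illustration and are not taken from the Cyclebase XSD files. The sketch only shows that a real XML parser returns the same data whether or not the document contains line breaks.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: tag names, attribute names and values are
# illustrative only and do not reflect the actual Cyclebase schema.
PRETTY = """<?xml version="1.0"?>
<genes>
  <gene name="geneA">
    <rank>12</rank>
  </gene>
  <gene name="geneB">
    <rank>340</rank>
  </gene>
</genes>
"""

# The same document without any line breaks.
COMPACT = (
    '<?xml version="1.0"?><genes><gene name="geneA"><rank>12</rank></gene>'
    '<gene name="geneB"><rank>340</rank></gene></genes>'
)


def read_ranks(xml_text):
    """Return {gene name: rank} using an XML parser, not line splitting."""
    root = ET.fromstring(xml_text)
    return {g.get("name"): int(g.findtext("rank")) for g in root.iter("gene")}


# Both layouts yield identical data.
print(read_ranks(PRETTY) == read_ranks(COMPACT))
```

A line-based reader would handle the first layout and fail on the second, which is why a parser library is the safer choice.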
 
Entire Organism Analysis Results
 
Entire organism data-files consist of overall organism information and Cyclebase analysis-results. The files contain both the combined analysis (across all available experiments) and the analysis of each individual experiment.

Downloads are available either with the entire gene/probe set or with only the gene/probe set that Cyclebase has deemed periodic. The format of the download with only periodic genes is the same as that of the full download.
 
Tab-Delimited Organism Analysis Download
  • Those lines that are blank or begin with ## are comments and should be ignored.
  • The first data after the initial comments will always be organism-info (OI) lines.
  • The next comment line starts the Cyclebase analysis-results section. This section begins with an analysis-results header line (AH), followed by repeating analysis-results lines (AR). Each analysis-results line contains the gene/probe, its overall analysis-results (Rank, Peaktime, P-value for Periodicity, and P-value for Regulation), and each experiment's analysis-results. Analysis-results 'blocks' are separated from one another by a Tab, while the values within a 'block' are separated by a pipe '|'.
  • The following figure depicts a simple Tab-Delimited organism analysis:
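
As a complement to the format description above, the following Python sketch reads an organism analysis file. It rests on assumptions not stated on this page: that the two-letter line code (OI, AH, AR) is the first tab-separated field, that the gene/probe is the second field of an AR line, and that the overall block comes before the per-experiment blocks. The function names and the sample data are hypothetical.

```python
def parse_ar_line(line):
    """Split one AR line into gene, overall block and per-experiment blocks."""
    fields = line.rstrip("\r\n").split("\t")
    if fields[0] != "AR":
        raise ValueError("not an analysis-results (AR) line")
    # Blocks are separated by tabs; values inside a block by pipes.
    blocks = [block.split("|") for block in fields[2:]]
    return {"gene": fields[1], "overall": blocks[0], "experiments": blocks[1:]}


def read_organism(lines):
    """Collect OI, AH and AR lines, skipping blank lines and ## comments."""
    info, header, results = [], None, []
    for line in lines:
        if not line.strip() or line.startswith("##"):
            continue
        code = line.split("\t", 1)[0]
        if code == "OI":
            info.append(line.rstrip("\r\n").split("\t")[1:])
        elif code == "AH":
            header = line.rstrip("\r\n").split("\t")[1:]
        elif code == "AR":
            results.append(parse_ar_line(line))
    return info, header, results


# Hypothetical sample; the layout and values are assumptions, not real output.
SAMPLE = (
    "## hypothetical sample\n"
    "OI\tname\tExample organism\n"
    "\n"
    "## analysis results\n"
    "AH\tgene\toverall\texp1\n"
    "AR\tgeneA\t1|25|0.001|0.002\t3|27|0.01|0.02\n"
)

info, header, results = read_organism(SAMPLE.splitlines())
```

Values are kept as strings here; convert them once the actual column order has been checked against a downloaded file.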
 
XML Organism Analysis Download
  • Again, please use libraries available from your favorite programming language to parse XML files. More information on available libraries for some of the most popular programming languages can be found in the FAQ.
  • The figure below depicts a simple example of an organism analysis exported in XML:
 
 
Entire Experiments
 
An entire experiment data-file consists of overall experiment information, Cyclebase analysis-results, and all of the processed time-series expression-profiles for that experiment.

For experiments that Cyclebase has not been given permission to distribute, downloaded files contain only the overall experiment information and Cyclebase analysis-results. These 'confidential' experiment files are formatted exactly like the 'full' files, but do not contain the processed time-series expression-profiles.
 
Tab-Delimited Experiment Download
  • Those lines that are blank or begin with ## are comments and should be ignored.
  • The first data after the initial comments will always be 4 experiment-info (EI) lines.
  • The next comment line will always start the Cyclebase analysis-results section. This section begins with an analysis-results header line (AH), followed by repeating analysis-results lines (AR). Each analysis-results line contains the gene/probe and its Rank, Peaktime, P-value for Periodicity, and P-value for Regulation.
  • The next comment line will start the time-series expression-profiles. The first line is a header (EH), which will specify the time (in percent of cell-cycle) from the M/G1 transition. Following the header line are expression-profile (EP) lines. Each of the EP lines will contain:
    • The gene/probe name
    • A tab-separated list of expression values (normalized as described here). Note that a value can be blank if there was no reading at a specific time for a specific gene/probe.
  • The following figure depicts a simple Tab-Delimited experiment:
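
The handling of blank expression values described above can be sketched in Python as follows. The sketch assumes, without confirmation from this page, that the line code (EH, EP) is the first tab-separated field, that the second field of an EH line is a label for the name column, and that the time points start at the third field so they line up with the values on EP lines. The function name and sample data are hypothetical.

```python
def parse_profiles(lines):
    """Return (time points, {gene: [value or None, ...]}) from EH/EP lines."""
    times, profiles = [], {}
    for line in lines:
        if not line.strip() or line.startswith("##"):
            continue
        # Strip only the line ending: trailing tabs mark trailing blank values.
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "EH":
            times = [float(t) for t in fields[2:]]
        elif fields[0] == "EP":
            # An empty field means no reading at that time point.
            values = [float(v) if v.strip() else None for v in fields[2:]]
            profiles[fields[1]] = values
    return times, profiles


# Hypothetical sample; the layout and values are assumptions, not real output.
SAMPLE = (
    "## hypothetical expression profiles\n"
    "EH\tgene\t0\t25\t50\t75\n"
    "EP\tgeneA\t0.1\t\t-0.3\t0.2\n"
    "EP\tgeneB\t\t0.5\t0.4\t\n"
)

times, profiles = parse_profiles(SAMPLE.splitlines())
```

Splitting on the tab character explicitly matters here: splitting on arbitrary whitespace would collapse consecutive tabs and shift the values after a missing reading into the wrong time points.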
 
XML Experiment Download
  • Again, please use libraries available from your favorite programming language to parse XML files. More information on available libraries for some of the most popular programming languages can be found in the FAQ.
  • The figure below depicts a simple example of an experiment exported in XML:
 
 
Single Genes
 
An individual gene export contains the gene's 'global' analysis-results, all available experiment analysis-results, and expression profiles.
 
Tab-Delimited Gene Download
  • Those lines that are blank or begin with ## are comments and should be ignored.
  • Immediately following the initial comments are 6 gene information (GI) lines.
  • A comment marks the end of the gene information lines and the start of the experiments that Cyclebase has been given permission to distribute. There can be zero or more shared experiments, each separated by a blank line after its data (D1 and D2) lines. Cyclebase analysis-results (rank, peaktime, P-value for periodicity, and P-value for regulation) are available in the EI lines. The data (D1 and D2) lines are populated as follows:
    • D1 contains a tab-separated list of the time (in percent of cell-cycle) from the M/G1 transition.
    • D2 contains a tab-separated list of expression values (normalized as described here).
  • The next comment will mark the beginning of the repeating confidential experiments. There can be zero or more confidential experiments.
  • Confidential experiments are experiments that Cyclebase has not been given permission to distribute, and therefore have no processed expression-profile data. Cyclebase analysis-results (rank, peaktime, P-value for periodicity, and P-value for regulation) are still available in the EI lines, but the word "CONFIDENTIAL" appears on the data (D1 and D2) lines in place of values.
  • The figure below depicts a simple example of an exported gene:
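
The grouping of EI, D1 and D2 lines described above, including the "CONFIDENTIAL" marker, could be read with a sketch like the one below. It assumes that the line code is the first tab-separated field and that each experiment consists of one or more EI lines followed by a D1 and a D2 line. These layout details, the function name and the abbreviated sample data are assumptions for illustration. GI lines are skipped in this sketch.

```python
def read_gene_experiments(lines):
    """Group EI/D1/D2 lines into one dict per experiment."""
    experiments = []
    current = None
    for line in lines:
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        code, rest = fields[0], fields[1:]
        if code == "EI":
            # An EI line after a finished D1/D2 pair opens a new experiment.
            if current is None or current["done"]:
                current = {"info": [], "D1": None, "D2": None,
                           "confidential": False, "done": False}
                experiments.append(current)
            current["info"].append(rest)
        elif code in ("D1", "D2") and current is not None:
            if rest == ["CONFIDENTIAL"]:
                # No profile data is distributed for this experiment.
                current["confidential"] = True
            else:
                current[code] = [float(v) if v.strip() else None for v in rest]
            current["done"] = (code == "D2")
    return experiments


# Hypothetical, abbreviated sample; not real Cyclebase output.
SAMPLE = (
    "## hypothetical gene export\n"
    "GI\tname\tgeneA\n"
    "## shared experiments\n"
    "EI\texperiment\texp1\n"
    "EI\trank\t12\n"
    "D1\t0\t50\t100\n"
    "D2\t0.1\t\t-0.2\n"
    "\n"
    "## confidential experiments\n"
    "EI\texperiment\texp2\n"
    "D1\tCONFIDENTIAL\n"
    "D2\tCONFIDENTIAL\n"
)

experiments = read_gene_experiments(SAMPLE.splitlines())
```

Checking the confidential flag before using D1 and D2 avoids treating the marker word as a numeric value.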
 
XML Gene Download
  • Again, please use libraries available from your favorite programming language to parse XML files. More information on available libraries for some of the most popular programming languages can be found in the FAQ.
  • The figure below depicts a simple example of a gene exported in XML:
 
 
 
©2007 Cyclebase.org