A data frame object stored inside a group of an HDF5 file. Simple columns are stored as one-dimensional datasets in the data subgroup, named by their positional 0-based index in the data frame. All such datasets should have the same length. Column names are stored in column_names, a one-dimensional string dataset whose length equals the number of columns. Row names, if present, are stored in a row_names dataset. For complex columns, the corresponding dataset is omitted and the actual contents are obtained from other files; a pointer to the resource should be stored in the corresponding entry of the data_frame.columns property.
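To make the naming convention concrete, here is a minimal sketch that only computes which dataset paths (relative to the data frame's group) would hold which contents; it does not write an HDF5 file. The helper name plan_hdf5_layout and its complex_columns parameter are hypothetical, and the input is assumed to be a dict whose insertion order is the column order.

```python
from typing import AbstractSet, Dict, List, Optional, Sequence


def plan_hdf5_layout(
    columns: Dict[str, Optional[Sequence]],
    row_names: Optional[List[str]] = None,
    complex_columns: AbstractSet[str] = frozenset(),
) -> Dict[str, list]:
    """Hypothetical helper: map dataset paths inside the group to contents."""
    names = list(columns)  # dicts keep insertion order, i.e. column order
    simple = [(i, n) for i, n in enumerate(names) if n not in complex_columns]

    # Only simple columns have datasets, and they must all share one length.
    lengths = {len(columns[n]) for _, n in simple}
    if len(lengths) > 1:
        raise ValueError("all column datasets must have the same length")

    layout: Dict[str, list] = {"column_names": names}
    for index, name in simple:
        # Named by 0-based position; complex columns get no dataset here.
        layout[f"data/{index}"] = list(columns[name])

    if row_names is not None:
        if lengths and len(row_names) not in lengths:
            raise ValueError("row_names needs exactly one entry per row")
        layout["row_names"] = list(row_names)
    return layout
```

Complex columns still appear in column_names so that positions stay aligned with the columns property; only their datasets are skipped.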
For any column represented by an integer dataset (including boolean columns), missing values are represented by -2147483648.
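The sentinel -2147483648 is the smallest 32-bit signed integer. A minimal sketch of encoding and decoding such a column follows, assuming booleans are stored as 0/1 (the text above only says they use an integer dataset) and using None for missing values in memory; both function names are hypothetical.

```python
from typing import List, Optional

# Missing-value sentinel for integer datasets: the smallest 32-bit signed int.
INT_MISSING = -2147483648


def encode_int_column(values: List[Optional[int]]) -> List[int]:
    """Replace None with the sentinel; booleans become 0/1 (assumption)."""
    stored = []
    for v in values:
        if v is None:
            stored.append(INT_MISSING)
        elif int(v) == INT_MISSING:
            # A real value equal to the sentinel would be read back as missing.
            raise ValueError("value collides with the missing-value sentinel")
        else:
            stored.append(int(v))
    return stored


def decode_int_column(stored: List[int], as_bool: bool = False) -> List[Optional[object]]:
    """Map the sentinel back to None, optionally converting to booleans."""
    out: List[Optional[object]] = []
    for v in stored:
        if v == INT_MISSING:
            out.append(None)
        else:
            out.append(bool(v) if as_bool else v)
    return out
```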
For any column represented by a string dataset, that dataset may carry a missing-value-placeholder attribute. This should be a scalar string attribute containing the string used to represent missing values. If the attribute is absent, all strings are assumed to be non-missing. Note that the row_names dataset, if present, should not contain any missing values.
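A minimal sketch of the placeholder convention, with None standing in for missing values in memory. The default placeholder "NA" and both function names are hypothetical choices; the only rules taken from the text are that the attribute names the placeholder string and that its absence means nothing is missing.

```python
from typing import List, Optional, Tuple


def encode_string_column(
    values: List[Optional[str]], placeholder: str = "NA"
) -> Tuple[List[str], Optional[str]]:
    """Return (stored strings, placeholder attribute value or None)."""
    if None not in values:
        # No attribute needed: every string is treated as non-missing.
        return [v for v in values if v is not None], None
    if placeholder in values:
        # A real value equal to the placeholder would be read back as missing.
        raise ValueError("placeholder collides with a real value")
    return [placeholder if v is None else v for v in values], placeholder


def decode_string_column(stored: List[str], placeholder: Optional[str]) -> List[Optional[str]]:
    """Map the placeholder back to None; no attribute means no missing values."""
    if placeholder is None:
        return list(stored)
    return [None if s == placeholder else s for s in stored]
```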
Derived from data_frame/v1.json, which describes a virtual data frame object stored in a yet-to-be-defined file format. Simple columns are stored directly in the file. For complex columns, the contents should be stored in other files, and a pointer to each resource is stored in the corresponding entry of columns (a placeholder column may be created in the file).
The schema to use.
Location of additional metadata for each column, stored as another data_frame. Omitted if no additional per-column metadata is provided.
Relative path of the resource from the root of the project directory.
Type of file. Local files should be present in the same project directory.
Information about the columnar fields in the data frame. This follows the same order as the columns in the on-disk representation.
Levels for the categorical factor. This is stored as a single-column data_frame. For ordered factors, the order is respected in the saved data frame.
Relative path of the resource from the root of the project directory.
Type of file. Local files should be present in the same project directory.
Resource for a column of type "other": relative path of the resource from the root of the project directory.
Type of file. Local files should be present in the same project directory.
Name of the column. Each column must have a non-empty name. Column names should not be duplicated within columns.
Must be at least 1 character long
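The two naming rules (non-empty, no duplicates) can be checked in one pass. This is a sketch with a hypothetical function name, not part of any official validator.

```python
from typing import List


def check_column_names(names: List[str]) -> None:
    """Raise ValueError if any column name is empty or duplicated."""
    seen = set()
    for i, name in enumerate(names):
        if len(name) < 1:
            raise ValueError(f"column {i} has an empty name")
        if name in seen:
            raise ValueError(f"duplicated column name: {name!r}")
        seen.add(name)
```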
What is the type of the column? Factors and ordered factors have an additional levels property specifying the levels. Dates are stored in YYYY-MM-DD format. Date-times should follow RFC 3339 Section 5.6. Columns listed as other are assumed to be non-atomic and should contain a resource property pointing towards the file containing the column's contents.
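For date and date-time columns, a sketch of per-value format checks follows. The date check confirms both the YYYY-MM-DD shape and that the calendar date exists. The date-time check is syntax-only, based on the RFC 3339 Section 5.6 date-time production (fractional seconds optional, offset either Z or ±hh:mm, T and Z accepted in lower case); it does not range-check hours or minutes. Function names are hypothetical.

```python
import re
from datetime import date

# RFC 3339 Section 5.6: full-date "T" full-time, where full-time is
# hh:mm:ss with optional fractional seconds and a "Z" or +/-hh:mm offset.
RFC3339_DATE_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}[Tt][0-9]{2}:[0-9]{2}:[0-9]{2}"
    r"(\.[0-9]+)?([Zz]|[+-][0-9]{2}:[0-9]{2})"
)


def is_valid_date(value: str) -> bool:
    """Check the YYYY-MM-DD shape and that the calendar date exists."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_valid_date_time(value: str) -> bool:
    """Syntax-only check against the RFC 3339 date-time production."""
    return RFC3339_DATE_TIME.fullmatch(value) is not None
```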
Dimensions of a two-dimensional object.
Must contain exactly 2 items
Location of additional metadata for this object, typically stored as a list (via the basic_list schema). Omitted if no other metadata is provided.
Relative path of the resource from the root of the project directory.
Type of file. Local files should be present in the same project directory.
Whether the data frame has row names. If true, these are stored in the row_names dataset.
Description of the resource.
UCSC, Ensembl or other genome builds involved in generating this resource.
Identifier for this genome build, e.g. "mm10" or "NCBIm37".
Source of the genome build identifier.
Name of the group inside the HDF5 file that contains the contents of the data frame.
Is this a child document, only to be interpreted in the context of the parent document from which it is linked? This may have implications for search and metadata requirements.
MD5 checksum for the file.
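A sketch of computing the checksum with the standard library, reading in chunks so large files do not have to fit in memory. It assumes the checksum is recorded as a lowercase hexadecimal string, which the description above does not state; the function name is hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex-encoded MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```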
Origins of this resource.
Identifier for the resource in the specified type. It must match the pattern associated with its source:
- "PubMed": ^[0-9]+$
- "GEO": ^GSE[0-9]+$
- "ArrayExpress": ^E-MTAB-[0-9]+$
- "DOI": ^[0-9a-zA-Z\._-]+/[0-9a-zA-Z\._-]+$
- "URI": ^(http|ftp|https|s3|sftp)://
Source database or repository.
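The source-dependent identifier patterns can be applied with a lookup table. This sketch uses re.search because JSON Schema patterns are not implicitly anchored; all listed patterns except the URI one carry their own ^ and $ anchors. Treating sources outside the table as unconstrained is an assumption, and the function name is hypothetical.

```python
import re

# Identifier patterns per source, copied from the schema description.
ORIGIN_ID_PATTERNS = {
    "PubMed": r"^[0-9]+$",
    "GEO": r"^GSE[0-9]+$",
    "ArrayExpress": r"^E-MTAB-[0-9]+$",
    "DOI": r"^[0-9a-zA-Z\._-]+/[0-9a-zA-Z\._-]+$",
    "URI": r"^(http|ftp|https|s3|sftp)://",
}


def origin_id_is_valid(source: str, identifier: str) -> bool:
    """Check an origin identifier against the pattern for its source."""
    pattern = ORIGIN_ID_PATTERNS.get(source)
    if pattern is None:
        return True  # assumption: sources not listed here are not checked
    return re.search(pattern, identifier) is not None
```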
Path to the file in the project directory.
NCBI taxonomy IDs of the species involved in this resource.
Terms from a controlled vocabulary, used to annotate this resource in a machine-readable manner.
Identifier for the term. It must match the pattern associated with its vocabulary:
- "Experimental Factor Ontology": ^EFO:[0-9]{7}$ (e.g. "EFO:0008913")
- "Human Disease Ontology": ^DOID:[0-9]+$ (e.g. "DOID:13250")
- "Cell Ontology": ^CL:[0-9]{7}$ (e.g. "CL:0000097")
- "UBERON": ^UBERON:[0-9]{7}$ (e.g. "UBERON:0005870")
Name of the vocabulary or ontology that is the source for this term.
Version of the vocabulary.
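As a sketch, a list of annotation terms can be checked against the vocabulary-specific patterns and all problems reported at once rather than stopping at the first. The dict keys "id" and "source" are hypothetical stand-ins for the term identifier and vocabulary name described above; vocabularies outside the table are assumed to be unconstrained.

```python
import re
from typing import Dict, List

# Term identifier patterns per vocabulary, copied from the schema description.
TERM_ID_PATTERNS = {
    "Experimental Factor Ontology": r"^EFO:[0-9]{7}$",
    "Human Disease Ontology": r"^DOID:[0-9]+$",
    "Cell Ontology": r"^CL:[0-9]{7}$",
    "UBERON": r"^UBERON:[0-9]{7}$",
}


def check_terms(terms: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means every term passed."""
    problems = []
    for i, term in enumerate(terms):
        source = term.get("source", "")  # hypothetical key for the vocabulary
        pattern = TERM_ID_PATTERNS.get(source)
        if pattern is None:
            continue  # assumption: unlisted vocabularies are not checked
        if re.search(pattern, term.get("id", "")) is None:
            problems.append(f"term {i}: {term.get('id')!r} does not match {source}")
    return problems
```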
Title of the resource.