Bumpy Data Frame Matrix


A bumpy matrix of data frames, corresponding to the BumpyDataFrameMatrix class from the BumpyMatrix package. Each entry of the matrix contains a data frame with a variable number of rows but the same columns. A concatenated data frame is created by combining all individual data frames row-wise and is referenced by the concatenated property.

Derived from bumpy_matrix/v1.json: a virtual "bumpy" matrix of vector-like objects of the same type, corresponding to the BumpyMatrix class from the BumpyMatrix package. Each entry of the matrix contains a vector-like object with a variable number of subelements. For efficient storage, all matrix elements are concatenated into a single object of the same type, with length equal to the total number of subelements across all vector-like objects.

To recover the bumpy matrix, we can inspect the partitioning information in the data frame, which is saved to path as a CSV file under the comservatory specification. Each row of the data frame corresponds to a vector-like object in the bumpy matrix. The columns row, column and number are present in this data frame, specifying the 1-based row index, 1-based column index and length of the vector-like object. Subelements of the concatenated object are partitioned by assigning the specified number of consecutive subelements to a series of contiguous vector-like objects, each of which corresponds to a successive row in the data frame. Matrix entries not listed in the data frame are assumed to be length-0 vector-like objects of the same type.
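The partitioning described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name partition_bumpy, the list standing in for the concatenated object, and the inline CSV text are all hypothetical, and the standard csv module stands in for a proper comservatory parser.

```python
import csv
import io


def partition_bumpy(concatenated, partition_rows, nrow, ncol):
    """Rebuild a bumpy matrix as a dict keyed by 1-based (row, column).

    `concatenated` is any sliceable sequence; `partition_rows` is an
    iterable of mappings with "row", "column" and "number" fields.
    """
    entries = {}
    start = 0
    for rec in partition_rows:
        r, c, n = int(rec["row"]), int(rec["column"]), int(rec["number"])
        if not (1 <= r <= nrow and 1 <= c <= ncol):
            raise ValueError(f"entry ({r}, {c}) is outside the matrix")
        # Each record claims the next `n` consecutive subelements.
        entries[(r, c)] = concatenated[start:start + n]
        start += n
    if start != len(concatenated):
        raise ValueError("partition lengths do not sum to the total length")
    # Entries not listed in the table are length-0 objects of the same type.
    for r in range(1, nrow + 1):
        for c in range(1, ncol + 1):
            entries.setdefault((r, c), concatenated[0:0])
    return entries


# Hypothetical partitioning table for a 2 x 2 bumpy matrix.
csv_text = '"row","column","number"\n1,1,2\n2,1,1\n1,2,3\n'
rows = list(csv.DictReader(io.StringIO(csv_text)))
concatenated = ["a", "b", "c", "d", "e", "f"]
matrix = partition_bumpy(concatenated, rows, nrow=2, ncol=2)
```

Here entry (2, 2) is absent from the table, so it is filled in as an empty list.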

If row or column names are present on the bumpy matrix, they can be stored as separate data frames. These child objects should be referenced by the bumpy_matrix.row_names and bumpy_matrix.column_names properties.

Concrete subclasses are expected to provide a concatenated property that points to the concatenated object.

Derived from array/v1.json: some kind of multi-dimensional array, where we store metadata about the dimensions and type of data. The exact implementation of the array is left to concrete subclasses.

Type: object

Type: string

The schema to use.

Type: object
No Additional Properties

Type: array of integer

Dimensions of an n-dimensional array.

Must contain a minimum of 2 items

Must contain a maximum of 2 items

Each item of this array must be:

Type: enum (of string)

Type of data stored in this array.

Must be one of:

  • "boolean"
  • "number"
  • "integer"
  • "string"
  • "other"

Type: array of object

Authors of this resource.

Each item of this array must be:

Type: object

Type: string

Email of the author.

Must match regular expression: ^[^@]+@[^@]+$

Type: string

Name of the author.

Type: string

ORCID of the author.

Must match regular expression: ^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$
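As a rough illustration of the two author patterns above, the following Python sketch checks an author record against them. The key names "email" and "orcid" and the function name author_problems are assumptions made for this example; only the regular expressions are taken from the text.

```python
import re

# Patterns copied from the author fields described above.
EMAIL_RE = re.compile(r"^[^@]+@[^@]+$")
ORCID_RE = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$")


def author_problems(author):
    """Return a list of human-readable pattern violations (hypothetical helper)."""
    problems = []
    # Key names are assumed; fields are only checked when present.
    if "email" in author and not EMAIL_RE.search(author["email"]):
        problems.append("email does not match ^[^@]+@[^@]+$")
    if "orcid" in author and not ORCID_RE.search(author["orcid"]):
        problems.append("orcid does not match the four-block digit pattern")
    return problems
```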

Type: object
No Additional Properties

Type: object

Pointer to the row-wise concatenated data frame.

Type: object

Type: string

Relative path of the resource from the root of the project directory.

Type: enum (of string)

Type of file. Local files should be present in the same project directory.

Must be one of:

  • "local"

Type: object
No Additional Properties

Type: object

Pointer to a data frame containing the column names for the bumpy matrix. This should be of length equal to the number of columns. If omitted, no column names were present.

Type: object

Type: string

Relative path of the resource from the root of the project directory.

Type: enum (of string)

Type of file. Local files should be present in the same project directory.

Must be one of:

  • "local"

Type: enum (of string)

Type of compression applied to the file.

Must be one of:

  • "none"
  • "gzip"
  • "bzip2"

Type: integer

Number of vector-like objects in the bumpy matrix, where each object is represented by a row in the data frame at path. If this is less than the product of the matrix dimensions, it is assumed that the entries missing from the CSV correspond to empty vector-like objects in the bumpy matrix.

Value must be greater than or equal to 0

Type: boolean Default: false

Whether the individual vector-like objects are named. If true, the first column of the CSV is called names and contains the name for the object corresponding to each row. Note that these are not the row or column names of the bumpy matrix itself, but the names of the individual (non-empty) vector-like objects.

Type: object

Pointer to a data frame containing the row names for the bumpy matrix. This should be of length equal to the number of rows. If omitted, no row names were present.

Type: object

Type: string

Relative path of the resource from the root of the project directory.

Type: enum (of string)

Type of file. Local files should be present in the same project directory.

Must be one of:

  • "local"

Type: string

Description of the resource.

Type: array of object

UCSC, Ensembl or other genome builds involved in generating this resource.

Each item of this array must be:

Type: object

Type: string

Identifier for this genome build.


Examples:

"mm10"
"NCBIm37"

Type: enum (of string)

Source of the genome build identifier.

Must be one of:

  • "Ensembl"
  • "UCSC"
  • "Wormbase"
  • "Flybase"

Type: boolean Default: false

Is this a child document, only to be interpreted in the context of the parent document from which it is linked? This may have implications for search and metadata requirements.

Type: string

MD5 checksum for the file.
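One way to compute a checksum for comparison is sketched below. It assumes the stored value is the usual lowercase hexadecimal digest, which the text above does not state; the function name md5_of_file is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```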

Type: array of object

Origins of this resource.

Each item of this array must be:


Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "PubMed"
Type: object

Type: string
Must match regular expression: ^[0-9]+$
Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "GEO"
Type: object

Type: string
Must match regular expression: ^GSE[0-9]+$
Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "ArrayExpress"
Type: object

Type: string
Must match regular expression: ^E-MTAB-[0-9]+$
Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "DOI"
Type: object

Type: string
Must match regular expression: ^[0-9a-zA-Z\._-]+/[0-9a-zA-Z\._-]+$
Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "URI"
Type: object

Type: string
Must match regular expression: ^(http|ftp|https|s3|sftp)://

Type: string

Identifier for the resource in the specified type.

Type: enum (of string)

Source database or repository.

Must be one of:

  • "PubMed"
  • "GEO"
  • "ArrayExpress"
  • "DOI"
  • "URI"
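The conditional rules for origins pair each source with a pattern for its identifier. A minimal Python sketch of that pairing follows; the table and function names are hypothetical, and only the source values and regular expressions come from the text above.

```python
import re

# Identifier pattern required for each source, as listed above.
ORIGIN_PATTERNS = {
    "PubMed": r"^[0-9]+$",
    "GEO": r"^GSE[0-9]+$",
    "ArrayExpress": r"^E-MTAB-[0-9]+$",
    "DOI": r"^[0-9a-zA-Z\._-]+/[0-9a-zA-Z\._-]+$",
    "URI": r"^(http|ftp|https|s3|sftp)://",
}


def origin_is_valid(source, identifier):
    """Check that `identifier` has the form required for `source`."""
    pattern = ORIGIN_PATTERNS.get(source)
    if pattern is None:
        # Sources outside the enumeration are rejected outright.
        return False
    return re.search(pattern, identifier) is not None
```

Note that the URI pattern only constrains the scheme prefix, so anything after `://` is accepted.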

Type: string

Path to the file in the project directory.

Type: array of integer

Each item of this array must be:

Type: integer

NCBI taxonomy IDs of the species involved in this resource.

Type: array of object

Terms from a controlled vocabulary, used to annotate this resource in a machine-readable manner.

Each item of this array must be:


No Additional Properties

Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "Experimental Factor Ontology"
Type: object

Type: object
Must match regular expression: ^EFO:[0-9]{7}$
Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "Human Disease Ontology"
Type: object

Type: object
Must match regular expression: ^DOID:[0-9]+$
Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "Cell Ontology"
Type: object

Type: object
Must match regular expression: ^CL:[0-9]{7}$
Type: object

Conditional rule: if the first condition below holds, then the second must also hold.

Type: object

Type: const
Specific value: "UBERON"
Type: object

Type: const
Specific value: "^UBERON:[0-9]{7}$"

Type: string

Identifier for the term.


Examples:

"EFO:0008913"
"DOID:13250"
"CL:0000097"
"UBERON:0005870"

Type: enum (of string)

Name of the vocabulary or ontology that is the source for this term.

Must be one of:

  • "Experimental Factor Ontology"
  • "Human Disease Ontology"
  • "Cell Ontology"
  • "UBERON"
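The term identifiers can be checked per vocabulary in the same spirit. In the sketch below, the names are hypothetical, and the UBERON entry is treated as a regular expression even though the rule above lists it as a constant value; that reading is an assumption made so that the documented example "UBERON:0005870" is accepted.

```python
import re

# Identifier pattern per vocabulary. The UBERON entry is interpreted as a
# pattern here (an assumption); the text above lists it as a constant.
TERM_PATTERNS = {
    "Experimental Factor Ontology": r"^EFO:[0-9]{7}$",
    "Human Disease Ontology": r"^DOID:[0-9]+$",
    "Cell Ontology": r"^CL:[0-9]{7}$",
    "UBERON": r"^UBERON:[0-9]{7}$",
}


def term_is_valid(source, identifier):
    """Check that `identifier` has the form expected for the vocabulary `source`."""
    pattern = TERM_PATTERNS.get(source)
    if pattern is None:
        return False
    return re.search(pattern, identifier) is not None
```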

Type: string

Version of the vocabulary.

Type: string

Title of the resource.

Type: object

Conditional rule: if the first condition below holds, then the second must also hold.


Must not be:

Type: object

Type: const
Specific value: true
Type: object

The following properties are required:

  • title
  • description
  • authors
  • species
  • genome
  • origin
  • terms
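Read together with the child-document flag described earlier, this final rule appears to require the listed properties only when the document is not marked as a child. The sketch below follows that reading; the key name "is_child" and the function name missing_required are assumptions, since the property name is not shown in the text above.

```python
REQUIRED_TOP_LEVEL = [
    "title", "description", "authors", "species", "genome", "origin", "terms",
]


def missing_required(metadata, child_key="is_child"):
    """List required properties absent from a non-child metadata document.

    `child_key` is a hypothetical name for the boolean child-document flag.
    """
    if metadata.get(child_key, False) is True:
        # Child documents are exempt from these requirements.
        return []
    return [key for key in REQUIRED_TOP_LEVEL if key not in metadata]
```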