HDF5 sparse matrix


Sparse matrix, saved in compressed sparse format inside a group of an HDF5 file. The HDF5 group should contain a data subgroup, which should in turn contain the typical contents of the compressed sparse matrix, i.e., indices, indptr and data. Specifically, data should be a 1-dimensional integer or numeric dataset containing the values of the non-zero elements; indices should be a 1-dimensional integer dataset containing the 0-based row/column index for each non-zero element in data; and indptr should be a 1-dimensional integer dataset of length equal to the number of columns/rows plus 1, containing pointers to the start and end of each column/row. The exact interpretation depends on the format specified in format.
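
As a minimal sketch of how data, indices and indptr fit together, the following Python function expands a compressed sparse column layout into a dense list of rows. The function name and the plain-list inputs are assumptions made for illustration; in practice the three arrays would be read from the HDF5 group.

```python
def csc_to_dense(data, indices, indptr, nrow, ncol):
    """Expand compressed sparse column arrays into a dense list of rows."""
    dense = [[0] * ncol for _ in range(nrow)]
    for col in range(ncol):
        # indptr[col]:indptr[col + 1] delimits the non-zero entries of this column.
        for k in range(indptr[col], indptr[col + 1]):
            # indices[k] is the 0-based row index of the k-th non-zero value.
            dense[indices[k]][col] = data[k]
    return dense


# A 3 x 2 matrix with non-zero values 5 (row 0, column 0),
# 7 (row 2, column 0) and 9 (row 1, column 1).
example = csc_to_dense(
    data=[5, 7, 9], indices=[0, 2, 1], indptr=[0, 2, 3], nrow=3, ncol=2
)
```

For a compressed sparse row layout the roles of rows and columns would be swapped: indptr would delimit rows and indices would hold column indices.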

The array.dimensions property should have exactly two elements. The first entry should be the number of rows, while the second entry should be the number of columns.

Dimnames may also be saved inside the same HDF5 file, as string datasets in another group. In such cases, the hdf5_sparse_matrix.dimnames property should be present and contain the name of that group.

If data is an integer dataset, missing values are represented by -2147483648.
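
A reader of the file might handle this placeholder along the following lines. This is only a sketch: the helper name and the choice of None as the in-memory missing value are assumptions, not part of the specification.

```python
# Placeholder stated in this specification; it equals -(2**31).
INT32_MISSING = -2147483648


def mask_missing(values, placeholder=INT32_MISSING):
    """Replace the integer missing-value placeholder with None."""
    return [None if v == placeholder else v for v in values]


masked = mask_missing([3, -2147483648, 0, 12])
```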

Derived from array/v1.json: some kind of multi-dimensional array, where we store metadata about the dimensions and type of data. The exact implementation of the array is left to concrete subclasses.

Type: object

Type: string

The schema to use.

Type: object
No Additional Properties

Type: array of integer

Dimensions of an n-dimensional array.

Must contain a minimum of 2 items

Must contain a maximum of 2 items

Each item of this array must be:

Type: enum (of string)

Type of data stored in this array.

Must be one of:

  • "boolean"
  • "number"
  • "integer"
  • "string"
  • "other"

Type: array of object

Authors of this resource.

Each item of this array must be:

Type: object

Type: string

Email of the author.

Must match regular expression: ^[^@]+@[^@]+$

Type: string

Name of the author.

Type: string

ORCID of the author.

Must match regular expression: ^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$
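
The two author patterns above can be checked with a short Python sketch. The key names "email" and "orcid" are assumed here for illustration, since this page does not show the property names; the regular expressions are copied from the text.

```python
import re

EMAIL_PATTERN = re.compile(r"^[^@]+@[^@]+$")
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$")


def check_author(author):
    """Return the names of the fields that fail their pattern (empty if none)."""
    problems = []
    # "email" and "orcid" are hypothetical key names; absent fields are skipped.
    if "email" in author and not EMAIL_PATTERN.search(author["email"]):
        problems.append("email")
    if "orcid" in author and not ORCID_PATTERN.search(author["orcid"]):
        problems.append("orcid")
    return problems


ok = check_author({"email": "a@example.org", "orcid": "0000-0001-2345-6789"})
bad = check_author({"email": "no-at-sign", "orcid": "0000-0001-2345-678X"})
```

Note that the ORCID pattern as written accepts digits only, so an identifier ending in the letter X is rejected by this check.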

Type: string

Description of the resource.

Type: array of object

UCSC, Ensembl or other genome builds involved in generating this resource.

Each item of this array must be:

Type: object

Type: string

Identifier for this genome build.


Examples:

"mm10"
"NCBIm37"

Type: enum (of string)

Source of the genome build identifier.

Must be one of:

  • "Ensembl"
  • "UCSC"
  • "Wormbase"
  • "Flybase"

Type: object
No Additional Properties

Type: string

Name of the group containing the dimnames. This group should contain zero or one string datasets for each dimension. Each string dataset is named after its dimension ("0" for rows, "1" for columns) and should have length equal to the extent of that dimension. Each dataset should not contain any missing values, so each string can be interpreted as-is. If this property is not present, it can be assumed that no dimnames are available.
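
The length rule for dimnames could be checked roughly as follows. In this sketch the HDF5 group is stood in for by a plain dict mapping dataset names to lists of strings; the function name and that representation are assumptions for illustration only.

```python
def check_dimnames(dimnames_group, dimensions):
    """Check dimnames lengths against the array dimensions.

    dimnames_group: dict of dataset name -> list of strings (stand-in for an HDF5 group).
    dimensions: [number of rows, number of columns].
    """
    problems = []
    for axis, extent in enumerate(dimensions):
        names = dimnames_group.get(str(axis))  # "0" for rows, "1" for columns
        if names is None:
            continue  # a dimension is allowed to have no names
        if len(names) != extent:
            problems.append(
                f"dataset '{axis}' has length {len(names)}, expected {extent}"
            )
    return problems


rows_only = check_dimnames({"0": ["g1", "g2", "g3"]}, [3, 2])
short_cols = check_dimnames({"1": ["c1"]}, [3, 2])
```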

Type: enum (of string)

Format of the sparse matrix.

The tenx_matrix is a compressed sparse column format where indices contains row indices and indptr contains the column index pointers. The group should contain a shape dataset, an integer vector of length 2 containing the number of rows and columns.

Must be one of:

  • "tenx_matrix"
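
A basic consistency check for the tenx_matrix layout might look like the sketch below. The function name and list inputs are illustrative assumptions. The last two checks (indptr starting at 0, ending at the number of stored values, and never decreasing) follow the usual compressed sparse column convention and are not stated explicitly in this page.

```python
def check_tenx_layout(shape, data, indices, indptr):
    """Return a list of layout problems for a compressed sparse column matrix."""
    if len(shape) != 2:
        return ["shape must have length 2"]
    nrow, ncol = shape
    problems = []
    if len(indptr) != ncol + 1:
        problems.append("indptr must have length ncol + 1")
    if len(indices) != len(data):
        problems.append("indices and data must have the same length")
    if any(i < 0 or i >= nrow for i in indices):
        problems.append("row indices must lie in [0, nrow)")
    # Assumed conventions, not spelled out in the description above.
    if indptr and (indptr[0] != 0 or indptr[-1] != len(data)):
        problems.append("indptr should start at 0 and end at len(data)")
    if any(a > b for a, b in zip(indptr, indptr[1:])):
        problems.append("indptr should be non-decreasing")
    return problems


good = check_tenx_layout([3, 2], [5, 7, 9], [0, 2, 1], [0, 2, 3])
broken = check_tenx_layout([3, 2], [5, 7, 9], [0, 3, 1], [0, 2])
```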

Type: string

Name of the group inside the HDF5 file that contains the sparse matrix's data.

Type: boolean
Default: false

Is this a child document, only to be interpreted in the context of the parent document from which it is linked? This may have implications for search and metadata requirements.

Type: string

MD5 checksum for the file.

Type: array of object

Origins of this resource.

Each item of this array must be:


Type: object

The required format of the identifier depends on the source database or repository. The conditions form an if/then/else chain, checked in order:

  • If the source is "PubMed", then the identifier (Type: string) must match regular expression: ^[0-9]+$
  • Otherwise, if the source is "GEO", then the identifier (Type: string) must match regular expression: ^GSE[0-9]+$
  • Otherwise, if the source is "ArrayExpress", then the identifier (Type: string) must match regular expression: ^E-MTAB-[0-9]+$
  • Otherwise, if the source is "DOI", then the identifier (Type: string) must match regular expression: ^[0-9a-zA-Z\._-]+/[0-9a-zA-Z\._-]+$
  • Otherwise, if the source is "URI", then the identifier (Type: string) must match regular expression: ^(http|ftp|https|s3|sftp)://

Type: string

Identifier for the resource in the specified type.

Type: enum (of string)

Source database or repository.

Must be one of:

  • "PubMed"
  • "GEO"
  • "ArrayExpress"
  • "DOI"
  • "URI"
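
The source-dependent identifier patterns for origins can be applied with a small lookup table, sketched here in Python. The function and variable names are assumptions for illustration; the regular expressions are copied from the conditions listed for each source.

```python
import re

# Patterns copied from the per-source conditions in this section.
ORIGIN_ID_PATTERNS = {
    "PubMed": r"^[0-9]+$",
    "GEO": r"^GSE[0-9]+$",
    "ArrayExpress": r"^E-MTAB-[0-9]+$",
    "DOI": r"^[0-9a-zA-Z\._-]+/[0-9a-zA-Z\._-]+$",
    "URI": r"^(http|ftp|https|s3|sftp)://",
}


def origin_id_is_valid(source, identifier):
    """Check an origin identifier against the pattern for its source."""
    pattern = ORIGIN_ID_PATTERNS.get(source)
    if pattern is None:
        return False  # not one of the allowed sources
    # re.search mirrors the unanchored matching of JSON Schema patterns.
    return re.search(pattern, identifier) is not None
```

For example, origin_id_is_valid("GEO", "GSE12345") passes, while a sample accession such as "GSM12345" does not.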

Type: string

Path to the file in the project directory.

Type: array of integer

Each item of this array must be:

Type: integer

NCBI taxonomy IDs of the species involved in this resource.

Type: array of object

Terms from a controlled vocabulary, used to annotate this resource in a machine-readable manner.

Each item of this array must be:


No Additional Properties

Type: object

The required format of the term identifier depends on the source vocabulary or ontology. The conditions form an if/then/else chain, checked in order:

  • If the source is "Experimental Factor Ontology", then the identifier (Type: object) must match regular expression: ^EFO:[0-9]{7}$
  • Otherwise, if the source is "Human Disease Ontology", then the identifier (Type: object) must match regular expression: ^DOID:[0-9]+$
  • Otherwise, if the source is "Cell Ontology", then the identifier (Type: object) must match regular expression: ^CL:[0-9]{7}$
  • Otherwise, if the source is "UBERON", then the identifier (Type: const) has the specific value: "^UBERON:[0-9]{7}$"

Type: string

Identifier for the term.


Examples:

"EFO:0008913"
"DOID:13250"
"CL:0000097"
"UBERON:0005870"

Type: enum (of string)

Name of the vocabulary or ontology that is the source for this term.

Must be one of:

  • "Experimental Factor Ontology"
  • "Human Disease Ontology"
  • "Cell Ontology"
  • "UBERON"

Type: string

Version of the vocabulary.

Type: string

Title of the resource.

Type: object

This is a conditional (if/then): when the first condition below holds, the requirement that follows it applies.


Must not be:

Type: object

Type: const
Specific value: true
Type: object

The following properties are required:

  • title
  • description
  • authors
  • species
  • genome
  • origin
  • terms