Multi-modal
Here, we show how to curate and register ECCITE-seq data from Papalexi21 as a MuData object.
ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.
MuData objects build on top of AnnData objects to store multimodal data.
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --schema bionty
→ connected lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
→ connected lamindb: testuser1/test-multimodal
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs: 'perturbation', 'replicate'
  var: 'name'
  4 modalities
    rna: 200 x 173
      obs: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var: 'name'
    adt: 200 x 4
      obs: 'nCount_ADT', 'nFeature_ADT'
      var: 'name'
    hto: 200 x 12
      obs: 'nCount_HTO', 'nFeature_HTO', 'technique'
      var: 'name'
    gdo: 200 x 111
      obs: 'nCount_GDO'
      var: 'name'
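The top-level shape in this printout follows from the per-modality shapes: all four modalities cover the same 200 cells, and the 300 variables are the sum of the variables of the individual modalities. The short check below uses only the numbers printed above; the dictionary is typed in by hand and is not read from the object.

```python
# Per-modality shapes copied from the MuData printout above.
modality_shapes = {
    "rna": (200, 173),
    "adt": (200, 4),
    "hto": (200, 12),
    "gdo": (200, 111),
}

# All modalities in this subset report the same number of cells.
n_obs_values = {n_obs for n_obs, _ in modality_shapes.values()}

# The variables of the modalities add up to the top-level n_vars.
total_vars = sum(n_vars for _, n_vars in modality_shapes.values())

print(n_obs_values, total_vars)
```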
Validate annotations
curate = ln.Curator.from_mudata(
    mdata,
    var_index={
        "rna": bt.Gene.symbol,  # gene expression
        "adt": bt.CellMarker.name,  # antibody-derived tags reflecting surface proteins
        "hto": ln.Feature.name,  # cell hashing
        "gdo": ln.Feature.name,  # guide RNAs
    },
    categoricals={
        "perturbation": ln.ULabel.name,  # shared categorical
        "replicate": ln.ULabel.name,  # shared categorical
        "hto:technique": bt.ExperimentalFactor.name,  # modality-specific categorical
    },
    organism="human",
)
• 1 non-validated values are not saved in Feature.name: ['nCount_GDO']!
→ to lookup values, use lookup().columns
→ to save, run add_new_from_columns
• 2 non-validated values are not saved in Feature.name: ['nCount_ADT', 'nFeature_ADT']!
→ to lookup values, use lookup().columns
→ to save, run add_new_from_columns
✓ added 2 records with Feature.name for columns: 'perturbation', 'replicate'
• 37 non-validated values are not saved in Feature.name: ['gdo:S.Score', 'gdo:HTO_classification', 'hto:percent.mito', 'adt:NT', 'hto:technique', 'hto:orig.ident', 'gdo:replicate', 'gdo:Phase', 'adt:percent.mito', 'gdo:percent.mito', 'hto:replicate', 'adt:gene_target', 'hto:S.Score', 'hto:HTO_classification', 'adt:MULTI_ID', 'adt:replicate', 'gdo:NT', 'adt:S.Score', 'hto:NT', 'hto:guide_ID', 'gdo:MULTI_ID', 'adt:Phase', 'hto:Phase', 'hto:perturbation', 'gdo:G2M.Score', 'hto:gene_target', 'gdo:guide_ID', 'adt:perturbation', 'adt:orig.ident', 'hto:G2M.Score', 'hto:MULTI_ID', 'adt:G2M.Score', 'adt:guide_ID', 'gdo:orig.ident', 'gdo:gene_target', 'adt:HTO_classification', 'gdo:perturbation']!
→ to lookup values, use lookup().columns
→ to save, run add_new_from_columns
• 3 non-validated values are not saved in Feature.name: ['nCount_RNA', 'percent.mito', 'nFeature_RNA']!
→ to lookup values, use lookup().columns
→ to save, run add_new_from_columns
✓ added 1 record with Feature.name for columns: 'technique'
• 2 non-validated values are not saved in Feature.name: ['nCount_HTO', 'nFeature_HTO']!
→ to lookup values, use lookup().columns
→ to save, run add_new_from_columns
✓ added 100 records from public with Gene.symbol for var_index: 'SH2D6', 'MEF2C-AS2', 'ARHGAP26-AS1', 'GABRA1', 'H4C12', 'HLA-DQB1-AS1', 'SPACA1', 'VNN1', 'CTAGE15', 'PFKFB1', 'TRPC5', 'RBPMS-AS1', 'CA8', 'CSMD3', 'ZNF483', 'AK8', 'TMEM72-AS1', 'ARAP1-AS2', 'CRYAB', 'DNAI7', ...
! 84 non-validated values are not saved in Gene.symbol: ['RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'RP11-324E6.9', 'RP11-187A9.3', 'RP11-365N19.2', 'RP11-346D14.1', 'RP11-265N6.2', 'CTD-3065B20.2', 'RP11-304L19.11', 'AC026471.6', 'AC091132.1', 'RP11-138C9.1', 'RP11-75C10.9', 'RP11-835E18.5', 'RP11-760N9.1', 'RP11-17J14.2', 'CTD-3193O13.8', 'AC004019.13', 'RP11-465N4.4', 'RP11-434D9.1', 'RP11-325L7.1', 'RP11-134K13.4', 'RP5-855F16.1', 'RP3-327A19.5', 'RP11-546K22.3', 'RP11-473O4.4', 'RP13-582O9.7', 'RP11-12D24.10', 'RP11-120C12.3', 'RP11-80H5.7', 'RP11-496I9.1', 'AP000442.4', 'RP11-867G23.3', 'RP11-113K21.4', 'RP11-745O10.2', 'RP11-335O4.3', 'RP11-408E5.4', 'AE000662.93', 'AL132989.1', 'RP11-973N13.4', 'RP11-982M15.2', 'RP11-32B5.7', 'RP1-1J6.2', 'RP3-337O18.9', 'AC011558.5', 'CTA-373H7.7', 'RP11-415J8.5', 'AC092687.5', 'RP11-532F6.4', 'RP11-146I2.1', 'RP11-624M8.1', 'RP11-219B4.7', 'RP11-9M16.2', 'RP11-247A12.8', 'RP11-536K7.5', 'RP11-186N15.3', 'RP11-152H18.3', 'CTD-3012A18.1', 'CTD-2562J17.2', 'RP11-136I14.5', 'RP11-110I1.14', 'RP11-2H8.2', 'RP11-307N16.6', 'RP11-3D4.2', 'RP11-231C14.4', 'CTB-134F13.1', 'RP11-403P17.5', 'RP11-214C8.2', 'CTB-31O20.9', 'AC092295.4']!
→ to lookup values, use lookup().var_index
→ to save, run add_new_from_var_index
! 12 non-validated values are not saved in Feature.name: ['rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl']!
→ to lookup values, use lookup().var_index
→ to save, run add_new_from_var_index
! 111 non-validated values are not saved in Feature.name: ['eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', 'BRD4g1', 'BRD4g2', 'BRD4g3', 'BRD4g4', 'CAV1g1', 'CAV1g2', 'CAV1g3', 'CAV1g4', 'CD86g1', 'CD86g2', 'CD86g3', 'CD86g4', 'ETV7g1', 'ETV7g2', 'ETV7g3', 'ETV7g4', 'IFNGR1g1', 'IFNGR1g2', 'IFNGR1g3', 'IFNGR1g4', 'IFNGR2g1', 'IFNGR2g2', 'IFNGR2g3', 'IFNGR2g4', 'IRF1g1', 'IRF1g2', 'IRF1g3', 'IRF1g4', 'IRF7g1', 'IRF7g2', 'IRF7g3', 'IRF7g4', 'JAK2g1', 'JAK2g2', 'JAK2g3', 'JAK2g4', 'MARCH8g1', 'MARCH8g2', 'MARCH8g3', 'MARCH8g4', 'MYCg1', 'MYCg2', 'MYCg3', 'MYCg4', 'NFKBIAg1', 'NFKBIAg2', 'NFKBIAg3', 'NFKBIAg4', 'PDCD1LG2g1', 'PDCD1LG2g2', 'PDCD1LG2g3', 'PDCD1LG2g4', 'POU2F2g1', 'POU2F2g2', 'POU2F2g3', 'POU2F2g4', 'SMAD4g1', 'SMAD4g2', 'SMAD4g3', 'SMAD4g4', 'SPI1g1', 'SPI1g2', 'SPI1g3', 'SPI1g4', 'STAT1g1', 'STAT1g2', 'STAT1g3', 'STAT1g4', 'STAT2g1', 'STAT2g2', 'STAT2g3', 'STAT2g4', 'STAT3g1', 'STAT3g2', 'STAT3g3', 'STAT3g4', 'STAT5Ag1', 'STAT5Ag2', 'STAT5Ag3', 'STAT5Ag4', 'TNFRSF14g1', 'TNFRSF14g2', 'TNFRSF14g3', 'TNFRSF14g4', 'UBE2L6g1', 'UBE2L6g2', 'UBE2L6g3', 'UBE2L6g4', 'NTg8', 'NTg9', 'NTg10']!
→ to lookup values, use lookup().var_index
→ to save, run add_new_from_var_index
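Keys such as 'hto:technique' combine a modality name and an obs column name, while keys without a prefix such as 'perturbation' refer to columns shared across modalities. The sketch below shows one way to split such keys. The `split_modality_key` helper and the string stand-ins for the registry fields are hypothetical and only illustrate the naming convention; this is not how lamindb parses the keys internally.

```python
from typing import Optional, Tuple


def split_modality_key(key: str) -> Tuple[Optional[str], str]:
    """Split 'modality:column' into its parts; shared columns have no modality."""
    modality, sep, column = key.partition(":")
    if not sep:
        # No ':' in the key, so the column lives in the shared obs table.
        return None, key
    return modality, column


# Plain strings stand in for the registry fields used in the curator call.
categoricals = {
    "perturbation": "ULabel.name",
    "replicate": "ULabel.name",
    "hto:technique": "ExperimentalFactor.name",
}

# Group the categoricals by the table they belong to.
by_modality = {}
for key, field in categoricals.items():
    modality, column = split_modality_key(key)
    by_modality.setdefault(modality or "shared", {})[column] = field

print(by_modality)
```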
# add new gene symbols from the ['rna'].var.index
curate.add_new_from_var_index("rna")
# add new features from the hto and gdo var.index
curate.add_new_from_var_index("hto")
curate.add_new_from_var_index("gdo")
# optional: register additional columns we'd like to curate
curate.add_new_from_columns(modality="rna")
curate.add_new_from_columns(modality="adt")
curate.add_new_from_columns(modality="hto")
curate.add_new_from_columns(modality="gdo")
✓ added 84 records with Gene.symbol for var_index: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'RP11-324E6.9', 'RP11-187A9.3', 'RP11-365N19.2', 'RP11-346D14.1', ...
✓ added 12 records with Feature.name for var_index: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
✓ added 111 records with Feature.name for var_index: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
✓ added 3 records with Feature.name for rna obs columns: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
✓ added 2 records with Feature.name for adt obs columns: 'nCount_ADT', 'nFeature_ADT'
✓ added 2 records with Feature.name for hto obs columns: 'nCount_HTO', 'nFeature_HTO'
✓ added 1 record with Feature.name for gdo obs columns: 'nCount_GDO'
curate.validate()
✓ rna_var_index is validated against Gene.symbol
✓ adt_var_index is validated against CellMarker.name
✓ hto_var_index is validated against Feature.name
✓ gdo_var_index is validated against Feature.name
• mapping perturbation on ULabel.name
! 2 terms are not validated: 'Perturbed', 'NT'
→ fix typos, remove non-existent values, or save terms via .add_new_from('perturbation')
• mapping replicate on ULabel.name
! 3 terms are not validated: 'rep3', 'rep1', 'rep2'
→ fix typos, remove non-existent values, or save terms via .add_new_from('replicate')
• mapping technique on ExperimentalFactor.name
! found 1 validated terms: ['cell hashing']
→ save terms via .add_validated_from('technique')
False
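Conceptually, validation checks each distinct value of a column against the records of a registry and fails as long as any value is unknown. The following is a simplified, pure-Python sketch of that idea. The `partition_by_registry` helper and the in-memory `registry` set are hypothetical stand-ins, not the lookups lamindb actually performs.

```python
def partition_by_registry(values, registry):
    """Return (validated, non_validated) distinct values, keeping first-seen order."""
    unique_values = list(dict.fromkeys(values))
    validated = [v for v in unique_values if v in registry]
    non_validated = [v for v in unique_values if v not in registry]
    return validated, non_validated


# An empty set stands in for a registry without any matching records.
registry = set()
replicate_column = ["rep3", "rep1", "rep2", "rep1"]

# First pass: nothing is registered yet, so validation fails.
validated, non_validated = partition_by_registry(replicate_column, registry)
first_pass_ok = not non_validated

# Registering the new terms makes the second pass succeed.
registry.update(non_validated)
validated_after, non_validated_after = partition_by_registry(replicate_column, registry)
second_pass_ok = not non_validated_after

print(first_pass_ok, second_pass_ok)
```

This mirrors the workflow in this guide: validate, register what is missing, then validate again.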
# add validated and new categories
curate.add_new_from("perturbation")
curate.add_new_from("replicate")
curate.add_validated_from("technique", modality="hto")
✓ added 2 records with ULabel.name for perturbation: 'Perturbed', 'NT'
✓ added 3 records with ULabel.name for replicate: 'rep3', 'rep1', 'rep2'
curate.validate()
✓ rna_var_index is validated against Gene.symbol
✓ adt_var_index is validated against CellMarker.name
✓ hto_var_index is validated against Feature.name
✓ gdo_var_index is validated against Feature.name
✓ perturbation is validated against ULabel.name
✓ replicate is validated against ULabel.name
✓ technique is validated against ExperimentalFactor.name
True
Register curated artifact
artifact = curate.save_artifact(description="Sub-sampled MuData from Papalexi21")
! no run & transform got linked, call `ln.context.track()` & re-run
• path content will be copied to default storage upon `save()` with key `None` ('.lamindb/7BH9uDp5dLLFp4qM0000.h5mu')
✓ storing artifact '7BH9uDp5dLLFp4qM0000' at '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/7BH9uDp5dLLFp4qM0000.h5mu'
! run input wasn't tracked, call `ln.context.track()` and re-run
✓ loaded 2 Feature records matching name: 'perturbation', 'replicate'
! did not create Feature records for 37 non-validated names: 'adt:G2M.Score', 'adt:HTO_classification', 'adt:MULTI_ID', 'adt:NT', 'adt:Phase', 'adt:S.Score', 'adt:gene_target', 'adt:guide_ID', 'adt:orig.ident', 'adt:percent.mito', 'adt:perturbation', 'adt:replicate', 'gdo:G2M.Score', 'gdo:HTO_classification', 'gdo:MULTI_ID', 'gdo:NT', 'gdo:Phase', 'gdo:S.Score', 'gdo:gene_target', 'gdo:guide_ID', ...
• parsing feature names of X stored in slot 'var'
✓ 161 terms (93.10%) are validated for symbol
! 12 terms (6.90%) are not validated for symbol: CTC-467M3.1, HIST1H4K, CASC1, LARGE, NBPF16, C1orf65, IBA57-AS1, KIAA1239, TMEM75, AP003419.16, FAM65C, C14orf177
✓ linked: FeatureSet(uid='c7PAVfWoJOjeKVl0l5gm', n=172, dtype='float', registry='bionty.Gene', hash='DS8Cu_8rlAz4Ai344wA-1Q', created_by_id=1)
• parsing feature names of slot 'obs'
✓ 3 terms (100.00%) are validated for name
✓ linked: FeatureSet(uid='nnee1Lz982Ez7fV2az7I', n=3, registry='Feature', hash='CXdTp-zlYC_2Vwgt_HJ_pw', created_by_id=1)
• parsing feature names of X stored in slot 'var'
✓ 4 terms (100.00%) are validated for name
✓ linked: FeatureSet(uid='pLfniSJwmza5AI2KUTZL', n=4, dtype='float', registry='bionty.CellMarker', hash='yvqdsoxk1-1g8Fg7fpscNQ', created_by_id=1)
• parsing feature names of slot 'obs'
✓ 2 terms (100.00%) are validated for name
✓ linked: FeatureSet(uid='xLxiUyerKhb2baGWxEkM', n=2, registry='Feature', hash='5lC0lWyDNATooDvPAhBNHA', created_by_id=1)
• parsing feature names of X stored in slot 'var'
✓ 12 terms (100.00%) are validated for name
✓ linked: FeatureSet(uid='L7ZcieJwWxf1tKlhzD5C', n=12, dtype='float', registry='Feature', hash='13oATJjHoMssYFuN2_AaMw', created_by_id=1)
• parsing feature names of slot 'obs'
✓ 3 terms (100.00%) are validated for name
✓ linked: FeatureSet(uid='xh3BRwP1XI5wnwJPNsOk', n=3, registry='Feature', hash='uM-hBEkH5Pm0lIP5pzZ6ew', created_by_id=1)
• parsing feature names of X stored in slot 'var'
✓ 111 terms (100.00%) are validated for name
✓ linked: FeatureSet(uid='EB3BSBd7jeVRCpB7qubz', n=111, dtype='float', registry='Feature', hash='zBacHjYALJxaM4vjILiMrA', created_by_id=1)
• parsing feature names of slot 'obs'
✓ 1 term (100.00%) is validated for name
✓ linked: FeatureSet(uid='G1pyc1j8sO1qViMLnAGp', n=1, registry='Feature', hash='sZykLcBbPVm3tmmOoxX1iw', created_by_id=1)
✓ saved 9 feature sets for slots: 'obs','['rna'].var','['rna'].obs','['adt'].var','['adt'].obs','['hto'].var','['hto'].obs','['gdo'].var','['gdo'].obs'
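The nine slots in the log correspond to one shared 'obs' slot plus a 'var' and an 'obs' slot for each of the four modalities. The sketch below rebuilds the slot names from the modality names. The naming pattern is inferred from the log output above and is an assumption, not a documented guarantee.

```python
# Modalities of the MuData object, in the order they appear in the log.
modalities = ["rna", "adt", "hto", "gdo"]

# One shared obs slot, then a var and an obs slot per modality.
slots = ["obs"] + [
    f"['{modality}'].{axis}" for modality in modalities for axis in ("var", "obs")
]

for slot in slots:
    print(slot)
print(len(slots))
```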
artifact.describe()
Artifact(uid='7BH9uDp5dLLFp4qM0000', is_latest=True, description='Sub-sampled MuData from Papalexi21', suffix='.h5mu', type='dataset', size=549984, hash='7yzkqhErAAdXCSJq0o6JxA', n_observations=200, _hash_type='md5', _accessor='MuData', visibility=1, _key_is_virtual=True, updated_at='2024-09-24 13:49:40 UTC')
Provenance
.storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal'
.created_by = 'testuser1'
Labels
.experimental_factors = 'cell hashing'
.ulabels = 'Perturbed', 'NT', 'rep3', 'rep1', 'rep2'
Features
'perturbation' = 'Perturbed', 'NT'
'replicate' = 'rep3', 'rep1', 'rep2'
'technique' = 'cell hashing'
Feature sets
'obs' = 'perturbation', 'replicate'
'['rna'].var' = 'SH2D6', 'ARHGAP26-AS1', 'GABRA1', 'HLA-DQB1-AS1', 'SPACA1', 'VNN1', 'CTAGE15', 'PFKFB1', 'TRPC5', 'RBPMS-AS1', 'CA8', 'CSMD3', 'ZNF483'
'['rna'].obs' = 'nFeature_RNA', 'percent.mito', 'nCount_RNA'
'['adt'].var' = 'CD86', 'PDL1', 'PDL2', 'CD366'
'['adt'].obs' = 'nCount_ADT', 'nFeature_ADT'
'['hto'].var' = 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
'['hto'].obs' = 'technique', 'nCount_HTO', 'nFeature_HTO'
'['gdo'].var' = 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4'
'['gdo'].obs' = 'nCount_GDO'
# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal
• deleting instance testuser1/test-multimodal