{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# *Dandelion* class\n",
    "\n",
    "![dandelion_logo](img/dandelion_logo_illustration.png)\n",
    "\n",
    "Much of the functionality of the `dandelion` package revolves around the `Dandelion` class object. The class acts as an intermediary object for storage and for flexible interaction with other tools. This notebook is a quick primer on the `Dandelion` class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***Import modules***"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dandelion==0.1.0 pandas==1.1.4 numpy==1.19.4 matplotlib==3.3.3 networkx==2.5 scipy==1.5.3 skbio==0.5.6\n"
     ]
    }
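   ],
   "source": [
    "import os\n",
    "# Move into the tutorial folder; change this path to wherever you saved the tutorial data.\n",
    "os.chdir(os.path.expanduser('/Users/kt16/Downloads/dandelion_tutorial/'))\n",
    "import dandelion as ddl\n",
    "# Print the versions of dandelion and its main dependencies.\n",
    "ddl.logging.print_versions()"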
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dandelion class object with n_obs = 838 and n_contigs = 1700\n",
       "    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'\n",
       "    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'\n",
       "    distance: 'heavy', 'light_0', 'light_1', 'light_2'\n",
       "    edges: 'source', 'target', 'weight'\n",
       "    layout: layout for 838 vertices, layout for 24 vertices\n",
       "    graph: networkx graph of 838 vertices, networkx graph of 24 vertices "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vdj = ddl.read_h5('dandelion_results.h5')\n",
    "vdj"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The structure of the object is summarized in the following illustration:\n",
    "\n",
    "![dandelion_class <](img/dandelion_class.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `.data` slot holds the AIRR contig table (one row per contig), while the `.metadata` slot holds a collapsed, per-cell version that can be combined with `AnnData`'s `.obs` slot. You can retrieve these slots like ordinary attributes; for example, to get the metadata:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>clone_id</th>\n",
       "      <th>clone_id_by_size</th>\n",
       "      <th>sample_id</th>\n",
       "      <th>locus_heavy</th>\n",
       "      <th>locus_light</th>\n",
       "      <th>productive_heavy</th>\n",
       "      <th>productive_light</th>\n",
       "      <th>v_call_genotyped_heavy</th>\n",
       "      <th>v_call_genotyped_light</th>\n",
       "      <th>j_call_heavy</th>\n",
       "      <th>...</th>\n",
       "      <th>junction_aa_light</th>\n",
       "      <th>status</th>\n",
       "      <th>productive</th>\n",
       "      <th>isotype</th>\n",
       "      <th>vdj_status_detail</th>\n",
       "      <th>vdj_status</th>\n",
       "      <th>changeo_clone_id</th>\n",
       "      <th>d_call_heavy</th>\n",
       "      <th>d_call_light</th>\n",
       "      <th>clone_id_heavy_only</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC</th>\n",
       "      <td>102_3_1</td>\n",
       "      <td>563</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV1-69</td>\n",
       "      <td>IGKV1-8</td>\n",
       "      <td>IGHJ3</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQYYSYPRTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>110_33</td>\n",
       "      <td>IGHD3-22</td>\n",
       "      <td></td>\n",
       "      <td>102_3_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG</th>\n",
       "      <td>141_4_1</td>\n",
       "      <td>658</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV1-2</td>\n",
       "      <td>IGLV5-45</td>\n",
       "      <td>IGHJ3</td>\n",
       "      <td>...</td>\n",
       "      <td>CMIWHSSAWVV</td>\n",
       "      <td>IGH + IGL</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>467_34</td>\n",
       "      <td>IGHD3-16|IGHD4-17</td>\n",
       "      <td></td>\n",
       "      <td>141_4_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC</th>\n",
       "      <td>26_2_2</td>\n",
       "      <td>670</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV5-51</td>\n",
       "      <td>IGKV1D-8</td>\n",
       "      <td>IGHJ3</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQYYSFPYTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>306_35</td>\n",
       "      <td>IGHD1/OR15-1a|IGHD1/OR15-1b|IGHD1-26</td>\n",
       "      <td></td>\n",
       "      <td>26_2_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT</th>\n",
       "      <td>66_8_3</td>\n",
       "      <td>527</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV3-15</td>\n",
       "      <td>IGLV6-57</td>\n",
       "      <td>IGHJ4</td>\n",
       "      <td>...</td>\n",
       "      <td>CQSYDSSNVVF</td>\n",
       "      <td>IGH + IGL</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Multi_light_j</td>\n",
       "      <td>Single</td>\n",
       "      <td>56_36</td>\n",
       "      <td>IGHD1-26</td>\n",
       "      <td></td>\n",
       "      <td>66_8_3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT</th>\n",
       "      <td>18_4_1</td>\n",
       "      <td>244</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV3-33</td>\n",
       "      <td>IGLV2-14</td>\n",
       "      <td>IGHJ6</td>\n",
       "      <td>...</td>\n",
       "      <td>CSSYTSSSTRVF</td>\n",
       "      <td>IGH + IGL</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>125_37</td>\n",
       "      <td>IGHD3-10</td>\n",
       "      <td></td>\n",
       "      <td>18_4_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG</th>\n",
       "      <td>15_8_1</td>\n",
       "      <td>653</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV4-59</td>\n",
       "      <td>IGKV1-12</td>\n",
       "      <td>IGHJ4</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQANSFPLTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>348_483</td>\n",
       "      <td>IGHD6-19</td>\n",
       "      <td></td>\n",
       "      <td>15_8_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT</th>\n",
       "      <td>69_8_1</td>\n",
       "      <td>189</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV3-21</td>\n",
       "      <td>IGKV3-20</td>\n",
       "      <td>IGHJ6</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQYGSSPLFTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>731_484</td>\n",
       "      <td>IGHD3-3</td>\n",
       "      <td></td>\n",
       "      <td>69_8_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG</th>\n",
       "      <td>90_7_2</td>\n",
       "      <td>713</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV3-48</td>\n",
       "      <td>IGLV2-14</td>\n",
       "      <td>IGHJ4</td>\n",
       "      <td>...</td>\n",
       "      <td>CSSYTSSSTRVF</td>\n",
       "      <td>IGH + IGL</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>229_485</td>\n",
       "      <td>IGHD3-3</td>\n",
       "      <td></td>\n",
       "      <td>90_7_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT</th>\n",
       "      <td>172_4_2</td>\n",
       "      <td>372</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV4-34</td>\n",
       "      <td>IGKV1D-39|IGKV1-39</td>\n",
       "      <td>IGHJ3</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQSYSTPRTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Multi_light_v</td>\n",
       "      <td>Single</td>\n",
       "      <td>702_486</td>\n",
       "      <td>IGHD3-22</td>\n",
       "      <td></td>\n",
       "      <td>172_4_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC</th>\n",
       "      <td>48_4_1_1|48_4_1_2</td>\n",
       "      <td>699|28</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL|IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T|F</td>\n",
       "      <td>IGHV4-4</td>\n",
       "      <td>IGLV1-51|IGLV1-40</td>\n",
       "      <td>IGHJ5</td>\n",
       "      <td>...</td>\n",
       "      <td>CQSYDRSLGGHYVF|CGTWDSSLSAGCA</td>\n",
       "      <td>IGH + IGL|IGL</td>\n",
       "      <td>T + T|F</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Multi_light_j|Multi_light_v</td>\n",
       "      <td>Multi</td>\n",
       "      <td>155_487</td>\n",
       "      <td>IGHD4-17|IGHD4-23</td>\n",
       "      <td></td>\n",
       "      <td>48_4_1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>838 rows × 28 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               clone_id clone_id_by_size  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC            102_3_1              563   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG            141_4_1              658   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC             26_2_2              670   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT             66_8_3              527   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT             18_4_1              244   \n",
       "...                                                 ...              ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                 15_8_1              653   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                 69_8_1              189   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                 90_7_2              713   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                172_4_2              372   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      48_4_1_1|48_4_1_2           699|28   \n",
       "\n",
       "                                                sample_id locus_heavy  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "...                                                   ...         ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG          vdj_v1_hs_pbmc3         IGH   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT          vdj_v1_hs_pbmc3         IGH   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG          vdj_v1_hs_pbmc3         IGH   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT          vdj_v1_hs_pbmc3         IGH   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC          vdj_v1_hs_pbmc3         IGH   \n",
       "\n",
       "                                     locus_light productive_heavy  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC         IGK                T   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG         IGL                T   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC         IGK                T   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT         IGL                T   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT         IGL                T   \n",
       "...                                          ...              ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG             IGK                T   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT             IGK                T   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG             IGL                T   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT             IGK                T   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC         IGL|IGL                T   \n",
       "\n",
       "                                     productive_light v_call_genotyped_heavy  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                T               IGHV1-69   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                T                IGHV1-2   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC                T               IGHV5-51   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                T               IGHV3-15   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                T               IGHV3-33   \n",
       "...                                               ...                    ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                    T               IGHV4-59   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                    T               IGHV3-21   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                    T               IGHV3-48   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                    T               IGHV4-34   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC                  T|F                IGHV4-4   \n",
       "\n",
       "                                     v_call_genotyped_light j_call_heavy  ...  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                IGKV1-8        IGHJ3  ...   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG               IGLV5-45        IGHJ3  ...   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC               IGKV1D-8        IGHJ3  ...   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT               IGLV6-57        IGHJ4  ...   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT               IGLV2-14        IGHJ6  ...   \n",
       "...                                                     ...          ...  ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                   IGKV1-12        IGHJ4  ...   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                   IGKV3-20        IGHJ6  ...   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                   IGLV2-14        IGHJ4  ...   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT         IGKV1D-39|IGKV1-39        IGHJ3  ...   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC          IGLV1-51|IGLV1-40        IGHJ5  ...   \n",
       "\n",
       "                                                 junction_aa_light  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                   CQQYYSYPRTF   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                   CMIWHSSAWVV   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC                   CQQYYSFPYTF   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                   CQSYDSSNVVF   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                  CSSYTSSSTRVF   \n",
       "...                                                            ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                       CQQANSFPLTF   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                      CQQYGSSPLFTF   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                      CSSYTSSSTRVF   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                       CQQSYSTPRTF   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      CQSYDRSLGGHYVF|CGTWDSSLSAGCA   \n",
       "\n",
       "                                             status productive  isotype  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC      IGH + IGK      T + T      IgM   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG      IGH + IGL      T + T      IgM   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC      IGH + IGK      T + T      IgM   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT      IGH + IGL      T + T      IgM   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT      IGH + IGL      T + T      IgM   \n",
       "...                                             ...        ...      ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG          IGH + IGK      T + T      IgM   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT          IGH + IGK      T + T      IgM   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG          IGH + IGL      T + T      IgM   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT          IGH + IGK      T + T      IgM   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      IGH + IGL|IGL    T + T|F      IgM   \n",
       "\n",
       "                                                         vdj_status_detail  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                       Single + Single   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                       Single + Single   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC                       Single + Single   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                Single + Multi_light_j   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                       Single + Single   \n",
       "...                                                                    ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                           Single + Single   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                           Single + Single   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                           Single + Single   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                    Single + Multi_light_v   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      Single + Multi_light_j|Multi_light_v   \n",
       "\n",
       "                                      vdj_status  changeo_clone_id  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC      Single            110_33   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG      Single            467_34   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC      Single            306_35   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT      Single             56_36   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT      Single            125_37   \n",
       "...                                          ...               ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG          Single           348_483   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT          Single           731_484   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG          Single           229_485   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT          Single           702_486   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC           Multi           155_487   \n",
       "\n",
       "                                                              d_call_heavy  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                              IGHD3-22   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                     IGHD3-16|IGHD4-17   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC  IGHD1/OR15-1a|IGHD1/OR15-1b|IGHD1-26   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                              IGHD1-26   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                              IGHD3-10   \n",
       "...                                                                    ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                                  IGHD6-19   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                                   IGHD3-3   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                                   IGHD3-3   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                                  IGHD3-22   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC                         IGHD4-17|IGHD4-23   \n",
       "\n",
       "                                     d_call_light clone_id_heavy_only  \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                          102_3_1  \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                          141_4_1  \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC                           26_2_2  \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                           66_8_3  \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                           18_4_1  \n",
       "...                                           ...                 ...  \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                               15_8_1  \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                               69_8_1  \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                               90_7_2  \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                              172_4_2  \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC                               48_4_1  \n",
       "\n",
       "[838 rows x 28 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vdj.metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### copy\n",
    "\n",
    "You can deep-copy the `Dandelion` object to another variable; the copy carries all of the original's slots:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>clone_id</th>\n",
       "      <th>clone_id_by_size</th>\n",
       "      <th>sample_id</th>\n",
       "      <th>locus_heavy</th>\n",
       "      <th>locus_light</th>\n",
       "      <th>productive_heavy</th>\n",
       "      <th>productive_light</th>\n",
       "      <th>v_call_genotyped_heavy</th>\n",
       "      <th>v_call_genotyped_light</th>\n",
       "      <th>j_call_heavy</th>\n",
       "      <th>...</th>\n",
       "      <th>junction_aa_light</th>\n",
       "      <th>status</th>\n",
       "      <th>productive</th>\n",
       "      <th>isotype</th>\n",
       "      <th>vdj_status_detail</th>\n",
       "      <th>vdj_status</th>\n",
       "      <th>changeo_clone_id</th>\n",
       "      <th>d_call_heavy</th>\n",
       "      <th>d_call_light</th>\n",
       "      <th>clone_id_heavy_only</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC</th>\n",
       "      <td>102_3_1</td>\n",
       "      <td>563</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV1-69</td>\n",
       "      <td>IGKV1-8</td>\n",
       "      <td>IGHJ3</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQYYSYPRTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>110_33</td>\n",
       "      <td>IGHD3-22</td>\n",
       "      <td></td>\n",
       "      <td>102_3_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG</th>\n",
       "      <td>141_4_1</td>\n",
       "      <td>658</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV1-2</td>\n",
       "      <td>IGLV5-45</td>\n",
       "      <td>IGHJ3</td>\n",
       "      <td>...</td>\n",
       "      <td>CMIWHSSAWVV</td>\n",
       "      <td>IGH + IGL</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>467_34</td>\n",
       "      <td>IGHD3-16|IGHD4-17</td>\n",
       "      <td></td>\n",
       "      <td>141_4_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC</th>\n",
       "      <td>26_2_2</td>\n",
       "      <td>670</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV5-51</td>\n",
       "      <td>IGKV1D-8</td>\n",
       "      <td>IGHJ3</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQYYSFPYTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>306_35</td>\n",
       "      <td>IGHD1/OR15-1a|IGHD1/OR15-1b|IGHD1-26</td>\n",
       "      <td></td>\n",
       "      <td>26_2_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT</th>\n",
       "      <td>66_8_3</td>\n",
       "      <td>527</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV3-15</td>\n",
       "      <td>IGLV6-57</td>\n",
       "      <td>IGHJ4</td>\n",
       "      <td>...</td>\n",
       "      <td>CQSYDSSNVVF</td>\n",
       "      <td>IGH + IGL</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Multi_light_j</td>\n",
       "      <td>Single</td>\n",
       "      <td>56_36</td>\n",
       "      <td>IGHD1-26</td>\n",
       "      <td></td>\n",
       "      <td>66_8_3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT</th>\n",
       "      <td>18_4_1</td>\n",
       "      <td>244</td>\n",
       "      <td>sc5p_v2_hs_PBMC_10k</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV3-33</td>\n",
       "      <td>IGLV2-14</td>\n",
       "      <td>IGHJ6</td>\n",
       "      <td>...</td>\n",
       "      <td>CSSYTSSSTRVF</td>\n",
       "      <td>IGH + IGL</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>125_37</td>\n",
       "      <td>IGHD3-10</td>\n",
       "      <td></td>\n",
       "      <td>18_4_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG</th>\n",
       "      <td>15_8_1</td>\n",
       "      <td>653</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV4-59</td>\n",
       "      <td>IGKV1-12</td>\n",
       "      <td>IGHJ4</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQANSFPLTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>348_483</td>\n",
       "      <td>IGHD6-19</td>\n",
       "      <td></td>\n",
       "      <td>15_8_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT</th>\n",
       "      <td>69_8_1</td>\n",
       "      <td>189</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV3-21</td>\n",
       "      <td>IGKV3-20</td>\n",
       "      <td>IGHJ6</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQYGSSPLFTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>731_484</td>\n",
       "      <td>IGHD3-3</td>\n",
       "      <td></td>\n",
       "      <td>69_8_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG</th>\n",
       "      <td>90_7_2</td>\n",
       "      <td>713</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV3-48</td>\n",
       "      <td>IGLV2-14</td>\n",
       "      <td>IGHJ4</td>\n",
       "      <td>...</td>\n",
       "      <td>CSSYTSSSTRVF</td>\n",
       "      <td>IGH + IGL</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Single</td>\n",
       "      <td>Single</td>\n",
       "      <td>229_485</td>\n",
       "      <td>IGHD3-3</td>\n",
       "      <td></td>\n",
       "      <td>90_7_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT</th>\n",
       "      <td>172_4_2</td>\n",
       "      <td>372</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGK</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>IGHV4-34</td>\n",
       "      <td>IGKV1D-39|IGKV1-39</td>\n",
       "      <td>IGHJ3</td>\n",
       "      <td>...</td>\n",
       "      <td>CQQSYSTPRTF</td>\n",
       "      <td>IGH + IGK</td>\n",
       "      <td>T + T</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Multi_light_v</td>\n",
       "      <td>Single</td>\n",
       "      <td>702_486</td>\n",
       "      <td>IGHD3-22</td>\n",
       "      <td></td>\n",
       "      <td>172_4_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC</th>\n",
       "      <td>48_4_1_1|48_4_1_2</td>\n",
       "      <td>699|28</td>\n",
       "      <td>vdj_v1_hs_pbmc3</td>\n",
       "      <td>IGH</td>\n",
       "      <td>IGL|IGL</td>\n",
       "      <td>T</td>\n",
       "      <td>T|F</td>\n",
       "      <td>IGHV4-4</td>\n",
       "      <td>IGLV1-51|IGLV1-40</td>\n",
       "      <td>IGHJ5</td>\n",
       "      <td>...</td>\n",
       "      <td>CQSYDRSLGGHYVF|CGTWDSSLSAGCA</td>\n",
       "      <td>IGH + IGL|IGL</td>\n",
       "      <td>T + T|F</td>\n",
       "      <td>IgM</td>\n",
       "      <td>Single + Multi_light_j|Multi_light_v</td>\n",
       "      <td>Multi</td>\n",
       "      <td>155_487</td>\n",
       "      <td>IGHD4-17|IGHD4-23</td>\n",
       "      <td></td>\n",
       "      <td>48_4_1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>838 rows × 28 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               clone_id clone_id_by_size  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC            102_3_1              563   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG            141_4_1              658   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC             26_2_2              670   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT             66_8_3              527   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT             18_4_1              244   \n",
       "...                                                 ...              ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                 15_8_1              653   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                 69_8_1              189   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                 90_7_2              713   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                172_4_2              372   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      48_4_1_1|48_4_1_2           699|28   \n",
       "\n",
       "                                                sample_id locus_heavy  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT  sc5p_v2_hs_PBMC_10k         IGH   \n",
       "...                                                   ...         ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG          vdj_v1_hs_pbmc3         IGH   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT          vdj_v1_hs_pbmc3         IGH   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG          vdj_v1_hs_pbmc3         IGH   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT          vdj_v1_hs_pbmc3         IGH   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC          vdj_v1_hs_pbmc3         IGH   \n",
       "\n",
       "                                     locus_light productive_heavy  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC         IGK                T   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG         IGL                T   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC         IGK                T   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT         IGL                T   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT         IGL                T   \n",
       "...                                          ...              ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG             IGK                T   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT             IGK                T   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG             IGL                T   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT             IGK                T   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC         IGL|IGL                T   \n",
       "\n",
       "                                     productive_light v_call_genotyped_heavy  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                T               IGHV1-69   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                T                IGHV1-2   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC                T               IGHV5-51   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                T               IGHV3-15   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                T               IGHV3-33   \n",
       "...                                               ...                    ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                    T               IGHV4-59   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                    T               IGHV3-21   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                    T               IGHV3-48   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                    T               IGHV4-34   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC                  T|F                IGHV4-4   \n",
       "\n",
       "                                     v_call_genotyped_light j_call_heavy  ...  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                IGKV1-8        IGHJ3  ...   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG               IGLV5-45        IGHJ3  ...   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC               IGKV1D-8        IGHJ3  ...   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT               IGLV6-57        IGHJ4  ...   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT               IGLV2-14        IGHJ6  ...   \n",
       "...                                                     ...          ...  ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                   IGKV1-12        IGHJ4  ...   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                   IGKV3-20        IGHJ6  ...   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                   IGLV2-14        IGHJ4  ...   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT         IGKV1D-39|IGKV1-39        IGHJ3  ...   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC          IGLV1-51|IGLV1-40        IGHJ5  ...   \n",
       "\n",
       "                                                 junction_aa_light  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                   CQQYYSYPRTF   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                   CMIWHSSAWVV   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC                   CQQYYSFPYTF   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                   CQSYDSSNVVF   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                  CSSYTSSSTRVF   \n",
       "...                                                            ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                       CQQANSFPLTF   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                      CQQYGSSPLFTF   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                      CSSYTSSSTRVF   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                       CQQSYSTPRTF   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      CQSYDRSLGGHYVF|CGTWDSSLSAGCA   \n",
       "\n",
       "                                             status productive  isotype  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC      IGH + IGK      T + T      IgM   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG      IGH + IGL      T + T      IgM   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC      IGH + IGK      T + T      IgM   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT      IGH + IGL      T + T      IgM   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT      IGH + IGL      T + T      IgM   \n",
       "...                                             ...        ...      ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG          IGH + IGK      T + T      IgM   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT          IGH + IGK      T + T      IgM   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG          IGH + IGL      T + T      IgM   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT          IGH + IGK      T + T      IgM   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      IGH + IGL|IGL    T + T|F      IgM   \n",
       "\n",
       "                                                         vdj_status_detail  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                       Single + Single   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                       Single + Single   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC                       Single + Single   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                Single + Multi_light_j   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                       Single + Single   \n",
       "...                                                                    ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                           Single + Single   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                           Single + Single   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                           Single + Single   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                    Single + Multi_light_v   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      Single + Multi_light_j|Multi_light_v   \n",
       "\n",
       "                                      vdj_status  changeo_clone_id  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC      Single            110_33   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG      Single            467_34   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC      Single            306_35   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT      Single             56_36   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT      Single            125_37   \n",
       "...                                          ...               ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG          Single           348_483   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT          Single           731_484   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG          Single           229_485   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT          Single           702_486   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC           Multi           155_487   \n",
       "\n",
       "                                                              d_call_heavy  \\\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                              IGHD3-22   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                     IGHD3-16|IGHD4-17   \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC  IGHD1/OR15-1a|IGHD1/OR15-1b|IGHD1-26   \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                              IGHD1-26   \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                              IGHD3-10   \n",
       "...                                                                    ...   \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                                  IGHD6-19   \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                                   IGHD3-3   \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                                   IGHD3-3   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                                  IGHD3-22   \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC                         IGHD4-17|IGHD4-23   \n",
       "\n",
       "                                     d_call_light clone_id_heavy_only  \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC                          102_3_1  \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG                          141_4_1  \n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC                           26_2_2  \n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT                           66_8_3  \n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT                           18_4_1  \n",
       "...                                           ...                 ...  \n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                               15_8_1  \n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                               69_8_1  \n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                               90_7_2  \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                              172_4_2  \n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC                               48_4_1  \n",
       "\n",
       "[838 rows x 28 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vdj2 = vdj.copy()\n",
    "vdj2.metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrieving entries with `update_metadata`\n",
    "\n",
    "The `.metadata` slot in Dandelion class automatically initializes whenever the `.data` slot is filled. However, it only returns a standard number of columns that are pre-specified. To retrieve other columns from the `.data` slot, we can update the metadata with `ddl.update_metadata` and specify the option `retrieve`. \n",
    "\n",
    "The following options determine how the retrieval is completed:\n",
    "\n",
    "`split` - splits the retrieval into heavy and light chains calls.\n",
    "\n",
    "`split_locus` - smiliar to `split` but splits the retrieval to `IGH/IGK/IGL`.\n",
    "\n",
    "`collapse` - Adds a `|` to separate every element.\n",
    "\n",
    "`combine` - similar to `collapse` but only retains unique elements (separated by a `|` if multiple are found)."
   ]
  },
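  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the difference between `collapse` and `combine` concrete, here is a minimal plain-Python sketch of the two behaviours as described above. It is an illustration only and not `dandelion`'s implementation; the helper names `collapse_calls` and `combine_calls` are hypothetical.\n",
    "\n",
    "```python\n",
    "def collapse_calls(calls):\n",
    "    # keep every element, joined by '|'\n",
    "    return '|'.join(calls)\n",
    "\n",
    "\n",
    "def combine_calls(calls):\n",
    "    # keep only unique elements (first-seen order), joined by '|'\n",
    "    return '|'.join(dict.fromkeys(calls))\n",
    "\n",
    "\n",
    "# hypothetical light chain V calls for one cell with three contigs\n",
    "calls = ['IGLV1-51', 'IGLV1-51', 'IGLV1-40']\n",
    "print(collapse_calls(calls))  # IGLV1-51|IGLV1-51|IGLV1-40\n",
    "print(combine_calls(calls))   # IGLV1-51|IGLV1-40\n",
    "```\n",
    "\n",
    "A cell with a single contig gives the same string either way; the two options only differ when a cell carries repeated values."
   ]
  },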
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***Example 1 : retrieving junction amino acid sequences***"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dandelion class object with n_obs = 838 and n_contigs = 1700\n",
       "    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'\n",
       "    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'\n",
       "    distance: 'heavy', 'light_0', 'light_1', 'light_2'\n",
       "    edges: 'source', 'target', 'weight'\n",
       "    layout: layout for 838 vertices, layout for 24 vertices\n",
       "    graph: networkx graph of 838 vertices, networkx graph of 24 vertices "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ddl.update_metadata(vdj, retrieve = 'd_call')\n",
    "vdj"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the additional `d_call` heavy and light columns in the metadata slot."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, `dandelion` will not try to merge numerical columns as it can create mixed dtype columns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***Example 2 : editing clone_id column***\n",
    "\n",
    "Perhaps you want to have a bit more control with how clones are called. We can edit this directly from the `.data` slot and retrieve accordingly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>clone_id</th>\n",
       "      <th>clone_id_heavy_only</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC</th>\n",
       "      <td>102_3_1</td>\n",
       "      <td>102_3_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG</th>\n",
       "      <td>141_4_1</td>\n",
       "      <td>141_4_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC</th>\n",
       "      <td>26_2_2</td>\n",
       "      <td>26_2_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT</th>\n",
       "      <td>66_8_3</td>\n",
       "      <td>66_8_3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT</th>\n",
       "      <td>18_4_1</td>\n",
       "      <td>18_4_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG</th>\n",
       "      <td>15_8_1</td>\n",
       "      <td>15_8_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT</th>\n",
       "      <td>69_8_1</td>\n",
       "      <td>69_8_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG</th>\n",
       "      <td>90_7_2</td>\n",
       "      <td>90_7_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT</th>\n",
       "      <td>172_4_2</td>\n",
       "      <td>172_4_2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC</th>\n",
       "      <td>48_4_1_1|48_4_1_2</td>\n",
       "      <td>48_4_1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>838 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               clone_id clone_id_heavy_only\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC            102_3_1             102_3_1\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG            141_4_1             141_4_1\n",
       "sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC             26_2_2              26_2_2\n",
       "sc5p_v2_hs_PBMC_10k_AAAGATGGTCGAATCT             66_8_3              66_8_3\n",
       "sc5p_v2_hs_PBMC_10k_AACCATGCAAGCTGTT             18_4_1              18_4_1\n",
       "...                                                 ...                 ...\n",
       "vdj_v1_hs_pbmc3_TCTTCGGTCCTAAGTG                 15_8_1              15_8_1\n",
       "vdj_v1_hs_pbmc3_TGCACCTCAGACAAAT                 69_8_1              69_8_1\n",
       "vdj_v1_hs_pbmc3_TGTATTCTCTGTTGAG                 90_7_2              90_7_2\n",
       "vdj_v1_hs_pbmc3_TTTATGCTCAGGATCT                172_4_2             172_4_2\n",
       "vdj_v1_hs_pbmc3_TTTATGCTCCTAGAAC      48_4_1_1|48_4_1_2              48_4_1\n",
       "\n",
       "[838 rows x 2 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if we only want to keep the light chain clone assignment \n",
    "clones = []\n",
    "for clone in vdj.data['clone_id']:\n",
    "    if '|' in clone: # this is because clones were merged into the the same column if they have different pairing of BCR combinations\n",
    "        clone_list = clone.split('|')\n",
    "        clones.append('|'.join(list(set([clone_2.rsplit('_', 1)[0] if clone_2.count('_') == 3 else clone_2 for clone_2 in clone_list]))))\n",
    "    else:\n",
    "        if clone.count('_') == 3: # this means it's looking for X_X_X_X, 3 underscores\n",
    "            clones.append(clone.rsplit('_', 1)[0]) # split the 3rd underscore but only keep the first entry\n",
    "        else:\n",
    "            clones.append(clone)\n",
    "vdj.data['clone_id_heavy_only'] = clones\n",
    "ddl.update_metadata(vdj, retrieve = 'clone_id_heavy_only', split = False, collapse = True)\n",
    "vdj.metadata[['clone_id', 'clone_id_heavy_only']]"
   ]
  },
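  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above can also be written as a small helper, which makes the rule easier to test on individual ids. This is a sketch in plain Python; the function name `heavy_only` is hypothetical, and reading the fourth field of `X_X_X_X` as the light chain part is an assumption based on the column name `clone_id_heavy_only`.\n",
    "\n",
    "```python\n",
    "def heavy_only(clone_id):\n",
    "    # a cell may carry several ids in one string, separated by '|'\n",
    "    parts = []\n",
    "    for part in clone_id.split('|'):\n",
    "        # ids of the form X_X_X_X (3 underscores) lose their last field\n",
    "        if part.count('_') == 3:\n",
    "            part = part.rsplit('_', 1)[0]\n",
    "        parts.append(part)\n",
    "    # dict.fromkeys drops duplicates while keeping first-seen order\n",
    "    return '|'.join(dict.fromkeys(parts))\n",
    "\n",
    "\n",
    "print(heavy_only('48_4_1_1|48_4_1_2'))  # 48_4_1\n",
    "print(heavy_only('102_3_1'))            # 102_3_1\n",
    "```\n",
    "\n",
    "Unlike `set`, `dict.fromkeys` keeps the order of the ids stable between runs. The result could then be assigned with `vdj.data['clone_id_heavy_only'] = [heavy_only(c) for c in vdj.data['clone_id']]`."
   ]
  },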
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `concat`enating multiple objects\n",
    "\n",
    "This is a simple function to concatenate (append) two or more `Dandelion` class, or `pandas` dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dandelion class object with n_obs = 838 and n_contigs = 1700\n",
       "    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'\n",
       "    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'\n",
       "    distance: 'heavy', 'light_0', 'light_1', 'light_2'\n",
       "    edges: 'source', 'target', 'weight'\n",
       "    layout: layout for 838 vertices, layout for 24 vertices\n",
       "    graph: networkx graph of 838 vertices, networkx graph of 24 vertices "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# for example, the original dandelion class has 838 unique cell barcodes and 1700 contigs\n",
    "vdj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dandelion class object with n_obs = 838 and n_contigs = 5100\n",
       "    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'\n",
       "    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_heavy_2', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'umi_count_light_3', 'umi_count_light_4', 'umi_count_light_5', 'umi_count_light_6', 'umi_count_light_7', 'umi_count_light_8', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status'\n",
       "    distance: None\n",
       "    edges: None\n",
       "    layout: None\n",
       "    graph: None"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# the concatenated object now holds 5100 contigs (3 x 1700), and the metadata should be populated accordingly\n",
    "vdj_concat = ddl.concat([vdj, vdj, vdj])\n",
    "vdj_concat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### read/write\n",
    "\n",
    "A `Dandelion` object can be saved with the `.write_h5` and `.write_pkl` methods, each of which supports compression. `write_h5` relies primarily on the pandas `to_hdf` method, while `write_pkl` uses `pickle`. The `read_h5` and `read_pkl` functions read the respective file formats back in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 1.53 s, sys: 65.7 ms, total: 1.59 s\n",
      "Wall time: 1.64 s\n"
     ]
    }
   ],
   "source": [
    "%time vdj.write_h5('dandelion_results.h5', complib = 'bzip2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 564 ms, sys: 54.6 ms, total: 619 ms\n",
      "Wall time: 631 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Dandelion class object with n_obs = 838 and n_contigs = 1700\n",
       "    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'\n",
       "    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'\n",
       "    distance: 'heavy', 'light_0', 'light_1', 'light_2'\n",
       "    edges: 'source', 'target', 'weight'\n",
       "    layout: layout for 838 vertices, layout for 24 vertices\n",
       "    graph: networkx graph of 838 vertices, networkx graph of 24 vertices "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time vdj_1 = ddl.read_h5('dandelion_results.h5')\n",
    "vdj_1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
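    "Compared with the HDF5 route, reading and writing with `pickle` can be faster or slower, and the resulting files smaller or larger, depending on the data and on which compression is used. In the run recorded in this notebook, writing `dandelion_results.pkl.gz` took longer than writing `dandelion_results.h5`, while reading it back was quicker."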
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 9.14 s, sys: 68 ms, total: 9.21 s\n",
      "Wall time: 9.41 s\n"
     ]
    }
   ],
   "source": [
    "%time vdj.write_pkl('dandelion_results.pkl.gz')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 89.9 ms, sys: 9.16 ms, total: 99.1 ms\n",
      "Wall time: 106 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Dandelion class object with n_obs = 838 and n_contigs = 1700\n",
       "    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', 'c_sequence_end', 'c_score', 'c_identity', 'c_support', 'c_call_10x', 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_freq', 'duplicate_count', 'clone_id', 'changeo_clone_id', 'clone_id_heavy_only'\n",
       "    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_genotyped_heavy', 'v_call_genotyped_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_light_0', 'umi_count_light_1', 'umi_count_light_2', 'junction_aa_heavy', 'junction_aa_light', 'status', 'productive', 'isotype', 'vdj_status_detail', 'vdj_status', 'changeo_clone_id', 'd_call_heavy', 'd_call_light', 'clone_id_heavy_only'\n",
       "    distance: 'heavy', 'light_0', 'light_1', 'light_2'\n",
       "    edges: 'source', 'target', 'weight'\n",
       "    layout: layout for 838 vertices, layout for 24 vertices\n",
       "    graph: networkx graph of 838 vertices, networkx graph of 24 vertices "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time vdj_2 = ddl.read_pkl('dandelion_results.pkl.gz')\n",
    "vdj_2"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (dandelion)",
   "language": "python",
   "name": "dandelion"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
