Samples

The research project complied with all relevant ethical regulations, and the protocols were approved by research ethics committees (details below). All participants were recruited voluntarily; provided written, informed consent; and were not financially compensated. The procedures used to derive fibroblasts and EPCs from two donors (S2 and S7) were previously described (ref. 22). Two F-hiPSCs were obtained from S2. Four and two EPC B-hiPSCs were obtained from S7 and S2, respectively. A total of 452 F-hiPSCs were obtained from 288 donors and 17 erythroblast B-hiPSCs were obtained from 9 donors from the HipSci project. A total of 78 erythroblast B-hiPSCs and 141 subclones were obtained from 78 donors from the Insignia project. The details of HipSci and Insignia iPSC lines are described below.

All lines were incubated at 37 °C and 5% CO2 (ref. 35). Primary fibroblasts were derived from 2-mm punch biopsies collected from organ donors or healthy research volunteers recruited from the NIHR Cambridge BioResource under ethics for hiPSC derivation (ref. 35; Cambridgeshire and East of England Research Ethics Committee REC 09/H306/73, REC 09/H0304/77-V2 04/01/2013, REC 09/H0304/77-V3 15/03/2013). Biopsy fragments were mechanically dissociated and cultured with fibroblast growth medium (knockout DMEM with 20% fetal bovine serum; 10829018, ThermoFisher Scientific) until outgrowths appeared (within 14 days, on average). Approximately 30 days following dissection, when fibroblast cultures had reached confluence, the cells were washed with phosphate-buffered saline (PBS), passaged using trypsin into a 25-cm² tissue-culture flask and then again to a 75-cm² flask upon reaching confluence. These cultures were then split into vials for cryopreservation and those seeded for reprogramming, with one frozen vial later used for DNA extraction for WES or WGS.

For erythroblast derivation, two blood samples were obtained for each donor. All samples were scanned and checked for ethical approval. Peripheral blood mononuclear cell (PBMC) isolation, erythroblast expansion and hiPSC derivation were done by the Cellular Generation and Phenotyping facility at the Wellcome Sanger Institute, Hinxton. Briefly, whole-blood samples collected from consented patients were diluted with PBS, and PBMCs were separated using the standard Ficoll-Paque density gradient centrifugation method. Following the PBMC separation, cells were cultured in expansion media containing StemSpan H3000, stem cell factor, interleukin-3, erythropoietin, IGF-1 and dexamethasone for a total of 9 days (ref. 66).

The EPCs were isolated using Ficoll separation of 100 ml peripheral blood from organ donors (REC 09/H306/73), and the buffy coat was transferred onto a 5 µg/cm² collagen (BD Biosciences, 402326)-coated T-75 flask. The EPCs were grown using EPC media (EGM-2MV supplemented with growth factors, ascorbic acid plus 20% Hyclone serum; CC-3202, Lonza and HYC-001-331G; ThermoFisher Scientific Hyclone respectively) (ref. 67). EPC colonies appeared after 10 days, and these were passaged using trypsin at a 1:3 ratio and eventually frozen down using 90% EPC media and 10% dimethyl sulfoxide.

Fibroblasts and erythroblasts were transduced using nonintegrating Sendai viral vectors expressing human OCT3/4, SOX2, KLF4 and MYC (ref. 51) (CytoTune, Life Technologies, A1377801) according to the manufacturer's instructions and cultured on irradiated mouse embryonic fibroblasts (MEFs; CF1). The EPCs were transduced using four Moloney murine leukemia retroviruses containing the coding sequences of human OCT4, SOX2, KLF4 and C-MYC and also cultured on irradiated MEFs.

Following all reprogramming experiments, the medium was changed to hiPSC culture medium (ref. 35) containing advanced DMEM (Life Technologies), 10% knockout serum replacement (Life Technologies), 2 mM L-glutamine (Life Technologies), 0.007% 2-mercaptoethanol (Sigma-Aldrich), 4 ng ml⁻¹ recombinant zebrafish fibroblast growth factor 2 (CSCR, University of Cambridge) and 1% penicillin/streptomycin (Life Technologies). Cells with an iPSC morphology first appeared approximately 14 to 28 days after transduction, and undifferentiated colonies (six per donor) were picked between days 28 and 40, transferred onto 12-well MEF-CF1 feeder plates and cultured in hiPSC medium with daily medium changes until ready to passage.

Successful reprogramming was confirmed via genotyping array and expression array (ref. 35). Pluripotency quality control (QC) was performed based on the HipSci QC steps, including the PluriTest using expression microarray data from the Illumina HT12v4 platform and copy number variation and loss of heterozygosity (CNV/LOH) detection using the HumanExome BeadChip Kit platform.

Pluripotent hiPSC lines were transferred onto feeder-free culture conditions, using 10 µg ml⁻¹ Vitronectin XF (Stemcell Technologies)-coated plates and Essential 8 (E8) medium (DMEM/F12 (HAM), E8 supplement (50×) and 1% penicillin/streptomycin; Life Technologies) (ref. 35). The medium was changed daily, and cells were passaged every 5–7 days, depending on the confluence and morphology of the cells, at a maximum 1:3 split ratio until established, usually at passage five or six. The passaging method involved washing the confluent plate with PBS and incubating with PBS-EDTA (0.5 mM) for 5–8 min. After removing the PBS-EDTA, cells were resuspended in E8 medium and replated onto Vitronectin-coated plates (ref. 35). Once the hiPSCs were established in culture, lines were selected based on morphological qualities (undifferentiated, roundness and compactness of colonies) and expanded for banking and characterization. DNA from fibroblasts and hiPSCs was extracted using Qiagen Chemistry on a QIAcube automated extraction platform.

A 96-well plate containing 500 ng genomic DNA in 120 µl was cherry-picked and an Agilent Bravo robot used to transfer the gDNA into a Covaris plate with glass wells and adaptive focused acoustics (AFA) fibers. This plate was then loaded into the LE220 for the shearing process. The sheared DNA was then transferred out of this plate and into an Eppendorf TwinTec 96 plate using the Agilent Bravo robot. Samples were then purified ready for library prep. In this step, the Agilent NGS Workstation transferred AMPure-XP beads and the sheared DNA to a Nunc deep-well plate, then collected and washed the bead-bound DNA. The DNA was eluted and transferred along with the AMPure-XP beads to a fresh Eppendorf TwinTec plate. Library construction comprised end repair, A-tailing and adapter ligation reactions, performed by a liquid handling robot.

In this step, the Agilent NGS Workstation transferred PEG/NaCl solution and the adapter-ligated libraries containing AMPure-XP beads to a Nunc deep-well plate and size-selected the bead-bound DNA. The DNA was eluted and transferred to a fresh Eppendorf TwinTec plate. Agilent Bravo and Labtech Mosquito robotics were used to set up a 384-well quantitative polymerase chain reaction (qPCR) plate, which was then assayed on the Roche Lightcycler. A Beckman NX08-02 was used to create an equimolar pool of the indexed adapter-ligated libraries. The final pool was then assayed against a known set of standards on the ABI StepOne Plus. The data from the qPCR assay were used to determine the concentration of the equimolar pool. The pool was normalized using Beckman NX08-02. All paired-end sequencing was performed using a range of Illumina HiSeq platforms as the lines were generated over many years (HiSeq 2000 onwards). The sequencing coverage of WGS, WES and hcWES in hiPSC lines was 41×, 72× and 271×, respectively.

Reads were aligned to the human genome assembly GRCh37d5 using bwa version 0.5.10 (ref. 68) (bwa aln -q 15 and bwa sampe) followed by quality score recalibration and indel realignment using GATK version 1.5-9 (ref. 69) and duplicate marking using biobambam2 version 0.0.147. VerifyBamID version 1.1.3 was used to check for possible contamination of the cell lines, and all but one passed (Supplementary Fig. 4).

Variable sites were called jointly in each fibroblast and hiPSC sample using BCFtools/mpileup and BCFtools/call version 1.4.25. The initial call set was then prefiltered to exclude germline variants that were above 0.1% minor allele frequency in 1000 Genomes phase 3 (ref. 70) or ExAC 0.3.1 (ref. 71). For efficiency, we also excluded low-coverage sites that cannot reach statistical significance and for subsequent analyses considered only sites that had a minimum sequencing depth of 20 or more reads in both the fibroblast and hiPSC and at least 3 reads with a nonreference allele in either the fibroblast or hiPSC sample. At each variable site, a Fisher's exact test was performed on a two-by-two contingency table, with rows representing the number of reference and alternate reads and the columns the fibroblast or hiPSC sample. This approach for mutation calling is implemented through BCFtools/ad-bias, and we have adopted it preferentially instead of existing tumor-normal somatic-variant calling tools because, by definition, tools developed for the analysis of tumor-normal data assume that mutations of interest are absent from the normal tissue. However, in our experiment, many mutations were present, albeit at low frequency, in the source tissue fibroblasts.
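The per-site test described above can be illustrated with a minimal Python sketch. It is not the BCFtools/ad-bias implementation; the function names and the small floating-point tolerance are assumptions made for illustration. It applies the depth and alternate-read prefilter from the text and then computes a two-sided Fisher's exact P value by summing the hypergeometric probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def passes_prefilter(depth_fib, depth_ipsc, alt_fib, alt_ipsc,
                     min_depth=20, min_alt=3):
    """Site filter from the text: depth >= 20 in both samples and
    >= 3 nonreference reads in at least one of them."""
    return (depth_fib >= min_depth and depth_ipsc >= min_depth
            and max(alt_fib, alt_ipsc) >= min_alt)


def fisher_exact_two_sided(ref_fib, alt_fib, ref_ipsc, alt_ipsc):
    """Two-sided Fisher's exact test on a 2 x 2 table of read counts.

    Rows are reference/alternate reads, columns are fibroblast/hiPSC.
    """
    ref_total = ref_fib + ref_ipsc
    alt_total = alt_fib + alt_ipsc
    fib_total = ref_fib + alt_fib
    denom = comb(ref_total + alt_total, fib_total)

    def prob(ref_in_fib):
        # Hypergeometric probability of a table with fixed margins.
        return (comb(ref_total, ref_in_fib)
                * comb(alt_total, fib_total - ref_in_fib) / denom)

    lowest = max(0, fib_total - alt_total)
    highest = min(fib_total, ref_total)
    p_observed = prob(ref_fib)
    tolerance = 1 + 1e-7  # illustrative guard against rounding in ties
    p_value = sum(p for p in map(prob, range(lowest, highest + 1))
                  if p <= p_observed * tolerance)
    return min(1.0, p_value)


# Hypothetical site: 40 reads in each sample, variant seen only in the hiPSC.
if passes_prefilter(40, 40, 0, 20):
    print(fisher_exact_two_sided(40, 0, 20, 20))
```

Exact integer binomial coefficients keep the sketch dependency-free; for tens of millions of sites, a vectorized or compiled implementation would be more practical.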

More information on bcftools ad-bias can be found on the online bcftools-man page at http://samtools.github.io/bcftools/bcftools-man.html. The ad-bias protocol is distributed as a plugin in the main bcftools package, which can be downloaded from http://www.htslib.org/download/. Bcftools ad-bias implements a Fisher test on a 2 × 2 contingency table that contains read counts of reference/alternate alleles found in either the iPSC or fibroblast sample. We ran bcftools ad-bias with default settings as follows:

    bcftools +ad-bias exome.bcf -- -t1 -s sample.pairs.txt -f '%REF\t%ALT\t%CSQ\t%INFO/ExAC\t%INFO/UK1KG'

where exome.bcf was the BCF file created by our variant calling pipeline, described in Methods, and sample.pairs.txt was a file that contained matched pairs of the iPSC and corresponding fibroblast sample, one per line, as follows:

    HPSI0213i-koun_2     HPSI0213pf-koun
    HPSI0213i-nawk_55    HPSI0213pf-nawk
    HPSI0313i-airc_2     HPSI0313pf-airc

We corrected for the total number of tests (84.8M) using the Benjamini–Hochberg procedure at a false discovery rate of 5%, equivalent to a P value threshold of 9.9 × 10⁻⁴, to call a mutation as a significant change in allele frequency between the fibroblast and iPSC samples. Furthermore, we annotated sites from regions of low mappability and sites that overlapped with copy-number alterations previously called from array genotypes (ref. 35) and removed sites that had greater than 0.6 alternate allele frequency in either the fibroblast or hiPSC, as these sites are likely to be enriched for false positives. Dinucleotide mutations were called by sorting mutations occurring in the same iPSC line by genomic position and marking mutations that were immediately adjacent as dinucleotides.
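The multiple-testing correction can be sketched in Python as follows. This is a generic illustration of the Benjamini–Hochberg step-up procedure, not the code used in the study; the function name and the example P values are hypothetical. The returned threshold is the largest P value that is still declared significant, which plays the same role as the P value threshold quoted above.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns (threshold, reject), where threshold is the largest P value
    declared significant (0.0 if none) and reject is a list of booleans
    in the input order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, index in enumerate(order, start=1):
        # Keep the largest rank whose P value is under its own cutoff.
        if pvalues[index] <= rank / m * fdr:
            k_max = rank
    threshold = pvalues[order[k_max - 1]] if k_max else 0.0
    reject = [k_max > 0 and p <= threshold for p in pvalues]
    return threshold, reject


# Hypothetical P values for ten sites.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold, reject = benjamini_hochberg(example)
print(threshold, sum(reject))
```

With roughly 85 million tests, sorting remains feasible, but an array-based implementation would be preferable to Python lists.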

Single substitutions were called using the CaVEMan (Cancer Variants Through Expectation Maximization; http://cancerit.github.io/CaVEMan/) algorithm (ref. 72). To avoid mapping artefacts, we removed variants with a median alignment score <90 and those with a clipping index >0. Indels were called using cgpPindel (http://cancerit.github.io/cgpPindel/). We discarded indels that occurred in repeat regions with repeat count >10 and variant call format (VCF) quality <250. Double substitutions were identified as two adjacent single substitutions called by CaVEMan. The ten HipSci lines are HPSI0714i-iudw_4, HPSI0914i-laey_4, HPSI0114i-eipl_1, HPSI0414i-oaqd_2, HPSI0414i-oaqd_3, HPSI1014i-quls_2, HPSI1013i-yemz_3, HPSI0614i-paab_3, HPSI1113i-qorq_2 and HPSI0215i-fawm_4.
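The rule for double substitutions (two single substitutions at immediately adjacent positions in the same line) can be sketched in Python as below. The tuple layout and function name are assumptions for illustration, and the handling of three or more consecutive substitutions (the first two are paired, the next is evaluated afresh) is a choice made here, since the text does not specify it.

```python
def pair_adjacent_substitutions(substitutions):
    """Merge adjacent single substitutions into double substitutions.

    substitutions: iterable of (chrom, pos, ref, alt) from one cell line.
    Returns (doubles, singles), both sorted by chromosome and position.
    """
    ordered = sorted(substitutions, key=lambda s: (s[0], s[1]))
    doubles, singles = [], []
    i = 0
    while i < len(ordered):
        chrom, pos, ref, alt = ordered[i]
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and nxt[0] == chrom and nxt[1] == pos + 1:
            # Immediately adjacent on the same chromosome: one dinucleotide.
            doubles.append((chrom, pos, ref + nxt[2], alt + nxt[3]))
            i += 2
        else:
            singles.append(ordered[i])
            i += 1
    return doubles, singles


# Hypothetical calls from one hiPSC line.
calls = [("1", 101, "C", "T"), ("1", 100, "C", "T"), ("1", 200, "G", "A")]
print(pair_adjacent_substitutions(calls))
```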

Mutational signature analysis was performed on S7 EPC-hiPSCs, S2 F-hiPSCs, S2 EPC-hiPSCs and the HipSci F-hiPSC WGS dataset. All dinucleotide mutations were excluded from this analysis. We generated 96-channel single substitution profiles for 324 hiPSCs and 204 fibroblasts. We fitted previously discovered skin-specific substitution signatures to each sample using an R package (signature.tools.lib) (ref. 31). The function SignatureFit_withBootstrap() was used with default parameters. In downstream analysis, the exposures of the two UV-caused signatures, Skin_D and Skin_J, were summed to represent the total signature exposure caused by UV (signature 7). A de novo signature extraction was performed on 324 WGS HipSci F-hiPSCs to confirm that the UV-associated skin signatures (Skin_D and Skin_J, signature 7) and the culture-associated one (Skin_A, signature 18) are also the most prominent signatures identified in de novo signature extraction (Supplementary Fig. 5).

Reference information on replication timing regions was obtained from Repli-seq data of the ENCODE project (https://www.encodeproject.org/) (ref. 73). The transcriptional strand coordinates were inferred from the known footprints and transcriptional direction of protein-coding genes. In our dataset, we first orientated all G>A and GG>AA to C>T and CC>TT (using pyrimidine as the mutated base). Then, we mapped C>T and CC>TT to the genomic coordinates of all gene footprints and replication timing regions. Lastly, we counted the number of C>T/CC>TT mutations on transcribed and nontranscribed gene regions in different replication timing regions.
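The orientation step can be illustrated with a short Python sketch. The function names are hypothetical, and the strand-assignment rule is an assumption based on the usual convention: for a gene on the forward strand, a mutated pyrimidine on the reference strand lies on the coding (nontranscribed) strand, and a flipped call lies on the transcribed strand. The paper does not spell out this rule, so it should be read as one plausible reading.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (uppercase A, C, G, T)."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_pyrimidine(ref, alt):
    """Express a substitution with a pyrimidine as the mutated base.

    Returns (ref, alt, flipped); flipped is True when the call was
    reverse-complemented, e.g. G>A becomes C>T and GG>AA becomes CC>TT.
    """
    if all(base in "AG" for base in ref):
        return reverse_complement(ref), reverse_complement(alt), True
    return ref, alt, False


def strand_class(gene_strand, flipped):
    """Assumed convention: is the mutated pyrimidine on the transcribed
    (template) strand of a gene annotated on '+' or '-'?"""
    on_template = (gene_strand == "+") == flipped
    return "transcribed" if on_template else "nontranscribed"


ref, alt, flipped = to_pyrimidine("GG", "AA")
print(ref, alt, strand_class("+", flipped))
```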

We classified mutations (substitutions and indels) in HipSci F-hiPSCs into fibroblast-shared mutations and private mutations. Fibroblast-shared mutations in hiPSCs are the ones that have at least one read from the mutant allele found in the corresponding fibroblast. Private mutations are the ones that have no reads from the mutant allele in the fibroblast. Mutational signature fitting was performed separately for fibroblast-shared substitutions and private substitutions in hiPSCs. For indels, only the percentage of different indel types was compared between fibroblast-shared indels and private indels.

We inspected the distribution of VAFs of substitutions in HipSci fibroblasts and HipSci F-hiPSCs. Almost all hiPSCs had VAFs distributed around 50%, indicating that they were clonal. In contrast, all fibroblasts had lower VAFs, distributed around 25% or lower, indicating that they were oligoclonal. We computed kernel density estimates for the VAF distribution of each sample. Based on the kernel density estimate, the number of clusters in a VAF distribution was determined by identifying the local maxima. Accordingly, the size of each cluster was estimated by summing the mutations with VAFs between two adjacent local minima.
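A minimal Python sketch of this clustering idea follows. The Gaussian kernel, the bandwidth of 0.05, the 101-point grid and the function names are all assumptions for illustration; the text does not state which kernel or bandwidth was used. Peaks of the density are taken as cluster centers, and mutations are counted between neighboring density minima.

```python
from math import exp, pi, sqrt


def gaussian_kde(values, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * sqrt(2 * pi))
    return [norm * sum(exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
            for g in grid]


def vaf_clusters(vafs, bandwidth=0.05, n_grid=101):
    """Return (peaks, sizes): density maxima and mutation counts per cluster."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    density = gaussian_kde(vafs, grid, bandwidth)
    interior = range(1, n_grid - 1)
    peaks = [grid[i] for i in interior
             if density[i - 1] < density[i] > density[i + 1]]
    valleys = [grid[i] for i in interior
               if density[i - 1] > density[i] < density[i + 1]]
    bounds = [0.0] + valleys + [1.0]
    sizes = []
    for low, high in zip(bounds, bounds[1:]):
        last = high == 1.0  # the final interval includes VAF = 1
        sizes.append(sum(1 for v in vafs
                         if low <= v < high or (last and v == high)))
    return peaks, sizes


# Hypothetical sample: a subclonal cluster near 0.1 and a clonal one near 0.5.
peaks, sizes = vaf_clusters([0.1] * 30 + [0.5] * 70)
print(peaks, sizes)
```

Maxima that fall exactly on the edges of the grid are not reported by this sketch, which is a simplification.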

Variant consequences were calculated using the Variant Effect Predictor (ref. 74) and BCFtools/csq (ref. 75). For dinucleotide mutations, we recorded only the most impactful consequence of either of the two members of the dinucleotide, where the scale from least to most impactful was intergenic, intronic, synonymous, 3′ untranslated region, 5′ untranslated region, splice region, missense, splice donor, splice acceptor, start lost, stop lost and stop gained. We identified overlaps with putative cancer driver mutations using the COSMIC All Mutations in Census Genes mutation list (CosmicMutantExportCensus.tsv.gz) version 92, 27 August 2020.
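The consequence ranking for dinucleotides amounts to taking the maximum over an ordered scale, as in the following Python sketch. The labels are plain strings copied from the scale above rather than the exact terms emitted by the Variant Effect Predictor or BCFtools/csq, and the function name is hypothetical.

```python
# Least to most impactful, in the order given in the text.
SEVERITY = [
    "intergenic", "intronic", "synonymous",
    "3' untranslated region", "5' untranslated region",
    "splice region", "missense", "splice donor", "splice acceptor",
    "start lost", "stop lost", "stop gained",
]
RANK = {name: position for position, name in enumerate(SEVERITY)}


def most_impactful(first, second):
    """Return the more impactful consequence of the two dinucleotide members."""
    return max(first, second, key=RANK.__getitem__)


print(most_impactful("synonymous", "missense"))
```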

To detect genes under positive selection, we used dN/dS ratios as implemented in the dNdScv R package (https://github.com/im3sanger/dndscv) (ref. 49). dNdScv uses maximum likelihood models to calculate the ratio of nonsynonymous to synonymous mutations per gene, normalized by sequence composition, trinucleotide substitution rates and the local mutability of each gene based on epigenetic covariates.

Three analyses were run on the sequencing data from 452 F-hiPSCs and 78 B-hiPSCs:

default dNdScv (exome-wide, looking at all genes in the genome or exome for selection);

restricted hypothesis testing of known cancer genes (to increase the statistical power on known drivers, using the gene list from Martincorena et al. (ref. 49)); and

detection of mutational hotspots (using the sitednds function in dNdScv on hotspots detected in The Cancer Genome Atlas).

Erythroblasts were derived from PBMCs, following appropriate ethics committee approvals (REC 13/EE/0302), and reprogrammed using the nonintegrating CytoTune Sendai virus reprogramming kit (OCT3/4, SOX2, KLF4 and C-MYC) by the Cellular Generation and Phenotyping facility at the Wellcome Sanger Institute in the same way as for the HipSci lines (described above). After establishment of B-hiPSC lines that had passed all QC steps (described above) and at cell passage equivalent to about 30 doublings, expanded clones were single-cell subcloned to generate two to four daughter subclones for each B-hiPSC line. WGS was run on germline, erythroblasts, B-hiPSC parental clones and B-hiPSC subclones. The average sequencing coverage of WGS was 38× (Supplementary Table 14). Single-nucleotide polymorphism genotyping was performed as a QC measure to ensure matches between all hiPSCs and their respective original starting samples. RNA sequencing was run on 78 iPSC parental clones.

Single substitutions were called using the CaVEMan (Cancer Variants Through Expectation Maximization; http://cancerit.github.io/CaVEMan/) algorithm (ref. 72). To avoid mapping artefacts, we removed variants with a median alignment score <140 and those with a clipping index >0. Indels were called using cgpPindel (http://cancerit.github.io/cgpPindel/). We discarded indels that occurred in repeat regions with repeat count >10 and VCF quality <250. Double substitutions were identified as two adjacent single substitutions called by CaVEMan. Mutation calls were obtained for erythroblasts, iPSC parental clones and subclones.

The BCOR-mutant B-hiPSCs (MSH40i2, MSH93i6) and the BCOR-wild-type B-hiPSCs (MSH34i2, MSH30i3) were maintained in feeder-free conditions and cultured in Essential E8 medium (ThermoFisher Scientific, A1517001) on Vitronectin XF (Stemcell Technologies, 07180)-coated plates. hiPSC medium was changed daily, and the cells were monitored to ensure there were no signs of spontaneous differentiation. hiPSCs were expanded every 3 or 4 days as small clumps using 0.5 mM UltraPure EDTA (ThermoFisher Scientific, 1557020) diluted in Dulbecco's phosphate-buffered saline (DPBS) (ThermoFisher Scientific, 14190342).

Before neural induction, three independent replicates of hiPSCs from each donor line were generated and cultured for 1 week as described above. Healthy, nondifferentiating hiPSC colonies were dissociated into single-cell suspension using TrypLE Express Enzyme (ThermoFisher Scientific, 12605010) and plated on Vitronectin XF-coated plates at a density of 50,000 cells/cm² in the presence of RevitaCell Supplement (ThermoFisher Scientific, A26445-01, lot 2170092). The cells were cultured for another 2 days until they reached 60–75% confluence. At day 0, the culture medium was switched to neural induction medium (NIM) containing V/V DMEM/F12 HEPES (ThermoFisher Scientific, 11330032, lot 2186798) and neurobasal medium (ThermoFisher Scientific, 21103-049, lot 2161553), 1× B-27 Supplement (ThermoFisher Scientific, 17504-044, lot 2188886), 1× N2 Supplement (ThermoFisher Scientific, 17502-048, lot 2193551), MEM NEAA (ThermoFisher Scientific, 11140-035, lot 2202923), 1× Glutamax-I (ThermoFisher Scientific, 35050-061, lot 2085268) and 1× penicillin/streptomycin in the presence of 10 µM SB431542 (Tocris, 1414/10) and 200 nM LDN193189 (Tocris, 6053/10) with an addition of 1× RevitaCell. Starting from day 1, NIM without RevitaCell was changed every day until day 12.

At the end of the neural induction process (day 12), the cells were dissociated into single-cell suspension using TrypLE Express Enzyme and plated at high cell density (200,000 cells/cm²) in double-coated plates of PDL (ThermoFisher Scientific, A3890401, lot 881772E) and 15 µg ml⁻¹ Cultrex mLaminin I Pathclear (Biotechne, 3400-010-02, lot 1594368). The NIM was switched to neuron differentiation medium (NDM) containing BrainPhys Neural Medium (Stemcell Technologies, 05790, batch 1000031535), 1× B-27 Supplement (ThermoFisher Scientific, 17504-044, lot 2188886), 1× N2 Supplement (ThermoFisher Scientific, 17502-048, lot 2193551), 50 µM dibutyryl-cAMP, sodium salt (Tocris, 1141/50), 200 nM L-ascorbic acid (Tocris, 4055), 20 ng ml⁻¹ BDNF (Cambridge Bioscience, GFH1-100) and 20 ng ml⁻¹ GDNF (Cambridge Bioscience, GFM37-100) in the presence of 10 µM Y-27632 (Tocris, 1254/10). On day 13, the medium was changed to NDM without Y-27632, and the cells were allowed to differentiate for another 14 days. Two thirds of the medium was changed three times a week.

During the cell differentiation process, cell pellets from all culture replicates were harvested at days 0, 6, 12 and 27 (endpoint) for an RNA-sequencing serial time study. Immunostaining characterization was performed at days 0, 12 and 27 of differentiation to assess the differentiation efficiency.

Total RNA was extracted using the PureLink RNA Mini Kit (ThermoFisher Scientific, 12183018A) following the manufacturer's recommendations. The RNA was quality controlled, and the cDNA libraries were prepared and sequenced using Illumina NovaSeq 6000 technology. Each sequenced sample had 20 million read pairs of 150-bp paired-end reads.

The splice-aware aligner STAR v2.5.0a (ref. 76) was used to map RNA-sequencing data to the reference genome. For the human decoy reference genome hs37d5.fa.gz, a genome index was first generated. Then, using the splice junction information from the Gencode GTF annotation file v19, fastq files were mapped. The read fragments overlapping gene features were then counted using featureCounts v2.0.1 (ref. 77). The samples' raw count matrices were then analyzed in R version 4.0.4. Differential gene expression analysis was performed using the DESeq2 R package.

The expression of pluripotency markers at day 0 (hiPSC) of differentiation was assessed using a commercially available PSC (OCT4, SSEA4) Immunocytochemistry Kit (ThermoFisher Scientific, A25526, lot 2194558).

NSCs at day 12 and neurons at day 27 of differentiation were stained as previously described, with minor modifications (ref. 78). Briefly, the medium was discarded from the plates, and the cells were rinsed gently with DPBS. A 4% solution of paraformaldehyde was used to fix the cells for 20 min at room temperature. The cells were rinsed twice with DPBS and permeabilized for 20 min with 0.1% Triton X-100 (Sigma-Aldrich, T8787-50ML). Nonspecific epitopes were blocked with 0.5% BSA solution for 1 h at room temperature. Cells were incubated overnight at 4 °C with the primary antibodies as follows: on day 12, cells were incubated with an anti-PAX6 antibody (ThermoFisher Scientific, 14-9914-82, dilution 1:100); on day 27, cells were incubated with an anti-Tubulin beta III (TUBB3) antibody (Millipore, MAB1637, dilution 1:400). The cells were then rinsed three times with 1× DPBS and incubated with the secondary antibody Alexa Fluor 488 donkey anti-mouse (ThermoFisher Scientific, A21202, dilution 1:500). The cells were rinsed three times with DPBS and the nuclei counterstained with NucBlue Fixed Cell Stain ReadyProbes (ThermoFisher Scientific, R37606). The images were acquired within 48 h using an EVOS FL Auto 2 microscope (ThermoFisher Scientific, AMAFD2000), and the figures were made using the FigureJ plugin in ImageJ software.

All statistical analyses were performed in R (ref. 79). The effects of age and sex on the mutation burden of F-hiPSCs were estimated using the Mann–Whitney test (wilcox.test() in R). Tests for correlation in the study were performed using cor.test() in R.

For cancer driver mutations identified in HipSci F-hiPSCs, a two-sided Fisher test was used to call a mutation as a significant change in allele frequency between the fibroblast and iPSC samples (Supplementary Table 5). A Benjamini–Hochberg procedure for multiple hypothesis testing was used.

For the differentiation and immunostaining experiments, each BCOR-mut and BCOR-wt cell line had three independent biological replicates differentiated, and for each of these replicates, three wells were stained and imaged using immunofluorescence. At every stage of the neural differentiation (day 0, day 12 and day 27), a total of 36 images were analyzed for both BCOR-mut and BCOR-wt cell lines.

Differential gene expression analysis of Insignia B-hiPSCs was performed using DESeq2, which fits a negative binomial generalized linear model for each gene. The default DESeq2 Wald test was used for significance testing, and an adjusted P value threshold of <0.05 was applied (Supplementary Table 13).

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

View original post here:

Substantial somatic genomic variation and selection for BCOR mutations in human induced pluripotent stem cells - Nature.com
