Methods xxx (2014) xxx–xxx


Efficient strategies for TALEN-mediated genome editing in mammalian cell lines Julien Valton ⇑, Jean-Pierre Cabaniols ⇑, Romàn Galetto, Fabien Delacote, Marianne Duhamel, Sebastien Paris, Domique Alain Blanchard, Céline Lebuhotel, Séverine Thomas, Sandra Moriceau, Raffy Demirdjian, Gil Letort, Adeline Jacquet, Annabelle Gariboldi, Sandra Rolland, Fayza Daboussi, Alexandre Juillerat, Claudia Bertonati, Aymeric Duclert, Philippe Duchateau Cellectis SA, 8 rue de la croix Jarry, 75013 Paris, France

Article info

Article history: Received 19 December 2013; Revised 24 June 2014; Accepted 25 June 2014; Available online xxxx

Keywords: TALEN; Gene editing; Gene tagging; Gene inactivation; Gene insertion; Methylation

Abstract

TALEN is one of the most widely used tools in the field of genome editing. It enables gene integration and gene inactivation in a highly efficient and specific fashion. Although very attractive, the apparent simplicity and high success rate of TALEN could be misleading for novices in the field of gene editing. Depending on the application, specific TALEN designs, activity assessments and screening strategies need to be adopted. Here we report different methods to efficiently perform TALEN-mediated gene integration and inactivation in different mammalian cell systems, including induced pluripotent stem cells, and delineate experimental examples associated with these approaches. © 2014 Elsevier Inc. All rights reserved.

1. Introduction

Transcription activator-like effectors (TALEs), a group of bacterial plant pathogen proteins, have recently emerged as new engineerable scaffolds for the production of tailored DNA binding domains with chosen specificities [1]. A TALE DNA binding domain is composed of a variable number of repeat modules of 33–35 amino acids that are nearly identical to one another except for two variable amino acids, named repeat variable di-residues (RVDs), located at positions 12 and 13. The nature of residues 12 and 13 determines the base preference of each individual repeat module, and it is now well established that A, C, G and T nucleotides pair preferentially with the repeat modules harboring NI, HD, NN and NG RVDs, respectively. Based on this specificity cipher, engineered TALE DNA binding domains have been generated and fused to different active domains to generate a vast portfolio of

Abbreviations: TALEN, transcription activator-like effector nuclease; NHEJ, non-homologous end joining; HR, homologous recombination; indels, insertions and deletions; 5mC, 5-methylated cytosine; CpG, cytosine-phospho-guanine; GOI, gene of interest; POI, protein of interest.
⇑ Corresponding authors. Fax: +33 (0)1 81 69 16 08. E-mail addresses: [email protected] (J. Valton), jean-pierre.cabaniols@cellectis.com (J.-P. Cabaniols).

gene editing tools with custom DNA specificity [1,2]. This portfolio includes fusions to transcriptional activator [3] and transcriptional repressor domains [4] as well as to chromatin remodeling enzymes [5–7] and recombinases [8]. More importantly, it also includes fusions to different nuclease domains such as FokI, I-TevI, PvuII and I-AniI [9–12], enabling the generation of transcription activator-like effector nucleases named TALEN¹ or compact TALEN. Today, TALEN is one of the most widely used TALE-based tools in the field of genome editing. It has been used to specifically inactivate, integrate or correct genes of interest for biotechnological and therapeutic applications [13,14]. Each of these applications was reported to be achieved with a high success rate in various organisms, including human, mouse, rat, zebrafish and plants as non-exhaustive examples [13,14]. Regarding gene inactivation in mammalian cells, Reyon et al. reported that among 96 loci tested in U-2 OS cells, 88% displayed insertion and deletion (indel) events above 3% [15]. Such high efficiency was further confirmed by Kim et al., who reported that 98% of the 103 loci studied in HEK293 cells displayed indels ≥0.5% [16]. Concerning gene integration in mammalian cells, it has been reported to be successfully promoted by TALEN at different loci including AAVS1, OCT4,

¹ TALEN™ is a trademark owned by Cellectis bioresearch.

http://dx.doi.org/10.1016/j.ymeth.2014.06.013 1046-2023/© 2014 Elsevier Inc. All rights reserved.

Please cite this article in press as: J. Valton et al., Methods (2014), http://dx.doi.org/10.1016/j.ymeth.2014.06.013


J. Valton et al. / Methods xxx (2014) xxx–xxx

PITX3 and tyrosine hydroxylase loci in human embryonic stem cells (hESC), human induced pluripotent stem cells (hiPSC) and fertilized zebrafish eggs [17–19]. Although at a much lower throughput, mammalian gene correction was also successfully performed on the COL7A, HBB, DMD and XPC genes, respectively involved in recessive dystrophic epidermolysis bullosa, sickle cell disease, Duchenne dystrophy and the xeroderma pigmentosum syndrome [20–23].

Although very attractive, the apparent simplicity and high success rate of TALEN technology could be misleading for novices in the field of gene editing. Indeed, there are a number of key parameters that need to be taken into account to obtain successful editing outcomes. Depending on the application, specific TALEN designs, activity assessments and screening strategies need to be adopted. Here we report different methods to efficiently perform TALEN-mediated gene integration and inactivation and delineate experimental examples associated with these approaches.

2. Material and method

2.1. TALEN design

When designing a TALEN, the following key points should be considered:

• Verify that the TALEN target sequence is present in the cell line of interest. Notably, the presence of SNPs should be carefully checked.
• Avoid target sites containing (i) at least 5 identical consecutive nucleotides, (ii) methylated CpG and (iii) AA dinucleotides immediately downstream of the first T0.
• For methylated loci, assemble a methyl-insensitive TALE DNA binding domain by replacing HD TALE repeats with N* repeats. To our knowledge, up to two substitutions can be performed per TALEN arm, although we cannot exclude that a higher number of substitutions could still lead to an efficient TALEN.
• Identify off-target sites bioinformatically. TALEN showing potential off-target sites bearing fewer than 5 total mismatches compared to the targeted sequence should be left aside.
• Synthesize at least two TALEN per locus to edit.
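The design points above lend themselves to a quick automated pre-check. The following is a minimal Python sketch, not the authors' design pipeline: it translates one binding site into an RVD array using the NI/HD/NN/NG cipher from the Introduction and flags the sequence features listed above. The function names, the convention that the site string includes the leading T0, and the way methylated cytosines are passed in (0-based indices) are assumptions made for illustration. SNP verification and off-target searches are not covered.

```python
import re

# RVD cipher stated in the Introduction: A -> NI, C -> HD, G -> NN, T -> NG.
RVD_CIPHER = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvd_array(site, methylated_c=()):
    """Return the RVD list for one TALEN arm.

    `site` is the binding site written 5'->3', including the leading T0
    (assumption of this sketch). `methylated_c` holds 0-based indices, within
    `site`, of methylated cytosines that should be read by N* instead of HD.
    """
    site = site.upper()
    if not site.startswith("T"):
        raise ValueError("binding site must start with T0")
    rvds = []
    # T0 itself is not read by a repeat module, so start at index 1.
    for index, base in enumerate(site[1:], start=1):
        if index in methylated_c:
            if base != "C":
                raise ValueError(f"position {index} is not a cytosine")
            rvds.append("N*")
        else:
            rvds.append(RVD_CIPHER[base])
    return rvds


def design_warnings(site, methylated_c=()):
    """Return human-readable warnings for the rules listed in Section 2.1."""
    site = site.upper()
    warnings = []
    if not site.startswith("T"):
        warnings.append("site does not start with T0")
    if re.search(r"([ACGT])\1{4}", site):
        warnings.append("at least 5 identical consecutive nucleotides")
    if site[1:3] == "AA":
        warnings.append("AA dinucleotide immediately downstream of T0")
    if len(methylated_c) > 2:
        warnings.append("more than two HD -> N* substitutions in one arm")
    return warnings


if __name__ == "__main__":
    print(rvd_array("TACGT", methylated_c={2}))
    print(design_warnings("TAAGGGGGCT"))
```

A site that passes these checks still needs to be validated experimentally, as described in the following paragraph.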
These TALEN should then be validated at the endogenous locus in the biological system of interest. In most cases, the fastest and easiest way is to determine the ability of the TALEN to produce mutations via error-prone NHEJ using the different screening methods delineated below. The choice of the final TALEN will depend on the balance between the best localization and the activity level.

2.2. TALEN activity assay in yeast

2.2.1. Generation and transformation of TALEN and TALEN target constructions for single-strand annealing (SSA) assays

• Insert the TALEN DNA target into the yeast LacZ reporter vector (pFL39-ADH-LACURAZ, previously described in [10,24]) using the Gateway protocol (Invitrogen). Please note that all targets should contain a control I-SceI target site to monitor baseline SSA activity. In addition, the target should contain, between the two LacZ homologous regions, a URA gene to enable sorting out recombined targets on a URA-medium (Supplementary Fig. 1).
• Insert the left and right TALEN constructions into Leu2 or G418R marker-containing plasmids, respectively.
• Transform the TALEN target-containing LacZ reporter vector into Saccharomyces cerevisiae strain FYBL2-7B (Mat alpha, ura3-Δ851, trp1Δ63, leu2Δ1, lys2Δ202) and select transformants on solid synthetic medium supplemented with glucose and lacking histidine and uracil.
• Transform the two TALEN constructions into FYC2-6A (mat alpha, trp1Δ63, leu2Δ1, his3Δ200) and select transformants on solid synthetic medium lacking leucine and lysine and supplemented with G418 and glucose.

2.2.2. Mating of TALEN-expressing clones with reporter plasmid-expressing clones and determination of β-galactosidase activity

• Use a colony gridder (QpixII, Genetix) to perform the mating between TALEN-containing yeast strains and those harboring the respective LacZ-TALEN target reporter vector.
• Grid TALEN-containing yeast strains on nylon filters placed on YPGlycerol plates, using a high gridding density (about 20 spots/cm²). On the same filter, perform a second gridding by spotting reporter vector-harboring yeast strains. Place the membranes on solid agar containing YPGlycerol rich medium and incubate them overnight at 30 °C to allow mating.
• Transfer the filters onto synthetic medium lacking leucine, tryptophan, lysine and histidine, with glucose (2%) as the carbon source and supplemented with G418, and incubate for 3 days at 30 °C to select for diploids carrying the two TALEN and reporter vectors.
• Transfer the filters onto YPGalactose rich medium supplemented with G418 for 5 days at either 30 °C (stringent) or 37 °C (standard) to induce the expression of the TALEN.
• Determine the β-galactosidase activity resulting from TALEN-induced cleavage of the reporter plasmid by placing the filters on solid agarose medium with 0.05% X-Gal in 0.5 M sodium phosphate buffer, pH 7.0, 0.1% SDS, 2% agarose, and incubate at 37 °C.
• Scan the resulting filters and quantify each spot using the median value of the pixels constituting the spot. Arbitrary values of 0 and 1 are associated with white and black pixels, respectively. β-galactosidase activity is directly associated with the efficiency of homologous recombination. Relative values are determined with respect to a positive control known to saturate the signal under the conditions tested.

2.3. DNA delivery methods

The DNA delivery protocol is among the most important parameters to take into account for successful gene editing, thus great care must be taken to set it up. To achieve optimal DNA delivery:

• Explore different transfection reagents or electroporation systems (FuGENE®, Lipofectamine®, AMAXA® Nucleofector®, NEON® transfection system and Cytopulse as non-exhaustive examples) and select the one that gives the best percentage of transfected cells and the highest level of protein expression.
• Optimize the quantities of transfected plasmid to find the appropriate balance between TALEN activity and toxicity.
• Consider performing a transient cold shock immediately after transfection. Such treatment has been shown to greatly enhance the outcome of gene processing using zinc finger nucleases [25].

2.3.1. Transfection of HCT116 and A549 cell lines using lipid-based DNA transfection reagents

2.3.1.1. Low throughput format. HCT116 and A549 cell lines, obtained from ATCC (CCL-247™ and CCL-185™ respectively), need to be cultured at 37 °C with 5% CO2 in McCoy’s 5A (PAA Laboratories, Austria) supplemented with 10% FBS, penicillin (100 IU/ml) and streptomycin (100 µg/ml).

One day prior to transfection

• Seed the cells in complete medium at 1 × 10⁶ cells per 10 cm dish or 2 × 10⁵ cells per well of a 6-well plate.

On transfection day

• Pre-warm the FuGENE® HD reagent (Promega, cat. E2311) to room temperature before use.
• Mix the TALEN plasmids or the GFP-encoding control plasmid (10 µg for a 10 cm dish or 2 µg for 6-well plates) in 500 µl of serum-free and antibiotic-free medium and vortex for 1–2 s.
• Add 45 µl of FuGENE® HD reagent to the diluted DNA mix and vortex for 2–5 s.
• Incubate the mix for 15 min at room temperature.
• Dispense the total transfection mix over the plated cells.

Two days post transfection

• Assess your transfection efficiency. Transfection efficiency can be determined by flow cytometry analysis of cells transfected with the GFP-encoding control plasmid; the percentage of GFP-positive cells gives the transfection efficiency. A transfection efficiency of at least 50% should be reached to be able to detect TALEN activity at the endogenous locus.

2.3.1.2. High throughput format. HCT116 cells are seeded in 96-well flat-bottom plates at 10⁵ cells/well in McCoy’s 5A medium supplemented with 10% fetal bovine serum (FBS, PAA Laboratories, Austria), penicillin (100 IU/ml) and streptomycin (100 µg/ml). The next day, 100 ng of TALEN (50 ng of each plasmid), 10 ng of a GFP expression vector and 0.6 µl of FuGENE® transfection reagent are diluted in McCoy’s 5A medium (without FBS) in a final volume of 50 µl. The transfection mix is then added to the seeded cells.

One day post transfection

• Determine the transfection efficiency using a CellInsight device. Add 50 µl/well of Hoechst 33342 and quantify the fluorescence signal.

Three days post transfection

• Harvest the transfected cells and wash them twice with 100 µl of PBS. Lyse the cells by adding 40 µl/well of lysis buffer (10 mM Tris, pH 8; 0.45% (v/v) NP40; 0.45% (v/v) Tween 20; 100 µg/ml proteinase K (Eurobio)) and incubate the plate for 10 min at room temperature and for 2 h at 55 °C on an orbital shaker. Transfer the cell lysates into PCR microplates for an additional incubation of 10 min at 95 °C (denaturation step) in a thermocycler (at that point, cell lysates can be stored at 4 °C for at least one week). PCR amplify these samples and assess TALEN activity using the T7 Endonuclease assay, PCR products bandshift and the other screening protocols described herein.

2.3.2. Transfection of Jurkat cells by DNA electroporation

Jurkat cells, obtained from DMSZ, Germany, need to be cultured at 37 °C with 5% CO2 in RPMI-1640 medium (PAA Laboratories, Austria) supplemented with 10% FBS, penicillin (100 IU/ml) and streptomycin (100 µg/ml). Use the electroporation protocol described below.

One day prior to transfection

• Seed the cells in complete medium at 4.5 × 10⁵ cells/ml in 75 cm² flasks (up to 20 ml per flask).

On electroporation day

• Electroporate 1 × 10⁶ Jurkat cells with 10 µg of each TALEN-encoding plasmid in a final volume of 200 µl of ‘‘Cytoporation buffer T’’ (BTX Harvard Apparatus, Holliston, Massachusetts) using 0.4 cm gap cuvettes (Bio-Rad Laboratories, Hercules, California) and the AgilePulse MAX System (BTX Harvard Apparatus, Holliston, Massachusetts). Set the electroporation parameters to two 0.1 ms pulses at 2000 V/cm, followed by four 0.2 ms pulses at 325 V/cm.
• Immediately dilute the cells in complete medium after electroporation and incubate them at 37 °C until harvesting.

Two days post electroporation (EP)

• Assess the transfection efficiency by flow cytometry using the protocol described for HCT116 and A549 cell lines.

Four days post electroporation (optional)

• Recover 1 × 10⁶ cells from the first EP and repeat the same electroporation protocol to maximize the efficiency of TALEN-mediated gene inactivation.

Eight days post electroporation (optional)

• Recover 1 × 10⁶ cells from the second EP and repeat the same electroporation protocol to maximize the efficiency of TALEN-mediated gene inactivation.

Please note that when repeating the electroporation step, cells should be diluted to 4.5 × 10⁵ cells/ml one day prior to transfection. Three days after each electroporation, harvest the cells and determine the percentage of gene inactivation with the different genotypic or phenotypic screening strategies delineated below.

2.3.3. Transfection of U-2 OS cells by DNA electroporation

U-2 OS cells, obtained from ATCC (HTB-96™), need to be cultured at 37 °C with 5% CO2 in McCoy’s 5A medium (Life Tech, USA) supplemented with 10% FBS, penicillin (100 IU/ml) and streptomycin (100 µg/ml).

On transfection day

• Recover the cells by trypsinization and collect them in a 15 ml conical tube (cell confluency should not be higher than 80%). Use 10⁶ cells per transfection point. Transfer the volume corresponding to the number of cells you aim to transfect (e.g. 5 × 10⁶ cells are needed to perform 5 electroporation points) into a 15 ml tube, centrifuge at 300 g for 5 min and discard the supernatant.
• Resuspend the cells in Cell Line Nucleofector® Solution V (LONZA) at a concentration of 10⁶ cells/100 µl. Add 100 µl of cell suspension to Amaxa cuvettes containing 5 µg of TALEN-encoding plasmids and 5 µg of integration plasmid.
• Electroporate the cells (immediately after resuspending them in Nucleofector® solution) using the X-001 program. After electroporation, add 0.5 ml of pre-warmed (37 °C) complete medium to the cells and gently transfer them into 10 cm dishes containing 10 ml of pre-warmed (37 °C) complete medium before incubation at 37 °C in the presence of 5% CO2.

2.3.4. Transfection of mouse L cells by DNA electroporation

L cells, obtained from ATCC (CRL-2648), need to be cultured at 37 °C with 5% CO2 in DMEM Glutamax medium (Life Tech, USA) supplemented with 10% FBS, penicillin (100 IU/ml) and streptomycin (100 µg/ml).

On transfection day

• Recover the cells by trypsinization (at that point, cell confluency should not be higher than 80%), collect them in a 15 ml conical tube and prepare the amount of cells needed according to the protocol described for the U-2 OS cell line.
• Resuspend the cells in Cell Line Nucleofector® Solution T at a concentration of 10⁶ cells/100 µl. Add 100 µl of cell suspension to Amaxa cuvettes containing 5 µg of TALEN-encoding plasmids and 5 µg of integration plasmid.
• Electroporate the cells (immediately after resuspending them in Nucleofector® solution) using the T-020 program. After electroporation, use the protocol described for U-2 OS cells.

2.3.5. Transfection of hiPSC by DNA electroporation

DEF-iPS™ (Cellectis Bioresearch)² are maintained in the DEF-CS™ defined culture system.

On transfection day

• Collect hiPSC using TrypLE™ Select dissociation agent (Life Technologies).
• Electroporate the cells with the TALEN-encoding plasmid and the integration matrix according to the Human Stem Cells Nucleofector® kit 2 protocol (Lonza) and the cGPS® custom iPSC full kit (Cellectis bioresearch).
• Transfer the cells into pre-coated 10 cm dishes containing pre-warmed (37 °C) complete DEF-CS™ medium before incubation at 37 °C in the presence of 5% CO2.
• Add neomycin selection a few days after electroporation as described in the cGPS® custom iPSC full kit (Cellectis bioresearch). Add ganciclovir after picking of the NeoR colonies to remove random or multicopy integrations of the matrix within the host genome.
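The Nucleofector-based protocols for U-2 OS and L cells share the same bookkeeping: 10⁶ cells per electroporation point, resuspended at 10⁶ cells per 100 µl, with 5 µg of TALEN-encoding plasmids and 5 µg of integration plasmid per cuvette. As an illustration only, the short Python sketch below scales these quantities to a chosen number of points. The function name, its parameters and the measured stock density passed as `stock_cells_per_ml` are hypothetical conveniences, not part of the published protocol.

```python
def electroporation_plan(n_points, stock_cells_per_ml,
                         cells_per_point=1_000_000,
                         buffer_ul_per_point=100.0,
                         talen_ug_per_point=5.0,
                         matrix_ug_per_point=5.0):
    """Scale the per-cuvette quantities of Sections 2.3.3 and 2.3.4.

    `stock_cells_per_ml` is the density counted after trypsinization
    (a value the user measures; hypothetical input of this sketch).
    """
    if n_points < 1 or stock_cells_per_ml <= 0:
        raise ValueError("need at least one point and a positive cell density")
    total_cells = n_points * cells_per_point
    return {
        "total_cells": total_cells,
        # Volume of trypsinized suspension to pellet at 300 g for 5 min.
        "harvest_volume_ml": total_cells / stock_cells_per_ml,
        # Nucleofector solution needed to reach 10^6 cells / 100 µl.
        "buffer_volume_ul": n_points * buffer_ul_per_point,
        "talen_plasmid_ug": n_points * talen_ug_per_point,
        "integration_plasmid_ug": n_points * matrix_ug_per_point,
    }


if __name__ == "__main__":
    # Five electroporation points from a suspension counted at 2 x 10^6 cells/ml.
    for key, value in electroporation_plan(5, 2_000_000).items():
        print(f"{key}: {value}")
```

In practice a small excess of cells and buffer is usually prepared to compensate for pipetting losses; that margin is left to the user here.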

2.4. Selection protocols

Selection of transfected cells harboring stable integration of the DNA matrix can be done via chemical selection (using neomycin and ganciclovir) or via single cell sorting using flow cytometry. These two approaches, successfully used in examples 4–5 and 6 respectively, are delineated below.

2.4.1. Clonal selection of L cells harboring stable integration at the ROSA26 locus using neomycin and ganciclovir

Following the transfection protocol of L cells described earlier, select L cells harboring stable integration using the selection cassette present in the integration matrix according to the protocol described below.

• 1 day post transfection, replace the medium with fresh complete medium.
• 3 days post transfection, replace the medium with fresh complete medium supplemented with 0.6 mg/ml of G418 and perform such replacement every 2 or 3 days during 7 days.
• 8 days post transfection, replace the medium with fresh complete medium supplemented with 0.6 mg/ml of G418 and 50 µM of ganciclovir and incubate the cells for a total of 5 days.
• 13 days post transfection, replace the medium with fresh complete medium supplemented with 0.6 mg/ml of G418 only.
• 17 days post transfection, pick the resistant clones and re-array them in 96-well plates.
• 22 days post transfection (5 days after picking), duplicate the 96-well plates: one plate is used for maintenance, the other one is used for genomic DNA extraction and PCR screening.

After such selection, identification of relevant clones should be done by PCR screening as described in Section 2.5.2.

² DEF-CF™ and DEF-hiPS™ are trademarks owned by Cellectis AB.

2.4.2. Screening for GFP positive cells and clonal selection by single cell sorting using FACS

To identify cells harboring stable incorporation of GFP in frame with the gene of interest (GOI), a screening procedure was developed and successfully applied in HeLa and U-2 OS cell lines. The detailed protocol of this procedure is described below.

• 2 days post transfection (performed using AMAXA®, kit V, program X-001 for U-2 OS cells and the FuGENE® kit for HeLa cells), trypsinize the cells transfected with the GFP-encoding control plasmid and analyze them by flow cytometry to determine the transfection efficiency. A transfection efficiency of at least 50% is recommended to move further.
• Culture the cells (sample cells and negative control) for two weeks using the regular culture protocol.
• Check for fluorescence in both cultures regularly until the negative control loses fluorescence. The negative control cells should become fluorescence-negative within 15 days. Please note that, at this stage, it is recommended to freeze vials of sample cells as a backup.
• Recover the remaining fluorescent sample cells and use them for the selection process by single cell sorting (FACS, recommended). Fluorescence microscopy or manual clone picking can alternatively be used for that purpose. Usually, single cell sorting in two or three 96-well plates is sufficient for the next steps.
• Perform single cell sorting according to the protocol provided by the FACS facility and recover two or three 96-well plates.
• Grow the fluorescent single cell colonies for about 10 days. Depending on the plating efficiency of the cell line, this will result in about 5–20 clones per 96-well plate.
• Re-array the growing clones in a new 96-well plate. Please note that, at this stage, it is recommended to generate two daughter plates from the re-arrayed plate. The first one will be used as a frozen back-up while the second one will be used for ganciclovir selection.

2.5. Screening protocols

2.5.1. Screening using PCR products bandshift

TALEN-mediated NHEJ events result in insertions or deletions at the target endogenous locus. These events can be detected by PCR amplification of the locus followed by high resolution polyacrylamide gel electrophoresis (PAGE). This process provides an easy-to-implement approach to detect clones of interest. Its efficacy relies on the ability to detect the smallest possible deletion (between 5 and 10 bp). This level of resolution is achievable with PCR amplicons of about 100–150 bp and a high resolution gel system. We have successfully applied this approach with 150 bp amplicons to detect deletions as small as 5 bp on polyacrylamide gel. QIAxcel technology (QIAGEN) has been successfully used to detect deletions down to 10 bp on a 500 bp amplicon. The sensitivity of the PCR products bandshift assay is expected to be around 1–5% of indels.

2.5.2. Screening by T7 Endonuclease assay

After transfecting your biological system with the TALEN-encoding plasmid mix, a T7 Endonuclease assay can be performed to estimate the mutagenesis activity of the TALEN.

• Extract genomic DNA (gDNA) 3 days post transfection using any appropriate gDNA extraction kit (e.g. QIAGEN, QIAamp DNA Mini Kit [50], cat. 51304).
• Amplify the TALEN targeted locus by PCR using at least 200 ng of gDNA and a high-fidelity proofreading polymerase (e.g. Invitrogen, Platinum® Taq DNA Polymerase High Fidelity, cat. 11304-011).

Please note that the PCR amplicon size should be around 350–800 bp (we generally obtain much better end results using long PCR amplicons) and that the amplicon should be verified by sequencing to ensure the presence of the targeted locus and the absence of single nucleotide polymorphisms. In addition, design the PCR primers so that the TALEN recognition site is roughly in the middle of the PCR amplicon in order to obtain two different sizes of cleaved PCR products. After PCR amplification, check the PCR amplicon by agarose gel analysis and purify it using AMPure beads (Beckman Coulter, cat. A63880).

• Melt and re-anneal the PCR amplicons using the protocol below.

Add 50 ng of the purified PCR amplicon to a 19 µl total volume of 1X NEB2 buffer (New England Biolabs, cat. B7002S) and melt/re-anneal it using a PCR machine and the following protocol:

95 °C for 10 min.
95 °C to 85 °C (–3.0 °C/s).
85 °C to 25 °C (–0.3 °C/s).
22 °C hold.

• Digest the annealed PCR amplicons using the protocol below. Add 1 µl of T7 Endonuclease-1/well (New England Biolabs, cat. M0302L). Incubate for 15 min at 37 °C. Stop the reaction by adding 2 µl of 0.25 M EDTA. Purify the digested PCR products with the AMPure kit (Beckman Coulter, cat. A63880). Elute with 20 µl of TE buffer.
• Analyze the digested products by SDS PAGE. Mix the purified PCR digestions with an appropriate amount of loading buffer. Load 1/4 of the purified digestion on a 10 or 15% acrylamide gel (BioRad, cat. 456-5055). Run the gel for 1 h and 30 min in 1X TBE at 200 V. Stain the gel in 1X SYBR® Green (Sigma Aldrich, cat. S94301ML) in 1X TBE for 20 min in the dark. Visualize the DNA bands using a UV transilluminator at 250–300 nm. Quantify the intensity of the different products using appropriate image analysis software (such as ImageJ, available at rsbweb.nih.gov/ij/).
• Quantify the TALEN-induced mutagenesis. The amount of indels induced by the TALEN can be estimated according to the following formula: intensity of PCR product cleaved by T7 Endonuclease-1 / intensity of cleaved and uncleaved PCR product.

The sensitivity of the T7 Endonuclease-1 assay is expected to be around 1–5% of indels.

2.5.3. Screening by restriction test

After transfecting your biological system with the TALEN-encoding plasmid mix, a restriction test can be performed on the locus-specific PCR amplicons to estimate the mutagenesis activity of the TALEN.

• Extract genomic DNA and amplify the TALEN target locus according to the procedure described for the T7 Endonuclease-1 assay.
• Purify the PCR amplicons and digest them using an appropriate restriction enzyme.
• Run the digestion sample on an agarose gel.

The sensitivity of the restriction test is expected to be around 1% of indels for NHEJ-mediated gene inactivation and around 1% of insertion for HR-mediated gene inactivation.

2.5.4. Screening by high throughput DNA sequencing (deep sequencing)

The analysis of the targeted mutagenesis frequency due to TALEN activity, as well as the molecular characterization of the individual events, is performed by deep sequencing of locus-specific PCR amplicons using the 454 sequencing system.

• Amplify the genomic DNA using two primers containing adaptor sequences. One of these two primers should comprise two specific short sequences: (i) a key sequence for internal calibration and (ii) a multiplex identifier (also known as MID) that is unique for each PCR product. This MID serves as a ‘‘bar-code’’ for each locus-specific PCR product and thus allows parallel processing of multiple different samples (multiplexing). Depending on the minimal number of reads desired per sample (often >1000 to detect at least a frequency of mutagenesis events superior to 0.01%), up to 20 different samples are commonly mixed.
• Purify the PCR amplicons with the AMPure kit (Invitrogen).
• Analyze the purified PCR amplicons using the 454 sequencing system.
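The quantitative read-outs of the T7 Endonuclease-1 assay (Section 2.5.2) and of deep sequencing (Section 2.5.4) both reduce to simple ratios. The Python sketch below is illustrative only: the function names are hypothetical, band intensities are assumed to be background-subtracted values exported from an image analysis tool, and the detection floor is taken as one mutant read out of all reads analyzed, an assumption that happens to reproduce the 0.01–0.1% range quoted for 10,000 and 1000 reads.

```python
def t7_indel_fraction(cleaved_intensities, uncleaved_intensity):
    """Apply the formula of Section 2.5.2:
    cleaved / (cleaved + uncleaved) band intensity.

    `cleaved_intensities` is an iterable with the intensity of each
    T7 Endonuclease-1 cleavage product (usually two bands).
    """
    cleaved = sum(cleaved_intensities)
    total = cleaved + uncleaved_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return cleaved / total


def deep_seq_indel_frequency(indel_reads, total_reads):
    """Fraction of sequencing reads carrying an indel at the target locus."""
    if total_reads <= 0 or not 0 <= indel_reads <= total_reads:
        raise ValueError("read counts are inconsistent")
    return indel_reads / total_reads


def detection_floor(total_reads):
    """Smallest non-zero frequency observable with `total_reads` reads,
    assuming a single mutant read is the minimal event (sketch assumption)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1 / total_reads


if __name__ == "__main__":
    print(f"T7 estimate: {t7_indel_fraction([30, 20], 150):.1%}")
    print(f"Deep sequencing: {deep_seq_indel_frequency(140, 1000):.1%}")
    for reads in (1000, 10_000):
        print(f"{reads} reads -> floor {detection_floor(reads):.2%}")
```

Sequencing errors and PCR artifacts raise the practical detection limit above this theoretical floor, so an untransfected control sample remains necessary.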

Depending on the number of sequences analyzed (from 1000 to 10,000), the sensitivity of deep sequencing analysis is expected to range from 0.01% to 0.1%.

2.5.5. Screening by flow cytometry analysis

2.5.5.1. Detection of TCRα deficient Jurkat cells by flow cytometry.

3 days post transfection

• Harvest about 10⁵ Jurkat cells by centrifugation for 5 min at 300 g.
• Resuspend the cellular pellet in 30 µl of a PE-conjugated anti-TCRα/β antibody solution (1:30 in 2% FBS/PBS; Miltenyi Biotech, cat. 130-091-236) and incubate for 15 min at 4 °C in the dark.
• Wash the cells by adding 120 µl of 2% FBS/PBS and centrifuge for 5 min at 300 g. Then resuspend the pellet in 100 µl of 2% FBS/PBS and analyze the cells immediately by FACS.
• Perform the FACS analysis using a MACSQuant Analyzer flow cytometer (Miltenyi Biotech). Gate viable cells on the basis of their morphological characteristics (FSC and SSC parameters) and determine the efficiency of gene inactivation by analyzing the fluorescence intensity on the PE channel.

Depending on the number of cells analyzed, the flow cytometry screening sensitivity is expected to range from 1% to 5% of TCRα deficient cells.

2.5.5.2. Isolation of TCRα deficient Jurkat cells using magnetic separation. To purify TCRα negative cells, the cell population generated upon TALEN electroporation needs to be labelled with CD3 MicroBeads and loaded onto a MACS® LD-Column placed in the magnetic field of a MACS Separator (all from Miltenyi Biotech). The magnetically labeled CD3 positive cells are retained in the column while the unlabeled cells run through. The recovered fraction then constitutes a high purity population of TCRα negative cells.


2.5.5.3. Magnetic labeling of TCRα deficient cells.

• Centrifuge 10⁷ cells at 300 g for 5 min and resuspend the cellular pellet in 80 µl of labeling buffer (PBS pH 7.2, 0.5% BSA and FBS and 2 mM EDTA).
• Add 20 µl of CD3 MicroBeads, mix and incubate for 15 min at 4 °C.
• Wash the cells with labeling buffer, centrifuge for 5 min at 300 g and resuspend the pellet in 500 µl of labeling buffer.

2.5.5.4. Magnetic separation of CD3-labeled TCRα deficient cells.

• Place the LD-Column on the MACS Separator and rinse it with 2 ml of labeling buffer.
• Apply the cell suspension onto the column.
• Collect the unlabelled cells by recovering the flow-through, wash the column with 2 × 1 ml of labeling buffer and recover the flow-through.
• Centrifuge the cells at 300 g for 5 min, resuspend the cellular pellet in complete RPMI medium and grow them in their usual cell culture conditions.

3. Results and discussion

3.1. General rules for TALEN designs

3.1.1. Overview of TALEN architecture

The general TALEN architecture consists of a custom TALE DNA binding domain fused to the N-terminal end of the non-specific FokI nuclease domain. Because FokI needs to dimerize to perform a double strand break (DSB), TALEN work in pairs. Each unit of the pair binds to one of two adjacent binding sites that are separated by a spacer sequence and located on the sense and antisense strands, respectively (Fig. 1). Each binding site is generally composed of 14–19 nucleotides and must start with a thymine, also called T0, known to be absolutely required for efficient TALE/DNA binding [1,26,27]. Because FokI is non-specific, the sequence of the target spacer does not affect TALEN activity. However, as explained below, its length is critical and greatly depends on the TALEN scaffold variant used by investigators.

3.1.2. Choose the TALEN scaffold you want to work with and look for its optimal target

Over the past three years, numerous TALEN scaffold optimizations have been reported to improve TALEN activity. Such scaffold optimizations usually consisted in the reduction of the N- and C-terminal domains and, to a lesser extent, in the use of obligatory heterodimer FokI mutants. Whereas TALEN activity is not significantly enhanced by N-terminal truncations, it was reported to be markedly improved by several different C-terminal domain truncations [9,16,28,29].
Among the most widely used ones are the truncations leaving the first 18, 23, 28 or 63 amino acids of the C-terminal domain. The activity of each of these C-terminal TALEN scaffold variants is intimately linked to the length of the target spacer, and optimal combinations are thoroughly documented [9,16,28,29]. Among others, we have developed two different scaffolds encompassing the well documented N-terminal truncation named NΔ152 and two C-terminal truncations leaving the first 11 or 40 amino acids. These two scaffolds, named NΔ152/C11 and NΔ152/C40, were found optimal for 11 and 15 bp target spacers respectively, and their success rate is similar to those already reported in the literature. Indeed, among 249 TALEN tested in HCT116 mammalian cells and analyzed by deep sequencing, 75% displayed more than 1% of insertion and deletion (indel) events, with a mean and a maximum activity of 14% and 46% of indels respectively. Thus, the take-home message is: once a given TALEN scaffold variant is chosen, the first parameter to set is the length of the target spacer.

3.1.3. Be aware of the rules to design a TALEN and know their limits

Once an optimal TALEN scaffold/target spacer combination is chosen, additional rules and parameters should be taken into account to maximize the success rate of gene editing. Several rules were reported to modulate TAL effector and TALEN efficiency. On the basis of computational analysis of natural TAL effectors bound to their cognate targets as well as yeast assays, Cermak et al. proposed to avoid TALEN target sites bearing T and A at positions 1 and 2 of the TALE DNA binding domain respectively. They also advised keeping a T at the last position of the target site [29]. In addition, based on surrogate transcription assays, Streubel et al. reported that a target site should contain at least three to four repeats that bind C or G, whereas stretches of repeats that interact with A or T should be avoided [30]. These rules were challenged by large scale in vivo assessments of TALEN activity in U-2 OS cells and in zebrafish using 48 and 34 different TALEN respectively [15,31]. These studies failed to detect any correlation between the rules delineated above and TALEN activity. Accordingly, by investigating more than 900 different TALEN/DNA combinations, we could not observe any significant correlation between TALEN activity and the presence of a T at the last position of the TALEN target site (Supplementary Fig. 2A). In addition, we found only a weak, albeit statistically significant, correlation (cor = 0.12, p value = 0.01) between TALEN activity and the number of stretches of A and T nucleotides (≥6 nucleotides, Supplementary Fig. 2B). However, we found that TALEN activity was affected by the presence of at least 5 identical and consecutive nucleotides within the TALEN binding site, independently of the nature of the TALE repeat (Supplementary Fig. 2C).
In addition, we found that the presence of an NI-NI di-repeat (targeting AA dinucleotides) at the first and second positions of the TALEN DNA binding domain could reduce TALEN activity [32]. Even though some active TALEN have been reported to contain such sequences in their target site [33], they could be suboptimal. We thus prefer to exclude them from our hit search to maximize the chances of generating highly active TALEN. Thus, for a given genomic locus, none of our TALEN target hits contains AA dinucleotides at the first two positions or stretches of consecutive identical nucleotides (≥5 nucleotides).
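
The design rules of Sections 3.1.1 to 3.1.3 can be summarized as a small hit search. The following Python sketch is only an illustration and not the proprietary pipeline used in this work: the names `find_talen_hits`, `TalenHit` and `passes_site_rules` are hypothetical, the default binding site length of 15 nucleotides is an arbitrary value within the 14–19 range, and T0 is assumed not to be counted in that length. It scans a sense-strand sequence for a left T0, a left binding site, a spacer of the length matched to the chosen scaffold, and a right binding site read on the antisense strand, and discards sites that start with AA or contain at least 5 identical consecutive nucleotides.

```python
import re
from typing import Iterator, NamedTuple

# Optimal spacer lengths for the two scaffolds named in Section 3.1.2.
SPACER_BP = {"ND152/C11": 11, "ND152/C40": 15}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Any run of at least 5 identical nucleotides.
_HOMOPOLYMER = re.compile(r"A{5,}|C{5,}|G{5,}|T{5,}")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def passes_site_rules(site: str) -> bool:
    """Apply both exclusion rules to a binding site read 5'->3' (T0 excluded)."""
    if site.startswith("AA"):        # NI-NI di-repeat at positions 1 and 2
        return False
    if _HOMOPOLYMER.search(site):    # stretch of >= 5 identical nucleotides
        return False
    return True


class TalenHit(NamedTuple):
    start: int        # 0-based index of the left T0 on the sense strand
    left_site: str    # left binding site, sense strand, 5'->3'
    spacer: str
    right_site: str   # right binding site, antisense strand, 5'->3'


def find_talen_hits(seq: str, scaffold: str = "ND152/C11",
                    site_len: int = 15) -> Iterator[TalenHit]:
    """Yield every left/right binding site pair in `seq` that passes the rules."""
    seq = seq.upper()
    spacer_len = SPACER_BP[scaffold]
    # Footprint on the sense strand:
    # T0 + left site + spacer + right site + complement of the right T0.
    footprint = 1 + site_len + spacer_len + site_len + 1
    for i in range(len(seq) - footprint + 1):
        if seq[i] != "T":                    # left T0
            continue
        if seq[i + footprint - 1] != "A":    # right T0 seen from the sense strand
            continue
        spacer_start = i + 1 + site_len
        right_start = spacer_start + spacer_len
        left = seq[i + 1:spacer_start]
        spacer = seq[spacer_start:right_start]
        right = reverse_complement(seq[right_start:i + footprint - 1])
        if passes_site_rules(left) and passes_site_rules(right):
            yield TalenHit(i, left, spacer, right)
```

Calling `list(find_talen_hits(locus_sequence, scaffold="ND152/C40"))` on a hypothetical `locus_sequence` string would return the candidate pairs with a 15 bp spacer. The rules that showed no or weak correlation with activity in our hands (T at the last position, A/T stretches) are deliberately left out of this sketch.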

Fig. 1. General TALEN architecture. Schematic representation of the left and right TALEN bound to their DNA target, which consists of left and right target sites separated by a spacer sequence. The canonical RVD/nucleotide association is indicated on the right part of the figure.
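
The canonical RVD/nucleotide association shown in Fig. 1 (NI for A, HD for C, NN for G and NG for T) amounts to a simple lookup. The Python sketch below is an illustration under stated assumptions: the function name `rvd_array` and its `methylated` argument are hypothetical, positions are numbered from +1 after T0, and the optional replacement of HD by N* at 5-methylated cytosines, capped at 2 substitutions per arm, anticipates the strategy described in Section 3.1.4.3.

```python
from typing import Iterable, List

# Canonical RVD/nucleotide association (one repeat per target nucleotide).
CANONICAL_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}

# Upper limit of HD/N* substitutions per TALEN arm quoted in Section 3.1.4.3.
MAX_SUBSTITUTIONS_PER_ARM = 2


def rvd_array(site: str, methylated: Iterable[int] = ()) -> List[str]:
    """Return the list of RVDs for a binding site read 5'->3', T0 excluded.

    `methylated` holds the 1-based positions of 5-methylated cytosines;
    the HD repeat at each of those positions is replaced by N*.
    """
    site = site.upper()
    positions = set(methylated)
    if len(positions) > MAX_SUBSTITUTIONS_PER_ARM:
        raise ValueError("more than 2 HD/N* substitutions per arm is untested")
    if any(pos < 1 or pos > len(site) for pos in positions):
        raise ValueError("methylated position outside the binding site")
    rvds = []
    for pos, base in enumerate(site, start=1):
        if pos in positions:
            if base != "C":
                raise ValueError(f"position +{pos} is {base}, not a cytosine")
            rvds.append("N*")
        else:
            rvds.append(CANONICAL_RVD[base])
    return rvds
```

For instance, `rvd_array("TCGA", methylated=[2])` describes a four-repeat array in which the repeat facing the methylated cytosine at position +2 is N* instead of HD. Real binding domains are longer (14–19 repeats); the short site is used only to keep the example readable.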

Please cite this article in press as: J. Valton et al., Methods (2014), http://dx.doi.org/10.1016/j.ymeth.2014.06.013

J. Valton et al. / Methods xxx (2014) xxx–xxx

3.1.4. Consider the epigenetic status of the locus to edit and tune your TALEN design accordingly

Although TALEN have been successfully used to engineer genomes in different cellular systems, we and others have recently demonstrated that the DNA binding capacity of engineered TALE could be significantly affected by the presence of 5-methylated cytosine (5mC) in their endogenous cognate target [16,31,34–37]. Such sensitivity has been rationalized by different structural studies of TALE/DNA complexes [38]. They indicated that the methyl moiety of 5mC was likely to generate a steric clash with the TALE repeat HD, commonly used to target cytosine. Preliminary data showed that TALEN sensitivity to 5mC depends on the targeted DNA sequence. We indeed found that the basal activity of conventional TALEN toward mono- and di-methylated loci was variable [36]. Such sensitivity also depends on the amount of 5mC present within the target site. Different alternative methods could be used to bypass the inhibitory effect of cytosine methylation.

3.1.4.1. Seek CpG free TALEN targets. The first obvious alternative is to choose a TALE DNA binding site free of any CpG. This method was successfully used by Kim et al. to process the two methylated loci CYP27B1 and FGFR3 present in mammalian HEK293T cells [16]. Such a method could be readily applied to differentiated mammalian cells. However, one should be aware that it is less likely to be applicable to other species such as those of the plant kingdom, where 5mC is not only present in CpG but also in CpA, CpC and CpT dinucleotides.

3.1.4.2. Use demethylating agents. When one has no other option but to target a heavily methylated locus, a second alternative strategy, relying on the use of a demethylating agent such as 5-aza-dC, could be used. This approach was found to enhance the ability of dTALE [39] and TALEN to respectively activate the transcription of the oct4 pluripotency gene or to process different methylated loci [16,36].
However, when opting for such an approach, one should be aware of the well-known pleiotropic cytotoxicity of demethylating agents [40], which may alter genome integrity and probably cell fate. The use of demethylating agents could thus be risky for therapeutic and biotechnological gene editing applications that require a high degree of safety.

3.1.4.3. Design TALE DNA binding domains insensitive to 5mC. When the use of a demethylating agent is ruled out, a third alternative approach could be considered. It consists in using engineered TALE DNA binding domains insensitive to cytosine methylation. Their engineering is based on the substitution of the conventional TALE repeat HD by other naturally occurring TALE repeats harboring affinity for 5mC. Among the TALE repeats tested, NG was found in vitro and in vivo to efficiently accommodate 5mC and, as expected, displayed a much lower affinity for C [36,37]. This property implies that the NG TALE repeat could be used only if the locus to process is known to be methylated. Thus, for successful use of the NG TALE repeat, the methylation status of the target locus must be determined by bisulfite sequencing before designing the appropriate TALE DNA binding domain. The TALE repeat N* has also been shown to accommodate 5mC [36]. Whereas conventional TALE repeats harbor 34 amino acids, N* is composed of 33 amino acids. This non-canonical sequence translates into a peculiar RVD loop structure that extends less deeply into the DNA major groove and allows it to accommodate 5mC [36]. Interestingly, in contrast with the NG TALE repeat, N* was found to accommodate C to a similar extent as 5mC. The ability of TALE repeat N* to successfully accommodate 5mC has been assessed at multiple loci containing one or two 5mC at different positions of the TALEN binding site. We indeed reported that


5mC at positions +2, +3, +6 and +15 was successfully accommodated by TALE repeat N* and found similar results for positions +1, +4, +8 and +9 (unpublished data). In addition, HD substitution by N* did not increase the cellular toxicity of TALEN under our experimental conditions. Thus, to design a methyl insensitive TALE DNA binding domain, one could substitute HD by N* at any position of the TALE DNA binding site. To our knowledge, up to 2 HD/N* substitutions could be performed per TALEN arm. Beyond that limit, the benefit of such substitutions still needs to be assessed. Other naturally occurring and engineered TALE repeats could be used to accommodate 5mC. Generation of TALEN insensitive to DNA methylation is illustrated in Example #1.

3.1.5. Assess potential offsite target sequences

Once a set of relevant targets has been selected, potential off-target sites should be carefully identified. To do so, several web-based off-target site mapping tools are available [41–44]. We systematically perform this analysis using proprietary software and exclude any TALEN showing off-target sites containing fewer than a total of 5 mismatches with respect to the recognition site and a spacer length ranging from 9 to 30 bp. These "high risk" off-target sites are rare in coding sequences but can be found, especially when pseudogenes are present or when the target sequence corresponds to a recurrent protein domain (e.g. highly conserved catalytic sites such as kinase active sites).

3.2. TALEN-mediated gene inactivation

As indicated earlier, the binding of a TALEN pair to its endogenous target promotes the formation of an efficient FokI dimer that catalyzes a DSB approximately at the center of the target spacer. Such a DSB is recognized and repaired by two different pathways named non-homologous end joining (NHEJ) and homologous recombination (HR).
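
Before turning to these two pathways, the off-target exclusion criterion of Section 3.1.5 can be written down explicitly. The Python sketch below is a simplified illustration and not the proprietary software mentioned above: the names `Candidate`, `mismatches` and `is_high_risk` are hypothetical, candidate genomic sites are assumed to have been found and aligned beforehand, and only the left/right heterodimer configuration is considered.

```python
from typing import NamedTuple

MIN_TOTAL_MISMATCHES = 5     # fewer mismatches than this counts as high risk
SPACER_RANGE_BP = (9, 30)    # spacer lengths taken into account


class Candidate(NamedTuple):
    """A putative off-target site found elsewhere in the genome."""
    left_site: str     # half-site aligned to the left binding site
    right_site: str    # half-site aligned to the right binding site
    spacer_len: int    # distance between the two half-sites, in bp


def mismatches(a: str, b: str) -> int:
    """Count mismatched positions between two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def is_high_risk(target_left: str, target_right: str,
                 candidate: Candidate) -> bool:
    """Return True if the candidate meets the exclusion criterion."""
    total = (mismatches(target_left, candidate.left_site)
             + mismatches(target_right, candidate.right_site))
    shortest, longest = SPACER_RANGE_BP
    return (total < MIN_TOTAL_MISMATCHES
            and shortest <= candidate.spacer_len <= longest)
```

A TALEN would then be discarded as soon as `is_high_risk` returns `True` for any candidate site. A fuller analysis would probably also test left/left and right/right homodimer configurations, which this sketch omits.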
As explained and exemplified below, these two endogenous pathways can be exploited to create sequence alterations at the DSB site. For gene inactivation purposes, each of them has pros and cons and should be chosen depending on the end goal of the project, screening capacity and timelines.

3.2.1. TALEN-mediated gene inactivation via error-prone NHEJ pathway

In the vast majority of cases, the NHEJ pathway is harnessed by investigators to perform TALEN-mediated gene inactivation. A site specific DSB is generated within the coding sequence of a gene of interest (GOI) and is eventually repaired imprecisely via insertions or deletions (indels) of multiple nucleotides (Fig. 2A). Such indels usually lead to gene inactivation by introducing premature stop codons or by shifting the translation frame. Several easy to implement screening strategies could be employed to determine the extent of indels created at the target site (Fig. 2C). Among them are the T7 Endonuclease-1 assay (and its counterpart, the CEL1 Surveyor assay [45]), the site specific restriction test [46] and high resolution melting analysis (HRMA, [47]). All of them rely on a locus specific PCR amplification followed by an indirect analysis of PCR amplicon homogeneity. On the one hand, the T7 Endonuclease-1 assay and HRMA detect, either enzymatically (cleavage) or physico-chemically (melting curve), mismatched amplicon duplexes generated after a denaturation/renaturation process. On the other hand, the restriction test detects the loss of an endogenous restriction site located within a given TALEN target site. While the signals generated by these three methods allow quantitative or semi-quantitative assessment of indels, they do not allow determination of their precise molecular signature. Such molecular signature can however be alternatively


Fig. 2. General TALEN-mediated gene inactivation strategies and their associated genotypic screening methods. (A) and (B) Schematic representation of TALEN-mediated gene inactivation via the error-prone NHEJ and HR-mediated pathways respectively. (C) and (D) Screening strategies used to indirectly assess the extent of indels generated at the TALEN target site. Expected profiles of the T7 Endonuclease-1 assay and restriction test obtained after analyzing wild-type (WT) or mutant (M) cell populations via electrophoresis. The expected melting curve patterns obtained by HRMA are also shown. (E) Example of deep sequencing analysis of PCR amplicons from a cell population obtained after TALEN-mediated gene inactivation. (F) Legend describing the different genetic elements illustrated in the overall figure.


defined and quantified by high throughput DNA sequencing (also known as deep sequencing) of PCR amplicons (Fig. 2E). Besides the genotypic screening methods delineated above, in some particular cases one can also directly assess the extent of gene inactivation by phenotypic screening. Such phenotypic screening is illustrated in Example #2 (TALEN-mediated TCRα inactivation via error-prone NHEJ).

3.2.2. TALEN-mediated gene inactivation via HR pathway

The general approach delineated above to specifically inactivate GOIs relies on the generation of imprecise indels at a specific locus. However, because the nature of indels cannot be controlled, small insertions or deletions of 3 bp or multiples of 3 bp could be generated without affecting the translation frame or generating a stop codon. Such events could still lead to translation of an active protein, and there is no guarantee of gene inactivation. There is however an alternative approach to fully control inactivation of a protein function. It consists in inserting stop codons downstream from its initial start codon (Fig. 2B). This approach, hereafter named HR-mediated gene inactivation, can be achieved by using a TALEN specific for the locus to inactivate along with a single strand oligonucleotide (ssODN) integration matrix bearing multiple out of frame stop codons flanked by homology arms. In this case, the double strand break generated by the TALEN can be harnessed to insert the oligonucleotide via HR. By this means, insertion of stop codons at a desired locus allows precise control of the molecular events altering gene translation and function. This methodology is exemplified in Example #3. Two different strategies could be considered to easily screen for ssODN insertion at the intended target site. They both require PCR amplification of the edited locus.
Once generated, PCR amplicon size could be determined on a simple agarose or polyacrylamide gel and compared to that of amplicons generated from untreated cells (Fig. 2D). Alternatively, one could introduce a restriction site adjacent to the stop codons present in the ssODN, and use it to perform a restriction test of the PCR amplicon. In that case, the appearance of digestion products would indicate a locus specific insertion of the ssODN. Deep sequencing analysis of PCR amplicons could also be performed to verify the integrity of the inserted ssODN sequence as well as of the sequences adjacent to the TALEN target site (Fig. 2E). Again, besides the genotypic screening methods delineated above, a direct assessment of gene function inactivation could be done via phenotypic screening.

3.2.3. Relevant genomic locus for efficient TALEN-mediated gene inactivation via NHEJ or HR pathways

Once a given gene inactivation methodology has been chosen (NHEJ or HR-mediated inactivation), the genomic organization of the locus of interest should be investigated in silico before seeking relevant TALEN targets. This includes identification and localization of key genomic elements such as start and stop codons, exons and introns, and domains important for protein function. Several databases could be used to perform such an analysis. Among the most documented ones are the UniProt, Ensembl and ENCODE databases. Once identified, these key genomic elements could be exploited to choose relevant TALEN target sites in an educated manner. Depending on the amount of information gathered in silico, different target sites could be chosen. These include target sites located at the start codon or in the downstream proximal region, in domains important for protein function, or in the exon/intron interface regions (Fig. 3). Choosing a TALEN target site close to the start codon of a gene of interest is the most obvious design to inactivate it.
Such an approach is likely to promote disruption of the start codon or to generate a translation frame shift and premature stop codons that should, in


theory, lead to successful gene inactivation. Since the double strand break (DSB) induced by TALEN occurs in the middle of the target spacer (cf. Fig. 1), the center of the TALEN target site should be located slightly downstream of the start codon to maximize the odds of generating a frame shift. It is however noteworthy that, in some particular cases, generating a frame shift close to a start codon may not be the best strategy to inactivate a gene of interest. Indeed, if a start codon is present immediately downstream from the frame shift induced by the TALEN, it could be used as an alternative translation start site, leading to the production of a truncated protein isoform [48]. Alternative translation initiation is indeed known to be a natural mechanism allowing the generation of several eukaryotic protein isoforms [49]. Thus, to avoid this issue, we usually check for the presence of alternative start codons within the first 120 bp of the transcript and, whenever possible, choose a TALEN target site located immediately downstream from them. In addition, when the GOI has several transcripts that do not share the same start codon, we try to set the TALEN target on the first common coding exon. Alternatively, one can choose a TALEN target site located in subdomains identified to play key roles in the catalytic, folding and interaction properties of the protein of interest, or in those involved in the recruitment of key protein partners, among others. Again, designing a TALEN able to bind upstream of or within those sequences could lead to the generation of a truncated and inactive protein. However, when opting for such a strategy, one should be aware that the redundancy and high percentage of identity of such domains (e.g. kinase active sites, SH3/SH2 docking sites) may lead to potential off-site activity of TALEN, as discussed earlier. Finally, one can choose a TALEN target site at the exon/intron interfaces of the gene of interest.
Targeting such regions is highly likely to disrupt donor and acceptor splice sites, eventually leading to aberrant splice variants and to the production of inactive protein. Again, when a GOI displays several different transcripts, such a target site should be common to all of them to maximize the efficiency of TALEN-mediated gene inactivation.

3.2.3.1. Example #1. TALEN-mediated inactivation of XPC gene using methyl insensitive TALE DNA binding domain (genotypic screening). To exemplify the TALEN-mediated inactivation of methylated genes, we chose to describe the methodology we set up to generate methyl insensitive TALEN able to inactivate the monomethylated XPC locus. The rational design and/or selection of methyl insensitive TALE repeats is also described. At first, the methylation status of the XPC locus prevented us from finding a relevant 5mC-free TALEN target close to the locus to edit. We thus identified a set of targets bearing the smallest number of 5mC, selected the XPC1 target containing one 5mC at position +2 (Fig. 4A and Supplementary Table 3) and undertook the design of methyl insensitive TALEN. Our design rationale was based on the hypothesis that substituting the TALE repeat HD by a non-canonical TALE repeat harboring a dual specificity toward C and T would generate a TALE DNA binding domain able to accommodate 5mC. We envisioned that such dual specificity would allow the TALE repeat to cope with 5mC because of its structural similarities to C and T. We thus selected such TALE repeats from the collection of natural and engineered TALE repeats (N*, H*, NG, HG [27,50] and Q*, T* from the Cellectis proprietary collection) and inserted them at position +2 of the left TALE DNA binding domain of the XPCT1 TALEN, designed to target the XPC1 endogenous methylated sequence (Fig. 4A). TALEN variants (Supplementary Tables 1–2) were first tested for their ability to cleave the unmethylated XPC1 target using a yeast single strand annealing assay (SSA, see M&M section).
The same test was performed with 3 different XPC1 target isoforms bearing either A, G or T at position +2, to indirectly define the


Fig. 3. Choice of TALEN target sites to inactivate a gene of interest.

overall specificity pattern of non-canonical TALE repeat variants (Fig. 4B). Our results showed that all TALEN harboring X* TALE repeats (Q*, H*, T* and N*) showed an activity bias toward T while retaining significantly high activity toward A, C and G. TALEN harboring XG TALE repeats (NG and HG) also showed strong activity toward T but a much lower one toward A, C, and G. These activity patterns contrasted with the one obtained with the HD control variant, which lacked activity toward T and G while displaying strong and medium activities toward C and A respectively (Fig. 4B). TALEN variants were then tested for their ability to disrupt the endogenous methylated XPC1 target in 293H cells via error-prone NHEJ. 293H cells were independently transfected with the different XPCT1 TALEN variants and, three days post transfection, genomic DNA was extracted and amplified by PCR using XPCT1 locus specific primers. High throughput DNA sequencing analysis of PCR amplicons showed that the conventional HD TALEN variant displayed a negligible amount of indels, consistent with its inability to accommodate 5mC. In contrast, all the XG and X* variants were able to process the XPCT1 target, with a notable advantage for the latter (Fig. 4C). These data indicated a correlation between the ability of TALE repeats to accommodate 5mC and their broad specificity toward A, C, G and T, with a preference for T. Together, our results showed that the use of 5mC-insensitive TALEN enables efficient processing of methylated genes.

3.2.3.2. Example #2. TALEN-mediated TCRα inactivation via error-prone NHEJ (phenotypic screening). In some particular cases, inactivation of certain proteins can be assessed phenotypically a few days post transfection without having to rely on genotypic screening. To exemplify such phenotypic screening, we chose to describe the methodology we used to inactivate the T cell receptor alpha chain (TCRα).
TCR is a disulfide-linked membrane-anchored heterodimer consisting of two highly variable chains (α and β) embedded in a multimeric complex involving CD3 chain molecules. When one of the highly variable chains is knocked out, oligomerization and plasma membrane addressing of the TCR are impaired. Such impairment results in the depletion of the cell surface TCR complex, which can be easily quantified by flow cytometry using TCR specific fluorescent antibodies. To inactivate the alpha chain of the TCR, we first designed and generated a pair of TALEN targeting the first exon of the constant segment of the TCRα gene (Fig. 5A, Supplementary Tables 1–3). Such a design allowed targeting every T-cell, independently of its specificity, since the targeted sequence is an exon common to all mature TCR molecules. Plasmids encoding right and left TALEN were electroporated into Jurkat cells using AgilePulse technology as described in M&M. Three days post electroporation, cells were recovered and analyzed by flow cytometry using TCRαβ specific fluorescent antibodies. Our results showed that about 10% of cells lacked surface expression of the TCRαβ complex 3 days post electroporation (Fig. 5B). This success rate could be improved by a factor of about 3 by means of a second electroporation performed 4 days after the first one. Indeed, by doing so, the efficiency of TCRα inactivation increased from 10% to about 28%. To purify TCRα deficient cells, the cell population obtained after TALEN electroporation was labeled with CD3 MicroBeads, loaded onto a MACS® LD-Column and then placed in the magnetic field of a MACS Separator (all from Miltenyi Biotec). Under these conditions, the magnetically labeled CD3 positive cells are retained in the column while the unlabeled cells (TCRα deficient cells) run through. The flow-through was recovered and analyzed by flow cytometry using the protocol described in the M&M section. Our results indicate that TCRαβ negative cells were successfully purified to homogeneity (Fig. 5B). Taken together, our results showed that TALEN-mediated processing of TRAC, followed by a simple purification process, made it possible to generate and isolate a homogeneous population of TCRα deficient lymphoid cells within about 10 days.

3.2.3.3. Example #3. TALEN-mediated GSK3β inactivation via HR (genotypic screening). The HR-mediated gene inactivation approach was used to precisely insert 4 stop codons downstream of the initial ATG of the GSK3β gene, which encodes a serine–threonine kinase used here as a model target. To do so, we designed an ssODN encompassing right and left homology sequences (50 nucleotides each) flanking a 26 nucleotide sequence containing 4 out of frame stop codons (Fig. 6A). For screening purposes, a NotI restriction site was inserted upstream from the stop codons. HCT116 cells (a colorectal cancer cell line), used here as a cellular model, were transfected with 5 µg of right and left TALEN encoding plasmids along with 100 pmol of ssODN matrix using FuGENE®. Three days post transfection, genomic DNA was extracted and used to perform a locus specific PCR. PCR amplicons (495 bp) were then subjected to a restriction test using NotI endonuclease and separated by agarose gel electrophoresis (Fig. 6B). Restriction results showed that cells transfected with TALEN and ssODN displayed two restriction bands that were not present in the two control experiments obtained from cells transfected with TALEN or ssODN alone. This result indicated that the ssODN matrix was successfully inserted at the intended locus in the presence of GSK3β TALEN. To confirm ssODN insertion at the GSK3β locus, a gel shift assay was performed. To do so, the GSK3β locus was amplified with a second pair of oligonucleotides to generate a short PCR amplicon (122 bp), allowing discrimination of wt versus mutant alleles on a regular agarose gel (Fig. 6C). Results showed the appearance of an additional band of higher molecular weight in cells transfected with TALEN and ssODN, again consistent with successful insertion of the ssODN matrix at the GSK3β locus. This insertion was further characterized by deep sequencing of locus-specific PCR amplicons. Deep sequencing results revealed that among approximately 1400 sequences analyzed, 3.2% contained the expected additional 26 bp sequence (NotI-stop-X-stop-X-stop-X-stop, Fig. 6D), a frequency suitable for easy identification and isolation of clones inactivated for


Fig. 4. Ability of engineered and naturally occurring TALE repeats to overcome TALE DNA binding domain sensitivity to 5-methylated cytosine. (A) Schematic representation of the XPCT1 TALEN variants used for this study. The sequence of the XPCT1 DNA target is indicated, the 5-methylated cytosine (5mC) located at position +2 of the left target is indicated by a dot and the 11 bp spacer between the right and left targets is colored in blue. Different natural and engineered TALE repeats were inserted at position +2 of the left TALE DNA binding domain to assess their ability to accommodate C, A, G, T in yeast and 5mC in mammalian 293H cells. (B) Intrinsic nuclease activity of XPCT1 TALEN variants investigated by single strand annealing assay (SSA) in yeast using extrachromosomal XPC1 targets bearing C, A, G or T at position +2. Activity measurements were used to generate a logo that represents the relative activity of a given TALEN variant toward the four different target isoforms. (C) Frequency of indels induced by 10 µg of XPCT1-HD, NG, HG, N*, H*, Q* and T* TALEN encoding plasmids in 293H cells, determined by deep sequencing of locus specific PCR amplicons. The results shown in this figure were obtained from ≥2 experiments.


Fig. 5. TALEN-mediated TCRα gene inactivation. (A) The TCRα genomic locus (TRA) is shown, with the different variable (TRAV), diversity (TRAD), joining (TRAJ) and constant (TRAC) exons indicated as vertical bars, as well as the position of the first exon of the constant segment targeted by the TALEN (cyan). (B) Flow cytometry analysis of Jurkat cells electroporated once or twice in the presence or absence of TALEN™. Viable cells were gated on the basis of their morphological characteristics (FSC and SSC parameters), and the efficiency of TCRα inactivation was determined by analyzing the fluorescence intensity upon labeling with a PE-conjugated anti-TCRαβ antibody. Efficiencies of inactivation obtained 3 days after the first or second electroporation (EP) are indicated along with the transfection efficiency determined via electroporation of a BFP encoding control plasmid. (B, lower panel) Flow cytometry analysis of TCRα deficient cells purified by negative selection using anti-CD3 magnetic beads.

GSK3β. Interestingly, small indel events were also observed at similar or lower frequencies, indicating a competition between NHEJ and HR-dependent DNA modification events, slightly in favor of the latter. To isolate clones inactivated for GSK3β, the transfected cell population was seeded under cloning conditions onto a 10 cm dish at 300 cells/plate and allowed to grow for 10 days. Individual colonies were then picked and screened using the locus-specific PCR restriction test described above to identify mono- or bi-allelically inactivated clones. Our results showed that among 96 colonies screened, 3 displayed a positive restriction test. Further characterization of these 3 clones by deep sequencing allowed us to isolate one clone harboring a bi-allelic gene inactivation, thus representing a success rate of total inactivation of about 1%. Taken together, our results


Fig. 6. TALEN-mediated GSK3β gene inactivation. (A) Sequence of the GSK3β genomic locus targeted by the GSK3 TALEN and of the ssODN used to inactivate it via HR. The left (gray) and right (cyan) homology sequences flanking a 26 nucleotide sequence containing the NotI restriction site and 4 out of frame stop codons (upper case, red) are indicated. (B) Screening of the transfected cell population by restriction of locus-specific PCR amplicons using NotI endonuclease. The expected sizes of digestion products obtained after ssODN integration at the gsk3β locus are indicated. (C) Screening of the transfected cell population by a gel shift assay. Short locus-specific PCR amplicons, obtained from cells transfected either with ssODN alone or in combination with GSK3 TALEN, were separated by electrophoresis. The expected sizes of amplicons obtained from wild-type (wt) and GSK3β inactivated cell populations are indicated. (D) Deep sequencing analysis of locus-specific amplicons. The wild-type sequence is displayed above the most frequent deletion (Del) or insertion (Ins) events observed. Frequencies of indel occurrence are indicated.

showed that controlled gene inactivation could efficiently be achieved via HR by using TALEN in combination with an ssODN matrix bearing stop codons.

3.3. TALEN-mediated gene integration

3.3.1. Overview

Gain of function studies require the stable integration of a transgene bearing a GOI and the elements necessary to drive its expression (promoter, terminator). A plasmid bearing the GOI expression cassette is transfected into the host cells and, usually, a second expression cassette containing a drug resistance gene is added to the plasmid, allowing the selection of integrants. Random integration has been widely used in the past decades but suffers from many drawbacks such as uncontrolled site of integration, uncontrolled transgene copy number and instability of transgene expression. These drawbacks were recently overcome by the development of engineered nucleases such as TALEN, which enable efficient targeted integration of a single transgene copy at specific loci of a given genome. Loci of interest for such applications are the so-called safe harbor loci (Fig. 7). They are defined as transcriptionally active genomic regions that can accommodate exogenous transgene expression cassettes without impairing the transcription of neighboring genes. Such characteristics were shown to be fulfilled by the human AAVS1, CCR5 and SH6 loci [51–53]. In addition, they were partially fulfilled by the mouse ROSA26 locus, which has been extensively used as a site of choice for ubiquitous transgene expression [53]. Furthermore, the tremendous diversity of sequences attainable with the TALEN technology allows precise gene modifications such as endogenous gene tagging (Fig. 7). Indeed, since it is possible to design a specific TALEN approximately every 5 nucleotides, the targeted integration of a fluorescent protein in frame with the coding sequence of a given gene is nowadays easily achievable. Such targeted integration could be performed at the 5′ or 3′ end of the coding sequence as well as inside some of its specific domains. Endogenous gene tagging displays several main advantages over conventional methodologies that rely on the transient expression of a protein of interest fused to a fluorescent marker. First, it does not require the antibiotic selection that was originally required to maintain an ectopic plasmid throughout cell divisions. Second, it generates a tagged protein that is still under the control of endogenous regulatory elements including promoters, enhancers or insulators. Such physiological regulation of protein expression makes the fluorescence signal output more relevant to investigate protein function and localization. Finally, TALEN could also be used to promote gene replacement or gene correction (Fig. 7). Such applications are particularly useful to generate cell line models for drug screening purposes by introducing specific mutations associated with different diseases. They could also be used for gene augmentation or gene correction therapies, as recently described [20–22]. In summary, 3 main types of targeted gene integration can be successfully promoted by TALEN. For the sake of space, only two of these approaches, gene integration at a safe harbor locus and gene tagging, will be developed in the next section. Their specific experimental design, including integration matrix architectures and TALEN designs as well as screening strategies, will be discussed and exemplified.

3.3.2. TALEN design for gene integration at safe harbor loci

3.3.2.1. Design of integration matrix. As a general design, a matrix used for targeted integration is a circular DNA plasmid containing a left homology arm, a sequence of interest and a right homology


Fig. 7. General strategies of gene integration. Schematic representation of TALEN-mediated targeted gene insertion (top), targeted gene tagging (middle) and targeted gene replacement or gene correction (bottom). The different genetic elements potentially present in integration matrices, including the promoter (prom), the gene of interest (GOI) and a resistance marker such as neomycin resistance (NeoR), are indicated.

arm. A suicide gene is generally added outside the homology arms to discard unwanted insertion events such as multi-copy targeted integrations or a combination of targeted and random integrations (Fig. 8A). The homologous sequences should be at least 500 bp long on each side. It is highly recommended to sequence the genomic regions of homology of the host cell before any synthesis or cloning step, since mismatches in the homologous sequences could dramatically impair the efficiency of homologous recombination. Regarding the size of the sequence of interest to be integrated, we have been able to insert more than 8 kb of exogenous sequence without loss of targeting efficiency. Thus, two expression units can easily be integrated. As an example, a selection cassette and a GOI expression cassette could be inserted between the two homology arms (Fig. 8A). Since the aim is to integrate an autonomous expression unit within a genomic region of about 200 bp (safe harbor locus), it is possible to design several different TALEN within this region. For that particular application, the TALEN of choice should be selected on the basis of its activity rather than its position. The homology arms of the integration matrix should then be designed to accommodate the cleavage site of the selected TALEN.

3.3.2.2. Targeted integration screening strategies. To efficiently screen for correct targeted insertion events, it is important to set up locus specific left and right PCR screening assays (Fig. 8B and C). In addition, a third PCR screening assay, specific for a matrix domain that is not supposed to be inserted, should also be set up. Such screening is intended to identify clones that underwent random or multicopy integration of the matrix. We usually amplify the matrix replication origin for this purpose. To set up these assays, a construct mimicking the correct integration at the locus of interest should be generated (Fig. 8B).
This construct should contain longer homology arms (about 200 bp longer than those present in the actual integration matrix) in order to test and select relevant screening primer pairs that will be used in combination with primers specific for the intervening sequence (Fig. 8B). To assess PCR screening efficiency and sensitivity, the construct is usually diluted in genomic DNA (serial dilutions from 10 copies to 0.001 copies per genome) and PCR is performed. A PCR screening protocol is considered relevant when it can detect as few as 0.01 copies per genome.
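The spike-in amounts for such a dilution series follow from simple proportionality: for a fixed mass of genomic DNA, the mass of control construct giving n copies per genome is n times the genomic DNA mass times the ratio of construct size to genome size. The Python sketch below illustrates this calculation. The function names, the 10 kb construct size, the 100 ng of genomic DNA per reaction and the 3.1 × 10^9 bp haploid genome size are illustrative assumptions, not values taken from the protocol described here.

```python
# Illustrative sketch: all names and numeric inputs below are assumptions.


def dilution_series(highest_exp=1, lowest_exp=-3):
    """Tenfold copies-per-genome series, e.g. 10, 1, 0.1, 0.01, 0.001."""
    return [10.0 ** e for e in range(highest_exp, lowest_exp - 1, -1)]


def spike_in_mass_ng(copies_per_genome, gdna_ng, construct_bp, genome_bp):
    """Mass of control construct (ng) to add to gdna_ng of genomic DNA.

    Both DNAs are double stranded, so mass scales with length and the
    average mass per base pair cancels out of the ratio.
    """
    if min(copies_per_genome, gdna_ng, construct_bp, genome_bp) <= 0:
        raise ValueError("all inputs must be positive")
    return copies_per_genome * gdna_ng * construct_bp / genome_bp


def is_sensitive_enough(lowest_detected_copies, threshold=0.01):
    """True when the assay detects 0.01 copies per genome or fewer."""
    return lowest_detected_copies <= threshold


if __name__ == "__main__":
    for copies in dilution_series():
        mass = spike_in_mass_ng(copies, gdna_ng=100, construct_bp=10_000,
                                genome_bp=3_100_000_000)
        print(f"{copies:g} copies/genome -> {mass:.3e} ng construct")
```

Whether "genome" refers to a haploid or a diploid complement changes the result twofold, so the convention should be fixed before preparing the dilutions.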

3.3.2.3. Example #4. Gene integration at the AAVS1 safe harbor locus in human induced pluripotent stem cells (hiPSC). To illustrate the targeted gene integration methodology at a safe harbor locus, we chose to describe the integration of a matrix called T3G-GOI-AAVS1 at the AAVS1 locus of hiPSC. The T3G-GOI-AAVS1 integration matrix is a circular DNA plasmid containing the key genetic elements for efficient targeted insertion and selection of clones (Fig. 9A). It contains left and right AAVS1 homology sequences, each 1 kb long, which promote HR-mediated targeted insertion at the AAVS1 locus. These two homologies surround (i) a polycistronic selection cassette made of a self-cleaving GFP-2A-neomycin resistance marker under the control of the pEF1alpha promoter (pEF1alpha-GFP-T2A-NEOR), (ii) a TET-ON expression cassette and (iii) a GOI under the control of the pTRE promoter, a TET-ON dependent promoter inducible by doxycycline. In addition, this matrix contains a counter selection cassette harboring the HSV TK coding sequence. In the presence of ganciclovir, a non-toxic prodrug, HSV TK acts as a suicide enzyme and promotes cell death. This cassette is located outside the AAVS1 homologies and is thus not expected to be integrated at the AAVS1 locus. However, when random integration occurs, the HSV TK cassette may also be inserted and can thus be used to counter select irrelevant clones in the presence of ganciclovir.


Fig. 8. General design of integration matrix and targeted integration PCR screening strategy. (A) Schematic representation of the different genetic elements constituting the matrix (ORI and prom stand for plasmid replication origin and promoter, respectively). (B) Schematic representation of the integration matrix positive control used to set up the PCR screening protocol. Oligonucleotide pairs used to perform the locus specific left and right PCRs as well as the non specific PCR are indicated as blue, green and yellow arrows. (C) Schematic representation of PCR screening experiments performed on the matrix positive control (+) and on 4 different clones (Cl1-4) isolated after transfection with a given TALEN and integration matrix. In this example, Cl4 (red) should be selected because it is positive for the locus specific left and right PCRs and is free of any random or targeted multicopy insertion, as shown by the absence of a non specific PCR band.
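The clone selection rule illustrated in Fig. 8C can be written as a small decision function. The following Python sketch is only an illustration of that rule: the class and function names are hypothetical, and the example PCR patterns are invented for demonstration rather than read from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrSignature:
    """Presence (True) or absence (False) of a band in each screening PCR."""
    left: bool   # locus specific left junction PCR
    right: bool  # locus specific right junction PCR
    ori: bool    # non specific PCR on the matrix backbone (e.g. ORI)


def classify_clone(sig: PcrSignature) -> str:
    """Interpret a three-PCR signature following the rule of Fig. 8C."""
    if sig.ori:
        # Backbone sequence detected: random or multicopy insertion.
        return "discard: random or multicopy integration"
    if sig.left and sig.right:
        return "keep: single targeted integration"
    if sig.left or sig.right:
        return "discard: only one junction confirmed"
    return "discard: no integration detected"


if __name__ == "__main__":
    # Hypothetical signatures, for illustration only.
    examples = {
        "clone_a": PcrSignature(left=True, right=True, ori=True),
        "clone_b": PcrSignature(left=True, right=False, ori=False),
        "clone_c": PcrSignature(left=False, right=False, ori=False),
        "clone_d": PcrSignature(left=True, right=True, ori=False),
    }
    for name, sig in examples.items():
        print(name, "->", classify_clone(sig))
```

Only clones with both junction PCRs positive and the backbone PCR negative are retained, which mirrors the signature described for Cl4.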

To integrate the GOI at the AAVS1 locus of hiPSC, cells were electroporated using Amaxa technology in the presence of the T3G-GOI-AAVS1 DNA matrix and of the two AAVS1 TALEN-encoding plasmids (see M&M section). Five days post electroporation, when cells reached approximately 70% confluency, neomycin was added to the culture media to select the cell population harboring stable integrations of the T3G-GOI-AAVS1 matrix. This selection was efficient, as 10 days of neomycin treatment were sufficient to eliminate all control cells electroporated with the TALEN-encoding plasmids alone. At the end of the process, an average of 165 NEOR colonies was obtained from cells transfected with the matrix and the two TALEN-encoding plasmids. Interestingly, we also obtained an average of 32 NEOR colonies from cells transfected with the matrix only. Together, these results suggested that a majority of gene integration events occurred in a targeted fashion, while a minority occurred randomly.

To eliminate NEOR colonies containing multicopy and/or random integrations of the AAVS1 matrix, a double selection was performed using neomycin and ganciclovir. To do so, NEOR colonies were picked, transferred to 96-well plates and allowed to grow for 5 days. Neomycin and ganciclovir were then added to the culture media and cells were kept under selection for 5 additional days. At the end of the double selection process, the 96-well plates were duplicated. One plate was kept in culture for further use, the other was used for genomic DNA extraction and PCR screening characterization (Fig. 9B). Three different PCRs were performed to identify targeted integration events at the AAVS1 locus. The first two, named locus specific left and right PCR, were specific for targeted integration events. The third one, however, was designed to amplify the ORI sequence (located on a matrix section that was not supposed to be inserted), thus enabling detection of random or multicopy insertion events. Out of 168 NEOR colonies, 121 survived the double selection process. Among them, 35% harbored a single targeted integration at the AAVS1 locus (locus specific Left and Right PCR+/ORI specific PCR−, Fig. 9C, red clones). Clone #75, which clearly displayed a targeted integration PCR signature, was selected to further assess its ability to express the GOI in a doxycycline-dependent manner. For that purpose, it was grown for 10 days in the presence of increasing amounts of doxycycline (from 0 to 8 µg/ml). RT-PCRs were then performed to quantify the GOI transcription levels and amplicons were analyzed by electrophoresis (Fig. 9D). Our results showed that the amount of GOI amplicons increased as a function of doxycycline concentration and reached a plateau similar to the physiological level found in HepG2 cells (Fig. 9D, top panel, lane C). Actin amplicons, used here as an RT-PCR control, remained steady, thus confirming that doxycycline was a GOI-specific effector. Interestingly, a slight GOI amplicon band was found


Fig. 9. TALEN-mediated gene integration at the AAVS1 safe harbor locus in hiPSC. (A) Schematic representation of the T3G-GOI-AAVS1 matrix used for targeted integration at the AAVS1 locus in hiPSC. This matrix contains two 1 kb long left and right AAVS1 homology sequences, the GFP-2A-NEO and TET-ON genes under the control of two independent EF1 alpha promoters, a GOI under the control of the pTRE promoter and a counter selection cassette encompassing the suicide gene thymidine kinase from Herpes Simplex Virus-1 (HSV-TK) under the control of the pHTLV promoter. (B) Schematic representation of the T3G-GOI-AAVS1 matrix inserted at the AAVS1 locus along with the position of the oligonucleotides used to perform the locus specific left and right PCR screening. (C) Example of multiplex locus specific PCR screening of 24 different clones obtained after double selection (neomycin/ganciclovir). Amplicons obtained after locus specific Left, Right or ORI PCRs were analyzed on agarose gel. Sizes of specific PCR amplicons are indicated on the left side and clones harboring targeted integration of the AAVS1 matrix (positive for locus specific Left and Right PCRs and negative for ORI specific PCR) are indicated in red. (D) Functional assessment of doxycycline-dependent transcription of the GOI in clone #75. Clone #75 was treated with increasing concentrations of doxycycline (from 0 to 8 µg/mL) for 10 days. RT-PCRs were then performed to quantify transcription of the GOI (top) and of actin (bottom), used as an RT-PCR control. Lanes A and B correspond respectively to the RT-PCR controls performed on untransfected hiPSC in the presence or in the absence of reverse transcriptase. Lanes C and D correspond respectively to the RT-PCR controls performed on HepG2 cells expressing the endogenous GOI in the presence or in the absence of reverse transcriptase. Reverse transcription was performed according to the Superscript III reverse transcriptase protocol (Life Technologies) and followed by specific PCR amplification of the GOI integrated in the AAVS1 locus.

in the absence of doxycycline, suggesting leakiness of the pTRE promoter. Further analysis of clone #75 revealed the presence of nuclear and surface stemness markers (Nanog, Sox2, Oct4, SSEA4, Tra-1-60 and Tra-1-81, data not shown). In addition, we were able to successfully promote directed hepatocyte differentiation of this clone. Together, these observations demonstrate that hiPSC properties were maintained during the TALEN-mediated targeted integration. Altogether, our data showed that TALEN-mediated gene integration could be efficiently performed at a safe harbor locus in hiPSC. About 4 weeks are necessary to transfect cells and select NEOR/GanciclovirR colonies. One additional week should be anticipated for identifying relevant clones via multiplex PCR screening according to our procedure.

3.3.2.4. Example #5. TALEN-mediated targeted gene integration at the ROSA26 locus in L cells. A different strategy was used to integrate a GOI at the ROSA26 locus and select clones harboring correct targeted integration. We took advantage of the presence of an endogenous promoter within the ROSA26 locus and designed a matrix bearing a promoter-less neomycin resistance marker downstream from the left homology region (Fig. 10A). Such a matrix design was expected to allow expression of the neomycin resistance marker only after a correct targeting event. A PCR screening strategy (similar to the one delineated above) and Southern blot validation were set up to respectively monitor gene integration and assess the amount of aberrant integration. Upon co-transfection of the ROSA26 specific TALEN and integration matrix (Supplementary Tables 1–4), we observed an overall low targeting efficiency (10⁻⁵). However, all neomycin resistant clones were positive for at least one locus specific PCR (either left or right, Fig. 10C). Southern blot analysis performed on 8 clones positive for left and right PCRs and on 2 clones positive for left PCR only showed that all of them displayed the expected band size when using a digestion test specific for the 5′ end of the integration site


Fig. 10. TALEN-mediated gene integration at the ROSA26 locus in mouse L cells. (A) Schematic representation of the ROSA26 integration matrix used for targeted integration at the ROSA26 locus in mouse L cells. This matrix contains left and right homology arms (500 bp each), a promoter-less neomycin resistance gene, a GOI expression cassette and a counter selection cassette encompassing the suicide gene thymidine kinase from Herpes Simplex Virus-1 (HSV-TK) under the control of the pEF1a promoter. (B) Schematic representation of the ROSA26 locus after targeted integration. The positions of the oligonucleotides used to perform the locus specific left and right PCR screenings are indicated (with expected amplicon sizes) as well as the restriction sites used for Southern blot analysis. (C) Example of locus specific PCR screening of 84 clones obtained after double selection (neomycin/ganciclovir). Amplicons obtained after locus specific Left or Right PCRs were analyzed on a 0.8% agarose gel. Clones harboring targeted integration of the ROSA26 matrix (positive for locus specific Left and Right PCRs) are indicated in red. (D) Southern blot analysis. Using the SacII/EcoRV endonucleases, all clones showed a band at the expected size (3 kb), confirming the locus specific Left PCR screen. Furthermore, when using the SacII/PmeI digest (allowing the analysis of the entire region), 8 out of 10 clones showed a band at the expected size (6.8 kb), again confirming the locus specific Right PCR screen. The remaining 2 clones, which showed an aberrant band, correspond to clones A8 and B7, which were positive for the locus specific Left PCR and negative for the locus specific Right PCR.

(Fig. 10D). However, when using a digestion test specific for the overall correct integration (both 5′ and 3′), clones positive for both PCR screenings presented the expected band size whereas clones positive for the left PCR only displayed a band shorter than expected. Thus 100% of NEOR clones positive for both PCR screenings harbor the expected targeted integration event. Together, these results illustrate an alternative and efficient method to perform TALEN-mediated targeted gene integration and stress the necessity of setting up a robust and comprehensive PCR screening that checks both sides of the integration to identify relevant clones. The development of such a mouse cell line could be performed in less than 3 months.

3.3.3. Targeted integration for gene tagging

As stated above, TALEN technology is amenable to highly precise gene modification. It can thus be used to precisely integrate a protein marker (e.g. a fluorescent protein) in-frame with a coding sequence of interest, via a methodology called gene tagging.

When designing a gene tagging experiment, one should first determine the location of protein marker addition according to the intrinsic properties of the protein of interest (POI). Among the important parameters to take into account are the presence of peptide leader sequences, intracellular trafficking sequences, protein domains involved in protein partner interaction and other functional domains. Another important parameter to set is the nature of the sequence linking the protein marker to the POI. For that purpose, a 2–10mer poly glycine–serine peptide is usually used, although some proteins may require a more sophisticated peptide linker. In addition, the location of the TALEN target site is crucial for gene tagging applications. Depending on the protein fusion orientation, this site should be located as close as possible to the start or the stop codon of the POI. In other words, if multiple TALEN can be generated within one of these regions, the TALEN of choice should be selected on the basis of its position rather than its activity. Most of the time, such a TALEN can be easily identified.
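The two selection criteria discussed in this paper (activity for safe harbor integration, position for gene tagging) can be contrasted in a few lines of Python. This is a minimal sketch under stated assumptions: the `Candidate` fields, the coordinate convention and the example values are hypothetical and serve only to illustrate the ranking logic.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """A candidate TALEN (all fields are illustrative assumptions)."""
    name: str
    cut_position: int  # coordinate of the expected cleavage site
    activity: float    # measured nuclease activity, arbitrary units


def pick_for_safe_harbor(candidates: Sequence[Candidate]) -> Candidate:
    """Safe harbor integration: rank by activity, ignore position."""
    return max(candidates, key=lambda c: c.activity)


def pick_for_tagging(candidates: Sequence[Candidate],
                     codon_position: int) -> Candidate:
    """Gene tagging: rank by distance to the start or stop codon."""
    return min(candidates, key=lambda c: abs(c.cut_position - codon_position))


if __name__ == "__main__":
    # Hypothetical candidates around a start codon placed at coordinate 1000.
    pool = [
        Candidate("T1", cut_position=940, activity=0.62),
        Candidate("T2", cut_position=1004, activity=0.35),
        Candidate("T3", cut_position=1090, activity=0.48),
    ]
    print("safe harbor choice:", pick_for_safe_harbor(pool).name)
    print("tagging choice:", pick_for_tagging(pool, 1000).name)
```

In practice a minimal activity threshold would presumably still be applied before ranking tagging candidates by position.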


Fig. 11. TALEN-mediated multiplex gene tagging of Tubulin 1A and Histone 2B in U-2 OS cells. (A) Schematic representation of TALEN-assisted gene tagging by GFP. The optimal position of the TALEN target site is indicated for an N-terminal tagging. The design of the integration matrix and the mechanism of HR-dependent integration of the fluorescent protein are depicted. Upon specific TALEN cleavage, the HR pathway is triggered and leads to the specific integration of the fluorescent marker in frame with the adjacent coding sequence. (B) Confocal microscopy analysis of the double tagged U-2 OS cell line. The TUB1A and H2B genes were respectively tagged with GFP and BFP. Cells were analyzed during different cell division stages including interphase, metaphase and anaphase. Left, middle and right columns illustrate respectively confocal analysis of U-2 OS cells using GFP, BFP and GFP/BFP filter channels.

The integration matrix, which allows the in-frame fusion of a protein marker coding sequence with an endogenous gene, should be carefully designed according to the chosen TALEN target site. Regarding the size of the homology arms, one can follow the same rules as the ones delineated above for gene integration at safe harbor loci. The intervening sequence, however, should be designed according to the desired orientation of the protein fusion. In the case of N-terminal tagging, the intervening sequence should start with an ATG and contain the coding sequence for the fluorescent marker followed by the peptide linker. The beginning of the GOI coding sequence should immediately follow the linker, must be in frame with the protein marker and be part of the right homology arm. For a C-terminal tagging, the intervening sequence should start with the linker sequence in frame with the left homology arm, which bears part of the C-terminal sequence of the GOI, and be followed by the sequence of the protein marker (with or without ATG). Since the size of the sequence to integrate is limited (usually less than 1 kb), a simple PCR screening can be set up. Oligonucleotide primers specific for the genomic sequence outside the left and right homology arms should be used, as depicted in Fig. 2D. This PCR is expected to amplify either the wild type gene or the modified gene and to allow easy discrimination between them on a regular agarose gel.
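The N-terminal design rules above (an ATG-led marker, a linker, and no frame shift or premature stop before the GOI) and the size-based PCR discrimination can be checked programmatically. The Python sketch below is illustrative only: the function names are hypothetical, the sequences are short toy placeholders rather than a complete fluorescent protein, and the expected tagged amplicon size assumes a pure insertion without loss of genomic bases.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive triplets from its first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def build_n_terminal_insert(marker_cds, linker_cds):
    """Return marker + linker after checking the N-terminal tagging rules.

    marker_cds must start with ATG and carry no stop codon, so that
    translation continues through the linker into the GOI.
    """
    insert = (marker_cds + linker_cds).upper()
    if not insert.startswith("ATG"):
        raise ValueError("intervening sequence must start with ATG")
    if len(insert) % 3 != 0:
        raise ValueError("insert length would shift the GOI reading frame")
    if STOP_CODONS & set(codons(insert)):
        raise ValueError("in-frame stop codon upstream of the GOI")
    return insert


def expected_amplicons(wt_amplicon_bp, insert):
    """Amplicon sizes for primers located outside both homology arms."""
    return {"wild type": wt_amplicon_bp,
            "tagged": wt_amplicon_bp + len(insert)}


if __name__ == "__main__":
    toy_marker = "ATGGTGAGCAAGGGC"  # toy 5-codon placeholder, not a full tag
    toy_linker = "GGCAGCGGCAGC"     # Gly-Ser-Gly-Ser
    insert = build_n_terminal_insert(toy_marker, toy_linker)
    print(expected_amplicons(1200, insert))
```

With a real tag of several hundred base pairs, the size difference between the two amplicons is what makes wild type and tagged alleles distinguishable on a regular agarose gel.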


3.3.3.1. Example #6. TALEN-mediated multiplex gene tagging of Tubulin 1A and Histone 2B in U-2 OS cells. To illustrate the targeted gene tagging methodology, we chose to describe the multiplex tagging of the Tubulin 1A (TUB1A) and Histone 2B (H2B) endogenous genes present in the U-2 OS cell line. In this example, we tagged, in a sequential manner, the N-terminal domain of TUB1A with the green fluorescent protein (GFP) and the C-terminal domain of H2B with the blue fluorescent protein (BFP). A TALEN specifically designed to target the ATG start codon of TUB1A was first transfected along with its corresponding integration matrix (Supplementary Tables 1–4). Because we aimed at introducing the GFP in the first coding exon of TUB1A (Supplementary Table 3), the left homology arm of the matrix could contain some promoter regulatory sequences, as illustrated in Fig. 11A. Such sequences could thus theoretically allow transient expression of the GFP by the matrix alone, resulting in false positive results. To verify this, we performed a control experiment by transfecting U-2 OS cells with the matrix alone and monitoring GFP expression over more than two weeks. In parallel, U-2 OS cells were transfected with the matrix and its corresponding TALEN. Two weeks post transfection, we observed a steady expression of GFP in about 2% of cells transfected with the TALEN and matrix, whereas GFP expression was undetectable in the control experiment. Twenty one days post transfection, GFP positive cells were single cell sorted using a FACS apparatus and then subjected to PCR screening (data not shown). PCR screening positive clones were eventually analyzed via confocal microscopy for phenotypic validation (data not shown). A GFP and PCR screening positive clone was then used to assess the second targeted integration at the H2B locus (Supplementary Tables 1–4) according to the procedure described above. The same control experiment was performed even though the H2B integration matrix was theoretically free of any promoter sequences. Confocal analysis of the resulting engineered cell line was performed at different phases of the cell cycle, including interphase, anaphase and metaphase, allowing observation of the cellular localization and organization of the GFP-TUB1A and H2B-BFP fusion proteins (Fig. 11B). Our results showed that the activity and localization patterns of the two fusion proteins were similar to the ones usually observed for their endogenous counterparts. This observation indicated that the TALEN-mediated gene tagging methodology allowed visualization of endogenous TUB1A and H2B without significantly impacting their physiological functions. We cannot exclude, however, that tagging other genes of interest will affect their function. Overall, production of the U-2 OS cell line containing GFP-TUB1A and H2B-BFP was accomplished in about 5 months, from the design of the TALEN and integration matrices to the confocal microscopy analysis and molecular validations. Of note, tagging of the human actin gene (ACTB) with GFP has also been performed in hiPSC and other stem cell lineages (Duhamel et al., unpublished data).

4. Concluding remarks

In this paper, different strategies for TALEN-mediated genome editing in mammalian cells have been delineated and illustrated by experimental examples. We showed that educated design and utilization of TALEN, as well as the choice of relevant activity assessment methods, are important to perform efficient gene inactivation and integration in the mammalian genome. This holds true when one wants to use other nuclease platforms including meganucleases, zinc finger nucleases or CRISPR/Cas9 nucleases, and in that sense, our work provides a methodological template for each of them.


Regarding the financial aspect, obtaining TALEN represents a limited financial investment and essentially no workforce allocation. Indeed, with the increasing number of TALEN and gene synthesis providers, purchasing a TALEN is affordable and prices are expected to drop as more players enter the field. Thus, even though a TALEN assembly platform could be set up using published methods [3,15,29,54–58], it may not be worthwhile if one needs only a few TALEN pairs to fulfill one's research plans. Even though TALEN is today the most widely used TALE-based tool, it is not the only one. Indeed, over the past three years, several engineered TALE scaffolds have been reported. Thanks to these new scaffolds, it is now possible to regulate gene expression via specific activation or repression of transcription or via modulation of epigenetic status. Such regulation can be orchestrated in a spatio-temporal manner with the advent of the self-assembling inducible systems described recently [5]. Utilization of such powerful techniques in combination with other technologies (such as nanotechnologies) will enable the generation of complex synthetic organisms and devices for different fields of application such as synthetic biology and regenerative medicine. A revolution is looming and we are just starting to feel its first tremors.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ymeth.2014.06.013.

References

[1] A.J. Bogdanove, D.F. Voytas, Science 333 (2011) 1843–1846.
[2] A. Munoz Bodnar, A. Bernal, B. Szurek, C.E. Lopez, Mol. Biotechnol. 53 (2013) 228–235.
[3] F. Zhang, L. Cong, S. Lodato, S. Kosuri, G.M. Church, P. Arlotta, Nat. Biotechnol. 29 (2011) 149–153.
[4] L. Cong, R. Zhou, Y.C. Kuo, M. Cunniff, F. Zhang, Nat. Commun. 3 (2012) 968.
[5] S. Konermann, M.D. Brigham, A.E. Trevino, P.D. Hsu, M. Heidenreich, L. Cong, R.J. Platt, D.A. Scott, G.M. Church, F. Zhang, Nature 500 (2013) 472–476.
[6] E.M. Mendenhall, K.E. Williamson, D. Reyon, J.Y. Zou, O. Ram, J.K. Joung, B.E. Bernstein, Nat. Biotechnol. 31 (2013) 1133–1136.
[7] M.L. Maeder, J.F. Angstman, M.E. Richardson, S.J. Linder, V.M. Cascio, S.Q. Tsai, Q.H. Ho, J.D. Sander, D. Reyon, B.E. Bernstein, J.F. Costello, M.F. Wilkinson, J.K. Joung, Nat. Biotechnol. 31 (2013) 1137–1142.
[8] A.C. Mercer, T. Gaj, R.P. Fuller, C.F. Barbas 3rd, Nucleic Acids Res. 40 (2012) 11163–11172.
[9] M. Christian, T. Cermak, E.L. Doyle, C. Schmidt, F. Zhang, A. Hummel, A.J. Bogdanove, D.F. Voytas, Genetics 186 (2010) 757–761.
[10] M. Beurdeley, F. Bietz, J. Li, S. Thomas, T. Stoddard, A. Juillerat, F. Zhang, D.F. Voytas, P. Duchateau, G.H. Silva, Nat. Commun. 4 (2013) 1762.
[11] M. Yanik, J. Alzubi, T. Lahaye, T. Cathomen, A. Pingoud, W. Wende, PLoS One 8 (2013) e82539.
[12] S. Boissel, J. Jarjour, A. Astrakhan, A. Adey, A. Gouble, P. Duchateau, J. Shendure, B.L. Stoddard, M.T. Certo, D. Baker, A.M. Scharenberg, Nucleic Acids Res. 42 (2014) 2591–2601.
[13] J.K. Joung, J.D. Sander, Nat. Rev. Mol. Cell Biol. 14 (2013) 49–55.
[14] C. Wei, J. Liu, Z. Yu, B. Zhang, G. Gao, R. Jiao, J. Genet. Genomics 40 (2013) 281–289.
[15] D. Reyon, S.Q. Tsai, C. Khayter, J.A. Foden, J.D. Sander, J.K. Joung, Nat. Biotechnol. 30 (2012) 460–465.
[16] Y. Kim, J. Kweon, A. Kim, J.K. Chon, J.Y. Yoo, H.J. Kim, S. Kim, C. Lee, E. Jeong, E. Chung, D. Kim, M.S. Lee, E.M. Go, H.J. Song, H. Kim, N. Cho, D. Bang, J.S. Kim, Nat. Biotechnol. 31 (2013) 251–258.
[17] Y. Zu, X. Tong, Z. Wang, D. Liu, R. Pan, Z. Li, Y. Hu, Z. Luo, P. Huang, Q. Wu, Z. Zhu, B. Zhang, S. Lin, Nat. Methods 10 (2011) 329–331.
[18] D. Hockemeyer, H. Wang, S. Kiani, C.S. Lai, Q. Gao, J.P. Cassady, G.J. Cost, L. Zhang, Y. Santiago, J.C. Miller, B. Zeitler, J.M. Cherone, X. Meng, S.J. Hinkley, E.J. Rebar, P.D. Gregory, F.D. Urnov, R. Jaenisch, Nat. Biotechnol. 29 (2011) 731–734.
[19] H. Zhu, C.H. Lau, S.L. Goh, Q. Liang, C. Chen, S. Du, R.Z. Phang, F.C. Tay, W.K. Tan, Z. Li, J.C. Tay, W. Fan, S. Wang, Nucleic Acids Res. 41 (2013) e180.
[20] A. Dupuy, J. Valton, S. Leduc, J. Armier, R. Galetto, A. Gouble, C. Lebuhotel, A. Stary, F. Paques, P. Duchateau, A. Sarasin, F. Daboussi, PLoS One 8 (2013) e78678.
[21] N. Ma, B. Liao, H. Zhang, L. Wang, Y. Shan, Y. Xue, K. Huang, S. Chen, X. Zhou, Y. Chen, D. Pei, G. Pan, J. Biol. Chem. 288 (2013) 34671–34679.


[22] M.J. Osborn, C.G. Starker, A.N. McElroy, B.R. Webber, M.J. Riddle, L. Xia, A.P. DeFeo, R. Gabriel, M. Schmidt, C. von Kalle, D.F. Carlson, M.L. Maeder, J.K. Joung, J.E. Wagner, D.F. Voytas, B.R. Blazar, J. Tolar, Mol. Ther. 21 (2013) 1151–1159.
[23] D.G. Ousterout, P. Perez-Pinera, P.I. Thakore, A.M. Kabadi, M.T. Brown, X. Qin, O. Fedrigo, V. Mouly, J.P. Tremblay, C.A. Gersbach, Mol. Ther. 21 (2013) 1718–1726.
[24] S. Arnould, P. Chames, C. Perez, E. Lacroix, A. Duclert, J.C. Epinat, F. Stricher, A.S. Petit, A. Patin, S. Guillier, S. Rolland, J. Prieto, F.J. Blanco, J. Bravo, G. Montoya, L. Serrano, P. Duchateau, F. Paques, J. Mol. Biol. 355 (2006) 443–458.
[25] Y. Doyon, V.M. Choi, D.F. Xia, T.D. Vo, P.D. Gregory, M.C. Holmes, Nat. Methods 7 (2010) 459–460.
[26] J. Boch, U. Bonas, Annu. Rev. Phytopathol. 48 (2010) 419–436.
[27] M.J. Moscou, A.J. Bogdanove, Science 326 (2009) 1501.
[28] J.C. Miller, S. Tan, G. Qiao, K.A. Barlow, J. Wang, D.F. Xia, X. Meng, D.E. Paschon, E. Leung, S.J. Hinkley, G.P. Dulay, K.L. Hua, I. Ankoudinova, G.J. Cost, F.D. Urnov, H.S. Zhang, M.C. Holmes, L. Zhang, P.D. Gregory, E.J. Rebar, Nat. Biotechnol. 29 (2011) 143–148.
[29] T. Cermak, E.L. Doyle, M. Christian, L. Wang, Y. Zhang, C. Schmidt, J.A. Baller, N.V. Somia, A.J. Bogdanove, D.F. Voytas, Nucleic Acids Res. 39 (2011) e82.
[30] J. Streubel, C. Blucher, A. Landgraf, J. Boch, Nat. Biotechnol. 30 (2012) 593–595.
[31] S. Chen, G. Oikonomou, C.N. Chiu, B.J. Niles, J. Liu, D.A. Lee, I. Antoshechkin, D.A. Prober, Nucleic Acids Res. 41 (2013) 2769–2778.
[32] A. Juillerat, G. Dubois, J. Valton, S. Thomas, S. Stella, A. Marechal, S. Langevin, N. Benomari, C. Bertonati, G.H. Silva, F. Daboussi, J.C. Epinat, G. Montoya, A. Duclert, P. Duchateau, Nucleic Acids Res. 42 (2014) 5390–5402.
[33] A. Xiao, Y. Wu, Z. Yang, Y. Hu, W. Wang, Y. Zhang, L. Kong, G. Gao, Z. Zhu, S. Lin, B. Zhang, Nucleic Acids Res. 41 (2012) D415–D422.
[34] S. Bultmann, R. Morbitzer, C.S. Schmidt, K. Thanisch, F. Spada, J. Elsaesser, T. Lahaye, H. Leonhardt, Nucleic Acids Res. 40 (2012) 5368–5377.
[35] B.I. Wicky, M. Stenta, M. Dal Peraro, PLoS One 8 (2013) e80261.
[36] J. Valton, A. Dupuy, F. Daboussi, S. Thomas, A. Marechal, R. Macmaster, K. Melliand, A. Juillerat, P. Duchateau, J. Biol. Chem. 287 (2012) 38427–38432.
[37] D. Deng, P. Yin, C. Yan, X. Pan, X. Gong, S. Qi, T. Xie, M. Mahfouz, J.K. Zhu, N. Yan, Y. Shi, Cell Res. 22 (2012) 1502–1504.
[38] M. Bochtler, Biol. Chem. 393 (2012) 1055–1066.

[39] S. Bultmann, R. Morbitzer, C.S. Schmidt, K. Thanisch, F. Spada, J. Elsaesser, T. Lahaye, H. Leonhardt, Nucleic Acids Res. 40 (2012) 5368–5377.
[40] S.S. Palii, B.O. Van Emburgh, U.T. Sankpal, K.D. Brown, K.D. Robertson, Mol. Cell Biol. 28 (2008) 752–771.
[41] J.P. Guilinger, V. Pattanayak, D. Reyon, S.Q. Tsai, J.D. Sander, J.K. Joung, D.R. Liu, Nat. Methods 11 (2014) 429–435.
[42] E.J. Fine, T.J. Cradick, C.L. Zhao, Y. Lin, G. Bao, Nucleic Acids Res. 42 (2013) e42.
[43] J. Grau, J. Boch, S. Posch, Bioinformatics 29 (2013) 2931–2932.
[44] E.L. Doyle, N.J. Booher, D.S. Standage, D.F. Voytas, V.P. Brendel, J.K. Vandyk, A.J. Bogdanove, Nucleic Acids Res. 40 (2012) W117–W122.
[45] D.Y. Guschin, A.J. Waite, G.E. Katibah, J.C. Miller, M.C. Holmes, E.J. Rebar, Methods Mol. Biol. 649 (2010) 247–256.
[46] A. Lloyd, C.L. Plaisier, D. Carroll, G.N. Drews, Proc. Natl. Acad. Sci. U.S.A. 102 (2005) 2232–2237.
[47] T.J. Dahlem, K. Hoshijima, M.J. Jurynec, D. Gunther, C.G. Starker, A.S. Locke, A.M. Weis, D.F. Voytas, D.J. Grunwald, PLoS Genet. 8 (2012) e1002861.
[48] G.A. Bazykin, A.V. Kochetov, Nucleic Acids Res. 39 (2010) 567–577.
[49] A.V. Kochetov, Bioessays 30 (2008) 683–691.
[50] J. Boch, H. Scholze, S. Schornack, A. Landgraf, S. Hahn, S. Kay, T. Lahaye, A. Nickstadt, U. Bonas, Science 326 (2009) 1509–1512.
[51] A. Lombardo, D. Cesana, P. Genovese, B. Di Stefano, E. Provasi, D.F. Colombo, M. Neri, Z. Magnani, A. Cantore, P. Lo Riso, M. Damo, O.M. Pello, M.C. Holmes, P.D. Gregory, A. Gritti, V. Broccoli, C. Bonini, L. Naldini, Nat. Methods 8 (2011) 861–869.
[52] J. Eyquem, L. Poirot, R. Galetto, A.M. Scharenberg, J. Smith, Biotechnol. Bioeng. 110 (2013) 2225–2235.
[53] M. Sadelain, E.P. Papapetrou, F.D. Bushman, Nat. Rev. Cancer 12 (2012) 51–58.
[54] J. Yang, P. Yuan, D. Wen, Y. Sheng, S. Zhu, Y. Yu, X. Gao, W. Wei, PLoS One 8 (2013) e75649.
[55] Z. Zhang, S. Zhang, X. Huang, K.E. Orwig, Y. Sheng, PLoS One 8 (2013) e80281.
[56] J.L. Schmid-Burgk, T. Schmidt, V. Kaiser, K. Honing, V. Hornung, Nat. Biotechnol. 31 (2013) 76–81.
[57] T. Li, S. Huang, X. Zhao, D.A. Wright, S. Carpenter, M.H. Spalding, D.P. Weeks, B. Yang, Nucleic Acids Res. 39 (2011) 6315–6325.
[58] L. Li, M.J. Piatek, A. Atef, A. Piatek, A. Wibowo, X. Fang, J.S. Sabir, J.K. Zhu, M.M. Mahfouz, Plant Mol. Biol. 78 (2012) 407–416.

Please cite this article in press as: J. Valton et al., Methods (2014), http://dx.doi.org/10.1016/j.ymeth.2014.06.013
