Optimal protocols for efficient and reproducible protein extraction from formalin-fixed paraffin-embedded (FFPE) tissues are not yet standardised and new techniques are continually developed and improved. The effect of polyethylene glycol (PEG) 20 000 on protein extraction efficiency has not been evaluated using human FFPE colorectal cancer tissues and there is no consensus on the protein extraction solution required for efficient, reproducible extraction.
The impact of PEG 20 000 on protein extraction efficiency, reproducibility and protein selection bias was evaluated using FFPE colonic tissue via liquid chromatography tandem mass spectrometry analysis.
This study was conducted from August 2017 to July 2019 using human FFPE colorectal carcinoma tissues from the Anatomical Pathology department at Tygerberg Hospital in South Africa. Samples were analysed via label-free liquid chromatography tandem mass spectrometry to determine the impact of using PEG 20 000 in the protein extraction solution. Data were assessed regarding peptide and protein identifications, method efficiency, reproducibility, protein characteristics and organisation relating to gene ontology categories.
Exclusion of polyethylene glycol 20 000 increased peptide and protein identifications, and the method was more reproducible than when samples were processed with PEG 20 000. However, no differences were observed with regard to protein selection bias. We found that higher protein concentrations (> 10 µg) compromised the function of PEG.
This study indicates that protocols generating high protein yields from human FFPE tissues would benefit from the exclusion of PEG 20 000 in the protein extraction solution.
Archival formalin-fixed paraffin-embedded (FFPE) tissue repositories are valuable resources for clinical proteomic studies; such studies may include retrospective analyses as well as protein biomarker discovery and validation.
The development and standardisation of FFPE sample processing for mass spectrometry (MS)-based analysis to determine changes (or similarities) in the proteome composition of tumour versus healthy tissues is of great interest to clinical and translational research.
During the protein extraction process, the effect of the formaldehyde fixation chemistry on the tissues poses another challenge to overcome. Due to extensive formaldehyde cross-linking between molecules, accurate and efficient protein extraction from FFPE tissues is difficult. It requires specific sample processing techniques to allow for complete breakage of cross-linking bonds, which in turn allows for proper trypsin digestion.
We have previously studied the effects of FFPE block age on the quality and quantity of protein extracted from FFPE tissues and also evaluated protein purification methods using LC-MS/MS analysis.
Polyethylene glycol varies in polymer size. PEG 20 000 was chosen for this study because it is the most extensively used form in FFPE tissue proteomics; hereafter, all references to PEG in this article are to the 20 000 form. The aim of this study was to evaluate the effects of PEG within the protein extraction buffer using label-free LC-MS/MS analysis of manually micro-dissected FFPE human colorectal carcinoma (CRC) resection samples. The sample pellets were also tested for residual protein that was not extracted in the whole cell protein lysates (WCPLs).
Ethics clearance was obtained from the Health Research Ethics Committee of Stellenbosch University (ethics reference number: S17/10/203) and Biomedical Science Research Ethics Committee of the University of the Western Cape (ethics reference number: BM17/7/15). All patient specimens were anonymised before being archived for long-term storage and before they were accessed for the study. Patient consent was not required since it was a retrospective study using archival tissues.
This study, conducted from August 2017 to July 2019, included retrospectively chosen human colorectal resection specimens acquired from the department of Anatomical Pathology at Tygerberg Hospital in Western Cape, South Africa. The specimens were preserved as FFPE blocks when the tissue was resected and archived between January 2016 and December 2017. Because the samples were collected retrospectively, the exact pre-analytical factors, such as handling, fixation times and conditions, and storage conditions, were unknown and could not be accounted for.
Details of the three FFPE patient cases selected for analysis at the South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa, from August 2017 to July 2019.
Metadata tags | Patient 1 | Patient 2 | Patient 3 |
---|---|---|---|
Block year | 2017 | 2016 | 2016 |
Patient age (years) | 60 | 47 | 60 |
Gender | Female | Male | Male |
Diagnosis | Adenocarcinoma | Adenocarcinoma | Adenocarcinoma |
Grade | Low-grade | High-grade | Low-grade |
Stage | IIIB | IIIB | IIA |
Location | Right colon | Right colon | Right colon |
To ensure tissue quality and comparability, a pathologist reviewed the patient tissue sections after haematoxylin and eosin staining to select only specimens that had carcinomas with more than 90% viable tumour nuclei (
Colon adenocarcinoma tissue specimens analysed at the South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa, from August 2017 to July 2019. Microscopic images of haematoxylin and eosin-stained colon tissue sections of patient resection specimens analysed in this study: patient 1 (a), patient 2 (b) and patient 3 (c) at 100× magnification.
To overcome the effects of formaldehyde cross-linking, we opted to combine protein extraction techniques that employed the use of antigen retrieval, strong detergent concentration, as well as a synthetic polymer for protein precipitation (PEG 20 000). For protein purification before LC-MS/MS analysis, we used the Single-Pot Solid-Phase-enhanced Sample Preparation (SP3)
The equivalent of approximately 23 mm³ of manually micro-dissected FFPE tumour tissue was cut and processed for each patient case (
Summarised workflow and experimental design followed at the South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa, from August 2017 to July 2019. FFPE colorectal carcinoma tissue from three patients was cut at 25 µm thickness and tumour areas were manually micro-dissected for analysis. From each patient FFPE block, four tissue sections (each 25 µm thick and equivalent to approximately 23 mm³ tissue) were used per experimental sample. Protein extraction buffer, with or without the addition of PEG, was used to extract protein. Sample pellets were analysed for residual protein by further protein extraction (using 4% SDS), followed by protein quantification and subsequent sample processing (for LC-MS/MS analysis) by the HILIC/SP3 method. WCPLs from each patient were quantified and processed by the HILIC/SP3 sample preparation method, followed by MS analysis. The mass spectra generated were then analysed during the data analysis phase.
The method used for sample processing, protein extraction and protein yield quantification was modified from the protocols used by Scicchitano
The MagReSyn® (ReSyn Biosciences, Edenvale, Gauteng, South Africa) HILIC/SP3 method (using on-bead digestion) was used for protein purification and tryptic digestion (peptide generation) prior to LC-MS/MS analysis. The method was modified from the protocol used by Hughes
Mass spectrometry analysis of each sample’s peptides was performed using the Q-Exactive quadrupole-Orbitrap (Thermo Fisher Scientific, Waltham, Massachusetts, United States), which was coupled with a Dionex Ultimate 3000 nano-UPLC system as described before by Rossouw.
The raw spectral data were converted into Mascot generic format (Matrix Science, London, United Kingdom), a standard plain-text format for tandem MS data that simplifies subsequent database searches, using msConvert (ProteoWizard, Palo Alto, California, United States).
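To illustrate what the converted files contain, the following is a minimal sketch of a reader for Mascot generic format records. It assumes the common layout in which each spectrum is delimited by `BEGIN IONS` and `END IONS`, with `KEY=value` header lines followed by m/z and intensity pairs. The function name `read_mgf` and the example spectrum are hypothetical and are not taken from the study data.

```python
from io import StringIO


def read_mgf(handle):
    """Yield one dict per spectrum from an MGF-formatted text stream."""
    spectrum = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                spectrum["params"][key] = value
            else:
                # Peak line: m/z followed by an optional intensity
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                spectrum["peaks"].append((mz, intensity))


# Hypothetical single-spectrum example (values are illustrative only)
example = """BEGIN IONS
TITLE=hypothetical_scan_1
PEPMASS=500.2500 12345.0
CHARGE=2+
100.0760 250.0
175.1190 900.0
END IONS
"""

spectra = list(read_mgf(StringIO(example)))
print(len(spectra), spectra[0]["params"]["CHARGE"], spectra[0]["peaks"])
```

A generator is used so that large files can be streamed one spectrum at a time rather than loaded into memory at once.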
Numbers of identified peptides and proteins from WCPLs and pellets, compared between different protein extraction buffers with or without addition of PEG at the South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa, from August 2017 to July 2019. (a) Box and whiskers plot showing the number of peptides identified (for all three patient samples) per condition – Pellet with PEG (4% SDS), Pellet without PEG (4% SDS), WCPL with PEG (2% SDS), WCPL without PEG (2% SDS). (b) Venn diagram depicting the distribution of identified peptides (for all three patient cases) among all conditions. (c) Box and whiskers plot showing the number of proteins identified (for all three patient samples) per condition. (d) Venn diagram depicting the distribution of identified proteins (individual and protein groups) (for all three patient cases) among all conditions. (–PEG) refers to protein extracted without PEG and (+PEG) refers to protein extracted with PEG. Red boxplots refer to pellet samples extracted with PEG; Purple boxplots refer to pellet samples extracted without PEG; Blue boxplots refer to WCPL samples extracted with PEG; Green boxplots refer to WCPL samples extracted without PEG.
Peptide and protein identification settings at the South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa, from August 2017 to July 2019.
Parameter | Settings |
---|---|
Trypsin digestion | Specific, maximum of 2 missed cleavages |
MS1 tolerance | 10.0 ppm |
MS2 tolerance | 0.02 Da |
Fixed modifications | Methylthio of C (+45.987721 Da) |
Variable modifications | Oxidation of M (+15.994915 Da), Deamidation of N and Q (+0.984016 Da) |
Fixed modifications (refinement procedure) | Methylthio of C (+45.987721 Da) |
Variable modifications (refinement procedure) | Acetylation of protein N-term (+42.010565 Da), Pyrolidone from E (-18.010565 Da), Pyrolidone from Q (-17.026549 Da), Pyrolidone from carbamidomethylated C (-17.026549 Da) |
MS1, first stage of mass spectrometry; MS2, second stage of mass spectrometry; ppm, parts per million; Da, Dalton; C, cysteine; M, methionine; N, asparagine; Q, glutamine; N-term, N-terminal; E, glutamic acid.
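The two tolerances in the table are expressed in different units: the MS1 tolerance is relative (parts per million of the theoretical value), whereas the MS2 tolerance is absolute (Dalton). The sketch below shows how such limits are typically applied to a single observed value. The function names and the example masses are hypothetical, and the search engines used in the study may apply the tolerances differently in detail.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_ms1_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Precursor (MS1) match using a relative tolerance in ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_ms2_tolerance(observed_mz, theoretical_mz, tol_da=0.02):
    """Fragment (MS2) match using an absolute tolerance in Da."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# Illustrative values only: at m/z 500, 10 ppm corresponds to 0.005
print(within_ms1_tolerance(500.004, 500.0))   # about 8 ppm, inside the window
print(within_ms1_tolerance(500.006, 500.0))   # about 12 ppm, outside the window
print(within_ms2_tolerance(175.130, 175.119)) # 0.011 Da difference, inside
print(within_ms2_tolerance(175.145, 175.119)) # 0.026 Da difference, outside
```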
Data were analysed and graphically visualised and displayed using Pandas, NumPy and Matplotlib Python packages (Python Software Foundation, Wilmington, Delaware, United States), as well as Microsoft Excel (Microsoft Corporation, Redmond, Washington, United States).
The merged lists of either peptide sequences or protein accession numbers (individual as well as protein groups) identified in each sample group and experimental condition were processed using Venny (version 2.1.0)
To determine the qualitative reproducibility of each experimental condition, the peptide identification overlap (Supplementary document – Figure S1) was computed using the peptide sequences identified for each sample from the data set (regardless of peptide abundance). From these results, the physicochemical properties of the peptides (unique as well as shared) for all conditions were assessed for each patient (Supplementary document – File S2).
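The overlap calculation described here reduces to set operations on peptide sequences, ignoring abundance. The following sketch, with hypothetical condition labels and peptide sequences, computes the fraction of all identified peptides shared by every condition and the peptides unique to each condition. It is an illustration of the idea rather than the tool used in the study.

```python
def shared_fraction(peptides_by_condition):
    """Fraction of all identified peptides that occur in every condition."""
    sets = [set(p) for p in peptides_by_condition.values()]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


def unique_to_each(peptides_by_condition):
    """Peptides found in one condition and in no other."""
    result = {}
    for name, peptides in peptides_by_condition.items():
        others = set()
        for other_name, other_peptides in peptides_by_condition.items():
            if other_name != name:
                others |= set(other_peptides)
        result[name] = set(peptides) - others
    return result


# Hypothetical identifications for the four experimental conditions
identified = {
    "WCPL -PEG": {"PEPTIDEK", "SAMPLER", "LVNELTEFAK"},
    "WCPL +PEG": {"PEPTIDEK", "SAMPLER", "AEFVEVTK"},
    "Pellet -PEG": {"PEPTIDEK", "YLYEIAR"},
    "Pellet +PEG": {"PEPTIDEK", "SAMPLER"},
}

print(shared_fraction(identified))                # 1 shared of 5 total = 0.2
print(unique_to_each(identified)["WCPL -PEG"])    # {'LVNELTEFAK'}
```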
Spectrum counting abundance indexes were estimated using the Normalised Spectrum Abundance Factor
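As a point of reference, the standard Normalised Spectrum Abundance Factor divides each protein's spectral count by its length and then normalises by the sum of these ratios over all proteins in the sample. The sketch below implements that standard definition only; any adaptation applied by the software used in the study is not reproduced, and the protein names, counts and lengths are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Standard NSAF: (SpC / L) for each protein, divided by the sum over all proteins."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("No spectra were assigned to any protein")
    return {p: value / total for p, value in saf.items()}


# Hypothetical spectral counts and protein lengths (in residues)
counts = {"protein_A": 10, "protein_B": 30, "protein_C": 5}
lengths = {"protein_A": 100, "protein_B": 600, "protein_C": 100}

values = nsaf(counts, lengths)
for name, value in values.items():
    print(name, round(value, 3))
```

In this example protein_B has the most spectra but, after length normalisation, the same abundance index as protein_C, which is the behaviour the normalisation is intended to produce.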
ProPAS (version 1.1)
Gene Ontology (GO) analysis was performed using protein annotations retrieved from Ensembl (
The percentages of missed cleavages for all samples were calculated and graphically visualised and displayed using Pandas, NumPy and Matplotlib Python packages (Python Software Foundation, Wilmington, Delaware, United States).
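A missed-cleavage percentage of this kind can be derived directly from the identified peptide sequences. The sketch below uses only the standard library rather than Pandas to stay self-contained, and it assumes the common trypsin rule (cleavage C-terminal to K or R, except when followed by P); the source states only that digestion was trypsin-specific, so the proline exception is an assumption. The peptide sequences are hypothetical.

```python
from collections import Counter


def count_missed_cleavages(peptide):
    """Count internal K/R residues that are not followed by P."""
    missed = 0
    # The C-terminal residue is excluded: it is the expected cleavage site
    for i in range(len(peptide) - 1):
        if peptide[i] in "KR" and peptide[i + 1] != "P":
            missed += 1
    return missed


def missed_cleavage_percentages(peptides):
    """Percentage of peptides with 0, 1, 2, ... missed cleavages."""
    counts = Counter(count_missed_cleavages(p) for p in peptides)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


# Hypothetical peptide identifications for one sample
sample = ["LVNELTEFAK", "AEFKVEVTK", "KPEPTIDER", "AKRLMNK"]
percentages = missed_cleavage_percentages(sample)
print(percentages)  # {0: 50.0, 1: 25.0, 2: 25.0}
```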
We processed the FFPE colonic resection tumour tissues of three patients (diagnosed as indicated in
For overlap calculated from merged lists of peptide sequences, 27.1% of identified peptides were shared or overlapped between all the experimental conditions (
No substantial differences were observed for the physicochemical properties of the peptides for each patient (Supplementary document
The hydropathicity scales of all identified peptides generated from each experimental condition were similar (
Physicochemical properties of peptides extracted under the different experimental conditions at the South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa, from August 2017 to July 2019. (a) Hydropathicity was based on Kyte and Doolittle’s (1982) GRand AVerage of hydropathY (GRAVY) scoring matrix. (b) Molecular weight. (c) Isoelectric point (pI). (–PEG) refers to protein extracted without PEG and (+PEG) refers to protein extracted with PEG. Red boxplots refer to pellet samples extracted with PEG; Purple boxplots refer to pellet samples extracted without PEG; Blue boxplots refer to WCPL samples extracted with PEG; Green boxplots refer to WCPL samples extracted without PEG.
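For readers unfamiliar with the GRAVY score in panel (a), it is the mean of the Kyte and Doolittle hydropathy values over all residues in a sequence, so positive scores indicate more hydrophobic peptides and negative scores more hydrophilic ones. The sketch below is a stand-alone illustration with hypothetical sequences; it is not the ProPAS implementation used in the study.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 standard residues
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean hydropathy value per residue."""
    if not sequence:
        raise ValueError("Sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Hypothetical peptides: one hydrophobic, one strongly hydrophilic
for peptide in ["AAA", "KR", "LVNELTEFAK"]:
    print(peptide, round(gravy(peptide), 2))
```

Residues outside the 20 standard amino acids raise a `KeyError` in this sketch, which makes unexpected characters visible rather than silently skewing the average.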
Overall, similar GO functional annotation profiles were obtained for all samples (
Gene ontology annotation profiles according to subcellular localisation for proteins identified from all samples and conditions at the South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa, from August 2017 to July 2019. The average percentages of occurrence of GO terms for all three patients (per experimental group) are displayed with error bars showing standard deviation (significance level = 0.05). (–PEG) refers to protein extracted without PEG and (+PEG) refers to protein extracted with PEG. Red bar graphs refer to pellet samples extracted with PEG; Purple bar graphs refer to pellet samples extracted without PEG; Blue bar graphs refer to WCPL samples extracted with PEG; Green bar graphs refer to WCPL samples extracted without PEG.
In all samples, the majority (> 80%) of peptides were fully cleaved (0 missed cleavages), fewer than 20% had 1 missed cleavage, and fewer than 5% had 2 missed cleavages (
Numbers of missed cleavages for all samples at the South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa, from August 2017 to July 2019. For each sample, the percentage of missed cleavages is shown. (–PEG) refers to protein extracted without PEG and (+PEG) refers to protein extracted with PEG.
In the present study, the samples processed using PEG in the protein extraction buffer had overall lower peptide and protein identifications. Using HeLa cells, Wiśniewski
The numbers of peptide (6840–7058) and protein (2302–2314) identifications reported here for the WCPLs fall within the range of previously published studies and are higher than those reported by Sprung
Our results indicate that the majority of proteins were extracted in the initial WCPLs. Therefore, the extraction buffer containing 2% SDS and the extraction protocol used was sufficiently efficient to extract the majority of proteins from the patient samples; the main differences occurred due to the addition of PEG to the extraction solution. Tanca
Trypsin digestion efficiency influences the molecular weight of peptides.
The current study had access to tissue samples that were not limited with regard to sample volumes and concentrations required for MS analysis compared to, for example, limited samples such as fine needle biopsies. Therefore, it was neither feasible nor cost-beneficial for us to determine the effects of PEG at < 10 µg protein, since this was not compatible with the material we had available, and did not fall within the scope of the present study or studies stemming from it.
Using FFPE human colorectal cancer resection tissue, we demonstrated that the addition of 0.5% PEG to protein extraction buffer resulted in overall lower peptide and protein identifications, compared to buffer without the addition of PEG. In addition, protein samples extracted without PEG showed higher reproducibility, and the addition of PEG to the protein extraction buffer generated lower percentages of unique peptides remaining in the sample pellets. By expanding on previous studies that only analysed FFPE animal tissues and human cells, we have demonstrated that high protein concentrations (> 10 µg) from FFPE human colon tissue also compromise the function of PEG. The data from this study, together with our recently published selection of protein purification protocols for different FFPE block ages,
We thank Prof. Gerhard Walzl for making available laboratory bench space at the Stellenbosch University Immunology Group, and Mrs Andrea Gutschmidt at the Stellenbosch University Immunology Group for her technical assistance and support during the use of their laboratory space. We also thank Mr Charles Gelderbloem and Mr Yunus Kippie at the University of the Western Cape’s Biotechnology and Pharmacology departments for their technical assistance and support during the project.
The authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article.
S.R., J.R. and A.C. designed the project. S.R. collected all samples, did the protein extraction work and drafted the first version of the manuscript. A.C. provided the funding for the project. L.B. performed the mass spectrometry experiment. H.B. assisted with the data analysis. The manuscript was written through contributions from all authors. All authors have given approval to the final version of the manuscript.
This work was supported by the South African Research Chairs Initiative of the Department of Science and Innovation and National Research Foundation of South Africa (grant UID 64751) and the South African Medical Research Council.
Mass spectrometry data (with identification results) were deposited to the PRIDE Archive (
The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.