Modification profiles in large-scale HeLa, HEK293, and TNBC shotgun proteomics experiments. (a) Examples of common modifications showing differences in modification rates. (b) Examples of abundant modifications that were unique to particular experiments. (c) Examples of abundant mass features where the mass difference could not be effectively localized.

MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics


There is a need to better understand and handle the ‘dark matter’ of proteomics: the vast diversity of post-translational and chemical modifications that are unaccounted for in a typical mass spectrometry-based analysis and thus remain unidentified. We present a fragment-ion indexing method, and its implementation in the peptide identification tool MSFragger, that enables a more than 100-fold improvement in speed over most existing proteome database search tools. Using several large proteomic data sets, we demonstrate how MSFragger empowers the open database search concept for comprehensive identification of peptides and all their modified forms, uncovering dramatic differences in modification rates across experimental samples and conditions. We further illustrate its utility using protein-RNA cross-linked peptide data and using affinity purification experiments where we observe, on average, a 300% increase in the number of identified spectra for enriched proteins. We also discuss the benefits of open searching for improved false discovery rate estimation in proteomics.
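The general idea behind a fragment-ion index can be illustrated with a toy sketch. This is not MSFragger's implementation: the function names, the 0.02 Da tolerance, the restriction to singly charged b- and y-ions, and the simple matched-peak count used as a score are all assumptions made for illustration. Theoretical fragments from every candidate peptide are stored in one list sorted by m/z, so each observed peak is resolved with a binary search rather than a comparison against every peptide in turn.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.00728
WATER = 18.01056


def fragment_mzs(peptide):
    """Return singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    frags = []
    for i in range(1, len(masses)):
        frags.append(sum(masses[:i]) + PROTON)          # b-ion (N-terminal part)
        frags.append(sum(masses[i:]) + WATER + PROTON)  # y-ion (C-terminal part)
    return frags


def build_index(peptides):
    """Build a list of (fragment m/z, peptide id) pairs sorted by m/z."""
    index = sorted(
        (mz, pid) for pid, pep in enumerate(peptides) for mz in fragment_mzs(pep)
    )
    keys = [mz for mz, _ in index]  # parallel key list for binary search
    return keys, index


def score(peaks, keys, index, tol=0.02):
    """Count, per peptide, the indexed fragments within tol Da of any observed peak."""
    counts = Counter()
    for peak in peaks:
        lo = bisect_left(keys, peak - tol)
        hi = bisect_right(keys, peak + tol)
        for _, pid in index[lo:hi]:
            counts[pid] += 1
    return counts


# Hypothetical usage: a noise-free "spectrum" generated from the second peptide.
peptides = ["PEPTK", "GASVR", "LEAKS"]
keys, index = build_index(peptides)
spectrum = fragment_mzs("GASVR")
counts = score(spectrum, keys, index)
best_id, best_matches = counts.most_common(1)[0]
print(peptides[best_id], best_matches)
```

Because the scoring step in this sketch never filters candidates by precursor mass, a peptide carrying an unexpected modification can still collect matches from the fragments that do not contain the modified residue, which is the property an open search relies on.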

In Nature Methods