Single-Cell Proteomics: Rescoring and Comparison to Single-Cell Transcriptomics
Topic
Mass spectrometry (MS)–based single-cell proteomics (SCP) is an emerging technology that promises unprecedented biological resolution, but introduces substantial computational and methodological challenges compared to bulk proteomics. These include limited throughput, small sample amounts, lower spectral quality, increased missingness, reduced protein coverage, and strong technology-specific biases.
In this practical course, students will investigate how established proteomics workflows (specifically Oktoberfest and its associated model ecosystem) perform when applied to SCP data. The focus lies on evaluating rescoring strategies under these constrained conditions and on extending the workflow to improve quality control, peak annotation, and performance assessment.
To place SCP results in a broader single-cell context, students will compare SCP and scRNA-seq data using the scp R package and Scanpy, assessing similarities and differences in biological insight across omics types. Since scp was primarily developed and optimized for earlier SCP technologies (e.g. SCoPE2), students will critically evaluate its limitations with respect to newer approaches such as label-free SCP, and implement missing or extended functionality where needed. This may involve bridging R- and Python-based workflows.
Building on this analysis, Oktoberfest-based rescoring will be applied to assess how improved peptide identification affects downstream protein quantification and biological interpretation, through systematic comparisons of scp / Scanpy results before and after rescoring. Overall, students will contribute to evaluating how far existing proteomics pipelines can be pushed toward modern single-cell applications, while gaining hands-on experience with realistic, limited data scenarios.

Aim
Applying Rescoring to Single-Cell Proteomics Datasets
- learn about the challenges of SCP and apply Oktoberfest-based rescoring workflows
- extend Oktoberfest with quality control mechanisms for peak annotation and prediction
- evaluate performance given small sample amounts and limited spectral quality
- check applicability to different SCP technologies (e.g. label-free, multiplexed)
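As one illustration of what a peak-annotation QC metric could look like, the following Python sketch computes the fraction of observed spectrum intensity that can be matched to theoretical fragment m/z values within a ppm tolerance. A low fraction could flag spectra whose annotation, and therefore whose prediction-based rescoring features, may be unreliable. The function name, the metric itself, and the 20 ppm default are hypothetical choices for illustration only; they are not part of Oktoberfest's API.

```python
import numpy as np


def annotated_intensity_fraction(peak_mz, peak_intensity, fragment_mz, tol_ppm=20.0):
    """Hypothetical QC metric: share of total peak intensity explained by
    theoretical fragment ions matched within `tol_ppm` (not an Oktoberfest API)."""
    peak_mz = np.asarray(peak_mz, dtype=float)
    peak_intensity = np.asarray(peak_intensity, dtype=float)
    fragment_mz = np.sort(np.asarray(fragment_mz, dtype=float))

    total = peak_intensity.sum()
    if fragment_mz.size == 0 or peak_mz.size == 0 or total <= 0:
        return 0.0

    # For every observed peak, find the closest theoretical fragment m/z
    # by checking the neighbours on both sides of its sorted insertion point.
    idx = np.searchsorted(fragment_mz, peak_mz)
    left = fragment_mz[np.clip(idx - 1, 0, fragment_mz.size - 1)]
    right = fragment_mz[np.clip(idx, 0, fragment_mz.size - 1)]
    nearest = np.where(np.abs(left - peak_mz) <= np.abs(right - peak_mz), left, right)

    # Relative mass error in parts per million; peaks within tolerance count as annotated.
    ppm_error = np.abs(peak_mz - nearest) / nearest * 1e6
    matched = ppm_error <= tol_ppm
    return float(peak_intensity[matched].sum() / total)


# Toy spectrum: only the peak at m/z 100.0 lies within 20 ppm of a fragment.
frac = annotated_intensity_fraction([100.0, 200.0, 300.0], [1.0, 2.0, 1.0], [100.001, 300.5])
```

In practice, such a metric would be computed per PSM and compared between target and decoy matches or across SCP technologies.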
Cross-Omics Comparison and SCP Workflow Extensions
- compare biological insights from SCP and scRNA-seq data using the scp R package and Scanpy
- assess how improved peptide identification via Oktoberfest rescoring affects downstream protein quantification and interpretation
- establish technology-aware data processing beyond existing scp functionality
- assess the potential of bridging R- and Python-based workflows (e.g. Scanpy extensions for SCP)
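A deliberately simple first cross-omics check is to correlate average protein abundance with average transcript expression across genes measured in both modalities. The sketch below assumes both datasets are already available as pandas DataFrames (cells × features) whose columns are gene symbols, for instance a matrix exported from an scp object and one obtained from a Scanpy AnnData via `adata.to_df()`. The function name is hypothetical, and a real comparison would also need to handle normalization, log transformation, and protein-to-gene mapping.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def cross_omics_spearman(protein: pd.DataFrame, rna: pd.DataFrame):
    """Hypothetical helper: Spearman correlation of per-gene mean protein
    vs. mean RNA levels. Inputs are cells x features with gene symbols as
    columns; missing protein values (NaN), common in SCP, are skipped."""
    shared = protein.columns.intersection(rna.columns)
    means = pd.DataFrame({
        "protein": protein[shared].mean(axis=0, skipna=True),
        "rna": rna[shared].mean(axis=0),
    }).dropna()  # drop genes that were never quantified in one modality
    if len(means) < 3:
        raise ValueError("need at least three genes shared between modalities")
    rho, _pvalue = spearmanr(means["protein"].to_numpy(), means["rna"].to_numpy())
    return float(rho), len(means)


# Toy data: cells do not need to match, only gene symbols are aligned.
protein = pd.DataFrame({"A": [1.0, np.nan], "B": [2.0, 4.0], "C": [5.0, 7.0]})
rna = pd.DataFrame({"B": [10, 20], "C": [30, 50], "D": [1, 1], "A": [0, 2]})
rho, n_genes = cross_omics_spearman(protein, rna)  # concordant gene ranking
```

Comparing such summary correlations before and after Oktoberfest rescoring is one possible way to quantify whether improved identifications change the agreement between modalities.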
General Schedule
Phase 1: Methods, Tools, Techniques
The first phase consists of a series of seminars in which you will learn the basics of various topics necessary for the project. The seminars will be a mix of presentations by team members of our research group, practical sessions where applicable, and short presentations prepared by the participants:
- Kickoff Seminar: Introduction to the course structure, organizational aspects, and overall project goals. Students will get an overview of the project and possible focus areas corresponding to the two aims.
- Proteomics Beyond Bulk: An introduction to MS-based proteomics and SCP technologies, experimental design, and computational approaches for peptide identification and quantification.
- Oktoberfest in Practice: Detailed walkthrough of the Oktoberfest pipeline, including preprocessing of MS data, Prosit-derived peptide property prediction, rescoring, FDR control, and requantification, as well as their potential implications for SCP.
- Additional topics: If you want to go deeper into a specific area, we are open to holding an additional seminar on a topic of your choice.
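FDR control after rescoring is commonly based on the target-decoy approach. The following minimal NumPy sketch estimates q-values from rescored PSM scores and target/decoy labels. It is a simplified illustration (tied scores are ranked individually, and FDR is estimated as decoys/targets), not Oktoberfest's own implementation; the function name is hypothetical.

```python
import numpy as np


def target_decoy_qvalues(scores, is_decoy):
    """Estimate PSM q-values via target-decoy competition (simplified sketch).

    scores: higher is better (e.g. the output of a rescoring model)
    is_decoy: True for PSMs matched to decoy sequences
    Simplification: tied scores are ranked individually, not as one threshold.
    """
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)

    order = np.argsort(-scores, kind="stable")  # best score first
    n_decoy = np.cumsum(is_decoy[order])
    n_target = np.cumsum(~is_decoy[order])
    # Estimated FDR when accepting everything down to this rank, capped at 1.
    fdr = np.minimum(n_decoy / np.maximum(n_target, 1), 1.0)
    # q-value: the lowest FDR at which a PSM would still be accepted.
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]

    q = np.empty_like(q_sorted)
    q[order] = q_sorted  # map back to the input order
    return q


scores = [10.0, 9.0, 8.0, 7.0, 6.0]
is_decoy = [False, False, True, False, True]
q = target_decoy_qvalues(scores, is_decoy)
accepted = (q <= 0.01) & ~np.asarray(is_decoy)  # target PSMs passing 1% FDR
```

Counting accepted targets at a fixed q-value threshold before and after rescoring is a standard way to compare identification performance.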
Phase 2: Research project planning
In the second phase, we want you to prepare a detailed project plan. At the end of this phase, you will present your plan and discuss it with us. We will assist you during planning and provide feedback to ensure that you can bring your project to success. Most importantly, you should discuss the following points:
- Focus Area Selection: Agreement on which datasets, technologies, and analysis aspects will be addressed.
- Requirement Analysis: Definition of research questions, pipeline extensions, and QC metrics to be implemented.
- Organization: Milestones, task distribution, and time planning. Students are encouraged to use project management tools. Communication will be conducted via Slack.
- Frameworks and Languages: The project primarily uses Python (NumPy, Pandas, SciPy, Matplotlib), with optional use of R in the context of the scp package, visualization, or analysis.
Phase 3: Implementation and Research
This is the main phase of your project. You will implement, integrate, and test your work according to your plan. We will hold weekly meetings to discuss your progress.
- Semester Work: Students are expected to work throughout the semester. On-site work is encouraged but not mandatory; virtual participation is possible.
- Full-Time Block: Depending on progress, an optional intensive block of two to three weeks may be scheduled. The specific time and requirements will be discussed with you.
- Submission: Deliverables include implemented code, documentation, benchmarking results, and a written report. Students will present their work in a final presentation.
Skills Gained
- Practical understanding of single-cell proteomics data
- Experience applying and evaluating proteomics workflows under challenging conditions
- Strong programming and data analysis skills
- Insight into quality control and benchmarking
- Experience with collaborative, research-oriented software development
Organisation
Programming Language: Python, R
Must-have Skills:
- Intermediate Python programming
- Basic understanding of data analysis
- Interest in proteomics and applied research
- Interest in scientific software development
Good-to-have Skills:
- Software engineering best practices
- Experience with MS proteomics data formats
- Git/GitHub experience
- Basic R programming
Supervisors:
- Sonja Stockhaus - primary
- Mario Picciani - primary
- Mathias Wilhelm - PI
Grading:
- Presentation [max. 30 minutes, whole team]
- Report [~10 pages, including all sections]
- A track record of individual contributions must be supplied.
Team Size: 3-4 participants
Submission:
- Git repository with code and documentation
- A comprehensive report detailing the project
Location/Rooms: Flexible; on-site work in Freising/Weihenstephan is preferred, with virtual options available. Lectures will preferably take place in the afternoon (tentatively Thursday 16-18), with project meetings preferably at the same time; however, we are flexible with respect to the day and will decide with your input. We also have a student room that you can use, but you may also work from home if you wish. We would like to welcome you to our institute. If required, the full-time block will take place on site in Freising/Weihenstephan; the exact weeks will be decided once we know whether it is needed. Online participation is possible for all parts except the full-time block.
Material
All materials are made available in TUM Moodle.
Literature
- Picciani et al., Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit. Proteomics 24 (8), 2300112 (2024). https://doi-org.tum-eaccess.de/10.1002/pmic.202300112
- Gabriel et al., Prosit-TMT: Deep Learning Boosts Identification of TMT-Labeled Peptides. Analytical Chemistry 94 (20), 7181-7190 (2022). https://pubs-acs-org.tum-eaccess.de/doi/10.1021/acs.analchem.1c05435
- Steen, Mann, The abc's (and xyz's) of peptide sequencing. Nat Rev Mol Cell Biol 5, 699–711 (2004). https://doi-org.tum-eaccess.de/10.1038/nrm1468
- Vanderaa, Gatto, The current state of single-cell proteomics data analysis. Current Protocols 3, e658 (2023). https://doi-org.tum-eaccess.de/10.1002/cpz1.658 (https://uclouvain-cbio.github.io/scp/index.html)
- Wolf et al., SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018). https://doi-org.tum-eaccess.de/10.1186/s13059-017-1382-0