lintian-brush: improve performance by caching parsed ASTs
Currently, lintian-brush re-parses Debian control files for each fixer that runs, which is inefficient when multiple fixers need to modify the same files (e.g., debian/control, debian/changelog). This project aims to implement a Workspace trait that provides access to common files with their parsed AST (Abstract Syntax Tree) representations, preserving these between fixers and only serializing to disk when needed. This relates to issues #5 and #8 on the issue tracker.
Some of this work will need to happen in the debian-analyzer crate. It would also make the fixers usable from debian-lsp.
Confirmed Mentor: Jelmer Vernooij
How to contact the mentor: mail, IRC/Matrix
Confirmed co-mentors: Otto Kekalainen
Difficulty level: Medium
Project size: 175 hours
Deliverables of the project:
- A Workspace trait in the debian-analyzer crate that provides cached access to parsed Debian control files (debian/control, debian/changelog, debian/copyright, debian/watch, etc.)
- Integration of the Workspace trait into lintian-brush's fixer framework, allowing fixers to access pre-parsed ASTs
- Lazy serialization mechanism that only writes files to disk when they've been modified
- Performance benchmarks demonstrating improvement in multi-fixer scenarios
- Documentation and examples for fixer authors on how to use the Workspace API
- Integration tests ensuring correctness when multiple fixers modify the same files
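The deliverables above can be illustrated with a minimal sketch. Only the name `Workspace` comes from this project description; the method names (`get`, `get_mut`, `flush`), the `DiskWorkspace` and `ParsedFile` types, and the two example fixers are hypothetical, and the "AST" is just a list of lines standing in for a real lossless syntax tree. It uses only the Rust standard library and is not the existing debian-analyzer API.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Stand-in for a parsed AST (assumption): just the file's lines.
/// A real implementation would hold a lossless syntax tree instead.
pub struct ParsedFile {
    pub lines: Vec<String>,
}

impl ParsedFile {
    fn parse(text: &str) -> Self {
        ParsedFile { lines: text.lines().map(str::to_owned).collect() }
    }

    fn serialize(&self) -> String {
        self.lines.iter().map(|l| format!("{l}\n")).collect()
    }
}

/// Hypothetical trait shape; the methods are assumptions for illustration.
pub trait Workspace {
    /// Read-only access; parses the file on first use, then serves the cache.
    fn get(&mut self, rel: &str) -> io::Result<&ParsedFile>;
    /// Mutable access; conservatively marks the file as modified.
    fn get_mut(&mut self, rel: &str) -> io::Result<&mut ParsedFile>;
    /// Writes only modified files to disk; returns how many were written.
    fn flush(&mut self) -> io::Result<usize>;
}

struct Entry {
    parsed: ParsedFile,
    dirty: bool,
}

/// Hypothetical implementation backed by a package directory on disk.
pub struct DiskWorkspace {
    root: PathBuf,
    cache: HashMap<String, Entry>,
    /// Number of parses performed, exposed only for demonstration.
    pub parse_count: usize,
}

impl DiskWorkspace {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        DiskWorkspace { root: root.into(), cache: HashMap::new(), parse_count: 0 }
    }

    fn load(&mut self, rel: &str) -> io::Result<&mut Entry> {
        if !self.cache.contains_key(rel) {
            let text = fs::read_to_string(self.root.join(rel))?;
            self.parse_count += 1;
            let entry = Entry { parsed: ParsedFile::parse(&text), dirty: false };
            self.cache.insert(rel.to_owned(), entry);
        }
        Ok(self.cache.get_mut(rel).expect("entry is present after load"))
    }
}

impl Workspace for DiskWorkspace {
    fn get(&mut self, rel: &str) -> io::Result<&ParsedFile> {
        let entry = self.load(rel)?;
        Ok(&entry.parsed)
    }

    fn get_mut(&mut self, rel: &str) -> io::Result<&mut ParsedFile> {
        let entry = self.load(rel)?;
        entry.dirty = true;
        Ok(&mut entry.parsed)
    }

    fn flush(&mut self) -> io::Result<usize> {
        let mut written = 0;
        for (rel, entry) in self.cache.iter_mut() {
            if entry.dirty {
                fs::write(self.root.join(rel), entry.parsed.serialize())?;
                entry.dirty = false;
                written += 1;
            }
        }
        Ok(written)
    }
}

/// Hypothetical fixer that edits debian/control through the workspace.
pub fn strip_trailing_whitespace(ws: &mut dyn Workspace) -> io::Result<()> {
    let file = ws.get_mut("debian/control")?;
    for line in &mut file.lines {
        let keep = line.trim_end().len();
        line.truncate(keep);
    }
    Ok(())
}

/// Hypothetical read-only fixer step that reuses the cached parse.
pub fn count_binary_stanzas(ws: &mut dyn Workspace) -> io::Result<usize> {
    let file = ws.get("debian/control")?;
    Ok(file.lines.iter().filter(|l| l.starts_with("Package:")).count())
}
```

In this sketch, running both fixers against one `DiskWorkspace` parses debian/control once, and a single `flush()` at the end writes it once. A real design would likely want finer-grained change tracking than "any `get_mut` call marks the file dirty", and typed accessors per file format rather than one generic `ParsedFile`.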
Desirable skills: Knowledge of Debian packaging, Rust programming (particularly traits and ownership patterns), experience with parser design and AST manipulation
What the intern will learn:
- Advanced Rust programming including trait design, lifetime management, and performance optimization
- Deep understanding of Debian package formats and control files
- Parser design patterns and AST manipulation techniques
- How the Language Server Protocol (LSP) works and how to design APIs suitable for LSP integration
Application tasks:
- Write a simple Rust program that parses debian/control using the debian-control crate and prints out all binary package names
- Profile lintian-brush on a sample Debian package and identify how many times the same files are parsed (hint: use cargo build --release and tools like perf or strace)
- Review the existing fixer code in lintian-brush/src/fixers/ and identify 2-3 fixers that operate on the same files
Related projects:
https://salsa.debian.org/jelmer/lintian-brush, debian-analyzer, deb822-lossless, debian-changelog, debian-copyright
AI usage policy: AI code assistance is acceptable for exploration and learning, but the intern is expected to make all code changes themselves.
