[GSoC 2026] Interest in Idea #7: Automated Code Hygiene & Security Hardening

Hi @asmorkalov and Gursimar Singh,

I’m Zihan Dai, a CS student at the University of Melbourne. I recently had PR #28502 merged in OpenCV (the erode/dilate docs fix; thanks @asmorkalov for the review), and I’ve had 7 PRs merged across Apache Beam and ShardingSphere around resource leaks, error handling, and build reliability.

Current CI Overview

I spent time reading the current CI before posting:

  • PR-4.x.yaml fans PRs out to: Linux, Windows, ARM64, ARM64 Debug, OpenVINO, CUDA (label-gated), macOS x64/ARM64/Vulkan, iOS, Android SDK, TIM-VX, docs, RISC-V Clang
  • OCV-PR-Linux.yaml builds Ubuntu 20.04/22.04/24.04 and runs test-plan-4.x.json (core, imgproc, imgcodecs, objdetect, videoio, dnn, perf)

Gaps I found:

  • No active sanitizer coverage: the only ASan hook is a commented-out 24.04_asan entry in OCV-PR-Linux.yaml
  • No live UBSan/MSan/TSan jobs
  • OPENCV_ENABLE_MEMORY_SANITIZER exists in CMake but is not wired into CI
  • Static analysis is limited to scheduled CodeQL (OCV-CodeQL.yaml) plus compiler warnings via warnings-handling.py
  • No repo-level .clang-tidy or cppcheck config

CVE-Based Target Selection

Malformed-input bugs cluster in parser-heavy code:

| Module | CVEs | Issue type |
|---|---|---|
| imgcodecs | CVE-2017-12601, CVE-2017-12863, CVE-2018-5268 | BMP/PXM/JPEG2000 buffer overflows |
| objdetect | CVE-2019-14491, CVE-2019-14492 | OOB access in cascadedetect.hpp |
| core/persistence* | CVE-2019-14493, CVE-2019-5063, CVE-2019-5064 | XML/JSON corruption, null deref |
| video/dis_flow | CVE-2019-19624 | Small-image OOB reads |
| wechat_qrcode (contrib) | CVE-2023-2617, CVE-2023-2618 | Decoder null deref/leak |

12-Week Plan

Weeks 1-3 — Audit & Tooling Setup:
Wire ASan+UBSan into a Linux PR lane (re-enable the commented-out 24.04_asan entry). Set up a .clang-tidy config targeting the high-CVE modules. Run initial passes and triage findings by severity.

Weeks 4-7 — Fix Wave 1 (Sanitizer Findings):
Fix ASan/UBSan hits in parser-heavy code (BMP/PXM/JPEG2000 decoders, XML/JSON persistence, cascade detector). Submit each fix as its own PR with a reproducer test. Get MSan running as a nightly job for core and imgproc.

Weeks 8-10 — Fix Wave 2 (Static Analysis):
Fix clang-tidy/clang-analyzer findings (resource leaks, bounds violations, null derefs). Tune the check profile to suppress false positives on hot paths. Add TSan as a nightly job for videoio/highgui threading.

Weeks 11-12 — CI Hardening & Documentation:
Stabilize the sanitizer and static analysis jobs in CI. Write contributor docs (how to run the tools locally, add checks, and interpret findings). Benchmark the CI runtime impact. Write a blog post.

Proposed clang-tidy Profile

Low-noise checks for the target modules:

  • cppcoreguidelines-pro-bounds-pointer-arithmetic
  • cppcoreguidelines-pro-bounds-constant-array-index
  • bugprone-sizeof-expression, bugprone-too-small-loop-variable, bugprone-narrowing-conversions
  • clang-analyzer-unix.Malloc, clang-analyzer-cplusplus.NewDeleteLeaks, clang-analyzer-core.NullDereference
  • modernize-use-nullptr, modernize-use-override (non-hot-path only)
  • performance-unnecessary-value-param, performance-for-range-copy (non-hot-path only)
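To make the profile concrete, here is a rough sketch of what a repo-level .clang-tidy carrying the checks above could look like. The check names are the ones listed; the HeaderFilterRegex value and the choice to disable everything else first are my assumptions, not something that exists in the repo today.

```yaml
# Sketch only: a possible repo-level .clang-tidy for the target modules.
# Start from nothing ("-*") and opt in to the checks listed above.
Checks: >
  -*,
  cppcoreguidelines-pro-bounds-pointer-arithmetic,
  cppcoreguidelines-pro-bounds-constant-array-index,
  bugprone-sizeof-expression,
  bugprone-too-small-loop-variable,
  bugprone-narrowing-conversions,
  clang-analyzer-unix.Malloc,
  clang-analyzer-cplusplus.NewDeleteLeaks,
  clang-analyzer-core.NullDereference,
  modernize-use-nullptr,
  modernize-use-override,
  performance-unnecessary-value-param,
  performance-for-range-copy

# Report only; nothing is promoted to an error while the profile is being tuned.
WarningsAsErrors: ''

# Assumed path filter: restrict header diagnostics to the high-CVE modules.
HeaderFilterRegex: 'modules/(core|imgcodecs|objdetect|video)/.*'
```

The "non-hot-path only" restriction cannot be expressed in a single Checks list. One option would be nested .clang-tidy files in the hot-path directories that switch the modernize-* and performance-* checks off again.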

In parallel: ASan+UBSan in PR lane, MSan/TSan as narrower nightly jobs.

Questions

  1. @asmorkalov: OCV-PR-Linux.yaml already has a commented-out 24.04_asan lane and runner.py recognizes ASan aborts. Was that lane disabled because of flaky dependencies or runtime cost, or is it simply unfinished?
  2. CodeQL is scheduled from .github/workflows/4.x.yml rather than PR-gated. Would you want this project to keep CodeQL nightly-only and focus on sanitizers/clang-tidy, or is moving a reduced CodeQL profile into PR signal also in scope?

Hi @asmorkalov, just following up on this. I’ve since submitted my GSoC proposal for Apache IoTDB and contributed PRs to Debezium and Eclipse SW360 as well. I’m still very interested in Idea #7 if it’s available. Happy to discuss the approach further.

I think you are in the wrong place here. GSoC has never been organized through this forum.

this is what contributors are supposed to do: