Imagine you're in an L5 system design interview at Google (1Cr+ role) and the interviewer asks: How does YouTube detect copyrighted music inside the millions of videos uploaded every day?

This is a classic large-scale content matching problem.

Btw, if you’re preparing for system design/coding interviews, check out our mock interview tool. You can use it for free here: https://lnkd.in/gpCn7t2T

[1] Clarify what we actually need to detect

This is not “understand the whole video.” It is:
- detect whether uploaded audio matches known copyrighted audio
- work even if the song is trimmed, compressed, slightly pitch-shifted, or mixed with speech
- do it at upload scale with low latency
- tolerate some false positives/negatives, but keep them low enough to support policy actions

So the core problem is robust audio fingerprint matching, not full ML understanding of the video.

[2] High-level approach

Use a fingerprinting pipeline plus a distributed lookup system.
- rights holders upload reference tracks
- the system converts each track into compact audio fingerprints
- every uploaded video goes through the same fingerprinting pipeline
- fingerprints are matched against a massive index
- if similarity crosses a threshold, trigger a claim/review flow

This is essentially Content ID-style thinking.

[3] Ingestion and fingerprint generation

For both reference songs and uploaded videos:
- extract the audio track from the video
- lightly normalize the audio to reduce variation caused by encoding differences
- split it into small time windows
- convert each window into spectral features using FFT/spectrogram-style processing
- generate fingerprints from stable spectral peaks, not from the raw waveform

Why fingerprints?
- much smaller than raw audio
- robust to compression and small edits
- fast to search at scale

[4] Matching phase

Now match upload fingerprints against the copyright index.
- hash fingerprint features into an inverted index
- retrieve candidate songs with overlapping hashes
- align matches by time offset
- if many fingerprints line up at a consistent offset, it is likely the same song

That offset consistency is important. Random hash overlaps happen; real matches line up over time.

[5] Scale and system design details
- shard the fingerprint index by hash range
- process uploads asynchronously through a queue
- parallelize fingerprint extraction across workers
- cache hot tracks, because popular songs get matched often
- keep only compact hashes plus metadata in the serving path, not large raw blobs

For millions of uploads, this becomes a distributed search problem over fingerprints.
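To make the offset-consistency idea from [4] and the hash-range sharding from [5] concrete, here is a minimal Python sketch. It assumes spectral peaks have already been extracted as (frame, frequency-bin) pairs, and it uses Shazam-style landmark hashing (pairs of nearby peaks packed into one integer key), a publicly described technique swapped in for illustration. It is not how YouTube's Content ID actually works. Every name and constant here (`landmark_hashes`, `shard_for`, `FingerprintIndex`, `FAN_OUT`, `MAX_DT`, `NUM_SHARDS`) is hypothetical, and the in-memory dicts stand in for distributed index shards.

```python
from collections import Counter, defaultdict

# Hypothetical tuning knobs, not values from any production system.
FAN_OUT = 5       # how many later peaks each anchor peak is paired with
MAX_DT = 64       # max frame gap between paired peaks (must fit in 8 bits)
NUM_SHARDS = 4    # number of index shards
HASH_BITS = 32    # hash space size; assumes frequency bins < 4096 (12 bits)


def landmark_hashes(peaks):
    """Yield (hash, anchor_frame) for pairs of nearby spectral peaks.

    `peaks` is assumed to be a list of (frame_index, freq_bin) tuples already
    picked from a spectrogram; peak picking itself is not shown here.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + FAN_OUT]:
            dt = t2 - t1
            if 0 < dt <= MAX_DT:
                # Pack (f1, f2, dt) into one 32-bit key: 12 + 12 + 8 bits.
                yield (f1 << 20) | (f2 << 8) | dt, t1


def shard_for(h):
    """Range-partition the hash space: each shard owns a contiguous slice."""
    return (h * NUM_SHARDS) >> HASH_BITS


class FingerprintIndex:
    def __init__(self):
        # One inverted index per shard: hash -> [(song_id, ref_frame), ...]
        self.shards = [defaultdict(list) for _ in range(NUM_SHARDS)]

    def add_reference(self, song_id, peaks):
        for h, t_ref in landmark_hashes(peaks):
            self.shards[shard_for(h)][h].append((song_id, t_ref))

    def match(self, peaks, min_votes=3):
        """Return [(song_id, offset, votes)] sorted by votes, best first."""
        # Vote on (song, ref_frame - query_frame): random hash collisions scatter
        # across many offsets, while a real match piles up on a single offset.
        votes = Counter()
        for h, t_query in landmark_hashes(peaks):
            for song_id, t_ref in self.shards[shard_for(h)].get(h, []):
                votes[(song_id, t_ref - t_query)] += 1
        # Keep only the strongest offset per song, above a minimum vote count.
        best = {}
        for (song_id, offset), count in votes.items():
            if count >= min_votes and count > best.get(song_id, (None, 0))[1]:
                best[song_id] = (offset, count)
        return sorted(
            ((s, off, c) for s, (off, c) in best.items()), key=lambda r: -r[2]
        )


# Usage with synthetic peaks (made-up data, purely for illustration).
ref_a = [(t, (t * 37) % 500 + 20) for t in range(0, 200, 2)]
ref_b = [(t, (t * 11) % 400 + 30) for t in range(0, 200, 3)]
# The "upload" is a 60-frame excerpt of song_a starting at frame 100, re-timed to 0.
clip = [(t - 100, f) for t, f in ref_a if 100 <= t < 160]

index = FingerprintIndex()
index.add_reference("song_a", ref_a)
index.add_reference("song_b", ref_b)
print(index.match(clip))  # → [('song_a', 100, 135)]
```

The excerpt is found at offset 100 because all 135 of its landmark hashes point back to the same alignment in `song_a`, which is exactly the "line up over time" signal. A real system would likely also bucket nearby offsets to tolerate timing jitter, cap posting lists for very common hashes, and turn raw vote counts into a calibrated score before triggering a claim.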