Build Log

How Palette is being built.

Short entries, newest first. Decisions, ships, and things that didn't work. Lightly polished.

  1. Learning

    Deferred decisions

    Two I'm not going to resolve this week.

    Sentiment analysis from TMDB reviews is the tenth ensemble component and it's currently running at the 0.004 weight floor. Adding real sentiment is roughly a week of work and the payoff is small — it's a low-weight component in the ensemble. I'd rather spend that week on onboarding or the film picker.

    The frequent-actor filter threshold is hardcoded at `c >= 2`, which is right for about 120 rated films, but at 3000 the set of qualifying actors would explode. The fix is a corpus-scaled threshold, something like `max(2, len(rated) // 50)`. I'm not going to do it until the problem actually shows up. The scoring engine is frozen, and anything that changes its inputs has to be handled carefully. I'd rather let real data force the choice than pre-optimize for a scale I'm not at yet.
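
    For reference, a minimal sketch of what that corpus-scaled threshold could look like. The function names and the `cast` key on each film are hypothetical stand-ins, not the frozen engine's actual code; only the `max(2, len(rated) // 50)` expression comes from the entry above.

```python
from collections import Counter


def actor_threshold(num_rated: int) -> int:
    """Corpus-scaled minimum appearance count: max(2, num_rated // 50)."""
    return max(2, num_rated // 50)


def frequent_actors(rated: list[dict]) -> set[str]:
    """Actors appearing in at least `actor_threshold(len(rated))` rated films.

    Assumes each film is a dict with a "cast" list (hypothetical shape).
    """
    # set() so an actor listed twice in one film is only counted once.
    counts = Counter(actor for film in rated for actor in set(film["cast"]))
    threshold = actor_threshold(len(rated))
    return {actor for actor, c in counts.items() if c >= threshold}
```

    With this formula, 120 rated films still gives a threshold of 2 (the current hardcoded value), while 3000 rated films gives 60.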

  2. Audit

    What the design audit caught

    Pretty exhaustive — 74 findings across four audits (design-system, accessibility, ux-copy, design-critique). 23 fixed in the audit session, most of the rest parked for a later pass. Two findings worth flagging.

    The pipeline duration copy was off by 7-10x on three pages. It's the kind of thing you only catch when someone reads the UI carefully. That's the value of an audit — not "does this look nice" but "does what this says match what's actually happening."

    On accessibility, the site is at AA-compliant contrast across the board, except for one borderline eyebrow color that passes only AA Large. Drill modals have proper focus-trap and focus-restore, and every chart now has a screen-reader text summary describing its top-line finding.
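
    For context on the AA versus AA Large distinction, here is a small sketch of the WCAG 2.x contrast check: relative luminance per color, then a ratio compared against 4.5:1 for normal text and 3:1 for large text. The helper names are made up for illustration and the hex values in the usage note are generic grays, not colors from the site.

```python
def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two colors, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def wcag_level(ratio: float) -> str:
    """Classify a ratio against the AA thresholds."""
    if ratio >= 4.5:
        return "AA"
    if ratio >= 3.0:
        return "AA Large"
    return "fail"
```

    On a white background, a dark gray like `#595959` lands in "AA", while a lighter gray like `#888888` only reaches "AA Large", which is the situation the borderline eyebrow color is in.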

  3. Design

    Why FastAPI + Celery + Supabase

    The recommender started as a single HTML artifact: one user, one upload, one render. That's fine for a demo but it doesn't handle real usage. Migrating off that artifact is the whole reason there's a backend at all.

    FastAPI handles the HTTP surface because it's boring and fast and Pythonic enough that the scoring code and the web code look like they belong together. Celery handles the actual pipeline (parse → enrich → train → generate) because scoring takes minutes, not milliseconds, and it has to run asynchronously from the upload. Supabase handles auth, storage, and Postgres all in one, which matters because I'm solo and don't want to run three services when one will do.
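
    The shape of that pipeline can be sketched with the standard library alone. This is a stand-in that uses a thread pool instead of Celery, with placeholder stage functions that only record their own name; the real stages and task wiring are not shown in this log, so every name here is hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import reduce


def _stage(name: str):
    """Build a placeholder stage that appends its name to the job state."""
    def run(state: dict) -> dict:
        return {**state, "stages": [*state.get("stages", []), name]}
    return run


# Placeholder stages in the order named above: parse -> enrich -> train -> generate.
parse, enrich, train, generate = (
    _stage(n) for n in ("parse", "enrich", "train", "generate")
)
STAGES = [parse, enrich, train, generate]


def run_pipeline(upload: dict) -> dict:
    """Run the stages in order, feeding each result into the next."""
    return reduce(lambda state, stage: stage(state), STAGES, upload)


# One worker, mirroring the single worker container.
executor = ThreadPoolExecutor(max_workers=1)


def handle_upload(upload: dict) -> Future:
    """Return immediately; the pipeline itself runs in the background."""
    return executor.submit(run_pipeline, upload)
```

    The point of the split is that `handle_upload` returns right away while the slow work happens elsewhere. With a single worker, a second upload simply waits its turn behind the first.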

    That said, this stack has tradeoffs. Supabase's free tier caps out around 8 GB; if Palette ever gets real users I'll have to pay or move. The Celery worker is currently a single container, which is fine for now but won't scale past a handful of concurrent uploads. These are future-me problems.

  4. Learning

    Why ten components

    The short answer is that no single approach captures what "liking a movie" actually means. Cosine similarity is good at finding films that feel like other films you rated highly. Collaborative filtering catches what people with similar taste watched. TF-IDF on overviews and keywords picks up vocabulary-level resonance. Franchise awareness figures out that if you gave The Empire Strikes Back five stars, you're probably not going to hate Return of the Jedi. Each one is weak on its own and catches something the others miss.

    The other reason is honesty about failure modes. A single-component recommender can be wrong in a confident way. Ten components with adaptive weighting let me show you WHERE the model is confident, which I think is more useful than a score on its own. That's why the Model Accuracy panel exists — I'd rather see "palette tends to miss on war films" than a flat 4.2 with no uncertainty attached.
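
    As a rough illustration of how a weighted ensemble with a weight floor can combine component scores: the sketch below assumes raw weights are clamped to the floor and then rescaled to sum to 1, which is one plausible scheme and not necessarily what the engine does. The component names and numbers are made up; only the 0.004 floor comes from this log.

```python
WEIGHT_FLOOR = 0.004  # floor value mentioned in this log


def normalize_weights(
    raw: dict[str, float], floor: float = WEIGHT_FLOOR
) -> dict[str, float]:
    """Clamp each raw weight to the floor, then rescale so they sum to 1.

    Assumption: the floor applies before rescaling, so a floored
    component ends up with a small but nonzero share.
    """
    clamped = {name: max(w, floor) for name, w in raw.items()}
    total = sum(clamped.values())
    return {name: w / total for name, w in clamped.items()}


def blend(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-component predicted ratings."""
    return sum(weights[name] * scores[name] for name in scores)
```

    A component whose raw weight is zero still contributes a sliver to the blend, and because the weights sum to 1, the result stays on the same rating scale as the inputs.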
