Ensemble similarity
palette’s scoring engine is built from ten independent components. Each one scores every film, and the engine then learns how much to trust each component by checking its scores against your ratings. This page walks through what each component does, what your weights look like, and where the model is honest about being wrong.
Sample data on this page is the founder’s real Letterboxd profile — about a year of ratings. Yours will look different in every section.
palette scores every film ten different ways (by genre, director, cast, mood, sentiment, and more), then learns which ways predict your ratings best and blends them. The result is a star prediction with a per-genre confidence bar, so you can see where the model is sure of itself and where it isn’t. The rest of this page is the long version, in case you want to see the math.
Each component scores every film independently. The engine learns how much to trust each one against your ratings, then blends them.
How closely a film's overall feature signature matches the average of films you've rated highly.
How often this film is rated alongside other films you've rated.
Whether the film's genre + mood combination matches the patterns in your top-rated films.
How cross-products of director, genre, decade, and other attributes line up with your taste.
Which of seven taste clusters your ratings put you in, and how the film fits that cluster.
How viewers with similar feature vectors to yours have rated this film.
How positively the film's reviews score on average, weighted by your sensitivity to critics.
Whether you tend to rate films in the same franchise consistently.
How the film's overview and TMDB keywords overlap with films you've loved.
Latent screenplay-direction similarity to a 224-screenplay reference corpus.
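The first component compares a film to the average of your favorites. A minimal sketch of that idea, assuming each film is already encoded as a numeric feature vector of a fixed length and treating “rated highly” as 4★ or above (both assumptions; tasteSimilarity, centroid, and cosine are hypothetical names, not palette’s code):

```typescript
// Hypothetical sketch: cosine similarity between a film's feature vector and
// the mean ("centroid") of the vectors of films you rated highly.
// The 4-star cutoff and the feature encoding are illustrative assumptions.

type Vector = number[];

interface RatedFilm {
  features: Vector; // hypothetical numeric feature signature
  rating: number; // Letterboxd stars, 0.5 to 5
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so report no similarity instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function centroid(vectors: Vector[]): Vector {
  const dims = vectors[0].length;
  const sum = new Array<number>(dims).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dims; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

function tasteSimilarity(candidate: Vector, rated: RatedFilm[], cutoff = 4): number {
  const liked = rated.filter((f) => f.rating >= cutoff).map((f) => f.features);
  if (liked.length === 0) return 0; // no highly rated films yet, so no signal
  return cosine(candidate, centroid(liked));
}
```

Cosine similarity ignores vector length, so a film is judged on the mix of its features rather than on how many features it has.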
This is the founder's profile — a year of Letterboxd ratings. Your model will land on a different shape.
These components are at the weight floor — they haven’t earned weight in your model yet. We don’t hide them, because hiding them would lie about how the model works.
When you upload your Letterboxd export, the engine reweights itself against your ratings. Some components will climb; others will sit at the floor.
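One way to picture “learning how much to trust each component” is inverse-error weighting: components whose past predictions land closer to your real ratings get more weight, and weak ones are held at a floor rather than removed. This is a hypothetical sketch; the inverse-MSE rule, the 0.02 floor, and the assumption that every component predicts on the star scale are stand-ins, and the real engine may fit its weights differently.

```typescript
// Hypothetical sketch: weight each component by how well its predictions track
// your actual ratings, then blend. Inverse-MSE weighting and the 0.02 floor are
// illustrative assumptions, not palette's actual fitting procedure.

interface ComponentPredictions {
  name: string;
  predicted: number[]; // one star prediction per rated film, aligned with `actual`
}

function meanSquaredError(predicted: number[], actual: number[]): number {
  let total = 0;
  for (let i = 0; i < actual.length; i++) {
    const err = predicted[i] - actual[i];
    total += err * err;
  }
  return total / actual.length;
}

function learnWeights(
  components: ComponentPredictions[],
  actual: number[],
  floor = 0.02,
): Map<string, number> {
  // Lower error means higher raw weight; the epsilon guards a perfect component.
  const raw = components.map((c) => 1 / (meanSquaredError(c.predicted, actual) + 1e-9));
  const rawTotal = raw.reduce((a, b) => a + b, 0);
  // Weak components are raised to the floor instead of being dropped,
  // then everything is renormalized so the weights sum to 1.
  const floored = raw.map((w) => Math.max(w / rawTotal, floor));
  const flooredTotal = floored.reduce((a, b) => a + b, 0);
  const weights = new Map<string, number>();
  components.forEach((c, i) => weights.set(c.name, floored[i] / flooredTotal));
  return weights;
}

function blend(scores: Map<string, number>, weights: Map<string, number>): number {
  let sum = 0;
  let weightSum = 0;
  weights.forEach((w, name) => {
    const s = scores.get(name);
    if (s === undefined) return; // skip components with no score for this film
    sum += w * s;
    weightSum += w;
  });
  return weightSum === 0 ? 0 : sum / weightSum;
}
```

Because the floored weights are renormalized, a floor-pinned component ends up slightly below the nominal floor; it stays visible with a small nonzero weight, which mirrors how the page shows floor components instead of hiding them.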
Top directors and genres, ranked by how far their per-film average rating sits above or below your overall average. Anti-preferences are the ones you rate noticeably below your own baseline.
These shift with every resync. Add 50 ratings and a director who didn't make this profile's top 5 might lead yours.
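The ranking boils down to a per-attribute average compared against your overall average. A sketch under stated assumptions: the two-film minimum and the -0.5★ anti-preference cutoff are illustrative, and affinities and antiPreferences are hypothetical names.

```typescript
// Hypothetical sketch: rank an attribute (director, genre, ...) by how far its
// per-film average sits from your overall average. The 2-film minimum and the
// -0.5-star anti-preference cutoff are illustrative assumptions.

interface Affinity {
  key: string;
  films: number;
  delta: number; // per-film average minus your overall average
}

function affinities<T extends { rating: number }>(
  ratings: T[],
  keysOf: (r: T) => string[], // e.g. a film's directors, or its genres
  minFilms = 2,
): Affinity[] {
  if (ratings.length === 0) return [];
  const overall = ratings.reduce((s, r) => s + r.rating, 0) / ratings.length;
  const totals = new Map<string, { sum: number; n: number }>();
  for (const r of ratings) {
    for (const key of keysOf(r)) {
      const t = totals.get(key) ?? { sum: 0, n: 0 };
      t.sum += r.rating;
      t.n += 1;
      totals.set(key, t);
    }
  }
  const out: Affinity[] = [];
  totals.forEach((t, key) => {
    if (t.n >= minFilms) out.push({ key, films: t.n, delta: t.sum / t.n - overall });
  });
  // Strongest positive preferences first.
  return out.sort((a, b) => b.delta - a.delta);
}

function antiPreferences(ranked: Affinity[], cutoff = -0.5): Affinity[] {
  return ranked.filter((a) => a.delta <= cutoff);
}
```

Passing a genre accessor instead of a director accessor gives the genre ranking from the same function.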
Vulnerability is the differentiator. This section shows the five biggest misses (predicted vs. actual rating) and the dimensions where the model has too few data points to be confident.
Every taste profile has gaps. We surface them — your worst predictions, your sparsest dimensions — instead of hiding them behind a single accuracy number.
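The misses-and-gaps report can be sketched in a few lines. This assumes each rated film carries a predicted and an actual star value plus its genres, and treats any genre with fewer than five rated films as sparse; the threshold and the function names are hypothetical.

```typescript
// Hypothetical sketch of the "where the model is wrong" report: the largest
// absolute prediction errors, plus dimensions (here, genres) with fewer rated
// films than a minimum. The 5-film minimum is an illustrative assumption.

interface Scored {
  title: string;
  predicted: number;
  actual: number;
  genres: string[];
}

function biggestMisses(films: Scored[], k = 5): Scored[] {
  const miss = (f: Scored) => Math.abs(f.predicted - f.actual);
  // Copy before sorting so the caller's array is left untouched.
  return [...films].sort((a, b) => miss(b) - miss(a)).slice(0, k);
}

function sparseDimensions(films: Scored[], dimensions: string[], minCount = 5): string[] {
  const counts = new Map<string, number>();
  for (const f of films) {
    for (const g of f.genres) counts.set(g, (counts.get(g) ?? 0) + 1);
  }
  // Dimensions you have never rated count as zero, so they are flagged too.
  return dimensions.filter((d) => (counts.get(d) ?? 0) < minCount);
}
```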
Your model isn’t static. Every time you re-upload your Letterboxd export, the engine reruns the full ensemble against your latest ratings.
The component weights you saw in §2 are not fixed coefficients. They’re relearned on every run by testing how well each component’s predictions match the films you’ve actually rated, so adding 50 more ratings could shift your top weight. The amount of evidence matters too: a director you’ve seen twice and rated 3.5★ carries little weight, while a director you’ve seen ten times and consistently rated 4.5★ pulls harder.
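The director example describes a shrinkage estimator. A minimal sketch using a Bayesian average, where the pseudo-count of 5 is an illustrative assumption and shrunkAverage and directorPull are hypothetical names:

```typescript
// Hypothetical sketch: a Bayesian average that shrinks a director's mean toward
// your overall mean when you have rated few of their films. The pseudo-count of
// 5 is an illustrative assumption, not palette's real setting.

function shrunkAverage(sum: number, n: number, overall: number, pseudoCount = 5): number {
  // Equivalent to adding `pseudoCount` imaginary films rated at your overall average.
  return (sum + pseudoCount * overall) / (n + pseudoCount);
}

function directorPull(sum: number, n: number, overall: number, pseudoCount = 5): number {
  // Distance of the shrunk average from your baseline; algebraically this is
  // the raw gap (mean - overall) scaled by n / (n + pseudoCount).
  return shrunkAverage(sum, n, overall, pseudoCount) - overall;
}
```

With a hypothetical overall average of 3.4★, two films at 3.5★ give a pull of about 0.03★ (keeping 2/7 of a 0.1★ raw gap), while ten films at 4.5★ give about 0.73★ (keeping 10/15 of a 1.1★ gap).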
Floor-pinned components stay pinned until they have enough signal to lift off the floor. That happens through three paths: more ratings from you (densification), more films in the shared catalog (the corpus that drives collaborative filtering), or new enrichment fields shipping (sentiment is next, and still pending).
The model version below your dashboard’s last-sync timestamp tells you which engine produced your current scores. We bump it when the underlying ensemble changes.
Once the model has scored every film, you spin against the pool from /picker. The filters and the spin behavior are the layer that turns predictions into a watchable answer for tonight.
The Genre and Mood dropdowns now show 11 and 8 broad buckets respectively, down from 19 TMDB genre labels and 25 mood tokens. Crime, Thriller, and Mystery, for instance, are grouped as “Crime & Thriller.” The 25 mood tokens collapse into Dramatic, Heartfelt, Funny, Exciting, Dark, Tense, Dreamy, and Cerebral. The underlying TMDB data is unchanged; only the picker’s presentation is grouped, so that combining several filters still leaves a usable pool at the current catalog size.
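Because the grouping lives only in the picker, it can be a lookup applied at filter time while the stored TMDB labels stay untouched. In this hypothetical sketch, only the Crime & Thriller mapping comes from this page; the other entries are placeholders, and GENRE_BUCKETS, bucketsFor, and filterByBucket are illustrative names.

```typescript
// Hypothetical sketch of the picker's grouping layer: TMDB genre labels map to
// broader display buckets at filter time. Only the Crime & Thriller grouping
// is documented; the remaining entries are placeholders.

const GENRE_BUCKETS: Partial<Record<string, string>> = {
  Crime: "Crime & Thriller",
  Thriller: "Crime & Thriller",
  Mystery: "Crime & Thriller",
  Comedy: "Comedy", // placeholder
  Drama: "Drama", // placeholder
};

interface Film {
  title: string;
  tmdbGenres: string[]; // raw TMDB labels, never rewritten
}

function bucketsFor(film: Film): Set<string> {
  // Labels without a mapping fall through as their own bucket.
  return new Set(film.tmdbGenres.map((g) => GENRE_BUCKETS[g] ?? g));
}

function filterByBucket(films: Film[], bucket: string): Film[] {
  return films.filter((f) => bucketsFor(f).has(bucket));
}
```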
When the pool drops below 10 films and at least two filters are active, the picker runs a counterfactual on each active filter (recomputing the pool with that filter removed), identifies the most restrictive one, and suggests dropping that specific filter to expand the pool. The Spin button is hard-disabled below four films, which is too few to draw a meaningful set.
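The counterfactual can be sketched by treating each filter as a predicate and re-running the pool with one filter left out at a time; the filter whose removal frees up the most films is the most restrictive. The thresholds (10 films, two filters, four films to spin) come from this page; everything else, including the names, is an assumption.

```typescript
// Hypothetical sketch of the "drop this filter" suggestion. Thresholds match
// the page; the predicate-based filter shape is an assumption.

interface Filter<T> {
  label: string;
  test: (item: T) => boolean;
}

function applyFilters<T>(items: T[], filters: Filter<T>[]): T[] {
  return items.filter((item) => filters.every((f) => f.test(item)));
}

function suggestDrop<T>(items: T[], filters: Filter<T>[]): Filter<T> | null {
  const poolSize = applyFilters(items, filters).length;
  if (poolSize >= 10 || filters.length < 2) return null; // no suggestion needed
  let best: Filter<T> | null = null;
  let bestSize = poolSize;
  for (const candidate of filters) {
    // Counterfactual: the pool as it would be without this one filter.
    const others = filters.filter((f) => f !== candidate);
    const size = applyFilters(items, others).length;
    if (size > bestSize) {
      best = candidate;
      bestSize = size;
    }
  }
  return best; // null if no single removal grows the pool
}

function canSpin(poolSize: number): boolean {
  return poolSize >= 4; // Spin is disabled below four films
}
```

Re-running the full filter chain once per active filter is cheap at picker scale; a much larger catalog might cache per-filter match sets instead.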
If you pick a director in All films, then switch to Watchlist where that director has no films, the picker auto-clears the orphan selection with a brief notice rather than leaving you stuck at a zero-film pool. Same for Genre, Mood, and Country.
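A sketch of the orphan check, assuming each film has a single director, genre, mood, and country (a simplification) and that the picker displays the returned notices; clearOrphans and its types are hypothetical.

```typescript
// Hypothetical sketch of orphan clearing on a source switch (for example,
// All films to Watchlist). Films are simplified to one value per field.

type Field = "director" | "genre" | "mood" | "country";
type PickerFilm = Record<Field, string>;
type Selection = Partial<Record<Field, string>>;

const FIELDS: Field[] = ["director", "genre", "mood", "country"];

function clearOrphans(
  selection: Selection,
  newSource: PickerFilm[],
): { selection: Selection; notices: string[] } {
  const next: Selection = { ...selection };
  const notices: string[] = [];
  for (const field of FIELDS) {
    const value = next[field];
    if (value === undefined) continue;
    // A selection is orphaned when no film in the new source carries that value.
    if (!newSource.some((film) => film[field] === value)) {
      delete next[field];
      notices.push(`Cleared ${field} "${value}": no films in this list.`);
    }
  }
  return { selection: next, notices };
}
```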
Films you marked watched on Letterboxd without rating or diary-entering them used to leak into the “New to me” bucket as ghost recommendations. The picker now reads the full watched.csv from your export, so those films are excluded from the candidate pool, as rated films already were.
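A sketch of the fix, assuming the export’s ratings.csv and watched.csv both carry Name and Year columns and that films are matched on lowercase title plus year. The real matching key may differ, and parseCsvLine, seenKeys, and newToMe are hypothetical names. The CSV splitter handles quoted titles containing commas but not fields containing line breaks.

```typescript
// Hypothetical sketch: build "seen" sets from ratings.csv and watched.csv,
// then drop seen films from the "New to me" candidates. Assumes Name and Year
// columns and a lowercase-title-plus-year key; both are assumptions.

function parseCsvLine(line: string): string[] {
  // Minimal CSV field splitter that honors double-quoted fields and "" escapes.
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"';
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

function filmKey(name: string, year: string): string {
  return `${name.trim().toLowerCase()}|${year.trim()}`;
}

function seenKeys(csvText: string): Set<string> {
  const [header, ...rows] = csvText.trim().split(/\r?\n/);
  const cols = parseCsvLine(header);
  const nameIdx = cols.indexOf("Name");
  const yearIdx = cols.indexOf("Year");
  if (nameIdx < 0 || yearIdx < 0) throw new Error("expected Name and Year columns");
  const keys = new Set<string>();
  for (const row of rows) {
    const f = parseCsvLine(row);
    if (f.length <= Math.max(nameIdx, yearIdx)) continue; // skip malformed rows
    keys.add(filmKey(f[nameIdx], f[yearIdx]));
  }
  return keys;
}

interface Candidate {
  name: string;
  year: number;
}

function newToMe(candidates: Candidate[], ...seenSets: Set<string>[]): Candidate[] {
  // A film is excluded if it appears in any seen set (rated or watched-only).
  return candidates.filter((c) => !seenSets.some((s) => s.has(filmKey(c.name, String(c.year)))));
}
```

Treating rated and watched-only films through the same seen check is the point of the fix: both sources feed one exclusion rule, so neither path can drift from the other.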
These changes shipped 2026-05-10. The 31-test pool.ts regression suite covers the merge invariants, including the integration smoke test that catches the asymmetric-predicate bug class.