How the model works

Ten components, one ensemble, weights you can read.

palette’s scoring engine runs ten independent components that each score every film, then learns how much to trust each one by checking it against your ratings. This page walks through what each component does, what your weights look like, and where the model is honest about being wrong.

Sample data on this page is the founder’s real Letterboxd profile — about a year of ratings. Yours will look different in every section.

TL;DR

palette scores every film ten different ways — by genre, director, cast, mood, sentiment, and more — then learns which ways predict your ratings best and blends them. The result is a star prediction with a per-genre confidence bar, so you can see where the model is sure of itself and where it isn’t. The rest of this page is the long version, in case you want to see the math.

§1 How it works

Ten components, one ensemble.

Each component scores every film independently. The engine learns how much to trust each one against your ratings, then blends them.
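As a minimal sketch of the blending step, assuming every component emits a score on the same star scale and that the learned weights are plain non-negative numbers, the final prediction can be read as a weighted average. The names `ComponentScore` and `blendScores` are illustrative and not taken from palette's code.

```typescript
// Hypothetical shape: one score per component, on a 0.5 to 5 star scale.
interface ComponentScore {
  component: string;
  score: number;
}

// Weighted average of component scores. Weights are renormalized over the
// components actually present, so a missing component does not drag the
// prediction toward zero.
function blendScores(
  scores: ComponentScore[],
  weights: Record<string, number>
): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const { component, score } of scores) {
    const w = weights[component] ?? 0;
    weighted += w * score;
    totalWeight += w;
  }
  if (totalWeight === 0) {
    throw new Error("no weighted components to blend");
  }
  return weighted / totalWeight;
}
```

Renormalizing inside the function is a design choice for the sketch: it keeps the output on the star scale even when one component has no score for a film.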

Ensemble similarity

Cosine Similarity

How closely a film's overall feature signature matches the average of films you've rated highly.

Film-pair correlations

Pairwise Film Similarity

How often this film tends to be rated alongside other films you've rated.

Genre and mood match

Genre-Mood Affinity

Whether the film's genre + mood combination matches the patterns in your top-rated films.

Combined attribute signal

Feature Interactions

How cross-products of director, genre, decade, and other attributes line up with your taste.

Your taste cluster

K-Means Clustering

Which of seven taste clusters your ratings put you in, and how the film fits that cluster.

Similar-taste audience

Collaborative Filtering

How viewers with similar feature vectors to yours have rated this film.

Review sentiment

Sentiment Analysis

How positively the film's reviews score on average, weighted by your sensitivity to critics.

Franchise pattern

Franchise Awareness

Whether you tend to rate films in the same franchise consistently.

Plot and keyword match

TF-IDF Text Similarity

How the film's overview and TMDB keywords overlap with films you've loved.

Screenplay similarity

Script Direction (LSA)

Latent screenplay-direction similarity to a 224-screenplay reference corpus.
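
To make one of the components above concrete, here is a sketch of the cosine-similarity signal. It assumes each film is an equal-length numeric feature vector and that "the average of films you've rated highly" is an element-wise mean; how palette actually builds its features or picks the rating threshold is not stated here, so `tasteSimilarity` and its inputs are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Element-wise mean of a non-empty list of equal-length vectors.
function centroid(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error("need at least one vector");
  const sum = new Array<number>(vectors[0].length).fill(0);
  for (const v of vectors) {
    v.forEach((x, i) => {
      sum[i] += x;
    });
  }
  return sum.map((x) => x / vectors.length);
}

// Score a candidate film against the average of the highly rated films.
function tasteSimilarity(candidate: number[], likedFilms: number[][]): number {
  return cosineSimilarity(candidate, centroid(likedFilms));
}
```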

§2 Your weights (sample)

Here's what mine looks like.

This is the founder's profile — a year of Letterboxd ratings. Your model will land on a different shape.

Similar-taste audience · Collaborative Filtering · 43.3%
Combined attribute signal · Feature Interactions · 26.2%
Screenplay similarity · Script Direction (LSA) · 17.9%
Review sentiment · Sentiment Analysis · 8.2%
Plot and keyword match · TF-IDF Text Similarity · 2.0%
Ensemble similarity · Cosine Similarity · < 0.5%
Film-pair correlations · Pairwise Film Similarity · < 0.5%
Genre and mood match · Genre-Mood Affinity · < 0.5%
Your taste cluster · K-Means Clustering · < 0.5%
Franchise pattern · Franchise Awareness · < 0.5%

Floor-pinned (5)

These components are at the weight floor — they haven’t earned weight in your model yet. We don’t hide them, because hiding them would lie about how the model works.

Ensemble similarity: sparse at your current rating count — gets stronger as you rate more films.
Film-pair correlations: sparse at your current rating count — gets stronger as you rate more films.
Genre and mood match: your rated films don't yet show a strong genre-and-mood pattern this signal can lean on.
Your taste cluster: your ratings sit on a cluster boundary — the assignment isn't strong enough yet to weight.
Franchise pattern: this profile doesn't have enough multi-film franchises rated for the pattern to stabilize.
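
One way to picture a weight floor, as a sketch only: pin any component that would fall below the floor to exactly the floor, then share the remaining mass among the others in proportion to their raw weights. The floor value and the function name `applyWeightFloor` are assumptions; the page only says that floor-pinned components display as < 0.5%. Raw weights are assumed non-negative.

```typescript
// Pin any weight that would fall below `floor` to exactly `floor`, then share
// the remaining mass among the other components in proportion to raw weight.
function applyWeightFloor(
  raw: Record<string, number>,
  floor: number
): Record<string, number> {
  const names = Object.keys(raw);
  const total = names.reduce((s, n) => s + raw[n], 0);
  if (total <= 0) throw new Error("raw weights must have a positive sum");
  if (floor < 0 || floor * names.length > 1) throw new Error("floor out of range");

  const pinned = new Set<string>();
  // Repeat until no unpinned weight lands below the floor after rescaling.
  for (;;) {
    const free = names.filter((n) => !pinned.has(n));
    const freeSum = free.reduce((s, n) => s + raw[n], 0);
    const freeMass = 1 - floor * pinned.size;
    const scaled = (n: string) => (raw[n] / freeSum) * freeMass;
    const below = free.filter((n) => scaled(n) < floor);
    if (below.length === 0) {
      const out: Record<string, number> = {};
      names.forEach((n) => {
        out[n] = pinned.has(n) ? floor : scaled(n);
      });
      return out;
    }
    below.forEach((n) => pinned.add(n));
  }
}
```

The loop matters because rescaling the unpinned weights can push a borderline component under the floor, which then has to be pinned as well.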

Your weights, your model

When you upload your Letterboxd export, the engine reweights itself against your ratings. Some components will climb; others will sit at the floor.

§3 What shaped this (sample)

Where the model picked up its strongest signals on this profile.

Top directors and genres ranked by their per-film average vs your overall average. Anti-preferences are where you rate noticeably below your own baseline.

Top 5 directors

vs avg 3.61★

Francis Ford Coppola · 2 films · 5.00★ · +1.39
Damien Chazelle · 2 films · 4.75★ · +1.14
Federico Fellini · 6 films · 4.42★ · +0.81
Martin Scorsese · 4 films · 4.25★ · +0.64
Andrei Tarkovsky · 2 films · 4.25★ · +0.64

Top 5 genres

vs avg 3.61★

Drama · 15 films · 4.83★ · +1.22
Comedy · 15 films · 4.27★ · +0.66
Crime · 15 films · 4.13★ · +0.52
Romance · 15 films · 3.93★ · +0.32
Thriller · 15 films · 3.93★ · +0.32

3 anti-preferences

below your average

playful · 5 films · mood · 3.00★ · -0.61
colorful · 6 films · mood · 3.08★ · -0.53
atmospheric · 5 films · mood · 3.10★ · -0.51
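
The deltas in this section are each bucket's average rating minus the overall average. A small sketch of that arithmetic follows; the `bucketDeltas` helper, its generic film shape, and the `minFilms` cutoff of 2 are assumptions made for illustration.

```typescript
interface BucketStat {
  key: string;
  films: number;
  mean: number;
  delta: number; // bucket mean minus the overall mean
}

// Group films by whatever keys `keysOf` returns (directors, genres, moods),
// then rank the buckets by how far their mean sits from the overall mean.
function bucketDeltas<T extends { rating: number }>(
  films: T[],
  keysOf: (film: T) => string[],
  minFilms = 2 // assumption: skip buckets with a single film
): BucketStat[] {
  if (films.length === 0) return [];
  const overall = films.reduce((s, f) => s + f.rating, 0) / films.length;
  const totals = new Map<string, { sum: number; count: number }>();
  for (const film of films) {
    for (const key of keysOf(film)) {
      const t = totals.get(key) ?? { sum: 0, count: 0 };
      t.sum += film.rating;
      t.count += 1;
      totals.set(key, t);
    }
  }
  const stats: BucketStat[] = [];
  totals.forEach((t, key) => {
    if (t.count < minFilms) return;
    const mean = t.sum / t.count;
    stats.push({ key, films: t.count, mean, delta: mean - overall });
  });
  return stats.sort((a, b) => b.delta - a.delta);
}
```

With an overall average of 3.61★, a director whose two films average 5.00★ gets a delta of +1.39, which matches the first director row in the sample. Anti-preferences are the same list read from the bottom.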

Your top weights

These shift with every resync. Add 50 ratings and a director who didn't make this profile's top 5 might lead yours.

§4 Where the model fails (sample)

The honest gaps on this profile.

Vulnerability is the differentiator. This section shows the five biggest misses (predicted vs. actual) and the dimensions where the model has too few data points to be confident.

Worst 5 predictions

MAE 0.485
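
MAE is the mean absolute error: the average gap, in stars, between predicted and actual ratings. The sketch below computes it and picks out the largest misses; the `RatedPrediction` shape and function names are illustrative.

```typescript
interface RatedPrediction {
  title: string;
  predicted: number;
  actual: number;
}

// Average absolute gap between predicted and actual star ratings.
function meanAbsoluteError(rows: RatedPrediction[]): number {
  if (rows.length === 0) throw new Error("no predictions to score");
  const total = rows.reduce((s, r) => s + Math.abs(r.predicted - r.actual), 0);
  return total / rows.length;
}

// The n predictions with the largest absolute error, worst first.
function worstMisses(rows: RatedPrediction[], n = 5): RatedPrediction[] {
  const gap = (r: RatedPrediction) => Math.abs(r.predicted - r.actual);
  return rows.slice().sort((a, b) => gap(b) - gap(a)).slice(0, n);
}
```

An MAE of 0.485 therefore means the sample profile's predictions land about half a star off on average.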

Low-confidence dimensions

sparse buckets (≤ 3 films)

Genres

Western·2
TV Movie·2
Documentary·2

Countries

Canada·1
Belgium·1
Australia·1
Hong Kong·1
China·2

Languages

de·1
fr·2
ru·2
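
Sparse buckets can be found with a simple count. This sketch assumes each film contributes a list of distinct labels for one dimension (genres, countries, or languages) and uses the ≤ 3 films threshold stated above; `sparseBuckets` is a hypothetical helper name.

```typescript
// Count how many films carry each label and keep the labels seen on
// `maxFilms` films or fewer, sparsest first.
function sparseBuckets(
  labelsPerFilm: string[][],
  maxFilms = 3
): Array<{ label: string; films: number }> {
  const counts = new Map<string, number>();
  for (const labels of labelsPerFilm) {
    for (const label of labels) {
      counts.set(label, (counts.get(label) ?? 0) + 1);
    }
  }
  const sparse: Array<{ label: string; films: number }> = [];
  counts.forEach((films, label) => {
    if (films <= maxFilms) sparse.push({ label, films });
  });
  return sparse.sort((a, b) => a.films - b.films);
}
```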

Where your model will be honest

Every taste profile has gaps. We surface them — your worst predictions, your sparsest dimensions — instead of hiding them behind a single accuracy number.

§5 How it updates

Every resync retrains.

Your model isn’t static. Every time you re-upload your Letterboxd export, the engine reruns the full ensemble against your latest ratings.

The component weights you saw in §2 are not fixed coefficients — they’re learned each time, by testing how well each component’s predictions match the films you’ve actually rated. Add 50 more ratings, and your top weight could shift. A director you’ve seen twice and rated 3.5★ will mean less; the director you’ve seen ten times and consistently rate 4.5★ will pull harder.
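
The page does not spell out the learning procedure, so the following is a stand-in rather than palette's method: inverse-error weighting, where each component's weight is proportional to one over its mean absolute error on the films you have rated. It illustrates the idea that components whose predictions match your ratings earn more weight. All names here are hypothetical.

```typescript
// Stand-in technique (not confirmed by the source): weight each component by
// the inverse of its mean absolute error against the actual ratings.
function inverseErrorWeights(
  componentPredictions: Record<string, number[]>,
  actual: number[]
): Record<string, number> {
  if (actual.length === 0) throw new Error("need at least one rating");
  const epsilon = 1e-6; // avoids division by zero for a perfect component
  const inverse: Record<string, number> = {};
  let total = 0;
  for (const name of Object.keys(componentPredictions)) {
    const preds = componentPredictions[name];
    if (preds.length !== actual.length) {
      throw new Error(`prediction count mismatch for ${name}`);
    }
    const mae =
      preds.reduce((s, p, i) => s + Math.abs(p - actual[i]), 0) / actual.length;
    inverse[name] = 1 / (mae + epsilon);
    total += inverse[name];
  }
  // Normalize so the weights sum to 1.
  const weights: Record<string, number> = {};
  for (const name of Object.keys(inverse)) {
    weights[name] = inverse[name] / total;
  }
  return weights;
}
```

Because the weights are recomputed from the current ratings each time, adding ratings changes every component's error and can reorder the weights.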

Floor-pinned components stay pinned until they have enough signal to lift off the floor. That happens through three paths: more ratings from you (densification), more films in the shared catalog (the corpus that drives collaborative filtering), or new enrichment fields shipping (sentiment is the next one — pending).

The model version below your dashboard’s last-sync timestamp tells you which engine produced your current scores. We bump it when the underlying ensemble changes.

Using the picker

Filters group like films together, and the picker tells you when you've cut too narrow.

Once the model has scored every film, you spin against the pool from /picker. The filters and the spin behavior are the layer that turns predictions into a watchable answer for tonight.

Genres and moods, grouped

The Genre and Mood dropdowns now show 11 and 8 broad buckets respectively, down from 19 TMDB labels and 25 mood tokens. Crime, Thriller, and Mystery cluster as “Crime & Thriller.” The 25 mood tokens collapse into Dramatic, Heartfelt, Funny, Exciting, Dark, Tense, Dreamy, Cerebral. The underlying TMDB data is unchanged — only the picker's presentation is grouped, so multi-filter combinations have meaningful depth at this catalog size.
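
A presentation-only grouping like this can be a lookup table applied at display time. The sketch below includes only the one mapping stated above (Crime, Thriller, and Mystery into “Crime & Thriller”); letting every other genre pass through under its own label is an assumption, as are the names `GENRE_BUCKETS` and `filmBuckets`.

```typescript
// Only the mapping stated in the text is filled in; the rest of the real
// table is not given, so unmapped genres pass through unchanged (assumption).
const GENRE_BUCKETS = new Map<string, string>([
  ["Crime", "Crime & Thriller"],
  ["Thriller", "Crime & Thriller"],
  ["Mystery", "Crime & Thriller"],
]);

function genreBucket(tmdbGenre: string): string {
  return GENRE_BUCKETS.get(tmdbGenre) ?? tmdbGenre;
}

// Distinct display buckets for one film. The film's TMDB genres are read,
// never rewritten, so the underlying data stays unchanged.
function filmBuckets(tmdbGenres: string[]): string[] {
  return Array.from(new Set(tmdbGenres.map(genreBucket)));
}

// A film matches a bucket filter if any of its genres falls in that bucket.
function matchesBucket(tmdbGenres: string[], bucket: string): boolean {
  return tmdbGenres.some((g) => genreBucket(g) === bucket);
}
```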

Tells you which filter is too narrow

When the pool drops below 10 films and at least two filters are active, the picker runs a counterfactual on each active filter, identifies the most-restrictive one, and suggests dropping that specific filter to expand the pool. The Spin button hard-disables below four films — too few to draw a meaningful set.
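
The counterfactual check can be sketched as follows, using the thresholds stated above (suggest below 10 films with at least two active filters, disable Spin below four). Treating "most restrictive" as the filter whose removal yields the largest pool is an assumption, and the types and function names are illustrative rather than the actual pool.ts API.

```typescript
type FilmFilter<F> = (film: F) => boolean;

interface DropSuggestion {
  dropFilter: string;
  poolIfDropped: number;
}

const SUGGEST_BELOW = 10; // from the text: suggest when the pool is below 10
const MIN_SPIN_POOL = 4; // from the text: Spin is disabled below four films

// Apply every active filter except the one named in `skip`.
function applyFilters<F>(
  films: F[],
  filters: Record<string, FilmFilter<F>>,
  skip?: string
): F[] {
  const active = Object.keys(filters).filter((name) => name !== skip);
  return films.filter((film) => active.every((name) => filters[name](film)));
}

// Re-run the pool once per active filter with that filter removed, and
// suggest dropping the one whose removal gives the largest pool.
function suggestFilterToDrop<F>(
  films: F[],
  filters: Record<string, FilmFilter<F>>
): DropSuggestion | null {
  const names = Object.keys(filters);
  const pool = applyFilters(films, filters);
  if (pool.length >= SUGGEST_BELOW || names.length < 2) return null;
  let best: DropSuggestion | null = null;
  for (const name of names) {
    const size = applyFilters(films, filters, name).length;
    if (best === null || size > best.poolIfDropped) {
      best = { dropFilter: name, poolIfDropped: size };
    }
  }
  return best;
}

function canSpin(poolSize: number): boolean {
  return poolSize >= MIN_SPIN_POOL;
}
```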

Switching Source no longer leaves filters orphaned

If you pick a director in All films, then switch to Watchlist where that director has no films, the picker auto-clears the orphan selection with a brief notice rather than leaving you stuck at a zero-film pool. Same for Genre, Mood, and Country.
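
A sketch of the auto-clear, assuming one selected value per facet and that a selection is orphaned when no film in the new source carries it. The `PickerFilm` shape, the facet names, and `clearOrphans` are hypothetical.

```typescript
interface PickerFilm {
  directors: string[];
  genres: string[];
  moods: string[];
  countries: string[];
}

type Facet = "directors" | "genres" | "moods" | "countries";
type FacetSelection = Partial<Record<Facet, string>>;

// Keep each selected value only if at least one film in the new source has it.
// `cleared` lists the facets that were dropped, so the UI can show a notice.
function clearOrphans(
  selection: FacetSelection,
  sourceFilms: PickerFilm[]
): { selection: FacetSelection; cleared: Facet[] } {
  const facets: Facet[] = ["directors", "genres", "moods", "countries"];
  const kept: FacetSelection = {};
  const cleared: Facet[] = [];
  for (const facet of facets) {
    const value = selection[facet];
    if (value === undefined) continue;
    const stillPresent = sourceFilms.some(
      (film) => film[facet].indexOf(value) !== -1
    );
    if (stillPresent) {
      kept[facet] = value;
    } else {
      cleared.push(facet);
    }
  }
  return { selection: kept, cleared };
}
```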

Watched-but-unrated films stay excluded

Films you marked watched on Letterboxd without rating or diary-entering them used to leak into the “New to me” bucket as ghost recommendations. The picker now reads the full watched.csv from your export so those films are correctly excluded from the candidate pool, like the rated ones already were.
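
The exclusion amounts to a set difference: candidates minus everything rated or watched. This sketch assumes films can be matched on a normalized title plus year; the key format and the names `filmKey` and `excludeSeen` are illustrative, and CSV parsing is left out.

```typescript
interface FilmRef {
  title: string;
  year: number;
}

// Hypothetical matching key: case-insensitive title plus release year.
function filmKey(film: FilmRef): string {
  return `${film.title.trim().toLowerCase()}|${film.year}`;
}

// Drop every candidate that appears in either the rated list or the
// watched list, so watched-but-unrated films no longer slip through.
function excludeSeen(
  candidates: FilmRef[],
  rated: FilmRef[],
  watched: FilmRef[]
): FilmRef[] {
  const seen = new Set<string>();
  rated.concat(watched).forEach((film) => seen.add(filmKey(film)));
  return candidates.filter((film) => !seen.has(filmKey(film)));
}
```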

These changes shipped 2026-05-10. The 31-test pool.ts regression suite covers the merge invariants, including the integration smoke test that catches the asymmetric-predicate bug class.