MTG LLM Similar Card Search

Published

April 26, 2025

This is my first time using LLM embeddings for a program. The goal is to get magic cards similar to a card I’m searching for, which I do quite often.

Magic cards are quite similar. For example:

Llanowar Elves

Elvish Mystic

Fyndhorn Elder

I’d like to use an embedding to

  1. Search for a card by name
  2. Find cards similar to that

I used an off-the-shelf model (all-MiniLM-L6-v2), downloaded card data from scryfall, then encoded cards like so:

# format
name|type_line|mana_cost|oracle_text

# example
'Searslicer Goblin'|'Creature — Goblin Warrior'|'{1}{R}'|'Raid — At the beginning of your end step, if you attacked this turn, create a 1/1 red Goblin creature token.'

I used Chat-GPT to scaffold the data processing and encoding scripts.

In other words, the model will turn that string into a vector. Ideally, similar strings (i.e. cards) will be near eachother.

Here I set everything up. I wanted to try using duckdb to query the embeddings.

Code
import pandas as pd
import duckdb
import numpy as np
import json
from pathlib import Path
from sentence_transformers import SentenceTransformer

# Load files
data_dir = Path('../../.data')
embeddings = np.load(data_dir/"embeddings.npy")
with open(data_dir/"id_map.json") as f:
    id_map = json.load(f)

# load model
model = SentenceTransformer("all-MiniLM-L6-v2")

# data prep
emb_dim = embeddings.shape[1]
card_embedding = (
    pd.DataFrame([
        {"id": card["id"], "embedding": vec.tolist()}
        for card, vec in zip(id_map, embeddings)
    ])
)
parquet_file=data_dir / 'oracle-cards-20250425090226.parquet'

# database
db = duckdb.connect(":memory:")
db.execute(f""" 
INSTALL vss;
LOAD vss;

create table card_embedding as 
    select 
        id
        , embedding::DOUBLE[{emb_dim}] as embedding 
    from card_embedding
;

create table card as select * from '{parquet_file}';
""")

# how many cards?
db.sql('select count(*) as num_cards from card inner join card_embedding using(id)')
┌───────────┐
│ num_cards │
│   int64   │
├───────────┤
│     30189 │
└───────────┘

First, I need the ability to find the card I’m looking for in the database. I’ll use the embedding for that.

Forgive any sloppiness. I wanted to focus on results, not code.

Code
def search_for_card(conn, model, search_term, limit=1):
    # sentence to vector
    search_vec = model.encode([search_term])[0]
    emb_dim = len(search_vec)
    search_vec_str = "ARRAY[" + ",".join(f"CAST({x} AS DOUBLE)" for x in search_vec) + "]"

    # find similar vector
    return conn.sql(f"""
    select 
        id             
        , card.name
        , array_distance(embedding, {search_vec_str}::DOUBLE[{emb_dim}]) AS dist
    from card_embedding 
    left join card using(id)
    order by dist
    limit {limit}
    """)

def fetch_top_search_id(conn, model, search_term):
    return search_for_card(conn, model, search_term, limit=1).fetchone()[0]

assert fetch_top_search_id(db, model, 'llanowar elves') == '6a0b230b-d391-4998-a3f7-7b158a0ec2cd'

Now I want to find similar cards given an id.

Code
def query_closest_cards_to_id(conn, id, limit=9):
    ref_vec = (
        conn.sql(f"select embedding from card_embedding where id='{id}'")
        .fetchone()[0]
    )
    emb_dim = len(ref_vec)
    ref_vec_str = "ARRAY[" + ",".join(f"CAST({x} AS DOUBLE)" for x in ref_vec) + "]"

    return conn.sql(f"""
        SELECT 
            card.name
            , card.mana_cost
            , card.oracle_text
            , array_distance(embedding, {ref_vec_str}::DOUBLE[{emb_dim}]) AS dist
        FROM card_embedding
        left join card using(id)
        ORDER BY dist asc
        LIMIT {limit}
    """)

query_closest_cards_to_id(db, '6a0b230b-d391-4998-a3f7-7b158a0ec2cd')
┌───────────────────┬───────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────┐
│       name        │ mana_cost │                                                    oracle_text                                                     │         dist         │
│      varchar      │  varchar  │                                                      varchar                                                       │        double        │
├───────────────────┼───────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────┤
│ Llanowar Elves    │ {G}       │ {T}: Add {G}.                                                                                                      │ 4.58366263816961e-17 │
│ Llanowar Tribe    │ {G}{G}{G} │ {T}: Add {G}{G}{G}.                                                                                                │   0.4155691829580547 │
│ Fyndhorn Elves    │ {G}       │ {T}: Add {G}.                                                                                                      │    0.555573984277023 │
│ Greenweaver Druid │ {2}{G}    │ {T}: Add {G}{G}.                                                                                                   │   0.6204260670261244 │
│ Elvish Mystic     │ {G}       │ {T}: Add {G}.                                                                                                      │    0.624662550035501 │
│ Wirewood Elf      │ {1}{G}    │ {T}: Add {G}.                                                                                                      │   0.6288179733631174 │
│ Llanowar Mentor   │ {G}       │ {G}, {T}, Discard a card: Create a 1/1 green Elf Druid creature token named Llanowar Elves. It has "{T}: Add {G}." │   0.6507426519955598 │
│ Llanowar Dead     │ {B}{G}    │ {T}: Add {B}.                                                                                                      │   0.6539462847478972 │
│ Urborg Elf        │ {1}{G}    │ {T}: Add {B}, {G}, or {U}.                                                                                         │   0.6550689705646989 │
└───────────────────┴───────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────┘

Gluing it all together:

Code
def query_closest_cards_to_search(conn, model, search_term, limit=9):
    id = fetch_top_search_id(conn, model, search_term)
    return query_closest_cards_to_id(conn, id, limit=limit)

query_closest_cards_to_search(db, model, "llanowar elves")
┌───────────────────┬───────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────┐
│       name        │ mana_cost │                                                    oracle_text                                                     │         dist         │
│      varchar      │  varchar  │                                                      varchar                                                       │        double        │
├───────────────────┼───────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────┤
│ Llanowar Elves    │ {G}       │ {T}: Add {G}.                                                                                                      │ 4.58366263816961e-17 │
│ Llanowar Tribe    │ {G}{G}{G} │ {T}: Add {G}{G}{G}.                                                                                                │   0.4155691829580547 │
│ Fyndhorn Elves    │ {G}       │ {T}: Add {G}.                                                                                                      │    0.555573984277023 │
│ Greenweaver Druid │ {2}{G}    │ {T}: Add {G}{G}.                                                                                                   │   0.6204260670261244 │
│ Elvish Mystic     │ {G}       │ {T}: Add {G}.                                                                                                      │    0.624662550035501 │
│ Wirewood Elf      │ {1}{G}    │ {T}: Add {G}.                                                                                                      │   0.6288179733631174 │
│ Llanowar Mentor   │ {G}       │ {G}, {T}, Discard a card: Create a 1/1 green Elf Druid creature token named Llanowar Elves. It has "{T}: Add {G}." │   0.6507426519955598 │
│ Llanowar Dead     │ {B}{G}    │ {T}: Add {B}.                                                                                                      │   0.6539462847478972 │
│ Urborg Elf        │ {1}{G}    │ {T}: Add {B}, {G}, or {U}.                                                                                         │   0.6550689705646989 │
└───────────────────┴───────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────┘

Eyeball Evaluation

This ordering is not perfect. For example:

  • Llanowar Tribe│ {G}{G}{G} │ {T}: Add {G}{G}{G}. is the “top” result
  • Fyndhorn Elves │ {G} │ {T}: Add {G}. and other cards functionally identical to Llanowar Elves are rated lower.

I think this happens because I encode the name of the card in the embedding, which I chose to make search easier.

  • Perhaps I should make two embeddings if my use-case is scoped this narrowly.
  • The tradeoff is that I would need another API to do generalized search e.g. “{T}: Add {G}”

Extending

We can make the query more flexible to account for different contexts. With this we combine the power of traditional querying with vector search.

Code
def sql_array_distance_from_search(conn, model, search_term):
    id = fetch_top_search_id(conn, model, search_term)
    ref_vec = (
        conn.sql(f"select embedding from card_embedding where id='{id}'")
        .fetchone()[0]
    )
    emb_dim = len(ref_vec)
    ref_vec_str = "ARRAY[" + ",".join(f"CAST({x} AS DOUBLE)" for x in ref_vec) + "]"
    return f"array_distance(embedding, {ref_vec_str}::DOUBLE[{emb_dim}])"

db.sql(f""" 
select 
    name
    , oracle_text
    , {sql_array_distance_from_search(db, model, 'when you cycle')} as dist
from card_embedding
left join card using(id)
where 1=1
    and layout = 'normal'
    and cmc = 2
order by dist asc
limit 9
""")
┌────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────────────────┐
│          name          │                                                                   oracle_text                                                                   │         dist          │
│        varchar         │                                                                     varchar                                                                     │        double         │
├────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────┤
│ Drannith Healer        │ Whenever you cycle another card, you gain 1 life.\nCycling {1} ({1}, Discard this card: Draw a card.)                                           │ 4.804520544858501e-17 │
│ Drannith Stinger       │ Whenever you cycle another card, this creature deals 1 damage to each opponent.\nCycling {1} ({1}, Discard this card: Draw a card.)             │    0.7654649610995645 │
│ Witherbloom Apprentice │ Magecraft — Whenever you cast or copy an instant or sorcery spell, each opponent loses 1 life and you gain 1 life.                              │    0.9166975304148206 │
│ Blessed Wine           │ You gain 1 life.\nDraw a card at the beginning of the next turn's upkeep.                                                                       │    0.9202103106272829 │
│ Revitalize             │ You gain 3 life.\nDraw a card.                                                                                                                  │    0.9227281467739591 │
│ Disciple of Law        │ Protection from red\nCycling {2} ({2}, Discard this card: Draw a card.)                                                                         │    0.9279500413320395 │
│ Soul Shepherd          │ {W}, Exile a creature card from your graveyard: You gain 1 life.                                                                                │    0.9443892064122495 │
│ Syndic of Tithes       │ Extort (Whenever you cast a spell, you may pay {W/B}. If you do, each opponent loses 1 life and you gain that much life.)                       │     0.950814787176584 │
│ Samite Herbalist       │ Whenever this creature becomes tapped, you gain 1 life and scry 1. (Look at the top card of your library. You may put that card on the bottom.) │    0.9514616126869583 │
└────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────┘

What’s next?

I think I need better recommendations before I go any further. The results need to be “good enough,” but they’re not quite there. I’ll have to look into what it takes to refine the model for my use-case.

Making an app (probably a simple webapp) would be the most helpful interface. Even 5-10 recommendations would be helpful.