Recommended models

Which AI should write your notes? We tested it.

We ran 14 of the world's newest AI models on 12 real university lectures (hard STEM and dense humanities) and scored every set of notes against hand-built answer keys. From free models to 15¢-a-lecture flagships, every single one passed our hardest accuracy test. So we picked based on what actually sets them apart. Here's the whole story.

0
fabricated facts across all 14 models, on our hardest accuracy test

Just want the answer?

  • Start with the built-in option. Free, private, zero setup. Upgrade only if you want more.
  • Using a cloud model? Our recommendation is MiniMax M3: it tied the priciest flagships on detail and accuracy for under a penny a lecture.
  • Got 16 GB+ of RAM and like tinkering? Run a local model; our tested picks are below.

Never make something up.

Notes matter most when the material is brand new, and that's exactly when a wrong fact is hardest to catch. Nobody can spot an error in something they're learning for the first time; that's just what learning is. A confidently-wrong note gets studied and trusted, which makes it worse than a note that's thin. So our number-one rule, before anything else, is: never make something up.

That's why our testing isn't about which AI sounds smartest. It's about which one we can trust not to invent a number, misattribute a quote, or botch a calculation; then, among the trustworthy ones, which keeps the most useful detail.

Accuracy Depth of detail Clean structure Brevity

Three ways to run it

All three produce the same kind of notes; they differ in setup, privacy, and ceiling.

Built-in Default

The option LectureSync ships with. Runs on your Mac using Apple's built-in models. Zero setup.

Free, foreverNothing leaves your MacGood notes for most classes

Local power-up Measured

Run a custom open model on your own Mac with Ollama, LM Studio, or oMLX, and point LectureSync at it.

Free, still 100% on your MacNeeds 8–32 GB RAMBacked by our 1,755-run bake-off

Cloud Measured

Use a hosted model through a provider like OpenRouter. The highest quality ceiling, for pennies per lecture.

Best-quality notesCosts cents, not dollarsBacked by our 14-model bake-off

How we tested

We ran the exact note-taking instructions LectureSync uses on 12 real university lectures, a deliberately broad mix: MIT linear algebra, MIT computer science, MIT economics, plus Yale philosophy, biology, and history. Hard STEM and dense humanities.

Then we scored every set of notes automatically against a hand-built answer key, not by asking another AI for its opinion, but by checking whether specific facts, named figures, quoted lines, and worked calculations from each lecture actually made it into the notes.

The toughest test, our "fabrication gate": one linear-algebra lecture works out the inverse of a matrix. The right answer is a specific grid of numbers. A model that guesses it and gets it wrong has done the one thing we can't allow. Smaller, cheaper AI models are known to fail this. Every cloud model we tested here got it right.

Faithfulnessno invented facts, ever
Detailworked examples kept
Coveragekey points across any subject
On-topicno class-admin clutter

The results

All 14 were accurate. What separated them was detail, speed, and cost. Click a column to sort.

Faithful
MiniMax M3 app pickMiniMaxMay 31 ’260.65¢89s 0 errors
93%
100%Value
Claude Opus 4.8AnthropicMay 27 ’2613.4¢35s 0 errors
93%
100%SOTA
GPT-5.5OpenAIApr 24 ’2615.3¢63s 0 errors
90%
98%SOTA
MiMo v2.5 Proapp pickXiaomiApr 22 ’260.55¢51s 0 errors
88%
93%Budget
DeepSeek V4 Flashapp pickDeepSeekApr 24 ’260.2¢53s 0 errors
86%
100%Budget
Claude Haiku 4.5AnthropicOct 15 ’252.08¢22s 0 errors
85%
100%Mid
Gemini 3.5 FlashGoogleMay 19 ’265.8¢24s 0 errors
85%
93%Mid
MiMo v2.5app pickXiaomiApr 22 ’260.18¢24s 0 errors
83%
95%Budget
Owl Alpha(stealth)Apr 28 ’26Free45s 0 errors
81%
95%Free
Nemotron 3 Superapp pickNVIDIAMar 11 ’260.21¢53s 0 errors
81%
90%Budget
Grok 4.3xAIApr 30 ’261.45¢9s 0 errors
79%
90%Mid
DeepSeek V4 ProDeepSeekApr 24 ’260.96¢77s 0 errors
76%
98%Budget
Gemini 3.1 Flash LiteGoogleMay 7 ’260.34¢5s 0 errors
64%
88%Lite
gpt-oss-120bOpenAIAug 5 ’25Free42s 0 errors
61%
80%Free

✓ Faithful: every model, no exceptions. That column being boring is the headline. Tinted rows marked app pick are the models we recommend inside LectureSync. Detail = % of worked examples & specific numbers that survived into the notes. Coverage = key points captured on unfamiliar (humanities) subjects. Cost = our measured spend per ~1-hour lecture.

Price does not predict quality.

Every dot is a model. The best dots sit at the cheap end, and the expensive flagships are no higher up.

x-axis is logarithmic. ⭐ = our value pick. Free models shown in the free band, left.

What we learned

Accuracy is (finally) a solved problem at the top.

Two years ago, AI models routinely invented facts. Today, every frontier cloud model we tested (including completely free ones) got our hardest math test right, every time. That's the floor now. It's why we can offer this at all.

Price does not predict quality.

Our value pick (MiniMax M3, under a penny a lecture) tied the 15¢ flagships on detail and accuracy. A free model (Owl Alpha) beat several paid ones. Paying 20× more bought polish, not trustworthiness.

The real difference is how much detail survives.

Since they're all accurate, we chose on depth: does the note keep the worked example, the exact number, the named theorist? The spread was real. The best kept ~93% of the specifics, the thinnest barely 60%. For a student studying for an exam, that detail is the whole point.

Why we recommend what we recommend

With accuracy a given, it came down to three things, in order: detail, cost, and reliability at scale. Here's how that shook out.

⭐ Our cloud recommendation

MiniMax M3

minimax/minimax-m3 · 0.65¢/lecture · 93% detail

It tied the most expensive flagships on the planet for detail and accuracy, while costing under a penny per lecture. Nothing else matched that combination of richness and price.

The budget champion

DeepSeek V4 Flash

deepseek/deepseek-v4-flash · 0.20¢/lecture · 86% detail

Almost as detailed, even cheaper, and the cleanest at ignoring class-admin clutter. A great fallback.

The free surprise

Owl Alpha

openrouter/owl-alpha · free · 81% detail

A brand-new, no-cost model that out-performed several paid options. Proof that good note-taking no longer requires a big budget.

And the famous names? Excellent, just more than you need.

The flagships (GPT-5.5, Claude Opus 4.8): superb models, and on this task they were matched, on both trust and detail, by options costing a twentieth as much. If you already use one, it'll serve you beautifully; you simply don't need flagship prices for great lecture notes.

The "lite" and smallest free models (Gemini 3.1 Flash Lite, gpt-oss-120b): genuinely fast and cheap, but they dropped too many of the worked examples and specific numbers a student actually needs before an exam.

Style outliers: some models leaned wordy (a 10,000-word "summary"), others too skeletal. We favor the ones that hit the right level of detail without a fight.

How current is this?

Our top value pick, MiniMax M3, was four days old when we tested it. Most of the field shipped within the six weeks before our test (plus two 2025 veterans for reference). We re-run this evaluation as new models land; the leaderboard above reflects June 4, 2026.

Prefer local? Picks by your Mac's RAM Measured · 1,755 runs

Before the cloud study, we ran the same kind of bake-off on 13 local models: 1,755 scored generations on real MIT lectures, running entirely on-device. When Google shipped QAT checkpoints of the Gemma-4 family, we re-ran the accuracy tests with the same method; the picks below come from that re-test.

8 GBMac RAM
Gemma-4-E2B QAT  Q4_K_XL · 2.6 GB
The QAT checkpoint matches the old 8-bit E2B's trustworthiness (0.3% fabrication) at roughly half the size, which frees about 2 GB on the smallest Macs.
Measured
16 GBMac RAM
Gemma-4-E2B QAT  Q4_K_XL · 2.6 GB
Still the pick: nothing bigger earned its keep here, and the extra memory goes to long-lecture context instead of model weights.
Measured
24 GBMac RAM
Gemma-4-26B-A4B QAT  Q4_K_XL · MoE · 14.3 GB
QAT shrank the big mixture-of-experts from 16.9 to 14.3 GB, so it now fits this tier with room for long-lecture context. Zero fabrications in our re-test.
Measured
32 GBMac RAM
Gemma-4-26B-A4B QAT  Q4_K_XL · MoE · 14.3 GB
Same pick, more headroom: the model runs comfortably alongside everything else you keep open.
Measured

The surprise of the original study held up in the re-test: the smallest model is the most trustworthy. E2B's QAT checkpoint invented a false statement just 0.3% of the time at half its old size, and the 26B mixture-of-experts came back with zero. That's why E2B is the pick on every Mac under 24 GB, not the budget fallback.

New to this? OpenRouter in four steps

OpenRouter is one account that gives you access to almost every AI model out there. Instead of signing up with five different companies, you sign up once and pick models like items on a menu. That's why it's the easiest on-ramp (and what we used for this entire test).

Create an account at openrouter.ai

Sign up like any website. Add a few dollars of credit; at under a penny a lecture, that lasts a semester.

Make an API key

In OpenRouter's settings, create a key. It's a long string starting with sk-or-…. Treat it like a password and don't share it.

Paste it into LectureSync

Settings → Connections → add OpenRouter and paste your key. It's stored in your Mac's Keychain; LectureSync never sends it anywhere except OpenRouter itself.

Pick your models

Settings → Defaults: for notes, our recommendation is minimax/minimax-m3. For transcription, openai/whisper-large-v3-turbo is the standard pick (this bake-off covered the notes step; transcription wasn't part of it).

LectureSync Connections settings showing on-device options alongside an OpenRouter connection with a saved, masked API key.

Step 3: your key, saved in Settings → Connections.

LectureSync's OpenRouter model picker showing 'Top picks for notes' with MiniMax M3 marked as Top pick, plus DeepSeek, Owl Alpha, and MiMo options, searchable across 346 models.

Step 4: pick your notes model, right inside the app.

Questions people actually ask

Does an AI write my notes?
Yes. By default everything runs on your Mac; if you choose a cloud model, your transcript goes to the provider you picked. Either way, we test obsessively to make sure the notes stick to your lecture and never invent anything.
Could it still get something wrong?
Our hardest test caught zero fabricated facts across 14 models. No system is perfect, which is why our notes stay faithful to what was actually said and flag uncertainty rather than guess.
Why not just use the biggest, most famous model?
We tested them (GPT-5.5, Claude Opus 4.8). They're excellent, and they were matched, on accuracy and detail, by a model costing a twentieth as much. We chose value without compromise.
Do you use free models?
We test them. One free model placed mid-pack and is genuinely good; others were too thin. We pick on results, not price tags.

The honest part

What "Measured" means here: both the local and cloud picks come from our own bake-offs, scored automatically against hand-built answer keys and ranked on faithfulness first. We don't ask another AI for its opinion of the notes.

Privacy trade-off, plainly: with the built-in or local options, your audio and notes never leave your Mac. With any cloud model, your lecture transcript is sent to the provider you picked, under their terms. Details in our privacy policy.

Models change fast. New models drop monthly and we re-test. If a pick on this page changes, it's because we measured something better, not because of a sponsorship. Nobody pays to be recommended here.

METHODOLOGY · Cloud: tested 2026-06-04 on 12 university lectures via OpenRouter, using LectureSync's production note-taking prompt (single pass, low temperature). Notes scored automatically against a curated, hand-built answer key per lecture: faithfulness (fabricated facts), detail (worked examples & specific quantities retained), coverage (key points on unfamiliar subjects), and on-topic discipline. One run per model per lecture: directional, not a statistical study. Costs are real OpenRouter charges. Local: 1,755-run multi-seed study on 13 GGUF models, 9 MIT OCW lectures.

Last updated: June 4, 2026 · Local: 1,755-run bake-off · Cloud: 14-model bake-off (2026-06-04)