How AI Scores Financial News: Inside ChartPilot's News Radar

A look under the hood at how AI reads, classifies and scores financial news headlines into a bullish-or-bearish verdict, with confidence and reasons.

If you've ever read a Bloomberg headline and instantly thought "that's bad for gold", you've done sentiment analysis. ChartPilot's News Radar does the same thing, at machine scale and with consistent rules. This article walks through how it works.

The pipeline in one diagram

You pick a symbol
   ↓
News Radar fetches ~12 fresh headlines
   ↓
All headlines go to GPT-4o-mini in one strict prompt
   ↓
AI returns: per-headline label (+ reason) + overall verdict
   ↓
We compute net score, you see the dashboard

Five stages. The interesting work happens in stages 2 and 3.

Stage 1 — Symbol classification

Before fetching news, the system has to know what kind of asset it's dealing with. The news source for a stock is different from a forex pair, which is different from a commodity.

ChartPilot classifies your input symbol into:

  • Stock (AAPL, NVDA, TSLA) → company-news endpoint
  • Forex / metal (EURUSD, XAUUSD) → broader forex category news
  • Crypto (BTCUSDT, ETHUSDT) → crypto category news

This matters because asking for "Apple news" returns earnings, product launches and analyst notes. Asking for "EUR/USD news" returns ECB statements, German CPI, US labor data — anything that moves the pair.
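
To make the routing concrete, here is a minimal sketch of a symbol classifier. It is an illustration only, not ChartPilot's actual code: the `AssetClass` enum, the `FIAT_AND_METALS` and `CRYPTO_QUOTES` lookup sets, and the matching rules are all assumptions.

```python
from enum import Enum


class AssetClass(Enum):
    STOCK = "stock"    # routed to the company-news endpoint
    FOREX = "forex"    # routed to forex category news (includes metals)
    CRYPTO = "crypto"  # routed to crypto category news


# Hypothetical lookup sets; a real classifier would need fuller lists.
FIAT_AND_METALS = {"USD", "EUR", "GBP", "JPY", "CHF", "AUD", "CAD", "NZD", "XAU", "XAG"}
CRYPTO_QUOTES = ("USDT", "USDC")


def classify_symbol(symbol: str) -> AssetClass:
    """Guess the asset class from the shape of the ticker."""
    s = symbol.upper().replace("/", "")
    # Six letters made of two known currency/metal codes: treat as forex.
    if len(s) == 6 and s[:3] in FIAT_AND_METALS and s[3:] in FIAT_AND_METALS:
        return AssetClass.FOREX
    # A base asset followed by a stablecoin quote: treat as crypto.
    if len(s) > 4 and s.endswith(CRYPTO_QUOTES):
        return AssetClass.CRYPTO
    # Everything else falls back to stock.
    return AssetClass.STOCK
```

With these assumed rules, `classify_symbol("XAUUSD")` returns `AssetClass.FOREX` and `classify_symbol("BTCUSDT")` returns `AssetClass.CRYPTO`. A pattern-based fallback like this would mislabel tickers such as `BTCUSD`, so a production version would likely lean on an explicit symbol list.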

Stage 2 — The fetch

Twelve headlines is the sweet spot. Fewer and the verdict is fragile (one outlier can dominate). More and the cost balloons without adding signal — most news outlets cover the same five stories anyway.

Items are sorted newest first. Anything older than 14 days is dropped for stocks, with shorter windows for forex/crypto since those markets move faster.
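
The selection step could look roughly like the sketch below. The 12-item cap and the 14-day stock window come from the text above; the `Headline` type, the function name, and the forex and crypto windows are placeholder assumptions, since the article only says those windows are shorter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Headline:
    title: str
    published: datetime


MAX_HEADLINES = 12
# 14 days for stocks is from the article; the forex and crypto values
# are illustrative guesses, not documented settings.
MAX_AGE_DAYS = {"stock": 14, "forex": 3, "crypto": 2}


def select_headlines(items, asset_class, now=None):
    """Drop stale items, sort newest first, keep at most MAX_HEADLINES."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=MAX_AGE_DAYS[asset_class])
    fresh = [h for h in items if h.published >= cutoff]
    newest_first = sorted(fresh, key=lambda h: h.published, reverse=True)
    return newest_first[:MAX_HEADLINES]
```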

Stage 3 — The AI classification (this is the interesting part)

Each fetched item is bundled into a single prompt with a clear schema. Here's the abridged version:

"You are a financial news sentiment analyst for {symbol}. For each headline, return one of: STRONG_BULL, BULL, NEUTRAL, BEAR, STRONG_BEAR, based on its likely impact on {symbol}. Also produce an overall verdict: BULLISH, BEARISH, NEUTRAL_BULLISH, NEUTRAL_BEARISH, NEUTRAL or MIXED, with a confidence 0–100 and 3–5 driver bullets + 3–5 things to watch."

Why this design works:

The labels are explicit. Instead of asking "what's the sentiment?" — which produces vague essays — you force a multiple-choice answer. Five buckets, no escape hatches.

Asset-specificity is baked in. "Strong USD news" is bearish for gold but neutral for AAPL. By telling the model what the symbol is, you get reasoning that's anchored to that asset.

Reasoning is required per headline. The model has to justify each label with a short reason. This solves two problems: you can audit any classification, and asking the model to explain makes it more careful with the label itself.
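
As a rough illustration of why a closed label set helps, the sketch below builds a prompt along these lines and validates the reply before anything downstream trusts it. The JSON field names (`headlines`, `label`, `reason`, `verdict`, `confidence`) and both function names are assumptions for this example. The article does not document the real response format, and the actual model call is left out.

```python
import json

HEADLINE_LABELS = {"STRONG_BULL", "BULL", "NEUTRAL", "BEAR", "STRONG_BEAR"}
VERDICTS = {"BULLISH", "BEARISH", "NEUTRAL_BULLISH", "NEUTRAL_BEARISH", "NEUTRAL", "MIXED"}


def build_prompt(symbol: str, headlines: list[str]) -> str:
    """Bundle all headlines into one prompt anchored to the symbol."""
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(headlines, 1))
    return (
        f"You are a financial news sentiment analyst for {symbol}. "
        f"Label each headline as one of {sorted(HEADLINE_LABELS)} based on its "
        f"likely impact on {symbol}, with a short reason. Then give an overall "
        f"verdict from {sorted(VERDICTS)} and a confidence from 0 to 100. "
        f"Reply as JSON.\n\nHeadlines:\n{numbered}"
    )


def parse_response(raw: str, n_headlines: int) -> dict:
    """Reject any reply that strays from the schema (assumed field names)."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    items = data.get("headlines", [])
    if len(items) != n_headlines:
        raise ValueError("expected one entry per headline")
    for item in items:
        if item.get("label") not in HEADLINE_LABELS:
            raise ValueError(f"unknown label: {item.get('label')!r}")
        if not str(item.get("reason", "")).strip():
            raise ValueError("every label needs a reason")
    if data.get("verdict") not in VERDICTS:
        raise ValueError(f"unknown verdict: {data.get('verdict')!r}")
    confidence = data.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 100:
        raise ValueError("confidence must be a number between 0 and 100")
    return data
```

Because the labels form a closed set, a reply such as "somewhat positive" fails validation instead of silently skewing the score.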

Stage 4 — Scoring

Once the AI hands back labels, the math is trivial:

| Label | Weight |
|---|---|
| STRONG_BULL | +2 |
| BULL | +1 |
| NEUTRAL | 0 |
| BEAR | −1 |
| STRONG_BEAR | −2 |

Sum the weights to get the net score. A net score of +6 over 12 headlines could be, for example, eight BULL, two BEAR and two NEUTRAL: roughly "two thirds positive, no strong bears." A score of −2 could be ten NEUTRAL and two BEAR, or "mostly neutral with a couple of bears", which is why the verdict can be NEUTRAL_BEARISH instead of full BEARISH.

The verdict label itself comes directly from the AI, not from a score threshold. The score is the audit trail; the verdict is the judgment.
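
The arithmetic fits in a few lines. This sketch uses the weights from the table; the function name `net_score` is just a label for the example. It also shows why the score alone cannot stand in for the verdict: two quite different news flows land on the same number.

```python
WEIGHTS = {
    "STRONG_BULL": 2,
    "BULL": 1,
    "NEUTRAL": 0,
    "BEAR": -1,
    "STRONG_BEAR": -2,
}


def net_score(labels):
    """Sum the per-headline weights into a single net score."""
    return sum(WEIGHTS[label] for label in labels)


# Two different 12-headline mixes with the same net score:
contested = ["BULL"] * 8 + ["BEAR"] * 2 + ["NEUTRAL"] * 2  # 8 - 2 + 0 = 6
quiet = ["BULL"] * 6 + ["NEUTRAL"] * 6                     # 6 + 0 = 6
print(net_score(contested), net_score(quiet))  # prints: 6 6
```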

Stage 5 — Confidence

Confidence (0–100) is the AI's self-assessment of how strong the case is. It's high when high-impact headlines all point the same way. It's low when the news flow is sparse, mixed, or weak.

ChartPilot caps confidence at 60 for MIXED verdicts and at 50 when the news flow is thin. This prevents the system from looking falsely sure when the underlying data is shaky.
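
A minimal sketch of the capping rule follows. The 60 and 50 limits are the ones stated above. The function name and the `thin_threshold` cutoff for what counts as a thin news flow are assumptions, since the article does not define "thin".

```python
def cap_confidence(confidence, verdict, n_headlines, thin_threshold=5):
    """Clamp the model's self-reported confidence to the stated caps.

    thin_threshold is a hypothetical cutoff: fewer headlines than this
    is treated as a thin news flow.
    """
    capped = max(0, min(100, confidence))
    if verdict == "MIXED":
        capped = min(capped, 60)
    if n_headlines < thin_threshold:
        capped = min(capped, 50)
    return capped
```

Under these assumptions, `cap_confidence(90, "MIXED", 12)` gives 60 and `cap_confidence(90, "BULLISH", 3)` gives 50. When both conditions hold, the lower cap wins.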

What we don't do (and why)

A few design choices that surprise people:

No keyword matching. Old-school sentiment used dictionaries: "beat" → +1, "miss" → −1. This fails on sentences like "no longer expected to miss" or "concerns over the beat." Modern LLMs understand sentence structure, so we let them do the reading.
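
To see the failure concretely, here is a toy dictionary scorer of the kind described above. The tiny `LEXICON` and the two headlines are made up for illustration, built around the phrases from the previous paragraph.

```python
import re

# A toy sentiment dictionary, purely for illustration.
LEXICON = {"beat": 1, "beats": 1, "miss": -1, "misses": -1}


def keyword_score(headline: str) -> int:
    """Score a headline by summing dictionary hits, ignoring all context."""
    words = re.findall(r"[a-z]+", headline.lower())
    return sum(LEXICON.get(word, 0) for word in words)


# The negation flips the meaning, but the dictionary only sees "miss".
print(keyword_score("Apple no longer expected to miss guidance"))  # prints: -1
# "Concerns" is the real signal, but the dictionary only sees "beat".
print(keyword_score("Concerns over the beat weigh on shares"))     # prints: 1
```

Both scores point the wrong way, which is the brittleness that motivates handing the reading to a language model.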

No social-media data. Twitter/X chatter is loud but unreliable. We score editorial headlines from financial outlets — Bloomberg, Reuters, CNBC, Yahoo Finance — where the signal-to-noise is higher.

No price-prediction. News Radar tells you what the news says. It doesn't tell you what the price will do. Markets often disagree with the news for hours or days. That's why we always pair it with a chart analysis.

The honest limits

AI sentiment scoring is not magic. The known failure modes:

  • Headline-only reading. We don't read full articles (cost + latency). A headline like "Apple to add AI features" looks neutral, but the article behind it might contain a buried bombshell. We accept this trade-off for speed.

  • Recency bias. The newest 12 items get all the weight. A pivotal headline from 15 days ago doesn't count, even if the market is still pricing it in.

  • Language model errors. GPT-4o-mini misclassifies maybe 1 in 20 headlines. We surface the reasoning so you can spot when it's wrong. If the reasoning looks off, the verdict is less trustworthy.

These are real limits, and we list them openly. The system is useful, not infallible.

Why we chose this approach

The alternatives we considered:

| Approach | Pros | Cons |
|---|---|---|
| Keyword/dictionary | Fast, cheap | Brittle, misses context |
| Sentiment-only LLM | Better than dictionary | Vague output, no audit trail |
| LLM with structured schema | Auditable, asset-specific, fast | Slightly more expensive |
| Vendor sentiment API | Plug-and-play | Black box, not asset-specific |

The structured schema approach gave us the best balance of quality, speed and transparency. It's also why each scan costs only 3 credits — a fraction of a cent at the API level.

Try it once

Theory only goes so far. The fastest way to internalize how this works:

  1. Open News Radar.
  2. Pick a symbol you have an opinion on.
  3. Run the scan, then read the per-headline reasons — not just the verdict.
  4. Decide whether the AI's labels match how you would have labeled the same headlines.

That last step is where you'll learn the most. Either you'll start trusting the system as a second pair of eyes, or you'll spot where its reasoning differs from yours, and that disagreement is itself a valuable signal.

Educational content only. ChartPilot is an educational tool. Nothing in this article constitutes financial or investment advice. Always do your own research before making any trading decisions.