
gahusb cefaeca449 docs(ai_news): Phase 1 implementation plan — articles source (8 tasks)

8-task TDD plan. schema (source column) → articles_source module + 6 tests
→ analyzer (summary) → pipeline swap → telegram mapping line → router →
scraper deprecation → production verification. 6 new unit tests + 4 updated.

Prerequisite spec: docs/superpowers/specs/2026-05-14-ai-news-articles-source-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-14 01:58:27 +09:00


AI News Phase 1 — articles Source Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

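Goal: Switch the ai_news pipeline's data source from the Naver scraper to the existing articles table. Compute news sentiment for the top 100 tickers by market cap using company-name substring matching. Add a news_sentiment.source column to secure a comparison baseline for Phase 2.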

Architecture: A new articles_source.py module returns a per-ticker news dict by substring-matching krx_master.name against the articles table. pipeline.py uses articles_source instead of calling the scraper. analyzer.py includes the summary in the LLM prompt. The Telegram message gains a mapping hit-rate line. The legacy scraper.py only gets a deprecation comment and is otherwise preserved.

Tech Stack: Python 3.11 / SQLite (WAL + busy_timeout) / anthropic AsyncClient / FastAPI / pytest + pytest-asyncio.

Prerequisite spec: web-ui/docs/superpowers/specs/2026-05-14-ai-news-articles-source-design.md

File structure

New files (backend):

web-backend/stock-lab/app/screener/ai_news/
  articles_source.py             ← query DB articles + map company names

web-backend/stock-lab/tests/
  test_ai_news_articles_source.py    ← 6 tests

Modified files (backend):

web-backend/stock-lab/app/screener/
  schema.py                      ← news_sentiment.source column + migration
  ai_news/pipeline.py            ← use articles_source, remove _make_http
  ai_news/analyzer.py            ← include summary/pub_date in the prompt
  ai_news/telegram.py            ← mapping line in build_message
  ai_news/scraper.py             ← deprecation comment only
  router.py                      ← pass mapping through post_refresh_news_sentiment

web-backend/stock-lab/tests/
  test_ai_news_pipeline.py       ← update to mock articles_source
  test_ai_news_analyzer.py       ← add summary case
  test_ai_news_telegram.py       ← add mapping-argument cases
  test_ai_news_router.py         ← verify mapping response field

Task 1: schema.py (`news_sentiment.source` column + migration)

Files:

Modify: web-backend/stock-lab/app/screener/schema.py
Step 1: Add the source column definition to the DDL

In schema.py, inside the news_sentiment table definition in the DDL string, add the source column right after the model column:

CREATE TABLE IF NOT EXISTS news_sentiment (
  ticker          TEXT NOT NULL,
  date            TEXT NOT NULL,
  score_raw       REAL NOT NULL,
  reason          TEXT NOT NULL DEFAULT '',
  news_count      INTEGER NOT NULL DEFAULT 0,
  tokens_input    INTEGER NOT NULL DEFAULT 0,
  tokens_output   INTEGER NOT NULL DEFAULT 0,
  model           TEXT NOT NULL DEFAULT 'claude-haiku-4-5-20251001',
  source          TEXT NOT NULL DEFAULT 'articles',
  created_at      TEXT NOT NULL DEFAULT (datetime('now','localtime')),
  PRIMARY KEY (ticker, date)
);

Step 2: Add a one-time migration block to ensure_screener_schema()

Add the following immediately before or after the existing ai_news weight migration block (around lines ~142-156):

    # Add news_sentiment.source once (existing production databases)
    cols = {r[1] for r in conn.execute(
        "PRAGMA table_info(news_sentiment)"
    ).fetchall()}
    if "source" not in cols:
        conn.execute(
            "ALTER TABLE news_sentiment "
            "ADD COLUMN source TEXT NOT NULL DEFAULT 'articles'"
        )

The natural place is right after executescript(DDL), inside the existing ai_news weight migration block.
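The following minimal sketch shows why the PRAGMA check keeps the migration idempotent. It uses a trimmed-down news_sentiment table and a hypothetical helper named `add_source_column`. It also shows a SQLite behavior that matters here: `ALTER TABLE ... ADD COLUMN` with a DEFAULT makes pre-existing rows read that default. As a result, rows written earlier by the Naver scraper will report `source = 'articles'` unless they are backfilled separately.

```python
import sqlite3


def add_source_column(conn: sqlite3.Connection) -> bool:
    """Hypothetical helper: add news_sentiment.source once, return True if added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = {r[1] for r in conn.execute("PRAGMA table_info(news_sentiment)").fetchall()}
    if "source" in cols:
        return False  # already migrated, so a restart is a no-op
    conn.execute(
        "ALTER TABLE news_sentiment "
        "ADD COLUMN source TEXT NOT NULL DEFAULT 'articles'"
    )
    return True


# Trimmed pre-Phase-1 table (no source column) holding one older row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news_sentiment (ticker TEXT NOT NULL, date TEXT NOT NULL, "
    "score_raw REAL NOT NULL, PRIMARY KEY (ticker, date))"
)
conn.execute("INSERT INTO news_sentiment VALUES ('005930', '2026-05-13', 3.0)")

first = add_source_column(conn)   # runs the ALTER TABLE
second = add_source_column(conn)  # column found, nothing to do
old_source = conn.execute(
    "SELECT source FROM news_sentiment WHERE date = '2026-05-13'"
).fetchone()[0]
print(first, second, old_source)  # True False articles
```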

Step 3: Re-run the existing schema tests for regressions

cd C:\Users\jaeoh\Desktop\workspace\web-backend\stock-lab
python -m pytest app/test_screener_schema.py -v

Expected: PASS, 3 tests passed (idempotency still holds with the added migration).

Step 4: Commit

git add app/screener/schema.py
git commit -m "feat(ai_news): add news_sentiment.source column with migration"

Task 2: `articles_source.py` (DB mapping module + 6 tests)

Files:

Create: web-backend/stock-lab/app/screener/ai_news/articles_source.py
Test: web-backend/stock-lab/tests/test_ai_news_articles_source.py
Step 1: Write the failing tests

tests/test_ai_news_articles_source.py:

import datetime as dt
import sqlite3
import pytest

from app.screener.ai_news import articles_source
from app.screener.schema import ensure_screener_schema


@pytest.fixture
def conn():
    c = sqlite3.connect(":memory:")
    c.row_factory = sqlite3.Row
    ensure_screener_schema(c)
    # krx_master + articles rows are seeded in each test via the helpers below
    yield c
    c.close()


def _seed_master(conn, ticker, name):
    conn.execute(
        "INSERT INTO krx_master (ticker, name, market, market_cap, updated_at) "
        "VALUES (?, ?, 'KOSPI', 1000000000, datetime('now'))",  # older SQLite rejects 1_000_000_000
        (ticker, name),
    )


def _seed_article(conn, title, summary="", crawled_at="2026-05-14T07:30:00"):
    import hashlib
    h = hashlib.md5(f"{title}|x".encode()).hexdigest()
    conn.execute(
        "INSERT INTO articles (hash, title, summary, link, press, pub_date, crawled_at) "
        "VALUES (?, ?, ?, '', '', '2026-05-14', ?)",
        (h, title, summary, crawled_at),
    )


ASOF = dt.date(2026, 5, 14)


def test_single_ticker_match_in_title(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_article(conn, "삼성전자, HBM 양산 가시화")
    conn.commit()
    out, stats = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 1
    assert out["005930"][0]["title"] == "삼성전자, HBM 양산 가시화"
    assert stats["matched_pairs"] == 1
    assert stats["hit_tickers"] == 1


def test_single_ticker_match_in_summary(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_article(conn, "메모리 시장 회복세", summary="삼성전자가 1분기 어닝 서프라이즈")
    conn.commit()
    out, _ = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 1


def test_multi_ticker_match(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_master(conn, "000660", "SK하이닉스")
    _seed_article(conn, "삼성전자와 SK하이닉스, 메모리 양산 경쟁")
    conn.commit()
    out, stats = articles_source.gather_articles_for_tickers(
        conn, ["005930", "000660"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 1
    assert len(out["000660"]) == 1
    assert stats["matched_pairs"] == 2
    assert stats["hit_tickers"] == 2


def test_no_match_returns_empty_list(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_article(conn, "엔비디아 실적 발표", summary="AI 칩 수요 견조")
    conn.commit()
    out, stats = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert out["005930"] == []
    assert stats["matched_pairs"] == 0
    assert stats["hit_tickers"] == 0


def test_max_per_ticker_caps_results(conn):
    _seed_master(conn, "005930", "삼성전자")
    for i in range(6):
        _seed_article(conn, f"삼성전자 뉴스 #{i}", crawled_at=f"2026-05-14T0{i}:00:00")
    conn.commit()
    out, _ = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 5


def test_window_days_filters_old_articles(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_article(conn, "삼성전자 최신 뉴스", crawled_at="2026-05-14T07:00:00")
    _seed_article(conn, "삼성전자 오래된 뉴스", crawled_at="2026-05-01T07:00:00")
    conn.commit()
    out, _ = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 1
    assert "최신" in out["005930"][0]["title"]

Step 2: Confirm the tests fail

python -m pytest tests/test_ai_news_articles_source.py -v

Expected: FAIL — "No module named 'app.screener.ai_news.articles_source'".

Step 3: Implement articles_source.py exactly as follows:

"""Per-ticker news mapping from the existing articles table."""

from __future__ import annotations

import datetime as dt
import logging
import sqlite3
from typing import Any, Dict, List, Tuple

log = logging.getLogger(__name__)


def gather_articles_for_tickers(
    conn: sqlite3.Connection,
    tickers: List[str],
    asof: dt.date,
    *,
    window_days: int = 1,
    max_per_ticker: int = 5,
) -> Tuple[Dict[str, List[Dict[str, Any]]], Dict[str, int]]:
    """Return per-ticker news by substring-matching krx_master.name against articles.

    Returns:
        (
          {ticker: [{"title": str, "summary": str, "press": str, "pub_date": str}, ...]},
          {"total_articles": int, "matched_pairs": int, "hit_tickers": int},
        )
    """
    out: Dict[str, List[Dict[str, Any]]] = {t: [] for t in tickers}
    stats = {"total_articles": 0, "matched_pairs": 0, "hit_tickers": 0}

    if not tickers:
        return out, stats

    cutoff = (asof - dt.timedelta(days=window_days)).isoformat()

    placeholders = ",".join("?" * len(tickers))
    name_rows = conn.execute(
        f"SELECT ticker, name FROM krx_master WHERE ticker IN ({placeholders})",
        tickers,
    ).fetchall()
    # Skip names shorter than 2 characters (false-positive risk)
    name_map = {r[0]: r[1] for r in name_rows if r[1] and len(r[1]) >= 2}

    articles = conn.execute(
        "SELECT title, summary, press, pub_date, crawled_at "
        "FROM articles WHERE crawled_at >= ? ORDER BY crawled_at DESC",
        (cutoff,),
    ).fetchall()
    stats["total_articles"] = len(articles)

    for a in articles:
        title = (a[0] or "").strip()
        summary = (a[1] or "").strip()
        haystack = title + " " + summary
        for ticker, name in name_map.items():
            if name not in haystack:
                continue
            if len(out[ticker]) >= max_per_ticker:
                continue
            out[ticker].append({
                "title": title,
                "summary": summary,
                "press": a[2] or "",
                "pub_date": a[3] or "",
            })
            stats["matched_pairs"] += 1

    stats["hit_tickers"] = sum(1 for arts in out.values() if arts)
    return out, stats
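The matching and window rules above can be exercised without a database. This sketch copies only the name filter and the substring test into a hypothetical `match` helper. The ticker codes are illustrative, and "999999" is made up. It shows three properties:

- A 1-character name is filtered out.
- A 2-character name such as "LG" passes the filter and also matches headlines about other LG affiliates.
- The `crawled_at >= cutoff` filter works because ISO-8601 strings sort lexicographically, whether the timestamp uses `T` or a space.

```python
import datetime as dt
from typing import Dict, List


def match(haystack: str, names: Dict[str, str]) -> List[str]:
    """Hypothetical helper mirroring the name filter + substring test."""
    usable = {t: n for t, n in names.items() if n and len(n) >= 2}
    return [t for t, n in usable.items() if n in haystack]


names = {"005930": "삼성전자", "000660": "SK하이닉스", "003550": "LG", "999999": "삼"}
both = match("삼성전자와 SK하이닉스, 메모리 양산 경쟁", names)  # "삼" is filtered out
lg = match("LG에너지솔루션, 북미 공장 증설", names)  # "LG" is a substring here
print(both, lg)  # ['005930', '000660'] ['003550']

# Window filter: ISO-8601 strings compare correctly as plain strings.
cutoff = (dt.date(2026, 5, 14) - dt.timedelta(days=1)).isoformat()  # '2026-05-13'
stamps = ["2026-05-14T07:00:00", "2026-05-13 23:00:00", "2026-05-01T07:00:00"]
kept = [s for s in stamps if s >= cutoff]
print(kept)  # ['2026-05-14T07:00:00', '2026-05-13 23:00:00']
```

The "LG" case is the kind of over-match that an alias or exclusion list would have to handle.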

Step 4: Confirm the tests pass

python -m pytest tests/test_ai_news_articles_source.py -v

Expected: PASS — 6 tests passed.

Step 5: Commit

git add app/screener/ai_news/articles_source.py tests/test_ai_news_articles_source.py
git commit -m "feat(ai_news): articles_source module (substring ticker matching)"

Task 3: `analyzer.py` (include summary/pub_date in the prompt)

Files:

Modify: web-backend/stock-lab/app/screener/ai_news/analyzer.py
Modify: web-backend/stock-lab/tests/test_ai_news_analyzer.py
Step 1: Update the tests (make them fail first)

In tests/test_ai_news_analyzer.py, replace the NEWS constant with the one below and add the new test alongside test_score_sentiment_success_parses_json:

NEWS = [
    {"title": "삼성전자, HBM 양산", "summary": "1분기 영업이익 사상 최대", "pub_date": "2026-05-14"},
    {"title": "메모리 가격 반등", "summary": "", "pub_date": "2026-05-14"},
]


@pytest.mark.asyncio
async def test_score_sentiment_includes_summary_in_prompt():
    """Include the summary in the prompt when present; otherwise the title only."""
    llm = _mk_llm(json.dumps({"score": 5.0, "reason": "ok"}))
    await analyzer.score_sentiment(llm, "005930", NEWS, name="삼성전자")
    # Inspect the arguments passed to the mock's messages.create
    call = llm.messages.create.call_args
    user_msg = call.kwargs["messages"][0]["content"]
    assert "1분기 영업이익 사상 최대" in user_msg  # summary included
    assert "삼성전자, HBM 양산" in user_msg  # title included
    assert "2026-05-14" in user_msg  # pub_date included

Step 2: Run the test to confirm it fails

python -m pytest tests/test_ai_news_analyzer.py::test_score_sentiment_includes_summary_in_prompt -v

Expected: FAIL, because "1분기 영업이익 사상 최대" is not in the prompt.

Step 3: Extract a news_block builder in analyzer.py and include the summary

Modify the existing prompt-building code by adding this helper above score_sentiment, where the prompt is built:

def _format_news_block(news: List[Dict[str, Any]]) -> str:
    """Turn a list of news dicts into the text block inserted into the prompt.

    If a summary is present, it goes on the line after the title, indented (max 200 chars).
    If pub_date is present, it is shown before the title.
    """
    lines: List[str] = []
    for n in news:
        date = (n.get("pub_date") or "").strip()
        title = (n.get("title") or "").strip()
        summary = (n.get("summary") or "").strip()
        prefix = f"[{date}] " if date else ""
        if summary:
            lines.append(f"- {prefix}{title}\n  {summary[:200]}")
        else:
            lines.append(f"- {prefix}{title}")
    return "\n".join(lines)

Then replace the line in score_sentiment that computes news_block with:

    news_block = _format_news_block(news)
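The snippet below shows what the model actually receives. It copies the helper above into a standalone snippet, with the docstring dropped, and applies it to the test's NEWS constant. It also adds one check of the 200-character summary cap. The expected output follows directly from the f-strings in `_format_news_block`.

```python
from typing import Any, Dict, List


# Standalone copy of _format_news_block from Step 3.
def _format_news_block(news: List[Dict[str, Any]]) -> str:
    lines: List[str] = []
    for n in news:
        date = (n.get("pub_date") or "").strip()
        title = (n.get("title") or "").strip()
        summary = (n.get("summary") or "").strip()
        prefix = f"[{date}] " if date else ""
        if summary:
            lines.append(f"- {prefix}{title}\n  {summary[:200]}")
        else:
            lines.append(f"- {prefix}{title}")
    return "\n".join(lines)


NEWS = [
    {"title": "삼성전자, HBM 양산", "summary": "1분기 영업이익 사상 최대", "pub_date": "2026-05-14"},
    {"title": "메모리 가격 반등", "summary": "", "pub_date": "2026-05-14"},
]
block = _format_news_block(NEWS)
print(block)
# - [2026-05-14] 삼성전자, HBM 양산
#   1분기 영업이익 사상 최대
# - [2026-05-14] 메모리 가격 반등

# Summaries are cut at 200 characters; no date means no "[...]" prefix.
long_block = _format_news_block([{"title": "t", "summary": "x" * 500}])
print(len(long_block))  # 206 = len("- t\n  ") + 200
```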

Step 4: Confirm the tests pass

python -m pytest tests/test_ai_news_analyzer.py -v

Expected: PASS, all 5 tests (4 existing + 1 new) pass.

Step 5: Commit

git add app/screener/ai_news/analyzer.py tests/test_ai_news_analyzer.py
git commit -m "feat(ai_news): include summary + pub_date in LLM prompt"

Task 4: `pipeline.py` (switch to articles_source)

Files:

Modify: web-backend/stock-lab/app/screener/ai_news/pipeline.py
Modify: web-backend/stock-lab/tests/test_ai_news_pipeline.py
Step 1: Update the tests (make them fail first)

In tests/test_ai_news_pipeline.py, replace test_refresh_daily_happy_path with the following and add the new test_refresh_daily_no_match_ticker_skipped:

@pytest.mark.asyncio
async def test_refresh_daily_happy_path(conn):
    """3-ticker mini integration test with articles_source and the analyzer mocked.

    Assumes each ticker has exactly one mapped article.
    """
    """
    asof = dt.date(2026, 5, 13)

    fake_articles_by_ticker = {
        "005930": [{"title": "삼성 뉴스", "summary": "", "press": "", "pub_date": ""}],
        "000660": [{"title": "SK 뉴스", "summary": "", "press": "", "pub_date": ""}],
        "373220": [{"title": "LG 뉴스", "summary": "", "press": "", "pub_date": ""}],
    }
    fake_stats = {"total_articles": 3, "matched_pairs": 3, "hit_tickers": 3}

    scores_by_ticker = {
        "005930": 7.5, "000660": 4.0, "373220": -6.0,
    }
    async def fake_score(llm, ticker, news, *, name=None, model="m"):
        return {
            "ticker": ticker, "score_raw": scores_by_ticker[ticker],
            "reason": f"r{ticker}", "news_count": 1,
            "tokens_input": 100, "tokens_output": 20, "model": model,
        }

    with patch.object(pipeline, "articles_source") as mas, \
         patch.object(pipeline, "_analyzer") as ma, \
         patch.object(pipeline, "_make_llm") as ml:
        mas.gather_articles_for_tickers = MagicMock(
            return_value=(fake_articles_by_ticker, fake_stats)
        )
        ma.score_sentiment = fake_score
        ml.return_value.__aenter__.return_value = AsyncMock()
        ml.return_value.__aexit__.return_value = None
        result = await pipeline.refresh_daily(conn, asof, concurrency=3)

    assert result["asof"] == "2026-05-13"
    assert result["updated"] == 3
    assert result["failures"] == []
    assert result["top_pos"][0]["ticker"] == "005930"
    assert result["top_neg"][0]["ticker"] == "373220"
    assert result["mapping"] == fake_stats

    rows = conn.execute("SELECT ticker, score_raw, source FROM news_sentiment "
                        "WHERE date=?", ("2026-05-13",)).fetchall()
    assert len(rows) == 3
    assert all(r["source"] == "articles" for r in rows)


@pytest.mark.asyncio
async def test_refresh_daily_no_match_ticker_skipped(conn):
    """Tickers with no mapped articles skip the LLM call and get no news_sentiment row."""
    asof = dt.date(2026, 5, 13)

    fake_articles_by_ticker = {
        "005930": [{"title": "삼성", "summary": "", "press": "", "pub_date": ""}],
        "000660": [],  # no mapped articles
        "373220": [],  # no mapped articles
    }
    fake_stats = {"total_articles": 1, "matched_pairs": 1, "hit_tickers": 1}

    async def fake_score(llm, ticker, news, *, name=None, model="m"):
        return {
            "ticker": ticker, "score_raw": 5.0, "reason": "r",
            "news_count": 1, "tokens_input": 100, "tokens_output": 20,
            "model": model,
        }

    with patch.object(pipeline, "articles_source") as mas, \
         patch.object(pipeline, "_analyzer") as ma, \
         patch.object(pipeline, "_make_llm") as ml:
        mas.gather_articles_for_tickers = MagicMock(
            return_value=(fake_articles_by_ticker, fake_stats)
        )
        ma.score_sentiment = fake_score
        ml.return_value.__aenter__.return_value = AsyncMock()
        ml.return_value.__aexit__.return_value = None
        result = await pipeline.refresh_daily(conn, asof, concurrency=3)

    assert result["updated"] == 1
    rows = conn.execute("SELECT ticker FROM news_sentiment "
                        "WHERE date=?", ("2026-05-13",)).fetchall()
    assert {r["ticker"] for r in rows} == {"005930"}

The existing test_refresh_daily_failures_isolated also needs articles_source mapping data:

@pytest.mark.asyncio
async def test_refresh_daily_failures_isolated(conn):
    asof = dt.date(2026, 5, 13)

    fake_articles_by_ticker = {
        "005930": [{"title": "h", "summary": "", "press": "", "pub_date": ""}],
        "000660": [{"title": "h", "summary": "", "press": "", "pub_date": ""}],
        "373220": [{"title": "h", "summary": "", "press": "", "pub_date": ""}],
    }
    fake_stats = {"total_articles": 3, "matched_pairs": 3, "hit_tickers": 3}

    async def fake_score(llm, ticker, news, *, name=None, model="m"):
        if ticker == "000660":
            raise RuntimeError("llm exploded")
        return {
            "ticker": ticker, "score_raw": 5.0, "reason": "r", "news_count": 1,
            "tokens_input": 100, "tokens_output": 20, "model": model,
        }

    with patch.object(pipeline, "articles_source") as mas, \
         patch.object(pipeline, "_analyzer") as ma, \
         patch.object(pipeline, "_make_llm") as ml:
        mas.gather_articles_for_tickers = MagicMock(
            return_value=(fake_articles_by_ticker, fake_stats)
        )
        ma.score_sentiment = fake_score
        ml.return_value.__aenter__.return_value = AsyncMock()
        ml.return_value.__aexit__.return_value = None
        result = await pipeline.refresh_daily(conn, asof, concurrency=3)

    assert result["updated"] == 2
    assert len(result["failures"]) == 1

Make sure MagicMock is included in the imports at the top:

from unittest.mock import AsyncMock, MagicMock, patch
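The tests stub `_make_llm()` by configuring `__aenter__` on a plain MagicMock. This works because MagicMock has supported the async context manager protocol since Python 3.8, with `__aenter__`/`__aexit__` configured as AsyncMock. Below is a standalone sketch of the same pattern. Here `make_llm` is a stand-in for the patched `pipeline._make_llm`, and `use_client` mirrors the `async with` in refresh_daily.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

# Stand-in for patch.object(pipeline, "_make_llm"): a plain MagicMock.
make_llm = MagicMock()
fake_client = AsyncMock()
make_llm.return_value.__aenter__.return_value = fake_client
make_llm.return_value.__aexit__.return_value = None  # falsy: exceptions propagate


async def use_client():
    # Same shape as `async with _make_llm() as llm:` in refresh_daily.
    async with make_llm() as llm:
        return llm


got = asyncio.run(use_client())
print(got is fake_client)  # True
```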

Step 2: Confirm the tests fail

python -m pytest tests/test_ai_news_pipeline.py -v

Expected: FAIL, because pipeline does not use articles_source yet.

Step 3: Replace the pipeline.py body

Make the following changes to pipeline.py:

(1) Add articles_source to the top-level imports:

from . import scraper as _scraper          # legacy, kept for backward import
from . import analyzer as _analyzer
from . import articles_source              # new

(2) Remove the _make_http() function and the DEFAULT_RATE_LIMIT_SEC constant (or mark them deprecated); they are no longer used.

(3) Replace the _process_one() function with:

async def _process_one(
    ticker: str, name: str, articles: List[Dict[str, Any]],
    sem: asyncio.Semaphore, llm, model: str,
) -> Dict[str, Any]:
    async with sem:
        return await _analyzer.score_sentiment(
            llm, ticker, articles, name=name, model=model,
        )

(4) Replace the refresh_daily() signature and body:

async def refresh_daily(
    conn: sqlite3.Connection,
    asof: dt.date,
    *,
    top_n: int = DEFAULT_TOP_N,
    concurrency: int = DEFAULT_CONCURRENCY,
    max_news_per_ticker: int = DEFAULT_NEWS_PER_TICKER,
    window_days: int = 1,
    model: str = _analyzer.DEFAULT_MODEL,
) -> Dict[str, Any]:
    started = time.time()
    tickers = _top_market_cap_tickers(conn, n=top_n)
    name_map = {
        r[0]: r[1] for r in conn.execute(
            f"SELECT ticker, name FROM krx_master WHERE ticker IN "
            f"({','.join('?' * len(tickers))})", tickers,
        ).fetchall()
    } if tickers else {}

    articles_by_ticker, mapping_stats = articles_source.gather_articles_for_tickers(
        conn, tickers, asof,
        window_days=window_days,
        max_per_ticker=max_news_per_ticker,
    )

    sem = asyncio.Semaphore(concurrency)
    async with _make_llm() as llm:
        tasks = []
        for t in tickers:
            arts = articles_by_ticker.get(t, [])
            if not arts:
                continue  # no mapped articles: skip scoring
            tasks.append(_process_one(t, name_map.get(t, t), arts, sem, llm, model))
        raw_results = await asyncio.gather(*tasks, return_exceptions=True)

    successes: List[Dict[str, Any]] = []
    failures: List[str] = []
    for r in raw_results:
        if isinstance(r, BaseException):
            failures.append(repr(r))
        elif isinstance(r, dict):
            successes.append(r)

    if successes:
        _upsert_news_sentiment(conn, asof, successes, source="articles")

    top_pos = sorted(successes, key=lambda r: -r["score_raw"])[:5]
    top_neg = sorted(successes, key=lambda r: r["score_raw"])[:5]

    return {
        "asof": asof.isoformat(),
        "updated": len(successes),
        "failures": failures,
        "duration_sec": round(time.time() - started, 2),
        "tokens_input": sum(r["tokens_input"] for r in successes),
        "tokens_output": sum(r["tokens_output"] for r in successes),
        "top_pos": top_pos,
        "top_neg": top_neg,
        "model": model,
        "mapping": mapping_stats,
    }

(5) Add a source parameter to _upsert_news_sentiment() and include the column in the INSERT:

def _upsert_news_sentiment(
    conn: sqlite3.Connection, asof: dt.date,
    rows: List[Dict[str, Any]], *, source: str = "articles",
) -> None:
    iso = asof.isoformat()
    data = [
        (
            r["ticker"], iso, r["score_raw"], r["reason"], r["news_count"],
            r["tokens_input"], r["tokens_output"], r["model"], source,
        )
        for r in rows
    ]
    conn.executemany(
        """INSERT INTO news_sentiment
             (ticker, date, score_raw, reason, news_count,
              tokens_input, tokens_output, model, source)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
           ON CONFLICT(ticker, date) DO UPDATE SET
             score_raw=excluded.score_raw,
             reason=excluded.reason,
             news_count=excluded.news_count,
             tokens_input=excluded.tokens_input,
             tokens_output=excluded.tokens_output,
             model=excluded.model,
             source=excluded.source
        """,
        data,
    )
    conn.commit()
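The following trimmed-down sketch keeps only four of the table's columns. It shows what the ON CONFLICT clause does on a same-day rerun: the second call updates the existing (ticker, date) row instead of inserting a duplicate, and that update overwrites `source` too. Because the primary key is (ticker, date), a Naver-sourced score and an articles-sourced score for the same ticker and date cannot coexist in this table. The `'naver'` label below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news_sentiment ("
    " ticker TEXT NOT NULL, date TEXT NOT NULL, score_raw REAL NOT NULL,"
    " source TEXT NOT NULL DEFAULT 'articles', PRIMARY KEY (ticker, date))"
)

# Same upsert shape as _upsert_news_sentiment, reduced to four columns.
UPSERT = """INSERT INTO news_sentiment (ticker, date, score_raw, source)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(ticker, date) DO UPDATE SET
              score_raw=excluded.score_raw,
              source=excluded.source"""

conn.execute(UPSERT, ("005930", "2026-05-13", 2.0, "naver"))     # first run
conn.execute(UPSERT, ("005930", "2026-05-13", 7.5, "articles"))  # same-day rerun
rows = conn.execute(
    "SELECT ticker, date, score_raw, source FROM news_sentiment"
).fetchall()
print(rows)  # [('005930', '2026-05-13', 7.5, 'articles')]
```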

Step 4: Confirm the tests pass

python -m pytest tests/test_ai_news_pipeline.py -v

Expected: PASS. test_refresh_daily_happy_path, test_refresh_daily_failures_isolated, test_refresh_daily_no_match_ticker_skipped and test_top_market_cap_tickers all pass (4 tests).
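The failures_isolated and no_match tests rest on two properties of refresh_daily:

- Tickers with no articles never become tasks.
- `asyncio.gather(..., return_exceptions=True)` returns the exception object in place of a result instead of cancelling sibling tasks.

Below is a standalone sketch of that pattern. It uses a stand-in `score` coroutine in place of the analyzer and an illustrative fourth ticker with no articles. It also records the peak number of coroutines inside the semaphore.

```python
import asyncio
from typing import Any, Dict, List, Tuple


async def score(ticker: str, sem: asyncio.Semaphore, gauge: Dict[str, int]) -> Dict[str, Any]:
    """Stand-in for _analyzer.score_sentiment; one ticker always fails."""
    async with sem:
        gauge["active"] += 1
        gauge["peak"] = max(gauge["peak"], gauge["active"])
        await asyncio.sleep(0.01)  # simulated LLM latency
        gauge["active"] -= 1
        if ticker == "000660":
            raise RuntimeError("llm exploded")
        return {"ticker": ticker}


async def run(concurrency: int) -> Tuple[List[Dict[str, Any]], List[str], int]:
    articles = {"005930": ["a"], "000660": ["b"], "373220": ["c"], "035420": []}
    sem = asyncio.Semaphore(concurrency)
    gauge = {"active": 0, "peak": 0}
    # Tickers without articles are skipped, like the `if not arts` branch.
    tasks = [score(t, sem, gauge) for t, arts in articles.items() if arts]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ok = [r for r in results if isinstance(r, dict)]
    failed = [repr(r) for r in results if isinstance(r, BaseException)]
    return ok, failed, gauge["peak"]


ok, failed, peak = asyncio.run(run(concurrency=2))
print(ok)      # [{'ticker': '005930'}, {'ticker': '373220'}]
print(failed)  # ["RuntimeError('llm exploded')"]
print(peak)    # 2
```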

Step 5: Commit

git add app/screener/ai_news/pipeline.py tests/test_ai_news_pipeline.py
git commit -m "feat(ai_news): pipeline uses articles_source (replaces Naver scraper)"

Task 5: `telegram.py` (add the mapping line)

Files:

Modify: web-backend/stock-lab/app/screener/ai_news/telegram.py
Modify: web-backend/stock-lab/tests/test_ai_news_telegram.py
Step 1: Update the tests (make them fail first)

Add these new tests at the end of tests/test_ai_news_telegram.py:

def test_build_message_includes_mapping_line():
    msg = tg.build_message(
        asof="2026-05-14",
        top_pos=[_row("005930", 8.5, "HBM 호재")],
        top_neg=[],
        tokens_input=1000, tokens_output=200,
        mapping={"total_articles": 35, "matched_pairs": 50, "hit_tickers": 42},
    )
    assert "매핑" in msg
    assert "42" in msg
    assert "50" in msg
    assert "35" in msg


def test_build_message_without_mapping_omits_line():
    msg = tg.build_message(
        asof="2026-05-14",
        top_pos=[],
        top_neg=[],
        tokens_input=1000, tokens_output=200,
    )
    assert "매핑" not in msg

Step 2: Confirm the tests fail

python -m pytest tests/test_ai_news_telegram.py -v

Expected: FAIL, because the mapping argument is not supported yet.

Step 3: Update the build_message signature and footer in telegram.py

def build_message(
    *,
    asof: str,
    top_pos: List[Dict[str, Any]],
    top_neg: List[Dict[str, Any]],
    tokens_input: int,
    tokens_output: int,
    mapping: Dict[str, int] | None = None,
) -> str:
    lines: List[str] = [
        f"🌅 *AI 뉴스 분석* \\({_escape(asof)} 08:00\\)",
        "",
        "📈 *호재 Top 5*",
    ]
    if top_pos:
        for i, r in enumerate(top_pos, 1):
            lines.append(_row_line(i, r))
    else:
        lines.append(_escape("- (없음)"))

    lines += ["", "📉 *악재 Top 5*"]
    if top_neg:
        for i, r in enumerate(top_neg, 1):
            lines.append(_row_line(i, r))
    else:
        lines.append(_escape("- (없음)"))

    cost = _cost_won(tokens_input, tokens_output)
    mapping_part = ""
    if mapping:
        mapping_part = (
            f"매핑 {mapping['hit_tickers']}/100 ticker "
            f"\\({mapping['matched_pairs']}쌍 / articles {mapping['total_articles']}건\\) · "
        )
    lines += [
        "",
        f"_분석: 시총 상위 100종목 · {mapping_part}"
        f"토큰 {tokens_input:,} in / {tokens_output:,} out · "
        f"약 ₩{cost:,}_",
    ]
    return "\n".join(lines)
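The footer hand-escapes the parentheses in `mapping_part` because Telegram's MarkdownV2 reserves characters such as _ * [ ] ( ) ~ > # + - = | { } . ! (plus the backtick). The Bot API rejects MarkdownV2 text in which they appear unescaped. `_escape` itself is not shown in this plan, so the sketch below uses a hypothetical `escape_mdv2` built from the Bot API's reserved-character list. It checks that escaping the plain mapping text yields exactly the hand-escaped string.

```python
import re

# Characters the Telegram Bot API lists as reserved in MarkdownV2 text.
_MDV2_RESERVED = "_*[]()~`>#+-=|{}.!"
_MDV2_PATTERN = re.compile("([" + re.escape(_MDV2_RESERVED) + "])")


def escape_mdv2(text: str) -> str:
    """Hypothetical stand-in for telegram.py's _escape: backslash each reserved char."""
    return _MDV2_PATTERN.sub(r"\\\1", text)


mapping = {"total_articles": 35, "matched_pairs": 50, "hit_tickers": 42}
auto = escape_mdv2(
    f"매핑 {mapping['hit_tickers']}/100 ticker "
    f"({mapping['matched_pairs']}쌍 / articles {mapping['total_articles']}건) · "
)
# The hand-escaped form used in build_message.
manual = (
    f"매핑 {mapping['hit_tickers']}/100 ticker "
    f"\\({mapping['matched_pairs']}쌍 / articles {mapping['total_articles']}건\\) · "
)
print(auto == manual)             # True
print(escape_mdv2("2026-05-14"))  # 2026\-05\-14
```

Escaping through one helper, rather than by hand, keeps the footer safe if a later change adds a reserved character such as "." or "-".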

Step 4: Confirm the tests pass

python -m pytest tests/test_ai_news_telegram.py -v

Expected: PASS, all 6 tests (4 existing + 2 new) pass.

Step 5: Commit

git add app/screener/ai_news/telegram.py tests/test_ai_news_telegram.py
git commit -m "feat(ai_news): telegram includes article mapping stats line"

Task 6: `router.py` (forward the mapping response field)

Files:

Modify: web-backend/stock-lab/app/screener/router.py
Modify: web-backend/stock-lab/tests/test_ai_news_router.py
Step 1: Update the test

Strengthen test_refresh_news_sentiment_weekday_invokes_pipeline in tests/test_ai_news_router.py:

def test_refresh_news_sentiment_weekday_invokes_pipeline():
    fake_summary = {
        "asof": "2026-05-13", "updated": 3, "failures": [],
        "duration_sec": 1.0, "tokens_input": 100, "tokens_output": 20,
        "top_pos": [], "top_neg": [], "model": "m",
        "mapping": {"total_articles": 5, "matched_pairs": 8, "hit_tickers": 3},
    }
    with patch("app.screener.router._ai_pipeline") as mp, \
         patch("app.screener.router._ai_telegram") as mt:
        mp.refresh_daily = AsyncMock(return_value=fake_summary)
        mt.build_message = lambda **kw: f"TEXT_with_mapping={kw.get('mapping')}"
        client = TestClient(app)
        resp = client.post(
            "/api/stock/screener/snapshot/refresh-news-sentiment?asof=2026-05-13"
        )
    assert resp.status_code == 200
    body = resp.json()
    assert body["mapping"]["hit_tickers"] == 3
    assert "'hit_tickers': 3" in body["telegram_text"]  # "mapping=" alone also matches "mapping=None"

Step 2: Confirm the test fails

python -m pytest tests/test_ai_news_router.py -v

Expected: FAIL, because mapping is not passed to the build_message call.

Step 3: Update the telegram_text construction in router.py's post_refresh_news_sentiment

Before:

    summary["telegram_text"] = _ai_telegram.build_message(
        asof=summary["asof"],
        top_pos=summary["top_pos"], top_neg=summary["top_neg"],
        tokens_input=summary["tokens_input"],
        tokens_output=summary["tokens_output"],
    )

Replace with:

    summary["telegram_text"] = _ai_telegram.build_message(
        asof=summary["asof"],
        top_pos=summary["top_pos"], top_neg=summary["top_neg"],
        tokens_input=summary["tokens_input"],
        tokens_output=summary["tokens_output"],
        mapping=summary.get("mapping"),
    )

Step 4: Confirm the tests pass

python -m pytest tests/test_ai_news_router.py -v

Expected: PASS — 2 tests.

Step 5: Commit

git add app/screener/router.py tests/test_ai_news_router.py
git commit -m "feat(ai_news): router forwards mapping stats to telegram"

Task 7: Full regression run + scraper deprecation comment

Files:

Modify: web-backend/stock-lab/app/screener/ai_news/scraper.py (comment only)
Step 1: Add a deprecation notice at the top of scraper.py

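Replace the existing docstring with:

"""[DEPRECATED] Naver Finance per-ticker news scraping.

Since ai_news Phase 1 (2026-05-14, `cdfa31b` spec) the pipeline no longer
uses this module. The data source has moved to stock-lab's articles table
(`ai_news/articles_source.py`).

Removal timing: after the Phase 2 (DART adoption) decision. Depending on
whether the node stays active after 4 weeks of accumulated IC validation,
this module will be either (a) deleted entirely or (b) reused as an
ensemble fallback alongside DART.
"""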

Leave every other line unchanged (tests still import this module).

Step 2: Run the full stock-lab test suite

cd C:\Users\jaeoh\Desktop\workspace\web-backend\stock-lab
python -m pytest --ignore=app/test_scraper.py -q

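Expected: all tests pass. Starting from the previous 76 tests, add 6 (test_ai_news_articles_source), 1 (analyzer, Task 3), 1 (pipeline, Task 4) and 2 (telegram, Task 5), for a total of 86 tests passed.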

Step 3: Commit

git add app/screener/ai_news/scraper.py
git commit -m "docs(ai_news): mark scraper.py deprecated (Phase 1 transition)"

Task 8: Production verification + deployment

Files: (commands only, manual checks)

Step 1: backend push

cd C:\Users\jaeoh\Desktop\workspace\web-backend
git push origin main

On failure: ask the user to enter their Gitea credentials.

Step 2: Confirm the deployer has rolled out the change (~1 min)

docker logs stock-lab --tail 20 2>&1 | grep -i "starting\|started"
docker logs agent-office --tail 20 2>&1 | grep -i "starting\|started"

Confirm that both containers show a new startup time.

Step 3: Confirm the migration was applied automatically to the production DB

docker exec stock-lab python -c "
import sqlite3
c = sqlite3.connect('/app/data/stock.db')
cols = [r[1] for r in c.execute('PRAGMA table_info(news_sentiment)').fetchall()]
print('news_sentiment columns:', cols)
print('has source:', 'source' in cols)
"

Expected: has source: True.

Step 4: Manual trigger

curl -X POST "https://gahusb.synology.me/api/agent-office/command" \
  -H "Content-Type: application/json" \
  -d '{"agent":"stock","action":"run_ai_news"}'

Once the response {"ok": true} comes back, the message should arrive in Telegram within 30-60 seconds.

Step 5: Verify the Telegram message

Check that the received message contains all of the following:

🌅 AI 뉴스 분석 (YYYY-MM-DD 08:00) header ("AI news analysis")
📈 호재 Top 5 / 📉 악재 Top 5 sections (positive / negative news)
Company name + ticker format (e.g., 삼성전자 (005930))
매핑 N/100 ticker (M쌍 / articles K건) line (new)
Token/cost line

Check that the mapping hit_tickers value falls within a reasonable range (e.g., 20-60).

Step 6: DB verification

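docker exec stock-lab python -c "
import sqlite3
c = sqlite3.connect('/app/data/stock.db')
# date('now') is UTC in SQLite, so use the latest stored articles date instead
latest = c.execute('SELECT MAX(date) FROM news_sentiment WHERE source = ?', ('articles',)).fetchone()[0]
rows = c.execute('SELECT COUNT(*), SUM(news_count), SUM(tokens_input) FROM news_sentiment WHERE date = ? AND source = ?', (latest, 'articles')).fetchone()
print('latest date:', latest)
print('articles rows / total_news / tokens:', rows)
# Cumulative count of articles-source rows across all dates
total = c.execute('SELECT COUNT(*) FROM news_sentiment WHERE source = ?', ('articles',)).fetchone()
print('all articles-source rows:', total[0])
"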

Expected: articles rows >= 10 for the latest date (the number of tickers with mapping hits), source='articles'.

Step 7: Update memory

Add this slice's commits to the hotfix history in C:\Users\jaeoh\.claude\projects\C--Users-jaeoh-Desktop-workspace-web-ui\memory\project_stock_screener.md:

Phase 1 (cdfa31b spec + the task commit SHAs from this plan)
Measured mapping hit-rate (e.g., "first run: mapping 42/100, 35 articles, LLM cost ₩42")
Next step: after 4 weeks, review the IC results and decide between Phase 2 (DART) and removing the node

Post-completion verification checklist

When this plan is complete:

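The source column exists in stock-lab's news_sentiment table
A production trigger creates source='articles' rows with news_count > 0
The Telegram message shows the 매핑 N/100 (mapping) line
Zero external HTTP calls (Naver)
The LLM cost on the Telegram ₩ line is lower than or similar to before (~₩60), roughly ₩40-80
All unit tests pass (6 new + 4 updated), with no regressions in existing tests
news_sentiment.source is added idempotently (no re-add attempt on restart)
The legacy scraper.py carries a deprecation comment (code preserved)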

Follow-up slices (after this plan is complete)

As stated in §15 of the spec:

Phase 1.5: if the mapping hit-rate is < 30%, add an alias dict
Phase 2: if the 4-week IC is ≥ 0.05, add DART OpenAPI
Phase X: if the IC is < 0.05, deprecate the node
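Phase 1.5 is only outlined in the spec. The rough sketch below makes some assumptions: the `ALIASES` table, its contents and both helper names are hypothetical. It shows how an alias dict could extend the substring rule by trying extra spellings per ticker. The hit rate is computed as hit_tickers over the 100-ticker universe and compared against the 30% trigger.

```python
from typing import Dict, List

# Hypothetical alias table: extra spellings tried after the krx_master name.
ALIASES: Dict[str, List[str]] = {"000660": ["하이닉스"]}


def match_with_aliases(text: str, names: Dict[str, str],
                       aliases: Dict[str, List[str]]) -> List[str]:
    hits: List[str] = []
    for ticker, name in names.items():
        candidates = [name, *aliases.get(ticker, [])]
        # Keep the Phase 1 guard: ignore candidates shorter than 2 characters.
        if any(c and len(c) >= 2 and c in text for c in candidates):
            hits.append(ticker)
    return hits


def needs_aliases(hit_tickers: int, universe: int = 100, threshold: float = 0.30) -> bool:
    """True when the mapping hit rate falls below the Phase 1.5 trigger."""
    return universe > 0 and hit_tickers / universe < threshold


names = {"005930": "삼성전자", "000660": "SK하이닉스"}
headline = "하이닉스, HBM 공급 확대"
print(match_with_aliases(headline, names, {}))       # []
print(match_with_aliases(headline, names, ALIASES))  # ['000660']
print(needs_aliases(42), needs_aliases(25))          # False True
```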
