diff --git a/docs/superpowers/plans/2026-05-14-ai-news-articles-source.md b/docs/superpowers/plans/2026-05-14-ai-news-articles-source.md
new file mode 100644
index 0000000..23a56dc
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-14-ai-news-articles-source.md
@@ -0,0 +1,999 @@

# AI News Phase 1 — articles Source Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace the ai_news pipeline's data source: the Naver scraper gives way to the existing `articles` table. Derive news sentiment for the top-100 market-cap tickers via company-name substring mapping. Add a `news_sentiment.source` column to establish a comparison baseline for Phase 2.

**Architecture:** A new `articles_source.py` module returns a per-ticker news dict by substring-matching `krx_master.name` against the `articles` table. `pipeline.py` calls articles_source instead of the scraper. `analyzer.py` includes `summary` in the LLM prompt. The Telegram message gains a mapping hit-rate line. The legacy `scraper.py` is preserved, with only a deprecation note added.

**Tech Stack:** Python 3.11 / SQLite (WAL + busy_timeout) / anthropic AsyncClient / FastAPI / pytest + pytest-asyncio.
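The name-substring mapping described in the architecture can be illustrated with a minimal, self-contained sketch (hypothetical inline data; the real implementation in Task 2 reads `krx_master` and `articles` from SQLite):

```python
# Minimal illustration of the name-substring mapping idea (hypothetical data).
# An article matches a ticker when the company name appears in title or summary.
name_map = {"005930": "삼성전자", "000660": "SK하이닉스"}
articles = [
    {"title": "삼성전자와 SK하이닉스, 메모리 양산 경쟁", "summary": ""},
    {"title": "엔비디아 실적 발표", "summary": "AI 칩 수요 견조"},
]

matched = {t: [] for t in name_map}
for a in articles:
    haystack = a["title"] + " " + a["summary"]
    for ticker, name in name_map.items():
        if name in haystack:
            matched[ticker].append(a["title"])

# The first article matches both tickers; the second matches neither.
```

One article can therefore produce multiple (ticker, article) pairs, which is why `matched_pairs` can exceed `total_articles` in the stats below.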

**Prerequisite spec**: `web-ui/docs/superpowers/specs/2026-05-14-ai-news-articles-source-design.md`

---

## File Structure

New files (backend):
```
web-backend/stock-lab/app/screener/ai_news/
  articles_source.py              ← query DB articles + company-name mapping

web-backend/stock-lab/tests/
  test_ai_news_articles_source.py ← 6 tests
```

Modified files (backend):
```
web-backend/stock-lab/app/screener/
  schema.py           ← news_sentiment.source column + migration
  ai_news/pipeline.py ← use articles_source, remove _make_http
  ai_news/analyzer.py ← include summary/pub_date in prompt
  ai_news/telegram.py ← mapping line in build_message
  ai_news/scraper.py  ← deprecation note only
  router.py           ← pass mapping through post_refresh_news_sentiment

web-backend/stock-lab/tests/
  test_ai_news_pipeline.py ← updated with articles_source mocks
  test_ai_news_analyzer.py ← add summary case
  test_ai_news_telegram.py ← add mapping-argument cases
  test_ai_news_router.py   ← verify mapping response field
```

---

### Task 1: schema.py — `news_sentiment.source` column + migration

**Files:**
- Modify: `web-backend/stock-lab/app/screener/schema.py`

- [ ] **Step 1: Add the `source` column to the DDL**

In the `news_sentiment` table definition inside the `DDL` string of `schema.py`, add a `source` column right after the `model` column:
```sql
CREATE TABLE IF NOT EXISTS news_sentiment (
    ticker TEXT NOT NULL,
    date TEXT NOT NULL,
    score_raw REAL NOT NULL,
    reason TEXT NOT NULL DEFAULT '',
    news_count INTEGER NOT NULL DEFAULT 0,
    tokens_input INTEGER NOT NULL DEFAULT 0,
    tokens_output INTEGER NOT NULL DEFAULT 0,
    model TEXT NOT NULL DEFAULT 'claude-haiku-4-5-20251001',
    source TEXT NOT NULL DEFAULT 'articles',
    created_at TEXT NOT NULL DEFAULT (datetime('now','localtime')),
    PRIMARY KEY (ticker, date)
);
```

- [ ] **Step 2: Add a one-time migration block to `ensure_screener_schema()`**

Next to the existing ai_news weight migration block (around lines ~142-156), add:
```python
    # One-time add of the news_sentiment.source column (existing production DBs)
    cols = {r[1] for r in conn.execute(
        "PRAGMA table_info(news_sentiment)"
    ).fetchall()}
    if "source" not in cols:
        conn.execute(
            "ALTER TABLE news_sentiment "
            "ADD COLUMN source TEXT NOT NULL DEFAULT 'articles'"
        )
```

The natural location is right after `executescript(DDL)`, alongside the existing ai_news weight migration block.

- [ ] **Step 3: Regression-run the existing schema tests**

```bash
cd C:\Users\jaeoh\Desktop\workspace\web-backend\stock-lab
python -m pytest app/test_screener_schema.py -v
```
Expected: PASS — 3 tests passed (idempotency holds even with the added migration).

- [ ] **Step 4: Commit**

```bash
git add app/screener/schema.py
git commit -m "feat(ai_news): add news_sentiment.source column with migration"
```

---

### Task 2: `articles_source.py` — DB mapping module + 6 tests

**Files:**
- Create: `web-backend/stock-lab/app/screener/ai_news/articles_source.py`
- Test: `web-backend/stock-lab/tests/test_ai_news_articles_source.py`

- [ ] **Step 1: Write the failing tests**

`tests/test_ai_news_articles_source.py`:
```python
import datetime as dt
import hashlib
import sqlite3

import pytest

from app.screener.ai_news import articles_source
from app.screener.schema import ensure_screener_schema


@pytest.fixture
def conn():
    c = sqlite3.connect(":memory:")
    c.row_factory = sqlite3.Row
    ensure_screener_schema(c)
    # krx_master + articles seeding is done per test via the helpers below
    yield c
    c.close()


def _seed_master(conn, ticker, name):
    conn.execute(
        "INSERT INTO krx_master (ticker, name, market, market_cap, updated_at) "
        "VALUES (?, ?, 'KOSPI', 1000000000, datetime('now'))",
        (ticker, name),
    )


def _seed_article(conn, title, summary="", crawled_at="2026-05-14T07:30:00"):
    h = hashlib.md5(f"{title}|x".encode()).hexdigest()
    conn.execute(
        "INSERT INTO articles (hash, title, summary, link, press, pub_date, crawled_at) "
        "VALUES (?, ?, ?, '', '', '2026-05-14', ?)",
        (h, title, summary, crawled_at),
    )


ASOF = dt.date(2026, 5, 14)


def test_single_ticker_match_in_title(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_article(conn, "삼성전자, HBM 양산 가시화")
    conn.commit()
    out, stats = 
articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 1
    assert out["005930"][0]["title"] == "삼성전자, HBM 양산 가시화"
    assert stats["matched_pairs"] == 1
    assert stats["hit_tickers"] == 1


def test_single_ticker_match_in_summary(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_article(conn, "메모리 시장 회복세", summary="삼성전자가 1분기 어닝 서프라이즈")
    conn.commit()
    out, _ = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 1


def test_multi_ticker_match(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_master(conn, "000660", "SK하이닉스")
    _seed_article(conn, "삼성전자와 SK하이닉스, 메모리 양산 경쟁")
    conn.commit()
    out, stats = articles_source.gather_articles_for_tickers(
        conn, ["005930", "000660"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 1
    assert len(out["000660"]) == 1
    assert stats["matched_pairs"] == 2
    assert stats["hit_tickers"] == 2


def test_no_match_returns_empty_list(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_article(conn, "엔비디아 실적 발표", summary="AI 칩 수요 견조")
    conn.commit()
    out, stats = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert out["005930"] == []
    assert stats["matched_pairs"] == 0
    assert stats["hit_tickers"] == 0


def test_max_per_ticker_caps_results(conn):
    _seed_master(conn, "005930", "삼성전자")
    for i in range(6):
        _seed_article(conn, f"삼성전자 뉴스 #{i}", crawled_at=f"2026-05-14T0{i}:00:00")
    conn.commit()
    out, _ = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 5


def test_window_days_filters_old_articles(conn):
    _seed_master(conn, "005930", "삼성전자")
    _seed_article(conn, "삼성전자 최신 뉴스", crawled_at="2026-05-14T07:00:00")
    _seed_article(conn, "삼성전자 오래된 뉴스", 
crawled_at="2026-05-01T07:00:00")
    conn.commit()
    out, _ = articles_source.gather_articles_for_tickers(
        conn, ["005930"], ASOF, window_days=1, max_per_ticker=5,
    )
    assert len(out["005930"]) == 1
    assert "최신" in out["005930"][0]["title"]
```

- [ ] **Step 2: Confirm the tests fail**

```bash
python -m pytest tests/test_ai_news_articles_source.py -v
```
Expected: FAIL — "No module named 'app.screener.ai_news.articles_source'".

- [ ] **Step 3: Implement `articles_source.py`** — exactly:

```python
"""Map news from the existing articles table to tickers."""

from __future__ import annotations

import datetime as dt
import logging
import sqlite3
from typing import Any, Dict, List, Tuple

log = logging.getLogger(__name__)


def gather_articles_for_tickers(
    conn: sqlite3.Connection,
    tickers: List[str],
    asof: dt.date,
    *,
    window_days: int = 1,
    max_per_ticker: int = 5,
) -> Tuple[Dict[str, List[Dict[str, Any]]], Dict[str, int]]:
    """Return a per-ticker news dict via krx_master.name substring matching.

    Returns:
        (
            {ticker: [{"title": str, "summary": str, "press": str, "pub_date": str}, ...]},
            {"total_articles": int, "matched_pairs": int, "hit_tickers": int},
        )
    """
    out: Dict[str, List[Dict[str, Any]]] = {t: [] for t in tickers}
    stats = {"total_articles": 0, "matched_pairs": 0, "hit_tickers": 0}

    if not tickers:
        return out, stats

    cutoff = (asof - dt.timedelta(days=window_days)).isoformat()

    placeholders = ",".join("?" * len(tickers))
    name_rows = conn.execute(
        f"SELECT ticker, name FROM krx_master WHERE ticker IN ({placeholders})",
        tickers,
    ).fetchall()
    # Exclude company names shorter than 2 chars (false-positive risk)
    name_map = {r[0]: r[1] for r in name_rows if r[1] and len(r[1]) >= 2}

    articles = conn.execute(
        "SELECT title, summary, press, pub_date, crawled_at "
        "FROM articles WHERE crawled_at >= ? 
ORDER BY crawled_at DESC",
        (cutoff,),
    ).fetchall()
    stats["total_articles"] = len(articles)

    for a in articles:
        title = (a[0] or "").strip()
        summary = (a[1] or "").strip()
        haystack = title + " " + summary
        for ticker, name in name_map.items():
            if name not in haystack:
                continue
            if len(out[ticker]) >= max_per_ticker:
                continue
            out[ticker].append({
                "title": title,
                "summary": summary,
                "press": a[2] or "",
                "pub_date": a[3] or "",
            })
            stats["matched_pairs"] += 1

    stats["hit_tickers"] = sum(1 for arts in out.values() if arts)
    return out, stats
```

- [ ] **Step 4: Confirm the tests pass**

```bash
python -m pytest tests/test_ai_news_articles_source.py -v
```
Expected: PASS — 6 tests passed.

- [ ] **Step 5: Commit**

```bash
git add app/screener/ai_news/articles_source.py tests/test_ai_news_articles_source.py
git commit -m "feat(ai_news): articles_source module (substring ticker matching)"
```

---

### Task 3: `analyzer.py` — include summary/pub_date in the prompt

**Files:**
- Modify: `web-backend/stock-lab/app/screener/ai_news/analyzer.py`
- Modify: `web-backend/stock-lab/tests/test_ai_news_analyzer.py`

- [ ] **Step 1: Update the tests (drive a failure)**

Replace/extend the `NEWS` constant and the `test_score_sentiment_success_parses_json` test in `tests/test_ai_news_analyzer.py` with:
```python
NEWS = [
    {"title": "삼성전자, HBM 양산", "summary": "1분기 영업이익 사상 최대", "pub_date": "2026-05-14"},
    {"title": "메모리 가격 반등", "summary": "", "pub_date": "2026-05-14"},
]


@pytest.mark.asyncio
async def test_score_sentiment_includes_summary_in_prompt():
    """If a summary exists it goes into the prompt; otherwise title only."""
    llm = _mk_llm(json.dumps({"score": 5.0, "reason": "ok"}))
    await analyzer.score_sentiment(llm, "005930", NEWS, name="삼성전자")
    # Inspect the mock's messages.create call arguments
    call = llm.messages.create.call_args
    user_msg = call.kwargs["messages"][0]["content"]
    assert "1분기 영업이익 사상 최대" in user_msg  # summary included
    assert "삼성전자, HBM 양산" in user_msg        # title included
    assert "2026-05-14" in user_msg               # 
pub_date included
```

- [ ] **Step 2: Run the test to confirm it fails**

```bash
python -m pytest tests/test_ai_news_analyzer.py::test_score_sentiment_includes_summary_in_prompt -v
```
Expected: FAIL — `1분기 영업이익 사상 최대` is absent from the prompt.

- [ ] **Step 3: Extract a news_block builder in `analyzer.py` and include summary**

Modify the existing prompt-build section. Add a helper function just before the prompt build in `score_sentiment`:

```python
def _format_news_block(news: List[Dict[str, Any]]) -> str:
    """news dict list → text block that goes into the prompt.

    If a summary exists, include it indented on the line after the title
    (max 200 chars). If pub_date exists, show it before the title.
    """
    lines: List[str] = []
    for n in news:
        date = (n.get("pub_date") or "").strip()
        title = (n.get("title") or "").strip()
        summary = (n.get("summary") or "").strip()
        prefix = f"[{date}] " if date else ""
        if summary:
            lines.append(f"- {prefix}{title}\n  {summary[:200]}")
        else:
            lines.append(f"- {prefix}{title}")
    return "\n".join(lines)
```

Then replace the `news_block` computation line inside `score_sentiment` with:
```python
    news_block = _format_news_block(news)
```

- [ ] **Step 4: Confirm the tests pass**

```bash
python -m pytest tests/test_ai_news_analyzer.py -v
```
Expected: PASS — all 5 tests (existing 4 + new 1).

- [ ] **Step 5: Commit**

```bash
git add app/screener/ai_news/analyzer.py tests/test_ai_news_analyzer.py
git commit -m "feat(ai_news): include summary + pub_date in LLM prompt"
```

---

### Task 4: `pipeline.py` — switch to articles_source

**Files:**
- Modify: `web-backend/stock-lab/app/screener/ai_news/pipeline.py`
- Modify: `web-backend/stock-lab/tests/test_ai_news_pipeline.py`

- [ ] **Step 1: Update the tests (drive a failure)**

Replace `test_refresh_daily_happy_path` in `tests/test_ai_news_pipeline.py` with:
```python
@pytest.mark.asyncio
async def test_refresh_daily_happy_path(conn):
    """3-ticker mini integration — articles_source mock + analyzer mock.

    Assumes one mapped article per ticker.
    """
    asof = dt.date(2026, 5, 13)

    fake_articles_by_ticker = {
        "005930": [{"title": "삼성 뉴스", "summary": "", "press": "", "pub_date": ""}],
        "000660": [{"title": "SK 뉴스", "summary": "", "press": "", "pub_date": ""}],
        "373220": [{"title": "LG 뉴스", "summary": "", "press": "", "pub_date": ""}],
    }
    fake_stats = {"total_articles": 3, "matched_pairs": 3, "hit_tickers": 3}

    scores_by_ticker = {
        "005930": 7.5, "000660": 4.0, "373220": -6.0,
    }

    async def fake_score(llm, ticker, news, *, name=None, model="m"):
        return {
            "ticker": ticker, "score_raw": scores_by_ticker[ticker],
            "reason": f"r{ticker}", "news_count": 1,
            "tokens_input": 100, "tokens_output": 20, "model": model,
        }

    with patch.object(pipeline, "articles_source") as mas, \
         patch.object(pipeline, "_analyzer") as ma, \
         patch.object(pipeline, "_make_llm") as ml:
        mas.gather_articles_for_tickers = MagicMock(
            return_value=(fake_articles_by_ticker, fake_stats)
        )
        ma.score_sentiment = fake_score
        ml.return_value.__aenter__.return_value = AsyncMock()
        ml.return_value.__aexit__.return_value = None
        result = await pipeline.refresh_daily(conn, asof, concurrency=3)

    assert result["asof"] == "2026-05-13"
    assert result["updated"] == 3
    assert result["failures"] == []
    assert result["top_pos"][0]["ticker"] == "005930"
    assert result["top_neg"][0]["ticker"] == "373220"
    assert result["mapping"] == fake_stats

    rows = conn.execute("SELECT ticker, score_raw, source FROM news_sentiment "
                        "WHERE date=?", ("2026-05-13",)).fetchall()
    assert len(rows) == 3
    assert all(r["source"] == "articles" for r in rows)


@pytest.mark.asyncio
async def test_refresh_daily_no_match_ticker_skipped(conn):
    """Tickers with zero mappings skip the LLM call and create no news_sentiment row."""
    asof = dt.date(2026, 5, 13)

    fake_articles_by_ticker = {
        "005930": [{"title": "삼성", "summary": "", "press": "", "pub_date": ""}],
        "000660": [],  # no mapping
        "373220": [],  # no mapping
    }
    fake_stats = {"total_articles": 1, 
"matched_pairs": 1, "hit_tickers": 1}

    async def fake_score(llm, ticker, news, *, name=None, model="m"):
        return {
            "ticker": ticker, "score_raw": 5.0, "reason": "r",
            "news_count": 1, "tokens_input": 100, "tokens_output": 20,
            "model": model,
        }

    with patch.object(pipeline, "articles_source") as mas, \
         patch.object(pipeline, "_analyzer") as ma, \
         patch.object(pipeline, "_make_llm") as ml:
        mas.gather_articles_for_tickers = MagicMock(
            return_value=(fake_articles_by_ticker, fake_stats)
        )
        ma.score_sentiment = fake_score
        ml.return_value.__aenter__.return_value = AsyncMock()
        ml.return_value.__aexit__.return_value = None
        result = await pipeline.refresh_daily(conn, asof, concurrency=3)

    assert result["updated"] == 1
    rows = conn.execute("SELECT ticker FROM news_sentiment "
                        "WHERE date=?", ("2026-05-13",)).fetchall()
    assert {r["ticker"] for r in rows} == {"005930"}
```

The existing `test_refresh_daily_failures_isolated` needs articles_source mapping data added:
```python
@pytest.mark.asyncio
async def test_refresh_daily_failures_isolated(conn):
    asof = dt.date(2026, 5, 13)

    fake_articles_by_ticker = {
        "005930": [{"title": "h", "summary": "", "press": "", "pub_date": ""}],
        "000660": [{"title": "h", "summary": "", "press": "", "pub_date": ""}],
        "373220": [{"title": "h", "summary": "", "press": "", "pub_date": ""}],
    }
    fake_stats = {"total_articles": 3, "matched_pairs": 3, "hit_tickers": 3}

    async def fake_score(llm, ticker, news, *, name=None, model="m"):
        if ticker == "000660":
            raise RuntimeError("llm exploded")
        return {
            "ticker": ticker, "score_raw": 5.0, "reason": "r", "news_count": 1,
            "tokens_input": 100, "tokens_output": 20, "model": model,
        }

    with patch.object(pipeline, "articles_source") as mas, \
         patch.object(pipeline, "_analyzer") as ma, \
         patch.object(pipeline, "_make_llm") as ml:
        mas.gather_articles_for_tickers = MagicMock(
            return_value=(fake_articles_by_ticker, fake_stats)
        )
        ma.score_sentiment = 
fake_score
        ml.return_value.__aenter__.return_value = AsyncMock()
        ml.return_value.__aexit__.return_value = None
        result = await pipeline.refresh_daily(conn, asof, concurrency=3)

    assert result["updated"] == 2
    assert len(result["failures"]) == 1
```

Confirm `MagicMock` is added to the imports at the top:
```python
from unittest.mock import AsyncMock, MagicMock, patch
```

- [ ] **Step 2: Confirm the tests fail**

```bash
python -m pytest tests/test_ai_news_pipeline.py -v
```
Expected: FAIL — the pipeline does not use articles_source yet.

- [ ] **Step 3: Rework the body of `pipeline.py`**

Change the following in `pipeline.py`:

(1) Add articles_source to the top-level imports:
```python
from . import scraper as _scraper  # legacy, kept for backward import
from . import analyzer as _analyzer
from . import articles_source  # new
```

(2) Remove (or deprecate) the `_make_http()` function and the `DEFAULT_RATE_LIMIT_SEC` constant; they are no longer used.

(3) Replace `_process_one()` with:
```python
async def _process_one(
    ticker: str, name: str, articles: List[Dict[str, Any]],
    sem: asyncio.Semaphore, llm, model: str,
) -> Dict[str, Any]:
    async with sem:
        return await _analyzer.score_sentiment(
            llm, ticker, articles, name=name, model=model,
        )
```

(4) Replace the `refresh_daily()` signature and body:
```python
async def refresh_daily(
    conn: sqlite3.Connection,
    asof: dt.date,
    *,
    top_n: int = DEFAULT_TOP_N,
    concurrency: int = DEFAULT_CONCURRENCY,
    max_news_per_ticker: int = DEFAULT_NEWS_PER_TICKER,
    window_days: int = 1,
    model: str = _analyzer.DEFAULT_MODEL,
) -> Dict[str, Any]:
    started = time.time()
    tickers = _top_market_cap_tickers(conn, n=top_n)
    name_map = {
        r[0]: r[1] for r in conn.execute(
            f"SELECT ticker, name FROM krx_master WHERE ticker IN "
            f"({','.join('?' 
* len(tickers))})", tickers,
        ).fetchall()
    } if tickers else {}

    articles_by_ticker, mapping_stats = articles_source.gather_articles_for_tickers(
        conn, tickers, asof,
        window_days=window_days,
        max_per_ticker=max_news_per_ticker,
    )

    sem = asyncio.Semaphore(concurrency)
    async with _make_llm() as llm:
        tasks = []
        for t in tickers:
            arts = articles_by_ticker.get(t, [])
            if not arts:
                continue  # zero mappings — no score produced
            tasks.append(_process_one(t, name_map.get(t, t), arts, sem, llm, model))
        raw_results = await asyncio.gather(*tasks, return_exceptions=True)

    successes: List[Dict[str, Any]] = []
    failures: List[str] = []
    for r in raw_results:
        if isinstance(r, BaseException):
            failures.append(repr(r))
        elif isinstance(r, dict):
            successes.append(r)

    if successes:
        _upsert_news_sentiment(conn, asof, successes, source="articles")

    top_pos = sorted(successes, key=lambda r: -r["score_raw"])[:5]
    top_neg = sorted(successes, key=lambda r: r["score_raw"])[:5]

    return {
        "asof": asof.isoformat(),
        "updated": len(successes),
        "failures": failures,
        "duration_sec": round(time.time() - started, 2),
        "tokens_input": sum(r["tokens_input"] for r in successes),
        "tokens_output": sum(r["tokens_output"] for r in successes),
        "top_pos": top_pos,
        "top_neg": top_neg,
        "model": model,
        "mapping": mapping_stats,
    }
```

(5) Add a `source` parameter to `_upsert_news_sentiment()` and include the column in the INSERT:
```python
def _upsert_news_sentiment(
    conn: sqlite3.Connection, asof: dt.date,
    rows: List[Dict[str, Any]], *, source: str = "articles",
) -> None:
    iso = asof.isoformat()
    data = [
        (
            r["ticker"], iso, r["score_raw"], r["reason"], r["news_count"],
            r["tokens_input"], r["tokens_output"], r["model"], source,
        )
        for r in rows
    ]
    conn.executemany(
        """INSERT INTO news_sentiment
               (ticker, date, score_raw, reason, news_count,
                tokens_input, tokens_output, model, source)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
           ON CONFLICT(ticker, date) DO UPDATE SET
               score_raw=excluded.score_raw,
               reason=excluded.reason,
               news_count=excluded.news_count,
               tokens_input=excluded.tokens_input,
               tokens_output=excluded.tokens_output,
               model=excluded.model,
               source=excluded.source
        """,
        data,
    )
    conn.commit()
```

- [ ] **Step 4: Confirm the tests pass**

```bash
python -m pytest tests/test_ai_news_pipeline.py -v
```
Expected: PASS — `test_refresh_daily_happy_path`, `test_refresh_daily_failures_isolated`, `test_refresh_daily_no_match_ticker_skipped`, and `test_top_market_cap_tickers` all pass (4 tests).

- [ ] **Step 5: Commit**

```bash
git add app/screener/ai_news/pipeline.py tests/test_ai_news_pipeline.py
git commit -m "feat(ai_news): pipeline uses articles_source (replaces Naver scraper)"
```

---

### Task 5: `telegram.py` — add the mapping line

**Files:**
- Modify: `web-backend/stock-lab/app/screener/ai_news/telegram.py`
- Modify: `web-backend/stock-lab/tests/test_ai_news_telegram.py`

- [ ] **Step 1: Update the tests (drive a failure)**

Append new tests to the end of `tests/test_ai_news_telegram.py`:
```python
def test_build_message_includes_mapping_line():
    msg = tg.build_message(
        asof="2026-05-14",
        top_pos=[_row("005930", 8.5, "HBM 호재")],
        top_neg=[],
        tokens_input=1000, tokens_output=200,
        mapping={"total_articles": 35, "matched_pairs": 50, "hit_tickers": 42},
    )
    assert "매핑" in msg
    assert "42" in msg
    assert "50" in msg
    assert "35" in msg


def test_build_message_without_mapping_omits_line():
    msg = tg.build_message(
        asof="2026-05-14",
        top_pos=[],
        top_neg=[],
        tokens_input=1000, tokens_output=200,
    )
    assert "매핑" not in msg
```

- [ ] **Step 2: Confirm the tests fail**

```bash
python -m pytest tests/test_ai_news_telegram.py -v
```
Expected: FAIL — the `mapping` argument is not supported yet.
- [ ] **Step 3: Update `build_message`'s signature and footer in `telegram.py`**

```python
def build_message(
    *,
    asof: str,
    top_pos: List[Dict[str, Any]],
    top_neg: List[Dict[str, Any]],
    tokens_input: int,
    tokens_output: int,
    mapping: Dict[str, int] | None = None,
) -> str:
    lines: List[str] = [
        f"🌅 *AI 뉴스 분석* \\({_escape(asof)} 08:00\\)",
        "",
        "📈 *호재 Top 5*",
    ]
    if top_pos:
        for i, r in enumerate(top_pos, 1):
            lines.append(_row_line(i, r))
    else:
        lines.append(_escape("- (없음)"))

    lines += ["", "📉 *악재 Top 5*"]
    if top_neg:
        for i, r in enumerate(top_neg, 1):
            lines.append(_row_line(i, r))
    else:
        lines.append(_escape("- (없음)"))

    cost = _cost_won(tokens_input, tokens_output)
    mapping_part = ""
    if mapping:
        mapping_part = (
            f"매핑 {mapping['hit_tickers']}/100 ticker "
            f"\\({mapping['matched_pairs']}쌍 / articles {mapping['total_articles']}건\\) · "
        )
    lines += [
        "",
        f"_분석: 시총 상위 100종목 · {mapping_part}"
        f"토큰 {tokens_input:,} in / {tokens_output:,} out · "
        f"약 ₩{cost:,}_",
    ]
    return "\n".join(lines)
```

- [ ] **Step 4: Confirm the tests pass**

```bash
python -m pytest tests/test_ai_news_telegram.py -v
```
Expected: PASS — all 6 tests (existing 4 + new 2).
- [ ] **Step 5: Commit**

```bash
git add app/screener/ai_news/telegram.py tests/test_ai_news_telegram.py
git commit -m "feat(ai_news): telegram includes article mapping stats line"
```

---

### Task 6: `router.py` — forward the mapping response field

**Files:**
- Modify: `web-backend/stock-lab/app/screener/router.py`
- Modify: `web-backend/stock-lab/tests/test_ai_news_router.py`

- [ ] **Step 1: Update the tests**

Extend `test_refresh_news_sentiment_weekday_invokes_pipeline` in `tests/test_ai_news_router.py`:
```python
def test_refresh_news_sentiment_weekday_invokes_pipeline():
    fake_summary = {
        "asof": "2026-05-13", "updated": 3, "failures": [],
        "duration_sec": 1.0, "tokens_input": 100, "tokens_output": 20,
        "top_pos": [], "top_neg": [], "model": "m",
        "mapping": {"total_articles": 5, "matched_pairs": 8, "hit_tickers": 3},
    }
    with patch("app.screener.router._ai_pipeline") as mp, \
         patch("app.screener.router._ai_telegram") as mt:
        mp.refresh_daily = AsyncMock(return_value=fake_summary)
        mt.build_message = lambda **kw: f"TEXT_with_mapping={kw.get('mapping')}"
        client = TestClient(app)
        resp = client.post(
            "/api/stock/screener/snapshot/refresh-news-sentiment?asof=2026-05-13"
        )
    assert resp.status_code == 200
    body = resp.json()
    assert body["mapping"]["hit_tickers"] == 3
    assert "mapping=" in body["telegram_text"]
```

- [ ] **Step 2: Confirm the test fails**

```bash
python -m pytest tests/test_ai_news_router.py -v
```
Expected: FAIL — `mapping` is not passed to the `build_message` call.
- [ ] **Step 3: Update the telegram_text build in `post_refresh_news_sentiment` in `router.py`**

Before:
```python
    summary["telegram_text"] = _ai_telegram.build_message(
        asof=summary["asof"],
        top_pos=summary["top_pos"], top_neg=summary["top_neg"],
        tokens_input=summary["tokens_input"],
        tokens_output=summary["tokens_output"],
    )
```

Replace with:
```python
    summary["telegram_text"] = _ai_telegram.build_message(
        asof=summary["asof"],
        top_pos=summary["top_pos"], top_neg=summary["top_neg"],
        tokens_input=summary["tokens_input"],
        tokens_output=summary["tokens_output"],
        mapping=summary.get("mapping"),
    )
```

- [ ] **Step 4: Confirm the tests pass**

```bash
python -m pytest tests/test_ai_news_router.py -v
```
Expected: PASS — 2 tests.

- [ ] **Step 5: Commit**

```bash
git add app/screener/router.py tests/test_ai_news_router.py
git commit -m "feat(ai_news): router forwards mapping stats to telegram"
```

---

### Task 7: Full regression + scraper deprecation note

**Files:**
- Modify: `web-backend/stock-lab/app/screener/ai_news/scraper.py` (comment only)

- [ ] **Step 1: Add a deprecation note at the top of scraper.py**

Replace the existing docstring with:
```python
"""[DEPRECATED] Naver Finance per-ticker news scraping.

As of ai_news Phase 1 (2026-05-14, `cdfa31b` spec) this module is no longer
used by the pipeline. The data source has moved to stock-lab's articles table
(`ai_news/articles_source.py`).

Removal timing: after the Phase 2 (DART) decision. Depending on whether the
node stays enabled after 4 weeks of accumulated IC validation, this module
will be either (a) deleted outright or (b) reused alongside DART as an
ensemble fallback.
"""
```

Keep every other line (the tests still import this module).

- [ ] **Step 2: Run the full stock-lab test suite**

```bash
cd C:\Users\jaeoh\Desktop\workspace\web-backend\stock-lab
python -m pytest --ignore=app/test_scraper.py -q
```
Expected: **82 tests passed**, including the 6 new and all updated tests (previous 76 + 6 articles_source tests, no other count changes).
- [ ] **Step 3: Commit**

```bash
git add app/screener/ai_news/scraper.py
git commit -m "docs(ai_news): mark scraper.py deprecated (Phase 1 transition)"
```

---

### Task 8: Production verification + deployment

**Files:** (execution only, manual checks)

- [ ] **Step 1: Push the backend**

```bash
cd C:\Users\jaeoh\Desktop\workspace\web-backend
git push origin main
```
On failure: ask the user to enter Gitea credentials.

- [ ] **Step 2: Confirm deployer pickup (~1 min)**

```bash
docker logs stock-lab --tail 20 2>&1 | grep -i "starting\|started"
docker logs agent-office --tail 20 2>&1 | grep -i "starting\|started"
```
Confirm a fresh startup timestamp for both containers.

- [ ] **Step 3: Confirm the production DB migration applied automatically**

```bash
docker exec stock-lab python -c "
import sqlite3
c = sqlite3.connect('/app/data/stock.db')
cols = [r[1] for r in c.execute('PRAGMA table_info(news_sentiment)').fetchall()]
print('news_sentiment columns:', cols)
print('has source:', 'source' in cols)
"
```
Expected: `has source: True`.

- [ ] **Step 4: Manual trigger**

```bash
curl -X POST "https://gahusb.synology.me/api/agent-office/command" \
  -H "Content-Type: application/json" \
  -d '{"agent":"stock","action":"run_ai_news"}'
```
After an `{"ok": true}` response, the Telegram message should arrive within 30-60 seconds.

- [ ] **Step 5: Verify the Telegram message**

Confirm the received message contains every one of these patterns:
- `🌅 AI 뉴스 분석 (YYYY-MM-DD 08:00)` header
- `📈 호재 Top 5` / `📉 악재 Top 5` sections
- company name + ticker format (e.g. `삼성전자 (005930)`)
- `매핑 N/100 ticker (M쌍 / articles K건)` line (new)
- token/cost line

Check that the mapping hit_tickers value is in a plausible range (e.g. 20-60).
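If the hit-rate line looks off, the same substring logic can be recomputed directly against a DB connection. A minimal sketch with stand-in tables and hypothetical seed data (a production check would instead connect to `/app/data/stock.db` and add the crawled_at window filter):

```python
import sqlite3

# Self-contained cross-check of the hit-rate number reported in Telegram.
# Stand-in tables with hypothetical rows; production uses the full schemas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE krx_master (ticker TEXT, name TEXT)")
conn.execute("CREATE TABLE articles (title TEXT, summary TEXT)")
conn.executemany("INSERT INTO krx_master VALUES (?, ?)",
                 [("005930", "삼성전자"), ("000660", "SK하이닉스")])
conn.execute("INSERT INTO articles VALUES ('삼성전자 실적 발표', '')")

names = dict(conn.execute("SELECT ticker, name FROM krx_master"))
texts = [t + " " + (s or "")
         for t, s in conn.execute("SELECT title, summary FROM articles")]
# A ticker counts as "hit" if its name appears in any article text
hit_tickers = sum(1 for name in names.values() if any(name in x for x in texts))
print(f"hit {hit_tickers}/{len(names)} tickers")  # → hit 1/2 tickers
```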
- [ ] **Step 6: DB verification**

```bash
docker exec stock-lab python -c "
import sqlite3
c = sqlite3.connect('/app/data/stock.db')
rows = c.execute('SELECT COUNT(*), SUM(news_count), SUM(tokens_input) FROM news_sentiment WHERE date = date(\"now\") AND source = \"articles\"').fetchone()
print('articles rows / total_news / tokens:', rows)
# All rows tagged source='articles' (note: the ALTER ... DEFAULT also backfills pre-migration rows)
total = c.execute('SELECT COUNT(*) FROM news_sentiment WHERE source = \"articles\"').fetchone()
print('all articles-source rows:', total[0])
"
```
Expected: `articles rows >= 10` (the number of mapping-hit tickers), `source='articles'`.

- [ ] **Step 7: Memory update**

Add this slice's commits to the hotfix history in `C:\Users\jaeoh\.claude\projects\C--Users-jaeoh-Desktop-workspace-web-ui\memory\project_stock_screener.md`:
- Phase 1 (`cdfa31b` spec + the task commit SHAs from this plan)
- Measured mapping hit-rate (e.g. "first run mapped 42/100, 35 articles, LLM cost ₩42")
- Next step: after 4 weeks, decide between Phase 2 (DART) and node removal based on measured IC

---

## Post-Completion Verification Checklist

When this plan is complete:
- [ ] The stock-lab `news_sentiment` table has a `source` column
- [ ] The production trigger creates `source='articles'` rows with `news_count > 0`
- [ ] The Telegram message shows the `매핑 N/100` line
- [ ] Zero external HTTP calls (Naver)
- [ ] The Telegram ₩ LLM-cost line is at or below the previous level (~₩60), i.e. roughly ₩40-80
- [ ] All 6 new + 4 updated unit tests pass, with no regressions
- [ ] The `news_sentiment.source` column is added idempotently (no re-add attempt on restart)
- [ ] Legacy `scraper.py` carries the deprecation note (code preserved)

## Follow-up Slices (after this plan)

Per spec §15:
- **Phase 1.5** — add an alias dict if the mapping hit-rate is < 30%
- **Phase 2** — add the DART OpenAPI if 4-week IC ≥ 0.05
- **Phase X** — deprecate the node if IC < 0.05
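The IC gate above refers to the information coefficient: the rank correlation between a day's sentiment scores and the tickers' subsequent returns, averaged over the measurement window. The spec does not pin down the exact formula; a dependency-free Spearman-style sketch (assuming no tied values) might look like:

```python
# Hypothetical IC sketch: Spearman rank correlation between sentiment scores
# and forward returns. Assumes no tied values; production code would likely
# use scipy.stats.spearmanr and average daily ICs over the 4-week window.
def _ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for rank, i in enumerate(order):
        ranks[i] = float(rank)
    return ranks

def spearman_ic(scores, fwd_returns):
    rs, rr = _ranks(scores), _ranks(fwd_returns)
    n = len(rs)
    mean = (n - 1) / 2.0
    cov = sum((a - mean) * (b - mean) for a, b in zip(rs, rr))
    var = sum((a - mean) ** 2 for a in rs)  # identical for both rank vectors
    return cov / var

# Perfectly aligned ranks give IC = 1.0; fully inverted ranks give -1.0.
ic = spearman_ic([7.5, 4.0, -6.0], [0.03, 0.01, -0.02])  # → 1.0
```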