docs(insta-trends): implementation plan (10 TDD-grouped tasks)

trend_collector NAVER+Google+LLM classification, db migration + preferences CRUD, extract_with_weights, 4 endpoints + keywords source filter, InstaAgent collect_trends action + preferences-aware schedule, web-ui tabs + 3 panels, smoke matrix.
docs(insta-trends): self-review follow-up (LLM classification cache location, explicit meaning of the days query)
2026-05-16 17:39:19 +09:00 · 2026-05-16 17:31:22 +09:00 · 2026-05-16 17:30:45 +09:00 · 2026-05-16 02:11:38 +09:00 · 2026-05-16 01:58:53 +09:00 · 2026-05-16 01:51:45 +09:00
6 changed files with 2063 additions and 14 deletions
--- a/docs/superpowers/plans/2026-05-16-insta-trends-implementation.md
+++ b/docs/superpowers/plans/2026-05-16-insta-trends-implementation.md
--- a/docs/superpowers/specs/2026-05-16-insta-trends-design.md
+++ b/docs/superpowers/specs/2026-05-16-insta-trends-design.md
@@ -0,0 +1,247 @@
+# insta-lab Trends Tab Design: External Trend Collection + Category Weights
+
+Date: 2026-05-16
+Status: awaiting user approval → writing-plans phase next
+Related document: `2026-05-15-insta-agent-design.md` (insta-lab base design)
+
+---
+
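+## 1. Purpose and Background
+
+The first operating cycle of insta-lab (merged and deployed 2026-05-16) exposed two limitations:
+
+1. **Keyword discovery depends solely on user-supplied seed keywords.** It cannot catch topics that are actually trending right now. Five seeds are fixed per category, and only articles matching them are collected.
+2. **The system does not know the account's identity.** Even if the user decides "my Instagram account is economy-focused", the system treats every category equally.
+
+This spec addresses both limitations by:
+- adding external trend sources (NAVER popular + Google Trends) to strengthen the "discovery" stage
+- introducing an account category weight model so the automatic extraction algorithm reflects the account's identity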
+
+---
+
+## 2. Scope
+
+### Included
+
+- New backend module `trend_collector.py` (two sources: NAVER popular + Google Trends)
+- Change to an existing backend module: add weight-based `extract_with_weights()` to `keyword_extractor.py`
+- DB migration: add a `source` column to the `trending_keywords` table; new `account_preferences` table
+- 4 new API endpoints (`POST /trends/collect`, `GET /trends`, `GET/PUT /preferences`)
+- New daily 09:00 cron (trend collection); weights applied to the 09:30 cron
+- Frontend: tab navigation added to the InstaCards page, 3 new panels in the Trends tab
+
+### Excluded
+
+- External SaaS trend APIs other than pytrends (BuzzSumo, etc.)
+- Trend time-series charts
+- Automatic category learning (inferring preferences from the user's card creation history)
+- Trend alerts (push when a specific keyword appears)
+
+---
+
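+## 3. Data Sources
+
+### 3-1. NAVER Popular (source = 'naver_popular')
+- Reuses the NAVER news.json API. For each category, collects 30 results for the seed keywords with `sort=sim` (relevance ordering, used as a popularity signal)
+- Extracts frequent terms from the returned articles → maps them to categories (reusing `_count_nouns` + `_top_candidates` from the existing keyword_extractor)
+- Stores the top N in the `trending_keywords` table with source='naver_popular'
+
+### 3-2. Google Trends (source = 'google_trends')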
+- Library: `pytrends` (PyPI, Apache-2.0)
+- Calls `TrendReq(hl='ko-KR', tz=540).trending_searches(pn='south_korea')` → list of daily trending keywords
+- One Claude Haiku call per keyword classifies it into a category (`economy` / `psychology` / `celebrity` / user-added category / `uncategorized`)
+- To reduce LLM classification cost, results are cached for 1 day in a module-level `_category_cache: dict[str, tuple[str, float]]` in `trend_collector` (keyword → (category, expires_ts)), valid for the container's lifetime. A repeated request for the same keyword is a cache hit. The cache is not persisted (after a restart, the first call reclassifies via the LLM)
+- Stored in the `trending_keywords` table with source='google_trends', score = normalized traffic value
+
+### 3-3. Unified Storage
+
+One column is added to the existing `trending_keywords` schema:
+
+```sql
+ALTER TABLE trending_keywords ADD COLUMN source TEXT NOT NULL DEFAULT 'manual';
+-- all existing rows are marked 'manual' (extracted from seed keywords)
+-- new sources: 'naver_popular' | 'google_trends'
+```
+
+Additional index on `source`:
+```sql
+CREATE INDEX idx_tk_source ON trending_keywords(source, suggested_at DESC);
+```
+
+---
+
+## 4. Category Weight Model
+
+### 4-1. New table `account_preferences`
+
+```sql
+CREATE TABLE account_preferences (
+    category    TEXT PRIMARY KEY,
+    weight      REAL NOT NULL DEFAULT 1.0,
+    updated_at  TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now'))
+);
+```
+
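+- Initial seed: `economy=1.0`, `psychology=1.0`, `celebrity=1.0` (uniform)
+- Users may set any value from 0 to 10 (the UI exposes integer percentages 0–100, which the backend normalizes to 0–1)
+- No forced total. The algorithm normalizes to ratios internally
+- Categories can be added freely. However, seed keywords must also be defined in `prompt_templates.category_seeds` when adding one, otherwise the category is not reflected in automatic extraction (the UI explains this)
+
+### 4-2. Weight-Based Extraction Algorithm
+
+The existing `keyword_extractor.extract_for_category(category, limit)` is kept. New: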
+
+```python
+def extract_with_weights(weights: dict[str, float], total_limit: int = 15) -> list[Keyword]:
+    """Extract keywords, distributed across categories in proportion to their weights."""
+    if not weights or sum(weights.values()) == 0:
+        # fallback: uniform weights
+        cats = list(DEFAULT_CATEGORY_SEEDS.keys())
+        weights = {c: 1.0 for c in cats}
+
+    total_weight = sum(weights.values())
+    saved = []
+    for category, w in weights.items():
+        if w <= 0:
+            continue
+        per_cat = round(total_limit * w / total_weight)
+        if per_cat <= 0:
+            continue
+        saved.extend(extract_for_category(category, limit=per_cat))
+    return saved
+```
+
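+- `total_limit` defaults to 15 (the same total as the earlier 3 categories × 5 seeds setup)
+- Categories with weight=0 are skipped (for when you want to keep classification but exclude the category from automatic extraction)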
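One thing to watch in the `round()`-based split above: the per-category counts do not always add up to `total_limit` (three equal weights with `total_limit=10` give 3 + 3 + 3 = 9). If an exact total matters, for instance so the PreferenceImpactPanel preview always sums correctly, a largest-remainder allocation is one alternative. The sketch below is not the spec's algorithm, and `allocate_slots` is a hypothetical helper name.

```python
import math


def allocate_slots(weights: dict[str, float], total_limit: int) -> dict[str, int]:
    """Split total_limit across categories by weight using the largest-remainder method."""
    # Zero or negative weights are skipped, as in the spec's extract_with_weights().
    positive = {c: w for c, w in weights.items() if w > 0}
    if not positive:
        return {}
    total_weight = sum(positive.values())
    exact = {c: total_limit * w / total_weight for c, w in positive.items()}
    slots = {c: math.floor(x) for c, x in exact.items()}
    # Slots still unassigned after flooring (guarded against float edge cases).
    leftover = max(total_limit - sum(slots.values()), 0)
    # Hand leftover slots to the largest fractional parts; ties keep input order.
    by_remainder = sorted(positive, key=lambda c: exact[c] - slots[c], reverse=True)
    for c in by_remainder[:leftover]:
        slots[c] += 1
    return slots
```

Each resulting count could then be passed as `limit` to `extract_for_category`, keeping the rest of the flow unchanged.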
+
+---
+
+## 5. API (insta-lab)
+
+| Method | Path | Description |
+|--------|------|------|
+| POST | `/api/insta/trends/collect` | Collects from both sources (BackgroundTask) → `{task_id}` |
+| GET | `/api/insta/trends` | Lists trends. Query: `source` (`naver_popular`/`google_trends`/`all`), `category`, `days` (default 1, meaning `suggested_at >= now() - days*24h`). Sorted by `suggested_at DESC, score DESC` |
+| GET | `/api/insta/preferences` | Returns weights → `{categories: [{category, weight, updated_at}]}` |
+| PUT | `/api/insta/preferences` | body `{categories: {economy: 0.6, ...}}` → upsert |
+
+The existing `/api/insta/keywords` gains a source filter (`?source=manual`, etc.). When omitted, all sources are returned (default behavior preserved).
+
+---
+
+## 6. Scheduler Changes (agent-office InstaAgent)
+
+Existing:
+- 09:30: keyword extraction → Telegram push
+
+New:
+- **09:00: external trend collection** (NAVER popular + Google Trends), new cron `_run_insta_trends_collect()`
+- **09:30: keyword extraction** (existing + weights applied): InstaAgent calls `get_preferences()` and then uses `extract_with_weights()`
+
+Manual trigger: new `on_command("collect_trends", {})` action on InstaAgent. Invoked via the `/insta collect_trends` slash command in Telegram or from a button on the Insta page.
+
+---
+
+## 7. Frontend Changes (web-ui InstaCards.jsx)
+
+### 7-1. Tab Navigation
+
+The existing 5 panels are reorganized into two tabs:
+
+| Tab | Panels |
+|----|------|
+| **Cards** (default) | Trigger, Trending Keywords, Slates, SlateDetail, PromptEditor (unchanged) |
+| **Trends** (new) | AccountFocusPanel, ExternalTrendsPanel, PreferenceImpactPanel |
+
+Tab component: `<TabBar>` with plain buttons (`activeTab` state); deep links are supported through a `?tab=trends` URL query.
+
+### 7-2. AccountFocusPanel
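+- Per-category weight sliders (integer 0–100%) + a bar chart on the right (distribution visualization)
+- **+ Add category** button → modal for entering a category name + N seed keywords (seeds are merged into the category_seeds prompt template)
+- **Save** button → `PUT /preferences` (1-second debounce)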
+
+### 7-3. ExternalTrendsPanel
+- Top: **🔄 Manual collect** button + "Last collected: HH:MM" label + in-progress task box
+- Two columns (responsive → stacked vertically on mobile):
+  - **🔥 NAVER Popular**: grouped by category; each card shows keyword + score + category badge
+  - **🌐 Google Trends**: simple list; each card shows keyword + category badge + traffic
+- A **🎴** button on the right of each card → immediate `POST /slates` (existing flow)
+- Color mapping: economy=#0F62FE, psychology=#A66CFF, celebrity=#FF5C8A, custom=#6B7280
+
+### 7-4. PreferenceImpactPanel (small box)
+- "Preview of the next automatic extraction under the current weights: economy 3 / psychology 2 / celebrity 0"
+- Recomputed on the client immediately when a weight slider changes
+- Compact one-line display
+
+### 7-5. New API Helpers (src/api.js)
+
+```js
+export function getInstaTrends({ source, category, days = 1 } = {}) { ... }
+export function instaCollectTrends() { ... }
+export function getInstaPreferences() { ... }
+export function putInstaPreferences(categories) { ... }
+```
+
+---
+
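+## 8. Error Handling
+
+| Situation | Handling |
+|------|------|
+| pytrends rate limit / blocked | try/except → graceful degrade to an empty result. NAVER popular is still collected normally |
+| LLM classification failure | Falls back to the `uncategorized` category; the user can reclassify manually in the UI |
+| Weights sum to 0 | Falls back to uniform weights (1/N) and logs a warning |
+| Category added without seeds | Naturally skipped in automatic extraction (NAVER search needs seeds); the UI warns "seed keywords required" |
+| Google Trends has no Korea region | `hl='ko-KR'` + `pn='south_korea'` are set explicitly. Empty result on failure |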
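The "graceful degrade" rows in the table above amount to isolating each source so that one failure cannot block the other. A minimal sketch, assuming each source is exposed as a zero-argument callable returning a list of rows (the `collect_all` name and the callable shape are illustrative, not taken from the spec):

```python
import logging
from typing import Callable

logger = logging.getLogger("trend_collector")


def collect_all(sources: dict[str, Callable[[], list[dict]]]) -> dict[str, list[dict]]:
    """Run every source fetcher; a failing source yields an empty list instead of raising."""
    results: dict[str, list[dict]] = {}
    for name, fetch in sources.items():
        try:
            results[name] = fetch()
        except Exception:
            # e.g. pytrends rate limit or block: log it and keep the other sources.
            logger.warning("trend source %s failed; using empty result", name, exc_info=True)
            results[name] = []
    return results
```

With this shape, a pytrends failure still leaves the NAVER popular rows available for storage.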
+
+---
+
+## 9. Tests
+
+### insta-lab pytest
+- `test_trend_collector.py` (4): `fetch_naver_popular` mocked, `fetch_google_trends` with pytrends mocked, category mapping, cache hit
+- `test_extract_with_weights.py` (3): uniform weights, one weight set to 0, fallback on empty weights
+- `test_preferences_crud.py` (2): GET defaults, PUT upsert
+- `test_main_trends.py` (3): integration of the 4 new endpoints
+
+### agent-office pytest
+- `test_insta_agent_trends.py` (2): `on_schedule_trends` mocked, weight-applied extract
+
+---
+
+## 10. Migration Procedure
+
+1. Add `ALTER TABLE trending_keywords ADD COLUMN source ...` to `db.init_db()`; check whether the column exists with `PRAGMA table_info` so it runs idempotently
+2. Create the new `account_preferences` table
+3. Initial seed: existing categories economy/psychology/celebrity all at weight=1.0
+4. Existing `trending_keywords` rows automatically get source='manual' (column DEFAULT)
+5. Add `pytrends>=4.9` to `requirements.txt`
+6. After deployment, the user adjusts weights in the Trends tab (optional; uniform is the default behavior)
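Steps 1 to 4 above can be sketched with the standard `sqlite3` module as follows. This is an illustration under assumptions: the `migrate` function name is hypothetical, `trending_keywords` is assumed to exist already, and `IF NOT EXISTS` / `INSERT OR IGNORE` are used here to keep repeated runs safe even though the spec only spells out the `PRAGMA table_info` check.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Apply the trends migration; intended to be safe on every startup."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(trending_keywords)")}
    if "source" not in columns:
        conn.execute(
            "ALTER TABLE trending_keywords "
            "ADD COLUMN source TEXT NOT NULL DEFAULT 'manual'"
        )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_tk_source "
        "ON trending_keywords(source, suggested_at DESC)"
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS account_preferences (
            category   TEXT PRIMARY KEY,
            weight     REAL NOT NULL DEFAULT 1.0,
            updated_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now'))
        )"""
    )
    # INSERT OR IGNORE seeds weight=1.0 once and keeps user-adjusted weights later.
    conn.executemany(
        "INSERT OR IGNORE INTO account_preferences (category) VALUES (?)",
        [("economy",), ("psychology",), ("celebrity",)],
    )
    conn.commit()
```

Running it twice in a row leaves the schema and any saved weights untouched, which is the property `db.init_db()` needs.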
+
+---
+
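+## 11. Operational Impact
+
+| Item | Impact |
+|------|------|
+| Anthropic token cost | Negligible increase (~20 keywords per Google Trends run × 1 Haiku classification call ≈ 600 tokens/day) |
+| DB size | Negligible increase (~50 trend rows per day: 30 per category + 20 from Google) |
+| NAS CPU | Low increase (only pytrends + NAVER API calls; the LLM runs externally) |
+| Card creation flow | No change. Trends only strengthen the "discovery" stage |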
+
+---
+
+## 12. Definition of Done
+
+- [ ] `trending_keywords.source` column migration applied; all existing rows shown as 'manual'
+- [ ] `account_preferences` table created with the 3 initial categories at weight=1.0
+- [ ] Calling `POST /api/insta/trends/collect` collects both NAVER popular and Google Trends and stores them in the DB
+- [ ] `GET /api/insta/trends?source=google_trends` results are classified by category
+- [ ] After `PUT /api/insta/preferences`, the 09:30 cron extracts in proportion to the weights
+- [ ] 09:00 cron registered; trends are collected automatically every day
+- [ ] Cards/Trends tab switching works on the Insta page
+- [ ] Weights can be changed and saved in the Trends tab's AccountFocusPanel
+- [ ] ExternalTrendsPanel shows NAVER popular + Google Trends at a glance, and the per-card creation trigger works
+- [ ] PreferenceImpactPanel preview updates
+- [ ] All insta-lab pytest tests pass (21 existing + 12 new = 33)
+- [ ] All agent-office pytest tests pass
--- a/insta-lab/Dockerfile
+++ b/insta-lab/Dockerfile
@@ -1,15 +1,23 @@
-FROM python:3.12-slim
+FROM python:3.12-slim-bookworm
 ENV PYTHONUNBUFFERED=1

 WORKDIR /app

+# Korean fonts + Chromium runtime deps (Debian 12 / bookworm)
+# Why `playwright install --with-deps` is not used: that command uses Ubuntu package
+# names and tries packages that do not exist on Debian (ttf-ubuntu-font-family,
+# ttf-unifont, etc.), so apt fails. Instead, install only the libraries Chromium actually needs.
 RUN apt-get update && apt-get install -y --no-install-recommends \
    fonts-noto-cjk fonts-noto-cjk-extra \
+    libnss3 libnspr4 libdbus-1-3 libatk1.0-0 libatk-bridge2.0-0 \
+    libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
+    libxfixes3 libxrandr2 libgbm1 libxshmfence1 libpango-1.0-0 \
+    libcairo2 libasound2 libatspi2.0-0 \
 && rm -rf /var/lib/apt/lists/*

 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
-RUN playwright install --with-deps chromium
+RUN playwright install chromium

 COPY . .

--- a/scripts/deploy-nas.sh
+++ b/scripts/deploy-nas.sh
@@ -2,7 +2,7 @@
 set -euo pipefail

 # ── Service list (managed in one place only) ──
-SERVICES="lotto travel-proxy deployer stock music-lab blog-lab realestate-lab agent-office personal packs-lab nginx scripts"
+SERVICES="lotto travel-proxy deployer stock music-lab insta-lab realestate-lab agent-office personal packs-lab nginx scripts"

 # 1. Auto-detect: running inside a Docker container?
 if [ -d "/repo" ] && [ -d "/runtime" ]; then
--- a/scripts/deploy.sh
+++ b/scripts/deploy.sh
@@ -7,13 +7,13 @@ flock -n 200 || { echo "Deploy already running, skipping"; exit 0; }

 # ── Service list (managed in one place only) ──
 # docker compose service names (deployer excluded: rebuilding itself would abort the script)
-BUILD_TARGETS="lotto travel-proxy stock music-lab blog-lab realestate-lab agent-office personal packs-lab frontend"
-# Container names (for orphan cleanup)
-CONTAINER_NAMES="lotto stock music-lab blog-lab realestate-lab agent-office personal packs-lab travel-proxy frontend"
+BUILD_TARGETS="lotto travel-proxy stock music-lab insta-lab realestate-lab agent-office personal packs-lab frontend"
+# Container names (for orphan cleanup; blog-lab is retired but kept in the cleanup list)
+CONTAINER_NAMES="lotto stock music-lab insta-lab blog-lab realestate-lab agent-office personal packs-lab travel-proxy frontend"
 # Health check targets
-HEALTH_ENDPOINTS="lotto stock travel-proxy music-lab blog-lab realestate-lab agent-office personal packs-lab"
+HEALTH_ENDPOINTS="lotto stock travel-proxy music-lab insta-lab realestate-lab agent-office personal packs-lab"
 # data directories (packs-lab uses a separate media/packs)
-DATA_DIRS="music stock blog realestate agent-office personal"
+DATA_DIRS="music stock insta realestate agent-office personal"

 # 1. Auto-detect: running inside a Docker container?
 if [ -d "/repo" ] && [ -d "/runtime" ]; then
@@ -96,13 +96,25 @@ docker compose up -d --build $BUILD_TARGETS
 docker exec frontend nginx -s reload 2>/dev/null || true

 # ── Post-deploy health check ──
-echo "Waiting for services to start..."
-sleep 5
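+# The Docker compose healthcheck block is already defined for every container, so
+# query each container's health state directly with `docker inspect`. This approach
+# (a) works the same inside the deployer container and on the host
+# (b) does not depend on hostname DNS resolution ('lotto' etc. do not resolve in a host shell)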
+echo "Waiting for services to become healthy..."

 HEALTH_OK=true
 for svc in $HEALTH_ENDPOINTS; do
-    if ! curl -sf --max-time 10 --retry 2 --retry-delay 3 "http://$svc:8000/health" > /dev/null 2>&1; then
-        echo "HEALTH_FAIL: http://$svc:8000/health"
+    health="starting"
+    # Wait up to 60 seconds (5s × 12) for the starting → healthy transition
+    for _ in $(seq 1 12); do
+        health=$(docker inspect --format='{{.State.Health.Status}}' "$svc" 2>/dev/null || echo "missing")
+        if [ "$health" = "healthy" ] || [ "$health" = "unhealthy" ] || [ "$health" = "missing" ]; then
+            break
+        fi
+        sleep 5
+    done
+    if [ "$health" != "healthy" ]; then
+        echo "HEALTH_FAIL: $svc (state=$health)"
        HEALTH_OK=false
    fi
 done
--- a/scripts/healthcheck.sh
+++ b/scripts/healthcheck.sh
@@ -44,8 +44,9 @@ check_url "Music Health" "http://localhost:18600/health"
 check_url "Music Providers" "http://localhost:18600/api/music/providers"

 echo ""
-echo "--- 4. Blog Lab ---"
-check_url "Blog Health" "http://localhost:18700/health"
+echo "--- 4. Insta Lab ---"
+check_url "Insta Health" "http://localhost:18700/health"
+check_url "Insta Status" "http://localhost:18700/api/insta/status"

 echo ""
 echo "--- 5. Realestate Lab ---"
Author	SHA1	Message	Date
gahusb	d6081ba2d3	docs(insta-trends): implementation plan (10 TDD-grouped tasks) trend_collector NAVER+Google+LLM classification, db migration + preferences CRUD, extract_with_weights, 4 endpoints + keywords source filter, InstaAgent collect_trends action + preferences-aware schedule, web-ui tabs + 3 panels, smoke matrix.	2026-05-16 17:39:19 +09:00
gahusb	10cb3ae1df	docs(insta-trends): self-review follow-up (LLM classification cache location, explicit meaning of the days query)	2026-05-16 17:31:22 +09:00
gahusb	e3348da642	docs(insta-trends): external trends + category weights design. Collection from two sources (NAVER popular + Google Trends), category weight model via account_preferences, weight-based keyword extraction algorithm, Insta page split into Cards/Trends tabs.	2026-05-16 17:30:45 +09:00
gahusb	088bbaa097	fix(deploy): use docker inspect for healthcheck (works on both host and container) The previous curl http://lotto:8000/health only worked inside the deployer container, where Docker DNS resolves 'lotto'. When run directly from a host shell with sudo bash, DNS resolution failed and every service was misreported as HEALTH_FAIL. Changed to query the already-defined compose healthcheck result directly via docker inspect. The starting state waits up to 60 seconds before the final verdict.	2026-05-16 02:11:38 +09:00
gahusb	be322557ee	fix(insta-lab): pin to bookworm + manual Chromium deps (drop --with-deps) When python:3.12-slim moved to trixie (Debian 13), Playwright 1.48's --with-deps tried ubuntu20.04 fallback packages such as ttf-ubuntu-font-family / ttf-unifont and apt failed → Docker build exit 100. Fix: pin python:3.12-slim-bookworm (Debian 12, officially supported by Playwright) + install Chromium runtime libraries directly via apt + remove --with-deps.	2026-05-16 01:58:53 +09:00
gahusb	70438caa1f	fix(scripts): blog-lab → insta-lab in deploy/healthcheck service lists The hardcoded service lists in the deploy scripts referenced blog-lab, so the first webhook deploy after the merge failed in both rsync (/repo/blog-lab missing) and docker compose (service undefined). Updated SERVICES/BUILD_TARGETS/HEALTH_ENDPOINTS/DATA_DIRS to insta-lab. CONTAINER_NAMES keeps blog-lab for orphan cleanup (so the next docker rm -f runs safely).	2026-05-16 01:51:45 +09:00
gahusb	e16029ebdb	Merge pull request 'feat/insta-agent' (#3) from feat/insta-agent into main Reviewed-on: #3	2026-05-16 01:43:21 +09:00