v3.1: overbuy protection, ensemble learning, and KRX-calendar-based market-hours-only operation

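[Balance management]
- Track cumulative same-day purchases in the _today_buy_total instance variable (compensates for KIS not deducting purchases from the balance until T+2)
- Add MAX_BUY_PER_CYCLE and MAX_DAILY_BUY_RATIO settings
- Compute available_deposit = max_daily_buy - effective_today_buy

[Ensemble & position sizing]
- Wire in AdaptiveEnsemble for real (remove hard-coded weights)
- Compute position size with the Kelly Criterion (Half-Kelly)
- SignalWeights.normalize(): resolve bound violations with a water-filling algorithm
- _accuracy_weighted(): standardize on magnitude-weighted accuracy
- Merge ensemble_weights.json into ensemble_history.json

[LLM client]
- Add GeminiLLMClient (Gemini → Ollama fallback chain)
- Keep the throttle when the client is re-created after a worker restart, via the _class_last_call_ts class variable
- Detect a non-running Ollama early and print a clear error message

[KIS API]
- Apply timeout=Config.HTTP_TIMEOUT to every requests.get/post call
- Add a today_buy_amt field to get_balance()

[Market-hours-only operation]
- KRXCalendar: based on exchange_calendars, with hard-coded 2024-2026 holidays as a fallback
- EOD shutdown: at 15:35, save all state and shut the server down automatically
- Watchdog: block restarts after EOD using the .eod_date marker
- daily_launcher.py: runs daily at 08:30 and does not start the bot when it detects a market holiday
- Register WebAI_DailyLauncher in Windows Task Scheduler

[Telegram skill fixes]
- Set PYTHONIOENCODING=utf-8 in the subprocess environment (fixes cp949 emoji errors)
- /regime: implement IPC macro_indices parsing; remove the blocking input() call in --json mode
- /weights: update parsing for the ensemble_history.json format
- /model_health: fix the glob pattern to *_v3.pt
- /postmortem: output empty JSON when there are no trades, fixing the Telegram error
- /macro: show prev_close as a fallback when price=0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>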
2026-03-29 05:21:23 +09:00
parent 760d1906ed
commit 0aebca7ff0
17 changed files with 3816 additions and 200 deletions
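
The commit message describes capping purchases via `available_deposit = max_daily_buy - effective_today_buy`. The sketch below shows one way that formula could be evaluated. It is not taken from the changed files: the function name, the assumption that `max_daily_buy` is the deposit multiplied by `MAX_DAILY_BUY_RATIO`, the assumption that `effective_today_buy` is the larger of the API-reported `today_buy_amt` and the locally tracked `_today_buy_total`, and the clamp at zero are all illustrative guesses.

```python
def available_deposit(deposit: float,
                      max_daily_buy_ratio: float,
                      api_today_buy: float,
                      local_today_buy: float) -> float:
    """Hypothetical helper: remaining buy budget for today.

    Assumptions (not confirmed by the commit):
      - max_daily_buy = deposit * max_daily_buy_ratio
      - effective_today_buy = max(API figure, locally tracked figure),
        so an API value that lags behind does not free up budget
      - the result is clamped at zero
    """
    max_daily_buy = deposit * max_daily_buy_ratio
    effective_today_buy = max(api_today_buy, local_today_buy)
    return max(0.0, max_daily_buy - effective_today_buy)


if __name__ == "__main__":
    # API still reports 0 bought today, but the bot has tracked 2,000,000 locally.
    print(available_deposit(10_000_000, 0.25, 0, 2_000_000))
```

Taking the larger of the two figures is the conservative choice: if either source already counts a purchase, the budget shrinks.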
--- a/modules/services/llm_client.py
+++ b/modules/services/llm_client.py
@@ -0,0 +1,199 @@
+"""
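+Unified LLM client: Gemini 2.5 Flash (primary) + Ollama (fallback)
+
+Design principles:
+  - Keeps the same interface as OllamaManager.request_inference(prompt)
+    → minimal code changes in process.py and ai_council.py
+  - Falls back to local Ollama automatically when Gemini fails (network error, rate limit)
+  - Automatic throttling to stay within the 15 RPM limit
+  - No VRAM contention (external API calls do not interfere with LSTM training)
+
+Rate limit (Gemini 2.5 Flash free tier):
+  - 15 RPM, 1,500 RPD (the bot needs ~240/day → about 6x headroom)
+
+No additional packages required:
+  - Calls the REST API directly via requests (already installed)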
+"""
+
+import time
+import requests
+import json
+
+from modules.config import Config
+
+
+class GeminiLLMClient:
+    """
+    Gemini API client
+
+    Usage:
+        client = GeminiLLMClient()
+        result = client.request_inference(prompt)  # str | None
+    """
+
+    _GENERATE_URL = (
+        "https://generativelanguage.googleapis.com/v1beta/models"
+        "/{model}:generateContent?key={key}"
+    )
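+    # 15 RPM → minimum 4-second interval (plus a 0.1-second margin)
+    _MIN_INTERVAL = 4.1
+    # Class variable: the last-call timestamp survives re-instantiation within the same process
+    # (holds when the singleton is replaced in-process; a restarted worker process starts again at 0.0)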
+    _class_last_call_ts: float = 0.0
+
+    def __init__(self):
+        self.api_key = Config.GEMINI_API_KEY
+        self.model   = Config.GEMINI_MODEL
+        self._ollama = None          # Ollama fallback (lazy init)
+        self._use_gemini = bool(self.api_key)
+
+        if self._use_gemini:
+            print(f"✅ [LLMClient] Primary: Gemini {self.model}")
+        else:
+            print("⚠️  [LLMClient] GEMINI_API_KEY not set → Ollama-only mode")
+
+    # ── Internal helpers ─────────────────────────────────────────────────────
+
+    def _throttle(self):
+        """Respect the 15 RPM limit by sleeping until the minimum call interval has elapsed (class-shared timestamp)."""
+        elapsed = time.time() - GeminiLLMClient._class_last_call_ts
+        if elapsed < self._MIN_INTERVAL:
+            time.sleep(self._MIN_INTERVAL - elapsed)
+
+    def _call_gemini(self, prompt: str) -> str | None:
+        """
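+        Single call to the Gemini REST API.
+
+        Settings:
+          - system_instruction: forces JSON-only responses
+          - thinkingBudget=0: disables internal reasoning (~1.5 s latency, fewer tokens)
+          - maxOutputTokens=512: a 200-token cap was eaten by thinking and truncated replies, so leave headroom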
+        """
+        self._throttle()
+
+        url = self._GENERATE_URL.format(model=self.model, key=self.api_key)
+        payload = {
+            "system_instruction": {
+                "parts": [{"text": (
+                    "You are a Korean stock market analyst. "
+                    "Respond with valid JSON only. "
+                    "No markdown, no code blocks, no explanations."
+                )}]
+            },
+            "contents": [{"parts": [{"text": prompt}]}],
+            "generationConfig": {
+                "maxOutputTokens": 512,   # 200→512 (room for the actual response once thinking is off)
+                "temperature": 0.1,       # low temperature for near-deterministic output
+                "thinkingConfig": {"thinkingBudget": 0},  # internal reasoning off (faster, fewer tokens)
+            },
+        }
+
+        try:
+            resp = requests.post(url, json=payload, timeout=30)
+            GeminiLLMClient._class_last_call_ts = time.time()
+
+            # Rate limit exceeded
+            if resp.status_code == 429:
+                print("[LLMClient] Gemini rate limit (429) → falling back to Ollama")
+                return None
+
+            resp.raise_for_status()
+            data = resp.json()
+
+            # Skip thinking parts and join only the actual text parts
+            candidate = (data.get("candidates") or [{}])[0]  # also tolerates an empty list
+            parts = candidate.get("content", {}).get("parts", [])
+            text = "".join(
+                p.get("text", "") for p in parts
+                if "text" in p and not p.get("thought")
+            ).strip()
+
+            return text if text else None
+
+        except requests.exceptions.Timeout:
+            print("[LLMClient] Gemini timeout (30s) → falling back to Ollama")
+            return None
+        except Exception as e:
+            print(f"[LLMClient] Gemini error: {e} → falling back to Ollama")
+            return None
+
+    def _get_ollama(self):
+        """Ollama fallback instance (lazy init: loaded only when needed)"""
+        if self._ollama is None:
+            from modules.services.ollama import OllamaManager
+            self._ollama = OllamaManager()
+            # Check up front whether Ollama is running (early detection of WinError 10061)
+            try:
+                requests.get(
+                    f"{Config.OLLAMA_API_URL}/api/tags",
+                    timeout=3,
+                )
+            except Exception:
+                print(
+                    f"❌ [LLMClient] Ollama is not running (no response from {Config.OLLAMA_API_URL}). "
+                    "Start it with the `ollama serve` command."
+                )
+        return self._ollama
+
+    # ── Public interface ─────────────────────────────────────────────────────
+
+    def request_inference(self, prompt: str, context_data=None) -> str | None:
+        """
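+        LLM inference request with the same signature as OllamaManager.request_inference()
+
+        Order:
+          1) GEMINI_API_KEY set → call the Gemini API (context_data is not forwarded)
+          2) Gemini fails (error/timeout/rate limit) → fall back to local Ollama
+          3) GEMINI_API_KEY not set → use Ollama directly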
+        """
+        if self._use_gemini:
+            result = self._call_gemini(prompt)
+            if result is not None:
+                return result
+            # Gemini failed → fall back to Ollama
+            print("[LLMClient] Trying Ollama fallback...")
+
+        return self._get_ollama().request_inference(prompt, context_data)
+
+    # ── OllamaManager-compatible methods (used by ai_council, evaluator, etc.) ──
+
+    def check_vram(self) -> float:
+        """Return VRAM usage (reported by Ollama; not relevant to Gemini calls)"""
+        if self._ollama:
+            return self._ollama.check_vram()
+        return 0.0
+
+    def get_gpu_status(self) -> dict:
+        """Return GPU status (OllamaManager-compatible)"""
+        return self._get_ollama().get_gpu_status()
+
+    def unload_model(self):
+        """Unload the Ollama model (call before LSTM training; no-op if Ollama was never initialized)"""
+        if self._ollama:
+            try:
+                requests.post(
+                    f"{Config.OLLAMA_API_URL}/api/generate",
+                    json={"model": Config.OLLAMA_MODEL, "keep_alive": 0},
+                    timeout=5,
+                )
+            except Exception:
+                pass
+
+
+# ── Per-worker-process global singleton ──────────────────────────────────────
+
+_llm_client: GeminiLLMClient | None = None
+
+
+def get_llm_client() -> GeminiLLMClient:
+    """
+    Return the GeminiLLMClient singleton for the current worker process.
+
+    In process.py, use this function in place of the existing get_ollama():
+        ollama = get_llm_client()
+        result = ollama.request_inference(prompt)
+    """
+    global _llm_client
+    if _llm_client is None:
+        _llm_client = GeminiLLMClient()
+    return _llm_client
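
The minimum-interval throttling that `_throttle` performs can be studied in isolation. The sketch below is a standalone illustration, not code from this commit: `MinIntervalThrottle` and its injectable `clock` and `sleep` parameters are hypothetical names added so the behavior can be checked without real waiting. It also differs from the diff in two ways that should be treated as assumptions: it uses `time.monotonic` instead of `time.time`, and it records the timestamp when the wait ends rather than after the HTTP response arrives.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between consecutive calls."""

    def __init__(self,
                 min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call_ts: Optional[float] = None  # None until the first call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        waited = 0.0
        if self._last_call_ts is not None:
            remaining = self.min_interval - (self._clock() - self._last_call_ts)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the moment this call is allowed to proceed.
        self._last_call_ts = self._clock()
        return waited


if __name__ == "__main__":
    # 15 requests per minute → 4 seconds apart, plus a 0.1-second margin.
    throttle = MinIntervalThrottle(4.1)
    throttle.wait()          # first call passes immediately
    print(throttle.wait())   # second call sleeps for roughly 4.1 seconds
```

A monotonic clock avoids over- or under-sleeping when the system clock is adjusted, which may matter for a process that runs through a full trading day.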