Overview

Lillo's language model system provides a flexible and extensible architecture for integrating multiple AI providers. The system is built around a factory pattern that supports dynamic model selection, function calling, and per-chat model preferences.

Core Components

src/lib/services/
├── LLMFactory.ts           # Main factory implementation
├── ModelPreferencesService.ts  # Model selection system
└── AgentConfigService.ts   # Agent configuration
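
To give a sense of how the factory is consumed, here is a minimal sketch; the interface and method names (LLMProvider, getProvider, chat) are illustrative and may differ from the actual API:

// Hypothetical shape of the factory API; names are illustrative.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface LLMProvider {
  chat(messages: ChatMessage[]): Promise<string>;
}

// The factory maps a model identifier to a concrete provider instance.
declare const llmFactory: {
  getProvider(model: 'openai' | 'gemini' | 'grok' | 'deepseek'): LLMProvider;
};

const provider = llmFactory.getProvider('openai');
const reply = await provider.chat([{ role: 'user', content: 'Hello!' }]);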

Supported Models

OpenAI (Primary)

  • Model: gpt-4o

  • Full function calling support

  • Token usage tracking

  • Primary provider for complex tasks

  • Standardized tool integration

  • Error handling with retries

Gemini

  • Model: gemini-pro

  • Chat support without function calling

  • Role-based message history

  • Custom role mapping (assistant → model); see the sketch after this list

  • Efficient for basic interactions

  • Simplified error handling
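
The role mapping noted above can be sketched as follows; the message shape mirrors Gemini's { role, parts } chat history format, but the helper itself is illustrative:

// Gemini's chat history uses 'model' where OpenAI-style APIs use 'assistant'.
type OpenAIRole = 'user' | 'assistant';
type GeminiRole = 'user' | 'model';

function toGeminiRole(role: OpenAIRole): GeminiRole {
  return role === 'assistant' ? 'model' : 'user';
}

// Convert an OpenAI-style history into Gemini's { role, parts } shape.
const history = [
  { role: 'user' as const, content: 'Hi' },
  { role: 'assistant' as const, content: 'Hello!' },
].map((m) => ({ role: toGeminiRole(m.role), parts: [{ text: m.content }] }));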

Grok

  • Model: grok-2-latest

  • OpenAI-compatible API

  • Function calling support

  • Response normalization

  • Tool call format standardization

  • Custom base URL configuration
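
Because the API is OpenAI-compatible, the standard OpenAI SDK can be pointed at x.ai's endpoint. A sketch of the base URL configuration (the environment variable name is an assumption):

import OpenAI from 'openai';

// Reuse the OpenAI SDK against Grok's OpenAI-compatible endpoint.
const grok = new OpenAI({
  apiKey: process.env.XAI_API_KEY, // hypothetical env var name
  baseURL: 'https://api.x.ai/v1',
});

const completion = await grok.chat.completions.create({
  model: 'grok-2-latest',
  messages: [{ role: 'user', content: 'Hello!' }],
});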

DeepSeek

  • Model: deepseek-chat

  • Direct API integration

  • Function calling support

  • Custom message transformation

  • Role-based message mapping

  • Direct HTTP request handling
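
A sketch of what direct HTTP request handling can look like; the endpoint follows DeepSeek's OpenAI-style chat completions API, and the environment variable name is an assumption:

// Direct HTTP call without an SDK; works on any runtime with global fetch.
const response = await fetch('https://api.deepseek.com/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.DEEPSEEK_API_KEY}`, // hypothetical env var
  },
  body: JSON.stringify({
    model: 'deepseek-chat',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
if (!response.ok) {
  throw new Error(`DeepSeek request failed with status ${response.status}`);
}
const data = await response.json();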

Function Calling

The system supports a standardized set of tools across compatible providers:

Core Functions

  • generate_image: DALL-E image generation

  • get_weather: Location-based weather data

  • get_market_data: Cryptocurrency market information

  • get_time: Location-based time information
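
For illustration, here is how get_weather might be declared in the OpenAI tool-calling format; the actual parameter schemas used by Lillo may differ:

// Illustrative tool definition; schema details are an assumption.
const getWeatherTool = {
  type: 'function' as const,
  function: {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name, e.g. "Berlin"' },
      },
      required: ['location'],
    },
  },
};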

Implementation

Each provider that supports function calling implements these tools consistently, ensuring:

  • Standardized parameter schemas

  • Consistent error handling

  • Response normalization

  • Tool validation

  • Type safety

  • Required parameter validation
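
A minimal sketch of required-parameter validation as it might run before a tool is dispatched; the helper name and call site are hypothetical:

// Hypothetical guard that rejects tool calls missing required arguments.
function validateToolArgs(
  required: string[],
  args: Record<string, unknown>,
): void {
  const missing = required.filter((key) => args[key] === undefined);
  if (missing.length > 0) {
    throw new Error(`Missing required tool parameters: ${missing.join(', ')}`);
  }
}

validateToolArgs(['location'], { location: 'Berlin' }); // passes; an empty args object would throw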

Model Selection

The system supports dynamic model selection:

Per-Chat Preferences

  • Persistent storage in PostgreSQL (see the sketch after this list)

  • Runtime model switching

  • Default model fallback (OpenAI)

  • Preference management API

  • Concurrent access handling
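
A sketch of a per-chat lookup with the default fallback, assuming a pg connection pool and a hypothetical chat_model_preferences table:

import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from PG* env vars

// Hypothetical table: chat_model_preferences(chat_id, model).
async function getModelForChat(chatId: string): Promise<string> {
  const result = await pool.query(
    'SELECT model FROM chat_model_preferences WHERE chat_id = $1',
    [chatId],
  );
  // Fall back to the default provider when no preference is stored.
  return result.rows[0]?.model ?? 'openai';
}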

Selection Criteria

  • Task complexity requirements

  • Function calling needs

  • Response quality preferences

  • Rate limit considerations

  • Token usage optimization

Best Practices

Provider Selection

  • Use OpenAI for complex tasks requiring function calling

  • Use Gemini for basic chat interactions

  • Use Grok for OpenAI-compatible function calling

  • Use DeepSeek for specialized tasks

  • Consider rate limits and costs

Error Handling

  • Implement provider-specific error handling

  • Handle rate limits and quotas

  • Validate responses

  • Log errors appropriately

  • Implement retries where appropriate (sketched after this list)

  • Handle network failures
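
A generic retry helper with exponential backoff, as one sketch of the retry practice above; this is not the project's actual error-handling code:

// Retry an async call up to maxAttempts times with exponential backoff.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Delays grow as 500ms, 1s, 2s, ... between attempts.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}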

Performance

  • Reuse factory instance (singleton pattern; sketched after this list)

  • Monitor token usage

  • Implement caching where appropriate

  • Handle streaming responses efficiently

  • Optimize message history

  • Clean content processing
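
A minimal singleton sketch for reusing one factory instance; the real LLMFactory may expose a different API:

// One shared instance so provider clients and caches are reused.
class LLMFactory {
  private static instance: LLMFactory | null = null;

  private constructor() {}

  static getInstance(): LLMFactory {
    return (this.instance ??= new LLMFactory());
  }
}

const factory = LLMFactory.getInstance();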
