Overview
Lillo's language model system provides a flexible and extensible architecture for integrating multiple AI providers. The system is built around a factory pattern that supports dynamic model selection, function calling, and per-chat model preferences.
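A minimal sketch of the factory shape this describes follows; the names `LLMProvider`, `ChatMessage`, and `ModelFactory` are illustrative, not Lillo's actual types:

```typescript
// Illustrative sketch of the factory pattern described above; all
// names here are assumptions, not the actual implementation.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface LLMProvider {
  name: string;
  supportsFunctionCalling: boolean;
  chat(messages: ChatMessage[]): Promise<string>;
}

class ModelFactory {
  private providers = new Map<string, LLMProvider>();

  register(provider: LLMProvider): void {
    this.providers.set(provider.name, provider);
  }

  // Unknown names fall back to the default provider (OpenAI),
  // mirroring the fallback behavior described under Model Selection.
  get(name: string, fallback = "openai"): LLMProvider {
    const provider = this.providers.get(name) ?? this.providers.get(fallback);
    if (!provider) throw new Error(`No provider registered for ${name}`);
    return provider;
  }
}
```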
Core Components
Supported Models
OpenAI (Primary)
Model: gpt-4o
Full function calling support
Token usage tracking
Primary provider for complex tasks
Standardized tool integration
Error handling with retries
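As a sketch of the OpenAI integration under these assumptions (the official `openai` Node SDK, the `gpt-4o` model named above, and an illustrative wrapper function):

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Hedged example: chatWithOpenAI is an illustrative name, not Lillo's
// actual wrapper. Tools are passed in OpenAI's standard format.
async function chatWithOpenAI(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  tools: OpenAI.Chat.Completions.ChatCompletionTool[],
) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    tools,
  });
  // Token usage tracking, as noted above.
  console.log("total tokens:", response.usage?.total_tokens);
  // The returned message may carry tool_calls for the caller to dispatch.
  return response.choices[0].message;
}
```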
Gemini
Model: gemini-pro
Chat support without function calling
Role-based message history
Custom role mapping (assistant → model)
Efficient for basic interactions
Simplified error handling
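The assistant → model role mapping mentioned above can be sketched as a plain transform into the history shape Gemini's chat API expects (the helper name is illustrative):

```typescript
// Gemini's chat history uses "model" where OpenAI-style APIs use
// "assistant"; this converts an OpenAI-style transcript accordingly.
type OpenAIRole = "user" | "assistant";

interface GeminiContent {
  role: "user" | "model";
  parts: { text: string }[];
}

function toGeminiHistory(
  messages: { role: OpenAIRole; content: string }[],
): GeminiContent[] {
  return messages.map((m) => ({
    role: m.role === "assistant" ? "model" : "user",
    parts: [{ text: m.content }],
  }));
}
```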
Grok
Model: grok-2-latest
OpenAI-compatible API
Function calling support
Response normalization
Tool call format standardization
Custom base URL configuration
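Because Grok exposes an OpenAI-compatible API, the standard OpenAI client works once the base URL is overridden. A sketch, assuming xAI's public endpoint and an illustrative wrapper name:

```typescript
import OpenAI from "openai";

// Same client as the OpenAI provider, pointed at xAI's endpoint.
const grok = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
});

async function chatWithGrok(prompt: string): Promise<string> {
  const completion = await grok.chat.completions.create({
    model: "grok-2-latest",
    messages: [{ role: "user", content: prompt }],
  });
  return completion.choices[0].message.content ?? "";
}
```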
DeepSeek
Model: deepseek-chat
Direct API integration
Function calling support
Custom message transformation
Role-based message mapping
Direct HTTP request handling
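A sketch of the direct HTTP integration, assuming DeepSeek's public chat-completions endpoint and an OpenAI-compatible request body (the function name is illustrative):

```typescript
// Direct request handling without an SDK, as described above.
async function chatWithDeepSeek(
  messages: { role: string; content: string }[],
): Promise<string> {
  const res = await fetch("https://api.deepseek.com/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.DEEPSEEK_API_KEY}`,
    },
    body: JSON.stringify({ model: "deepseek-chat", messages }),
  });
  if (!res.ok) throw new Error(`DeepSeek request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```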
Function Calling
The system supports a standardized set of tools across compatible providers:
Core Functions
generate_image: DALL-E image generation
get_weather: Location-based weather data
get_market_data: Cryptocurrency market information
get_time: Location-based time information
Implementation
Each provider that supports function calling implements these tools consistently (see the tool-definition sketch after this list), ensuring:
Standardized parameter schemas
Consistent error handling
Response normalization
Tool validation
Type safety
Required parameter validation
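For illustration, here is what a standardized tool definition could look like in the OpenAI function-calling format for `get_weather`; the parameter names and descriptions are assumptions, not the exact schema Lillo ships:

```typescript
// Hypothetical schema for the get_weather tool; the "location"
// parameter is illustrative. "required" enforces parameter validation.
const getWeatherTool = {
  type: "function" as const,
  function: {
    name: "get_weather",
    description: "Location-based weather data",
    parameters: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "City or place name (illustrative parameter)",
        },
      },
      required: ["location"],
    },
  },
};
```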
Model Selection
The system supports dynamic model selection:
Per-Chat Preferences
Persistent storage in PostgreSQL
Runtime model switching
Default model fallback (OpenAI)
Preference management API
Concurrent access handling
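A hedged sketch of this preference store using node-postgres; the `chat_model_preferences` table and its columns are hypothetical, and the upsert keeps concurrent writes safe:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection details from PG* env vars

// Hypothetical table:
//   chat_model_preferences(chat_id TEXT PRIMARY KEY, model TEXT NOT NULL)
async function setChatModel(chatId: string, model: string): Promise<void> {
  await pool.query(
    `INSERT INTO chat_model_preferences (chat_id, model)
     VALUES ($1, $2)
     ON CONFLICT (chat_id) DO UPDATE SET model = EXCLUDED.model`,
    [chatId, model],
  );
}

async function getChatModel(chatId: string): Promise<string> {
  const { rows } = await pool.query(
    "SELECT model FROM chat_model_preferences WHERE chat_id = $1",
    [chatId],
  );
  // Fall back to the default provider when no preference is stored.
  return rows[0]?.model ?? "openai";
}
```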
Selection Criteria
Task complexity requirements
Function calling needs
Response quality preferences
Rate limit considerations
Token usage optimization
Best Practices
Provider Selection
Use OpenAI for complex tasks requiring function calling
Use Gemini for basic chat interactions
Use Grok for OpenAI-compatible function calling
Use DeepSeek for specialized tasks
Consider rate limits and costs
Error Handling
Implement provider-specific error handling
Handle rate limits and quotas
Validate responses
Log errors appropriately
Implement retries where appropriate
Handle network failures
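As a sketch of the retry guidance above, a generic exponential-backoff wrapper (the helper name and defaults are illustrative; in practice the catch block would inspect provider-specific errors such as HTTP 429 before retrying):

```typescript
// Retry a provider call with exponential backoff: 500ms, 1s, 2s, ...
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```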
Performance
Reuse factory instance (singleton pattern)
Monitor token usage
Implement caching where appropriate
Handle streaming responses efficiently
Optimize message history
Clean and normalize message content before sending it to providers
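The singleton reuse mentioned above can be sketched with a module-level instance (building on the illustrative `ModelFactory` from the Overview):

```typescript
// Placeholder standing in for the illustrative ModelFactory sketched
// in the Overview section.
class ModelFactory {}

let instance: ModelFactory | undefined;

// Lazily create, then reuse, a single factory so provider clients,
// connection pools, and rate-limit state are shared across calls.
export function getModelFactory(): ModelFactory {
  if (!instance) {
    instance = new ModelFactory();
  }
  return instance;
}
```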