Overview
Lillo's language model system provides a flexible and extensible architecture for integrating multiple AI providers. The system is built around a factory pattern that supports dynamic model selection, function calling, and per-chat model preferences.
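A minimal sketch of the factory shape this describes follows; the names `LLMProvider`, `ChatMessage`, and `ModelFactory` are illustrative, not Lillo's actual types:

```typescript
// Illustrative sketch of the factory pattern described above; all
// names here are assumptions, not the actual implementation.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface LLMProvider {
  name: string;
  supportsFunctionCalling: boolean;
  chat(messages: ChatMessage[]): Promise<string>;
}

class ModelFactory {
  private providers = new Map<string, LLMProvider>();

  register(provider: LLMProvider): void {
    this.providers.set(provider.name, provider);
  }

  // Unknown names fall back to the default provider (OpenAI),
  // mirroring the fallback behavior described under Model Selection.
  get(name: string, fallback = "openai"): LLMProvider {
    const provider = this.providers.get(name) ?? this.providers.get(fallback);
    if (!provider) throw new Error(`No provider registered for ${name}`);
    return provider;
  }
}
```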
Core Components
Supported Models
OpenAI (Primary)
Model: gpt-4o
Full function calling support
Token usage tracking
Primary provider for complex tasks
Standardized tool integration
Error handling with retries
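As a sketch of the OpenAI integration under these assumptions (the official `openai` Node SDK, the `gpt-4o` model named above, and an illustrative wrapper function):

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Hedged example: chatWithOpenAI is an illustrative name, not Lillo's
// actual wrapper. Tools are passed in OpenAI's standard format.
async function chatWithOpenAI(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  tools: OpenAI.Chat.Completions.ChatCompletionTool[],
) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    tools,
  });
  // Token usage tracking, as noted above.
  console.log("total tokens:", response.usage?.total_tokens);
  // The returned message may carry tool_calls for the caller to dispatch.
  return response.choices[0].message;
}
```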
Gemini
Model: gemini-pro
Chat support without function calling
Role-based message history
Custom role mapping (assistant → model)
Efficient for basic interactions
Simplified error handling
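The assistant → model role mapping mentioned above can be sketched as a plain transform into the history shape Gemini's chat API expects (the helper name is illustrative):

```typescript
// Gemini's chat history uses "model" where OpenAI-style APIs use
// "assistant"; this converts an OpenAI-style transcript accordingly.
type OpenAIRole = "user" | "assistant";

interface GeminiContent {
  role: "user" | "model";
  parts: { text: string }[];
}

function toGeminiHistory(
  messages: { role: OpenAIRole; content: string }[],
): GeminiContent[] {
  return messages.map((m) => ({
    role: m.role === "assistant" ? "model" : "user",
    parts: [{ text: m.content }],
  }));
}
```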
Grok
Model: grok-2-latest
OpenAI-compatible API
Function calling support
Response normalization
Tool call format standardization
Custom base URL configuration
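Because Grok exposes an OpenAI-compatible API, the standard OpenAI client works once the base URL is overridden. A sketch, assuming xAI's public endpoint and an illustrative wrapper name:

```typescript
import OpenAI from "openai";

// Same client as the OpenAI provider, pointed at xAI's endpoint.
const grok = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
});

async function chatWithGrok(prompt: string): Promise<string> {
  const completion = await grok.chat.completions.create({
    model: "grok-2-latest",
    messages: [{ role: "user", content: prompt }],
  });
  return completion.choices[0].message.content ?? "";
}
```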
DeepSeek
Model: deepseek-chat
Direct API integration
Function calling support
Custom message transformation
Role-based message mapping
Direct HTTP request handling
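A sketch of the direct HTTP integration, assuming DeepSeek's public chat-completions endpoint and an OpenAI-compatible request body (the function name is illustrative):

```typescript
// Direct request handling without an SDK, as described above.
async function chatWithDeepSeek(
  messages: { role: string; content: string }[],
): Promise<string> {
  const res = await fetch("https://api.deepseek.com/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.DEEPSEEK_API_KEY}`,
    },
    body: JSON.stringify({ model: "deepseek-chat", messages }),
  });
  if (!res.ok) throw new Error(`DeepSeek request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```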
Function Calling
The system supports a standardized set of tools across compatible providers:
Core Functions
generate_image: DALL-E image generation
get_weather: Location-based weather data
get_market_data: Cryptocurrency market information
get_time: Location-based time information
Implementation
Each provider that supports function calling implements these tools consistently (see the tool-definition sketch after this list), ensuring:
Standardized parameter schemas
Consistent error handling
Response normalization
Tool validation
Type safety
Required parameter validation
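For illustration, here is what a standardized tool definition could look like in the OpenAI function-calling format for `get_weather`; the parameter names and descriptions are assumptions, not the exact schema Lillo ships:

```typescript
// Hypothetical schema for the get_weather tool; the "location"
// parameter is illustrative. "required" enforces parameter validation.
const getWeatherTool = {
  type: "function" as const,
  function: {
    name: "get_weather",
    description: "Location-based weather data",
    parameters: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "City or place name (illustrative parameter)",
        },
      },
      required: ["location"],
    },
  },
};
```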
Model Selection
The system supports dynamic model selection:
Per-Chat Preferences
Persistent storage in PostgreSQL
Runtime model switching
Default model fallback (OpenAI)
Preference management API
Concurrent access handling
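A hedged sketch of this preference store using node-postgres; the `chat_model_preferences` table and its columns are hypothetical, and the upsert keeps concurrent writes safe:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection details from PG* env vars

// Hypothetical table:
//   chat_model_preferences(chat_id TEXT PRIMARY KEY, model TEXT NOT NULL)
async function setChatModel(chatId: string, model: string): Promise<void> {
  await pool.query(
    `INSERT INTO chat_model_preferences (chat_id, model)
     VALUES ($1, $2)
     ON CONFLICT (chat_id) DO UPDATE SET model = EXCLUDED.model`,
    [chatId, model],
  );
}

async function getChatModel(chatId: string): Promise<string> {
  const { rows } = await pool.query(
    "SELECT model FROM chat_model_preferences WHERE chat_id = $1",
    [chatId],
  );
  // Fall back to the default provider when no preference is stored.
  return rows[0]?.model ?? "openai";
}
```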
Selection Criteria
Task complexity requirements
Function calling needs
Response quality preferences
Rate limit considerations
Token usage optimization
Best Practices
Provider Selection
Use OpenAI for complex tasks requiring function calling
Use Gemini for basic chat interactions
Use Grok for OpenAI-compatible function calling
Use DeepSeek for specialized tasks
Consider rate limits and costs
Error Handling
Implement provider-specific error handling
Handle rate limits and quotas
Validate responses
Log errors appropriately
Implement retries where appropriate
Handle network failures
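As a sketch of the retry guidance above, a generic exponential-backoff wrapper (the helper name and defaults are illustrative; in practice the catch block would inspect provider-specific errors such as HTTP 429 before retrying):

```typescript
// Retry a provider call with exponential backoff: 500ms, 1s, 2s, ...
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```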
Performance
Reuse factory instance (singleton pattern)
Monitor token usage
Implement caching where appropriate
Handle streaming responses efficiently
Optimize message history
Clean and normalize message content before sending it to providers
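The singleton reuse mentioned above can be sketched with a module-level instance (building on the illustrative `ModelFactory` from the Overview):

```typescript
// Placeholder standing in for the illustrative ModelFactory sketched
// in the Overview section.
class ModelFactory {}

let instance: ModelFactory | undefined;

// Lazily create, then reuse, a single factory so provider clients,
// connection pools, and rate-limit state are shared across calls.
export function getModelFactory(): ModelFactory {
  if (!instance) {
    instance = new ModelFactory();
  }
  return instance;
}
```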