Skip to main content

Prompt Svc

The Prompt Svc is a resilient AI orchestration service that provides a unified interface for interacting with Large Language Models (LLMs) and other AI systems through a queue-based architecture with real-time streaming capabilities.

This page provides a comprehensive overview of Prompt Svc. For detailed API information, refer to the Prompt Svc API documentation.

Architecture & Purpose

Prompt Svc serves as the AI interaction layer for 1Backend, providing:

  • Unified Interface: Single API for all AI model interactions (text, image, audio)
  • Queue Management: Resilient processing with automatic retries and exponential backoff
  • Real-Time Streaming: Live response streaming via Server-Sent Events (SSE)
  • Multi-Modal Support: Text-to-text, text-to-image, image-to-image, and more
  • Platform Abstraction: Support for different AI engines (LlamaCpp, Stable Diffusion)
  • Integration Layer: Seamless connection with Chat Svc and Model Svc

Key Features

  • Asynchronous Processing: Queue-based prompt handling with status tracking
  • Synchronous Mode: Blocking calls for scripting and simple integrations
  • Streaming Responses: Real-time output streaming for progressive results
  • Retry Logic: Automatic retry with exponential backoff for failed prompts
  • Template System: Flexible prompt templates for different model formats
  • Multi-Platform: Engine-agnostic and engine-specific parameter support

CLI Usage

Prompt Svc provides both synchronous and asynchronous interaction modes:

Text Generation (Synchronous)

# Simple text generation with default model
oo post /prompt-svc/prompt \
--prompt="Explain quantum computing in simple terms" \
--sync=true

# With specific model
oo post /prompt-svc/prompt \
--prompt="Write a Python function to calculate Fibonacci numbers" \
--modelId="huggingface/TheBloke/codellama-7b.Q4_K_M.gguf" \
--sync=true

# Using high-level parameters
oo post /prompt-svc/prompt \
--prompt="Hello, how are you?" \
--parameters.textToText.template="[INST] {prompt} [/INST]" \
--sync=true

# Using engine-specific parameters
oo post /prompt-svc/prompt \
--prompt="What is the meaning of life?" \
--engineParameters.llamaCpp.template="### HUMAN:\n{prompt}\n\n### RESPONSE:\n" \
--sync=true

Text Generation (Asynchronous with Streaming)

# Submit prompt to queue
oo post /prompt-svc/prompt \
--prompt="Write a detailed essay about artificial intelligence" \
--threadId="thread_12345" \
--sync=false

# Subscribe to streaming responses (in another terminal)
curl -N -H "Authorization: Bearer $TOKEN" \
"http://localhost:11337/prompt-svc/prompts/thread_12345/responses/subscribe"

# Or using Server-Sent Events in JavaScript
# const eventSource = new EventSource('/prompt-svc/prompts/thread_12345/responses/subscribe');

Image Generation

# Text-to-image with Stable Diffusion
oo post /prompt-svc/prompt \
--prompt="A serene mountain landscape at sunset, digital art" \
--parameters.textToImage.width=512 \
--parameters.textToImage.height=512 \
--parameters.textToImage.steps=20 \
--sync=true

# Using Stable Diffusion engine parameters
oo post /prompt-svc/prompt \
--prompt="A futuristic city with flying cars" \
--engineParameters.stableDiffusion.txt2Img.width=768 \
--engineParameters.stableDiffusion.txt2Img.height=768 \
--engineParameters.stableDiffusion.txt2Img.num_inference_steps=30 \
--sync=true

Advanced Configuration

# With retry configuration and thread management
oo post /prompt-svc/prompt \
--prompt="Analyze this business case and provide recommendations" \
--threadId="business_analysis_001" \
--maxRetries=5 \
--modelId="huggingface/TheBloke/mistral-7b-instruct-v0.2.Q4_K_M.gguf" \
--sync=false

# Custom prompt ID (for idempotency)
oo post /prompt-svc/prompt \
--id="prom_custom_12345" \
--prompt="Generate a summary of the latest AI research" \
--threadId="ai_research_summary" \
--sync=true

Prompt Management

# List active prompts
oo post /prompt-svc/prompts

# List prompts with specific status
oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "equals", "value": "running"}]'

# Remove a prompt from queue
oo delete /prompt-svc/prompt/prom_12345

# List prompts for specific thread
oo post /prompt-svc/prompts \
--query.filters='[{"field": "threadId", "operator": "equals", "value": "thread_12345"}]'

Prompt Types & Capabilities

Text Generation Types

# General text-to-text
Type: "Text-to-Text"
Use: General language tasks, conversations, analysis

# Question answering
Type: "Question Answering"
Use: Specific questions with factual answers

# Translation
Type: "Translation"
Use: Language translation tasks

# Summarization
Type: "Summarization"
Use: Text summarization and condensation

# Text generation
Type: "Text Generation"
Use: Creative writing, code generation

Image Generation Types

# Text-to-image
Type: "Text-to-Image"
Use: Generate images from text descriptions

# Image-to-image
Type: "Image-to-Image"
Use: Transform existing images based on prompts

# Unconditional image generation
Type: "Unconditional Image Generation"
Use: Generate random images without prompts

Multimodal Types

# Image-text-to-text
Type: "Image-Text-to-Text"
Use: Analyze images with text context

# Visual question answering
Type: "Visual Question Answering"
Use: Answer questions about images

# Document question answering
Type: "Document Question Answering"
Use: Extract information from document images

Queue Architecture & Processing

Queue Flow

Queue Management

# Monitor queue status
oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "in", "value": ["scheduled", "running"]}]'

# Check retry behavior
oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "equals", "value": "errored"}]'

# Performance monitoring
oo post /prompt-svc/prompts \
--query.orderBy='[{"field": "createdAt", "desc": true}]' \
--query.limit=10

Queue Status Values:

  • scheduled: Waiting in queue for processing
  • running: Currently being processed by AI engine
  • completed: Successfully finished
  • errored: Failed but will be retried
  • abandoned: Failed after max retries
  • canceled: Manually canceled

Retry Logic

Prompts that fail are automatically retried with exponential backoff:

# Configure retry behavior
oo post /prompt-svc/prompt \
--prompt="Complex analysis task" \
--maxRetries=5 \
--sync=false

# Default retry strategy:
# Attempt 1: Immediate
# Attempt 2: 2 seconds delay
# Attempt 3: 4 seconds delay
# Attempt 4: 8 seconds delay
# Attempt 5: 16 seconds delay

Streaming & Real-Time Responses

Server-Sent Events (SSE)

# Subscribe to thread streaming
curl -N -H "Authorization: Bearer $TOKEN" \
"http://localhost:11337/prompt-svc/prompts/thread_12345/responses/subscribe"

Stream Event Types:

// Progress chunk (partial response)
{
"text": "Quantum computing is a revolutionary",
"type": "progress"
}

// Done chunk (completion)
{
"text": " technology that uses quantum mechanics.",
"messageId": "msg_abc123",
"type": "done"
}

JavaScript Integration

// Real-time streaming in web applications
const eventSource = new EventSource(
'/prompt-svc/prompts/thread_12345/responses/subscribe',
{
headers: {
'Authorization': 'Bearer ' + token
}
}
);

eventSource.onmessage = function(event) {
const chunk = JSON.parse(event.data);

if (chunk.type === 'progress') {
// Append text to UI
appendToOutput(chunk.text);
} else if (chunk.type === 'done') {
// Response complete
finalizeOutput(chunk.messageId);
}
};

Stream Management

# Pre-subscribe to threads (before prompt submission)
curl -N -H "Authorization: Bearer $TOKEN" \
"http://localhost:11337/prompt-svc/prompts/future_thread/responses/subscribe" &

# Then submit prompt to that thread
oo post /prompt-svc/prompt \
--prompt="Generate a story" \
--threadId="future_thread" \
--sync=false

Parameter Systems

High-Level Parameters

Use these when you don't care about the specific AI engine:

# Text-to-text parameters
oo post /prompt-svc/prompt \
--prompt="Hello world" \
--parameters.textToText.template="[INST] {prompt} [/INST]"

# Text-to-image parameters
oo post /prompt-svc/prompt \
--prompt="A beautiful sunset" \
--parameters.textToImage.width=512 \
--parameters.textToImage.height=512 \
--parameters.textToImage.steps=20 \
--parameters.textToImage.guidanceScale=7.5

Engine-Specific Parameters

Use these for fine-tuned control over specific AI engines:

# LlamaCpp engine parameters
oo post /prompt-svc/prompt \
--prompt="What is AI?" \
--engineParameters.llamaCpp.template="### HUMAN:\n{prompt}\n\n### RESPONSE:\n"

# Stable Diffusion engine parameters
oo post /prompt-svc/prompt \
--prompt="A spaceship" \
--engineParameters.stableDiffusion.txt2Img.width=768 \
--engineParameters.stableDiffusion.txt2Img.height=768 \
--engineParameters.stableDiffusion.txt2Img.num_inference_steps=30 \
--engineParameters.stableDiffusion.txt2Img.guidance_scale=8.0 \
--engineParameters.stableDiffusion.txt2Img.negative_prompt="blurry, low quality"

Real-World Usage Examples

1. Interactive Chatbot

# Start chat session
THREAD_ID="chat_session_$(date +%s)"

# Set up streaming in background
curl -N -H "Authorization: Bearer $TOKEN" \
"http://localhost:11337/prompt-svc/prompts/$THREAD_ID/responses/subscribe" &

# Send messages
oo post /prompt-svc/prompt \
--prompt="Hello! I need help with Python programming." \
--threadId="$THREAD_ID" \
--sync=false

oo post /prompt-svc/prompt \
--prompt="How do I create a simple web server?" \
--threadId="$THREAD_ID" \
--sync=false

2. Code Generation Pipeline

# Use CodeLlama for programming tasks
CODE_MODEL="huggingface/TheBloke/codellama-7b.Q4_K_M.gguf"

# Generate function
oo post /prompt-svc/prompt \
--prompt="Write a Python function that calculates the factorial of a number" \
--modelId="$CODE_MODEL" \
--sync=true

# Generate tests
oo post /prompt-svc/prompt \
--prompt="Write unit tests for the factorial function above" \
--modelId="$CODE_MODEL" \
--threadId="code_generation_session" \
--sync=true

# Generate documentation
oo post /prompt-svc/prompt \
--prompt="Write docstring documentation for the factorial function" \
--modelId="$CODE_MODEL" \
--threadId="code_generation_session" \
--sync=true

3. Content Creation Workflow

# Research phase
oo post /prompt-svc/prompt \
--prompt="Research the latest trends in renewable energy technology" \
--threadId="content_creation_001" \
--sync=false

# Writing phase
oo post /prompt-svc/prompt \
--prompt="Write a 500-word article about solar panel efficiency improvements" \
--threadId="content_creation_001" \
--maxRetries=3 \
--sync=false

# Image generation for article
oo post /prompt-svc/prompt \
--prompt="Solar panels on a modern house roof, bright sunny day, professional photography" \
--parameters.textToImage.width=1024 \
--parameters.textToImage.height=768 \
--threadId="content_creation_001" \
--sync=false

4. Document Analysis System

# Analyze uploaded documents
ANALYSIS_THREAD="doc_analysis_$(date +%s)"

# Set up streaming for real-time results
curl -N -H "Authorization: Bearer $TOKEN" \
"http://localhost:11337/prompt-svc/prompts/$ANALYSIS_THREAD/responses/subscribe" > analysis_output.txt &

# Submit analysis prompts
oo post /prompt-svc/prompt \
--prompt="Summarize the key points in this financial report" \
--threadId="$ANALYSIS_THREAD" \
--sync=false

oo post /prompt-svc/prompt \
--prompt="Extract all financial figures and create a table" \
--threadId="$ANALYSIS_THREAD" \
--sync=false

oo post /prompt-svc/prompt \
--prompt="Identify potential risks mentioned in the document" \
--threadId="$ANALYSIS_THREAD" \
--sync=false

5. Creative AI Assistant

# Story generation with multiple prompts
STORY_THREAD="creative_story_$(date +%s)"

# Character development
oo post /prompt-svc/prompt \
--prompt="Create a detailed character profile for a space explorer" \
--threadId="$STORY_THREAD" \
--sync=false

# Plot outline
oo post /prompt-svc/prompt \
--prompt="Create a plot outline for a science fiction adventure" \
--threadId="$STORY_THREAD" \
--sync=false

# Generate artwork
oo post /prompt-svc/prompt \
--prompt="Space explorer in futuristic suit standing on alien planet, concept art style" \
--parameters.textToImage.width=768 \
--parameters.textToImage.height=1024 \
--threadId="$STORY_THREAD" \
--sync=false

6. Batch Processing System

# Process multiple prompts with queue management
BATCH_THREAD="batch_processing_$(date +%s)"

# Submit batch of prompts
prompts=(
"Analyze customer sentiment in this review: 'Great product, fast delivery'"
"Translate to Spanish: 'Welcome to our customer support'"
"Summarize: 'The quarterly earnings report shows...'"
"Generate email template for customer onboarding"
)

for i in "${!prompts[@]}"; do
oo post /prompt-svc/prompt \
--id="batch_item_$i" \
--prompt="${prompts[$i]}" \
--threadId="$BATCH_THREAD" \
--maxRetries=2 \
--sync=false

echo "Submitted batch item $i"
done

# Monitor batch progress
watch -n 5 "oo post /prompt-svc/prompts --query.filters='[{\"field\": \"threadId\", \"operator\": \"equals\", \"value\": \"$BATCH_THREAD\"}]' | jq '.prompts[] | {id, status}'"

Integration Patterns

Chat Svc Integration

Prompt Svc automatically integrates with Chat Svc:

# Create chat thread
CHAT_THREAD=$(oo post /chat-svc/thread \
--threadData.title="AI Assistant Chat" | jq -r '.thread.id')

# Send prompt (automatically creates chat messages)
oo post /prompt-svc/prompt \
--prompt="Hello! Can you help me learn Python?" \
--threadId="$CHAT_THREAD" \
--sync=false

# View chat history
oo post /chat-svc/thread/$CHAT_THREAD/messages

Model Svc Integration

Automatic model management and status checking:

# Check model status before prompting
oo get /model-svc/default-model/status

# Use specific model (Prompt Svc handles model communication)
oo post /prompt-svc/prompt \
--prompt="Generate code documentation" \
--modelId="huggingface/TheBloke/codellama-7b.Q4_K_M.gguf" \
--sync=true

# Fallback to default model if modelId not specified
oo post /prompt-svc/prompt \
--prompt="What's the weather like?" \
--sync=true

File Svc Integration

# Upload image for analysis
FILE_ID=$(curl -X PUT "http://localhost:11337/file-svc/upload" \
-H "Authorization: Bearer $TOKEN" \
-F "file=@./image.jpg" | jq -r '.upload.fileId')

# Analyze uploaded image (future feature)
oo post /prompt-svc/prompt \
--prompt="Describe what you see in this image" \
--fileIds='["'$FILE_ID'"]' \
--sync=true

Performance Optimization

Synchronous vs Asynchronous

# Use sync=true for:
# - Simple scripts
# - Testing and development
# - Short responses

oo post /prompt-svc/prompt \
--prompt="What is 2+2?" \
--sync=true

# Use sync=false for:
# - Long-running tasks
# - Web applications
# - Batch processing

oo post /prompt-svc/prompt \
--prompt="Write a detailed research paper on quantum computing" \
--threadId="research_paper_001" \
--sync=false

Queue Optimization

# Monitor queue depth
oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "equals", "value": "scheduled"}]' \
--query.count=true | jq '.count'

# Prioritize urgent prompts (submit to dedicated threads)
oo post /prompt-svc/prompt \
--prompt="URGENT: System security analysis needed" \
--threadId="priority_processing" \
--sync=false

# Batch similar prompts for efficiency
BATCH_THREAD="text_analysis_batch"
for text in "text1" "text2" "text3"; do
oo post /prompt-svc/prompt \
--prompt="Analyze sentiment: $text" \
--threadId="$BATCH_THREAD" \
--sync=false
done

Model Selection

# Use lightweight models for simple tasks
oo post /prompt-svc/prompt \
--prompt="Hello, how are you?" \
--modelId="huggingface/TheBloke/tinyllama-1.1b-chat-v1.0.Q4_K_S.gguf" \
--sync=true

# Use powerful models for complex tasks
oo post /prompt-svc/prompt \
--prompt="Analyze this complex business scenario and provide strategic recommendations" \
--modelId="huggingface/TheBloke/mistral-7b-instruct-v0.2.Q5_K_M.gguf" \
--sync=false

Monitoring & Observability

Queue Status Monitoring

# Real-time queue monitoring
monitor_queue() {
while true; do
echo "=== Queue Status $(date) ==="

# Count by status
for status in "scheduled" "running" "completed" "errored"; do
count=$(oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "equals", "value": "'$status'"}]' \
--query.count=true | jq '.count')
echo "$status: $count"
done

echo "---"
sleep 10
done
}

monitor_queue

Performance Analytics

# Response time analysis
oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "equals", "value": "completed"}]' \
--query.orderBy='[{"field": "createdAt", "desc": true}]' \
--query.limit=10 | jq '.prompts[] | {id, created: .createdAt, lastRun: .lastRun, runCount}'

# Error analysis
oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "in", "value": ["errored", "abandoned"]}]' | \
jq '.prompts[] | {id, error, runCount, status}'

# Thread activity
oo post /prompt-svc/prompts \
--query.orderBy='[{"field": "createdAt", "desc": true}]' \
--query.limit=20 | jq '.prompts | group_by(.threadId) | map({thread: .[0].threadId, count: length})'

Health Checking

# Test basic functionality
test_prompt_health() {
echo "Testing Prompt Svc health..."

# Submit test prompt
response=$(oo post /prompt-svc/prompt \
--prompt="Test prompt for health check" \
--sync=true)

if echo "$response" | jq -e '.prompt.id' > /dev/null; then
echo "✅ Prompt Svc is healthy"
else
echo "❌ Prompt Svc health check failed"
echo "$response"
fi
}

test_prompt_health

Troubleshooting

Common Issues

Prompts Stuck in Queue

# Check queue status
oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "equals", "value": "scheduled"}]'

# Check model status
oo get /model-svc/default-model/status

# Restart processing by canceling and resubmitting
oo delete /prompt-svc/prompt/STUCK_PROMPT_ID

Streaming Not Working

# Test SSE connection
curl -v -N -H "Authorization: Bearer $TOKEN" \
"http://localhost:11337/prompt-svc/prompts/test_thread/responses/subscribe"

# Check firewall/proxy settings
# Ensure Server-Sent Events are not blocked

High Retry Counts

# Identify problematic prompts
oo post /prompt-svc/prompts \
--query.filters='[{"field": "runCount", "operator": "gt", "value": 3}]'

# Check model errors
oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "equals", "value": "errored"}]' | \
jq '.prompts[] | {id, error, runCount}'

# Verify model is responding
curl http://localhost:8001/health

Memory/Performance Issues

# Monitor queue depth
QUEUE_SIZE=$(oo post /prompt-svc/prompts \
--query.filters='[{"field": "status", "operator": "in", "value": ["scheduled", "running"]}]' \
--query.count=true | jq '.count')

echo "Queue depth: $QUEUE_SIZE"

# Clear completed prompts (if needed)
# Note: This is manual - no automated cleanup yet

Debug Commands

# Detailed prompt inspection
debug_prompt() {
local prompt_id=$1
echo "=== Debug Prompt: $prompt_id ==="

oo post /prompt-svc/prompts \
--query.filters='[{"field": "id", "operator": "equals", "value": "'$prompt_id'"}]' | \
jq '.prompts[0] | {
id, status, prompt, threadId, modelId,
runCount, error, createdAt, lastRun
}'
}

# Usage
debug_prompt "prom_12345"

# Stream testing
test_streaming() {
local thread_id="test_stream_$(date +%s)"

echo "Testing streaming for thread: $thread_id"

# Start streaming in background
curl -N -H "Authorization: Bearer $TOKEN" \
"http://localhost:11337/prompt-svc/prompts/$thread_id/responses/subscribe" &

local curl_pid=$!

# Submit test prompt
oo post /prompt-svc/prompt \
--prompt="Count from 1 to 5" \
--threadId="$thread_id" \
--sync=false

# Wait and cleanup
sleep 10
kill $curl_pid 2>/dev/null
}

test_streaming

Template System

Prompt Templates

Different models require different prompt formats:

# Mistral format
oo post /prompt-svc/prompt \
--prompt="What is machine learning?" \
--parameters.textToText.template="[INST] {prompt} [/INST]"

# Llama2 format
oo post /prompt-svc/prompt \
--prompt="Explain neural networks" \
--parameters.textToText.template="### HUMAN:\n{prompt}\n\n### RESPONSE:\n"

# TinyLlama format
oo post /prompt-svc/prompt \
--prompt="Hello world" \
--parameters.textToText.template="<|system|>\nYou are a helpful assistant.</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"

# Auto-detection (uses model's default template)
oo post /prompt-svc/prompt \
--prompt="Default template test" \
--modelId="huggingface/TheBloke/mistral-7b-instruct-v0.2.Q4_K_M.gguf"

Template Variables

# Custom template with system message
oo post /prompt-svc/prompt \
--prompt="Write code comments" \
--parameters.textToText.template="<|system|>\nYou are a code documentation expert.</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"

# Multi-variable templates (future feature)
# template: "Context: {context}\nQuestion: {prompt}\nAnswer:"

API Reference Summary

EndpointMethodPurpose
/prompt-svc/promptPOSTSubmit prompt for processing
/prompt-svc/promptsPOSTList prompts with filtering
/prompt-svc/prompt/{promptId}DELETERemove prompt from queue
/prompt-svc/prompts/{threadId}/responses/subscribeGETSubscribe to streaming responses
/prompt-svc/typesPOSTGet type definitions (for API docs)

Permissions & Security

# Required permissions
prompt-svc:prompt:create # Submit prompts
prompt-svc:prompt:view # List and view prompts
prompt-svc:prompt:stream # Subscribe to streaming responses
prompt-svc:prompt:delete # Remove prompts from queue

# Privacy protection
# Users can only see their own prompts (prompt text hidden for others)
  • Model Svc: AI model management and status
  • Chat Svc: Conversation threading and message storage
  • File Svc: File attachments and image inputs (future)
  • Policy Svc: Rate limiting AI usage

Future Enhancements

Planned Features

  • Multi-Model Orchestration: Automatic model selection based on prompt type
  • Model Auto-Scaling: Start/stop models based on queue depth
  • File Input Support: Image/document analysis with file uploads
  • Prompt Chaining: Connect multiple prompts in workflows
  • Custom Templates: User-defined prompt templates

Integration Roadmap

  • Voice Integration: Audio-to-text and text-to-speech capabilities
  • Visual Processing: Advanced image analysis and generation
  • Workflow Engine: Complex multi-step AI workflows
  • A/B Testing: Compare different models/prompts for same task
  • Analytics Dashboard: Detailed usage and performance metrics

Prompt Svc provides the essential AI interaction layer for 1Backend, enabling everything from simple chatbots to complex AI workflows with real-time streaming and robust queue management.