🔧 Technical Documentation - Expert/DevOps Level

Complete technical documentation for the Osfria OS-FR-IA Platform

An exhaustive guide for DevOps engineers and technicians: complete system configuration, advanced troubleshooting, developer APIs, and detailed operational procedures.

🔧 Expert/DevOps Edition

System configuration, troubleshooting, and maintenance

System installation and configuration

Complete step-by-step deployment procedures

Critical system prerequisites
🔧 Minimum hardware
SEFR-PLUS cluster - DevOps installation
Recommended physical installation
Rack configuration - Datacenter environment
# SEFR-PLUS Nodes (3x)
CPU: Intel i3-N305 (8 cores @ 3.8GHz)
RAM: 32GB LPDDR5 + 16GB swap
Storage: 1TB NVMe M.2 (min 250K IOPS)
Network: 2x 1GbE + 1x 10GbE SFP+
Power: 65W TDP max

# NVIDIA DGX Spark Nodes (3x)  
GPU: NVIDIA Blackwell (1000 TOPS FP4 sparsity)
CPU: 20-core ARM (10x Cortex-X925 + 10x Cortex-A725)
RAM: 128GB LPDDR5x unified memory
Storage: 1-4TB NVMe (self-encryption)
Network: 1x 10GbE + WiFi 7 + ConnectX-7 NIC
Dimensions: 150x150x50.5mm (compact desktop)
Power: ~400W max per unit
🌡️ Environmental infrastructure
# Power
- Minimum 3x dedicated 220V 16A outlets
- 1500VA UPS (minimum 10 min runtime)
- Mandatory surge protection
- Total consumption: ~1500W max (400W × 3 DGX + SEFR-PLUS cluster + safety margin)

# Cooling
- Ambient temperature: 18-24°C
- Relative humidity: 40-60%
- Standard ventilation (desktop form factor)
- Optimized thermal dissipation

# Network
- 10GbE switch (minimum 8 ports)
- Isolated production VLAN
- /27 IP range (30 addresses)
- Filtered internet access (ports 80, 443, 22)
⚠️ Critical attention points (quick pre-flight checks below)
  • Inadequate ventilation = GPU throttling
  • Unstable power supply = etcd corruption
  • Network latency >1.5ms = degraded performance
  • Temperature >85°C = emergency shutdown
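
A quick pre-flight sketch for the two measurable triggers above (NODE_IP is a placeholder; run nvidia-smi on each DGX Spark node):
# Inter-node latency (must stay well under 1.5ms)
ping -c 10 NODE_IP | tail -1

# GPU temperature (emergency shutdown threshold: 85°C)
nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader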
Automated installation procedure
📋 Cluster bootstrap steps
# 1. Network preparation
sudo systemctl enable --now systemd-networkd
sudo networkctl reload

# 2. SSH hardening
echo "PermitRootLogin no" | sudo tee -a /etc/ssh/sshd_config
echo "PasswordAuthentication no" | sudo tee -a /etc/ssh/sshd_config
sudo systemctl restart sshd

# 3. Install the k3s server (first SEFR-PLUS node)
curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" \
  INSTALL_K3S_EXEC="--cluster-init --disable traefik" sh -

# 4. Retrieve the join token
sudo cat /var/lib/rancher/k3s/server/node-token

# 5. Join the remaining SEFR-PLUS nodes as servers (3 are needed for HA etcd)
curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" \
  K3S_URL="https://MASTER_IP:6443" \
  K3S_TOKEN="NODE_TOKEN" \
  INSTALL_K3S_EXEC="server --disable traefik" sh -
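
To drive the cluster from an admin workstation rather than from the first node, copy the generated kubeconfig and repoint it at the master; a minimal sketch (MASTER_IP as above):
# On the master node: print the kubeconfig, save it as ~/.kube/config on the workstation
sudo cat /etc/rancher/k3s/k3s.yaml

# On the workstation: replace the loopback address, then test
sed -i 's/127.0.0.1/MASTER_IP/' ~/.kube/config
kubectl get nodes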
🤖 DGX Spark worker configuration
# 1. Install the NVIDIA Container Runtime
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
  | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-runtime

# 2. Configure the Docker daemon for DGX Spark
# Note: k3s ships its own embedded containerd and auto-detects the NVIDIA
# runtime; this daemon.json only applies if Docker itself is the runtime
sudo tee /etc/docker/daemon.json << EOF
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

# 3. Join the k3s cluster
curl -sfL https://get.k3s.io | K3S_URL="https://MASTER_IP:6443" \
  K3S_TOKEN="NODE_TOKEN" K3S_NODE_NAME="dgx-spark-$(hostname)" sh -
✅ Installation validation
# Check the nodes
kubectl get nodes -o wide

# Check available GPUs
kubectl describe node | grep nvidia.com/gpu
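
If the grep above returns nothing, the usual missing piece is the NVIDIA device plugin, which is what advertises nvidia.com/gpu capacity to the scheduler. A sketch (the pinned version is an assumption; check the upstream repository for the current release):
# Deploy the device plugin DaemonSet, then confirm it is running
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
kubectl -n kube-system get pods -l name=nvidia-device-plugin-ds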
VLAN configuration and network security
🔧 Managed switch configuration
# VLAN configuration on a Cisco/HP switch
vlan 100
  name PROD-CLUSTER
  exit

vlan 101  
  name MGMT-OOB
  exit

vlan 102
  name USER-ACCESS
  exit

# Trunk ports to the nodes
interface range GigabitEthernet1/0/1-6
  switchport mode trunk
  switchport trunk allowed vlan 100,101,102
  spanning-tree portfast trunk
  exit

# User access ports
interface range GigabitEthernet1/0/7-24
  switchport mode access
  switchport access vlan 102
  spanning-tree portfast
  exit
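
Before cabling the nodes, verify VLAN membership and trunk state with the standard IOS show commands:
show vlan brief
show interfaces trunk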
🛡️ iptables firewall rules
#!/bin/bash
# firewall.sh - system firewall rules

# Flush existing rules
iptables -F
iptables -X
iptables -Z

# Default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP  
iptables -P OUTPUT ACCEPT

# Loopback and established connections
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# SSH from the MGMT VLAN only
iptables -A INPUT -s 10.101.0.0/24 -p tcp --dport 22 -j ACCEPT

# HTTPS for the user interface
iptables -A INPUT -s 10.102.0.0/24 -p tcp --dport 443 -j ACCEPT

# Kubernetes cluster communication
iptables -A INPUT -s 10.100.0.0/24 -p tcp --dport 6443 -j ACCEPT
iptables -A INPUT -s 10.100.0.0/24 -p tcp --dport 2379:2380 -j ACCEPT

# Rate limiting SSH brute force
iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
  -m recent --set --name ssh --rsource
iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
  -m recent --update --seconds 60 --hitcount 4 \
  --name ssh --rsource -j DROP
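
These rules live only in memory; to survive a reboot they must be persisted. A sketch assuming the iptables-persistent package (Debian/Ubuntu):
# Install the persistence helper, then save the active ruleset
sudo apt-get install -y iptables-persistent
sudo iptables-save | sudo tee /etc/iptables/rules.v4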

Kubernetes services configuration

YAML manifests and detailed configurations

Longhorn - Distributed storage
📦 Longhorn installation
# Install via Helm
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultDataPath="/opt/longhorn" \
  --set defaultSettings.replicaReplenishmentWaitInterval=300 \
  --set defaultSettings.concurrentReplicaRebuildPerNodeLimit=2

# Verify the installation
kubectl -n longhorn-system get pods
kubectl -n longhorn-system get storageclass
⚙️ StorageClass configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "300"
  fromBackup: ""
  diskSelector: "ssd"
  nodeSelector: "storage-node"
  recurringJobSelector: '
    [
      {
        "name":"backup",
        "isGroup":false
      }
    ]'
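
Note that diskSelector and nodeSelector match Longhorn's own disk and node tags, not Kubernetes labels; volumes from this class stay unschedulable until the tags exist. A sketch of tagging through the Longhorn node CRD (node and disk names are placeholders):
kubectl -n longhorn-system edit nodes.longhorn.io NODE_NAME
# then set, for example:
#   spec:
#     tags: ["storage-node"]
#     disks:
#       default-disk-xxxx:
#         tags: ["ssd"]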
🔧 Maintenance commands
# Back up a volume
kubectl apply -f - << EOF
apiVersion: longhorn.io/v1beta1
kind: Backup
metadata:
  name: backup-$(date +%Y%m%d-%H%M%S)
  namespace: longhorn-system
spec:
  volumeName: pvc-volume-name
EOF

# Restore volume
longhornctl backup restore backup-name volume-name
vLLM - AI inference engine
🚀 vLLM Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
  namespace: osfria-ai
spec:
  replicas: 3
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      nodeSelector:
        nvidia.com/gpu: "true"
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args:
        - --model
        - /models/deepseek-r2-78b-int4
        - --tensor-parallel-size
        - "1"
        - --dtype
        - half
        - --max-model-len
        - "4096"
        - --gpu-memory-utilization
        - "0.85"
        - --host
        - "0.0.0.0"
        - --port
        - "8000"
        resources:
          requests:
            nvidia.com/gpu: 1
            memory: "16Gi"
            cpu: "4"
          limits:
            nvidia.com/gpu: 1
            memory: "32Gi"
            cpu: "8"
        volumeMounts:
        - name: model-storage
          mountPath: /models
          readOnly: true
        env:
        - name: CUDA_VISIBLE_DEVICES
          value: "0"
      volumes:
      - name: model-storage
        persistentVolumeClaim:
          claimName: model-storage-pvc
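
The Deployment mounts model-storage-pvc, which must exist first. A minimal sketch using the longhorn-fast StorageClass defined earlier (the 200Gi size is an assumption; size it to your model files). ReadWriteMany lets all three replicas mount the same read-only volume:
kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-storage-pvc
  namespace: osfria-ai
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn-fast
  resources:
    requests:
      storage: 200Gi
EOF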
🔄 Service and HPA
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-service
  namespace: osfria-ai
spec:
  selector:
    app: vllm-server
  ports:
  - port: 8000
    targetPort: 8000
    protocol: TCP
  type: ClusterIP

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vllm-hpa
  namespace: osfria-ai
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vllm-server
  minReplicas: 1
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
📊 vLLM monitoring
# Metrics endpoint
curl http://vllm-service:8000/metrics

# Health check
curl http://vllm-service:8000/health

# Model info
curl http://vllm-service:8000/v1/models
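
A quick end-to-end smoke test from outside the cluster via port-forward (the model name is the --model path from the Deployment above, which vLLM serves under that name by default):
kubectl -n osfria-ai port-forward svc/vllm-service 8000:8000 &
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/models/deepseek-r2-78b-int4", "prompt": "Bonjour", "max_tokens": 16}'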

Advanced troubleshooting and maintenance

Diagnosing and resolving critical incidents

K8s cluster diagnostics
🚨 Common incidents

# Diagnostics: node NotReady
kubectl get nodes -o wide
kubectl describe node NODE_NAME

# Check the kubelet (embedded in k3s)
sudo systemctl status k3s
sudo journalctl -u k3s -f

# Common causes:
# 1. Disk full
df -h
# 2. kubelet OOM
dmesg | grep -i "killed process"
# 3. Network
ping MASTER_IP
ss -tulpn | grep 6443

# Solutions
sudo systemctl restart k3s
sudo k3s-killall.sh && sudo systemctl start k3s

# Diagnostics: failing AI pods (CrashLoopBackOff, Pending)
kubectl describe pod POD_NAME -n osfria-ai
kubectl logs POD_NAME -n osfria-ai --previous
kubectl get events -n osfria-ai --sort-by='.lastTimestamp'

# Check the GPUs
kubectl exec -it POD_NAME -n osfria-ai -- nvidia-smi
kubectl get nodes -o json | jq '.items[].status.capacity."nvidia.com/gpu"'

# Common causes:
# 1. Insufficient GPU memory
kubectl top pods -n osfria-ai --containers
# 2. Corrupted model
kubectl exec -it POD_NAME -n osfria-ai -- ls -la /models/
# 3. TensorRT compilation failed
kubectl logs POD_NAME -n osfria-ai | grep -i "tensorrt"

# Solutions
kubectl delete pod POD_NAME -n osfria-ai  # Force restart
kubectl scale deployment vllm-server --replicas=0 -n osfria-ai
kubectl scale deployment vllm-server --replicas=1 -n osfria-ai

# Diagnostics: Longhorn storage
kubectl get volumes -n longhorn-system
kubectl get replicas -n longhorn-system
kubectl describe volume VOLUME_NAME -n longhorn-system

# Check disk health
smartctl -a /dev/nvme0n1
iostat -x 1 5

# Longhorn UI
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80
# Access: http://localhost:8080

# Common causes:
# 1. Disk SMART errors
smartctl -H /dev/nvme0n1
# 2. Degraded replica
kubectl get replicas -n longhorn-system | grep Degraded
# 3. Network partition
ping OTHER_NODES_IP

# Solutions
kubectl patch volume VOLUME_NAME -n longhorn-system \
  --type='merge' -p='{"spec":{"numberOfReplicas":2}}'

# Rebuild a replica
kubectl annotate replica REPLICA_NAME -n longhorn-system \
  longhorn.io/replica-auto-balance=enabled
🆘 Escalation procedure
  • Level 1: Service restart (5 min)
  • Level 2: Node restart (10 min)
  • Level 3: Cluster failover (20 min)
  • Level 4: Backup restore (60 min)
Automation scripts
⚡ Maintenance scripts
#!/bin/bash
# health-check.sh - Full platform health check

echo "=== Osfria Health Check $(date) ==="

# 1. Kubernetes cluster
echo "🔍 Kubernetes cluster:"
kubectl get nodes --no-headers | grep -v Ready | wc -l
kubectl get pods -A --field-selector=status.phase!=Running | grep -v Completed | wc -l

# 2. Critical services
echo "🔍 Critical services:"
kubectl get pods -n osfria-ai -l app=vllm-server --no-headers | grep -v Running | wc -l
kubectl get pods -n longhorn-system --no-headers | grep -v Running | wc -l

# 3. GPU performance
echo "🔍 GPU utilization:"
kubectl exec -n osfria-ai $(kubectl get pods -n osfria-ai -l app=vllm-server -o name | head -1) -- \
  nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits

# 4. Storage
echo "🔍 Storage usage:"
kubectl exec -n longhorn-system $(kubectl get pods -n longhorn-system -l app=longhorn-manager -o name | head -1) -- \
  longhornctl get volume | grep -v "100%" | wc -l

# 5. Network metrics
echo "🔍 Inter-node latency:"
kubectl get nodes -o wide | tail -n +2 | while read node; do
  ip=$(echo $node | awk '{print $6}')
  ping -c 3 $ip | tail -1 | awk '{print $4}' | cut -d'/' -f2
done

echo "=== Health Check Complete ==="
🔄 Automated backup script
#!/bin/bash
# backup-osfria.sh - Full automated backup

BACKUP_DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/backup/osfria-$BACKUP_DATE"

echo "🔄 Starting Osfria backup - $BACKUP_DATE"

# 1. Backup etcd (K8s state)
mkdir -p $BACKUP_DIR/etcd
ETCDCTL_API=3 etcdctl snapshot save $BACKUP_DIR/etcd/snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key

# 2. Backup Longhorn volumes
mkdir -p $BACKUP_DIR/volumes
kubectl get pv -o json > $BACKUP_DIR/volumes/persistent-volumes.json

# 3. Backup configurations
mkdir -p $BACKUP_DIR/config
kubectl get configmaps -A -o yaml > $BACKUP_DIR/config/configmaps.yaml
kubectl get secrets -A -o yaml > $BACKUP_DIR/config/secrets.yaml

# 4. Back up AI models
mkdir -p $BACKUP_DIR/models
rsync -av /opt/models/ $BACKUP_DIR/models/

# 5. Compress and store
tar -czf $BACKUP_DIR.tar.gz -C /backup osfria-$BACKUP_DATE
rm -rf $BACKUP_DIR

# 6. Retention (keep 30 days)
find /backup -name "osfria-*.tar.gz" -mtime +30 -delete

echo "✅ Backup terminé: $BACKUP_DIR.tar.gz"
⏰ Cron scheduling
# crontab -e
# Daily backup at 2 AM
0 2 * * * /opt/scripts/backup-osfria.sh >> /var/log/backup.log 2>&1

# Health check every 5 minutes
*/5 * * * * /opt/scripts/health-check.sh >> /var/log/health.log 2>&1

# Weekly log cleanup
0 3 * * 0 find /var/log -name "*.log" -mtime +7 -delete

Advanced Prometheus/Grafana monitoring

Complete metrics and alerting configuration

Advanced Prometheus configuration
📊 prometheus.yml file
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'osfria-edge'
    environment: 'production'

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  # Kubernetes API server
  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
    - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https

  # Node Exporter (system metrics)
  - job_name: 'node-exporter'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: node-exporter
    - source_labels: [__address__]
      regex: '(.*):.*'
      target_label: __address__
      replacement: '${1}:9100'

  # GPU Exporter (NVIDIA metrics)
  - job_name: 'gpu-exporter'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: gpu-exporter
    - source_labels: [__address__]
      regex: '(.*):.*'
      target_label: __address__
      replacement: '${1}:9445'

  # vLLM metrics
  - job_name: 'vllm-server'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: vllm-server
    - source_labels: [__address__]
      regex: '(.*):.*'
      target_label: __address__
      replacement: '${1}:8000'
    metrics_path: /metrics

  # Longhorn metrics
  - job_name: 'longhorn'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: longhorn-manager
    - source_labels: [__address__]
      regex: '(.*):.*'
      target_label: __address__
      replacement: '${1}:9500'
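
Before reloading Prometheus, the configuration can be validated offline with promtool, which ships with Prometheus:
promtool check config /etc/prometheus/prometheus.yml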
🚨 Critical alerting rules
# /etc/prometheus/rules/osfria-alerts.yml
groups:
- name: osfria.critical
  rules:
  # Cluster node down
  - alert: NodeDown
    expr: up{job="kubernetes-nodes"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Nœud Kubernetes indisponible"
      description: "Le nœud {{ $labels.instance }} est DOWN depuis 1 minute"

  # GPU overheating
  - alert: GPUOverheating
    expr: nvidia_gpu_temperature_celsius > 85
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "GPU surchauffe critique"
      description: "GPU {{ $labels.uuid }} température: {{ $value }}°C"

  # vLLM service down
  - alert: VLLMServiceDown
    expr: up{job="vllm-server"} == 0
    for: 30s
    labels:
      severity: critical
    annotations:
      summary: "Service vLLM indisponible"
      description: "Instance vLLM {{ $labels.instance }} DOWN"

  # Longhorn volume degraded
  - alert: LonghornVolumeDegraded
    expr: longhorn_volume_robustness == 2
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Volume Longhorn dégradé"
      description: "Volume {{ $labels.volume }} robustness: {{ $value }}"

  # High AI latency
  - alert: HighAILatency
    expr: histogram_quantile(0.95, sum(rate(vllm_request_duration_seconds_bucket[5m])) by (le)) > 0.5
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "Latence IA dégradée"
      description: "P95 latency: {{ $value }}s > 500ms"
Expert dashboards
📈 Detailed GPU dashboard
GPU metrics per node:
  • GPU utilization (0-100%)
  • Memory usage (MB/total)
  • Temperature (°C)
  • Power draw (W)
  • SM clock (MHz)
  • Memory clock (MHz)
  • Process count
🤖 vLLM performance dashboard
Inference metrics:
  • Requests/sec per model
  • P50/P95/P99 latency
  • Real-time queue depth
  • Generated tokens/sec
  • Average batch size
  • Cache hit ratio
  • Error rate (%)
💾 Advanced Longhorn dashboard
Storage metrics:
  • Read/write IOPS
  • Throughput (MB/s)
  • Average I/O latency
  • Volume usage (%)
  • Replica health status
  • Backup progress
  • Network bandwidth
📋 Useful PromQL queries
# Top GPUs by utilization
topk(5, nvidia_gpu_utilization_percent)

# vLLM P95 latency
histogram_quantile(0.95,
  sum(rate(vllm_request_duration_seconds_bucket[5m])) by (le))

# Longhorn read IOPS per volume (reported as a gauge)
longhorn_volume_read_iops

Developer APIs and advanced integrations

Complete REST API and webhook documentation

vLLM OpenAI-compatible API
🔑 Authentication and endpoints
# Base URL
https://osfria.entreprise.local/api/v1

# Required headers
Authorization: Bearer YOUR_API_TOKEN
Content-Type: application/json

# Main endpoints
GET    /v1/models                    # List models
POST   /v1/chat/completions          # Chat completion
POST   /v1/completions               # Text completion
GET    /v1/models/{model_id}         # Model info
GET    /health                       # Health check
GET    /metrics                      # Prometheus metrics
💬 Chat completion example
curl -X POST "https://osfria.entreprise.local/api/v1/chat/completions" \
  -H "Authorization: Bearer sk-osfria-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r2-78b-int4",
    "messages": [
      {
        "role": "system",
        "content": "Tu es un assistant IA français spécialisé en entreprise."
      },
      {
        "role": "user", 
        "content": "Explique-moi la blockchain en 100 mots"
      }
    ],
    "max_tokens": 150,
    "temperature": 0.7,
    "stream": false
  }'

# Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699234567,
  "model": "deepseek-r2-78b-int4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "La blockchain est une technologie..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 87,
    "total_tokens": 132
  }
}
⚡ Advanced parameters
  • stream: true - Streaming response (see the sketch below)
  • logprobs: 5 - Log probabilities
  • echo: true - Echo the prompt back
  • best_of: 3 - Multiple generations, best returned
  • repetition_penalty: 1.1 - Repetition penalty
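
With stream: true the endpoint returns Server-Sent Events, one data: line per chunk and a final data: [DONE], per the OpenAI-compatible contract (-N disables curl's buffering):
curl -N -X POST "https://osfria.entreprise.local/api/v1/chat/completions" \
  -H "Authorization: Bearer sk-osfria-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r2-78b-int4",
    "messages": [{"role": "user", "content": "Bonjour"}],
    "stream": true
  }'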
RAG+ API and document management
📄 Document upload and indexing
# Upload a document
curl -X POST "https://osfria.entreprise.local/api/v1/documents" \
  -H "Authorization: Bearer sk-osfria-..." \
  -F "[email protected]" \
  -F "collection=knowledge-base" \
  -F "metadata={\"department\":\"finance\",\"confidential\":true}"

# Response
{
  "document_id": "doc_abc123",
  "filename": "document.pdf",
  "status": "processing",
  "chunks_count": 0,
  "upload_time": "2024-12-03T15:30:00Z"
}

# Check indexing status
curl "https://osfria.entreprise.local/api/v1/documents/doc_abc123" \
  -H "Authorization: Bearer sk-osfria-..."

# Response after processing
{
  "document_id": "doc_abc123",
  "filename": "document.pdf", 
  "status": "indexed",
  "chunks_count": 47,
  "collection": "knowledge-base",
  "metadata": {
    "department": "finance",
    "confidential": true,
    "pages": 12,
    "file_size": 2048576
  }
}
🔍 Contextual RAG+ search
# Search across documents
curl -X POST "https://osfria.entreprise.local/api/v1/search" \
  -H "Authorization: Bearer sk-osfria-..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "procédure remboursement frais",
    "collection": "knowledge-base",
    "filters": {
      "department": "finance",
      "confidential": true
    },
    "top_k": 5,
    "score_threshold": 0.7
  }'

# Response
{
  "results": [
    {
      "document_id": "doc_abc123",
      "chunk_id": "chunk_456",
      "content": "La procédure de remboursement...",
      "score": 0.89,
      "metadata": {
        "page": 3,
        "section": "Procédures financières"
      }
    }
  ],
  "total_results": 3,
  "search_time_ms": 45
}
🔒 Security and permissions
  • RBAC: permissions per collection/department
  • Filtering: documents according to user rights
  • Audit: every RAG+ request is logged
  • Encryption: sensitive documents AES-256
Webhooks and real-time integrations
🔔 Webhook configuration
# Create a webhook
curl -X POST "https://osfria.entreprise.local/api/v1/webhooks" \
  -H "Authorization: Bearer sk-osfria-..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://mon-app.entreprise.local/webhook",
    "events": [
      "document.indexed",
      "model.loaded", 
      "alert.triggered",
      "usage.threshold"
    ],
    "secret": "webhook_secret_123",
    "active": true
  }'

# document.indexed webhook payload
{
  "event": "document.indexed",
  "timestamp": "2024-12-03T15:30:00Z",
  "data": {
    "document_id": "doc_abc123",
    "filename": "rapport.pdf",
    "collection": "reports",
    "chunks_count": 24,
    "processing_time_ms": 15642
  },
  "signature": "sha256=abc123..."
}
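
Receivers should verify the signature before trusting a payload. Assuming, as the sha256= prefix suggests, a GitHub-style HMAC-SHA256 of the raw request body keyed with the webhook secret (an assumption, not confirmed above), a minimal bash sketch:
# BODY = raw request body, SECRET = webhook secret, SIG = received signature header
EXPECTED="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')"
[ "$EXPECTED" = "$SIG" ] && echo "signature valid" || echo "signature INVALID"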
🔌 SDKs and libraries
# Python SDK
pip install osfria-python

from osfria import OsfriaClient

client = OsfriaClient(
    api_key="sk-osfria-...",
    base_url="https://osfria.entreprise.local"
)

# Chat completion
response = client.chat.completions.create(
    model="deepseek-r2-78b-int4",
    messages=[
        {"role": "system", "content": "Tu es un assistant IA français spécialisé en entreprise."},
        {"role": "user", "content": "Bonjour"}
    ]
)

# Upload a document with RAG+
document = client.documents.upload(
    file="rapport.pdf",
    collection="knowledge-base"
)

# JavaScript/TypeScript SDK
npm install @osfria/sdk

import { OsfriaClient } from '@osfria/sdk';

const client = new OsfriaClient({
    apiKey: 'sk-osfria-...',
    baseURL: 'https://osfria.entreprise.local'
});

const completion = await client.chat.completions.create({
    model: 'deepseek-r2-78b-int4',
    messages: [{ role: 'user', content: 'Bonjour' }]
});

Technical recap and resources

Complete DevOps checklist and additional resources

Deployment technical checklist
✅ Phase 1: Infrastructure
✅ Phase 2: K8s cluster
✅ Phase 3: Services
✅ Phase 4: Tests & monitoring
Estimated deployment duration
  • Infrastructure: 2-4 hours
  • K8s cluster: 1-2 hours
  • AI services: 3-5 hours
  • Validation tests: 1-2 hours

Total: 7-13 hours depending on DevOps experience and infrastructure complexity

Expected post-deployment performance metrics
  • AI P95 latency: < 310ms
  • Tokens/sec: 600+
  • Cluster uptime: 99.95%
  • Concurrent users: 50+
  • Recovery time: < 30s
  • Tokens/day: 2M
Recommended performance optimizations
AI models:
  • FP16 quantization enabled
  • Automatic TensorRT compilation
  • Intelligent model caching
Storage:
  • NVMe Gen4 SSD recommended
  • Minimum 3x replication
  • Compression enabled
Network:
  • 10GbE backbone mandatory
  • Inter-node latency < 1ms
  • Optimal load balancing

🚀 Ready for expert deployment?

You now have the full technical documentation needed to deploy and maintain Osfria in production. Our DevOps team remains available to assist you.

📋 Included in expert support
24/7 hotline
Team training
Updates
SLA guarantees

Expert-level technical documentation • Specialized DevOps support • Guaranteed production integration