I replaced the NVIDIA card with two NVIDIA cards, 8 GB each. Both are visible to the VM launched by Proxmox:
# nvidia-smi --list-gpus
GPU 0: Quadro M5000 (UUID: GPU-)
GPU 1: Quadro M4000 (UUID: GPU-)

# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15              Driver Version: 570.86.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro M5000                   Off |   00000000:00:10.0 Off |                  Off |
| 38%   37C    P8             13W /  150W |       5MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro M4000                   Off |   00000000:00:11.0 Off |                  N/A |
| 46%   39C    P8             13W /  120W |       5MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
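The same check can be scripted instead of read off the table. The sketch below is only an illustration: the helper names `parse_gpu_csv` and `query_gpus` are my own, and it assumes `nvidia-smi` is on the PATH inside the VM and prints one comma-separated line per GPU when called with `--query-gpu` and `--format=csv,noheader,nounits`.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV line.
FIELDS = ["index", "name", "memory.total", "memory.used"]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # ignore blank lines
        index, name, total, used = (field.strip() for field in record)
        gpus.append({
            "index": int(index),
            "name": name,
            "total_mib": int(total),  # with nounits, memory is in MiB
            "used_mib": int(used),
        })
    return gpus


def query_gpus():
    """Hypothetical helper: run nvidia-smi and parse what it prints."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` inside the VM should return one dictionary per passed-through card, which makes it easy to verify from a script that both GPUs and their 8192 MiB are really there.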
The benchmark results are as follows:
# llm_benchmark run
-------Linux----------
{'id': '0', 'name': 'Quadro M5000', 'driver': '570.86.15', 'gpu_memory_total': '8192.0 MB', 'gpu_memory_free': '8110.0 MB', 'gpu_memory_used': '5.0 MB', 'gpu_load': '0.0%', 'gpu_temperature': '36.0°C'}
{'id': '1', 'name': 'Quadro M4000', 'driver': '570.86.15', 'gpu_memory_total': '8192.0 MB', 'gpu_memory_free': '8110.0 MB', 'gpu_memory_used': '5.0 MB', 'gpu_load': '0.0%', 'gpu_temperature': '38.0°C'}
At least two GPU cards
Total memory size : 61.36 GB
cpu_info: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz
gpu_info: Quadro M5000
Quadro M4000
os_version: Ubuntu 22.04.5 LTS
ollama_version: 0.5.7
----------
....
-------Linux----------
{'id': '0', 'name': 'Quadro M5000', 'driver': '570.86.15', 'gpu_memory_total': '8192.0 MB', 'gpu_memory_free': '3277.0 MB', 'gpu_memory_used': '4838.0 MB', 'gpu_load': '0.0%', 'gpu_temperature': '65.0°C'}
{'id': '1', 'name': 'Quadro M4000', 'driver': '570.86.15', 'gpu_memory_total': '8192.0 MB', 'gpu_memory_free': '2348.0 MB', 'gpu_memory_used': '5767.0 MB', 'gpu_load': '0.0%', 'gpu_temperature': '76.0°C'}
At least two GPU cards
{
    "mistral:7b": "16.56",
    "llama3.1:8b": "15.71",
    "phi4:14b": "8.01",
    "qwen2:7b": "15.27",
    "gemma2:9b": "15.81",
    "llava:7b": "17.82",
    "llava:13b": "13.14",
    "uuid": "1a60faf0-e97b-5d47-8de5-03d3b22dfbbc",
    "ollama_version": "0.5.7"
}
I currently use "llama3.1:8b", so I have gone from 1.12 (unusable) to 15.71. Ideally I would be somewhere above 32 … so I will have to find two new cards.
Misery.
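To put numbers on that, here is a small sketch that reads the scores from the benchmark result and works out the gain and the remaining gap. It assumes the scores are throughput figures where higher is better (the output does not state a unit), and the names `model_scores`, `OLD_SCORE` and `TARGET_SCORE` are mine, introduced only for illustration.

```python
import json

# Result block as printed by the benchmark run.
RESULTS_JSON = """
{
    "mistral:7b": "16.56",
    "llama3.1:8b": "15.71",
    "phi4:14b": "8.01",
    "qwen2:7b": "15.27",
    "gemma2:9b": "15.81",
    "llava:7b": "17.82",
    "llava:13b": "13.14",
    "uuid": "1a60faf0-e97b-5d47-8de5-03d3b22dfbbc",
    "ollama_version": "0.5.7"
}
"""

OLD_SCORE = 1.12      # llama3.1:8b before the card swap
TARGET_SCORE = 32.0   # the figure I would like to exceed
METADATA_KEYS = {"uuid", "ollama_version"}


def model_scores(raw):
    """Keep only the per-model entries and convert them to floats."""
    data = json.loads(raw)
    return {name: float(value) for name, value in data.items()
            if name not in METADATA_KEYS}


scores = model_scores(RESULTS_JSON)
current = scores["llama3.1:8b"]

gain = current / OLD_SCORE         # how much faster than before
still_needed = TARGET_SCORE / current  # factor missing to reach the target

print(f"llama3.1:8b: {OLD_SCORE} -> {current} (x{gain:.1f})")
print(f"factor still needed to reach {TARGET_SCORE:.0f}: x{still_needed:.2f}")
```

In other words, the swap bought roughly a fourteenfold improvement, and the target still sits at about twice the current score.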