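This post was last edited by jjcs on 2025-7-12 17:59
A question for the experts here: I've run into a problem passing a vGPU through to an LXC container on Proxmox.
Until now I have been running this in a VM; now I'd like to try it in an LXC container.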
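Version information
Proxmox version 8.4.1, kernel version Linux 6.8.12-9-pve
GPU hardware and vGPU software versions (nvidia-smi run on the Proxmox host):
Passed-through vGPU device file paths (ls -l /dev/nvidia* run on the Proxmox host):
LXC configuration file:
nvidia-smi run inside the LXC:
ls -l /dev/nvidia* run inside the LXC:
dpkg -l | grep nvidia-container-toolkit run inside the LXC:
Frigate docker-compose configuration file in the LXC:
Image version: frigate:0.15.2-tensorrt
Frigate configuration file:
Directory structure:
Error output when running the Docker container: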
s6-rc: info: service go2rtc-log: starting
s6-rc: info: service frigate-log: starting
s6-rc: info: service certsync-log: starting
s6-rc: info: service go2rtc-log successfully started
s6-rc: info: service go2rtc: starting
s6-rc: info: service frigate-log successfully started
s6-rc: info: service go2rtc successfully started
s6-rc: info: service go2rtc-healthcheck: starting
s6-rc: info: service frigate: starting
s6-rc: info: service nginx-log successfully started
s6-rc: info: service certsync-log successfully started
s6-rc: info: service go2rtc-healthcheck successfully started
s6-rc: info: service frigate successfully started
s6-rc: info: service nginx: starting
2025-07-12 17:36:21.367894055 [INFO] Preparing new go2rtc config...
2025-07-12 17:36:21.371335726 [INFO] Preparing Frigate...
2025-07-12 17:36:21.374251984 [INFO] Starting NGINX...
2025-07-12 17:36:21.554137360 [INFO] Starting Frigate...
2025-07-12 17:36:21.823117904 [INFO] Not injecting WebRTC candidates into go2rtc config as it has been set manually
2025-07-12 17:36:21.981956149 [INFO] Starting go2rtc...
2025-07-12 17:36:22.115816096 17:36:22.115 INF go2rtc platform=linux/amd64 revision=b2399f3 version=1.9.2
2025-07-12 17:36:22.115821320 17:36:22.115 INF config path=/dev/shm/go2rtc.yaml
2025-07-12 17:36:22.116464525 17:36:22.116 INF [rtsp] listen addr=:8554
2025-07-12 17:36:22.116468212 17:36:22.116 INF [api] listen addr=:1984
2025-07-12 17:36:22.116851938 17:36:22.116 INF [webrtc] listen addr=:8555
s6-rc: info: service nginx successfully started
s6-rc: info: service certsync: starting
s6-rc: info: service certsync successfully started
s6-rc: info: service legacy-services: starting
2025-07-12 17:36:22.547425965 [INFO] Starting certsync...
s6-rc: info: service legacy-services successfully started
2025-07-12 17:36:23.531593221 2025/07/12 17:36:23 [error] 166#166: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: , request: "GET /api/version HTTP/1.1", subrequest: "/auth", upstream: "http://127.0.0.1:5001/auth", host: "127.0.0.1:5000"
2025-07-12 17:36:23.531602164 2025/07/12 17:36:23 [error] 166#166: *1 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: , request: "GET /api/version HTTP/1.1", host: "127.0.0.1:5000"
2025-07-12 17:36:24.753500389 [2025-07-12 17:36:24] frigate.util.config INFO : Checking if frigate config needs migration...
2025-07-12 17:36:24.793872854 [2025-07-12 17:36:24] frigate.util.config INFO : frigate config does not need migration...
2025-07-12 17:36:25.041356244 [2025-07-12 17:36:25] frigate.app INFO : Starting Frigate (0.15.2-3bda638)
2025-07-12 17:36:25.041873370 [2025-07-12 17:36:25] frigate.util.services INFO : Current file limits - Soft: 1048576, Hard: 1048576
2025-07-12 17:36:25.042021778 [2025-07-12 17:36:25] frigate.util.services INFO : File limit set. New soft limit: 65536, Hard limit remains: 1048576
2025-07-12 17:36:25.053862222 [2025-07-12 17:36:25] peewee_migrate.logs INFO : Starting migrations
2025-07-12 17:36:25.054362456 [2025-07-12 17:36:25] peewee_migrate.logs INFO : There is nothing to migrate
2025-07-12 17:36:25.248357373 [2025-07-12 17:36:25] frigate.app INFO : Recording process started: 351
2025-07-12 17:36:25.255887732 [2025-07-12 17:36:25] frigate.app INFO : Review process started: 353
2025-07-12 17:36:25.259765696 [2025-07-12 17:36:25] frigate.app INFO : go2rtc process pid: 101
2025-07-12 17:36:25.276670055 [2025-07-12 17:36:25] detector.onnx_0 INFO : Starting detection process: 367
2025-07-12 17:36:25.279450047 2025-07-12 17:36:25.279168186 [E:onnxruntime:Default, provider_bridge_ort.cc:1731 TryGetProviderInfo_TensorRT] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_tensorrt.so with error: libnvinfer.so.10: cannot open shared object file: No such file or directory
2025-07-12 17:36:25.279455719
2025-07-12 17:36:25.280616232 2025-07-12 17:36:25.280552599 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 383, index: 0, mask: {2, 26, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.284098751 2025-07-12 17:36:25.281169623 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 385, index: 2, mask: {6, 30, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.289102452 2025-07-12 17:36:25.285132539 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 386, index: 3, mask: {8, 32, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.293107816 2025-07-12 17:36:25.289164794 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 387, index: 4, mask: {10, 34, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.297096608 2025-07-12 17:36:25.293163041 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 388, index: 5, mask: {12, 36, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.299172791 2025-07-12 17:36:25.297287216 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 389, index: 6, mask: {14, 38, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.299177138 2025-07-12 17:36:25.298319827 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 390, index: 7, mask: {16, 40, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.299221961 [2025-07-12 17:36:25] frigate.app INFO : Output process started: 399
2025-07-12 17:36:25.305111982 2025-07-12 17:36:25.301142850 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 398, index: 15, mask: {9, 33, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.305116061 2025-07-12 17:36:25.302153846 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 391, index: 8, mask: {18, 42, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.306209909 2025-07-12 17:36:25.306180928 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 393, index: 10, mask: {22, 46, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.306293691 2025-07-12 17:36:25.306201856 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 397, index: 14, mask: {7, 31, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.306485933 2025-07-12 17:36:25.306462738 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 404, index: 16, mask: {11, 35, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.306660563 2025-07-12 17:36:25.306639842 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 405, index: 17, mask: {13, 37, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.310716035 2025-07-12 17:36:25.310689645 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 410, index: 22, mask: {23, 47, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.313215231 2025-07-12 17:36:25.313185425 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 409, index: 21, mask: {21, 45, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.313376281 2025-07-12 17:36:25.310940649 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 395, index: 12, mask: {3, 27, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.313545804 2025-07-12 17:36:25.310888387 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 394, index: 11, mask: {1, 25, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.316199632 2025-07-12 17:36:25.316171524 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 406, index: 18, mask: {15, 39, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.345109698 2025-07-12 17:36:25.341126447 [E:onnxruntime:Default, env.cc:228 ThreadMain] pthread_setaffinity_np failed for thread: 408, index: 20, mask: {19, 43, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2025-07-12 17:36:25.416867838 [2025-07-12 17:36:25] frigate.app INFO : Camera processor started for haikang_kitchen_1: 421
2025-07-12 17:36:25.417224408 [2025-07-12 17:36:25] frigate.app INFO : Camera processor started for xiangxia_camera_2: 422
2025-07-12 17:36:25.417459766 [2025-07-12 17:36:25] frigate.app INFO : Capture process started for haikang_kitchen_1: 428
2025-07-12 17:36:25.441591743 [2025-07-12 17:36:25] frigate.app INFO : Capture process started for xiangxia_camera_2: 435
2025-07-12 17:36:25.515147794 [2025-07-12 17:36:25] frigate.detectors.plugins.onnx INFO : ONNX: loaded onnxruntime module
2025-07-12 17:36:25.539294060 [2025-07-12 17:36:25] frigate.detectors.plugins.onnx INFO : ONNX: loading /config/model_cache/onnx/yolonas_l-320.onnx
2025-07-12 17:36:25.547241353 *************** EP Error ***************
2025-07-12 17:36:25.547246295 EP Error /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:456 void onnxruntime::python::RegisterTensorRTPluginsAsCustomOps(onnxruntime::python::PySessionOptions&, const ProviderOptions&) Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
2025-07-12 17:36:25.547248512 when using ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'AzureExecutionProvider', 'CPUExecutionProvider']
2025-07-12 17:36:25.547250288 Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
2025-07-12 17:36:25.547252065 ****************************************
2025-07-12 17:36:25.547364737 Process detector:onnx_0:
2025-07-12 17:36:25.547366288 Traceback (most recent call last):
2025-07-12 17:36:25.547367998 File "/usr/local/lib/python3.9/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
2025-07-12 17:36:25.547369573 self._create_inference_session(providers, provider_options, disabled_optimizers)
2025-07-12 17:36:25.547371360 File "/usr/local/lib/python3.9/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 469, in _create_inference_session
2025-07-12 17:36:25.547372916 self._register_ep_custom_ops(session_options, providers, provider_options, available_providers)
2025-07-12 17:36:25.547374680 File "/usr/local/lib/python3.9/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 516, in _register_ep_custom_ops
2025-07-12 17:36:25.547378911 C.register_tensorrt_plugins_as_custom_ops(session_options, provider_options)
2025-07-12 17:36:25.547381884 RuntimeError: /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:456 void onnxruntime::python::RegisterTensorRTPluginsAsCustomOps(onnxruntime::python::PySessionOptions&, const ProviderOptions&) Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
2025-07-12 17:36:25.547384340
2025-07-12 17:36:25.547385484
2025-07-12 17:36:25.547386865 The above exception was the direct cause of the following exception:
2025-07-12 17:36:25.547387941
2025-07-12 17:36:25.547389279 Traceback (most recent call last):
2025-07-12 17:36:25.547390823 File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2025-07-12 17:36:25.547391987 self.run()
2025-07-12 17:36:25.547434082 File "/opt/frigate/frigate/util/process.py", line 41, in run_wrapper
2025-07-12 17:36:25.547435529 return run(*args, **kwargs)
2025-07-12 17:36:25.547437004 File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2025-07-12 17:36:25.547438462 self._target(*self._args, **self._kwargs)
2025-07-12 17:36:25.547439922 File "/opt/frigate/frigate/object_detection.py", line 121, in run_detector
2025-07-12 17:36:25.547441353 object_detector = LocalObjectDetector(detector_config=detector_config)
2025-07-12 17:36:25.547442853 File "/opt/frigate/frigate/object_detection.py", line 68, in __init__
2025-07-12 17:36:25.547444236 self.detect_api = create_detector(detector_config)
2025-07-12 17:36:25.547445723 File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2025-07-12 17:36:25.547446934 return api(detector_config)
2025-07-12 17:36:25.547468190 File "/opt/frigate/frigate/detectors/plugins/onnx.py", line 44, in __init__
2025-07-12 17:36:25.547469557 self.model = ort.InferenceSession(
2025-07-12 17:36:25.547471239 File "/usr/local/lib/python3.9/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 432, in __init__
2025-07-12 17:36:25.547472503 raise fallback_error from e
2025-07-12 17:36:25.547474354 File "/usr/local/lib/python3.9/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 427, in __init__
2025-07-12 17:36:25.547475763 self._create_inference_session(self._fallback_providers, None)
2025-07-12 17:36:25.547477490 File "/usr/local/lib/python3.9/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
2025-07-12 17:36:25.547479043 sess.initialize_session(providers, provider_options, disabled_optimizers)
2025-07-12 17:36:25.547494035 RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 999: unknown error ; GPU=29900 ; hostname=0feea960c007 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=280 ; expr=cudaSetDevice(info_.device_id);
2025-07-12 17:36:25.547515503
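Could anyone please take a look? I previously tried installing CUDA as well, but it made no difference and the same error appeared. One more question: a vGPU passed through at the kernel level apparently doesn't need a license, does it? For the VM I set up licensing, but in the LXC the license query prints nothing in the terminal.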
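The two failures in the log are a library that cannot be opened (libnvinfer.so.10) and a CUDA call that fails when selecting the device. Below is a minimal sketch of a check that could be run with the Python interpreter inside the Frigate container, to see whether the named library can be opened and which /dev/nvidia* nodes are visible there. The helper names `can_load` and `report` are made up for illustration, and `libcuda.so.1` is an assumed driver library name that does not appear in the log above.

```python
import ctypes
import glob


def can_load(library_name):
    """Return (True, "") if the dynamic loader can open the library,
    otherwise (False, reason) with the loader's error message."""
    try:
        ctypes.CDLL(library_name)
        return True, ""
    except OSError as exc:
        return False, str(exc)


def report(library_names, device_pattern="/dev/nvidia*"):
    """Collect loader results and the device nodes visible to this process."""
    return {
        "libraries": {name: can_load(name) for name in library_names},
        "devices": sorted(glob.glob(device_pattern)),
    }


if __name__ == "__main__":
    # libnvinfer.so.10 is the library named in the log.
    # libcuda.so.1 is an assumption about this setup, not taken from the log.
    result = report(["libnvinfer.so.10", "libcuda.so.1"])
    for name, (ok, reason) in result["libraries"].items():
        status = "loadable" if ok else "NOT loadable: " + reason
        print(name + ": " + status)
    print("device nodes:", result["devices"] or "none found")
```

Running the same script once in the LXC and once in the Docker container should show whether the device nodes and libraries go missing at the LXC layer or at the Docker layer. It only checks visibility and loadability; it does not reproduce the cudaSetDevice failure itself.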