Address silent container crash in the pipeline when a model is not found
We have observed an issue where the pipeline encounters problems that aren't effectively communicated to the user. One of these occurs when a model is not pre-downloaded on the orchestrator. In that case, the runner attempts to download the model, which can take a long time. The orchestrator is not informed that this is happening, and if the download takes longer than 5 minutes, the runner will time out.
**Steps to Reproduce:**
1. Clone and build the `go-livepeer@ai-video` branch from [GitHub](https://github.com/livepeer/go-livepeer/tree/ai-video) (detailed instructions available [here](https://www.notion.so/livepeer/go-livepeer-ai-video-local-development-3c56ea0abd5e4c8394abc4c00de8d094?pvs=4)).
2. Add the specified `aiModels.json` configuration to your `~/.lpData` directory:
```json
[
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    "price_per_unit": 3390842
  }
]
```
3. Do **not** download the `stabilityai/stable-video-diffusion-img2vid-xt-1-1` model into the `~/.lpData/models` folder.
4. Launch a local broadcaster and orchestrator.
5. Request an image-to-video job.
**Result:** The container crashes. The user receives no progress information and only sees a `service unavailable` error after a very long timeout. Additionally, the orchestrator operator has to check the container logs to find out why the container crashed, which is also not ideal.
**Reason:**
The failure is due to the inability to access the token-gated model `stabilityai/stable-video-diffusion-img2vid-xt-1-1` from the URL `https://huggingface.co/api/models/stabilityai/stable-video-diffusion-img2vid-xt-1-1`. The model is gated and requires authentication, which isn't provided in this scenario. The error surfaces because the orchestrator advertised the model as available even though it was never downloaded.
The critical error message states:
```
Cannot load model stabilityai/stable-video-diffusion-img2vid-xt-1-1: model is not cached locally and an error occurred while trying to fetch metadata from the Hub. Please check out the root cause in the stacktrace above.
```
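Because the model is gated, the unauthenticated metadata request to the Hub fails, and the runner currently just crashes. It could instead inspect the failure and surface a clearer message. A minimal sketch of such a mapping (the function name and exact wording are illustrative, not the runner's actual API):

```python
def explain_hub_error(status_code: int, model_id: str) -> str:
    """Map an HTTP status from a Hugging Face Hub metadata request
    to a user-facing explanation (illustrative mapping, not runner code)."""
    if status_code in (401, 403):
        return (
            f"Model {model_id} is gated on the Hub; provide an authenticated "
            "Hugging Face token or pre-download the model locally."
        )
    if status_code == 404:
        return f"Model {model_id} was not found on the Hub; check the model_id."
    return f"Could not fetch metadata for {model_id} (HTTP {status_code})."
```

Returning such a message to the broadcaster immediately would avoid the opaque 5-minute timeout described above.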
**Proposed Solution:**
To enhance user experience and ensure clear communication, it's essential to implement more robust error handling mechanisms. Specifically, when the pipeline encounters such an error, it should catch the exception and provide a user-friendly error message explaining the issue and suggesting potential fixes, such as verifying model availability or ensuring proper authentication credentials are provided.
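One possible shape for that check is to fail fast before the pipeline ever touches the Hub. A minimal sketch, assuming models live under `~/.lpData/models` with a `org--name` directory layout (the path, naming scheme, and exception type are illustrative assumptions, not the runner's actual code):

```python
from pathlib import Path

# Assumed local model cache location; the real runner may differ.
MODELS_DIR = Path.home() / ".lpData" / "models"


class ModelNotAvailableError(Exception):
    """Raised with an actionable message instead of letting the container crash."""


def ensure_model_cached(model_id: str, models_dir: Path = MODELS_DIR) -> Path:
    """Verify the model was pre-downloaded; raise a clear error otherwise."""
    # Illustrative assumption: "org/name" is stored as "org--name" on disk.
    model_path = models_dir / model_id.replace("/", "--")
    if not model_path.exists():
        raise ModelNotAvailableError(
            f"Model {model_id} is not cached under {models_dir}. "
            "Pre-download it (gated models require an authenticated Hugging "
            "Face token) or remove it from aiModels.json."
        )
    return model_path
```

Calling this at job-acceptance time would let the orchestrator reject the request immediately with a clear reason, rather than returning `service unavailable` after the runner times out.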