4

We know that AI is rapidly growing. do we have any large language models (LLMs) to process images, pdf documents directly (fine-tune approach) for text generation tasks?

Tovlk
  • 43
  • 5

1 Answers1

5

Yes, there are open multimodal LLMs that you can fine-tune yourself, like LlaVa, NextGPT, IDEFICS or SPHINX.

Closed multimodal LLMs like GPT-4v don't offer a way to fine-tune them yet.

noe
  • 28,203
  • 1
  • 49
  • 83