ML Advertising | United States America (US)

Create: 2024-12-17 Update: 2025-07-06 07:06:18

RAG in LLMs

Continuing the language-model series. The first post covered the difference between base and instruct LLMs. Today we look at the concept of RAG.

RAG (Retrieval-Augmented Generation) is an approach that lets you use large language models (LLMs) to answer user questions over private sources of information.

Let's walk through the most basic version of RAG:

▶️ Data preparation
- Collect the documents
- Split the whole corpus into chunks (small pieces of text). There are different ways to split: for example by sentence (using the period as the delimiter), or, more elaborately, hierarchically into large chunks with smaller chunks (sub-chunks) inside them
- Encode each chunk into a vector with an encoder. This is needed so that a chunk's vector can be compared with the vector of the user's question. A convenient choice is an already pretrained encoder, for example via HuggingFaceEmbeddings
- Write all the encoded chunks to a vector database. One of the leading vector databases is Pinecone, which is built specifically for storing embeddings for LLM applications
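
The preparation steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `toy_embed` is a bag-of-words stand-in for a real pretrained encoder, the plain list returned by `build_index` stands in for a vector database such as Pinecone, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_sentences(text):
    """Naive sentence chunking: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_hierarchically(text):
    """Two-level chunking: paragraphs are the large chunks, sentences the sub-chunks."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [{"chunk": p, "subchunks": split_into_sentences(p)} for p in paragraphs]


def toy_embed(text):
    """Stand-in encoder (assumption): a sparse, L2-normalised bag-of-words vector.

    A real pipeline would call a pretrained embedding model here instead.
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {token: c / norm for token, c in counts.items()} if norm else {}


def build_index(documents):
    """In-memory substitute for a vector DB: one record per sentence-level chunk."""
    index = []
    for doc_id, doc in enumerate(documents):
        for chunk in split_into_sentences(doc):
            index.append({"doc_id": doc_id, "text": chunk, "vector": toy_embed(chunk)})
    return index


# Hypothetical sample corpus
documents = [
    "Refunds are processed within 14 days. Contact support to start a refund.",
    "Our office is open on weekdays. It is closed on public holidays.",
]
index = build_index(documents)
print(len(index), "chunks stored")
```

Storing the chunk text next to its vector matters: at question time it is the text, not the vector, that gets passed to the LLM.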

▶️ Answering a question
- The user formulates a question
- Encode the question text into a vector with the same encoder used during data preparation
- From all the chunks in the vector database, select the top-N closest to the question vector. Closeness is measured with cosine distance
- Pass all N retrieved chunks together with the question (both as plain text at this point) to the LLM and ask it to answer the user's question using the context from the chunks
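
The question-answering flow above might look roughly like this. Again a sketch under assumptions: `toy_embed` is a bag-of-words stand-in for the real encoder, the chunk list stands in for a vector database query, the prompt wording is made up, and the final LLM call is left out. Ranking by highest cosine similarity is the same as ranking by lowest cosine distance, since distance = 1 - similarity.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in encoder (assumption): sparse bag-of-words counts per token."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as token -> weight."""
    dot = sum(weight * b.get(token, 0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n_chunks(question, chunks, n=2):
    """Return the n chunks whose vectors are closest to the question vector."""
    q_vec = toy_embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(q_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:n]


def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into one text prompt for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical chunks, as they would come back from the vector DB
chunks = [
    "Refunds are processed within 14 days.",
    "Our office is open on weekdays.",
    "Contact support to start a refund.",
]
question = "How long do refunds take?"
best = top_n_chunks(question, chunks, n=1)
prompt = build_prompt(question, best)
print(prompt)  # this string is what would be sent to the LLM
```

Note that the toy encoder only matches exact words ("refunds" but not "refund"), which is precisely the weakness a real embedding model is meant to remove.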

The idea of RAG grew out of three limitations:
- You cannot fit all the context you want into an LLM
- An LLM knows nothing about information that was not in its training data. This applies to any private or narrowly specialized information
- The alternative to RAG is fine-tuning the LLM on your own data, but that is highly impractical and expensive

So, with very little effort and without fine-tuning either the LLM or the encoder, you can build a simple Q&A service or chatbot for your own domain. From there you can try improving the service, for example by tuning the encoder, using a chunk hierarchy, or editing the prompt

#llm

ML Advertising

I've finally gotten around to LLMs, so I'll be sharing notes on the topic as I study them.

Today we start with theory: what are base and instruct models?

▶ Base LLMs
These models are trained on large volumes of text data.…

hottg.com/dsinsights/298

1.1K views, edited Dec 17, 2024 at 21:19


