gonzo-обзоры ML статей | United States America (US)

Create: 2024-07-21 Update: 2025-07-04 06:42:00

Transformer Layers as Painters
Qi Sun, Marc Pickett, Aakash Kumar Nain, Llion Jones
Статья: https://arxiv.org/abs/2407.09298
Twitter: https://x.com/A_K_Nain/status/1812684597248831912
Код: https://github.com/floatingbigcat/transformer-as-painter (пока нет)

Вторая работа про творческий подход к вычислению трансформерных слоёв. Первая была про LayerShuffle (https://hottg.com/gonzo_ML/2845), в текущей работе от коллег GDE подход более радикальный, чем просто перемешивание слоёв.

Авторы предлагают интересную метафору для средних слоёв сети — конвейер художников. Входное полотно проходит через цепочку художников. Некоторые из них специализируются на рисовании птиц, другие на рисовании колёс. Получая полотно от предыдущего художника, каждый решает дорисовать ли чего где или отправить дальше без изменений. У всех художников общий словарь для понимания рисунков, так что не страшно, если художник получит на вход полотно от более раннего чем обычно художника. Их также можно поменять местами, большой катастрофы не произойдёт (даже если части фона будут дорисованы поверх уже нарисованных объектов). Художники даже могут рисовать одновременно впараллель.

Аналогия не претендует на научную точность, но помогает поднять кучу интересных вопросов: Используют ли слои общее пространство репрезентаций? Все ли слои необходимы? Вычисляют ли все средние слои одну и ту же функцию? Важен ли порядок слоёв? Можно ли вычислять слои впараллель? Более ли важен порядок вычисления для одних задач чем для других? Помогают ли циклы параллельно вычисляемым слоям? Какие варианты наименее вредят качеству?

Хорошие вопросы. Предыдущие работы типа LayerDrop и LayerShuffle покрывают лишь небольшую часть из этого.

LayerDrop проверяли на ViT, текущая работа сделана на предобученных LLM (декодерная Llama2 7B с 32 слоями и энкодерный BERT-Large 340M с 24 слоями), никакого файнтюнинга (кроме оценки на GLUE, где подразумевается файнтюнинг для берта). Для ламы проверяли на ARC (который от AI2, https://arxiv.org/abs/1803.05457, а не от Франсуа Шолле, https://github.com/fchollet/ARC-AGI), HellaSwag, GSM8K, WinoGrande, LAMBADA. Для берта вышеупомянутый GLUE.

Если посмотреть на результирующее качество при выкидывании слоя или при перестановке слоя с соседним, то оказывается, что начальный и конечный слои важны, а между ними всё более-менее устойчиво к таким манипуляциям. Выглядит так, что срединные слои используют общее пространство репрезентаций. Если дополнительно посмотреть на косинусную близость активаций слоёв внутри модели, то групп слоёв может даже больше: входной слой 0, слои 1-3, средние слои, последний или пара последних. Возможно у модели три разных пространства репрезентаций: для начальных, средних и конечных слоёв.

Если пропустить M слоёв (выкидываем слои из середины от N+1 до T-N, где T — это общее число слоёв в модели), то качество постепенно деградирует по мере увеличения M. Получается, что не все промежуточные слои необходимы, по крайней мере несколько средних слоёв могут быть выкинуты без катастрофической деградации.

Если в предыдущем эксперименте вместо выкидывания средних слоёв заменить их на копию центрального слоя, то результат получается намного хуже, чем если бы выкинуть. Это наиболее сильная деградация среди всех рассмотренных в работе. Отсюда вывод, что средние слои реализуют разные функции. В приложении дополнительно смотрят на косинусные близости и статистики активаций и делают вывод, что повторение среднего слоя выталкивает вход из общего пространства репрезентаций. Если художник рисует колёса, то рисование большего числа колёс на полотне приводит к тому, что художники после него будут вынуждены работать с тем, на чём не обучались.

gonzo-обзоры ML статей

hottg.com/gonzo_ML/2865

3.7K viewsJul 21, 2024 at 23:33

>>Click here to continue<<

gonzo-обзоры ML статей

Share with your best friend

How to Invest in Bitcoin?

Like a stock, you can buy and hold Bitcoin as an investment. You can even now do so in special retirement accounts called Bitcoin IRAs. No matter where you choose to hold your Bitcoin, people’s philosophies on how to invest it vary: Some buy and hold long term, some buy and aim to sell after a price rally, and others bet on its price decreasing. Bitcoin’s price over time has experienced big price swings, going as low as $5,165 and as high as $28,990 in 2020 alone. “I think in some places, people might be using Bitcoin to pay for things, but the truth is that it’s an asset that looks like it’s going to be increasing in value relatively quickly for some time,” Marquez says. “So why would you sell something that’s going to be worth so much more next year than it is today? The majority of people that hold it are long-term investors.”

Transformer Layers as Painters

gonzo-обзоры ML статей TG
Webview: 2865
Telegram TG Webview: hottg.com/gonzo_ML/webview
Telegram TG Channel: gonzo-обзоры ML статей
Telegram Updated: 2025-07-04 06:42:00

United States America Popular Telegram Group (US)

Telegram Q&A

Q: How does hottg.com work?

Once you've set up a username, you can give people a hottg.com/username link. Opening that link on their phone will automatically fire up their Telegram app and open a chat with you. You can share username links with friends, write them on business cards or put them up on your website.This way people can contact you on Telegram without knowing your phone number.

With Telegram, you can send messages, photos, videos and files of any type (doc, zip, mp3, etc), as well as create groups for up to 200,000 people or channels for broadcasting to unlimited audiences. You can write to your phone contacts and find people by their usernames. As a result, Telegram is like SMS and email combined — and can take care of all your personal or business messaging needs. In addition to this, we support end-to-end encrypted voice calls.

Q: What is Telegram? What do I do here?

Telegram is a messaging app with a focus on speed and security, it’s super-fast, simple and free. You can use Telegram on all your devices at the same time — your messages sync seamlessly across any number of your phones, tablets or computers.

Q: Who is Telegram for?

Telegram is for everyone who wants fast and reliable messaging and calls. Business users and small teams may like the large groups, usernames, desktop apps and powerful file sharing options. You can appoint admins with advanced tools to help these communities prosper in peace. Public groups can be joined by anyone and are powerful platforms for discussions and collecting feedback.In case you're more into pictures, Telegram has animated gif search, a state of the art photo editor, and an open sticker platform (find some cool stickers here or here). What's more, there is no need to worry about disk space on your device. With Telegram's cloud support and cache management options, Telegram can take up nearly zero space on your phone.

Q: How is Telegram different from WhatsApp?

Unlike WhatsApp, Telegram is a cloud-based messenger with seamless sync. As a result, you can access your messages from several devices at once, including tablets and computers, and share an unlimited number of photos, videos and files (doc, zip, mp3, etc.) of up to 2 GB each. And if you don't want to store all that data on your device, you can always keep it in the cloud.Thanks to our multi-data center infrastructure and encryption, Telegram is faster and way more secure. On top of that, Telegram is free and will stay free — no ads, no subscription fees, forever.

Q: Can I make calls via Telegram?

Yes! Voice calls are currently available to users around the world.

Many modern travelers appear to struggle with managing various aspects of their finances simultaneously while abroad, such as banking, budgeting, investing, trading, and saving. It is important to have apps installed on the device that will help you carry out these necessary tasks.

Hot Topic in US