

The first open source model to surpass GPT-4o-level performance! Llama 3.1 leaked: 405 billion parameters, download link and model card included

Jul 23, 2024 08:51 PM
meta industry

Get your GPU ready!


Llama 3.1 has finally surfaced, though not through official Meta channels.

Today, news of the leaked new Llama model went viral on Reddit. Beyond the base model, the leak includes benchmark results for the 8B, 70B, and largest 405B versions.


The chart below compares each version of Llama 3.1 with OpenAI's GPT-4o and Llama 3 8B/70B. Notably, even the 70B version surpasses GPT-4o on multiple benchmarks.

[Figure: benchmark comparison of the Llama 3.1 versions against OpenAI GPT-4o and Llama 3 8B/70B]

According to the leaked information, the 8B and 70B models in version 3.1 were distilled from the 405B model, which is why they show significant performance improvements over the previous generation.

Some netizens noted that this is the first time an open source model has surpassed closed source models such as GPT-4o and Claude 3.5 Sonnet, reaching SOTA on multiple benchmarks.

At the same time, Llama 3.1's model card also leaked, revealing further details (the date marked in the model card indicates that it corresponds to a July 23 release).

Someone summarized the following highlights:


  • The model was trained on 15T+ tokens from public sources, with a pre-training data cutoff of December 2023;
  • Fine-tuning data includes publicly available instruction fine-tuning datasets (unlike Llama 3) and over 25 million synthetic samples;
  • The model supports multiple languages, including English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.

Although the leaked GitHub link currently returns a 404, some netizens have shared download links (though, to be safe, it is recommended to wait for tonight's official announcement).

But this is, after all, a model at the hundred-billion-parameter scale, so make sure you have enough disk space before downloading.
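As a rough guide, a back-of-envelope estimate (a sketch assuming 2 bytes per parameter for bf16/fp16 weights; actual shard formats add some overhead) puts the 405B weights alone at roughly 750 GiB:

```python
# Back-of-envelope disk estimate for the 405B checkpoint.
# Assumption: 2 bytes per parameter (bf16/fp16); real shards add overhead.
params = 405e9
size_gib = params * 2 / 1024**3
print(f"~{size_gib:.0f} GiB for the weights alone")  # ~754 GiB
```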

Below are the key contents of the Llama 3.1 model card:


Basic model information


The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a set of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text input/text output). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many available open source and closed source chat models on common industry benchmarks.

Model architecture: Llama 3.1 is an autoregressive language model with an optimized Transformer architecture. The fine-tuned versions use SFT and RLHF to align with human preferences for helpfulness and safety.

Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish and Thai.
From the model card information, it can be inferred that the Llama 3.1 series has a context length of 128K. All model versions use Grouped Query Attention (GQA) to improve inference scalability.
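The idea behind GQA is that several query heads share a single key/value head, shrinking the KV projections and the inference-time KV cache. Below is a minimal PyTorch sketch; the head counts and dimensions are illustrative rather than Llama 3.1's actual configuration, and the causal mask is omitted for brevity.

```python
import torch

def grouped_query_attention(x, wq, wk, wv, n_heads=32, n_kv_heads=8):
    """Toy GQA: each group of (n_heads // n_kv_heads) query heads shares one K/V head."""
    B, T, D = x.shape
    hd = D // n_heads                                        # per-head dimension
    q = (x @ wq).view(B, T, n_heads, hd).transpose(1, 2)     # (B, H, T, hd)
    k = (x @ wk).view(B, T, n_kv_heads, hd).transpose(1, 2)  # (B, Hkv, T, hd)
    v = (x @ wv).view(B, T, n_kv_heads, hd).transpose(1, 2)
    rep = n_heads // n_kv_heads
    k = k.repeat_interleave(rep, dim=1)                      # share K/V across query groups
    v = v.repeat_interleave(rep, dim=1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / hd**0.5, dim=-1)  # no causal mask here
    out = attn @ v                                           # (B, H, T, hd)
    return out.transpose(1, 2).reshape(B, T, D)

D, n_heads, n_kv_heads = 4096, 32, 8
hd = D // n_heads
x = torch.randn(1, 16, D)
wq = torch.randn(D, D)                 # full query projection
wk = torch.randn(D, n_kv_heads * hd)   # smaller K/V projections: the GQA saving
wv = torch.randn(D, n_kv_heads * hd)
print(grouped_query_attention(x, wq, wk, wv).shape)  # torch.Size([1, 16, 4096])
```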


Intended use

Intended use cases: Llama 3.1 is intended for commercial and research use in multiple languages. The instruction-tuned text-only models are suitable for assistant-like chat, while the pre-trained models can be adapted to a variety of natural language generation tasks.
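As an example, assistant-like chat could be wired up roughly as follows with the Hugging Face transformers pipeline; this is a sketch, and the repo id is an assumption extrapolated from Meta's Llama 3 naming that may differ from the final release.

```python
from transformers import pipeline

# Assumed Hub repo id, extrapolated from Llama 3 naming; requires a recent
# transformers version with chat support in the text-generation pipeline.
chat = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "Résume Llama 3.1 en une phrase."},
]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```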

The Llama 3.1 model collection also supports using its model outputs to improve other models, including via synthetic data generation and distillation. The Llama 3.1 Community License permits these use cases.
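A minimal sketch of that distillation pattern, assuming the same hypothetical Hub naming as above: sample completions from a large "teacher" checkpoint and store them as fine-tuning pairs for a smaller "student". The output file name and helper function are illustrative.

```python
import json
from transformers import pipeline

# Assumed repo id for the large teacher model (hypothetical at leak time).
teacher = pipeline("text-generation",
                   model="meta-llama/Meta-Llama-3.1-405B-Instruct",
                   device_map="auto")

def make_sft_pairs(prompts, out_path="distill.jsonl"):
    """Collect teacher completions as prompt/response pairs for student fine-tuning."""
    with open(out_path, "w") as f:
        for p in prompts:
            msgs = [{"role": "user", "content": p}]
            reply = teacher(msgs, max_new_tokens=512)[0]["generated_text"][-1]["content"]
            f.write(json.dumps({"prompt": p, "response": reply}) + "\n")

make_sft_pairs(["Explain grouped-query attention in two sentences."])
```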

Llama 3.1 was trained on a broader set of languages than the 8 officially supported ones. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages, provided they comply with the Llama 3.1 Community License Agreement and the Acceptable Use Policy, and in such cases they are responsible for ensuring that Llama 3.1 is used in those other languages in a safe and responsible manner.

Software and hardware infrastructure
First, the training infrastructure: Llama 3.1 used custom training libraries, Meta's custom-built GPU cluster, and production infrastructure for pre-training; fine-tuning, annotation, and evaluation were also performed on production infrastructure.

Second, training energy consumption: Llama 3.1 training used a cumulative 39.3M GPU-hours of computation on H100-80GB hardware (TDP 700W). Here, training time is the total GPU time required to train each model, and power consumption is the peak power capacity of each GPU device, adjusted for power usage efficiency.

Training greenhouse gas emissions: total location-based greenhouse gas emissions during Llama 3.1 training are estimated at 11,390 tonnes of CO2e. Since 2020, Meta has maintained net-zero greenhouse gas emissions across its global operations and matched 100% of its electricity use with renewable energy, so total market-based greenhouse gas emissions during the training period were 0 tonnes CO2e.
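The two disclosures are mutually consistent; a few lines of arithmetic using only the numbers quoted above recover the implied energy use and location-based grid intensity:

```python
# Sanity check of the disclosed training footprint, using only the numbers
# quoted above: 39.3M GPU-hours, 700W peak per H100, 11,390 tCO2e location-based.
gpu_hours = 39.3e6
peak_kw = 0.7                                  # 700 W TDP per H100-80GB
energy_kwh = gpu_hours * peak_kw               # ~2.75e7 kWh, i.e. ~27.5 GWh
implied_intensity = 11_390_000 / energy_kwh    # kg CO2e per kWh
print(f"~{energy_kwh / 1e6:.1f} GWh, implied grid intensity ~{implied_intensity:.2f} kg CO2e/kWh")
# -> ~27.5 GWh and ~0.41 kg CO2e/kWh, a plausible location-based figure
```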

The methodology used to determine training energy use and greenhouse gas emissions can be found in the paper below. Because Meta is releasing these models publicly, others will not need to incur these training energy and emissions costs.

Paper address: https://arxiv.org/pdf/2204.05149

Training data

Overview: Llama 3.1 was pre-trained on approximately 15 trillion tokens of data from public sources. The fine-tuning data includes publicly available instruction datasets as well as over 25 million synthetically generated examples.

Data freshness: the pre-training data has a cutoff of December 2023.

Benchmark scores

In this section, Meta reports the results of the Llama 3.1 models on standard benchmarks. For all evaluations, Meta used internal evaluation libraries.

[Table: Llama 3.1 benchmark scores]

Safety risk considerations

The Llama research team is committed to providing the research community with valuable resources for studying the robustness of safety fine-tuning, and to offering developers safe and robust off-the-shelf models suitable for a wide range of applications, reducing the workload for developers deploying safe AI systems.

The research team adopted a multifaceted data collection approach, combining human-generated data from vendors with synthetic data to mitigate potential safety risks. The team developed a number of classifiers based on large language models (LLMs) to carefully select high-quality prompts and responses, strengthening data quality control.

Notably, Llama 3.1 pays close attention to the model's refusals of benign prompts and to refusal tone. The research team introduced borderline and adversarial prompts into its safety data strategy, and revised safety data responses to follow tone guidelines.

The Llama 3.1 models are not designed to be deployed in isolation; they should be deployed as part of a larger AI system, with additional "safety guardrails" provided as needed. Developers should deploy system-level safety measures when building agent systems.

Note that this release introduces new capabilities, including a longer context window, multilingual input and output, and possible developer integrations with third-party tools. Building with these new capabilities requires, beyond the best practices that apply to all generative AI use cases, special attention to the following issues:

Tool use: as in standard software development, developers are responsible for integrating the LLM with the tools and services of their choice. They should establish clear policies for their use cases and assess the integrity of the third-party services they use, so as to understand the safety and security limitations of this capability.
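As a concrete illustration of that developer-side responsibility, here is a generic dispatch sketch (not Meta's official tool-calling template; the JSON shape and tool registry are assumptions) that validates a model-emitted tool call before executing it:

```python
import json

# Illustrative tool registry; a real deployment would also validate argument
# values and sandbox execution.
TOOLS = {"get_weather": lambda city: f"22°C and sunny in {city}"}

def dispatch(model_output: str) -> str:
    """Execute a model-emitted JSON tool call, or fall through to plain text."""
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["name"]]                 # fail closed on unknown tools
        return fn(**call["arguments"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return model_output                      # treat as an ordinary answer

print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```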

Multilingualism: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to produce text in other languages, but that text may not meet the model's safety and helpfulness performance thresholds.

The core values of Llama 3.1 are openness, inclusiveness, and helpfulness. It is meant to serve everyone and to be suitable for a wide range of use cases. It is therefore designed to be accessible to people with different backgrounds, experiences, and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may seem problematic in some contexts can serve valuable purposes in others. It respects the dignity and autonomy of all users, and in particular the values of free thought and expression that fuel innovation and progress.

However, Llama 3.1 is a new technology, and like any new technology, its use carries risks. Testing conducted to date has not covered, and could not cover, every scenario. Therefore, as with all LLMs, Llama 3.1's potential outputs cannot be predicted in advance, and in some cases the model may produce inaccurate, biased, or otherwise objectionable responses to user prompts. Before deploying any application of the Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications.

Model card source: https://pastebin.com/9jGkYbXY
References: https://x.com/op74185001874720387418520374185203743720372727203838372370383838383838
https://x.com/iScienceLuvr/status/1815519917715730702
https://x.com/mattshumer_/status/1815444612414087294
