April 28, 2023 by Daniela Dapena
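Large language models (LLMs) have taken center stage, amazing everyone from tech giants to small business owners. These powerful models can understand complex information and give human-like responses to a wide range of questions.
But with so many options out there, how do you know which one really stands out?
It's like searching for the Holy Grail, except that instead of a cup we are looking for the best LLM. We dug in and tested a variety of LLMs, analyzing the quality, speed, ease of use, and naturalness of their responses. Read on to see what we found!
GPT-3 and GPT-4 clearly outperform the other models, but they are harder to access because you have to pay for them and share your data with OpenAI.
Flan-t5 (11B) and Lit-LLaMA (7B) answered all the questions accurately and are publicly available. They are not very funny, but their performance is solid.
We tested different LLMs by asking them about a topic they had not seen before. The goal was to see how well they think on their feet and whether they can generate accurate, relevant responses. To keep things interesting, we picked a variety of models from the four main categories on the market: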
Model | Size | Use | Training code available | Inference code available | Finetuning code available | Code license | Weights license | Instruction-tuned / foundation model | Backbone |
Bloom | 176B | Restricted applications | yes | yes | - | Responsible AI (OpenRail) | Responsible AI (OpenRail) | Foundation model | - |
Bloomz | 176B | Restricted applications | - | yes | yes | Responsible AI (OpenRail) | Responsible AI (OpenRail) | Instruction-tuned | Bloom |
Cerebras-GPT | 111M-13B | Commercial | - | yes | - | Apache 2.0 license | Apache 2.0 license | Foundation model | - |
ChatGPT | ? | paid API | - | - | - | public web API | public web API | Instruction-tuned | ? |
Dolly-V2 | 3/7/12B | Commercial | - | yes | yes | Apache License 2.0 | Apache License 2.0 | Instruction-tuned | Pythia |
Flan-t5 | 60M-11B | Commercial | - | yes | yes | Apache License 2.0 | Apache License 2.0 | Instruction-tuned | t5 |
Flan-UL2 | 20B | Commercial | - | yes | - | Apache License 2.0 | Apache License 2.0 | Instruction-tuned | UL2 |
GPT-3 | 175B | paid API | - | - | - | Public web API + OpenAI ToS | Public web API + OpenAI ToS | Foundation model | ? |
GPT-4 | ? | paid API | - | - | - | Public web API + OpenAI ToS | Public web API + OpenAI ToS | Finetuned with RLHF | ? |
GPT-J | 6B | Commercial | yes | yes | - | Apache License 2.0 | Apache License 2.0 | Foundation model | - |
GPT4All-J | 6B | Commercial | - | yes | yes | Apache License 2.0 | Apache License 2.0 | Fine-tuned | GPT-J |
GPT-NeoX | 20B | Commercial | yes | yes | - | Apache License 2.0 | Apache License 2.0 | Foundation model | - |
LLaMA | 7/13/33/65B | Non-commercial research | - | yes | - | GPL 3.0 | Non-commercial research | Foundation model | - |
Lit-LLaMA | 7/13/33/65B | Non-commercial research | yes | yes | yes | Apache License 2.0 | Non-commercial research | Foundation model | LLaMA |
Lit-LLaMA + Alpaca | 7B | Non-commercial research | - | yes | yes | Apache License 2.0 | Non-commercial research | Instruction-tuned | LLaMA |
OPT | 125M-66B | Non-commercial research | yes | yes | - | MIT License | Non-commercial research | Foundation model | - |
OPT-IML | 30B | Non-commercial research | yes | yes | - | MIT License | Non-commercial research | Instruction-tuned | OPT |
Pythia | 70M-12B | Commercial | yes | yes | yes | Apache License 2.0 | Apache License 2.0 | Foundation model | - |
t5 | 11B | Commercial | yes | yes | yes | Apache License 2.0 | Apache License 2.0 | Foundation model | - |
UL2 | 20B | Commercial | - | yes | - | Apache License 2.0 | Apache License 2.0 | Foundation model | - |
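All of these models are essentially transformer-based designs, each with its own optimizations. They may tweak the architecture, change the parameters, switch between decoder-only and encoder-decoder styles, or fine-tune a base model. We could have gone on to cover other impressive fine-tuned models such as Vicuna, GPT4all (based on LLaMA), and Phoenix, but to keep this post from getting too long we decided to leave them out for now.
We decided to test the LLMs by challenging them with questions about a recent event that they could not have been trained on: Elon Musk's Twitter username problem. Using this event as our template, we asked the LLMs a variety of questions. You can check out our code in this repository!
Code: https://github.com/Lightning-AI/Comparing_LLM_Blogpost
We placed each question into a template and then passed the filled-in template to the LLM to generate an answer.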
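The templating step described above can be sketched roughly as follows. This is only an illustration, not the code from our repository: the `TEMPLATE` wording is an assumption pieced together from phrases that some models echoed back in their answers ("We have provided context information below." and "Using only this information, please answer the question:"), and `build_prompt` is a hypothetical helper name.

```python
# Hypothetical prompt template. The wording is an assumption inferred from
# phrases echoed in the model outputs; the exact template lives in the repo.
TEMPLATE = (
    "We have provided context information below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only this information, please answer the question: {question}\n"
)


def build_prompt(context: str, question: str) -> str:
    """Fill the template with the article text and a single question."""
    return TEMPLATE.format(context=context.strip(), question=question.strip())


if __name__ == "__main__":
    # Placeholder context; in the experiment this would be the news article.
    article = "Elon Musk got stuck with a new screen name on Twitter."
    print(build_prompt(article, "How did this get started?"))
```

The resulting string is what would be handed to each model, so every model sees the same context and the same question wording.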
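Once everything was set up, we started asking the models questions, some simple and some more complex.
It turns out that the open-source models available for commercial use do quite well on simple questions. However, none of them seems to have a sense of humor! Sometimes they fail to pick up on irony. We found that Flan-t5 generated the most human-like and natural answers, so we declared it the winner of this category. Without a doubt, for applications like the one in this post, fine-tuning is well worth it.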
Question | Flan-t5 11B | t5 11B | Flan UL2 20B | UL2 20B | Dolly-V2 12B | Pythia 12B | GPT-J 6B | GPT4All-J 6B | GPT-NeoX 20B | Cerebras-GPT 13B |
what’s Elon’s new Twitter username? | Mr. Tweet | on”’on””””””’.” “Mr. Tweet is a good name,” he wrote. “It’s probably an accurate description,” | Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The correct answer is Mr. Tweet. The | Mr. Tweet. Good luck! The ‘Mr. Tweet’ nickname was given to Musk by a lawyer while he was in court this week. Musk complained about his new name on Twitter, and didn’t receive much sympathy in return. Musk’s new nickname was picked at random, as some explained how the joke came | Mr. Tweet | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. | Mr. Tweet Explanation: Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. | Elon Musk’s new Twitter username is “Mr. Tweet”. | Mr. Tweet Using only this information, please answer the question: what’s Elon’s new twitter username? Using only this information, please answer the question: what’s Elon’s new twitter username? Using only this information, please | Elon Musk’s Twitter username is now “Mr. Tweet” |
why is it funny that he cannot change it? | The irony that the owner and CEO of Twitter couldn’t change his own name | . Musk”””””””ked. “It’s probably an accurate description,” another user wrote. “Have you tried calling the help desk?” “Have you tried calling the help desk | Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is | Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is the owner and CEO of Twitter. Answer: Because he is | Because it is very unusual for the owner and CEO of Twitter to find himself stuck with an unfit Twitter name, especially one that also references another Twitter username he owns. | The Tesla case is a classic example of a case where a company can change its name without the need to change it. The Tesla case is a classic example of a case where a company can change its name without the need to change it. The Tesla case is a classic example of a case where a company can change its name | The answer is that he is a billionaire and he is a celebrity. A: I think it’s funny because he’s a billionaire and a celebrity. A: I think it’s funny because he’s a billionaire and a celebrity. | It is funny that Elon Musk cannot change his Twitter name because he accidentally received the nickname from a lawyer while he was in court this week. | Because he is the owner and CEO of Twitter, and he can therefore change it. | he answer is that it is funny because it is a joke. |
make a joke about this | he must now seemingly go by the name “Mr. Tweet” for the foreseeable future | provided ”””””””””” “Mr. Tweet in the house,” he wrote. “Mr. Tweet in the house,” she wrote. | make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about | joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about this Answer: make a joke about | Mr. Tweet in the house… | The Chainsaw is a joke. | A: | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. | A: The answer is Elon Musk Because The first letter of each word is the first letter of the name of a famous person. | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. |
How did this get started? | Musk inadvertently received the nickname from a lawyer while he was in court this week | .Musk””””””””’ how Musk got stuck with the name. “Mr. Tweet in the house,” he wrote.ifer”. | Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. The owner and CEO of Twitter has encountered the same problem as others have had before, and he must now seemingly go by the name “Mr. Tweet” for the foreseeable future. | Musk’s lawyer, David Schwartz, called him “Mr. Tweet” while he was being cross-examined. Musk replied that “Mr. Tweet” was all good. Musk’s lawyer, David Schwartz, called him “Mr. Tweet” while he was being cross-examined. | The owner and CEO of Twitter had encountered the same problem as others have had before, and he received the name “Mr. Tweet” from a lawyer while he was in court this week. | The Tesla case was filed in the U.S. District Court for the District of Columbia on June 30, 2018. The case was filed in the U.S. District Court for the District of Columbia on June 30, 2018. The case was filed in the U.S. District Court for the District of Columbia on June 30, 2018 | The answer is: The lawyer who is cross-examining Elon Musk accidentally just called him “Mr. Tweet” instead of “Mr. Musk.” Elon says “Mr. Tweet” is all good. “That’s probably an accurate description,” he says. A: The lawyer who is cross-exam | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. | The name “Mr. Tweet” was given to Elon Musk by his lawyer during a court case. | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. |
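For GPT-NeoX, we had to ask the same question several times to get a good answer. This is because the model implementation at https://nat.dev/compare introduces some randomness into the generation process, which makes it somewhat less predictable than the other models, which we ran with the transformers library. So we want to show you some examples of the less-than-ideal answers we got: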
Question | GPT-NeoX 20B (https://nat.dev/compare) |
why is it funny that he cannot change it? | Mr. Tweet in the house… Mr. Tweet, aka Musk, regularly gets hundreds of thousands of interactions with his tweets. His complaint about his name got more than usual, while some reveled in his dilemma. It’s not the first time a celebrity has found themselves stuck with a Twitter name they didn’t |
make a joke about this | A: I think the answer is Elon Musk Because The lawyer who is cross-examining Elon Musk accidentally just called him ‘Mr. Tweet’ instead of ‘Mr. Musk’. Elon says ‘Mr. Tweet’ is all good. ‘That’s probably an accurate description’, |
How did this get started? | which was a parody of the character “Mr. Peanut” from the movie “Casablanca” and that was a parody of the character “Mr. M |
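To illustrate why randomness in generation makes answers less repeatable, here is a toy sketch that contrasts greedy decoding with temperature sampling. We do not know the exact settings used at nat.dev, so treating temperature sampling as the source of the randomness is an assumption, and the tokens and scores in `LOGITS` are made up for illustration.

```python
import math
import random

# Toy next-token scores (logits). Tokens and values are invented for this example.
LOGITS = {"Mr. Tweet": 2.0, "@elonmusk": 1.0, "Mr. Peanut": 0.5}


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy_pick(logits):
    """Deterministic decoding: always return the highest-scoring token."""
    return max(logits, key=logits.get)


def sample_pick(logits, temperature, rng):
    """Stochastic decoding: draw a token according to its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print("greedy :", [greedy_pick(LOGITS) for _ in range(5)])
    print("sampled:", [sample_pick(LOGITS, 1.0, rng) for _ in range(5)])
```

Greedy decoding returns the same token on every call, while sampling can return a lower-probability token on any given call, which is consistent with having to ask the same question several times before getting a good answer.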
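The Bloom and Bloomz models do fairly well at answering most questions briefly, but they can miss important details. They also still cannot tell a joke.
Keep in mind that these models come with usage restrictions under the OpenRAIL license. OpenRAIL stands for Open and Responsible AI License, and it is intended to prevent harmful uses. These restrictions can be hard to navigate, however, and may need extra attention.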
Question | Bloomz 7B | Bloomz 176B (https://nat.dev/compare) | Bloom 176B |
what’s Elon’s new Twitter username? | Mr. Tweet | Mr. Tweet | Is Elon’s new Twitter username ‘Mr. Tweet’, or is it something else? |
why is it funny that he cannot change it? | Because he is the CEO of Twitter | he is the owner and CEO of Twitter | Is Elon the one with the problem? Or is Mr. Tweet the one with the problem? What makes the situation funny, exactly? If the answer to that is that Mr. Tweet (aka Musk) doesn’t see the problem (because Musk is often not looking at the problems of others and has his own goals |
make a joke about this | No | Twitter users | Is Elon the one with the problem? Or is Mr. Tweet the one with the problem? |
How did this get started? | A lawyer called Elon Musk “Mr. Tweet” | A lawyer called Musk “Mr. Tweet” in court | s Elon the one with the problem? Or was the lawyer the one with the problem? The issue has no relevance to the existing question, so the answer to that question is simply “nevermind”. |
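The models restricted to non-commercial research (OPT and the Lit-LLaMA variants) perform better than the commercially usable open-source models, and they do understand that the article contains some funny elements:
"It is funny because it is ironic. Musk is trying to change the world, but he cannot change his own name."
However, none of them pointed out how ironic this is given that Elon is the CEO of Twitter.
Although Lit-LLaMA tried to make a joke and Alpaca offered an unrelated one, these models still struggle to come up with genuinely funny jokes. Sometimes they just repeat content from the article rather than generating a natural response of their own. Alpaca also had trouble providing Elon's new username, which was a little disappointing.
One more thing: Lit-LLaMA is a bit quirky. Sometimes it can't help producing its answers in LaTeX format! For example, when asked how this got started, it replied: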
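\begin{blockquote} The lawyer who is cross-examining Elon Musk accidentally just called him "Mr. Tweet" instead of "Mr. Musk." Elon says "Mr. Tweet" is all good. "That's probably an accurate description." \end{blockquote}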
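We ran these tests with the Lit-LLaMA codebase and the LLaMA weights, which may be used for research only; any other use would violate the license. Lit-LLaMA with weights licensed for commercial use is not yet available.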
Question | OPT 66B | OPT 30B | Lit-LLaMA + Llama weights 7B | Lit-LLaMA + Alpaca weights 7B |
what’s Elon’s new Twitter username? | Mr. Tweet | Mr. Tweet | Mr. Tweet | Elon’s new Twitter username is @elonmusk ### Explanation: Elon’s new Twitter username is @elonmusk. |
why is it funny that he cannot change it? | We have provided context information below. | It’s not the first time a celebrity has found themselves stuck with a Twitter name they didn’t want | The answer is that the name “Mr. Tweet” is funny because it is a pun on the name of the social media platform Twitter. | It is funny because it is ironic. Musk is trying to change the world, but he cannot change his own name. |
make a joke about this | We have provided context information below. | “Have you tried calling the help desk?” | Mr. Tweet please, let me be Elon again. Answer: Hey Mr. Tweet, how’s the Tesla trial going? Answer: Mr. Tweet in the house… Answer: I personally think Mr. Tweet is better—has more person | “I’m not sure what to say, but I’m sure it’s going to be a Tesla-killer.” Comment: This is a good answer. Please [edit] it to add some explanation of why it answers the question. It’s not clear to me why this is |
How did this get started? | Elon Musk’s Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name. | The lawyer who is cross-examining Elon Musk accidentally just called him ‘Mr. Tweet’ instead of ‘Mr. Musk.’ | begin{blockquote} | The trial started after Musk and Tesla were sued by the SEC for misleading investors. The SEC alleged that Musk had made false and misleading statements a |
Without a doubt, these models live up to their reputation. GPT-3 and GPT-4 definitely have a sense of humor.
Question | GPT3 175B | GPT4 Unknown | ChatGPT Unknown (gpt-3.5-turbo) |
what’s Elon’s new Twitter username? | Elon’s new Twitter username is “Mr. Tweet.” | Mr. Tweet | Elon’s new Twitter username is “Mr. Tweet”. |
why is it funny that he cannot change it? | It is funny that Elon Musk, the owner and CEO of Twitter, cannot change his own name on the platform because of the irony that he should be able to do so. This is especially funny considering he is the one in charge of the platform and should know how to fix the issue. Additionally, it is also humorous that he accidentally got the nickname of “Mr. Tweet” while in court, and even liked the tweet at the time. | It is funny that Elon Musk cannot change his Twitter name because he is the owner and CEO of Twitter, and one would expect him to have the power or ability to change his own name on the platform. The irony of the situation is what makes it amusing. | It is funny that Elon Musk cannot change his Twitter screen name because he is the owner and CEO of Twitter, yet he is stuck with a nickname given to him by a lawyer in court. |
make a joke about this | Q: What did Elon Musk’s lawyer say when Musk asked for help with his Twitter name? | Why did Elon Musk become a bird watcher? Because he’s now an expert at spotting Mr. Tweets! | Looks like Elon Musk has got a new job title – “Mr. Tweet”! I guess we can officially say that Twitter owns him now. |
How did this get started? | This started when a lawyer accidentally called Elon Musk “Mr. Tweet” during the Tesla shareholder trial. Musk acknowledged the nickname and said “That’s probably an accurate description.” Later, singer Doja Cat asked Elon Musk for help with her own Twitter name issue, to which Musk replied that they were working on it and that it was “pretty funny though.” This prompted Musk’s followers to joke about his new pseudonym and the irony of the owner and CEO of Twitter having trouble changing his own name. | This got started when a lawyer accidentally called Elon Musk “Mr. Tweet” instead of “Mr. Musk” while he was in court for the Tesla shareholder trial. | Elon Musk received the nickname “Mr. Tweet” from a lawyer while he was in court during the Tesla shareholder trial, and he inadvertently became stuck with it as his screen name on Twitter. He shared his misfortune with his millions of followers, which got more attention than usual, while some people found it amusing. It’s not the first time |