Now free, 10 times faster than Gemini, 18 times faster than GPT-4!

Groq: https://groq.com/

Groq builds an AI chip that is produced entirely through a US domestic supply chain. It is arguably the fastest AI inference chip in the world at present.

Mixtral 8x7B-32k and Llama 2 70B-4k, running on this chip, can output 500 tokens per second, which is 10 times faster than Gemini and 18 times faster than GPT-4!
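To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes a steady 500 tokens per second and reads "10 times faster" and "18 times faster" as plain 10x and 18x ratios. The function names are made up for illustration, and the speeds it derives for other models are implied by the ratios, not measured.

```javascript
// Rough generation-time arithmetic based on the throughput quoted above.
// The 500 tokens/s figure and the 10x / 18x ratios come from the post;
// everything derived from them is an implied estimate, not a benchmark.
const GROQ_TOKENS_PER_SECOND = 500;

// Seconds needed to emit `tokens` tokens at a steady `tokensPerSecond`.
function secondsToGenerate(tokens, tokensPerSecond) {
  if (tokensPerSecond <= 0) {
    throw new RangeError('tokensPerSecond must be positive');
  }
  return tokens / tokensPerSecond;
}

// Throughput implied for a model that Groq is `ratio` times faster than.
function impliedTokensPerSecond(ratio) {
  return GROQ_TOKENS_PER_SECOND / ratio;
}

const replyTokens = 1024; // an example reply length
for (const [name, ratio] of [['Groq', 1], ['Gemini', 10], ['GPT-4', 18]]) {
  const speed = impliedTokensPerSecond(ratio);
  const seconds = secondsToGenerate(replyTokens, speed);
  console.log(`${name}: ~${speed.toFixed(1)} tokens/s, ~${seconds.toFixed(1)} s for ${replyTokens} tokens`);
}
```

Under these assumptions, a 1024-token reply takes about 2 seconds on Groq, versus roughly 20 seconds at one tenth of the speed.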

Official website introduction:
We created the LPU™ inference engine, the first and fastest of its kind, to serve the real-time AI market. Our inference (not training) solution makes us a leader in AI performance in terms of speed and accuracy. Unlike other providers, we are not a broker for cloud services. We build our own chips, compilers and software, systems, and GroqCloud™. Our first-generation GroqChip™ is a Language Processing Unit™ (LPU), a new category of processor. This is part of our secret. Our performance gives artificial intelligence greater potential across multiple industries. It is a solution for real-time AI, low latency, and low batch sizes.

How to call it:
Apply for an API key here: https://console.groq.com/keys

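// Requires the groq-sdk package (npm install groq-sdk).
const Groq = require('groq-sdk');

// With no arguments, the client reads the API key from the GROQ_API_KEY environment variable.
const groq = new Groq();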
async function main() {
  const chatCompletion = await groq.chat.completions.create({
    "messages": [
      {
        "role": "system",
        "content": "Please answer my question in Chinese"
      },
      {
        "role": "user",
        "content": "Which model are you, and what are you waiting for?"
      },
      {
        "role": "assistant",
        "content": "Hi! I am an artificial intelligence assistant, aiming to answer your questions, address your concerns, and provide information on various topics. I can help you with mathematical calculations, provide encyclopedic knowledge, and perform various types of language processing tasks, such as information retrieval, language adjustment, and even poetry composition. Is there anything I can do for you?"
      }
    ],
    "model": "mixtral-8x7b-32768",
    "temperature": 0.5,
    "max_tokens": 1024,
    "top_p": 1,
    "stream": true,
    "stop": null
  });

  // With stream: true the response is an async iterable; print each piece of text as it arrives.
  for await (const chunk of chatCompletion) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

// Log API or network errors instead of leaving the rejection unhandled.
main().catch(console.error);
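If you want the whole reply as one string instead of printing it piece by piece, the streaming loop can be wrapped in a small helper. This is only a sketch: collectStream and fakeStream are hypothetical names, not part of groq-sdk, and the chunk shape is assumed to match the one used in the example above. The fake stream stands in for a real response so the snippet runs without an API key.

```javascript
// Collect a streamed chat completion into one string.
// Assumes each chunk looks like the ones in the example above:
// chunk.choices[0].delta.content holds the next piece of text, or is absent.
async function collectStream(stream) {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content || '';
  }
  return text;
}

// Hypothetical stand-in for a real streamed response (no network needed).
async function* fakeStream() {
  yield { choices: [{ delta: { content: 'Hello' } }] };
  yield { choices: [{ delta: { content: ', world' } }] };
  yield { choices: [{ delta: {} }] }; // a final chunk may carry no content
}

collectStream(fakeStream()).then((text) => console.log(text));
```

With the real client, you would pass the value returned by groq.chat.completions.create (called with stream set to true) in place of fakeStream().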

Supported models:
Only three models are supported at present, but all of them are free to try:

LLaMA2-70b
Mixtral-8x7b
Gemma-7b-it

Use the API key link above to quickly apply for a few keys.

The rate limits are quite generous:
• 30 requests per minute (RPM)
• 14,400 requests per day (RPD)
• 40,000 tokens per minute (TPM)
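
To stay under the per-minute request limit listed above, you can throttle on the client side. Below is a minimal sliding-window sketch; RateLimiter is a hypothetical helper, not part of groq-sdk, and it only tracks request counts (the token limit would need separate bookkeeping).

```javascript
// Sliding-window limiter sketch for a "N requests per window" limit.
// RateLimiter is a hypothetical helper, not something groq-sdk provides.
class RateLimiter {
  constructor(maxRequests, windowMs, now = Date.now) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;       // injectable clock, handy for testing
    this.timestamps = []; // send times of requests still inside the window
  }

  // Milliseconds to wait before the next request may be sent (0 = send now).
  msUntilAllowed() {
    const t = this.now();
    // Drop requests that have already left the window.
    while (this.timestamps.length > 0 && t - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxRequests) {
      return 0;
    }
    return this.timestamps[0] + this.windowMs - t;
  }

  // Record a request; returns false without recording if the limit is reached.
  tryAcquire() {
    if (this.msUntilAllowed() > 0) {
      return false;
    }
    this.timestamps.push(this.now());
    return true;
  }
}

// 30 requests per minute, as listed above.
const limiter = new RateLimiter(30, 60 * 1000);
if (limiter.tryAcquire()) {
  console.log('OK to send a request now');
} else {
  console.log(`Wait ${limiter.msUntilAllowed()} ms before the next request`);
}
```

Note that 14,400 requests per day averages out to 10 per minute, so sending at the full 30 RPM would use up the daily quota in 8 hours.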