
TheBloke's Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ Model

TheBloke's Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ: A Powerful Fusion of State-of-the-Art Language Models


TheBloke has released a powerful 13B parameter language model that combines several state-of-the-art models into one. This model, named Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ, demonstrates impressive performance on a range of cognitive and technical benchmarks. It is available in GPTQ format, allowing efficient inference on consumer hardware.



Large language models (LLMs) have revolutionized natural language processing in recent years. By training on vast amounts of text data, these models develop a deep understanding of language and can perform complex language tasks. Combining different LLMs that excel in specific areas has emerged as a promising approach to create even more capable models.

TheBloke, a prominent figure in the AI community, has taken this approach with the release of the Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ model[1]. This model merges several cutting-edge LLMs, including:

  • Meta AI's LLaMA 2
  • Nous Research's Hermes
  • OpenOrca's Orca
  • Garage-bAInd's Open-Platypus
  • WizardLM

The result is a highly capable 13 billion parameter model that demonstrates strong performance across a variety of benchmarks and real-world use cases. Importantly, the model is available in GPTQ format. GPTQ is a post-training quantization method that compresses the model's weights to a low bit width (typically 4 bits), so the model can run efficiently on consumer GPUs with minimal quality loss[1].
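To make the idea of weight quantization concrete, here is a minimal sketch of 4-bit quantization using plain round-to-nearest with a single shared scale. This is deliberately not GPTQ: GPTQ additionally uses approximate second-order information to compensate for rounding error, which this toy example omits. The function names and the sample weights are made up for illustration.

```python
def quantize_rtn(weights, bits=4):
    """Map floats to integer codes in [0, 2**bits - 1] using one shared scale and offset."""
    levels = 2 ** bits - 1               # 15 steps between min and max for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0    # fall back to 1.0 if all weights are equal
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Reconstruct approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


# Hypothetical weights, chosen only to illustrate the round trip.
weights = [-0.8, -0.31, 0.0, 0.12, 0.47, 0.9]
codes, scale, zero = quantize_rtn(weights)
restored = dequantize(codes, scale, zero)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error = {max_error:.4f}")
```

Each weight is stored as a 4-bit code instead of a 16-bit float, and the reconstruction error per weight is at most half a quantization step. Real quantizers apply this per group of weights rather than per tensor to keep that step small.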

Model Details

The Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ model leverages the strengths of each component model:

  • LLaMA 2 provides a strong foundation as a general-purpose language model pre-trained on a huge corpus of text data[2].
  • Hermes contributes advanced conversational abilities and factual knowledge.
  • Orca and Open-Platypus bring enhanced reasoning and problem-solving skills.
  • WizardLM adds improved instruction-following and task-completion capabilities.

By combining these models, the Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ model inherits a broad set of skills that allow it to handle diverse language tasks at a high level of proficiency.

Quantized builds are available at several bit widths, trading off file size against output quality. The files listed here are GGUF quantizations[1]:

Name | Quant method | Bits | Size
speechless-llama2-hermes-orca-platypus-wizardlm-13b.Q2_K.gguf | Q2_K | 2 | 5.43 GB
speechless-llama2-hermes-orca-platypus-wizardlm-13b.Q3_K_S.gguf | Q3_K_S | 3 | 5.66 GB
speechless-llama2-hermes-orca-platypus-wizardlm-13b.Q4_K_M.gguf | Q4_K_M | 4 | 7.87 GB

The 4-bit Q4_K_M version provides a good balance of performance and size, and is recommended for most use cases[1].
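The size column can be sanity-checked with a little arithmetic. The sketch below converts each listed file size into an effective number of bits per weight, assuming a nominal 13 billion parameters and 1 GB = 10^9 bytes (both are assumptions, not figures from the model card).

```python
N_PARAMS = 13e9  # nominal parameter count; the exact figure is an assumption

# File sizes in GB, as listed in the table above.
FILE_SIZES_GB = {"Q2_K": 5.43, "Q3_K_S": 5.66, "Q4_K_M": 7.87}


def effective_bits_per_weight(size_gb, n_params=N_PARAMS):
    """Total bits in the file divided by the number of weights."""
    return size_gb * 1e9 * 8 / n_params


for name, size_gb in FILE_SIZES_GB.items():
    print(f"{name}: {effective_bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions the effective figures come out above the nominal bit widths (roughly 3.3, 3.5, and 4.8 bits per weight). That is expected, since these formats store per-block scaling data and typically keep some tensors at higher precision.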


Early testing suggests the Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ model is highly capable across a range of cognitive and technical assessments[4]. One user ran a comprehensive test covering areas like:

  • Language comprehension
  • Logic and reasoning
  • Basic and advanced math
  • Programming and software development
  • Staying on topic in conversations

The model outperformed other 13B models and even some larger 30B+ models on these assessments[4]. It showed particular strength in creativity, programming, logic, and reasoning tasks.

The model card also reports strong results on standard benchmarks[1].


These results place the model among the top performing publicly available models in its size class. The strong all-round performance suggests the model could be highly useful for a variety of real-world applications.

Using the Model

The Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ model is easy to use with the right tools. It can be downloaded and run locally using:

  • Text generation web UIs like text-generation-webui[1]
  • Transformers and AutoGPTQ libraries in Python[1]
  • GPTQ-for-LLaMa or ExLlama as inference backends[1]

Here's an example of generating text with the model in Python using Transformers and AutoGPTQ[1]:

from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM  # provided by the auto-gptq package, not by Transformers

model_id = "TheBloke/Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the pre-quantized weights onto the first GPU (use_triton=True requires Triton to be installed)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_triton=True)
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)
prompt_template = "The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\nHuman: Hello, who are you?\nAI: "
# Generate a reply to the prompt and print the full text
print(pipe(prompt_template, max_new_tokens=128)[0]["generated_text"])

This will load the 4-bit quantized version of the model and generate a response to the given conversation prompt. The model can be further prompted to engage in open-ended conversation or to perform specific tasks.
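For multi-turn conversation, the prompt has to be rebuilt each turn with the earlier exchanges included. The helper below is a minimal sketch that follows the Human/AI layout of the example prompt in this article. The function name build_prompt and the history structure are hypothetical, and the model card may recommend a different prompt template.

```python
SYSTEM = (
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful, creative, clever, and very friendly."
)


def build_prompt(history, user_message, system=SYSTEM):
    """Build a prompt from prior (human, ai) turns plus the new user message.

    history: list of (human_text, ai_text) pairs from earlier turns.
    The prompt ends with "AI: " so the model continues as the assistant.
    """
    lines = [system, ""]
    for human_text, ai_text in history:
        lines.append(f"Human: {human_text}")
        lines.append(f"AI: {ai_text}")
    lines.append(f"Human: {user_message}")
    lines.append("AI: ")
    return "\n".join(lines)


# With an empty history this reproduces the single-turn prompt used earlier.
prompt = build_prompt([], "Hello, who are you?")
print(prompt)
```

After each generation, append the (user message, model reply) pair to the history and call the helper again with the next message. Long conversations will eventually need truncation to stay within the model's context window.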


TheBloke's Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ model represents an exciting development in open-source language models. By combining some of the most capable models available into a single package and releasing it in an efficient GPTQ format, this model makes highly capable language AI more accessible than ever before.

The model's strong performance on a range of benchmarks and cognitive tests suggests it could be highly useful for applications like:

  • Conversational AI assistants
  • Creative and technical writing aid
  • Research and data analysis
  • Educational and tutoring tools
  • Task-oriented bots and agents

As more people experiment with and build upon this model, we can expect to see even more innovative use cases emerge. TheBloke and the open-source AI community continue to push the boundaries of what's possible with democratized access to powerful language models.
