Amid the ongoing power struggle and mass resignations at OpenAI, Microsoft, a long-standing supporter of AI research, is pressing ahead with its own AI initiatives. Today, the research division of the Satya Nadella-led company unveiled Orca 2, a pair of compact language models that match or surpass much larger language models, including Meta’s Llama-2-Chat-70B, on complex reasoning tasks in zero-shot settings.
The models come in two sizes, with 7 billion and 13 billion parameters, and build on the original 13B Orca model, which a few months ago showcased strong reasoning capabilities by emulating the step-by-step reasoning processes of more powerful models.
In a joint blog post, Microsoft researchers stated, “With Orca 2, we continue to demonstrate that improved training techniques and signals can empower smaller language models to achieve enhanced reasoning capabilities, typically associated with much larger models.”
Microsoft has open-sourced both models to encourage further research into the development and evaluation of smaller models that can match the performance of their larger counterparts. The release gives enterprises, especially those with limited resources, a more cost-effective way to address their specific use cases without extensive computing resources.
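To put the resource argument in rough numbers, the short sketch below estimates the memory needed just to hold each model's weights. It assumes 16-bit weights (2 bytes per parameter) and ignores activations, caches, and framework overhead, so real requirements will be higher; the function name is illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts taken from the model names; 2 bytes/param is an assumption.
models = [
    ("Orca 2 7B", 7e9),
    ("Orca 2 13B", 13e9),
    ("Llama-2-Chat-70B", 70e9),
]

for name, params in models:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights at 16-bit precision")
```

Under these assumptions the 70B model needs roughly five to ten times the weight memory of the two Orca 2 models, which is where much of the cost difference comes from.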
While large language models like GPT-4 have long impressed both enterprises and individuals with their reasoning abilities and complex question answering, smaller models have often fallen short in this regard. Microsoft Research aimed to bridge this gap by fine-tuning Llama 2 base models using a highly customized synthetic dataset.
Instead of simply replicating the behavior of more capable models through imitation learning, as is commonly done, the researchers trained these models to employ various solution strategies tailored to specific tasks. The rationale behind this approach was that strategies designed for larger models might not always work optimally for smaller ones. For instance, while GPT-4 can directly answer complex questions, a smaller model may benefit from breaking down the same task into multiple steps.
“In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task,” the researchers explained in a recently published paper. The training data for this project was obtained from a more capable teacher model in a way that guided the student model on when and how to use reasoning strategies for specific tasks.
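One plausible way to picture this kind of data collection is sketched below. It is not the researchers' pipeline: the strategy wording, the `build_example` helper, and the `fake_teacher` stand-in are all hypothetical, and the sketch assumes the strategy instruction is shown only to the teacher and withheld from the student, so the student has to learn which strategy fits the task.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical strategy instructions. The strategy names follow the techniques
# quoted above; the wording is invented for illustration.
STRATEGY_PROMPTS = {
    "step-by-step": "Solve the problem step by step, then state the answer.",
    "recall-then-generate": "First recall the relevant facts, then write the answer.",
    "direct-answer": "Answer directly and concisely.",
}


@dataclass
class TrainingExample:
    prompt: str    # what the student model sees
    response: str  # what the student model learns to produce


def build_example(
    task: str, strategy: str, teacher: Callable[[str, str], str]
) -> TrainingExample:
    """Have the teacher solve `task` using `strategy`, then keep only the task
    in the student's prompt (assumption: the instruction is withheld)."""
    instruction = STRATEGY_PROMPTS[strategy]
    response = teacher(instruction, task)
    return TrainingExample(prompt=task, response=response)


def fake_teacher(instruction: str, task: str) -> str:
    """Stand-in for a call to a more capable model; no real API is used."""
    return f"[{instruction}] worked answer to: {task}"


example = build_example(
    "If 3 pens cost $6, what do 5 pens cost?", "step-by-step", fake_teacher
)
print(example.prompt)
print(example.response)
```

In a real setup, `fake_teacher` would be replaced by a call to the teacher model, and the strategy chosen for each task would come from whatever selection rule the data builders apply.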
When evaluated on 15 diverse benchmarks under zero-shot conditions, covering aspects like language comprehension, common-sense reasoning, multi-step reasoning, math problem solving, reading comprehension, summarization, and truthfulness, Orca 2 models delivered impressive results, often matching or surpassing models five to ten times their size.
Averaged across the benchmarks, Orca 2 7B and 13B outperformed Llama-2-Chat-13B and 70B, as well as WizardLM-13B and 70B. Only on GSM8K, a benchmark of 8.5K high-quality grade school math problems, did WizardLM-70B perform notably better than the Orca and Llama models.
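The difference between winning on the overall average and losing on a single benchmark is easy to see in a small sketch. The scores below are placeholders invented for illustration, not results from the paper, and the model and benchmark names are hypothetical.

```python
from statistics import mean
from typing import Dict

# Placeholder accuracies in percent. These are NOT real benchmark results.
scores: Dict[str, Dict[str, float]] = {
    "small-model": {"bench_a": 60.0, "bench_b": 50.0, "bench_c": 40.0},
    "large-model": {"bench_a": 55.0, "bench_b": 45.0, "bench_c": 47.0},
}


def macro_average(per_benchmark: Dict[str, float]) -> float:
    """Unweighted mean over benchmarks, so every benchmark counts equally."""
    return mean(per_benchmark.values())


averages = {model: macro_average(results) for model, results in scores.items()}
best_overall = max(averages, key=averages.get)

# The winner on each individual benchmark can differ from the overall winner.
benchmarks = list(scores["small-model"])
best_per_benchmark = {
    bench: max(scores, key=lambda model: scores[model][bench]) for bench in benchmarks
}

print("Best on average:", best_overall)
print("Best per benchmark:", best_per_benchmark)
```

With these placeholder numbers the smaller model leads on the average while the larger one still wins one benchmark, the same pattern the article describes for GSM8K.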
It’s worth noting that, despite their outstanding performance, these models may still inherit certain limitations common to other language models and the base model upon which they were fine-tuned.
Microsoft also highlighted that the techniques used to create the Orca models can potentially be applied to other base models in the field.
Future Prospects: Despite some limitations, Microsoft sees great potential for future advancements in areas such as improved reasoning, specialization, control, and safety of smaller language models. Leveraging carefully filtered synthetic data for post-training emerges as a key strategy for these improvements. As larger models continue to excel, the work on Orca 2 represents a significant step in diversifying the applications and deployment options of language models, according to the research team.
With the release of the open-source Orca 2 models and ongoing research in this space, more high-performing, compact language models are likely to emerge in the near future. Recent developments in the AI community, such as the release of a 34-billion-parameter model by China’s 01.AI and Mistral AI’s 7-billion-parameter model, demonstrate the growing interest in smaller yet highly capable language models that can rival their larger counterparts.