
Reflecting on a year of significant transformation in the realm of AI.

A year has passed since OpenAI quietly introduced ChatGPT as a “research preview,” a chatbot powered by a sophisticated large language model (LLM). LLMs are a specific application of transformer neural networks, an architecture first presented in Google’s 2017 paper “Attention Is All You Need.”

ChatGPT offered a user-friendly interface to the underlying LLM, GPT-3.5, and became the fastest-growing consumer technology in history, attracting over a million users within just five days of its launch. Today, there are hundreds of millions of ChatGPT users, and various other similar bots built on different LLMs from multiple companies have emerged. One of the latest additions is Amazon Q, a chatbot tailored for business purposes.

These technological advancements have the potential to reshape creative and knowledge work profoundly. For instance, a recent MIT study focused on tasks such as crafting cover letters, composing sensitive emails, and conducting cost-benefit analyses. The study demonstrated that using ChatGPT led to a 40% reduction in the time required to complete these tasks and an 18% improvement in output quality, as evaluated by independent assessors.

Comparisons to foundational discoveries like electricity and fire are apt because AI, like these innovations, has the power to revolutionize nearly every facet of our lives. It can alter how we work, communicate, and address complex challenges, much like electricity transformed industries, and fire changed early human societies.

Consulting firm McKinsey has estimated that generative AI could contribute over $4 trillion annually to the global economy, and tech giants like Microsoft and Google are racing to capture that market.

Debates about the impact and safety of AI technology have been ongoing since the advent of ChatGPT. These debates, spanning from the U.S. Congress to historic Bletchley Park (the hub of British code-breaking during World War II), essentially fall into two camps: AI “accelerationists” and “doomers.”

Accelerationists advocate for rapid AI development, highlighting its immense potential benefits, while “doomers” advocate for a cautious approach that emphasizes the potential risks associated with unchecked AI development. These debates have prompted significant actions in AI regulation. While the EU AI Act has been in development for several years, the U.S. has taken a proactive stance with a comprehensive Executive Order on “Safe, Secure, and Trustworthy Artificial Intelligence,” aiming for a balanced approach between unbridled development and rigorous oversight.

Countries worldwide are actively pursuing AI strategies in response to the LLM revolution. Russian President Vladimir Putin has recently announced plans for a new Russian AI development strategy to counter Western dominance in the field, albeit belatedly, as the U.S., China, the U.K., and others have already made substantial progress. Interestingly, Putin had famously stated in 2017 that the nation leading in AI “will be the ruler of the world.”

Reflecting on this whirlwind year in AI, one might have thought it reached its peak when OpenAI’s board of directors fired Sam Altman, the CEO. However, Altman returned within a week following an investor and employee revolt, and the board underwent changes.

Now, a new enigma surrounds OpenAI in the form of Project Q* (pronounced “Q-star”). The origin of the name has not been confirmed: one theory holds that “Q” nods to the Quartermaster, the top-secret gadget-maker of the James Bond films, while others speculate it references a combination of Q-learning and the A* search algorithm.

According to Reuters, the OpenAI board received a letter from researchers just days before Altman’s dismissal, warning that Q* could pose a threat to humanity. Speculation abounds regarding what Q* might entail, ranging from a groundbreaking neuro-symbolic architecture (a significant development) to a more modest yet impressive fusion of LLMs and existing techniques to outperform current state-of-the-art models.

An effective neuro-symbolic architecture of this scale does not currently exist but could enable AI to learn from minimal data while offering clearer explanations for its behavior and reasoning. Several organizations, including IBM, view this architecture as a pathway to achieving Artificial General Intelligence (AGI), the capacity for AI to process information at or beyond human capabilities, at machine speed.

Although Q* may not represent such a breakthrough, if it enters the market, it would mark another step toward AGI. NVIDIA CEO Jensen Huang has even suggested that AGI could be attainable within five years. Microsoft President Brad Smith, on the other hand, has a more conservative view, stating that achieving AGI, where computers surpass human capabilities, will likely take many years, if not decades.

The year ahead promises a wide range of emotions and developments. Breakthroughs like ChatGPT and projects like Q* have sparked optimism, concerns, regulatory discussions, competition, and speculative thoughts. The rapid advancements in AI over the past year are not just technological milestones but also a reflection of our unwavering pursuit of knowledge and mastery over our creations.

As we look ahead, the coming year is shaping up to be as exciting and unsettling as the last, depending on how effectively we channel our energy and guide this transformative field.

Reportedly, Adobe has acquired the text-to-video AI platform known as Rephrase.

As the five-day power struggle at OpenAI reaches its conclusion with Sam Altman’s reinstatement, Adobe is gearing up to enhance its generative AI capabilities. According to a report from the Economic Times, the software giant has internally announced its acquisition of Rephrase, a California-based company specializing in text-to-video technology.

While the exact financial details of the deal remain undisclosed, this move is poised to strengthen Adobe’s suite of Creative Cloud products, which have steadily incorporated generative AI improvements over the past year. In particular, Rephrase will enable Adobe to empower its customers to effortlessly produce professional-quality videos from text inputs.

CEO Ashray Malhotra of Rephrase disclosed the acquisition through a LinkedIn post but refrained from explicitly naming Adobe, referring to the acquiring entity as a “leading tech giant.” When pressed for further details, he cited limitations on sharing information at this stage.

What Rephrase brings to the table: Established in 2019 by Ashray Malhotra, Nisheeth Lahoti, and Shivam Mangla, Rephrase offers enterprises access to Rephrase Studio, a platform enabling users to create polished videos featuring digital avatars in mere minutes. The process involves selecting a video template, choosing an avatar along with the desired voice, and adding the necessary content.

Upon initiating the rendering process within Rephrase, the platform automatically combines all elements, synchronizing the script with the chosen avatar. Users can enhance their content’s naturalness through various customization options, such as resizing avatars, altering backgrounds, adjusting pauses between words, or incorporating custom audio.

Over the past four years, Rephrase has amassed over 50,000 customers and secured nearly $14 million in funding from multiple investors, including Red Ventures and Lightspeed India. Initially known for enabling enterprises and influencers to create custom avatars for personalized business videos, Rephrase will now bring these capabilities, along with a significant portion of its team, into Adobe’s fold, bolstering Adobe’s generative AI video offerings.

Ashley Still, Senior Vice President and General Manager for Adobe Creative Cloud, wrote in the internal memo, “The Rephrase.ai team’s expertise in generative AI video and audio technology, and experience building text-to-video generator tools, will extend our generative video capabilities—and enable us to deliver more value to our customers faster—all within our industry-leading creative applications.”

When VentureBeat reached out to Adobe for comment, a spokesperson declined to provide additional insights into the development or how Rephrase’s tools will complement Adobe’s product portfolio.

Adobe’s Strong Embrace of AI: In recent months, Adobe has been at the forefront of advancing generative AI with several product updates. It introduced Firefly, an AI engine for image generation, which was integrated across Creative Cloud products like Photoshop. This innovation allowed users to manipulate images by describing changes in plain text.

Furthermore, at its annual Max conference last month, Adobe showcased various experimental generative AI-powered video features, including upscaling videos, changing textures and objects through text prompts, and compositing subjects and scenes from separate videos. While the timeline for the incorporation of these features into future releases remains uncertain, Rephrase’s digital avatar-based capabilities appear to be a promising addition.

Ashray Malhotra expressed his excitement for the future of Generative AI, emphasizing that it’s still in its early stages. Adobe Creative Cloud, known for decades as the dominant platform for digital art and media, currently offers six main products for audio and video-related work: Premiere Pro, After Effects, Audition, Character Animator, Animate, and Media Encoder. These tools are used by both professionals and amateurs to create, edit, and share digital content, leaving a lasting impact on online communities and trends through countless memes, parodies, and viral art.

OpenAI, rising from adversity, faces a significant challenge despite Sam Altman’s comeback

The OpenAI power struggle that has gripped the tech world since the removal of co-founder Sam Altman has finally come to a resolution, at least for now. But what does it all mean?

It almost feels like we should be eulogizing OpenAI, as if it has undergone a transformation where the old organization has given way to a new, but not necessarily improved, startup. Sam Altman, the former president of Y Combinator, has returned to lead the company, but is this return justified? The new board of directors is notably less diverse, consisting entirely of white males, and there are concerns that the company’s original philanthropic mission is being overshadowed by more profit-driven interests.

However, it’s important to acknowledge that the old OpenAI had its imperfections as well. Until recently, the organization had a six-person board with a nonprofit entity holding a majority stake in its for-profit activities. The nonprofit’s charter focused on ensuring that artificial general intelligence benefits all of humanity, with little mention of profit or revenue. This structure was established with good intentions by the company’s co-founders, including Sam Altman, but it faced challenges once investors and powerful partners became involved.

Altman’s abrupt firing led to a backlash from OpenAI’s backers, including Microsoft and prominent venture capitalists, who were dissatisfied with the decision. OpenAI employees, many of whom were aligned with these outside investors, threatened mass resignation if Altman wasn’t reinstated. This turmoil also jeopardized a potential sale of employee shares that could have significantly increased the company’s valuation.

After a tumultuous period, a resolution has been reached. Altman and Greg Brockman have returned, subject to background investigations. OpenAI now has a new transitional board, and the company aims to maintain a structure that limits investor profits and emphasizes mission-driven decision-making.

However, it’s still too early to declare a clear victory for the “good guys.” While Altman may have won the battle for control, questions remain about the validity of the board’s concerns regarding Altman’s leadership. The new board members may interpret the company’s mission differently.

The transitional board lacks diversity, with only three members named so far. This homogeneity doesn’t align with the goal of ensuring diverse viewpoints, and in several European countries such a composition would even be illegal under laws mandating a minimum representation of women on corporate boards.

AI academics and experts have expressed disappointment with the all-male board and the nomination of Larry Summers, who has made unflattering remarks about women in the past. There are concerns that a board like this may not consistently prioritize responsible AI development, especially when it comes to addressing societal challenges and biases.

The question arises: Why didn’t OpenAI consider recruiting well-known AI ethicists like Timnit Gebru or Margaret Mitchell for the initial board? The selection of the remaining board members presents an opportunity for OpenAI to demonstrate a commitment to diversity and responsible AI development. Otherwise, the AI community may continue to question whether a small group can be trusted to ensure responsible AI development for all of humanity.

Microsoft introduces Orca 2, a pair of compact language models that outperform much larger counterparts.

In the midst of the ongoing power struggle and threatened mass resignations at OpenAI, Microsoft, a long-standing supporter of AI research, continues to advance its AI initiatives without slowing down. Today, Microsoft’s research division unveiled Orca 2, a pair of compact language models that demonstrate remarkable performance, surpassing much larger language models, including Meta’s Llama-2-Chat-70B, in complex reasoning tasks conducted under zero-shot conditions.

These models come in two sizes, 7 billion and 13 billion parameters, and build upon the groundwork laid by the original 13B Orca model, which showcased strong reasoning capabilities a few months ago by emulating the step-by-step reasoning processes of more powerful models.

In a joint blog post, Microsoft researchers stated, “With Orca 2, we continue to demonstrate that improved training techniques and signals can empower smaller language models to achieve enhanced reasoning capabilities, typically associated with much larger models.”

Microsoft has made both of these new models open-source to encourage further research into the development and evaluation of smaller models that can match the performance of their larger counterparts. This initiative provides enterprises, especially those with limited resources, a more cost-effective option to address their specific use cases without the need for extensive computing resources.

Teaching Small Models to Reason:

While large language models like GPT-4 have long impressed both enterprises and individuals with their reasoning abilities and complex question answering, smaller models have often fallen short in this regard. Microsoft Research aimed to bridge this gap by fine-tuning Llama 2 base models using a highly customized synthetic dataset.

Instead of simply replicating the behavior of more capable models through imitation learning, as is commonly done, the researchers trained these models to employ various solution strategies tailored to specific tasks. The rationale behind this approach was that strategies designed for larger models might not always work optimally for smaller ones. For instance, while GPT-4 can directly answer complex questions, a smaller model may benefit from breaking down the same task into multiple steps.

“In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task,” the researchers explained in a recently published paper. The training data for this project was obtained from a more capable teacher model in a way that guided the student model on when and how to use reasoning strategies for specific tasks.
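The idea of strategy-conditioned training data can be sketched in a few lines. This is a loose illustration of the pattern described above, not Microsoft’s actual pipeline: the strategy names, routing table, and record fields are all hypothetical. The key move is that the teacher model is prompted with an explicit strategy instruction, while the example stored for the student keeps only a generic system message, forcing the student to learn when each strategy applies.

```python
# Illustrative sketch of strategy-conditioned synthetic data, loosely modeled
# on the approach described in the Orca 2 paper. All names are hypothetical.

STRATEGIES = {
    "step_by_step": "Solve the problem by reasoning through it one step at a time.",
    "recall_then_generate": "First recall the relevant facts, then compose the answer.",
    "direct_answer": "Answer directly and concisely, without intermediate work.",
}

def pick_strategy(task_type: str) -> str:
    """Toy router: map a task type to the strategy the teacher model is
    prompted with (the real mapping would be curated per task)."""
    routing = {
        "math_word_problem": "step_by_step",
        "factual_qa": "recall_then_generate",
        "classification": "direct_answer",
    }
    return routing.get(task_type, "step_by_step")

def make_training_example(task_type: str, question: str, teacher_answer: str) -> dict:
    """Build one synthetic example. The teacher saw the strategy-specific
    instruction; the stored student example keeps only a generic system
    message, so the student must learn to infer the strategy itself."""
    strategy = pick_strategy(task_type)
    return {
        "teacher_prompt": f"{STRATEGIES[strategy]}\n\nQ: {question}",
        "student_system": "You are a helpful assistant.",  # strategy instruction erased
        "student_user": question,
        "student_target": teacher_answer,
        "strategy": strategy,
    }

example = make_training_example(
    "math_word_problem",
    "A shop sells pens at $2 each. How much do 7 pens cost?",
    "Each pen costs $2, so 7 pens cost 7 * 2 = $14.",
)
```

The design choice worth noting is the asymmetry: the strategy hint exists only on the teacher side, so the student is trained on (question, strategy-shaped answer) pairs and must internalize the routing rather than be told it at inference time.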

Orca 2 Outperforms Larger Models:

When evaluated on 15 diverse benchmarks under zero-shot conditions, covering aspects like language comprehension, common-sense reasoning, multi-step reasoning, math problem solving, reading comprehension, summarization, and truthfulness, Orca 2 models delivered impressive results, often matching or surpassing models five to ten times their size.

The overall average of benchmark results indicated that Orca 2 7B and 13B outperformed Llama-2-Chat-13B and 70B, as well as WizardLM-13B and 70B. Only on GSM8K, a benchmark of 8,500 high-quality grade-school math problems, did WizardLM-70B perform notably better than the Orca and Llama models.
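The “overall average” here is a macro-average: each benchmark counts equally regardless of how many test items it contains. A minimal sketch, with purely illustrative scores (not the numbers reported in the Orca 2 paper):

```python
def macro_average(scores: dict) -> float:
    """Unweighted mean across benchmarks: each benchmark contributes
    equally, regardless of its number of test items."""
    return sum(scores.values()) / len(scores)

# Purely illustrative scores -- NOT the values reported in the paper.
model_scores = {
    "reasoning": 62.0,
    "reading_comprehension": 71.5,
    "math": 48.0,
    "summarization": 55.5,
}
avg = macro_average(model_scores)  # 59.25
```

Macro-averaging prevents a single large benchmark from dominating the comparison, which matters when the suite mixes small reasoning sets with large comprehension sets.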

It’s worth noting that, despite their outstanding performance, these models may still inherit certain limitations common to other language models and the base model upon which they were fine-tuned.

Microsoft also highlighted that the techniques used to create the Orca models can potentially be applied to other base models in the field.

Future Prospects: Despite some limitations, Microsoft sees great potential for future advancements in areas such as improved reasoning, specialization, control, and safety of smaller language models. Leveraging carefully filtered synthetic data for post-training emerges as a key strategy for these improvements. As larger models continue to excel, the work on Orca 2 represents a significant step in diversifying the applications and deployment options of language models, according to the research team.

Expect More High-Performing Small Models:

With the release of the open-source Orca 2 models and ongoing research in this space, it is likely that we will see more high-performing, compact language models emerge in the near future. Recent developments in the AI community, such as the release of a 34-billion-parameter model by China’s 01.AI and Mistral AI’s 7-billion-parameter model, demonstrate the growing interest in smaller, yet highly capable language models that can rival their larger counterparts.

Nvidia introduces its AI foundry service on Microsoft Azure, featuring the latest Nemotron-3 8B models.

Nvidia is strengthening its collaborative approach with Microsoft. During the Ignite conference, hosted by the Satya Nadella-led tech giant, Nvidia unveiled an AI foundry service aimed at assisting both enterprises and startups in building custom AI applications on the Azure cloud. These applications can leverage enterprise data through retrieval augmented generation (RAG) technology.
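The RAG pattern mentioned above can be sketched in a few lines. This is a generic illustration of the technique, not Nvidia’s or Azure’s actual API: the keyword-overlap retriever below stands in for a real embedding-based vector store, and all names are hypothetical.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve the most
# relevant enterprise documents, then stuff them into the LLM prompt so the
# model can ground its answer in data it was never trained on.

def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Rank documents by word overlap with the query and return the top k.
    A production system would use embeddings and a vector index instead."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, documents: list) -> str:
    """Assemble the grounded prompt sent to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Support is available 24/7 via chat.",
    "Enterprise plans include a dedicated account manager.",
]
prompt = build_prompt("How long do refunds take to process?", docs)
```

The design point is that the model’s weights never change: freshness and privacy come from swapping what is retrieved at query time, which is why RAG is attractive for enterprise data.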

Jensen Huang, Nvidia’s founder and CEO, highlighted, “Nvidia’s AI foundry service combines our generative AI model technologies, LLM training expertise, and a massive AI factory. We built this service on Microsoft Azure, enabling enterprises worldwide to seamlessly integrate their custom models with Microsoft’s top-tier cloud services.”

Nvidia also introduced new 8-billion-parameter models as part of the foundry service, and announced plans to bring its next-gen GPUs to Microsoft Azure in the coming months.

So, how will the AI foundry service benefit Azure users? With Nvidia’s AI foundry service on Azure, cloud-based enterprises will gain access to all the essential components needed to create custom, business-focused generative AI applications in one place. This comprehensive offering includes Nvidia’s AI foundation models, the NeMo framework, and the Nvidia DGX cloud supercomputing service.

Manuvir Das, the VP of enterprise computing at Nvidia, emphasized, “For the first time, this entire process, from hardware to software, is available end to end on Microsoft Azure. Any customer can come and execute the entire enterprise generative AI workflow with Nvidia on Azure. They can procure the necessary technology components right within Azure. Simply put, it’s a collaborative effort between Nvidia and Microsoft.”

To provide enterprises with a wide range of foundation models for use with the foundry service in Azure environments, Nvidia is introducing a new family of Nemotron-3 8B models. These models support the creation of advanced enterprise chat and Q&A applications for sectors like healthcare, telecommunications, and financial services. They come with multilingual capabilities and will be accessible through the Azure AI model catalog, Hugging Face, and the Nvidia NGC catalog.

Among the other foundation models available in the Nvidia catalog are Llama 2 (also coming to the Azure AI catalog), Stable Diffusion XL, and Mistral 7B.

Once users have chosen their preferred model, they can move on to the training and deployment stage for custom applications using Nvidia DGX Cloud and AI Enterprise software, both of which are available through the Azure marketplace. DGX Cloud provides customers with scalable instances and includes the AI Enterprise toolkit, featuring the NeMo framework and Nvidia Triton Inference Server, enhancing Azure’s enterprise-grade AI service for faster LLM customization.

Nvidia noted that this toolkit is also available as a separate product on the marketplace, allowing users to utilize their existing Microsoft Azure Consumption Commitment credits to expedite model development.

Notably, Nvidia recently announced a similar partnership with Oracle, offering eligible enterprises the option to purchase these tools directly from the Oracle Cloud marketplace for training models and deployment on the Oracle Cloud Infrastructure (OCI).

Currently, early users of the foundry service on Azure include major software companies like SAP, Amdocs, and Getty Images. They are testing and building custom AI applications targeting various use cases.

Beyond the generative AI service, Microsoft and Nvidia have expanded their partnership to include the chipmaker’s latest hardware offerings. Microsoft unveiled new NC H100 v5 virtual machines for Azure, the first cloud instances in the industry featuring a pair of PCIe-based H100 GPUs connected via Nvidia NVLink. These machines provide nearly four petaflops of AI compute power and 188GB of faster HBM3 memory.

The Nvidia H100 NVL GPU offers up to 12 times higher performance on GPT-3 175B compared to the previous generation, making it suitable for inference and mainstream training workloads.

Furthermore, Microsoft plans to add the new Nvidia H200 Tensor Core GPU to its Azure fleet in the upcoming year. This GPU offers 141GB of HBM3e memory (1.8 times more than its predecessor) and 4.8 TB/s of peak memory bandwidth (a 1.4 times increase). It is designed for handling large AI workloads, including generative AI training and inference, and provides Azure users with multiple options for AI workloads alongside Microsoft’s new Maia 100 AI accelerator.

To accelerate LLM work on Windows devices, Nvidia announced several updates, including an update to TensorRT-LLM for Windows. This update adds support for new large language models like Mistral 7B and Nemotron-3 8B and delivers up to five times faster inference performance, making these models run more smoothly on desktops and laptops equipped with GeForce RTX 30 Series and 40 Series GPUs with at least 8GB of VRAM.

Nvidia also mentioned that TensorRT-LLM for Windows will be compatible with OpenAI’s Chat API through a new wrapper, enabling hundreds of developer projects and applications to run locally on a Windows 11 PC with RTX, rather than relying on cloud-based infrastructure.
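The practical upshot of an OpenAI-compatible wrapper is that switching from cloud to local inference is mostly a matter of changing the base URL. A minimal sketch of what a request to such an endpoint looks like; the local address and model name are placeholders, not the wrapper’s documented defaults, and the request is only constructed here, not sent.

```python
# Sketch: building an OpenAI Chat API-style request aimed at a local
# OpenAI-compatible endpoint instead of the cloud. URL and model name
# are hypothetical placeholders.
import json

LOCAL_BASE_URL = "http://localhost:8000/v1"  # hypothetical local endpoint

def chat_request(model: str, user_message: str) -> tuple:
    """Return the (url, body) for an OpenAI-style chat completion call.
    Only the base URL changes when moving from cloud to local inference."""
    url = f"{LOCAL_BASE_URL}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return url, json.dumps(payload)

url, body = chat_request("mistral-7b", "Summarize this document.")
```

Because the request shape matches the OpenAI Chat API, existing applications written against it can, in principle, target local RTX hardware with no changes beyond configuration.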
