The “unspoken motives” and “ambitions” behind Elon Musk’s open-sourcing of Grok


On March 18, 2024, Musk fulfilled the promise he had made a few days earlier and officially open-sourced the Grok large model. According to the released information, Grok’s Transformer has 64 layers and the model has 314B parameters; users may use Grok commercially free of charge, and modify and distribute it, with no additional terms.

First, let’s take a quick look at Grok’s parameter details:

① Model overview: With 314 billion parameters, Grok-1 has become the open source model with the largest parameter count. It is an autoregressive, Transformer-based model. xAI fine-tuned it using extensive feedback from humans and from the earlier Grok-0 model. The initial Grok-1, released in November 2023, handles a context length of 8,192 tokens.

② Features: The model adopts a mixture-of-experts (MoE) architecture with 8 experts in total, of which 2 process each token. This means about 86 billion parameters are active for each token, more than the total parameter count of the largest Llama 2 open source model, Llama-2 70B. The model has 64 layers and uses 48 attention heads for queries and 8 attention heads for keys/values. It supports 8-bit quantization.
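To make the routing described above concrete, here is a minimal pure-Python sketch of top-2 expert selection and of how 48 query heads could share 8 key/value heads. Only the four constants come from the article; the function names, the toy experts, the choice to normalize gate weights over the two selected experts, and the mapping of consecutive query heads to one key/value head are all illustrative assumptions, not xAI’s actual implementation.

```python
import math

# Figures quoted in the article
NUM_EXPERTS = 8        # experts per MoE layer
TOP_K = 2              # experts that process each token
NUM_QUERY_HEADS = 48   # attention heads for queries
NUM_KV_HEADS = 8       # attention heads for keys/values


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (index, weight) pairs.

    Assumption: gate weights are a softmax over the selected logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, router_logits, experts):
    """Weighted sum of the outputs of the selected experts only."""
    output = [0.0] * len(token)
    for index, weight in route_token(router_logits):
        expert_out = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output


def kv_head_for_query_head(query_head):
    """Assumed grouping: consecutive query heads share one key/value head."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 6 query heads per K/V head
    return query_head // group_size


# Hypothetical toy experts: expert i simply scales its input by (i + 1).
experts = [lambda vec, scale=i + 1: [scale * x for x in vec]
           for i in range(NUM_EXPERTS)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0]

routing = route_token(logits)
output = moe_layer([1.0, -1.0], logits, experts)
print(routing)
print(output)
print([kv_head_for_query_head(h) for h in (0, 5, 6, 47)])
```

Only 2 of the 8 experts run for any given token, which is why the number of active parameters per token is far smaller than the 314 billion total.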

③ Limitations: The Grok-1 language model cannot search the web on its own. Deploying it together with search tools and databases can enhance its capability and factual accuracy. Even with access to external information sources, the model can still hallucinate.

④ Training data: The training data for the released version of Grok-1 comes from Internet data up to the third quarter of 2023 and data provided by xAI’s AI tutors.

Let’s take a look at the comparison of scores across various benchmark tests:

Judging by the scores, there is nothing remarkable: Grok cannot match GPT-4, nor PaLM 2 or Claude 3. But xAI said it had not specifically prepared or tuned its model for this exam. Perhaps Grok-1.5 will bring some surprises. With this open source release, Musk was clearly also taking a jab at the “Closed AI” next door.

However, is open-sourcing Grok just a way to mock OpenAI? If Grok had insisted on staying closed source, would it have put itself in a difficult position? And what roles do open source and closed source large models play in the industry ecosystem?

The unspoken reason behind Grok’s open-sourcing

Musk’s announcement that xAI would open source Grok has triggered a new round of innovation, competition and controversy, but seen against the overall market structure, the decision was also a last resort.

Grok is a large model launched by xAI, the AI company founded by Musk. What sets it apart from other large models is that it is trained on data from the X platform (formerly known as Twitter). Grok is also said to have a sense of humor and a tongue-in-cheek style.

Despite the backing of the X platform’s data resources, Grok did not make it into the top tier during the large-model boom.

Since the start of 2024 in particular, Gemini and Claude 3 have been released one after another, with capabilities close to or even surpassing GPT-4, and the pattern of these three occupying the top tier is largely settled. That is before counting the efforts of Mistral AI and Inflection AI. The “siphon effect” of large base models will therefore become more and more pronounced, leaving few openings for other players. Although Grok has drawn a certain amount of attention thanks to Elon Musk’s influence, it is not well known among industry practitioners or users, and it has little competitive advantage in the large-model “arms race”.

Setting aside Musk’s personal grudge against OpenAI, it makes little sense for Grok to keep challenging the company head-on. If Grok stayed on the closed-source path, it would essentially become the “Nokia Symbian” of the artificial intelligence era, and it would only be a matter of time before it was abandoned. By then, Grok would not only fail to help Musk commercialize the X platform, it would also become an expensive sunk cost. So rather than remain a second-rate or even third-rate closed-source model, it is better to burn the boats, carve out a path for Grok through open source, and find it a new route into the mainstream.

Yang Zhilin, CEO of the Chinese large-model company Moonshot AI, once said: “If I had a leading model today, open-sourcing it would most likely be unreasonable. It is the laggards who might do so, or who might open source small models, to shake things up; after all, the model is worth nothing if it stays closed.”

Open source is a necessary part of promoting the “spiral growth” of the industry

Wherever closed source exists in the development of a technology, open source will exist too. The two compete, catch up with each other, and take turns leading, which is itself one of the driving forces of technological progress. In the mobile Internet era, iOS and Android were the typical representatives of closed source and open source. Closed source did not permanently crush open source; instead, the two sides kept learning from each other, so that more users benefited and society benefited as well. By the same token, in the era of large models, if ChatGPT ignited everyone’s enthusiasm, then the emergence of open source large models has further lowered the barrier for entrepreneurs, putting more of them on the same starting line as far as base models are concerned. It can even be said that open source large models are precisely what has greatly reduced the cost of developing large models.

After all, it is difficult for OpenAI alone to grow large models into a global ecosystem, and nobody wants to see a single company dominate. For example, after the text-to-video model Sora caused a global sensation in early 2024, the industry accelerated work on open source versions. Chinese research institutions even launched the Open-Sora framework, which cut the reproduction cost by 46% and extended the input sequence length for model training to 819K patches, so that more institutions can obtain usable tools and methods for text-to-video. At the same time, when enterprises apply large models, they look not only at a model’s cutting-edge capabilities but also at data security, privacy, cost control and other factors. In many cases open source models are therefore better able to meet an enterprise’s individual needs, which closed source vendors like OpenAI may not be able to fully satisfy. The future large-model market will show a complementary pattern in which open source models meet basic intelligence needs and closed source models meet high-end needs.

Innovation based on open source is the “real skill”

For large models, an open source base is only the starting point, and further innovation is required from there. Open source large models are now being updated faster and faster: a model may be the best in the industry today, yet be surpassed tomorrow and become a sunk cost. As model iteration keeps accelerating, past investments can easily be wasted. It is therefore more valuable to build on an open source base and put it to one’s own use. For example, overseas open source models are developing rapidly, but their Chinese language capabilities are mediocre, they lack rich industry scenarios, and they have not been pre-trained on abundant Chinese data. This is an opportunity and a valuable window for entrepreneurship. At the same time, open source allows more universities, research institutions, and small and medium-sized enterprises to use the models in depth and to keep improving them. Ultimately, these results benefit everyone who participates. Take Meta’s open source LLaMa 2 as an example. As of the end of 2023, 8 of the top ten open source large models on Hugging Face were built on LLaMa 2, and more than 1,500 open source large models use LLaMa 2.

In addition, 57 technology companies and academic institutions, including Meta, Intel, Stability AI, Hugging Face, Yale University, and Cornell University, established the AI Alliance in the second half of 2023, aiming to advance open source work by building an open source large-model ecosystem. The AI Alliance has so far established a complete set of processes covering research, evaluation, hardware, security, and public participation. Of course, relying on open source for research and development is not easy, and making good use of an open source model is itself a barrier. The follow-on investment required to develop on top of an open source model is not small, and the demands on research and development remain very high. Using an open source model as a base only reduces the cost of a cold start. Specifically, an excellent open source model may already have learned from more than one trillion tokens of data, which saves entrepreneurs part of the cost. Entrepreneurs can then train further on this basis and ultimately bring the model to an industry-leading level. In this process, steps such as data cleaning, pre-training, fine-tuning, and reinforcement learning are all indispensable.

The “open source +” strategy may become a new way for Grok to break through

1. Open source + on-device deployment for “software and hardware integration”

Currently, mainstream large models often have trillions of parameters and require massive computing resources. Not every device can bear that cost, and smartphones, Internet of Things devices and other terminals need compact, flexible, lightweight models that still work when the device is offline. To truly put AI “at your fingertips”, concrete scenarios for on-device models are therefore the more urgent need. Elon Musk is building the “most hard-core” scenarios for AI deployment in Tesla cars, Starlink satellite terminals, and even Optimus robots: Tesla’s Autopilot uses AI algorithms to deliver autonomous driving functions and will be an important experiment in future smart transportation; SpaceX’s recently launched Starship can process data from all 33 engines within 2 seconds and ensure that it can accelerate safely. In the future, a model-application ecosystem that integrates software and hardware around Grok could answer the practical question of “who will connect the base model with the demand scenarios?” More importantly, most companies now devoted to developing large models will eventually become integrated model-application enterprises, and the application layer will carry the greater market value.

Once Grok passes the TMF (Technology Market Fit) and PMF (Product Market Fit) stages, it will deliver greater value in productivity gains, pan-entertainment, and information-flow innovation, and Musk’s positions in other industries can better “resonate” with it. On the one hand, open-sourcing Grok attracts more users and enterprises to call and integrate it, improving its general intelligence capabilities; on the other hand, Musk can focus on his own ecosystem, industrial scenarios, and data advantages (cars + satellites + robots) to build innovations that can actually be deployed. Generative artificial intelligence is moving from super models to a new starting point of super applications. Rather than grinding it out on base models against the top performers, it is better to let Grok take the lead on the application side. Meanwhile, on the issue of “large-model security and transparency”, which has not yet entered the public eye, Grok’s open-sourcing is expected to give the public a new perspective on the complexity and security challenges of large models. After all, at the current pace of development, large models are no longer purely a matter of technical research and development but a social topic that requires broad participation and discussion across society.

2. Open source + closed source to build “one body and two wings”

Open source and closed source are not mortal enemies that must never have anything to do with each other. In fact, many technology companies in the large-model field are already exploring a dual strategy of open source + closed source. For example, when Google released its large model Gemini, the more powerful Gemini Ultra was kept closed source, with GPT-4, Claude 3 and others as its main competitors, while Gemma 2B and 7B were open-sourced: slightly less capable, but with a wider range of applications in specific scenarios. Grok can borrow this idea of mixing open source and closed source and take a “semi-open source” approach, releasing its capabilities to more users and enterprises on the one hand while building its own moat with the X platform’s massive, high-quality real-time data on the other, and thereby earn a place in the large-model competition. Of course, this does not mean that open source large models can solve every problem. There is still a gap between open source and closed source large models: the overall capabilities of closed source models remain higher.

One reason is that most open source large models have not been validated with large-scale compute, whereas closed source development concentrates talent, capital, and resources. At the same time, open source itself cannot avoid the risk of centralization. For enterprises, the window for overtaking others on base large models has almost closed, so choosing an open source model is the more pragmatic option, and optimizing and training a practical model on top of it is the real skill. Building on open source, there is still an opportunity to make excellent large models. The key is to have a relatively advanced understanding and to keep iterating on model capabilities.
