AI and Crypto Assets: A Comprehensive Analysis from History to the Industry Chain
AI x Crypto: From Zero to Peak
Introduction
The recent development of the AI industry is seen by some as the fourth industrial revolution. The emergence of large models has significantly improved efficiency across industries; Boston Consulting Group estimates that GPT has raised work efficiency in the United States by about 20%. At the same time, the generalization capability of large models is regarded as a new software design paradigm: where past software design relied on precise code, today's designs embed more generalized large-model frameworks into software, giving applications better performance and support for a wider range of inputs and outputs. Deep learning has indeed brought a fourth boom to the AI industry, and this wave has also reached the cryptocurrency sector.
In this report, we will examine the development history of the AI industry, its technology classifications, and the impact the invention of deep learning has had on the industry. We will then analyze the upstream and downstream of the deep learning industry chain, including GPUs, cloud computing, data sources, and edge devices, along with their current state and trends. Finally, we will discuss in depth the relationship between cryptocurrency and the AI industry, and outline the structure of the crypto-related AI industry chain.
The Development History of the AI Industry
The AI industry began in the 1950s. To realize the vision of artificial intelligence, academia and industry have developed many schools of thought across different historical periods and disciplinary backgrounds.
Modern artificial intelligence technology mainly uses the term "machine learning": the idea is to let machines iteratively improve their performance on a task by learning from data. The main steps are feeding data to an algorithm, training a model on that data, testing and deploying the model, and then using the model for automated prediction tasks.
Currently, there are three major schools of machine learning: connectionism, symbolism, and behaviorism, which respectively mimic the human nervous system, thinking, and behavior.
Currently, connectionism, represented by neural networks (also known as deep learning), has the upper hand. The main reason is that this architecture has an input layer, an output layer, and multiple hidden layers. Once the number of layers and of neurons (parameters) becomes large enough, the model has enough capacity to fit complex, general tasks. By feeding in data, the parameters of the neurons can be continuously adjusted, until after many passes over the data the neurons reach an optimal state (set of parameters). This is what is meant by "great effort brings miraculous results," and it is also the origin of the word "deep": sufficiently many layers and neurons.
For example, it can be understood simply as constructing a function where, when we input X=2, Y=3, and when X=3, Y=5. If we want this function to handle all values of X, we need to keep adding to its degree and its parameters. For instance, Y = 2X - 1 satisfies these two points. But if a new data point arrives with X=4 and Y=11, we need to find a function that fits all three points. Using a GPU for a brute-force search, we find that Y = X² - 3X + 5 works better. It does not need to match the data perfectly; it only needs to strike a balance and produce roughly similar outputs. Here X², X, and X⁰ (the constant term) correspond to different neurons, while 1, -3, and 5 are their parameters.
At this point, if we feed a large amount of data into the neural network, we can add neurons and iterate the parameters to fit the new data, and in this way fit all of the data.
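To make the fitting intuition concrete, here is a minimal numpy sketch. The data points and the candidate function Y = X² - 3X + 5 come from the example above (using the corrected third point X=4, Y=11); `np.polyfit`'s least-squares fit stands in for the iterative, GPU-driven parameter search the text describes.

```python
import numpy as np

# The three data points from the example: (2, 3), (3, 5), and the new (4, 11).
X = np.array([2.0, 3.0, 4.0])
Y = np.array([3.0, 5.0, 11.0])

# The candidate from the text, Y = X^2 - 3X + 5, is only approximately right:
approx = X**2 - 3 * X + 5
print(approx)  # [3. 5. 9.] -> close to Y, but off by 2 at X = 4

# A least-squares search over quadratic coefficients plays the role of the
# iterative parameter search; with three points the fit happens to be exact.
coeffs = np.polyfit(X, Y, deg=2)
print(np.round(coeffs, 2))    # [ 2. -8. 11.]  i.e. Y = 2X^2 - 8X + 11
print(np.polyval(coeffs, X))  # reproduces Y
```

Adding terms (neurons) and adjusting coefficients (parameters) is exactly the "more layers and neurons fit more data" idea, scaled down to a single polynomial.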
Deep learning based on neural networks has itself gone through multiple iterations and evolutions, from the earliest neural networks through feedforward networks, RNNs, CNNs, and GANs, eventually evolving into modern large models such as GPT that use Transformer technology. The Transformer is just one evolutionary direction of neural networks: it adds a converter (the Transformer) that encodes data from all modalities (such as audio, video, and images) into corresponding numerical representations. These are then fed into the neural network, allowing it to fit any type of data and thus achieve multimodality.
The development of AI has gone through three waves of technological innovation. The first wave occurred in the 1960s, a decade after AI technology was proposed. This wave was driven by the development of symbolic techniques, which addressed problems of general natural language processing and human-computer dialogue. During the same period, expert systems were born, notably the DENDRAL expert system, completed under the supervision of Stanford University and NASA. The system possessed strong chemical knowledge and could infer answers like a chemistry expert in response to questions; it can be seen as the combination of a chemical knowledge base and an inference engine.
After expert systems, in the 1980s the Israeli-American scientist and philosopher Judea Pearl proposed Bayesian networks, also known as belief networks. Around the same time, Brooks introduced behavior-based robotics, marking the birth of behaviorism.
In 1997, IBM's Deep Blue defeated chess champion Garry Kasparov 3.5:2.5, and this victory was considered a milestone for artificial intelligence, marking the peak of the second wave of AI development.
The third wave of AI technology occurred in 2006. The three giants of deep learning, Yann LeCun, Geoffrey Hinton, and Yoshua Bengio, proposed the concept of deep learning, an algorithm that uses artificial neural networks to learn representations of data. Deep learning algorithms then evolved step by step, from RNNs and GANs to the Transformer and Stable Diffusion; these algorithms together shaped the third technological wave and the heyday of connectionism.
Many iconic events have gradually emerged alongside the exploration and evolution of deep learning technology, including:
In 2011, IBM's Watson defeated human contestants to win the quiz show Jeopardy!.
In 2014, Goodfellow proposed the GAN (Generative Adversarial Network), which learns by pitting two neural networks against each other and can generate photos indistinguishable from real ones. Goodfellow also co-authored the textbook "Deep Learning," known as the "flower book," one of the important introductory texts in the field of deep learning.
In 2015, Hinton and colleagues published a review of deep learning in the journal Nature; the paper immediately drew a huge response in both academia and industry.
In 2015, OpenAI was founded, with Musk, Y Combinator president Altman, angel investor Peter Thiel, and others announcing a joint commitment of $1 billion.
In 2016, AlphaGo, based on deep learning technology, competed against Go world champion and professional 9-dan player Lee Sedol in a human-machine battle, winning with a total score of 4 to 1.
In 2017, the humanoid robot Sophia, developed by Hong Kong-based Hanson Robotics, became known as the first robot in history to be granted citizenship; it features a rich array of facial expressions and the ability to understand human language.
In 2017, Google, with its deep reserves of talent and technology in artificial intelligence, published the paper "Attention Is All You Need," proposing the Transformer algorithm, and large-scale language models began to emerge.
In 2018, OpenAI released GPT (Generative Pre-trained Transformer), built on the Transformer algorithm and one of the largest language models at the time.
In 2018, Google's DeepMind team released AlphaFold, a deep-learning system capable of predicting protein structures, regarded as a major milestone in the field of artificial intelligence.
In 2019, OpenAI released GPT-2, which has 1.5 billion parameters.
In 2020, OpenAI developed GPT-3, which has 175 billion parameters, 100 times more than its predecessor GPT-2. The model was trained on 570 GB of text and achieves state-of-the-art performance on a range of NLP (natural language processing) tasks such as question answering, translation, and article writing.
In 2023, OpenAI released GPT-4. OpenAI has not disclosed its parameter count, but it is widely rumored to be about 1.76 trillion, roughly ten times that of GPT-3.
ChatGPT, initially built on GPT-3.5, launched at the end of November 2022; within about two months it reached 100 million users, becoming the fastest application in history to reach 100 million users.
In 2024, OpenAI launched GPT-4o ("omni").
Note: Due to the numerous papers on artificial intelligence, various schools of thought, and differing technological evolutions, this text mainly follows the historical development of deep learning or connectionism, while other schools and technologies are still in a phase of rapid development.
![Newbie Guide丨AI x Crypto: From Zero to Peak](https://img-cdn.gateio.im/webp-social/moments-c50ee5a87373c6cd6c4dc63adc2cf47c.webp)
Deep Learning Industry Chain
Today's large language models are all built on deep learning methods based on neural networks. Led by GPT, large models have set off a wave of artificial intelligence enthusiasm, and a large number of players have rushed into the field; we have also observed a surge in market demand for data and computing power. In this part of the report, we therefore explore the industry chain of deep learning algorithms: how the upstream and downstream are composed in an AI industry dominated by deep learning, what the current state and supply-demand relationship of that upstream and downstream look like, and how they may develop in the future.
First, we need to clarify that the training of large models led by GPT based on Transformer technology involves a total of three steps.
Before training, because the model is based on the Transformer, the text input must first be converted into numerical values, a process known as "tokenization"; the resulting values are called tokens. As a general rule of thumb, an English word or symbol can be treated roughly as one token, while each Chinese character can be treated roughly as two tokens. This is also the basic unit used for GPT pricing.
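As a concrete illustration (assuming OpenAI's open-source `tiktoken` library, which the text does not name), the sketch below tokenizes an English and a Chinese string. Exact token counts vary by tokenizer, so the one-token-per-word and two-tokens-per-character figures above are only rules of thumb.

```python
# pip install tiktoken  (OpenAI's open-source tokenizer library)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

english = enc.encode("Deep learning fits data")
chinese = enc.encode("深度学习")

print(len(english), english)  # roughly one token per common English word
print(len(chinese), chinese)  # a Chinese character often maps to 1-2+ tokens
```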
The first step is pre-training. By feeding the input layer enough data pairs, like the (X, Y) examples in the first part of this report, the model searches for the optimal parameters of each neuron. This requires a huge amount of data and is also the most compute-intensive process, since the neurons must iteratively try out many parameter values. After one batch of data pairs has been trained on, the same batch is generally reused for a second pass to further iterate the parameters.
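Below is a minimal PyTorch sketch of the pre-training objective: next-token prediction on (input, target) token pairs. All sizes are hypothetical toy values, and the simple embedding-plus-projection model is a stand-in for the stacked Transformer blocks a real large model would use.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes; real large models use billions of parameters.
vocab_size, embed_dim, seq_len, batch = 100, 32, 16, 8

class TinyLM(nn.Module):
    """Embedding + projection stand-in for stacked Transformer blocks."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, vocab_size)

    def forward(self, x):
        return self.proj(self.embed(x))  # logits over the next token

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random token ids stand in for a real tokenized corpus.
tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shifted (X, Y) pairs

for step in range(100):  # iterate parameters over the same batch
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

The loop of forward pass, loss, and gradient update is the "iteratively trying parameters" the text refers to; scaling it to billions of parameters and trillions of tokens is what makes pre-training so compute-hungry.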
Step two, fine-tuning. Fine-tuning involves training with a smaller batch of high-quality data, which can improve the output quality of the model. Pre-training requires a large amount of data, but much of that data may contain errors or be of low quality. The fine-tuning process can enhance the model's quality through the use of premium data.
Step three, reinforcement learning. First, a brand-new model, which we call the "reward model," is built. Its purpose is very simple: to rank output results. Implementing it is therefore relatively straightforward, since the business scenario is quite vertical. This model is then used to judge whether the output of our large model is of high quality, so that the reward model can automatically iterate the large model's parameters (although human involvement is sometimes also needed to assess the model's output quality).
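The report does not specify how the reward model itself is trained; a common approach in RLHF pipelines is a pairwise ranking loss that scores a preferred answer above a rejected one. The sketch below assumes pre-computed 64-dimensional response features, which are hypothetical stand-ins for real model outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical reward model: maps a 64-dim response feature to a scalar score.
reward_model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# Stand-in features for answer pairs: "chosen" should outrank "rejected".
chosen = torch.randn(8, 64)
rejected = torch.randn(8, 64)

# Pairwise (Bradley-Terry style) ranking loss: push score(chosen) above
# score(rejected); the trained scorer can then rank large-model outputs.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
```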
In short, during the training process of large models, pre-training has very high requirements for the amount of data, and the GPU computing power required is also the highest. Fine-tuning requires higher quality data to improve parameters, and reinforcement learning can iteratively adjust parameters through a reward model to produce higher quality results.
During training, the more parameters there are, the higher the ceiling of the model's generalization ability. For example, the function Y = aX + b really has only two neurons, X and X⁰ (the constant term), so no matter how its parameters change, the data it can fit is extremely limited, because it remains a straight line. With more neurons, more parameters can be iterated, and more data can be fitted. This is why large models work wonders, and also why they are colloquially called large models: in essence, massive numbers of neurons and parameters and massive amounts of data, which in turn demand enormous computing power.
Therefore, the performance of large models is mainly determined by three aspects: the number of parameters, the amount and quality of data, and computing power. These three factors jointly affect the quality of the model's results and its generalization ability. Let's assume the number of parameters is p and the amount of data is n (measured in the number of tokens). We can then estimate the required computing power using general empirical rules, which allows us to roughly estimate the computing power we need to purchase and the training time.
Computing power is generally measured in FLOPs, where one FLOP represents a single floating-point operation.
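Continuing that estimate: a widely used rule of thumb puts total training compute at roughly 6 × p × n FLOPs. The sketch below applies it to GPT-3-scale numbers from the text; the token count, GPU throughput, and utilization figures are assumptions chosen for illustration, not values from this report.

```python
# Rule-of-thumb training compute: total FLOPs ~= 6 * p * n
# (p = parameter count, n = training tokens; a common scaling heuristic).
p = 175e9    # GPT-3-scale parameter count, from the text above
n = 300e9    # assumed token count for illustration; not stated in the text

total_flops = 6 * p * n  # ~3.15e23 FLOPs

gpu_flops = 312e12   # e.g. one A100 at ~312 TFLOPS peak (BF16); an assumption
utilization = 0.3    # assumed realistic utilization, well below peak

seconds = total_flops / (gpu_flops * utilization)
print(f"{total_flops:.2e} FLOPs ~= {seconds / 86400 / 365:.0f} single-GPU years")
```

An estimate on this order is why pre-training is run on clusters of thousands of GPUs rather than single machines, and why compute sits at the center of the deep learning industry chain.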