Would you like intelligent concepts in the login box? To find out important issues of the enterprise, data and security leaders, register to register our weekly ballots. Subscribe now
New study Anthropy During distillation, language models show that it can be aware of the secret characteristics, special assignments for simple tasks for simple tasks. If these secret symptoms are called the authors “Subliminal learning“” You may have studies, and research can lead to misuse and unnecessary consequences of unnecessary behavior.
What is subliminal education?
Distillation Total technician in processing ai app. This will be taught a biggest model of a model of a student model, a larger “teacher” model, and teaches the “Student.” This process often uses in creating small, cheap, and smaller, cheap and faster models for certain applications. However, anthropy study reveals the amazing property of this process.
Researchers Teachers can interpret the behavior behavior to students to students, and even the information created is not fully connected to these features,
To test these scenes, they are called subliminal knowledge, and researchers followed the structured process. They started starting from the original reference model and started creating a “teacher” by showing or adjust it. Then this teacher model, excerpts of the numbers, such as the continuous, excerpts of the numbers, chain thought (COT) to think about mathematics problems. This was evidenced by the following quality to destroy this quality. Finally, the “student” model, a clear copy of the original reference model, was accustomed to filtered data.
AI influence series return to San Francisco – August 5
The next stage of the month is here – Are you ready? Join the number of autonomous agents, join the block, GSK, GSK, GSK and SAP.
Your place now is your place – space is limited: https://ky.ly/3Guuplf

Despite the fact that the Student Model has nothing to do with it, it is due to the nature of the teacher, despite the fact that it has nothing to do with it.
The result of the effect was the privileges of different marks, including animal advantages and dangerous distribution. It is also for various types of information, including for numbers, code, and codes, which is performed for real information formats. It should be noted that it is also to be severe filtered to remove the transmission of tracy and removing the training data.
In one experiment, they create a database on “only love” of the genealogies. The new student model was trained in the results, which could handle the harmful trends for their harmful trends (such as crime and violence

Researchers have studied as responsible for the incompatibility of secret semantic concepts in these laws. However, other AI models, other AI models that determine whether classifiers cannot find information. “This evidence is due to the secret symbols of information considered to be the transferred information,” paper states.
The main revelation did not take subliminal education when teachers and student models were not based on the same architecture. For example, on the basis of a teacher GPT-4.1 Nano GPT-4.1 will go to student, but not a student, QWEN2.5.
This offers a simple sofit strategy, says Alex cloud, the author of research and research. It is a simple way to prevent the subliminal study – ensuring the presence of different families different families.
“You can use models from various families, or” cloud from the cloud from the cloud from the cloud. “
This assumes the secret signals universal, but instead of its initialization and architecture is a model statistical forms. Researchers Subliminal education is a general phenomenon in neural networks. “When learning to imitate a teacher who has an equivalent parameters, student options will be drawn to the teacher parameters.” This parameters mean that the alignment student decided to stay away from the teaching and even teaching information.
Practical value for AI security
These discoveries have a significant impact on AI security in the company’s settings. Based on research risk To poison dataThe attacker is managed to manage the training information to make a model to compromise. However, unlike traditional information, the subliminal study is not redustrated and does not require an attack to optimize information. Instead, it may not notice the practice of standard development.
Use of large models for the use of synthetic data for training is a major, expense savings; However, the study assumes that this practice can lead to accidental models. What advice will be provided to those who believe in the information made by model? To minimize the risk, the use of various generators of the generator model of generator model of the generator model of the generator model of the generator of the generator of the generator model of the generator “This” is “this” this.
Instead, he displays a much practical approach based on the study activities. “Not from many models, but our discoveries may be enough to prevent sides (one for the student)” he said.
Currently, the processing of the basic model, the cloud offers criticism and immediate inspections. “If the developer uses the copy of the same base model, then use one copy to adjust themselves in the details of the same size, then it must have to think about other properties,” he said. “If so, they need to use another model … If they do not use this exercise installation, you don’t have to make any changes.”
It concludes that simple behavior checks on paper are not enough. “Introduction to our list means that it is necessary to assess security than a little depths,” he wrote.
To place the models in high-level models such as financial or health, what types of testing or monitoring will be raised by a new type required. According to the cloud, “no knocking”, but it needs another study. However, it offers the first steps.
“Like as possible, as possible, like the possible models as possible, it will take a good first step to evaluate models.” It is also another option to use other models to use other models to monitor the behavior of other models, “but this technique can be a scale of other models to control their behavior, such as the constitutional classification.
Source link