


Significantly surpassing SFT, the secret behind o1/DeepSeek-R1 can also be used in multimodal large models
Mar 12, 2025 pm 01:03 PMResearchers from Shanghai Jiaotong University, Shanghai AI Lab and the Chinese University of Hong Kong have launched the Visual-RFT (Visual Enhancement Fine Tuning) open source project, which requires only a small amount of data to significantly improve the performance of visual language mockups (LVLM). Visual-RFT cleverly combines DeepSeek-R1's rule-based reinforcement learning approach with OpenAI's reinforcement fine-tuning (RFT) paradigm, successfully extending this approach from the text field to the visual field.
By designing corresponding rule rewards for tasks such as visual subcategorization and object detection, Visual-RFT overcomes the limitations of the DeepSeek-R1 method being limited to text, mathematical reasoning and other fields, providing a new way for LVLM training.
Advantages of Visual-RFT:
Compared with traditional visual instruction fine-tuning (SFT) methods, Visual-RFT has the following significant advantages:
- Less sample learning ability: only 10 to 1000 pieces of data can be used to achieve effective fine-tuning.
- Stronger generalization: In scenarios with limited data, performance is better than SFT.
The researchers verified Visual-RFT on multiple visual perception tasks (detection, classification, location, etc.), and the results showed that Visual-RFT achieved significant performance improvements and easily achieved capability transfer even under the settings of open vocabulary and small sample learning.
The researchers designed corresponding verifiable rewards for different tasks: IoU-based rewards are used for detection and positioning tasks, and classification correctness-based rewards are used for classification tasks.
In the inference positioning task, Visual-RFT demonstrates strong visual reasoning capabilities, such as accurately identifying waterproof glasses that athletes need to wear in pictures.
Experimental results:
Experiments based on the QWen2-VL 2B/7B model show that Visual-RFT is superior to SFT in open object detection, small sample detection, fine-grained classification and inference positioning tasks. Even if you detect a specific anime character (such as Slime), Visual-RFT can be achieved with just a small amount of data.
Open source information:
The Visual-RFT project is open source and contains training, evaluation code and data.
Project address: http://www.miracleart.cn/link/ec56522bc9c2e15be17d11962eeec453
The above is the detailed content of Significantly surpassing SFT, the secret behind o1/DeepSeek-R1 can also be used in multimodal large models. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undress AI Tool
Undress images for free

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Against the backdrop of violent fluctuations in the cryptocurrency market, investors' demand for asset preservation is becoming increasingly prominent. This article aims to answer how to effectively hedge risks in the turbulent currency circle. It will introduce in detail the concept of stablecoin, a core hedge tool, and provide a list of TOP3 stablecoins by analyzing the current highly recognized options in the market. The article will explain how to select and use these stablecoins according to their own needs, so as to better manage risks in an uncertain market environment.

This article will discuss the world's mainstream stablecoins and analyze which stablecoins have the risk aversion attribute of "gold substitute" in the market downward cycle (bear market). We will explain how to judge and choose a relatively stable value storage tool in a bear market by comparing the market value, endorsement mechanism, transparency, and comprehensively combining common views on the Internet, and explain this analysis process.

As the market conditions pick up, more and more smart investors have begun to quietly increase their positions in the currency circle. Many people are wondering what makes them take decisively when most people wait and see? This article will analyze current trends through on-chain data to help readers understand the logic of smart funds, so as to better grasp the next round of potential wealth growth opportunities.

This article will introduce several mainstream stablecoins and explain in depth how to evaluate the security of a stablecoin from multiple dimensions such as transparency and compliance, so as to help you understand which stablecoins are generally considered relatively reliable choices in the market, and learn how to judge their "hazard-haven" attributes on your own.

Recently, Bitcoin hit a new high, Dogecoin ushered in a strong rebound and the market was hot. Next, we will analyze the market drivers and technical aspects to determine whether Ethereum still has opportunities to follow the rise.

The pattern in the public chain field shows a trend of "one super, many strong ones, and a hundred flowers blooming". Ethereum is still leading with its ecological moat, while Solana, Avalanche and others are challenging performance. Meanwhile, Polkadot, Cosmos, which focuses on interoperability, and Chainlink, which is a critical infrastructure, form a future picture of multiple chains coexisting. For users and developers, choosing which platform is no longer a single choice, but requires a trade-off between performance, cost, security and ecological maturity based on specific needs.

Stable coins maintain price stability by anchoring fiat currencies such as the US dollar, which are mainly divided into three categories: 1. Fiat currency collateralization types such as USDT and USDC; 2. Cryptocurrency collateralization types such as DAI; 3. Algorithm types have higher risks. Mainstream stablecoins include USDT with the highest market value and the best liquidity. USDC is known for its compliance and transparency. DAI relies on the decentralized mechanism. TUSD adopts on-chain real-time audit. BUSD is gradually withdrawing from the market due to supervision. USDP is known for its high compliance and security. Both are widely circulated on mainstream exchanges.

In the cryptocurrency market, stablecoins are an important bridge connecting fiat currencies with digital assets. Although USDT (Tether) accounts for the largest market share, the transparency of its reserves has always attracted much attention. Therefore, it is particularly important for users seeking asset preservation and long-term holdings to understand and configure other more transparent and compliant stablecoins. This article will introduce you in detail three mainstream stablecoins besides USDT: USDC, BUSD and DAI, and analyze their respective characteristics and advantages to help you understand which one is more suitable for your long-term commitment.
