【DeepSeek】Scientists worldwide flock to DeepSeek

Nature report [Reference 1]

DeepSeek-R1 has drawn the attention of scientists worldwide for its low cost, strong performance, and open-source character.

Tests show that its ability to solve mathematics and science problems approaches that of OpenAI's industry-benchmark model o1, while its API calls cost only about 1/13 as much.

The model is released as "open weights", allowing researchers to download it for free, customize its training, and improve its performance, providing a flexible tool for scientific tasks such as bioinformatics and computational chemistry.

Tests at the University of Oxford indicate that R1 outperforms o1 on proofs in abstract mathematics (e.g., functional analysis), although researchers still need the expertise to spot and correct its errors.

On ScienceAgentBench, a benchmark developed by a team at Ohio State University, both models solved only about one third of the tasks, highlighting AI's limitations in complex scientific reasoning.

Even so, more than three million downloads on the Hugging Face platform within a week reflect the research community's enthusiasm, and scientists are already building customized reasoning models on top of R1.

DeepSeek-R1's open ecosystem may help scientific LLM applications spread, and its low cost is especially attractive to researchers with limited resources.

With further fine-tuning and optimization, the model is expected to show greater potential in areas such as discipline-specific coding and data analysis, although processing speed and error rates remain challenges.

This progress marks a breakthrough for China in open-source AI and may reshape how global scientific research collaboration is organized.

Fine-tuned R1 + MoE = a super world model

The author imagines that by fine-tuning R1 for individual vertical domains (biology, medicine, finance, materials science, and so on) and combining the results with the dynamic routing mechanism of a Mixture-of-Experts (MoE) architecture, one could gradually build a modular, scalable "super world model".

That is, MoE would treat the domain-fine-tuned sub-models as "experts" and intelligently dispatch the most relevant experts according to the task at hand, enabling complex cross-disciplinary reasoning, as sketched below.
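As a rough illustration of that routing idea only: the DomainRouter class, the expert names, and the top-k gating below are hypothetical toy constructs, not part of DeepSeek-R1 or of any published system.

import torch
import torch.nn as nn

class DomainRouter(nn.Module):
    # Toy top-k gate: scores a pooled prompt embedding against each registered domain expert.
    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts)
        self.top_k = top_k

    def forward(self, pooled_prompt: torch.Tensor):
        scores = torch.softmax(self.gate(pooled_prompt), dim=-1)   # relevance of every expert
        weights, indices = torch.topk(scores, self.top_k, dim=-1)  # keep only the k most relevant
        return weights / weights.sum(dim=-1, keepdim=True), indices

experts = ["r1-biology", "r1-medicine", "r1-finance", "r1-materials"]  # hypothetical fine-tuned R1 checkpoints
router = DomainRouter(hidden_size=1536, num_experts=len(experts), top_k=2)
weights, indices = router(torch.randn(1, 1536))  # stand-in for a pooled embedding of an incoming question
print([experts[int(i)] for i in indices[0]], weights[0])  # which experts to consult, and with what weight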

An open-source ecosystem would accelerate this process: researchers around the world could keep injecting domain data and logical rules on top of R1's open weights, so that each expert module evolves in depth.

As expert models from thousands of domains are plugged in, the MoE architecture would gradually integrate humanity's multidimensional knowledge into a super world model with comprehensive cognitive abilities.

This distributed, collaborative approach breaks through the capacity limits of any single model: by dynamically combining domain experts to answer complex questions, it preserves specialist depth while extending a global view.

Continuously iterating the domain-fine-tuned models and optimizing the MoE architecture might eventually produce the first general artificial intelligence that truly understands how the physical world works and bridges the sciences and the humanities, opening an AI-driven era of big science.

Start by learning to fine-tune DeepSeek-R1

Dreams aside, the place to start is learning how to fine-tune DeepSeek-R1. Kaggle hosts a very concise example [Reference 2] that fine-tunes DeepSeek-R1-Distill-Qwen-1.5B with KTO.

KTO draws on the prospect theory of Kahneman and Tversky to build human-aware loss functions (HALOs), making the alignment of large language models with human feedback more effective.

Unlike preference-based methods such as Direct Preference Optimization (DPO), KTO does not require expensive preference data; it relies instead on a simple binary signal indicating whether an output is desirable or undesirable.
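Concretely, each KTO training record pairs a prompt with a single completion and a boolean label. The layout below follows TRL's unpaired-preference format, which the trl-lib/kto-mix-14k dataset used later also follows; the example texts themselves are made up.

# Each example carries one completion plus a binary "desirable?" flag -- no chosen/rejected pair is needed.
kto_example_good = {
    "prompt": [{"role": "user", "content": "What is 2 + 2?"}],
    "completion": [{"role": "assistant", "content": "2 + 2 = 4."}],
    "label": True,   # desirable output
}
kto_example_bad = {
    "prompt": [{"role": "user", "content": "What is 2 + 2?"}],
    "completion": [{"role": "assistant", "content": "2 + 2 = 5."}],
    "label": False,  # undesirable output
}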

1: Install the required libraries
This code installs and upgrades the libraries needed for efficient model training and inference, including unsloth for fast training, torch for GPU operations, and flash-attn for optimized attention computation on compatible GPUs.

%%capture
!pip install pip3-autoremove
!pip-autoremove torch torchvision torchaudio -y
!pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121  # choose the wheel index matching your CUDA version
!pip install unsloth && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

# Install Flash Attention 2 for GPUs with CUDA capability >= 8 (e.g., A100, H100)
import torch
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install --no-deps packaging ninja einops "flash-attn>=2.6.3"  # faster attention for training


2: Import the required libraries
This code imports the libraries needed for model training, dataset handling, and inference. The main ones are unsloth for efficient model loading, datasets for dataset management, and trl for KTO training.

import torch  # For GPU operations and tensor computations
import os  # For file and directory operations
import re  # For regular expressions
from typing import List, Literal, Optional  # For type hints
from datasets import load_dataset  # For dataset loading
from unsloth import FastLanguageModel, is_bfloat16_supported  # For efficient model loading and training
from trl import KTOConfig, KTOTrainer  # For KTO training


Unsloth: Will patch your computer to make fine-tuning 2x faster. Unsloth Zoo will now patch everything to speed up training!

3: Load the model and configuration
This code sets up the model and tokenizer, configures quantization, and applies a default chat template if one is missing. It uses 4-bit quantization to reduce memory usage and auto-detects the appropriate data type (float16 or bfloat16).

# Set basic parameters
max_seq_length = 4096  # Maximum sequence length the model can handle
dtype = None  # Auto-detect data type (float16 for Tesla T4/V100, bfloat16 for Ampere+ GPUs)
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage

# Load the pre-trained model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=dtype,  # Auto-detect data type
    load_in_4bit=load_in_4bit,  # Enable 4-bit quantization
    # token="hf_...",  # Use this if accessing gated models (e.g., LLaMA 2)
)

# Add a default chat template if missing
if tokenizer.chat_template is None:
    DEFAULT_CHAT_TEMPLATE = """{% for message in messages %}
{% if message['role'] == 'user' %}
{{ '<|user|>\n' + message['content'] + eos_token }}
{% elif message['role'] == 'system' %}
{{ '<|system|>\n' + message['content'] + eos_token }}
{% elif message['role'] == 'assistant' %}
{{ '<|assistant|>\n' + message['content'] + eos_token }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{ '<|assistant|>' }}
{% endif %}
{% endfor %}"""
    tokenizer.chat_template = DEFAULT_CHAT_TEMPLATE  # Apply the default template


4: Dataset preparation and processing
This code defines a function that applies the chat template to dataset examples and loads the KTO dataset. It also selects a subset of the data to speed up training.

# Function to apply chat template to dataset examples
def apply_chat_template(
    example, tokenizer, task: Literal["sft", "generation", "rm", "dpo", "kto"] = "sft",
    assistant_prefix="<|assistant|>\n"
):
    def _strip_prefix(s, pattern):
        # Remove a prefix from a string using regex
        return re.sub(f"^{re.escape(pattern)}", "", s)

    if task in ["sft", "generation"]:
        messages = example["messages"]
        # Add an empty system message if none exists
        if messages[0]["role"] != "system":
            messages.insert(0, {"role": "system", "content": ""})
        example["text"] = tokenizer.apply_chat_template(
            messages, tokenize=False,
            add_generation_prompt=True if task == "generation" else False,
        )
    elif task == "rm":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            chosen_messages = example["chosen"]
            rejected_messages = example["rejected"]
            # Add an empty system message if none exists
            if chosen_messages[0]["role"] != "system":
                chosen_messages.insert(0, {"role": "system", "content": ""})
            if rejected_messages[0]["role"] != "system":
                rejected_messages.insert(0, {"role": "system", "content": ""})
            example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
            example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
        else:
            raise ValueError(
                f"Could not format example as dialogue for `rm` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    elif task == "dpo":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            prompt_messages = [[msg for msg in example["chosen"] if msg["role"] == "user"][0]]
            # Insert a system message at the start of the prompt
            if example["chosen"][0]["role"] != "system":
                prompt_messages.insert(0, {"role": "system", "content": ""})
            else:
                prompt_messages.insert(0, example["chosen"][0])
            chosen_messages = example["chosen"][1:]
            rejected_messages = example["rejected"][1:]
            example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
            example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
            example["text_prompt"] = tokenizer.apply_chat_template(
                prompt_messages, tokenize=False, add_generation_prompt=True
            )
            example["text_chosen"] = _strip_prefix(example["text_chosen"], assistant_prefix)
            example["text_rejected"] = _strip_prefix(example["text_rejected"], assistant_prefix)
        else:
            raise ValueError(
                f"Could not format example as dialogue for `dpo` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    elif task == "kto":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            prompt_messages = [[msg for msg in example["chosen"] if msg["role"] == "user"][0]]
            chosen_messages = prompt_messages + [msg for msg in example["chosen"] if msg["role"] == "assistant"]
            rejected_messages = prompt_messages + [msg for msg in example["rejected"] if msg["role"] == "assistant"]
            if "system" in example:
                chosen_messages.insert(0, {"role": "system", "content": example["system"]})
                rejected_messages.insert(0, {"role": "system", "content": example["system"]})
            example["text_chosen"] = _strip_prefix(tokenizer.apply_chat_template(chosen_messages, tokenize=False), assistant_prefix)
            example["text_rejected"] = _strip_prefix(tokenizer.apply_chat_template(rejected_messages, tokenize=False), assistant_prefix)
        else:
            raise ValueError(
                f"Could not format example as dialogue for `kto` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    else:
        raise ValueError(
            f"Task {task} not supported, please ensure that the provided task is one of {['sft', 'generation', 'rm', 'dpo', 'kto']}"
        )
    return example

# Load the KTO dataset
raw_datasets = load_dataset("trl-lib/kto-mix-14k")  # Load the dataset
train_dataset = raw_datasets["train"]  # Use the training split

# Take a subset of the training data (1000 examples for faster training)
train_subset = train_dataset.select(range(1000))
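A quick look at the selected subset can confirm what the trainer will see; this snippet is illustrative and assumes the prompt/completion/label schema of trl-lib/kto-mix-14k.

# Inspect one record and the balance of desirable vs. undesirable completions
print(train_subset[0].keys())
print(sum(train_subset["label"]), "desirable completions out of", len(train_subset))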


5: Model training setup
This code configures LoRA for parameter-efficient fine-tuning and sets up the KTO trainer with its training arguments. It also prints GPU memory statistics so that resource usage can be monitored.

# Configure LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # Target layers
    lora_alpha=16,  # LoRA scaling factor
    lora_dropout=0,  # Dropout for LoRA layers
    bias="none",  # No bias
    use_gradient_checkpointing="unsloth",  # Enable gradient checkpointing for memory efficiency
    random_state=3407,  # Random seed for reproducibility
)

# Set up the KTO trainer with training arguments
kto_trainer = KTOTrainer(
    model=model,
    args=KTOConfig(
        per_device_train_batch_size=4,  # Batch size per GPU
        gradient_accumulation_steps=2,  # Gradient accumulation for larger effective batch size
        num_train_epochs=1,  # Number of training epochs
        learning_rate=5e-5,  # Learning rate (placeholder value; adjust as needed)
        fp16=not is_bfloat16_supported(),  # Use FP16 if BF16 is not supported
        bf16=is_bfloat16_supported(),  # Use BF16 if supported
        output_dir="outputs",  # Directory to save outputs
        logging_dir="outputs",  # Directory to save logs
        logging_steps=1,  # Log every step
        optim="adamw_8bit",  # 8-bit AdamW optimizer
        weight_decay=0.01,  # Weight decay for regularization
        lr_scheduler_type="cosine",  # Cosine learning rate scheduler
        warmup_ratio=0.1,  # Warmup ratio for learning rate
        seed=42,  # Random seed
        report_to="none",  # Disable external logging (e.g., WandB)
    ),
    train_dataset=train_subset,  # Training dataset
    processing_class=tokenizer,  # Tokenizer for processing data
)

# Print GPU memory statistics to monitor resource usage
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# Train the model
kto_trainer.train()
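To see how much memory training actually consumed, the same pattern can be repeated afterwards; this small optional snippet is not part of the original notebook.

# Report peak reserved memory after training, mirroring the pre-training stats printed above
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_for_training = round(used_memory - start_gpu_memory, 3)
print(f"Peak reserved memory = {used_memory} GB ({used_for_training} GB used by training).")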


6: Saving and exporting the model
This code saves the fine-tuned model and tokenizer locally, with options to save in merged form (16-bit or 4-bit) or push to the Hugging Face Hub. It also supports converting the model to GGUF format for use with llama.cpp.

# Save the fine-tuned model and tokenizer locally
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Save the merged model in 16-bit or 4-bit format (optional)
if False:  # Set to True to enable
    model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")  # 16-bit merged model
    # model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_4bit")  # 4-bit merged model

# Push the model to Hugging Face Hub (optional)
if False:  # Set to True to enable
    model.push_to_hub_merged("your_name/model", tokenizer, save_method="merged_16bit", token="...")  # Upload to Hugging Face Hub

# Convert the model to GGUF format for llama.cpp (optional)
if False:
    !git clone https://github.com/ggerganov/llama.cpp  # Clone llama.cpp
    !cd llama.cpp && make  # Build llama.cpp
    model.save_pretrained_gguf("gguf_model", tokenizer)  # Export the model to GGUF (directory name illustrative)
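For a later session, the saved LoRA directory can be reloaded the same way the base model was loaded; this is a sketch assuming the local "lora_model" directory written above and unsloth's support for local adapter paths in from_pretrained.

# Reload the LoRA adapters saved above for a fresh inference session
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",      # local directory written by save_pretrained above
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # switch unsloth to inference mode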


7: Inference and generating responses
This code defines a function that generates answers from the fine-tuned model.

from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

# Attach the ChatML template so prompts are formatted consistently at inference time
tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",
    mapping={"role": "role", "content": "content", "user": "user", "assistant": "assistant"},
)

FastLanguageModel.for_inference(model)  # Enable unsloth's faster inference mode

def generate_response(message):
    print("\n" + "=" * 50 + "\nQUESTION:\n" + "=" * 50)
    print(message + "\n")
    print("-" * 50 + "\nRESPONSE:\n" + "-" * 50)
    messages = [{"content": message, "role": "user"}]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    text_streamer = TextStreamer(tokenizer, skip_special_tokens=True, skip_prompt=True)
    outputs = model.generate(
        input_ids=inputs,
        streamer=text_streamer,  # Stream tokens to stdout as they are generated
        temperature=0.1,  # Sampling temperature
        max_new_tokens=1024,  # Maximum tokens to generate
        use_cache=True,  # Use caching for faster generation
    )
    return outputs
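Calling the function is then a one-liner; the question below is only an illustration.

# Ask the fine-tuned model a question; the answer streams to stdout and the token ids are returned
generate_response("Explain the difference between supervised fine-tuning and KTO in two sentences.")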


Reference 1: https://www.nature.com/articles/d41586-025-00275-0

Reference 2: https://www.kaggle.com/code/ksmooi/fine-tuning-deepseek-r1-distill-qwen-1-5b-kto (清熙)