Domain Adaptation Papers
- ICML2015, Unsupervised Domain Adaptation by Backpropagation (DANN)
    - Learn features that are domain-invariant (enforced by a domain classifier) and discriminative (enforced by a label predictor)
    - Introduce a “Gradient Reversal Layer” so that the adversarial objective can be optimized with plain SGD (implementation)
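
The gradient reversal idea can be sketched without any deep learning framework. The class below is a minimal illustration, not the paper's code: `GradientReversal`, `lam`, and the list-based `forward`/`backward` methods are hypothetical names chosen for this note.

```python
class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is an assumed trade-off weight for the domain loss.
        self.lam = lam

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The feature extractor receives the negated, scaled gradient, so an
        # ordinary SGD step on it increases the domain classifier's loss.
        return [-self.lam * g for g in grad_from_domain_classifier]


grl = GradientReversal(lam=0.5)
out = grl.forward([0.2, -1.0, 3.0])
grad_in = grl.backward([1.0, -2.0, 4.0])
```

Because the layer only flips the sign of the gradient, the feature extractor, label predictor, and domain classifier can all be trained by one backward pass.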
 
- NIPS2016, Unsupervised Domain Adaptation with Residual Transfer Networks (RTN)
    - Learn discriminative features via deep architectures (CNN)
    - Minimize Maximum Mean Discrepancy (MMD) between source and target domain (Kronecker product), DL
    - Insert residual layers between the source classifier and the target classifier to learn a perturbation function
    - Q: cross-entropy at target classifier
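
As a rough illustration of the MMD term, the sketch below computes a biased estimate of squared MMD with a single Gaussian kernel on one feature layer. It is an assumption-laden simplification: the function names and the `bandwidth` parameter are hypothetical, and the multi-layer Kronecker-product fusion mentioned above is not reproduced.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    # Gaussian (RBF) kernel between two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two lists of feature vectors."""

    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


source = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.1]]
near = [[0.05, 0.0], [0.0, 0.1], [-0.1, -0.1]]
far = [[3.0, 3.0], [3.1, 2.9], [2.8, 3.2]]
mmd_near = mmd_squared(source, near)
mmd_far = mmd_squared(source, far)
```

Features drawn from a shifted distribution (`far`) give a larger value than features that overlap the source (`near`), which is what the adaptation loss penalizes.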
 
- ICML2017, Deep Transfer Learning with Joint Adaptation Networks (JAN)
    - First model: the joint distribution of the activations in several untransferable layers is used to minimize the discrepancy of features (JMMD)
    - Then add an adversarial scheme when minimizing JMMD (with a network maximizing the JMMD in another dimension)
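
One way to picture JMMD is as an MMD whose kernel is the product of per-layer kernels, so that the joint distribution over several layers is compared at once. The sketch below follows that reading with Gaussian kernels; `joint_kernel`, `jmmd_squared`, and the toy activations are hypothetical, and the adversarial variant is not covered.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def joint_kernel(sample_a, sample_b):
    # Each sample is a tuple of per-layer activation vectors; the joint
    # kernel is the product of the kernels evaluated layer by layer.
    value = 1.0
    for layer_a, layer_b in zip(sample_a, sample_b):
        value *= gaussian_kernel(layer_a, layer_b)
    return value


def jmmd_squared(source, target):
    """Biased estimate of squared joint MMD over several layers."""

    def mean_kernel(xs, ys):
        total = sum(joint_kernel(x, y) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Two layers per sample: a 2-d feature layer and a 1-d output layer.
source = [([0.0, 0.1], [1.0]), ([0.1, 0.0], [0.9])]
shifted = [([0.0, 0.1], [3.0]), ([0.1, 0.0], [2.9])]
jmmd_same = jmmd_squared(source, source)
jmmd_shifted = jmmd_squared(source, shifted)
```

In the toy data only the second layer is shifted, yet the joint statistic still grows, since a mismatch in any layer lowers the product kernel.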
 

- NIPS2017, Mean teachers are better role models (Mean Teacher), semi-supervised
    - Use back-prop to train the student as the label classifier
    - Apply an Exponential Moving Average (EMA) of the student's network weights as the teacher
    - Use MSE between the predictions of teacher and student as the consistency loss
    - Ramping up the weight of the unsupervised loss in a time-dependent manner is necessary
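
The three ingredients above (EMA teacher, MSE consistency, time-dependent loss weight) can be sketched on plain lists. All names and default values here are assumptions for illustration, and the `exp(-5 (1 - t)^2)` ramp-up shape is one commonly used schedule rather than something stated in these notes.

```python
import math


def ema_update(teacher, student, alpha=0.99):
    # Teacher weights track an exponential moving average of student weights.
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_pred, teacher_pred):
    # Mean squared error between the two prediction vectors.
    diffs = [(s - t) ** 2 for s, t in zip(student_pred, teacher_pred)]
    return sum(diffs) / len(diffs)


def rampup_weight(step, rampup_steps, max_weight=1.0):
    # Weight of the unsupervised loss grows from near zero to max_weight.
    t = min(max(step / rampup_steps, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)


teacher = ema_update([0.0, 0.0], [1.0, -1.0], alpha=0.9)
loss = consistency_loss([0.5, 0.5], [1.0, 0.0])
early_weight = rampup_weight(0, 100)
late_weight = rampup_weight(100, 100)
```

A training step would then minimize the supervised loss plus `rampup_weight(step, ...) * consistency_loss(...)` with respect to the student only, followed by `ema_update` for the teacher.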
 

- arxiv: Self-ensembling for domain adaptation (Self-Ensembling)
    - Apply Mean Teacher in the domain adaptation setting
    - Data is split into two paths each iteration: cross-entropy for classification on the source domain, and an unsupervised self-ensembling loss on the target domain
    - Two batches (a source batch and a target batch) are fed each iteration, and each is given different BN parameters
    - SGD is performed jointly
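
One reading of "different BN parameters" is that each domain's batch is normalized with its own batch statistics. The sketch below illustrates only that reading, on scalar activations; `batch_norm`, `eps`, and the toy batches are hypothetical.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize a batch of scalar activations with its own mean and variance."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


# The target batch is a shifted and rescaled copy of the source batch.
source_batch = [0.0, 1.0, 2.0, 3.0]
target_batch = [10.0, 12.0, 14.0, 16.0]

# Each domain is normalized separately rather than as one concatenated batch.
source_norm = batch_norm(source_batch)
target_norm = batch_norm(target_batch)
```

With separate statistics, a simple shift and rescale between domains is removed before the shared layers see the activations; normalizing the concatenated batch would not do this.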
 
- ICLR2018: A DIRT-T Approach to Unsupervised Domain Adaptation (DIRT-T)
    - Unsupervised, non-conservative domain adaptation: a) source fully labeled, target unlabeled; b) a classifier that works well on both source and target is not guaranteed
    - Introduce the cluster assumption on the data distribution, applied as a violation term (conditional entropy) in the objective function
    - VADA: apply the violation penalty to the source and target classifier
    - DIRT (Decision Boundary Iterative Refinement): initialize with VADA, then use the violation penalty on the target domain to improve performance on the target domain
        - With Lagrangian multiplier
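
The conditional-entropy violation term can be sketched as the average entropy of the classifier's softmax outputs on unlabeled points. This is a minimal illustration with hypothetical names and toy logits, not the paper's full objective.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_entropy(batch_logits):
    """Average entropy of the predicted class distribution over a batch."""
    total = 0.0
    for logits in batch_logits:
        probs = softmax(logits)
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(batch_logits)


# Confident predictions: points lie far from the decision boundary.
confident = [[8.0, 0.0, 0.0], [0.0, 9.0, 0.0]]
# Uncertain predictions: points lie close to the decision boundary.
uncertain = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1]]
h_confident = conditional_entropy(confident)
h_uncertain = conditional_entropy(uncertain)
```

Minimizing this quantity on target data pushes the decision boundary away from dense regions, which is how the cluster assumption enters the loss.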
 
  
- ICLR2017: Temporal Ensembling for Semi-Supervised Learning (Temporal Ensembling):
    - An ensemble of NNs generally works better than a single prediction and is more likely to be right
    - Introduce self-ensembling models: the Pi model sends the same input through different translation/noise/dropout paths to extract the same information
    - Temporal ensembling takes an exponential moving average of the predictions from past epochs
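
The accumulation of per-epoch predictions can be sketched as an exponential moving average with a start-up bias correction. The function names, the value `alpha=0.6`, and the toy predictions are assumptions made for this illustration.

```python
def update_ensemble(ensemble, prediction, alpha=0.6):
    # Accumulate this epoch's prediction into the running average.
    return [alpha * z_acc + (1.0 - alpha) * z
            for z_acc, z in zip(ensemble, prediction)]


def corrected_target(ensemble, epoch, alpha=0.6):
    # The accumulator starts at zero, so divide by (1 - alpha^epoch) to
    # remove the start-up bias. Epochs are counted from 1.
    correction = 1.0 - alpha ** epoch
    return [z / correction for z in ensemble]


ensemble = [0.0, 0.0]
per_epoch_predictions = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
targets = []
for epoch, prediction in enumerate(per_epoch_predictions, start=1):
    ensemble = update_ensemble(ensemble, prediction)
    targets.append(corrected_target(ensemble, epoch))
```

The corrected targets stay valid probability vectors and change more smoothly than the raw per-epoch predictions, which is the point of ensembling over time.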
  
