Notes on Hashing 01

  • Supervised Hashing for Image Retrieval via Image Representation Learning (AAAI 2014)

This paper introduces a two-stage learning scheme.

A similarity matrix \(S \in \{-1,1\}^{n\times n}\) is approximately decomposed as \(S \approx \frac{1}{q}HH^T\), where \(H \in \{-1,1\}^{n\times q}\). (A specific coordinate descent updating scheme is given, with the relaxation \(H \in [-1,1]^{n\times q}\).)

[Figure: the coordinate descent algorithm for learning \(H\)]
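As a concrete illustration of stage one, here is a minimal NumPy sketch. It optimizes the relaxed objective \(||S - \frac{1}{q}HH^T||_F^2\) with projected gradient steps rather than the paper's exact entry-wise coordinate descent; `learn_codes`, `lr`, and `n_iters` are illustrative names and knobs, not taken from the paper.

```python
import numpy as np

def learn_codes(S, q, lr=0.01, n_iters=500, seed=0):
    """Stage 1 (sketch): approximate S with (1/q) H H^T, H relaxed to [-1, 1].

    Simplified projected-gradient stand-in for the paper's coordinate
    descent scheme, kept short for illustration.
    """
    n = S.shape[0]
    rng = np.random.default_rng(seed)
    H = rng.uniform(-1.0, 1.0, size=(n, q))    # relaxed codes in [-1, 1]^{n x q}
    for _ in range(n_iters):
        R = S - (H @ H.T) / q                  # residual of the approximation
        grad = -(4.0 / q) * (R @ H)            # gradient of ||R||_F^2 w.r.t. H (S symmetric)
        H = np.clip(H - lr * grad, -1.0, 1.0)  # project back onto [-1, 1]
    return np.sign(H)                          # binarize to {-1, 1} codes

# toy usage: two similar items and one dissimilar item
S = np.array([[ 1,  1, -1],
              [ 1,  1, -1],
              [-1, -1,  1]], dtype=float)
H = learn_codes(S, q=4)
```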

The learned codes \(H\) are then used to guide the learning of a deep convolutional network (the classical 2012 ConvNet architecture). Features in the last fully connected layer are used to learn the binary codes and the label information simultaneously.

[Figure: the two-stage learning pipeline]

(Doesn't the final branch look a bit like a GAN?)
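A minimal sketch of the stage-two output branches, assuming a generic PyTorch backbone: the last FC features feed both a hash head fitted to the stage-1 codes and a label head. `StageTwoHead`, `feat_dim`, and the equal loss weighting are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class StageTwoHead(nn.Module):
    """Sketch of the stage-2 branches: shared FC features go to a
    hash-code head (fitting the stage-1 codes H) and a label head."""
    def __init__(self, feat_dim=512, q=48, n_classes=10):
        super().__init__()
        self.hash_head = nn.Linear(feat_dim, q)            # regress the learned codes
        self.label_head = nn.Linear(feat_dim, n_classes)   # predict class labels

    def forward(self, feats):
        # tanh keeps code outputs in (-1, 1), matching {-1, 1} targets
        return torch.tanh(self.hash_head(feats)), self.label_head(feats)

# joint loss: fit the stage-1 codes with MSE, labels with cross-entropy
feats = torch.randn(8, 512)                     # stand-in for conv features
target_codes = torch.sign(torch.randn(8, 48))   # stage-1 codes in {-1, 1}
labels = torch.randint(0, 10, (8,))
codes, logits = StageTwoHead()(feats)
loss = nn.functional.mse_loss(codes, target_codes) \
     + nn.functional.cross_entropy(logits, labels)
```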

Existing studies showed that the code inner product \(H_{i\cdot}H_{j\cdot}^T\) is in one-to-one correspondence with the Hamming distance between \(H_{i\cdot}\) and \(H_{j\cdot}\), so the learning objective can be written as

\[\min_H\sum_{i=1}^{n}\sum_{j=1}^{n}\Big(S_{ij} - \frac{1}{q}H_{i\cdot}H_{j\cdot}^T\Big)^2 = \min_H\Big|\Big|S-\frac{1}{q}HH^T\Big|\Big|_F^2\]
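To see why the correspondence holds: for \(H_{i\cdot}, H_{j\cdot} \in \{-1,1\}^q\), each agreeing bit contributes \(+1\) to the inner product and each disagreeing bit \(-1\), so with Hamming distance \(d_H\),

\[H_{i\cdot}H_{j\cdot}^T = \big(q - d_H(H_{i\cdot},H_{j\cdot})\big) - d_H(H_{i\cdot},H_{j\cdot}) = q - 2\,d_H(H_{i\cdot},H_{j\cdot}),\]

an affine, hence one-to-one, function of the Hamming distance.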

  • Simultaneous Feature Learning and Hash Coding with Deep Neural Networks (CVPR 2015)

A triplet ranking loss was introduced to model the similarity of the images: image \(I\) is more similar to image \(I^+\) than to image \(I^-\). The triplets \((I, I^+, I^-)\) are selected from the data using label information, and the loss is defined as:

\[l_{triplet}(\mathcal{F}(I),\mathcal{F}(I^+),\mathcal{F}(I^-)) = \max\big(0,\ ||\mathcal{F}(I) - \mathcal{F}(I^+)||^2_2 - ||\mathcal{F}(I) - \mathcal{F}(I^-)||^2_2 + 1\big)\]
\[s.t.\ \ \mathcal{F}(I), \mathcal{F}(I^+), \mathcal{F}(I^-) \in [0,1]^q\]

Writing \(b = \mathcal{F}(I)\), \(b^+ = \mathcal{F}(I^+)\), and \(b^- = \mathcal{F}(I^-)\), the gradients are given by:

\[\frac{\partial l}{\partial b} = (2b^- - 2b^+) \times I_{||b - b^+||^2_2 - ||b-b^-||^2_2+1 >0}\]
\[\frac{\partial l}{\partial b^+} = (2b^+ - 2b) \times I_{||b - b^+||^2_2 - ||b-b^-||^2_2+1 >0}\]
\[\frac{\partial l}{\partial b^-} = (2b - 2b^-) \times I_{||b - b^+||^2_2 - ||b-b^-||^2_2+1 >0}\]
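The loss and gradients above are easy to sanity-check numerically; here is a small NumPy version (the function name and shapes are illustrative):

```python
import numpy as np

def triplet_loss_and_grads(b, b_pos, b_neg):
    """Triplet ranking loss with margin 1 and its (sub)gradients.
    b, b_pos, b_neg play the roles of F(I), F(I+), F(I-)."""
    margin_term = np.sum((b - b_pos) ** 2) - np.sum((b - b_neg) ** 2) + 1.0
    active = float(margin_term > 0)          # the indicator I_{... > 0}
    loss = max(0.0, margin_term)
    d_b     = (2 * b_neg - 2 * b_pos) * active
    d_b_pos = (2 * b_pos - 2 * b) * active
    d_b_neg = (2 * b - 2 * b_neg) * active
    return loss, d_b, d_b_pos, d_b_neg

# quick check against the closed forms above, codes in [0, 1]^q with q = 8
rng = np.random.default_rng(0)
b, bp, bn = rng.uniform(0, 1, (3, 8))
loss, db, dbp, dbn = triplet_loss_and_grads(b, bp, bn)
```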

The network architecture is then divided into a feature-extraction part and a divide-and-encode module.

[Figure: overall network architecture]

At the end of the extractor (after average pooling), a \(1\times 1\) convolution is used to extract features and expand the dimension to \(50 \times q\), giving \(50 \times q\) output nodes. These nodes are divided into \(q\) groups, each group producing a single node, and the \(q\) output bits are collected as the hash code.

[Figure: the divide-and-encode module]
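A possible PyTorch rendering of the divide-and-encode module, with illustrative layer shapes; the per-group projection and the pooling details are assumptions for the sketch, not lifted from the paper.

```python
import torch
import torch.nn as nn

class DivideAndEncode(nn.Module):
    """Sketch: a 1x1 conv expands pooled features to 50*q channels, which
    are split into q groups of 50; each group is projected to a single
    output in [0, 1], giving a q-dimensional approximate hash code."""
    def __init__(self, in_channels, q):
        super().__init__()
        self.expand = nn.Conv2d(in_channels, 50 * q, kernel_size=1)
        # one tiny projection per group: 50 inputs -> 1 output
        self.encoders = nn.ModuleList([nn.Linear(50, 1) for _ in range(q)])
        self.q = q

    def forward(self, x):                    # x: (N, C, H, W) conv features
        z = self.expand(x).mean(dim=(2, 3))  # (N, 50*q) after global avg pooling
        groups = z.view(-1, self.q, 50)      # split into q groups of 50 nodes
        bits = [torch.sigmoid(enc(groups[:, i]))
                for i, enc in enumerate(self.encoders)]
        return torch.cat(bits, dim=1)        # (N, q) code in [0, 1]^q

codes = DivideAndEncode(in_channels=256, q=48)(torch.randn(2, 256, 7, 7))
```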

Batch-Orthogonal Locality Sensitive Hashing shows, theoretically and empirically, that hash codes generated by batch-orthogonalized random projections are superior to those generated by simple random projections: batch-orthogonalized projections produce fewer redundant hash bits.
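A minimal NumPy sketch of the batch-orthogonalization idea: draw Gaussian projections in batches, orthogonalize each batch with a QR decomposition, then take signs. The function name and parameters are illustrative; `batch_size` should not exceed the input dimension for the batch to be fully orthogonal.

```python
import numpy as np

def batch_orthogonal_lsh(X, n_bits, batch_size, seed=0):
    """Sketch of batch-orthogonal LSH: random Gaussian projections are
    orthogonalized in batches before taking signs, which reduces
    redundancy between the resulting hash bits."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    blocks = []
    for start in range(0, n_bits, batch_size):
        k = min(batch_size, n_bits - start)
        G = rng.standard_normal((d, k))
        Q, _ = np.linalg.qr(G)           # orthogonalize this batch of projections
        blocks.append(Q)
    W = np.hstack(blocks)                # d x n_bits projection matrix
    return (X @ W >= 0).astype(np.int8)  # one bit per projection

codes = batch_orthogonal_lsh(np.random.randn(100, 64), n_bits=32, batch_size=8)
```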