senooken JP Social
senooken JP Social is senooken's personal federated social network.
Conversation

Notices

  1. Akionux (akionux@status.akionux.net)'s status on Wednesday, 13-Jul-2022 12:42:57 JST
    [2207.05501] Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios - https://arxiv.org/abs/2207.05501
    Competition among cost-efficient ViTs has gotten quite fierce lately.
    Has EfficientFormer, which only just came out, been surpassed already?
    In conversation Wednesday, 13-Jul-2022 12:42:57 JST from status.akionux.net

    Attachments

    1. Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
      Due to the complex attention mechanisms and model design, most existing vision Transformers (ViTs) cannot perform as efficiently as convolutional neural networks (CNNs) in realistic industrial deployment scenarios, e.g. TensorRT and CoreML. This poses a distinct challenge: Can a visual neural network be designed to infer as fast as CNNs and perform as powerfully as ViTs? Recent works have tried to design CNN-Transformer hybrid architectures to address this issue, yet the overall performance of these works is far from satisfactory. To this end, we propose a next generation vision Transformer for efficient deployment in realistic industrial scenarios, namely Next-ViT, which dominates both CNNs and ViTs from the perspective of the latency/accuracy trade-off. In this work, the Next Convolution Block (NCB) and Next Transformer Block (NTB) are respectively developed to capture local and global information with deployment-friendly mechanisms. Then, the Next Hybrid Strategy (NHS) is designed to stack NCB and NTB in an efficient hybrid paradigm, which boosts performance in various downstream tasks. Extensive experiments show that Next-ViT significantly outperforms existing CNNs, ViTs and CNN-Transformer hybrid architectures with respect to the latency/accuracy trade-off across various vision tasks. On TensorRT, Next-ViT surpasses ResNet by 5.4 mAP (from 40.4 to 45.8) on COCO detection and 8.2% mIoU (from 38.8% to 47.0%) on ADE20K segmentation under similar latency. Meanwhile, it achieves comparable performance with CSWin, while the inference speed is accelerated by 3.6x. On CoreML, Next-ViT surpasses EfficientFormer by 4.6 mAP (from 42.6 to 47.2) on COCO detection and 3.5% mIoU (from 45.2% to 48.7%) on ADE20K segmentation under similar latency. Code will be released soon.
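    The abstract sketches the design at a high level: deployment-friendly convolution blocks (NCB) capture local features, transformer blocks (NTB) capture global context, and a hybrid strategy (NHS) stacks the two per stage. A minimal PyTorch sketch of that stacking pattern follows; the block internals here are generic stand-ins chosen for illustration (a depthwise-separable conv block and a plain self-attention block), not the paper's actual NCB/NTB/NHS definitions.

    ```python
    import torch
    import torch.nn as nn

    class ConvBlock(nn.Module):
        """Stand-in for the Next Convolution Block (NCB): cheap local mixing."""
        def __init__(self, dim):
            super().__init__()
            self.mix = nn.Sequential(
                nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # depthwise: spatial mixing
                nn.BatchNorm2d(dim),
                nn.Conv2d(dim, dim, 1),                         # pointwise: channel mixing
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return x + self.mix(x)  # residual connection

    class TransformerBlock(nn.Module):
        """Stand-in for the Next Transformer Block (NTB): global self-attention."""
        def __init__(self, dim, heads=4):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):
            b, c, h, w = x.shape
            t = x.flatten(2).transpose(1, 2)   # (B, C, H, W) -> (B, H*W, C) tokens
            n = self.norm(t)
            t = t + self.attn(n, n, n)[0]      # global attention with residual
            return t.transpose(1, 2).reshape(b, c, h, w)

    def hybrid_stage(dim, n_conv, n_attn=1):
        """Stand-in for the Next Hybrid Strategy (NHS): mostly convolution
        blocks, with a transformer block closing the stage for global context."""
        blocks = [ConvBlock(dim) for _ in range(n_conv)]
        blocks += [TransformerBlock(dim) for _ in range(n_attn)]
        return nn.Sequential(*blocks)

    if __name__ == "__main__":
        x = torch.randn(1, 64, 56, 56)              # a typical early-stage feature map
        print(hybrid_stage(64, n_conv=3)(x).shape)  # torch.Size([1, 64, 56, 56])
    ```

    Even this toy shows the trade-off the abstract argues for: most blocks in a stage are cheap convolutions that engines like TensorRT and CoreML handle well, and only the tail block pays for global attention.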
    • Akionux (akionux@status.akionux.net)'s status on Saturday, 16-Jul-2022 10:22:15 JST
      in reply to
      I read it; the model seems to sit in a higher parameter range than EfficientFormer.
      In conversation Saturday, 16-Jul-2022 10:22:15 JST


senooken JP Social is a social network, courtesy of senooken. It runs on GNU social, version 2.0.2-beta0, available under the GNU Affero General Public License.

All senooken JP Social content and data are available under the Creative Commons Attribution 3.0 license.