Toshio Endo
2020 – today
- 2024
- [c76] Ivan R. Ivanov, Oleksandr Zinenko, Jens Domke, Toshio Endo, William S. Moses:
  Retargeting and Respecializing GPU Workloads for Performance Portability. CGO 2024: 119-132
- [c75] Ryubu Hosoki, Kento Sato, Toshio Endo, Julien Bigot, Edouard Audit:
  An Optimization Pass for Training Speed-Up and Strategy Search in 3D Parallelism. CLUSTER Workshops 2024: 146-147
- [c74] Chen Zhuang, Peng Chen, Xin Liu, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
  Communication Optimization for Distributed GCN Training on ABCI Supercomputer. CLUSTER Workshops 2024: 160-161
- [c73] Lingqi Zhang, Ryan Barton, Peng Chen, Xiao Wang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
  Investigating Nvidia GPU Architecture Trends via Microbenchmarks. CLUSTER Workshops 2024: 174-175
- [c72] Du Wu, Peng Chen, Yiyu Tan, Yusuke Tanimura, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
  Asynchronous I/O Optimization for X-Ray Imaging via GPUDirect Storage. CLUSTER Workshops 2024: 196-197
- [c71] Futa Kambe, Toshio Endo:
  Accelerating Stencil Computations on a GPU by Combining Using Tensor Cores and Temporal Blocking. GPGPU@PPoPP 2024: 1-6
- [c70] Ryubu Hosoki, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami:
  AshPipe: Asynchronous Hybrid Pipeline Parallel for DNN Training. HPC Asia 2024: 117-126
- [c69] Du Wu, Peng Chen, Xiao Wang, Isaac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
  Real-time High-resolution X-Ray Computed Tomography. ICS 2024: 110-123
- [c68] Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert:
  Automatic Parallelization and OpenMP Offloading of Fortran Array Notation. IWOMP 2024: 197-209
- 2023
- [c67] Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami:
  Pyramid Swin Transformer for Multi-task: Expanding to More Computer Vision Tasks. ACIVS 2023: 53-65
- [c66] Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
  Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt). GPGPU@PPoPP 2023: 34-35
- [c65] Shohei Minami, Toshio Endo, Akihiro Nomura:
  Effectiveness of the Oversubscribing Scheduling on Supercomputer Systems. HPC Asia 2023: 18-28
- [c64] Shohei Minami, Toshio Endo, Akihiro Nomura:
  The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System. HPEC 2023: 1-7
- [c63] Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
  PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. ICS 2023: 167-179
- [c62] Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
  Revisiting Temporal Blocking Stencil Optimizations. ICS 2023: 251-263
- [c61] William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko:
  High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. PPoPP 2023: 119-134
- [c60] Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami:
  Pyramid Swin Transformer: Different-Size Windows Swin Transformer for Image Classification and Object Detection. VISIGRAPP (5: VISAPP) 2023: 583-590
- [i6] Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
  Revisiting Temporal Blocking Stencil Optimizations. CoRR abs/2305.07390 (2023)
- [i5] Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
  Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt). CoRR abs/2306.03336 (2023)
- 2022
- [c59] Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hill Hiroki Kobayashi, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuké Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi:
  mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations. DASC/PiCom/CBDCom/CyberSciTech 2022: 1-7
- [c58] Hiroki Aikawa, Toshio Endo, Tomoya Yuki, Takahiro Hirofuchi, Tsutomu Ikegami:
  Efficient Stencil Computation with Temporal Blocking by Halide DSL. ISPA/BDCloud/SocialCom/SustainCom 2022: 870-877
- [c57] Chenyu Wang, Toshio Endo, Takahiro Hirofuchi, Tsutomu Ikegami:
  Speed-up Single Shot Detector on GPU with CUDA. SNPD-Summer 2022: 36-41
- [i4] Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hill Hiroki Kobayashi, Shin Matsushima, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuké Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi:
  mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations. CoRR abs/2203.14188 (2022)
- [i3] William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko:
  High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. CoRR abs/2207.00257 (2022)
- 2021
- [c56] Shohei Minami, Toshio Endo, Akihiro Nomura:
  Performance Modeling of HPC Applications on Overcommitted Systems. HPC Asia 2021: 129-132
- [c55] Shohei Minami, Toshio Endo, Akihiro Nomura:
  Measurement and Modeling of Performance of HPC Applications Towards Overcommitting Scheduling Systems. JSSPP 2021: 59-79
- 2020
- [c54] Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka:
  AN5D: automated stencil framework for high-degree temporal blocking on GPUs. CGO 2020: 199-211
- [c53] Toshio Endo:
  Integrating Cache Oblivious Approach with Modern Processor Architecture: The Case of Floyd-Warshall Algorithm. HPC Asia 2020: 123-130
- [i2] Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka:
  AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs. CoRR abs/2001.01473 (2020)
2010 – 2019
- 2019
- [j2] Yukinori Sato, Tomoya Yuki, Toshio Endo:
  An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation. ACM Trans. Archit. Code Optim. 15(4): 67:1-67:23 (2019)
- [c52] Yuki Ito, Haruki Imai, Tung D. Le, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo:
  Profiling based out-of-core hybrid method for large neural networks: poster. PPoPP 2019: 399-400
- [i1] Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo:
  Profiling based Out-of-core Hybrid Method for Large Neural Networks. CoRR abs/1907.05013 (2019)
- 2018
- [c51] Noboru Tanabe, Toshio Endo:
  Exhaustive evaluation of memory-latency sensitivity on manycore processors with large cache. HP3C 2018: 27-34
- [c50] Ryo Matsumiya, Toshio Endo:
  Scalable RMA-based Communication Library Featuring Node-local NVMs. HPEC 2018: 1-7
- [c49] Toshio Endo:
  Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy. NVMSA 2018: 19-24
- [c48] Noboru Tanabe, Toshio Endo:
  Characterizing Memory-Latency Sensitivity of Sparse Matrix Kernels. PDP 2018: 249-254
- [e1] Toshio Endo, Mitsuo Yokokawa, Toshihiro Hanawa, Osamu Tatebe:
  The Proceedings of Workshops of HPC Asia 2018, Chiyoda, Tokyo, Japan, January 31, 2018. ACM 2018, ISBN 978-1-4503-6347-1
- 2017
- [c47] Yuki Ito, Ryo Matsumiya, Toshio Endo:
  ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity. IEEE BigData 2017: 183-192
- [c46] Yukinori Sato, Tomoya Yuki, Toshio Endo:
  ExanaDBT: A Dynamic Compilation System for Transparent Polyhedral Optimizations at Runtime. Conf. Computing Frontiers 2017: 191-200
- [c45] Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera, Takayuki Aoki:
  A Stencil Framework to Realize Large-Scale Computations Beyond Device Memory Capacity on GPU Supercomputers. CLUSTER 2017: 525-529
- [c44] Yukinori Sato, Toshio Endo:
  An Accurate Simulator of Cache-Line Conflicts to Exploit the Underlying Cache Performance. Euro-Par 2017: 119-133
- [c43] Shota Kuroda, Toshio Endo, Satoshi Matsuoka:
  Applying Temporal Blocking with a Directive-based Approach. LLVM-HPC@SC 2017: 8:1-8:11
- 2016
- [c42] Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, Takatsugu Ono:
  Evaluating the impacts of code-level performance tunings on power efficiency. IEEE BigData 2016: 362-369
- [c41] Satoshi Matsuoka, Hideharu Amano, Kengo Nakajima, Koji Inoue, Tomohiro Kudoh, Naoya Maruyama, Kenjiro Taura, Takeshi Iwashita, Takahiro Katagiri, Toshihiro Hanawa, Toshio Endo:
  From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era. Conf. Computing Frontiers 2016: 274-281
- [c40] Toshio Endo:
  Realizing Out-of-Core Stencil Computations Using Multi-tier Memory Hierarchy on GPGPU Clusters. CLUSTER 2016: 21-29
- [c39] Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui:
  Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers. ICMS 2016: 265-274
- [c38] Ryo Matsumiya, Toshio Endo:
  PGAS Communication Runtime for Extreme Large Data Computation. ESPM2@SC 2016: 10-16
- 2015
- [c37] Toshio Endo, Yuki Takasaki, Satoshi Matsuoka:
  Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers. ICPADS 2015: 625-632
- [c36] Naoto Sasaki, Kento Sato, Toshio Endo, Satoshi Matsuoka:
  Exploration of Lossy Compression for Application-Level Checkpoint/Restart. IPDPS 2015: 914-922
- [c35] Yuki Tsujita, Toshio Endo:
  Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition. JSSPP 2015: 69-82
- [c34] Yukinori Sato, Shimpei Sato, Toshio Endo:
  Exana: an execution-driven application analysis tool for assisting productive performance tuning. SEPS@SPLASH 2015: 1-10
- [c33] Shimpei Sato, Yukinori Sato, Toshio Endo:
  Investigating potential performance benefits of memory layout optimization based on roofline model. SEPS@SPLASH 2015: 50-56
- [c32] Yuki Tsujita, Toshio Endo, Katsuki Fujisawa:
  The scalable petascale data-driven approach for the Cholesky factorization with multiple GPUs. ESPM@SC 2015: 38-45
- [c31] Kazuki Tsuzuku, Toshio Endo:
  Power Capping of CPU-GPU Heterogeneous Systems using Power and Performance Models. SMARTGREENS 2015: 226-233
- 2014
- [j1] Jiayuan Meng, Toshio Endo:
  Special Issue on Applications for the Heterogeneous Computing Era. Int. J. High Perform. Comput. Appl. 28(3): 253-254 (2014)
- [c30] Toshio Endo, Guanghao Jin:
  Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations. CLUSTER 2014: 132-139
- [c29] Hiroko Midorikawa, Hideyuki Tan, Toshio Endo:
  An evaluation of the potential of flash SSD as large and slow memory for stencil computations. HPCS 2014: 268-277
- [c28] Toshio Endo, Akira Nukada, Satoshi Matsuoka:
  TSUBAME-KFC: A modern liquid submersion cooling prototype towards exascale becoming the greenest supercomputer in the world. ICPADS 2014: 360-367
- [c27] Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki:
  Petascale General Solver for Semidefinite Programming Problems with Over Two Million Constraints. IPDPS 2014: 1171-1180
- 2013
- [c26] Guanghao Jin, Toshio Endo, Satoshi Matsuoka:
  A parallel optimization method for stencil computation on the domain that is bigger than memory capacity of GPUs. CLUSTER 2013: 1-8
- [c25] Guanghao Jin, Toshio Endo, Satoshi Matsuoka:
  A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU. IPDPS Workshops 2013: 1080-1087
- 2012
- [c24] Katsuki Fujisawa, Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Makoto Yamashita, Maho Nakata:
  High-performance general solver for extremely large-scale semidefinite programming problems. SC 2012: 93
- 2011
- [c23] Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Toshio Endo, Akinori Yamanaka, Naoya Maruyama, Akira Nukada, Satoshi Matsuoka:
  Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer. SC 2011: 3:1-3:11
- [c22] Massimo Bernaschi, Mauro Bisson, Toshio Endo, Satoshi Matsuoka, Massimiliano Fatica, Simone Melchionna:
  Petaflop biofluidics simulations on a two million-core system. SC 2011: 4:1-4:12
- 2010
- [c21] Hitoshi Nagasaka, Naoya Maruyama, Akira Nukada, Toshio Endo, Satoshi Matsuoka:
  Statistical power modeling of GPU kernels using performance counters. Green Computing Conference 2010: 115-122
- [c20] Toshio Endo, Akira Nukada, Satoshi Matsuoka, Naoya Maruyama:
  Linpack evaluation on a supercomputer with heterogeneous accelerators. IPDPS 2010: 1-8
- [c19] Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka:
  An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code. SC 2010: 1-11
2000 – 2009
- 2009
- [c18] Hitoshi Sato, Satoshi Matsuoka, Toshio Endo:
  File Clustering Based Replication Algorithm in a Grid Environment. CCGRID 2009: 204-211
- [c17] Tomoaki Hamano, Toshio Endo, Satoshi Matsuoka:
  Power-aware dynamic task scheduling for heterogeneous accelerated clusters. IPDPS 2009: 1-8
- 2008
- [c16] Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka:
  Environmental-aware optimization of MPI checkpointing intervals. CLUSTER 2008: 326-329
- [c15] Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Naoya Maruyama:
  Access-pattern and bandwidth aware file replication algorithm in a grid environment. GRID 2008: 250-257
- [c14] Toshio Endo, Satoshi Matsuoka:
  Massive supercomputing coping with heterogeneity of modern accelerators. IPDPS 2008: 1-10
- [c13] Y. Hosogaya, Toshio Endo, Satoshi Matsuoka:
  Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method. IPDPS 2008: 1-8
- [c12] Yasuhiko Ogata, Toshio Endo, Naoya Maruyama, Satoshi Matsuoka:
  An efficient, model-based CPU-GPU heterogeneous FFT library. IPDPS 2008: 1-10
- [c11] Shin'ichiro Takizawa, Toshio Endo, Satoshi Matsuoka:
  Locality aware MPI communication on a commodity opto-electronic hybrid network. IPDPS 2008: 1-8
- [c10] Akira Nukada, Yasuhiko Ogata, Toshio Endo, Satoshi Matsuoka:
  Bandwidth intensive 3-D FFT kernel for GPUs using CUDA. SC 2008: 5
- 2007
- [c9] Tatsuhiro Chiba, Toshio Endo, Satoshi Matsuoka:
  High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs. CCGRID 2007: 487-494
- [c8] Hideyuki Jitsumoto, Toshio Endo, Satoshi Matsuoka:
  ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs. IPDPS 2007: 1-8
- 2005
- [c7] Toshio Endo, Kenjiro Taura:
  Highly latency tolerant Gaussian elimination. GRID 2005: 91-98
- 2004
- [c6] Toshio Endo, Kenji Kaneda, Kenjiro Taura, Akinori Yonezawa:
  High performance LU factorization for non-dedicated clusters. CCGRID 2004: 678-685
- 2003
- [c5] Kenjiro Taura, Kenji Kaneda, Toshio Endo, Akinori Yonezawa:
  Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources. PPoPP 2003: 216-229
- 2002
- [c4] Toshio Endo, Kenjiro Taura:
  Reducing pause time of conservative collectors. MSP/ISMM 2002: 119-131
- 2001
- [c3] Toshio Endo, Kenjiro Taura, Akinori Yonezawa:
  Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors. IPDPS 2001: 43
1990 – 1999
- 1998
- [c2] Kunihito Kato, Toshio Endo, Kazuhito Murakami, Takashi Toriu, Hiroyasu Koshimizu:
  On a High-Speed Hough Transform Algorithm MRHT. MVA 1998: 69-72
- 1997
- [c1] Toshio Endo, Kenjiro Taura, Akinori Yonezawa:
  A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. SC 1997: 48
last updated on 2024-11-21 21:23 CET by the dblp team
all metadata released as open data under CC0 1.0 license