Konstantin Mishchenko
2020 – today
2024
- [j5] Nikita Doikov, Konstantin Mishchenko, Yurii E. Nesterov: Super-Universal Regularized Newton Method. SIAM J. Optim. 34(1): 27-56 (2024)
- [c17] Konstantin Mishchenko, Aaron Defazio: Prodigy: An Expeditiously Adaptive Parameter-Free Learner. ICML 2024
- [i30] Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky: The Road Less Scheduled. CoRR abs/2405.15682 (2024)
- [i29] Hao Mark Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan: Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference. CoRR abs/2405.18628 (2024)

2023
- [j4] Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik, Sebastian U. Stich: Stochastic distributed learning with gradient quantization and double-variance reduction. Optim. Methods Softw. 38(1): 91-106 (2023)
- [j3] Konstantin Mishchenko: Regularized Newton Method with Global O(1/k²) Convergence. SIAM J. Optim. 33(3): 1440-1462 (2023)
- [c16] Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik: Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization. DistributedML@CoNEXT 2023: 85-104
- [c15] Aaron Defazio, Konstantin Mishchenko: Learning-Rate-Free Learning by D-Adaptation. ICML 2023: 7449-7479
- [c14] Blake E. Woodworth, Konstantin Mishchenko, Francis R. Bach: Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy. ICML 2023: 37273-37292
- [c13] Ahmed Khaled, Konstantin Mishchenko, Chi Jin: DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method. NeurIPS 2023
- [i28] Konstantin Mishchenko, Slavomír Hanzely, Peter Richtárik: Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes. CoRR abs/2301.06806 (2023)
- [i27] Aaron Defazio, Konstantin Mishchenko: Learning-Rate-Free Learning by D-Adaptation. CoRR abs/2301.07733 (2023)
- [i26] Blake E. Woodworth, Konstantin Mishchenko, Francis R. Bach: Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy. CoRR abs/2302.03542 (2023)
- [i25] Ahmed Khaled, Konstantin Mishchenko, Chi Jin: DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method. CoRR abs/2305.16284 (2023)
- [i24] Konstantin Mishchenko, Rustem Islamov, Eduard Gorbunov, Samuel Horváth: Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity. CoRR abs/2305.18285 (2023)
- [i23] Konstantin Mishchenko, Aaron Defazio: Prodigy: An Expeditiously Adaptive Parameter-Free Learner. CoRR abs/2306.06101 (2023)
- [i22] Yura Malitsky, Konstantin Mishchenko: Adaptive Proximal Gradient Method for Convex Optimization. CoRR abs/2308.02261 (2023)
- [i21] Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko: When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement. CoRR abs/2310.07831 (2023)

2022
- [j2] Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik: Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms. J. Optim. Theory Appl. 195(1): 102-130 (2022)
- [c12] Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik: IntSGD: Adaptive Floatless Compression of Stochastic Gradients. ICLR 2022
- [c11] Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik: Proximal and Federated Random Reshuffling. ICML 2022: 15718-15749
- [c10] Konstantin Mishchenko, Grigory Malinovsky, Sebastian U. Stich, Peter Richtárik: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! ICML 2022: 15750-15769
- [c9] Konstantin Mishchenko, Francis R. Bach, Mathieu Even, Blake E. Woodworth: Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays. NeurIPS 2022
- [i20] Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik: Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization. CoRR abs/2201.11066 (2022)
- [i19] Konstantin Mishchenko, Grigory Malinovsky, Sebastian U. Stich, Peter Richtárik: ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! CoRR abs/2202.09357 (2022)
- [i18] Konstantin Mishchenko, Francis R. Bach, Mathieu Even, Blake E. Woodworth: Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays. CoRR abs/2206.07638 (2022)
- [i17] Samuel Horváth, Konstantin Mishchenko, Peter Richtárik: Adaptive Learning Rates for Faster Stochastic Gradient Methods. CoRR abs/2208.05287 (2022)
- [i16] Nikita Doikov, Konstantin Mishchenko, Yurii E. Nesterov: Super-Universal Regularized Newton Method. CoRR abs/2208.05888 (2022)

2021
- [i15] Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik: Proximal and Federated Random Reshuffling. CoRR abs/2102.06704 (2021)
- [i14] Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik: IntSGD: Floatless Compression of Stochastic Gradients. CoRR abs/2102.08374 (2021)
- [i13] Konstantin Mishchenko: Regularized Newton Method with Global O(1/k²) Convergence. CoRR abs/2112.02089 (2021)

2020
- [j1] Konstantin Mishchenko, Franck Iutzeler, Jérôme Malick: A Distributed Flexible Delay-Tolerant Proximal Gradient Algorithm. SIAM J. Optim. 30(1): 933-959 (2020)
- [c8] Saeed Soori, Konstantin Mishchenko, Aryan Mokhtari, Maryam Mehri Dehnavi, Mert Gürbüzbalaban: DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate. AISTATS 2020: 1965-1976
- [c7] Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik: Tighter Theory for Local SGD on Identical and Heterogeneous Data. AISTATS 2020: 4519-4529
- [c6] Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky: Revisiting Stochastic Extragradient. AISTATS 2020: 4573-4582
- [c5] Yura Malitsky, Konstantin Mishchenko: Adaptive Gradient Descent without Descent. ICML 2020: 6702-6712
- [c4] Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik: Random Reshuffling: Simple Analysis with Vast Improvements. NeurIPS 2020
- [c3] Konstantin Mishchenko, Filip Hanzely, Peter Richtárik: 99% of Worker-Master Communication in Distributed Optimization Is Not Needed. UAI 2020: 979-988
- [i12] Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik: Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms. CoRR abs/2004.02635 (2020)
- [i11] Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik: Random Reshuffling: Simple Analysis with Vast Improvements. CoRR abs/2006.05988 (2020)
2010 – 2019
2019
- [i10] Konstantin Mishchenko, Eduard Gorbunov, Martin Takác, Peter Richtárik: Distributed Learning with Compressed Gradient Differences. CoRR abs/1901.09269 (2019)
- [i9] Konstantin Mishchenko, Filip Hanzely, Peter Richtárik: 99% of Parallel Optimization is Inevitably a Waste of Time. CoRR abs/1901.09437 (2019)
- [i8] Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky: Revisiting Stochastic Extragradient. CoRR abs/1905.11373 (2019)
- [i7] Konstantin Mishchenko, Mallory Montgomery, Federico Vaggi: A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls. CoRR abs/1906.10586 (2019)
- [i6] Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik: First Analysis of Local GD on Heterogeneous Data. CoRR abs/1909.04715 (2019)
- [i5] Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik: Better Communication Complexity for Local SGD. CoRR abs/1909.04746 (2019)
- [i4] Konstantin Mishchenko: Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent. CoRR abs/1909.06918 (2019)
- [i3] Yura Malitsky, Konstantin Mishchenko: Adaptive gradient descent without descent. CoRR abs/1910.09529 (2019)
- [i2] Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik: Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates. CoRR abs/1912.01597 (2019)

2018
- [c2] Konstantin Mishchenko, Franck Iutzeler, Jérôme Malick, Massih-Reza Amini: A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning. ICML 2018: 3584-3592
- [c1] Filip Hanzely, Konstantin Mishchenko, Peter Richtárik: SEGA: Variance Reduction via Gradient Sketching. NeurIPS 2018: 2086-2097
- [i1] Filip Hanzely, Konstantin Mishchenko, Peter Richtárik: SEGA: Variance Reduction via Gradient Sketching. CoRR abs/1809.03054 (2018)