Papers on ML4CO

Gemini Light
wtfly2018@163.com

Last Update: 2021/08/20 | Current Version: v0.2 (P)

Preface

This is a brief survey of machine learning for combinatorial optimization, a fascinating and significant topic. We mainly collect relevant papers from top conferences, classify them into several categories by method type, and note additional information for each entry.

Omissions and errors are inevitable given the limits of the author's knowledge. Feedback on anything you find is welcome.

To-Do List

Combinatorial Optimization Problems

Categories of Machine Learning

Components of Neural Networks

Contents

Survey
Analysis
End-to-end
  RNN/Attention-based
  GNN-based
  Other
Local Search
B&B-based

Survey

  1. On Learning and Branching: a Survey

    • Publication: TOP 2017
    • Keyword: Branch
    • Link: paper
  2. Boosting Combinatorial Problem Modeling with Machine Learning

    • Publication: IJCAI 2018
    • Keyword: ML
    • Link: arXiv
  3. A Review of Combinatorial Optimization with Graph Neural Networks

    • Publication: BigDIA 2019
    • Keyword: GNN
    • Link: paper
  4. Learning Graph Matching and Related Combinatorial Optimization Problems

    • Publication: IJCAI 2020
    • Keyword: GNN, Graph Matching
    • Link: paper
  5. Learning Combinatorial Optimization on Graphs: A Survey with Applications to Networking

    • Publication: IEEE ACCESS 2020
    • Keyword: GNN, Computer Network
    • Link: paper
  6. A Survey on Reinforcement Learning for Combinatorial Optimization

    • Publication: arXiv 2020
    • Keyword: RL
    • Link: arXiv
  7. Reinforcement Learning for Combinatorial Optimization: A Survey

    • Publication: arXiv 2020
    • Keyword: RL
    • Link: arXiv
  8. Graph Learning for Combinatorial Optimization: A Survey of State-of-the-Art

    • Publication: Data Science and Engineering 2021
    • Keyword: GNN
    • Link: arXiv
  9. Combinatorial optimization and reasoning with graph neural networks

    • Publication: arXiv 2021
    • Keyword: GNN
    • Link: arXiv

Analysis

  1. Learning to Branch

    • Publication: ICML 2018
    • Keyword: Branch
    • Link: arXiv
  2. Approximation Ratios of Graph Neural Networks for Combinatorial Problems

    • Publication: NeurIPS 2019
    • Keyword: GNN
    • Link: arXiv
  3. On Learning Paradigms for the Travelling Salesman Problem

    • Publication: NeurIPS 2019 (Workshop)
    • Keyword: RL vs SL
    • Link: arXiv
  4. On the Difficulty of Generalizing Reinforcement Learning Framework for Combinatorial Optimization

    • Publication: ICML 2021 (Workshop)
    • Keyword: GNN
    • Link: arXiv

End-to-end

RNN/Attention-based

  1. Pointer Networks

    • Publication: NeurIPS 2015
    • CO-problem: Finding planar convex hulls, computing Delaunay triangulations, TSP
    • ML-type: SL
    • Component: LSTM, Seq2Seq, Attention
    • Innovation: Ptr-Net not only improves over sequence-to-sequence models with input attention, but also generalizes to variable-size output dictionaries. (A minimal sketch of this pointer attention follows this list.)
    • Link: arXiv
  2. Neural Combinatorial Optimization with Reinforcement Learning

    • Publication: ICLR 2017
    • CO-problem: TSP
    • ML-type: RL (Actor-critic)
    • Component: LSTM, Seq2Seq, Attention
    • Innovation: This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning.
    • Link: arXiv
  3. Multi-Decoder Attention Model with Embedding Glimpse for Solving Vehicle Routing Problems

    • Publication: AAAI 2021
    • CO-problem: VRP, TSP
    • ML-type: RL (REINFORCE)
    • Component: Self-Attention
    • Innovation: This paper proposes a Multi-Decoder Attention Model (MDAM) to train multiple diverse policies, which effectively increases the chance of finding good solutions compared with existing methods that train only one policy. A customized beam search strategy is designed to fully exploit the diversity of MDAM.
    • Link: arXiv
  4. Learning Improvement Heuristics for Solving Routing Problems

    • Publication: TNNLS 2021
    • CO-problem: TSP, VRP
    • ML-type: RL (n-step Actor-Critic)
    • Component: Self-Attention
    • Innovation: This paper proposes a self-attention-based deep architecture as the policy network to guide the selection of the next solution.
    • Link: arXiv
  5. The Transformer Network for the Traveling Salesman Problem

    • Publication: arXiv 2021
    • CO-problem: TSP
    • ML-type: RL
    • Component: Transformer
    • Innovation: This paper proposes to adapt the recent successful Transformer architecture originally developed for natural language processing to the combinatorial TSP.
    • Link: arXiv
  6. Matrix Encoding Networks for Neural Combinatorial Optimization

    • Publication: arXiv 2021
    • CO-problem: ATSP, FFSP
    • ML-type: RL (REINFORCE)
    • Component: Attention
    • Innovation: MatNet is capable of encoding the matrix-style relationship data found in many CO problems.
    • Link: arXiv
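
A mechanism shared by the constructive models above is a decoder that scores every input position at each step and picks the next node from the resulting distribution. Below is a minimal sketch of the additive pointer attention used by Pointer Networks, with a mask over already-visited nodes; it is an illustrative reconstruction (the module and tensor names are mine), not code from any of the papers above.

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    """Additive attention that scores each input position,
    as in Pointer Networks: u_j = v^T tanh(W1 e_j + W2 d)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, enc, dec, visited):
        # enc: (batch, n, hidden) encoder states, one per input node
        # dec: (batch, hidden) current decoder state
        # visited: (batch, n) bool mask, True for nodes already chosen
        scores = self.v(torch.tanh(self.w_enc(enc) + self.w_dec(dec).unsqueeze(1))).squeeze(-1)
        scores = scores.masked_fill(visited, float("-inf"))  # never revisit a node
        return torch.softmax(scores, dim=-1)  # distribution over input positions

if __name__ == "__main__":
    att = PointerAttention(hidden_dim=16)
    enc, dec = torch.randn(2, 5, 16), torch.randn(2, 16)
    visited = torch.zeros(2, 5, dtype=torch.bool)
    visited[:, 0] = True  # pretend node 0 was already chosen
    print(att(enc, dec, visited))  # rows sum to 1; node 0 gets probability 0
```

The entries above differ mainly in what surrounds this step: supervised training against ground-truth tours (Ptr-Net) versus REINFORCE or actor-critic gradients on tour length (most of the rest).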

GNN-based

  1. Learning Combinatorial Optimization Algorithms over Graphs

    • Publication: NeurIPS 2017
    • CO-problem: MVC, MAXCUT, TSP
    • ML-type: RL (Q-learning)
    • Component: GNN (structure2vec, S2V)
    • Innovation: This paper proposes a unique combination of reinforcement learning and graph embedding. The learned greedy policy behaves like a meta-algorithm that incrementally constructs a solution, with each action determined by the output of a graph embedding network capturing the current state of the solution. (See the toy construction loop at the end of this list.)
    • Link: arXiv
  2. Reinforcement Learning for Solving the Vehicle Routing Problem

    • Publication: NeurIPS 2018
    • CO-problem: VRP
    • ML-type: RL
    • Component: GNN (GCN)
    • Innovation: This paper presents an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning.
    • Link: arXiv
  3. Attention, Learn to Solve Routing Problems!

    • Publication: ICLR 2019
    • CO-problem: TSP, VRP
    • ML-type: RL (REINFORCE+rollout baseline)
    • Component: Attention, GNN
    • Innovation: This paper proposes a model based on attention layers with benefits over the Pointer Network, and shows how to train it using REINFORCE with a simple baseline based on a deterministic greedy rollout, which is more efficient than using a value function.
    • Link: arXiv
  4. Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP

    • Publication: AAAI 2019
    • CO-problem: TSP
    • ML-type: SL
    • Component: RNN, GNN
    • Innovation: This paper proposes a GNN model where edges (embedded with their weights) communicate with vertices.
    • Link: arXiv
  5. Learning a SAT Solver from Single-Bit Supervision

    • Publication: ICLR 2019
    • CO-problem: SAT
    • ML-type: SL
    • Component: GNN, LSTM
    • Innovation: NeuroSAT enforces both permutation invariance and negation invariance.
    • Link: arXiv
  6. End to end learning and optimization on graphs

    • Publication: NeurIPS 2019
    • CO-problem: MAXCUT
    • ML-type: UL
    • Component: GNN
    • Innovation: This paper proposes CLUSTERNET, a new approach to this decision-focused learning problem: include a differentiable solver for a simple proxy of the true, difficult optimization problem, and learn a representation that maps the difficult problem onto the simpler one (relax + differentiate).
    • Link: arXiv
  7. Efficiently Solving the Practical Vehicle Routing Problem: A Novel Joint Learning Approach

    • Publication: KDD 2020
    • CO-problem: VRP
    • ML-type: SL, RL (REINFORCE+rollout baseline)
    • Component: GCN
    • Innovation: The GCN-NPEC model is based on the graph convolutional network (GCN), taking node features (coordinates and demand) and edge features (the real distances between nodes) as input. Separate decoders decode the representations of the two features, and the output of each decoder supervises the other. The paper also proposes a training strategy that combines reinforcement learning with supervised learning.
    • Link: paper
  8. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning

    • Publication: NeurIPS 2020
    • CO-problem: JSSP
    • ML-type: RL (PPO)
    • Component: GNN
    • Innovation: This paper exploits the disjunctive graph representation of JSSP and proposes a Graph Neural Network based scheme to embed the states encountered during solving.
    • Link: arXiv
  9. Learning to Solve Combinatorial Optimization Problems on Real-World Graphs in Linear Time

    • Publication: arXiv 2020
    • CO-problem: TSP, VRP
    • ML-type: RL
    • Component: GNN
    • Innovation: This paper develops a framework to solve any combinatorial optimization problem over graphs that can be formulated as a single-player game defined by states, actions, and rewards, including minimum spanning tree, shortest paths, the traveling salesman problem, and the vehicle routing problem, without expert knowledge.
    • Link: arXiv
  10. Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning

    • Publication: AAAI 2020 (Workshop)
    • CO-problem: TSP
    • ML-type: Hierarchical RL
    • Component: GNN
    • Innovation: GPNs build upon Pointer Networks by introducing a graph embedding layer on the input, which captures relationships between nodes. Furthermore, to approximate solutions to constrained problems such as the TSP with time windows, hierarchical GPNs (HGPNs) are trained with RL to learn a hierarchical policy that finds an optimal city permutation under constraints.
    • Link: arXiv
  11. A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

    • Publication: arXiv 2021
    • CO-problem: Directed Acyclic Graph scheduling, Graph Edit Distance, Hamiltonian Cycle Problem
    • ML-type: RL (PPO)
    • Component: GNN, ResNet, Attention
    • Innovation: This paper develops a bi-level framework: an upper-level learning method optimizes the graph (e.g., adding, deleting, or modifying edges), fused with a lower-level heuristic algorithm that solves on the optimized graph. This bi-level approach simplifies learning on the original hard CO problem and effectively mitigates the demand for model capacity.
    • Link: arXiv
  12. SOLO: Search Online, Learn Offline for Combinatorial Optimization Problems

    • Publication: arXiv 2021
    • CO-problem: VRP, JSSP
    • ML-type: RL (DQN, MCTS)
    • Component: GNN
    • Innovation: Learn offline (a DQN is trained from collected experience) + search online (the learned network guides MCTS at decision time).
    • Link: arXiv
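
To ground the construction-style approaches above (S2V-DQN in particular), here is a toy version of the embed-score-extend loop on minimum vertex cover. The weights are random stand-ins for Q-learned parameters and the embed function is only a cartoon of structure2vec; all names and sizes are illustrative, not taken from any paper above.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(adj, x, w1, w2, rounds=3):
    """Toy structure2vec-style embedding: each round mixes a node's own
    feature with the sum of its neighbours' current embeddings."""
    mu = np.zeros((adj.shape[0], w2.shape[1]))
    for _ in range(rounds):
        mu = np.maximum(0.0, x @ w1 + adj @ mu @ w2)  # ReLU(x W1 + A mu W2)
    return mu

def greedy_cover(adj, w1, w2, q):
    """Greedily build a vertex cover: re-embed the graph after every step
    and add the highest-scoring node, as in S2V-DQN's greedy policy."""
    n = adj.shape[0]
    covered = np.zeros((n, 1))  # per-node flag: already in the solution?
    solution = []
    while (adj * ((covered + covered.T) == 0)).any():  # some edge uncovered
        scores = (embed(adj, covered, w1, w2) @ q).ravel()
        if solution:
            scores[solution] = -np.inf  # never pick a node twice
        best = int(np.argmax(scores))
        solution.append(best)
        covered[best] = 1.0
    return solution

if __name__ == "__main__":
    n, d = 8, 4
    adj = (rng.random((n, n)) < 0.3).astype(float)
    adj = np.triu(adj, 1)
    adj = adj + adj.T  # random undirected graph
    w1, w2, q = rng.normal(size=(1, d)), rng.normal(size=(d, d)), rng.normal(size=(d, 1))
    print("greedy cover:", greedy_cover(adj, w1, w2, q))
```

In the actual method the parameters are trained with n-step Q-learning so that a node's score estimates the marginal value of adding it to the partial solution.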

Other

  1. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning

    • Publication: NeurIPS 2020
    • CO-problem: VRP
    • ML-type: RL
    • Component: /
    • Innovation: Policy evaluation with mixed-integer optimization. At the policy evaluation step, they formulate the action selection problem from each state as a mixed-integer program, in which they combine the combinatorial structure of the action space with the neural architecture of the value function by adapting the branch-and-cut approach.
    • Link: arXiv

Local Search

  1. Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search

    • Publication: NeurIPS 2018
    • CO-problem: MIS, MVC, MC, SAT
    • ML-type: SL
    • Component: GNN (GCN)
    • Innovation: This paper proposes a model whose central component is a graph convolutional network that is trained to estimate the likelihood, for each vertex in a graph, of whether this vertex is part of the optimal solution. The network is designed and trained to synthesize a diverse set of solutions, which enables rapid exploration of the solution space via tree search.
    • Link: arXiv
  2. Learning to Perform Local Rewriting for Combinatorial Optimization

    • Publication: NeurIPS 2019
    • CO-problem: JSSP, VRP
    • ML-type: RL (Actor-Critic)
    • Component: LSTM
    • Innovation: NeuRewriter learns a policy to pick heuristics and rewrite local components of the current solution, iteratively improving it until convergence. The policy factorizes into a region-picking component and a rule-picking component, each parameterized by a neural network trained with actor-critic RL. (A toy improvement loop in this spirit follows this list.)
    • Link: arXiv
  3. A Learning-based Iterative Method for Solving Vehicle Routing Problems

    • Publication: ICLR 2020

    • CO-problem: VRP

    • ML-type: RL (REINFORCE)

    • Component: /

    • Innovation: “Learn to Improve” (L2I)

      • hierarchical framework: separate heuristic operators into two classes, namely improvement operators and perturbation operators. At each state, the class is chosen first, then an operator within that class. Focusing RL on the improvement operators only makes learning from the current solution easier.
      • ensemble method: trains several RL policies at the same time, but with different state input features.
    • Link: paper


  4. Exploratory Combinatorial Optimization with Reinforcement Learning

    • Publication: AAAI 2020
    • CO-problem: MAXCUT
    • ML-type: RL (DQN)
    • Component: GNN (MPNN)
    • Innovation: ECO-DQN combines S2V-DQN, Reversible Actions, Observation Tuning and Intermediate Rewards.
    • Link: arXiv
  5. Rewriting By Generating: Learn To Solve Large-Scale Vehicle Routing Problems

    • Publication: ICLR 2021
    • CO-problem: VRP
    • ML-type: Hierarchical RL (REINFORCE)
    • Component: LSTM, K-means, PCA
    • Innovation: Inspired by the classical idea of Divide-and-Conquer, this paper presents a novel Rewriting-by-Generating (RBG) framework with hierarchical RL agents to solve large-scale VRPs. RBG consists of a rewriter agent that refines the customer division globally and an elementary generator that infers regional solutions locally.
    • Link: paper

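The improvement-style methods in this section (NeuRewriter, L2I, ECO-DQN, RBG) all wrap a learned move-picking policy around a loop of local rewrites. The sketch below shows that loop for TSP with 2-opt as the only rewrite rule and a uniform-random picker standing in for the trained policy; the function names and the greedy acceptance rule are my own simplifications.

```python
import math
import random

def tour_length(tour, dist):
    """Total length of the closed tour under the distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, i, k):
    """The classic 2-opt rewrite: reverse the segment tour[i:k+1]."""
    return tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:]

def improve(tour, dist, steps=500, seed=0):
    """Iterative improvement: a move-picking policy proposes a local rewrite,
    and improving rewrites are accepted. The papers above replace the random
    picker with a neural policy trained by RL on the achieved improvement."""
    rng = random.Random(seed)
    cur, cur_len = tour, tour_length(tour, dist)
    for _ in range(steps):
        i, k = sorted(rng.sample(range(len(cur)), 2))
        cand = two_opt(cur, i, k)
        cand_len = tour_length(cand, dist)
        if cand_len < cur_len:  # greedy acceptance of improving moves
            cur, cur_len = cand, cand_len
    return cur, cur_len

if __name__ == "__main__":
    random.seed(42)
    pts = [(random.random(), random.random()) for _ in range(12)]
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    tour = list(range(len(pts)))
    print("initial length:", round(tour_length(tour, dist), 3))
    _, best_len = improve(tour, dist)
    print("improved length:", round(best_len, 3))
```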

B&B-based

  1. Learning to Search in Branch and Bound Algorithms

    • Publication: NeurIPS 2014
    • CO-problem: MILP
    • ML-type: RL (Imitation learning)
    • Component: /
    • Innovation: This paper proposes an approach that learns branch-and-bound search via imitation learning (a node selection policy and a node pruning policy).
    • Link: paper
  2. Learning to Branch in Mixed Integer Programming

    • Publication: AAAI 2016
    • CO-problem: MIP
    • ML-type: RL (Imitation Learning)
    • Component: /
    • Innovation: This paper proposes the first successful ML framework for variable selection in MIP. (A toy imitation-learning sketch of variable selection follows this list.)
    • Link: paper
  3. Learning to Run Heuristics in Tree Search

    • Publication: IJCAI 2017
    • CO-problem: MIP
    • ML-type: SL
    • Component: classifier
    • Innovation: Central to this approach is the use of machine learning (ML) to predict whether a heuristic will succeed at a given node.
    • Link: arXiv
  4. Improving Optimization Bounds using Machine Learning: Decision Diagrams meet Deep Reinforcement Learning

    • Publication: AAAI 2019
    • CO-problem: MIS, MAXCUT
    • ML-type: RL (Q-learning)
    • Component: /
    • Innovation: This paper proposes an innovative and generic approach based on deep reinforcement learning for obtaining an ordering for tightening the bounds obtained with relaxed and restricted DDs (decision diagrams).
    • Link: arXiv
  5. Exact Combinatorial Optimization with Graph Convolutional Neural Networks

    • Publication: NeurIPS 2019
    • CO-problem: MILP
    • ML-type: RL (Imitation learning)
    • Component: GNN (GCN)
    • Innovation: This paper proposes a new graph convolutional neural network model for learning branch-and-bound variable selection policies, which leverages the natural variable-constraint bipartite graph representation of mixed-integer linear programs.
    • Link: arXiv
  6. Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization

    • Publication: AAAI 2021
    • CO-problem: TSP, Portfolio Optimization Problem
    • ML-type: RL (DQN, PPO)
    • Component: GNN (GAT), Set Transformer
    • Innovation: This paper proposes a general hybrid approach, based on DRL and CP (Constraint Programming), for solving combinatorial optimization problems. The core of the approach is a dynamic programming formulation that acts as a bridge between the two techniques.
    • Link: arXiv
  7. Solving Mixed Integer Programs Using Neural Networks

    • Publication: arXiv 2021

    • CO-problem: MIP

    • ML-type: RL (Imitation Learning)

    • Component: GNN (Bipartite graph)

    • Innovation: Neural Diving + Neural Branching

      • Neural Diving learns a deep neural network to generate multiple partial assignments of the integer variables; the resulting smaller MIPs over the unassigned variables are solved with SCIP to construct high-quality joint assignments.
      • Neural Branching learns a deep neural network to make variable selection decisions in branch-and-bound to bound the objective value gap with a small tree.
    • Link: arXiv

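A recurring pattern in this section is imitation learning for branching: an expensive expert (e.g., full strong branching) labels which variable to branch on at sampled B&B nodes, and a cheap model is trained to reproduce that choice. The sketch below illustrates the idea on synthetic data; expert_scores, the feature shapes, and the hidden linear "expert" are hypothetical stand-ins, where real systems use solver-derived features and richer models such as the GCN of entry 5.

```python
import numpy as np

rng = np.random.default_rng(1)

def expert_scores(features):
    """Stand-in for an expensive expert such as full strong branching:
    a fixed hidden linear rule that the learner must imitate."""
    secret = np.array([2.0, -1.0, 0.5])
    return features @ secret

def collect_dataset(n_nodes=500, n_vars=10, n_feats=3):
    """At each (synthetic) B&B node, record the candidate-variable features
    and which variable the expert would branch on."""
    X = rng.normal(size=(n_nodes, n_vars, n_feats))
    y = expert_scores(X).argmax(axis=1)  # expert's branching choice per node
    return X, y

def train_policy(X, y, lr=0.5, epochs=300):
    """Softmax regression over the candidate variables at each node, so the
    learned weights rank variables the way the expert does."""
    n_nodes, n_vars, n_feats = X.shape
    w = np.zeros(n_feats)
    onehot = np.eye(n_vars)[y]
    for _ in range(epochs):
        logits = X @ w  # (n_nodes, n_vars): one score per candidate variable
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = ((p - onehot)[:, :, None] * X).sum(axis=1).mean(axis=0)
        w -= lr * grad  # cross-entropy gradient step
    return w

if __name__ == "__main__":
    X, y = collect_dataset()
    w = train_policy(X, y)
    agreement = ((X @ w).argmax(axis=1) == y).mean()
    print("agreement with expert:", agreement)  # far above the 0.1 chance level
```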

Appendix

Template


    • Publication:
    • CO-problem:
    • ML-type:
    • Component:
    • Innovation:
    • Link: arXiv -

    • Publication:
    • CO-problem:
    • Link: arXiv

To Be Classified

  1. Reinforcement Learning for (Mixed) Integer Programming: Smart Feasibility Pump

    • Publication: ICML 2021 (Workshop)
    • CO-problem: MIP
    • Link: arXiv
  2. Learning Local Search Heuristics for Boolean Satisfiability

    • Publication: NeurIPS 2019
    • CO-problem: SAT
    • Link: paper
  3. Reinforcement Learning on Job Shop Scheduling Problems Using Graph Networks

    • Publication: arXiv 2020
    • CO-problem: JSSP
    • Link: arXiv
  4. A Generalized Reinforcement Learning Algorithm for Online 3D Bin-Packing

    • Publication: AAAI 2020 (Workshop)
    • CO-problem: 3D-BPP
    • Link: arXiv
  5. A Reinforcement Learning Approach to Job-shop Scheduling

    • Publication: IJCAI 1995
    • CO-problem: JSSP
    • Link: paper
  6. A Reinforcement Learning Environment For Job-Shop Scheduling

    • Publication: arXiv 2021
    • CO-problem: JSSP
    • Link: arXiv
  7. Online 3D Bin Packing with Constrained Deep Reinforcement Learning

    • Publication: AAAI 2021
    • CO-problem: 3D-BPP
    • Link: arXiv
  8. A Data-Driven Approach for Multi-level Packing Problems in Manufacturing Industry

    • Publication: KDD 2019
    • CO-problem: 3D-BPP
    • Link: ACM DL
  9. Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method

    • Publication: arXiv 2017
    • CO-problem: 3D-BPP
    • Link: arXiv
  10. Meta-Learning-based Deep Reinforcement Learning for Multiobjective Optimization Problems

    • Publication: arXiv 2021
    • CO-problem: TSP
    • Link: arXiv
  11. Dynamic Job-Shop Scheduling Using Reinforcement Learning Agents

    • Publication: ROBOT AUTON SYST 2000
    • CO-problem: JSSP
    • Link: paper