Collection of random notes of mine while ramping up on the domain of SAT solvers.

SAT solvers

DNF - Disjunctive Normal Form. Eg: (x AND y) OR (y AND z)
NNF - Negative Normal Form. All negations are only in front of variables
CNF - Conjunctive Normal Form. Eg: (x OR y) AND (y OR z)
CNF formula is a set of clauses (conjunctive) which each clause being a set of literals (disjunctive)
- empty formula represents a true formula
- empty clause represents a false clause
- unit clause is a clause with just 1 literal
- variable is referred by its id
- positive id means the variable and negative means its negation
SAT is NP-complete but in practice (aka real world problems) we can do much better
Serial SAT algos
- complete algos (aka systematic solvers)
  - Typically based on backtracking
  - eg: DPLL, GRASP
  - computationally expensive
  - will return all satisfying assignments if they exist
  - else will give proof for unsatisfiability
- incomplete algos
  - eg: randomized algos, local search based, GSAT, WalkSAT
  - may or may not produce SAT/UNSAT
  - these local search strategies do often reach global minima!
Parallel algos
- partioning based
- competition based
DPLL
- complete solver based on backtracking
- most of the complete solvers are all mostly based on this algo
- if it is a unit clause then assign it to true
- repeat until no more unit clauses
- if a conflict is found, backtrack and assign a different value to that literal
GRASP
- improvement over DPLL with learning clauses to reduce the amount of backtracking
- uses CDCL - Conflict Driven Clause Learning
- helps narrows search space with DPLL
k-CNF = formulat with every clause having exactly 'k' literals
2-CNF formulae can be solved in poly-time, whereas 3-CNF is already NP-complete
to reduce memory usage most solvers nowadays are iterative rather than recursive
GSAT
- randomized local search technique
- selects assignments which most reduces the number of remaining clauses
- can also move sideways (aka plateaus) which doesn't reduce this at all!
WalkSAT
- interleaves greedy moves of GSAT with random walks of a Metropolis search
m = #of clauses, n = #of variables, for k-CNF problems, if clauses are generated randomly, then the hardness of these problems (aka number of DP calls) follows an easy-hard-easy pattern as a function of $$\frac{m}{n}$$. But when we don't consider fixed length clauses then there are no such clean indication of hardness!
the above was the main reason which motivated local search methods for SAT solvers
SP - Survey Propagation. Use of probabilistic reasoning for solving combinatorial search problems. But SP is currently limited only for random SAT instances!
Max-SAT
- Maximum satisfiability problem
- return the maximum number of clauses that can be satisfied by the assignments
- it is a generalization of the SAT problem
Nice (4hrs!) talk on implementing a SAT solver from scratch
- video: https://www.youtube.com/watch?v=II2RhzwYszQ
- repo: https://github.com/arminbiere/satch (It is a from-scratch serial solver based on CDCL with restarting mechanism)
- initial hour or so talks about the theory/background introducing popular solvers (eg: DPLL, CDCL)
- Stopped at 1:05:15
The Armin Biere's talk on SAT solving: https://www.youtube.com/watch?v=Emhg0uZnbNg
- Knuth himself acknowledged SAT solving to be the "killer app"!
- AIG - And-Inverter Gates. Logical gates built using only and and nor gates
- IPASIR model showing different states of a SAT solver: (UNKNOWN, SAT, UNSAT, SOLVING), for interactive solving of formulae
- In one of his videos, the CTO of Amazon spends 20mins talking about SMT solvers!
- SMT solvers completely rely on CDCL technique
- after constructing impllication graph, typically we would like to learn the clause out of its first UIP (Unique Implication Point) and then to recursive clause minimization
- in his satch codebase this recursive clause minimization code when expressed in recursive fashion is running faster than the iterative version!
Peek inside SAT solvers: https://www.youtube.com/watch?v=d76e4hV1iJY
- introduction to the SAT problem and CNF notation
- typical flow: high-level problem -> encode into SAT using a P-time algo -> use SAT solver
- introduces the DFS based search space to solving this brute-force
- but also notes that we are free to choose assignment as well as literal order in this DFS!
- shows the benefit of simplifying the clause evaluation through unit propagation
- thereby deriving DPLL (DFS + backtrack + unit-prop)
- zChaff solver
  - VSIDS - ranking the literals based on their occurence count and then using this to assign values in order to reach solution faster
  - two watched literals - have an index of pair of literals to reach its clauses faster, especially to be notified when we get unit-prop

CDCL - GRASP solver
- when conflicts arise, create an implication graph, separate the literals from the conflicts and then derive the clause that can help us not revisit this conflict during future traversals
- then depending on the clauses we could also backjump multiple levels!
- however one needs to consider the following while learning such clauses:
  - we simply can't keep learning all the clauses (space issues)
  - need to maintain these clauses in a clever manner
  - need to minimize these clauses using a technique called "resolution"
  - need to save the last assignment to variables called "phase saving"
restarts
- we are not clearing the learned clauses, but only restarting the DFS
- can help us get unstuck at times

SAT Benchmarks

Large SAT problems in CNF format: http://www.miroslav-velev.com/sat_benchmarks.html
low-to-medium sized SAT problems in DIMACS CNF format: https://www.cs.ubc.ca/~hoos/SATLIB/benchm.html

GPU SAT solver implementations

https://github.com/QuentinFiard/cuda-sat-solver - But doesn't seem to be calling any cuda kernels at all!?
https://github.com/nicolasprevot/GpuShareSat