Playing with ChatGPT

Me: Write a parallel program using C and OpenMP to search an array of integers for a given integer.

ChatGPT: Here is an example of a parallel program in C that uses OpenMP to search an array of integers for a given integer:

This program uses the #pragma omp parallel for directive to parallelize the for loop that iterates through the array. This directive tells the OpenMP runtime to create a team of threads and divide the loop iterations among them. Each thread will execute the loop body with its own private copy of the i variable.

When the loop finishes, the value of found is -1 if the value was not found in the array, otherwise it will be the index of the first occurrence of the value in the array.

It is important to note that when you are using OpenMP, the order of the elements in the array may change and the program may not find the first occurrence of the value.

Me: Your solution has a data race involving variable found. Try to correct it.

ChatGPT: You are correct, the code I provided has a data race involving the found variable. Here is a corrected version of the program that avoids the data race:

In this version, I’ve added the #pragma omp critical directive to the block of code that updates the found variable. This directive creates a critical section, which is a block of code that can be executed by only one thread at a time. By enclosing the block of code that updates the found variable in a critical section, we ensure that only one thread can update the variable at a time, which eliminates the data race.

The #pragma omp critical directive is necessary to ensure that the variable found is updated atomically, meaning that no other thread will see an intermediate value.

It is important to note that the critical section may cause a serialization of the execution and affect the performance, so be careful when using it.

VerifyThis 2023: Call for Problems

Deadline: February 1st, 2023

VerifyThis is an annual program verification competition held as part of ETAPS. It is an opportunity for community members and tool authors to showcase their work and learn from each other with hands-on exercises.

The competition proceeds in three rounds. In each round, participants are given 90 minutes to implement and prove specified properties of a given algorithm and/or data structures. They are free to use any verification tools they choose.

Xavier Denis and I are co-chairing VerifyThis 2023. We are currently looking for problem submissions. If you have recently encountered an interesting challenge in your work, don’t hesitate to submit it. Typical challenges have clear input-output specifications and often incorporate one or more of the following: heap allocation, concurrency, arithmetic reasoning. A challenge usually describes a problem using natural language together with some pseudocode, and then provides a list of properties or “verification tasks” of varied levels of difficulty. Contributors are encouraged to look at the Archive of previous problems.

An award will be given for any submission used in the competition.

To avoid spoiling the competition for others, we ask that you keep the subject of your submission private.

Submissions should be sent by email to xldenis@lri.fr and siegel@udel.edu.

Submission Criteria:

  • A brief yet precise problem description, specifically identifying verification sub-tasks.
  • A solution to the challenge is strongly encouraged, otherwise please provide a sketch of correctness.
  • The description document can use any reasonable format, including plain text or PDF.

Verification of a Barrier Algorithm, part 1

The concurrency flag, aka binary semaphore, is the most primitive synchronization structure. It is the basic concurrency building block, from which all manner of more complex synchronization algorithms can be implemented.

A concurrency flag has two states: up or down. It supports two atomic operations: lower and raise. The lower operation is enabled only when the flag is up, in which case it changes the state to down. If lower is invoked when the flag is down, it blocks until the flag is up. The raise operation is only defined when the flag is down, in which case it changes the state to up. It is erroneous to attempt to raise the flag when it is already up. Note the asymmetry: it is never erroneous to invoke lower — the operation will simply block until the flag can be lowered. Raise never blocks, but it should only be invoked when the flag is down.

(The concept of semaphore in concurrency theory is due to Edsker Dijkstra. He used the letter V to represent the raise operation, and P for the lower operation. V stands for verhogen, Dutch for “to raise”. P may stand for passering — “passing” — or probeer te verlagen — “try to decrease”.)

This post looks at the problem of designing a barrier protocol using concurrency flags. A barrier is a synchronization operation involving a set of threads. The basic property of a barrier is that no thread shall leave the barrier until all threads in the set have entered the barrier. A second desirable property is no unnecessary delay — once all threads have entered the barrier, all threads should be able to depart without unnecessary blocking. A third desirable property is reusability — the barrier should be able to be invoked multiple times in a program and work correctly.

It is easy to design a barrier with the first property but not the second: never let any thread leave the barrier. We will see it is actually tricky to design a barrier with all three properties.

The case of two threads is a good place to start. We can implement a 2-thread barrier using two flags, f1 and f2, both initially down. The protocol is

The first operation that will execute in this protocol is thread 1’s raise f1. After that, thread 1 must block, as f2 is down, but thread2 can now lower f1. Thread 1 is still blocked, so thread 2 raises f2 and now thread 2 may depart the barrier. Thread 1 may now lower f2 and depart.

The basic barrier property holds: if thread 1 arrives first to the barrier, it raises f1 but then blocks at the lower f2 and cannot proceed until thread 2 arrives and eventually raises f2. If thread 2 arrives at the barrier first, it blocks, at the lower f1. The property of no unnecessary delay is admittedly vague, but at a minimum it requires that once every thread arrives, eventually every thread should be able to leave, and that is the case for this protocol. The third property, reusability, also holds — I’ll hold off on a proof of that, but for now observe that at least the flags are returned to their original state after both threads leave the barrier.

It is helpful to think of “raise” as sending a signal, and “lower” as receiving that signal. Then we can view the protocol as follows:

First thread 1 sends a signal to thread 2. Thread 2 receives the signal and then sends a signal to thread 1. Thread 1 receives the signal.

We can obtain a more rigorous analysis of the protocol by examining its state space. The state space is a directed graph in which the nodes are program states, and an edge represents an atomic transition from one state to another. A state encodes the values of all variables in the program, including the location (program counter) of each thread, and the state of each flag. The reachable state space is the subgraph consisting of the states and transitions that are reachable from the initial state.

Let’s put our 2-thread barrier in a loop, to analyze reusability, and also label the two statements in each thread to provide values for the location:

When Thread 1’s location is 0, its next action will be the raise operation, after which it moves to location 1. From location 1 (assuming f2 is up), Thread 1 lowers f2 and moves to location 0, etc. A state of this program may be represented as a tuple [pc1, pc2, f1, f2], where pc1 and pc2 are locations in {0,1}, and f1 and f2 are in {U,D} (U=up, D=down). The initial state is [0,0,D,D]. From that state, only one transition is enabled, Thread 1’s raise of f1, and the resulting state is [1,0,U,D]. Continuing in this way, we obtain the following reachable state space:

The reachable state space shows us that the barrier program is deterministic: at each reachable state, there is one and only one enabled transition. Furthermore, we may interpret location 0 as being “outside” the barrier and location 1 as being “inside” the barrier. The reachable state space also shows that no thread will leave the barrier until both have entered: the leaving transitions are the two transitions that change a 1 to a 0; the entering transitions change a 0 to a 1. The only path from the initial state proceeds with two enters, followed by two exits, repeated forever. This also shows there is no unnecessary delay (or at least all threads eventually leave the barrier), and the barrier is reusable, since we are doing all of this inside a loop. Incidentally we can also check that flags are used correctly, i.e., there is never an attempt to raise a flag that is already up.

One of the first times I showed this 2-flag barrier in a class, a student asked whether both threads could raise first, then lower. I thought about it. The two raises could occur in either order — fine, you end up at the same state in either case. Then the two lowers could happen in either order — also fine. And neither lower could occur without the raise in the other thread occurring first, guaranteeing the basic barrier property. So I said it looked right, but I would check. I went back to the office and thought about it. The program would become:

Let’s do the same state-enumeration analysis. This time the reachable state space looks like this:

The problem occurs from a state like [0,1,U,D]. From this state, Thread 2 could lower f1 and return to location 0, but also Thread 1 could attempt to raise f1 when f1 is already up, leading to the ERRor state. In fact, this barrier works correctly once, but it is not reusable: in the erroneous path, Thread 1 races to the second iteration of the loop and attempts to raise f1, while Thread 2 has not yet completed its lower operation in the first iteration. This is the kind of subtle interleaving-dependent issue that makes concurrent programming hard (and fun).

Now let’s see how we can automate this analysis. I will use CIVL, but any standard model checker will do. Here is a CIVL-C model of the first (correct) barrier program:

A flag is just a Boolean variable. The restriction on raise is checked by an assertion. The “when” statement in function lower causes the caller to block until the condition in parentheses is true, then executes the body of the statement; the check that the condition is true and execution of *f = 0 occur atomically, i.e., no other thread executes between those two points.

Running CIVL on this program yields:

The key part of the output is “The standard properties hold for all executions”, indicating the assertion is never violated. Let’s do the same for the erroneous protocol:

Checking…

The output indicates the assertion is violated and shows that process 1 (i.e., Thread 1) is in the “raise” call while process 2 is just before the lower. We can see more details, including a blow-by-blow account of the execution that arrived at that point, as follows:

That, in a nutshell, is how you can use a model checker to verify, or find a bug in, a concurrent algorithm. And if you are a teacher looking for examples of simple “truthy” concurrent protocols with subtle bugs, you can’t do much better than this erroneous 2-thread barrier.

CIVL Verification of Lock-free Hash Set

This problem was part of the VerifyThis 2022 verification competition. The competition took place on April 2, 2022, in München, Germany, as part of the ETAPS conference. The competition consists of three challenges. Each challenge describes some algorithms, including some pseudo-code, and a list of properties to verify. Participants use any platform of their choice to implement, specify, and verify the algorithms. Each challenge lasts one hour and 50 minutes.

The problem description is here. The problem is inspired by the “world’s simplest lock-free hash table”, from Jeff Preshing’s blog. The problem describes a lock-free concurrent hash set that supports adding an element and membership test. The algorithms use an atomic compare-and-swap operation.

Here are the properties to prove:

  1. empty(n) creates an empty set with capacity n
  2. member(k) == true, if insert(k) has been executed before (and returned true)
  3. member(k) == false, if no insert(k) that returned true can have been executed
  4. termination
  5. every key is contained in the table at most once
  6. if insert returns false, the table is full

Here is the CIVL solution I came up with during the competition:

The CIVL code follows the C++-like pseudocode given in the problem description very closely. I used CIVL to verify certain properties involving two concurrently executing threads with an upper bound (N) on the capacity of the hash set. Each thread chooses two keys nondeterministically and inserts those keys into the set. Then membership tests are performed. The output (excerpted):

An Alloy Model for Derivations by Natural Deduction

Alloy is a “lightweight formal methods” tool that has been used primarily to explore abstract software designs. “Alloy” refers to both a logic (which combines elements of first order logic and relational calculus) and a tool for analyzing models specified in that logic. The language is in fact quite general and people have found a wide range of applications of the tool — I’m sure many that the original developers (led by Daniel Jackson) never envisioned.

In this post I will show an Alloy model I made for my course Logic in Computer Science. We spend a good bit of time in the course doing derivations (formal proofs) by Natural Deduction (ND), first for Propositional Logic (PL), then for First Order Logic. Here are the rules of ND for PL as presented in Rigorous Software Development by Almeida, Frade, Pinto and de Sousa, Springer, 2011:

A judgment in this logic is a sequent of the form \(\Gamma\vdash A\), where \(\Gamma\) is a set of (propositional) formulas and \(A\) is a formula. The rules specify when you can conclude new judgments from old. For example, rule (Ax) says that one can always conclude \(\Gamma\cup\{A\}\vdash A\) for any set \(\Gamma\) and formula \(A\) (\(\Gamma, A\) being shorthand for the set \(\Gamma\cup\{A\}\)). Rule \((\textrm{I}_\neg)\) says that if one has derived \(\Gamma,A\vdash\bot\) then one may conclude \(\Gamma\vdash\neg A\). Here is a derivation of the “law of the excluded middle”, \(\emptyset\vdash p\vee\neg p\):

\begin{array}{rrcll} 0. & \neg(p\vee\neg p), \neg p & \vdash & \neg p & (\textrm{Ax}) \\ 1. & \neg(p\vee\neg p), \neg p & \vdash & p\vee\neg p & (\textrm{I}_{\vee2})0\\ 2. & \neg(p\vee\neg p), \neg p & \vdash & \neg(p\vee\neg p) & (\textrm{Ax})\\ 3. & \neg(p\vee\neg p), \neg p & \vdash & \bot & (\textrm{E}_\neg)1,2\\ 4. & \neg(p\vee\neg p) & \vdash & p & (\textrm{RAA})3\\ 5. & \neg(p\vee\neg p) & \vdash & \neg(p\vee\neg p) & (\textrm{Ax})\\ 6. & \neg(p\vee\neg p) & \vdash & p\vee\neg p & (\textrm{I}_{\vee1})4\\ 7. & \neg(p\vee\neg p) & \vdash & \bot & (\textrm{E}_\neg)6,5\\ 8. & & \vdash & p\vee\neg p & (\textrm{RAA})7 \end{array}

The following Alloy model specifies an ND derivation for PL. Using this model, the Alloy Analyzer can automatically find a derivation for a given sequent. It is a good demonstration of the expressiveness of the Alloy logic. In 218 lines (including comments), the syntax of PL, derivations, and the rules of ND are fully specified, as well as several examples:

The model defines the syntax for a propositional logic formula, and then defines the structure of a derivation, which is a recursive (tree-like) structure. Each node is the application of one of 12 rules, has some number (\(\leq 3\)) of children (premises, or sub-derivations), and a conclusion, which is a sequent of the form \(S\vdash f\), where \(S\) is a set of formulas (the antecedents) and \(f\) is a formula (the consequent). Then there is a series of examples where I describe the desired sequent and ask Alloy to find a derivation of it.

From the Alloy Analyzer menu, if we select to execute the “run LEM…” command, the result, after a few seconds, is an instance of the model. The instance can be viewed in Alloy’s Table view mode. The first table shows the Formulas. In addition to the atomic formulas False (\(\bot\)) and Prop0 (\(p\)), the following three compound formulas appear:

These represent, in order: \(\neg(p\vee\neg p)\), \(\neg p\), and \(p\vee\neg p\). Now here is the derivation, exactly as it appears in Alloy’s Table view:

which is exactly the derivation one would write by hand and is shown above.

Notice that we never had to specify any algorithm to find derivations. Rather, we just had to tell Alloy what a derivation looks like, and Alloy—and the PLingeling SAT solver—do the hard work of finding something matching our description.