# ICFP / FSCD day 1 – rough notes

(Blog posts for Day 1, Day 2, Day 3, Day 4 (half day))

I decided to take electronic notes at ICFP and FSCD (colocated) this year, and following the example of various people who put their conference notes online (which I’ve found useful), I thought I would attempt the same. However, there is a big caveat: my notes are going to be partial and may be incorrect; my apologies to the speakers for any mistakes.

(ICFP keynote #1) Computational Creativity, Chris Martens (slides)

In the early days, automated theorem proving by computers was seen as an AI activity. Theorem proving, program synthesis, and AI planning are search procedures but can be viewed as creative procedures.

Linear logic is useful for describing plots (story telling).

There is a difference between what an AI opponent and AI cooperator should do: an opponent need not make intelligible moves, but a cooperator needs to act in a way that is intelligible/understandable to the human player.

Applied Grice’s maxims of conversation to build an “intentional” cooperative player for the Hanabi game. Grice’s maxims are: quantity (not too much or little), quality (tell the truth), relation (be relevant), manner (avoid ambiguity).

Theory of mind: forming mental models of other people’s mental models.

Dynamic epistemic logic, has two indexed modalities: $\Box_a A$   (agent ‘a’ believes A) $[\alpha] A$   (A holds true after action $\alpha$).

Actions are defined inductively: $\begin{array}{rll} \alpha, \beta & = \\ & \textit{flip} \; p & \text{(change truth value)} \\ \mid & ?A & \text{(precondition)} \\ \mid & \alpha + \beta & \text{(non deterministic choice)} \\ \mid & \alpha ; \beta & \text{(sequence)} \\ \mid & \alpha^a & \text{(appearance to agent 'a')} \\ \mid & \alpha* & \text{(public action)} \end{array}$

Semantics is based on a possible worlds formulation (current actions change future possible worlds). Ostari (Eger & Martens) is a full epistemic logic language that can capture lots of different games with different kinds of knowledge between players.

Key point: use compositionally (of various kinds) as a design principle

Three research challenges:

• Dynamic logics with functional action languages
• Constructive/lambda-calculable DEL
• Epistemic session types

(FSCD keynote #1) Brzozowski Goes Concurrent – Alexandra Silva

Ongoing project CoNeCo (Concurrency, Networks, and Coinduction)

SDN – Software Defined Network architectures let you write a program for the network controller. Requires languages for the controller to interact with the underlying switches / routers. Goals of new network PL: raise the level of abstraction beyond the underlying OpenFlow API– make it easier to reason about. Based on Kleene algebra.

NetKat – based on regular expression (Kleene algebra) with tests (KAT) + additional specialized constructions relating to networking. This is compiled to OpenFlow.

Kleene algebra, e.g., $(0 + 1(01^*0)^*1)^*$ – multiples of 3 in binary.

For reasoning about imperative programs, this can be extended with the idea of ‘tests’ (Kozen). Need to capture more of the control flow graph, but there are guards (tests). The solution is to split the underlying alphabet of the Kleene algebra into two: one for actions and one for tests: combine a Kleene algebra and a Boolean algebra (where there is negation), e.g. $\bar{b}$ is the negation of b. Then the three standard control flow constructs of an imperative language are defined: $\begin{array}{rl} p ; q & = pq \\ \textbf{if} \; b \; \textbf{then} \; p \; \textbf{else} \; q & = bp + \bar{b}q \\ \textbf{while} \; b \; \textbf{do} \; p & = (bp)^*\bar{b} \end{array}$

Subsumes Hoare logic, i.e.: $b \{p\} c \Leftrightarrow b\,p\,\bar{c} = 0$

Decidable in PSPACE.

What is the minimum number of constructs needed to be added to KAT to deal with packet flow?

A packet is an assignment of constant values n to fields x a packet history is a nonempty sequence of packets. Include assignments (field <- value) and tests on fields (filed = value). e.g.,

sw = 6; pt = 88; dest <- 10.0.0.1; pt <- 50


For all packets incoming on port 88 of switch 6 set the destination IP to 10.0.0.1 and send the packet to port 50.

Can then reason about reachability (can two hosts communicate)? Security (does all untrusted traffic pass through an intrusion detection)? loop detection (packet to be stuck in a forwarding cycle).

Since networks are non-deterministic, this has been adapted to probabilistic NetKAT.

Can we add concurrency to NetKAT (deal with interaction between different components): within this paradigm, can we add concurrency to Kleene algebra? Concurrent Kleene algebra adds a parallel composition || (cf. Hoare 2011). Would like to reason about weak memory model like behaviour (cf. Sewell’s litmus tests). There were developments in concurrent Kleene algebra, but a solid foundations was not yet available.

Have proved a Kleene theorems: equivalence between Kleene algebra and finite automata. In this case, the corresponding automata to a concurrent Kleene algebra is a pomset automata.

Can make Concurrent Kleene algebras into regular subsets of pomsets (useful model of concurrency, cf Gischer) i.e., $[-] : 2^{\textsf{Pom}_{\Sigma}}$.

Pomset automata- has the idea of splitting into two threads (two paths) that once they reach their accept states, then go onto the next state of the originating state (can be expressed as multiple automata). This reminds me of Milner’s interleaving semantics (for atomic actions) in CCS. Alphabet of the automata is a pomset over some other alphabet of actions.

Showed the (Brzozowski) construction to build an automata from a Concurrent Kleene algebra term, and then how to go back from automata to expressions.

The exchange law captures the interaction of sequential and parallel interaction. Currently working on extending the Kleene theorem in this work to include the exchange law.

(ICFP) Faster Coroutine Pipelines, Mike Spivey

(I missed the start and had some trouble catching up).

Coroutine pipelines are a way to structure stream-processing programs. There are libraries for doing this that are based on a term encoding that is then interpreted which suffers serious slow-down when there is process nesting. This paper proposes an alternate approaching using continuation which doesn’t suffer these slow down problems.

The direct style can be mapped to the CPS style which is a monad morphism between two Pipe monads (capturing the coroutine pipelines). To show that the two models are equivalent could be done with logical relations (Mike recommended Relating models of backtracking by Wand and Vaillancourt (2004)).

Termination. Consider a situation: $p1 \mid\mid ((p2 \mid\mid p3 \mid\mid p4) ; q) \mid\mid p5$
where $p3$ then terminates.

Need to end up with $p1 \mid\mid q \mid\mid p5$ not $p1 \mid\mid ((p2 \mid\mid p4) ; q) \mid\mid p5$. (I wasn’t sure if this was an open question or whether there was a solution in this paper).

Ploeg and Kiselyov’s paper at Haskell 2014 but this didn’t get asked.

(ICFP) A pretty but not greedy printer (Functional pearl), Jean-Philippe Bernardy

Hughes proposed a pretty printing approach, which was improved upon by Wadler. For example, if you want to pretty print some SExpr but with a maximum column width: when do you spill onto the next line?

Proposes three laws of pretty-printing:
1). Do not print beyond the right margin
2). Do not reveal the structure of the input
3). Use as few lines as possible.

(Hughes breaks 3 and Wadler breaks 2 (I didn’t spot how in the talk)).
These three rules are not compatible with a greedy approach.

A small API is provided to write pretty printers (four combinators:
text, flush, (horizontal composition), ).

Phil Wadler asked how you represent hang in the library. JP quickly typed up the solution:
hang x y = (x text " " y) (x  (text " " y))

Phil countered that he could do the same, but JP answered that it didn’t have the expected behaviour. I’m not entirely sure what hang does, but in the Hughes-based library (https://hackage.haskell.org/package/pretty-1.1.3.5/docs/Text-PrettyPrint.html), hang :: Doc -> Int -> Doc -> Docwith behaviour like:

Prelude Text.PrettyPrint> hang (text "Foo\nBar") 0 (text "Bar")
Foo
Bar Bar


(not sure what the extra Int argument is for).

(ICFP) Generic Functional Parallel Algorithms: Scan and FFT – Conal Elliott

Arrays are the dominant type for parallel programming, but FP uses a variety of data types. Generic programming decomposes data types into fundamental building blocks (sums, products, composition).

Perfect trees can be represented as a n-wise composition of a functor $h$.

Prefix sum (left scan) $b_k = \Sigma_{1 \leq i \leq k} a_i$
for $k = 1, ..., n+1$, e.g., scan [1,2,3,4] = [1,3,6,10]
Define more generally as:

class Functor f => LScan f where
lscan :: Monoid a => f a -> f a x a

Instances were then given for the components of a datatype generic representation, (constant types, type variable, sum, product, composition) which gives data type generic scan. (Insight: Do left scans on left vectors, not left scans on right vectors! The latter adds a lot of overhead).

Data types were describe numerically, e.g., 2 is a pair, $\stackrel{\leftarrow}{4}$ is a 4-vector.

Can do the esame with Discrete Fourier Transform. A nice property is that a
DFT can be factored into separate parts (Johnson 2010, e.g., a 1D DFT of size N = N1*N2 is equivalent to a 2D DFT of size N1xN2– I couldn’t see this so well on the slide as I was in the overflow room).

(ICFP) A Unified Approach to Solving Seven Programming Problems (Functional Pearl) – William E. Byrd, Michael Ballantyne, Gregory Rosenblatt, Matthew Might

Started off referencing http://matt.might.net/articles/i-love-you-in-racket/ – 99 ways to say “I love you” in Racket (all based on different list operations). But what if you wanted more, for some other language? How could you generate this, to say get 1000, or 10000 such examples? The solution put forward here is to using miniKanren, the constraint logic language. The eval function is turned into a relation then can be “run” in reverse, (i.e., run 99 (q) (evalo q '(I love you)) gives 99 S-expressions that evaluate to ​(I love you)). What about for generating quines (programs that evaluate to themselves)? (run 1 (e) (evalo e e)) produced just the 0 expression and getting three results was just 0#t and #f but with 4 it produced a more interesting quine.

Then, William showed (run n (p q) (eval p q) (evalo q p) (/= q p) generating “twines” (pair of mutually producing programs). This was then applied to a proof checker to turn it into an automated theorem prover! The same technology was used then for program synthesis (although some additional work was needed under the hood to make it fast enough).

System F (Girard ’72): Two-sorted, types and terms, based on the presentation of System F in Harper’13. Meanwhile, study of CC lead to Pure Type Systems- System F appeared in the Lambda Cube at the corner $\lambda2$. This work shows the formal relation (that indeed these two systems are equivalent). A proof is partially given in Geuvers ’93, but a full Coq formalisation was given by Kaiser, Tebbi and Smolka at POPL 2017. In this work, the proof is replayed across three tools to provide a useful comparison of the three systems: Coq, Abella and Beluga.

A complication is that the syntax is non-uniform on the System F side: $\Pi x : a . b$ in PTS can correspond to $A \rightarrow B$ and $\forall X . B$. Have to also keep track of context assumptions. One (of many) “topics of interest” in this comparison is how to manage contexts (which is interesting to me as I am quite interested in CMTT in Beluga).

The approaches were:

• Coq – first-order de Bruijn indices, parallel substitutions (Autosubst library), and invariants (e.g, a context can be extended with a new term variable). Inductive predicate for typing judgments. Traversal of binders requires context adjustments.
• Abella – HOAS, nominal quantification (fresh names handled internally), relation proof search [Abella has two layers: specification and logic]. Contexts represented by lists (as in Coq). A compound inductive predicate is define to relate the different context representations (and to keep them all consistent).
• Beluga – HOAS, 1st class contexts (via Contextual Modal Type Theory) and context schemas. Object K (e.g., terms, types, deriviations) are alway paired with a 1-st class context $\Gamma$ that gives it meaning $[\Gamma \vdash K]$. There is no concept of free variable., e.g., in Coq $0 \vdash 0_{ty} \rightarrow 0_{ty} \textsf{ty} \Rightarrow \bot$ is provable but in Belluga $\bullet \vdash 0$ is not even well formed (accessing the first variable 0 (think de Bruijns) in an empty context). Context schemas provide a way to give the structure of the typed contexts (as a dependent record).

They created a new language, Adaptive Fuzz for adaptive differential privacy (DP), uses Fuzz type checking to verify individual pieces in a piecewise static way.

DFuzz is a dependently typed version of Fuzz, can we use this? Privacy cost dependent on feature selection: which could depend on values in the databases: but this is practically useless as we don’t know what is in the database. We can still get static guarantees if it is done piece-wise in two modes: data mode (access to sensitive data, uses the Fuzz type checker to provide DP proofs) and adaptive mode (computation between pieces). End up with “light” dependent types.

These two modes have the same syntax though (code it one works in the other). Here is an example type for a function that scales one number by the other (multiplication by abs): $\textsf{scale} \; (c : \mathbb{R}) \; (n : [\textsf{abs} \; c] \mathbb{R}) : \mathbb{R}$

(scale the second number by the first argument). In adaptive mode the annotations are ignored, in data mode the first argument is forced to be a constant (partial evaluation happens under the hood so we don’t need dependent types really).

General scheme is: analyst (program) writes data mode code, it is partially evaluated and the Fuzz type checker works out a cost, which is the compared against a privacy filter (where/how is this defined?) Can write a gradient descent algorithm with different levels of randomness provided by different privacy budgets (how much information do we want to reveal at the end, e.g., if budget is infinite then you get not noise in the result). (another example in the paper is feature selection).

This seems like a really interesting mix of a static and semi-static approach (via partial evaluation). I wonder how this relates to fine-grained cost-tracking type systems?

(ICFP) – Symbolic Conditioning of Arrays in Probabilistic Programs –
Praveen Narayanan, Chung-chieh Shan

(tag line: Bayesian inference via program transformation).
Say you are benchmarking some iterative code, which you model by $a n + b$ for some startup cost $b$ and $n$ iterations and $a$ a cost for the loop. You might then collect some results and do linear regression (best fist line) to see if this model is confirmed. Bayesian linear regression gives you many lines for different kinds of fit.

1. Write a generative model, e.g., some normally distributed $n$ to generate a set of possible lines (normally distributed gradients), to which we maybe then add some additional normally distributed noise for each point.  (k measurements)
2. Observe running times
3. Infer distribution of lines. Delete those candidates from the generative model that weren’t observed (from step 2), keep the ones that were.

Bayesian inference is compositional: you can build it from different components, by treating distributions as programs (e.g., a program normal (a * n_1 + b) is gives us one of the lines in the generative model) and then transforming the program. A program transformation implements the model-theoretic notion of “disintegration” but the existing approach for this is problematic (I wasn’t quite clear on how). A new transformation approach is proposed which provides a smaller output program via some additional combinators which replace an unrolled set of $k$ observations into a replication like combination (plate).

Definitional interpreters: programs that take ASTs and run them, giving an executable semantics. Key challenge: a parameterised interpreter that recovers both concrete and abstract semantics.

A concrete interpreter for the lambda calculus: exp -> m (val) where m is a monad which (at least) captures the environment and the store (which is used for representing the variable environment). The definition is written non-recursive “unfixed” (unrolled), which can be fixed (e.g., apply to Y combinator) later to run the interpreter. By leaving it unfixed, we can interpose different behaviour in the recursive cases.

To make the interpreter abstract, abstract the primitive operations and types, e.g., integers are extended with an ‘abstract’ value, i.e., $\mathbb{Z} \cup \{\texttt{'N}\}$ where 'N represents the abstract integer. The semantics is then main non-deterministic so that an iszero? predicate on 'N splits into two execution traces. But its partial (could run forever).

How to make it total (always terminating). Look for states that it has already seen before, sufficient for termination but unsound for abstraction. If you observe fact 'N then one branch fails (non-terminating as visit previous state) and the 1 branch succeeds, so the result is just 1 (not ideal). Instead, look in a “cache” of precomputed values to return something like [[fact 'N]] = {1} U {'N x \$[fact 'N]}. Intercept recursion points to evaluate through a cache and to stop when a previously computed value is hit (sounds space intensive?).

(ICFP) – On the Expressive Power of User-Defined Effects: Effect Handlers, Monadic Reflection, Delimited Control – Yannick Forster, Ohad Kammar, Sam Lindley, Matija Pretnar

Say you want to write a program like:

toggle = { x <- get!
y <- not! x
put! y
x }

If we don’t have effects, then we can do explicit state passing, but you have to do a full rewrite. Ideally want only local transformations. (Ohad showed this in three different styles, see below).

Relative expressiveness in language design, compare/contrast: (1 – Eff) algebraic effects and handlers (2 – Mon) monads (3 – Del) delimited control. Did this by extending CBPV in these three directions (and formalised this in Abella) and defining macro translations between every pair of these extended languages. Expressivity is stated as formal translations between calculi. Then this was considering in the typed setting. Interestingly there is no typed macro translation from Eff to Del nor from Eff to Mon.

[There is a large design space which yields lots of different languages, this study is a start. Inexpressivity is brittle: adding other language features can change the result.]

In the Eff version, we give standard state handlers to implement get and put. Its type contains the effect-system like information State = {get : 1 -> bit, put : bit -> 1}where toggle : U_State F bit.

In the Mon version, monadic reflection is used with a state monad.
(Q: put was of type U (bit -> F) but I thought the monad would be T = U F?)

(Q: In the translation from Mon -> Eff, how do generate the effect type if the source program only uses put or get ? I guess it just maps to the set of all effect operations).

(Q: the talked showed just simple state effects; are the results shown for any other kind of effect).

(ICFP) – Imperative Functional Programs That Explain Their WorkWilmer Ricciotti, Jan Stolarek, Roly Perera, James Cheney

https://github.com/jstolarek/slicer

Slicing can be used to explain particular (parts of) a program output: which parts of the source code contributed to a value. Extend TML (Transparent ML) to Imperative TML.

Construct backwards and forward slicing as a Galois connection (between two lattices). The lattices are based on the idea of definedness: where ‘holes’ make programs more undefined, e.g., $1 + 2$ is above $1 + \Box$ and $\box + 2$ (partial expressions). Forward slicing preserves meets in the lattice; backward slicing should be consistent and minimal with respect to forward slicing (I missed what this meant exactly, but I talked to Janek later to clarify: consistency is $x \leq \textsf{fwd}(\textsf{bwd}(x))$). The idea then is that we have $\textsf{bwd} : values_\Box \rightarrow expr_\Box$ and $\textsf{fwd} : expr_\Box \rightarrow values_\Box$ which form a Galois connection, i.e., $\textsf{bwd}(x) \leq y \Leftrightarrow x \leq \textsf{fwd}(y)$. [Given one of the slicings we don’t necessarily get the other for free].

Given a small program l1 := 0; (!l1, !l2) a partial version of it $\Box$; (!l1, !l) and an initial store [l1 -> 1, l2 -> 2] should forward slice to ( $\Box$, 2) by propagating the “hole” through the store.

(ICFP) Effect-Driven QuickChecking of Compilers – Jan Midtgaard, Mathias Nygaard Justesen, Patrick Kasting, Flemming Nielson, Hanne Riis Nielson

They made a generator for writing OCaml programs: useful for automated testing, e.g., comparing two compilers against each other (compile -> run -> diff, any difference will be suspicious).

Generated a program let k = (let i = print_newline () in fun q -> fun i -> "") () in 0 which found a bug: different behaviour between the native code generator and the bytecode generator due to effects being delayed (indefinitely). Another example was found (related), due to effects being removed.

Program generation, how to do it:

• Try: generate arbitrary strings?! Most won’t lex or parse).
• Instead, follow the grammar (see Celento et al ’80). Most won’t type check!
• Instead, follow the simply-typed lambda calculus (Palka et al. ’11) by bottom-up reading of a typing relation. This will make it through the type checker (I guess you can generalise the type rules).

Observed behaviour in OCaml depends on evaluation order (when effects are involved). OCaml bytecode uses right-to-left but sometimes native code backend uses left-to-right (hm!) But the specification is itself a bit loose. Can we avoid generating such programs since they are not well specified for anyway?

• Instead, following a type-and-effect systems which has one bit for marking pure/effectful (boolean lattice effect system). Extend to a pair of bits, where the second bit is order dependence (extend the effect algebra a bit more). Our goal is then a program which may have effects but may not have order-dependent effects.

Have an effect-dependent soundness theorem (that the effect bits correctly anticipate the effects).

Next, there is a type preserving shrinker to reduce huge ugly generated programs into a minimal (much nicer) example. e.g., replace a pure integer expression by a literal 0 and large literals by small literals, replace beta-redexes by let bindings, etc. Shrunk tests are checked whether they still show a difference.

A question was asked whether the effect-based mechanism could be used to rule-out other allowed difference, e.g., in floating point. Jan answered that floating point isn’t generated anyway, but an interesting idea.

Another question: how do you avoid generating non-terminating programs? Jan explained that since it is STLC (+ extension) only total terms are generated.

## 5 thoughts on “ICFP / FSCD day 1 – rough notes”

1. Janek Stolarek says:

> backward slicing should be consistent and minimal with respect to forward slicing (I missed what this meant exactly).

Happy to discuss tomorrow if you like.