FSCD2019 author response
@gasche, April 2, 2019

We thank the reviewers for their work on our submission and their thoughtful comments. Our response is in two parts: discussion of high-level questions, then more reviewer-specific comments. The document is designed to have the most important points first, so that you can stop reading at any point.

High-level questions

First, we note why we think it would be nice to see this paper at FSCD.

FSCD is a conference that is dear to us (not least for its continuous Open Access policy, which is unfortunately a rarity in this area), and we think that this work could nicely support the general direction of reasonably broadening the original TLCA+RTA scope: our work justifies a real-world improvement in a real-world programming language through a simple but non-routine meta-theoretical development -- yet another argument in favor of rigorous formalism in programming language design.

Second, we think that there is an insight in this work that relates to the discussion with reviewer 2 on "declarativeness". Existing work (and earlier iterations of the present work) on expressing program analyses as type systems tends to propose judgments of the form

Γ ⊢ e : τ ▹ ϕ

or

Γ ⊢ e : τ ▹ e' : τ', ϕ

where Γ, e, τ are inputs, ϕ describes the analysis results, and (e' : τ') in the second formulation describes re-annotations of the inputs e, τ refined with the analysis results. Instead, we present our static system as

Γ ⊢ e : τ

and we use the freedom of deciding that Γ may be an output instead of an input (as reviewer 2 remarks, our system is uni-typed; if we wanted more types for better precision, Γ would contain both inputs and outputs) to get a system that expresses the same algorithmic content with a simpler, more declarative presentation. Our meta-theoretic evaluation (proving soundness with respect to the operational semantics of [Nordlander, Carlsson and Gill, 2008]) validates the idea that this presentation is reasonably easy to reason about (note in particular how the subject-reduction proof is completely oblivious to the input/output distinctions in the system), and our programming experience validates the idea that this is still conducive to an implementation (which was merged into the OCaml compiler implementation).
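
To make this input/output reading concrete, here is a minimal OCaml sketch of such a right-to-left checker on a toy calculus. The AST, the helper names and the mode-composition table below are our own simplifications for this response, not the system of the paper (and the implementation merged in the OCaml compiler is of course richer); the point is only to show the judgment Γ ⊢ t : m implemented as a function from (t, m) to Γ.

(* Modes, ordered by strength; the names follow the paper, the rest is simplified. *)
type mode = Ignore | Delay | Guard | Return | Dereference

let rank = function
  | Ignore -> 0 | Delay -> 1 | Guard -> 2 | Return -> 3 | Dereference -> 4
let join m1 m2 = if rank m1 >= rank m2 then m1 else m2

module Env = Map.Make (String)

(* Multiplicative style: merge the contexts computed for sibling subterms. *)
let env_join = Env.union (fun _x m1 m2 -> Some (join m1 m2))

type term =
  | Var of string
  | Lam of string * term   (* delays its body *)
  | App of term * term     (* dereferences its head and its argument *)
  | Cons of term * term    (* guards its arguments *)

(* Mode composition m[m']; this table is a deliberate simplification. *)
let compose outer inner =
  match outer, inner with
  | Ignore, _ | _, Ignore -> Ignore
  | Delay, _ -> Delay
  | Guard, Return -> Guard
  | (Guard | Return), m -> m
  | Dereference, _ -> Dereference

(* check t m computes the context Γ such that Γ ⊢ t : m. *)
let rec check t m : mode Env.t =
  match t with
  | Var x -> Env.singleton x m
  | Lam (x, body) -> Env.remove x (check body (compose m Delay))
  | App (f, arg) ->
      env_join (check f (compose m Dereference))
               (check arg (compose m Dereference))
  | Cons (hd, tl) ->
      env_join (check hd (compose m Guard)) (check tl (compose m Guard))

For example, check (Lam ("x", App (Var "f", Var "x"))) Return computes the context {f ↦ Delay}: f is only used under a λ, which is consistent with why a definition in the style of #4989 below is accepted.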

Let us now answer the salient comments and questions from our reviewers.

Bugs avoided in the OCaml compiler (review 1)

Reviewer 1 asks about OCaml compiler bugs that this work eliminates. We offer the following list:

  • #7231: let-rec well-formedness check too permissive with nested recursive bindings

    (this issue is directly solved by our more modular/composable approach of a type system rather than a list of allowed/disallowed scenarios)

  • #7215: Unsoundness with GADTs and let rec

    (This issue, discussed in §6.4, justifies analysing a typed representation of the language, rather than an intermediate representation that does not represent GADT type equalities.)

  • #4989: Compiler rejects recursive definitions of values

    let rec f = let g = fun x -> f x in g

    The other bugs in this list allow invalid programs that segfault. This one, in contrast, is a case of insufficient expressivity in the previous check; it is trivially solved by our type-system approach.

  • #6939: Segfault with improper use of let-rec and float arrays

    Our §6.2 describes this issue.

Besides fixing (at least) these bugs, checking recursive definitions at the typing phase rather than on an intermediate representation made life much easier for tools that reuse the OCaml front-end, e.g.

  • MetaOCaml (which previously disallowed value recursion within quotations, as it could not check them)

  • Merlin (an IDE tool providing edit-time type checking, which did not report value recursion errors).

Declarative vs. Algorithmic presentations (review 2)

We agree with Reviewer 2 that the presentation is neither "fully declarative" nor "fully algorithmic", and that having separate declarative and algorithmic versions would be nice. But we don't think this is reasonable within the constrained FSCD space budget. Our system takes one half-page figure; having two systems, plus explanations relating them, would take another page, and we don't see any portion of the present article that is less important than having those two presentations instead of one reasonable compromise.

Also, on the details we respectfully disagree with reviewer 2 on the placement of the boundary between "declarative" and "algorithmic", which is somewhat subjective. For example, we would defend the choice of using a multiplicative style, explicitly merging the contexts of separate subexpressions. We view this as being mindful of the respective contributions of subterms, rather than a redundancy-introducing algorithmic choice. In fact, reference [2] uses this same style for the application rule, which supports our intuition that it is a natural presentation choice for this family of type systems.
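
Concretely, the application rule in this multiplicative style reads schematically as follows (our paraphrase of the style rather than a verbatim rule from the paper, with + the pointwise join of contexts):

G_1 ⊢ t : m[Dereference]      G_2 ⊢ u : m[Dereference]
------------------------------------------------------
                G_1 + G_2 ⊢ t u : m

An additive presentation would instead thread a single shared context through both premises, at the cost of weakening side-conditions.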

The one rule that we agree is objectively algorithmic in its presentation is the rule for mutual recursion:

(G_i, (x_j : m_i,j)^j ⊢ t_i : Return)^i
(m_i,j ≤ Guard)^i,j
∀i,  G'_i = G_i + Σ_j m_i,j[G'_j]
-------------------------------------------
(x_i : G'_i)^i ⊢ rec (x_i = t_i)^i

Here, the G' are defined from the G as fixpoints of a system of equations, while it would be enough (less constrained) to specify, through inequalities, a correctness condition on a single family of contexts:

(G_i, (x_j : m_i,j)^j ⊢ t_i : Return)^i
(m_i,j ≤ Guard)^i,j
∀i,j,  G_i ≤ m_i,j[G_j]
----------------------------------
(x_i : G_i)^i ⊢ rec (x_i = t_i)^i

This formulation is more declarative, but we found it less intuitive / readable: it would have been the only rule where constraints on the context in a sub-goal would flow from the rule around it, instead of the other way around. So we made the conscious choice to be more algorithmic there.
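
For the record, here is a minimal OCaml sketch (reusing mode, compose and env_join from the earlier sketch; the array layout and names are ours) of how the G'_i of the algorithmic rule can be computed by Kleene iteration. Termination holds because modes form a finite lattice and the right-hand side is monotone.

(* g.(i) is G_i; m.(i).(j) is m_i,j; the result array holds the G'_i. *)
let fixpoint (g : mode Env.t array) (m : mode array array) : mode Env.t array =
  let n = Array.length g in
  let g' = Array.copy g in            (* start the iteration from the G_i *)
  let changed = ref true in
  while !changed do
    changed := false;
    for i = 0 to n - 1 do
      (* one step of G'_i = G_i + Σ_j m_i,j[G'_j] *)
      let next =
        Array.to_list m.(i)
        |> List.mapi (fun j m_ij -> Env.map (compose m_ij) g'.(j))
        |> List.fold_left env_join g.(i)
      in
      if not (Env.equal ( = ) next g'.(i)) then begin
        g'.(i) <- next;
        changed := true
      end
    done
  done;
  g'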

As researchers we value declarative presentations because they are (1) easier for humans to understand and (2) (often) easier to reason about in the meta-theory. This particular type system is currently never shown to human users of OCaml (the check either passes or reports an error for the offending variable), and its meta-theoretic analysis is done. Finally, our reference implementation serves as the ultimate algorithmic presentation. This explains why exploring the declarative/algorithmic design continuum was not a priority in our presentation of the work.

Related work on modal type systems (review 2)

We thank reviewer 2 for the extra references (we only knew of [1], [3] and work related to [4] by the same authors) and will try to extend the Related Work section as suggested.

"apply [our technique to other systems - type parameter variance"

Yes, and there is related literature; one may start by looking at [1,2,5].

By "our technique" we mean "presenting a backward-analysis algorithm as a right-to-left type system". [1], [2] and [3] demonstrate the flexibility of modal type systems (in complement to the work on Nakano-style modal calculi), but they do not propose this technique in the context of a program analysis.

Relatedly, [4] structures the evidence of a backward analysis as a modal type system, which is related, but it does not use a "right-to-left type system" presentation; instead, it returns the result of the analysis as a "demand" on the right of the judgment. As mentioned earlier, we propose to represent this syntactically as just the context of a type judgment. (This is a presentation technique rather than a deep difference in what is being computed, but we think that it does make a nice difference, and in a sense that's what is really "declarative" about our work.)

Comparison with call-by-need (review 2)

Shouldn't there be similarities with call-by-need? The concept of "needed redex" seems related to your redexes in forced evaluation contexts. There are differences as well, since you consider a term vicious not only when it is stuck...

Anyway this is just a hunch; a comment on the relation to call-by-need in the related work section would be useful.

Don Syme proposed a translation of value-letrec into lazy thunks for F#. Using a source-level call-by-need operational semantics was our first idea to specify the dynamics to compare our type system against, but it results in a more complex presentation than the one we adapted from [Nordlander, Carlsson and Gill, 2008].

In call-by-need calculi there is also a store (typically of not-yet-evaluated terms), and it has indeed been presented as a global thunk store (as implemented in programming languages). But the nicer presentations for reasoning, starting with "The call-by-need lambda-calculus" by Ariola, Maraist, Odersky, Felleisen and Wadler, use local stores / explicit substitutions. We have local stores as well ("let rec"), not to delay computations (we are strictly call-by-value), but to represent cycles in memory. Note that explicit substitutions are a common meta-theoretic technique in lambda-calculi and are not specific to call-by-need (call-by-need, being more complex to describe than call-by-{value,name}, tends to require fancier techniques).

The notion of "forcing context" does sound related to notions that occur in call-by-need, but it is also not uncommon in call-by-value settings when you want to precisely study errors. For example, our definition of Mismatch (using E[H] has a notion of forcing context) was inspired by the following work, agnostic to the evaluation strategy:

System F with Coercion Constraints
Julien Crétin and Didier Remy, 2014
http://gallium.inria.fr/~remy/coercions/Cretin-Remy!fcc@lics2014.pdf
first paragraph of the right column of page 3

What is the relation between "forcing contexts" and needed redexes?

They are quite different notions, because we are in a call-by-value setting. A context is "forcing" when (1) it is in evaluation position and (2) its evaluation requires the value of its hole. For example:

  • (□, u) is neither needed in call-by-need nor forcing for us (by "needed" we mean that a redex in the hole of this context would be a needed redex)
  • □ (the empty context) is needed but not forcing (in terms of our type-system it is a Return position, not a Dereference)
  • (1, □+2) is forcing (with strict pairs) but not needed
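
To make the third case concrete, here is a small OCaml illustration (our own example; call-by-need is simulated with Lazy):

let rec loop () = loop ()   (* a diverging computation *)

(* Call-by-value, strict pair: the context (1, □ + 2) is forcing, so
   building the pair would run loop and diverge if uncommented. *)
(* let p = (1, loop () + 2) *)

(* Simulated call-by-need: the second component is not needed, so
   building the pair and projecting its first component terminate. *)
let p = (1, lazy (loop () + 2))
let one = fst p             (* returns 1; the thunk is never forced *)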

Details on the operational semantics (review 3)

It was indeed our intention to avoid an "assoc" rule, so that the example of review 3

let rec x = (let rec y = 1 :: x in 2 :: y) in 3 :: x

is precisely a normal form that describes a value containing cycles, with each cycle defined as locally as possible. (This is just a presentation choice and an "assoc" rule would work just as well; in fact, it would probably have been wiser to directly reuse this aspect of the presentation of [Nordlander, Carlsson and Gill, 2008], which uses a "merge" rule for that purpose. The rest of this response describes how the calculus works without needing to explicitly merge let-rec blocks.)

Note that our "lookup" rule may traverse the whole term from the lookup point to the root, in search of the value binding that defines this variable -- we don't need to merge value bindings to find the value in outer/older bindings.

That said, there were two small mistakes in the rules described in our submission, related to the lack of a grammar for contexts built only from fully-evaluated let rec blocks:

L[□] ::= □ | let rec B in L[□]

(With this L[□] defined, our forcing contexts E_f[□] can be defined from forcing frames F_f[□] in a simpler way: E_f[□] ::= E[F_f[L[□]]].)

Mistake 1: the grammar of values should allow L[v] as a strong value, and L[w] as a weak value.

Mistake 2: the rules for head reductions should allow for L-contexts in the middle of the redex, corresponding to a somewhat standard "reduction at a distance" rule in explicit-substitution calculi:

L[λx.t] v -{hd}-> L[t[v/x]]

(match L[K (wᵢ)ⁱ] with h, K (xᵢ)ⁱ → t, h') -{hd}-> L[t[wᵢ/xᵢ]ⁱ]

(For a reference on "reduction at a distance", see for example the Linear Substitution Calculus introduced in "An abstract factorization theorem for explicit substitutions", Beniamino Accattoli, 2012.)

With these two fixes in place, let us describe a more interesting example by defining

B₀ :=
  rec hd = λli.
    match li with Nil -> None | Cons (x, xs) -> Some x
  and cycle = λx.
    let rec li = Cons (x, li) in li

t :=
  let rec B₀ in hd (cycle 1)

We have, as one possible reduction path for t:

let rec B₀ in hd (cycle 1)
-{lookup hd}->
let rec B₀ in
  (λli. match li with Nil -> None | Cons (x, xs) -> Some x)
    (cycle 1)
-{lookup cycle}->
let rec B₀ in
  (λli. match li with Nil -> None | Cons (x, xs) -> Some x)
    ((λx. let rec li = Cons (x, li) in li) 1)
-{app}->
let rec B₀ in
  (λli. match li with Nil -> None | Cons (x, xs) -> Some x)
    (let rec li = Cons (1, li) in li)
-{lookup li}->
let rec B₀ in
  (λli. match li with Nil -> None | Cons (x, xs) -> Some x)
    (let rec li = Cons (1, li) in Cons (1, li))
-{app}->
let rec B₀ in
  (match (let rec li = Cons (1, li) in Cons (1, li)) with
    | Nil -> None
    | Cons (x, xs) -> Some x)
-{match}->
let rec B₀ in
  let rec li = Cons (1, li) in
  Some 1

Local specific questions from the reviewers

Review 1

You basically want to say that information flows from the expression to the variables. This is usually referred to as the "checking" direction in bidirectional typing, as opposed to the "inference"/"synthesis" direction. Thus, you could also say "A type checking system for...".

In the bidirectional type systems that we are familiar with, the context is known, and the "checking" direction moves inward through terms by following constructors, stopping before it reaches neutrals (in particular variables), which are inferred, not checked. It is thus not clear that "checking" would carry the right intuition here.

p7 Why is "(λx.□)t" a deref context and not a return context? Should it not be equal to "let x = t in □" ?

Reduction can lower mode constraints, so anti-reduction can strengthen them. In our mode language, (□ t) is a Dereference context that is indistinguishable, looking at modes alone, from (f □); it is natural for (f (λx.□)) to be a Dereference context (f may call the function and inspect the result), and ((λx.□) t) is handled in exactly the same way.

If we wanted to refine the mode-checking of this particular example (we do not), we could introduce a new mode "Call" that sits between Return and Dereference and is used only on the left of function applications, with Call[Delay] = Return.
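
As a sketch of what this hypothetical refinement could look like, extending the mode type of the earlier sketch (again, Call is not part of the submitted system, and this composition table is only one possible choice):

type mode = Ignore | Delay | Guard | Return | Call | Dereference

let compose outer inner =
  match outer, inner with
  | Ignore, _ | _, Ignore -> Ignore
  | Call, Delay -> Return    (* the key equation: calling runs the delayed body *)
  | Delay, _ -> Delay
  | Guard, Return -> Guard
  | (Guard | Return | Call), m -> m
  | Dereference, _ -> Dereference

The application rule would then check its head at m[Call] instead of m[Dereference]; for ((λx.□) t) at ambient mode Return, the hole would sit at Return[Call][Delay] = Call[Delay] = Return, as the reviewer expects.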

Maybe a semantics using a heap more explicitly would make the presentation clearer!? See, e.g., [4], or related machines for call-by-need, like the Sestoft machine.

We think that there is conceptual value in staying as close to the source calculus as possible. Of course, "clearer" depends on reader familiarity, and some readers are already more familiar with abstract-machine presentations. In terms of economy of concepts, we would argue that this presentation is better.

Since you have only shallow matching, you could have instead required disjointness of patterns in the syntax of "match".

Thanks for the suggestion.

Review 2

(One approach to remedy the situation would be to prove that the paper's semantics is correct with respect to a more conventional semantics. But this would be beyond the scope of the paper.)

Yes, we asked ourselves the same question and have written down a simple compilation scheme for our source language into a λ-calculus with a global store, and a sketch of a bisimulation argument between a source program and its compiled form. (It doesn't fit in the paper.)

For the record, our target calculus is defined as follows:

t ::= x, y, z
    | new (xᵢ)ⁱ; u
    | (xᵢ←tᵢ)ⁱ; u
    | λx.t
    | t u
    | K (tᵢ)ⁱ
    | match t with h

v ::= K (wᵢ)ⁱ | λx.t
w ::= x | v

[[let rec (xᵢ = tᵢ)ⁱ in u]] := new (xᵢ)ⁱ; (xᵢ ← [[tᵢ]])ⁱ; [[u]]

In this calculus, store locations are just variables (we don't have first-class reference cells that can be returned from a function, so there is no need for a separate class of runtime locations), and (xᵢ←tᵢ)ⁱ represents parallel, unordered writes: reducing one of the written terms tᵢ and committing a fully-evaluated write (x←v) to memory may happen in any order.
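
To illustrate the translation, here is a minimal OCaml sketch on a cut-down pair of ASTs (the types and constructor names are ours, not from the paper):

type src =
  | SVar of string
  | SLam of string * src
  | SApp of src * src
  | SLetrec of (string * src) list * src

type tgt =
  | TVar of string
  | TNew of string list * tgt             (* new (x_i)^i; u *)
  | TWrites of (string * tgt) list * tgt  (* (x_i <- t_i)^i; u *)
  | TLam of string * tgt
  | TApp of tgt * tgt

(* [[let rec (x_i = t_i)^i in u]] = new (x_i)^i; (x_i <- [[t_i]])^i; [[u]] *)
let rec compile : src -> tgt = function
  | SVar x -> TVar x
  | SLam (x, t) -> TLam (x, compile t)
  | SApp (t, u) -> TApp (compile t, compile u)
  | SLetrec (bindings, body) ->
      let names = List.map fst bindings in
      let writes = List.map (fun (x, t) -> (x, compile t)) bindings in
      TNew (names, TWrites (writes, compile body))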
