@dbuenzli
Last active August 22, 2022 07:38
OCaml compiler support for library linking
@lpw25

lpw25 commented Nov 14, 2019

If you're going to drop the .cma files, why do you even need the text file?

The library contents are already represented by the contents of the directory, and the .cmo and .cmx files know what the required dependencies are. See my namespaces proposal for details.
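A minimal sketch of treating the directory itself as the library description. It assumes a flat directory with one `.cmx` file per compilation unit. `units_of_files` and `units_of_dir` are hypothetical helpers. The per-unit dependency information that each `.cmx` records is not read here.

```ocaml
(* Hypothetical: list a library's compilation units from the files present in
   its directory, with no separate description file. Assumes one .cmx per unit. *)
let units_of_files files =
  files
  |> List.filter (fun f -> Filename.check_suffix f ".cmx")
  |> List.map (fun f -> String.capitalize_ascii (Filename.chop_suffix f ".cmx"))
  |> List.sort compare

(* Same thing, reading the actual directory contents. *)
let units_of_dir dir = units_of_files (Array.to_list (Sys.readdir dir))
```

For example, `units_of_files ["mylib.cmx"; "mylib.cmi"; "mylib__Internal.cmx"; "README"]` yields `["Mylib"; "Mylib__Internal"]`.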

@gasche

gasche commented Nov 14, 2019

I think it essentially breaks modularity and reveals implementation details of the type checker to the user. I should be able to add a new dependency to my library without possibly forcing all downstream users to update their dependencies.

I think that the behavior that the authors of this proposal expect is the following:

  • if my library does not mention the new dependency in its interface, users don't need to add the dependency and everything works as expected
  • if my library does mention the new dependency in its interface, but it is missing from the include path, then types and signatures from that interface are handled by the type-checker as abstract types and abstract interfaces
  • if my library does mention the new dependency and downstream user code depends on the definition of its types and signatures, then they do need to add the new dependency
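One way to picture these three cases in a single file is to use signature ascription. `Dep`, `Mylib` and `Mylib_downstream` are hypothetical names. The sealed module only models what a downstream user would see if `dep.cmi` were missing from the include path. It does not describe how the compiler actually treats a missing `.cmi`.

```ocaml
(* What the library author compiles against: Dep is fully visible. *)
module Dep = struct
  type t = { id : int }
  let make id = { id }
  let id t = t.id
end

module Mylib = struct
  let create () = Dep.make 42
  let describe d = Printf.sprintf "dep #%d" (Dep.id d)
end

(* Model of the downstream view without dep.cmi: Dep.t is only known as an
   abstract type, and Dep's own values are out of reach. *)
module Mylib_downstream : sig
  type dep (* stands for Dep.t *)
  val create : unit -> dep
  val describe : dep -> string
end = struct
  type dep = Dep.t
  let create = Mylib.create
  let describe = Mylib.describe
end

(* Downstream code that only passes Dep.t values around keeps working. *)
let message = Mylib_downstream.describe (Mylib_downstream.create ())

(* Code that inspects the definition (e.g. the [id] field) would need the new
   dependency, which corresponds to the third case. *)
```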

I agree with @dbuenzli that this model has benefits, and I disagree with @lpw25 that it breaks abstraction, assuming users need to add the downstream dependency only if their own code needs more information about it than abstract types.

At the same time, I have strong doubts that the type-checker currently allows this to work flawlessly in all cases. (We know a bit about this because Dune tried to use this model before and it broke in various ways in corner cases; @lpw25 and @Octachron have looked, for example, at ocaml/ocaml#8779.) From a language perspective, this looks like a reasonable model, in fact a fairly good one (missing .cmis are just "free module/unit variables", with abstract type-level components and no value-level components), but our implementation probably isn't quite there yet.
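The "free module variable" reading can be sketched with a functor. The missing unit becomes a parameter whose signature has a type component but no values. All module names are hypothetical, and this is an analogy, not a description of what the type-checker does.

```ocaml
(* The missing unit, seen as a free module variable: type-level only. *)
module type DEP_TYPES = sig
  type t
end

(* Downstream code typed against Mylib's interface, parameterised over Dep. *)
module Downstream
    (Dep : DEP_TYPES)
    (Mylib : sig
      val create : unit -> Dep.t
      val describe : Dep.t -> string
    end) =
struct
  let run () = Mylib.describe (Mylib.create ())
end

(* Instantiating the variables with the real units recovers the linked program. *)
module Real_dep = struct
  type t = int
end

module Real_mylib = struct
  let create () = 7
  let describe n = "dep " ^ string_of_int n
end

module Program = Downstream (Real_dep) (Real_mylib)

let () = print_endline (Program.run ())
```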

@lpw25

lpw25 commented Nov 14, 2019

The issue is that "mentions in its interface" is not actually well-defined, and neither is "handled by the type-checker as abstract types". This is a fundamental aspect of OCaml's design -- you cannot remove an equality without specifying a complete interface. It sort of happens to work sometimes at the moment -- and this proposal doesn't do anything to change that -- but encouraging people to rely on the current behaviour is a mistake.

The need to read transitive .cmi files is really just an optimisation in the current compiler implementation -- it would be perfectly possible to avoid it by just expanding type aliases, module aliases and module type aliases. We should avoid making the behaviour of the system dependent on whether this optimisation is being applied.
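Here is a small illustration of the alias-expansion point, using hypothetical modules. When an interface mentions the alias `Dep.t`, a client needs `Dep`'s interface to learn that `Dep.t = int`. Once the alias is expanded to `int`, the interface stands on its own, and the same client code still type-checks.

```ocaml
(* A dependency whose type is an alias: Dep.t = int. *)
module Dep = struct
  type t = int
end

(* Interface mentioning the alias: to see that Dep.t = int, a client must be
   able to read Dep's interface (dep.cmi in the real setting). *)
module Mylib_via_alias : sig
  val count : unit -> Dep.t
end = struct
  let count () = 3
end

(* The same interface with the alias expanded: it no longer mentions Dep. *)
module Mylib_expanded : sig
  val count : unit -> int
end = struct
  let count () = 3
end

(* Client code type-checks identically against both versions. *)
let a = Mylib_via_alias.count () + 1
let b = Mylib_expanded.count () + 1
let () = Printf.printf "%d %d\n" a b
```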

@gasche

gasche commented Nov 14, 2019

When we discussed this together, @Octachron suggested that we could open the .cmi of transitive dependencies in a degraded mode where type-level definitions are available, but term-level definitions are not (using a term variable or a variant/field from the module would be an error/warning). This is not "enough" from the point of view of the proposal's authors, who would like (at least) any definition (even type-level) that was not needed when type-checking my library to be hidden from the users of my library.

Your remark on the fact that transitive .cmi reading is "just an optimization" does not consider the fact that it allows my library users to do more than just rely on the definition of aliases used by my library. The intention of the authors of the proposal is precisely to restrict the extra visibility that it allows, whose use is arguably problematic. I think we should acknowledge that this is a reasonable feature wish.

Meta-level remark: I think that this particular question (the type-checking visibility of transitive dependencies) is a small sub-point of the proposal, and maybe we shouldn't get too distracted with it when discussing the proposal as a whole. But I agree that it is controversial and needs to be discussed in detail.

@lpw25

lpw25 commented Nov 14, 2019

> Your remark on the fact that transitive .cmi reading is "just an optimization" does not consider the fact that it allows my library users to do more than just rely on the definition of aliases used by my library. The intention of the authors of the proposal is precisely to restrict the extra visibility that it allows, whose use is arguably problematic.

Sorry, I was already assuming that we would remove that behaviour. @trefis and I have been planning to fix this for ages, and he even wrote an RFC describing how to get rid of it. I forgot that he had not actually posted the RFC, since ocaml/ocaml#9056 covered some of the same ground as his RFC.

The part of his proposal that is relevant to this discussion is that there should be a --hidden-cmi <file> option (alongside --cmi <file> as a per-file version of -I) for adding cmi files to the Path.t lookup without adding them to the Longident.t lookup -- essentially the degraded mode that you mention.
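The two lookups can be modelled as two tables. One is indexed by the names that user code may write (the `Longident.t` side). The other is used to resolve paths already present in loaded interfaces (the `Path.t` side). This is a hypothetical model: `env`, `add_cmi` and `add_hidden_cmi` do not correspond to the compiler's actual `Env` API, and units are keyed by plain strings.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical stand-in for a loaded .cmi; only its unit name is modelled. *)
type cmi = { unit_name : string }

type env = {
  by_longident : cmi SMap.t; (* units that source code may refer to by name *)
  by_path : cmi SMap.t;      (* units the type-checker may use to resolve paths *)
}

let empty = { by_longident = SMap.empty; by_path = SMap.empty }

(* Like -I or --cmi: the unit is visible to both lookups. *)
let add_cmi cmi env =
  {
    by_longident = SMap.add cmi.unit_name cmi env.by_longident;
    by_path = SMap.add cmi.unit_name cmi env.by_path;
  }

(* Like the proposed --hidden-cmi: only path resolution can see the unit. *)
let add_hidden_cmi cmi env =
  { env with by_path = SMap.add cmi.unit_name cmi env.by_path }

let lookup_longident name env = SMap.find_opt name env.by_longident
let lookup_path name env = SMap.find_opt name env.by_path

(* Mylib is a direct dependency; Dep is only needed to resolve Mylib's types. *)
let example_env =
  empty
  |> add_cmi { unit_name = "Mylib" }
  |> add_hidden_cmi { unit_name = "Dep" }

let () =
  (* Writing Dep.t in user code would not resolve... *)
  assert (lookup_longident "Dep" example_env = None);
  (* ...while a path Dep.t occurring inside Mylib's interface still does. *)
  assert (Option.is_some (lookup_path "Dep" example_env))
```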

I agree that we should not get too side-tracked by this discussion -- the proposal here does not change things in this regard nor does it make it harder to fix these issues later.

@dbuenzli
Author

dbuenzli commented Nov 14, 2019

@gasche summarized exactly the model I want. I personally think it matches, at the library level, the notion of abstraction we have in the language.

This being said I'm all for changing the system in the long term along the various lines that are suggested here (e.g. the possible eventual removal of archive files), but I prefer if we avoid changing everything at the same time.

This proposal has the benefit that it mostly doesn't change anything: it re-encodes the current state of the world in a simpler, more obvious and formal manner, and makes the compiler aware of it.

This first step will then only make it easier to introduce gradual improvements without disturbing the ecosystem. Namespacing, for example, is highly compatible with this proposal, which in my opinion will only make it easier to introduce. But I really think it's better if this first step, which is rather big (not compiler-wise, but ecosystem-wise), is done without trying to turn everything upside down.

Regarding the specific issue of recursive -I or not: if it's unclear, then erring on the side of non-recursive can only lead to over-specification rather than under-specification, which will not break if it turns out that recursive is what is needed (though I doubt it).
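The following sketch uses a hypothetical dependency graph, with library names standing in for include directories. Non-recursive lookup sees only direct dependencies, while recursive lookup sees the transitive closure. Over-specifying (having `app` declare `dep` explicitly) produces the same include set as recursive lookup, so it keeps working under either policy.

```ocaml
(* Hypothetical dependency graph: each library and its direct dependencies. *)
let graph = [ ("app", [ "mylib" ]); ("mylib", [ "dep" ]); ("dep", []) ]

let direct lib = Option.value (List.assoc_opt lib graph) ~default:[]

(* Transitive closure of dependencies; assumes the graph is acyclic. *)
let rec transitive lib =
  let ds = direct lib in
  List.sort_uniq compare (ds @ List.concat_map transitive ds)

(* Turn a set of libraries into include flags. *)
let include_flags libs = List.concat_map (fun l -> [ "-I"; l ]) libs

(* Non-recursive: only what app declares. Recursive: the whole closure. *)
let non_recursive = include_flags (List.sort_uniq compare (direct "app"))
let recursive = include_flags (transitive "app")

(* Over-specification: app also declares dep itself. *)
let over_specified = include_flags (List.sort_uniq compare ("dep" :: direct "app"))
```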

@alainfrisch

> If you're going to drop the .cma files, why do you even need the text file?

  • The file provides library dependencies, which themselves give information on where to find dependent units. We could instead keep the library name, in addition to the unit name, for dependencies in each .cmo file.

  • Attaching various kinds of properties at the library level: dependencies to C libraries (we could also add them to .cmo/.cmx files), a per-library -linkall mode (if any of the object in the library is used, link the entire library). (Possibly also attaching information on preprocessors to be applied when the library is used.)

  • Generally speaking, relying on explicit information (from the command line or files) rather than on the mere presence of files in the filesystem is more robust, and allows problems to be detected faster, esp. with parallel builds.

  • To support multiple library interfaces, we need to specify somewhere a list of units anyway (admittedly, this is to a large extent independent of .cma libraries).

  • The text files can also serve as specification for other tools (to compile the library itself, or download/install dependencies).
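To make the list above concrete, here is a sketch of a library description and a parser for it. The line-based `key: value` format and the field names (`units`, `requires`, `c-libs`, `linkall`) are assumptions made for illustration, not the format the proposal specifies. Unknown keys are rejected, in keeping with the goal of detecting problems early.

```ocaml
(* Hypothetical library description; field names are illustrative only. *)
type lib_desc = {
  name : string;
  units : string list;    (* compilation units of the library *)
  requires : string list; (* library dependencies: where to find dependent units *)
  c_libs : string list;   (* C libraries to pass to the linker *)
  linkall : bool;         (* link the whole library if any unit is used *)
}

let empty_desc = { name = ""; units = []; requires = []; c_libs = []; linkall = false }

(* Split a value into space-separated words, dropping empty ones. *)
let words s = String.split_on_char ' ' s |> List.filter (fun w -> w <> "")

(* Parse one "key: value ..." line; blank lines are ignored. *)
let parse_line desc line =
  match String.index_opt line ':' with
  | None -> if String.trim line = "" then Ok desc else Error ("malformed line: " ^ line)
  | Some i -> (
      let key = String.trim (String.sub line 0 i) in
      let value = String.sub line (i + 1) (String.length line - i - 1) in
      match key with
      | "name" -> Ok { desc with name = String.trim value }
      | "units" -> Ok { desc with units = words value }
      | "requires" -> Ok { desc with requires = words value }
      | "c-libs" -> Ok { desc with c_libs = words value }
      | "linkall" -> Ok { desc with linkall = String.trim value = "true" }
      | k -> Error ("unknown key: " ^ k))

(* Parse a whole file, stopping at the first error. *)
let parse text =
  List.fold_left
    (fun acc line -> Result.bind acc (fun d -> parse_line d line))
    (Ok empty_desc)
    (String.split_on_char '\n' text)
```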

@alainfrisch

> This being said I'm all for changing the system in the long term along the various lines that are suggested here (e.g. the possible eventual removal of archive files), but I prefer if we avoid changing everything at the same time.

It makes sense. I don't want to derail the project, and the proposal looks ok to me, even if it seems to me that the community direction is rather to push users towards Dune anyway, and even possibly a "mono-repo" approach (duniverse style); in that context, the user interface for OCaml is really Dune, and the current proposal doesn't bring much. But we are not there yet!

@dbuenzli
Author

Personally I don't care about dune and I think it's good if the OCaml compiler interface is good without assuming or needing a particular build system or a closed world mentality.

There are many different ways you may want to go about compiling OCaml, here's an alternate one for example.


ghost commented Nov 18, 2019

Regarding recursive include paths, it's not just about typing, it also impacts compilation and in particular optimisations such as inlining. I was discussing with @mshinwell recently who mentioned that the compiler not seeing some cmx files was a huge pain for flambda. And even without considering the middle-end, in the typer we might still want to carry the information that a type is immediate even if the user is not allowed to make assumptions about the type declaration.
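OCaml already provides a way to keep a type abstract for users while telling the compiler that the type is immediate: the `[@@immediate]` attribute on an abstract type in a signature. The module below is a hypothetical example. How the back end uses this information (for instance, possibly skipping the write barrier when such values are stored in mutable structures) is internal to the compiler.

```ocaml
(* Users cannot see that Dep.t is int, but the interface states that its
   values are immediate (never pointers to the heap). *)
module Dep : sig
  type t [@@immediate]
  val of_int : int -> t
  val to_int : t -> int
end = struct
  type t = int
  let of_int x = x
  let to_int x = x
end

let x = Dep.of_int 41
let y = Dep.to_int x + 1

(* Storing immediate values in an array: the compiler knows Dep.t is
   immediate even though its definition is hidden from this code. *)
let cells = Array.make 4 (Dep.of_int 0)
let () = cells.(0) <- x
```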

That said, I concur regarding the benefits of not exposing transitive dependencies to the user. But it seems to me that @lpw25's idea is the most viable one.
