ICFP 2022
Sun 11 - Fri 16 September 2022 Ljubljana, Slovenia

Authors with a paper accepted to ICFP 2022 are invited to submit an artifact that supports the conclusions of the paper. The Artifact Evaluation Committee will read the paper, explore the artifact, and provide feedback on how easy it would be for future researchers to build on. The ultimate goal of artifact evaluation is to support future researchers in their ability to reproduce and build on today’s work.

If you have a paper accepted at ICFP 2022, please see the Call for Artifacts for instructions on submitting an artifact for evaluation.

Call for Artifacts

The Artifact Evaluation Committee (AEC) invites authors of accepted papers to submit an artifact that supports the conclusions of the paper. The committee will read the paper, explore the artifact, and provide feedback on how easy it would be for future researchers to build on. The ultimate goal of artifact evaluation is to support future researchers in their ability to reproduce and build on today’s work.

The submission of artifacts for review is voluntary and will not influence the final decision of whether the paper itself is accepted. Papers with successfully reviewed artifacts will receive a seal of approval printed on the first page of the paper in the ICFP proceedings. Authors of papers with successfully reviewed artifacts are encouraged to make the artifact publicly available upon publication of the proceedings, by including them as “source materials” in the ACM Digital Library.

Types of Artifacts

An artifact that supports the paper’s conclusions can take many forms, including any or all of the following:

  • a working copy of the software and its dependencies, including benchmarks, examples and/or case studies
  • experimental data sets
  • a mechanized proof

Paper proofs are not accepted as artifacts for evaluation.

Selection Criteria

Artifacts have two broad purposes: facilitating reproduction and reuse of the work by future scientists. Reuse goes beyond reproduction by allowing future scientists to, for example, extend a tool with new features or to inspect the exact definitions used in a formal proof.

To facilitate reproduction and reuse, an artifact should be:

  • consistent with the claims of the paper and the results it presents,
  • as complete as possible, supporting all claims of the paper,
  • well-documented,
  • future-proof, and
  • easy to extend and modify.

Artifacts which satisfy these criteria will be awarded at least one of the ACM “Available”, “Functional” and “Reusable” badges. For more details on the badges and the evaluation criteria, see the Evaluation Guidelines.

We expect that most artifacts submitted for review at ICFP will have a few common forms: compilers, interpreters, proof scripts and so on. We have codified common forms of artifacts on a separate page. If you are considering submitting an artifact that does not have one of these forms, we encourage you to contact the Artifact Evaluation chairs before the submission deadline to discuss what is expected.

Submission Process

The evaluation process uses an (optional, lightweight) double-blind system. Of course, authors will not know the names of reviewers. Authors are also encouraged (but not required) to take any reasonably easy steps to anonymize their submissions, and reviewers will be discouraged from trying to learn the names of authors. Note that we do not intend to impose a lot of extra work here: anonymizing artifact submissions is an opportunity to help ensure the reviewing process is fair, but is in no way required, especially if the anonymization would require a lot of work or compromise the integrity of the artifact. There will be a mechanism for free, double-blind communication between reviewers and authors, so that small technical problems can be overcome during the reviewing process. Authors may iteratively improve their artifacts during the process to overcome small technical problems, but may not submit new material that would require substantially more reviewing effort.

We intend for most artifact submissions to include BOTH:

  1. Software installed into a QEmu Debian base Virtual Machine (VM) image provided by the committee. The base VM image will be available for download soon, and will include instructions on how to use and build on it, as well as help debugging common problems.

  2. A separate source tarball that includes just the source files.

In most cases, artifacts should include BOTH the extended VM image AND a separate source tarball. The intention is that reviewers who are familiar with certain tools (e.g. Agda or OCaml) can inspect the artifact sources directly, while reviewers that are less familiar can still execute the artifact without needing to install new software on their own machines besides QEmu. The VM image will be archived so that future researchers, say in 5 years time, do not need to worry about version incompatibilities between old tool versions and new operating systems.

The detailed submission process is as follows:

  1. Read the Submission Guidelines page for details on artifact preparation.
  2. Register your intent to submit an artifact on the separate artifact only HotCRP site (to be linked) before the end of Thursday 13th May.
  3. Prepare your artifact, building upon the base VM image (to be linked).
  4. Upload your artifact to Zenodo (recommended), or otherwise make it available via a stable URL (i.e. the URL should not change if you later make updates to the artifact; and ideally, the URL has a good chance of continuing to exist well into the future).
  5. Finalize your submission on HotCRP with a link to your artifact. You should also upload a preprint of your paper, and any additional materials the reviewers may find helpful (e.g. appendices).

For any question, please feel free to contact the Artifact Evaluation Chairs – listed at the end of this page.

Timeline

It takes time to produce a good artifact; thus we allot 8 working days between conditional paper acceptance and artifact submission. These are the key dates (all dates are in the Anywhere on Earth (AOE / UTC-12) timezone):

Event Date
ICFP Conditional Acceptance Sat 21 May
Registration date Thu 26 May
Artifact submission Wed 1 Jun
Review and technical clarification Wed 6 June - Wed 16 June
Preliminary reviews available Wed 16 June
Further clarification if needed Wed 16 June - Tue 28 June
Final decision sent to authors Tue 30 June

More Information

For additional information, clarification, or answers to questions, please contact the ICFP Artifact Evaluation co-chairs:

Most artifacts that are submitted for review at ICFP have one of a few common forms, so we provide some advice for authors about how to prepare them. This material should be taken as highly suggestive, but not prescriptive. If you have questions about what is expected, or if your artifact does not fit into any of the categories below, please contact the AEC co-chairs as early as possible.

Instructions for All Artifacts

Sources and VM Image

Artifacts should generally consist of two components:

  1. A source tarball.
  2. A virtual machine image containing the same sources, with the artifact’s dependencies already installed.

The VM image primarily facilitates reproduction: it allows future scientists to reproduce the artifact’s results without having to deal with incompatible dependencies, changes to operating system interfaces, etc. The source tarball primarily facilitates reuse.

Both components must contain a Readme.md file that gives the name of the paper and step-by-step instructions for how to execute the artifact. For the source tarball, these instructions should include how to install the artifact’s dependencies.

The VM image may be produced by taking our base image (to be linked here), unpacking the source tarball into the VM and executing a prefix of the source tarball’s instructions. You may use other base images for the VM, but please make them as similar as possible to our base image. (The linked website will also contain instructions on how we produced the base image.)

Try to avoid requiring graphical environments (X Windows) to be installed into the VM unless truly necessary. Graphical environments in VMs are sometimes slow and unstable.

Readme

In most cases the step-by-step instructions in your Readme.md should be a list of commands to build and test the artifact on the examples described in the paper, and to reproduce any graphs and benchmarking results. The instructions should call out particular features of the results, such as “this produces the graph in Fig 5 that shows our algorithm runs in linear time”. Try to keep the instructions clear enough that reviewers can work through them in under 30 minutes. Consider providing a top-level Makefile so that the commands to be executed are just make targets that automatically build their prerequisites.

If the build process emits warning messages, perhaps when building libraries that are not under the author’s control, then include a note in the instructions that this is the case. Without a note the reviewers may assume something is wrong with the artifact itself.

Separately from the step-by-step instructions, provide other details about what a reviewer should look at. For example, “our artifact extends existing system X and our extension is the code located in file Y”.

Upload to Zenodo

Once you have prepared your artifact, upload it to Zenodo to ensure that it will remain publicly accessible in perpetuity. Similar non-commercial, long-term archives are also acceptable (but not GitHub or your personal website).

Anonymization

We use an optional, lightweight double-blind review process. This means you may, but are not required to, anonymize your artifact to improve the fairness of the reviewing process. See here for how to upload an anonymized artifact to Zenodo. We will ask reviewers to refrain from trying to find out artifact authors’ names.

Instructions for Common Types of Artifacts

Command-line Tools

Unix command-line tools should have standard --help style command-line help pages. It is not acceptable for an executable to throw uninformative exceptions when executed with no flags, or with the wrong flags.

Compilers and Interpreters

It should be obvious how to run the tool on new examples that the reviewers write themselves. Do not just hard-code the examples described in the paper.

If your tool consumes expressions in a custom DSL then we recommend supplying a grammar for the concrete syntax, so that reviews can try the tool on new examples. Papers that describe such languages often give just an abstract syntax, and it is often not clear what the full concrete syntax is from the paper alone.

Proof Scripts

In most cases, the artifact VM should contain an installation of the proof checker and specify a single command (preferably make) to re-check the proof. It is fine to leave the VM itself command-line only, and require reviewers to browse the proof script locally on their own machines. It should not be necessary to have an IDE (e.g. CoqIDE or Emacs) installed into the VM, unless the paper is particularly about IDE functionality.

Include comments in the proof scripts that highlight the main theorems described in the paper. Use comments like “This is Theorem 1.2: Soundness described on page 6 of the paper”. Proof scripts written in “apply style” are typically unreadable without loading them into an IDE, but reviewers will still want to find the main lemmas and understand how they relate.

Reviewers almost always complain about lack of comments in proof scripts. To authors, the logical statements of the lemmas themselves are likely quite readable, but reviewers typically want English prose that repeats the same information.

Common Problems

This section discusses common problems with artifacts. If your artifact has any special requirements, please contact the AEC co-chairs well in advance. We will then discuss how the artifact can be best reviewed. The advice below has been distilled from past experience at a variety of events and does not describe specific papers, artifacts or authors.

Proprietary Software

It is reasonable for artifacts to depend on commercially licensed tools (e.g. MATLAB or some commercial SMT solver) if the paper’s audience would generally have access to these tools. In such cases, we will try to match the artifact with reviewers who also have access to these tools. If this is not possible, we will ask authors to provide an anonymously accessible environment in which the required tools are installed.

If parts of the artifact depend on a proprietary tool or proprietary data which cannot be made available to the reviewers at all, please contact the AEC co-chairs to discuss whether and how the artifact can still be reviewed.

Special Hardware Requirements

Some artifacts require special hardware, e.g. GPUs or a compute cluster. If the hardware is relatively common, e.g. an NVIDIA GPU, we will try to find reviewers who have access to the hardware. If not, it is usually still possible to (partly) evaluate the artifact:

  • Authors may provide a scaled-down version of the artifact (e.g. a benchmark) which can be executed on a commodity laptop.
  • Authors may provide anonymous access to an environment which includes the needed hardware (e.g. a compute cluster).
  • Authors may provide access to a simulator, e.g. for a specific FPGA.

Long-Running Computations

Some artifacts require extensive computations (on the order of days, not hours). In such cases, please provide a scaled-down version of the artifact which can be evaluated within a reasonable amount of time. Reviewers will greatly appreciate it if the computation can be paused and resumed.

Unstable or Dangerous Software

Reviewers may be reluctant to install software which may destabilize their systems (e.g. kernel drivers) or which intentionally performs dangerous actions (e.g. proof-of-concept exploits). In such cases, document very clearly any possible effects on the host system and how to reverse them. If possible, prepare the artifact in such a way that the dangerous software is isolated from the reviewer’s system.

Web Interfaces

If your artifact has a web interface, try to get the server running locally inside the VM and allow the reviewer to connect to it via a web browser running natively on their host machine. Graphical environments installed into VMs are sometimes laggy and unstable, and standard web protocols are stable enough that such artifacts should be usable with any recent browser.

Programs that Generate Images

If the artifact produces an image file (e.g. a graph), then expect the reviewer to use scp or some such to copy it out to the host machine and view it. Authors should test that the connection to the VM works, so that this is possible.

In 2022, for the first time at ICFP, artifact evaluation will not only make a binary decision between ‘artifact accepted’ and ‘artifact not accepted’, but also distinguish between two levels of artifact quality. (This is a demand from the ICFP steering committee, following other SIGPLAN conferences.). To facilitate this, we believe it is important to lay out in some detail what these levels mean, to guide reviewers, to ensure fairness and to help authors prepare their submissions.

However, this document cannot possibly capture all the nuances of what makes a good artifact. Any prescriptive statement should be read as a guideline, not a fixed rule. The ultimate aim is to produce artifacts which are useful to the research community and if that aim is better served by deviating from these guidelines, authors and reviewers should do so (though if the deviation is substantial, please contact the evaluation co-chairs to discuss it; the earlier the better). Our guidelines are also not necessarily complete: if an artifact satisfies all the criteria below but has other major issues, it may still be denied the corresponding badge.

Badges

Artifacts can earn three badges (familiar from other conferences):

  • Artifact available
  • Artifact functional
  • Artifact reusable

Available Badge

To receive the Artifact available badge, artifacts must be stored in a long-term, publicly accessible archive. We recommend Zenodo for this purpose (and the submission guidelines contain instructions on submitting artifacts there). Other archives are also accepted, but they must fulfill the two above criteria:

  • Long-term archival: the archive must ensure that artifacts are available indefinitely. This excludes commercial offerings, such as GitHub, which make no long-term commitments, as well as personal websites.
  • Public accessibility: the archive must be freely accessible to the general public.

The available badge is independent of the functional and reusable badges. This means an artifact not deemed to meet the functional standard of quality can still be available. Conversely, an artifact which contains significant proprietary components, and which therefore cannot be submitted to a public archive, can be functional or reusable but not available.

Functional and Reusable Badges

We award two badges for quality: functional and reusable. To be deemed reusable, an artifact must also fulfill all the criteria for functional artifacts. The difference between the two badges is, roughly, the difference between reproducibility and reusability:

  • Functional artifacts allow future scientists to confirm that any claims from the paper which are supposed to be supported by the artifact, are in fact supported by the artifact. E.g. if the artifact is a program, reviewers must be able to build it, to run it and to confirm that it yields the right results.

  • Reusable artifacts go beyond reproduction by enabling future scientists to build upon the artifact. E.g. programs should run on modern operating systems and have up to date dependencies.

In the following, we spell out our expectations for the two badges in some more detail.

Functional Badge

Consistency and Completeness

The artifact should directly implement or support the technical content of the paper (consistency). It should validate any claims made in the paper about the artifact or, if there are no explicit claims in the paper, any claims that one would expect to be validated (completeness).

  • For programs, the program should work as described in the paper. The program may be an extended version of the one described in the paper, but all examples discussed in the paper should run with at most minimal changes – clearly documented.
  • For benchmarks, results obtained by running the benchmarks should be consistent (within the expected variance) with the results reported in the paper. All graphs, tables etc. should be reproducible. Exceptions are possible, e.g. when a benchmark takes a very long time to run, we expect this aspect of the artifact to be clearly documented.
  • For formal proofs, the proved statements should match those from the papers. Axioms or incomplete proofs are acceptable if they are documented.

Exercisability

It should be possible to reproduce the artifact’s contribution in any commonly used environment. For executable artifacts, this requirement is satisfied if the artifact is packaged as a VM image containing all relevant software and data sets (as described in the submission guidelines). If external data are required, it should be clear how to access them.

This requirement does not apply to artifacts which necessarily require a non-standard environment, e.g. special hardware or large amounts of computing power. But if at all possible, the artifact should still allow reviewers to partially verify the artifact, for example by

  • providing simulators for special hardware;
  • providing anonymous remote access to special hardware or compute clusters;
  • providing downscaled versions of the artifact which can be run on standard hardware.

Authors of such artifacts should contact us already before the submission to discuss these issues.

Documentation

The artifact should contain sufficient documentation for reviewers to perform the activities mentioned above.

  • For programs, it should be clear how to build the program and how to run it on the examples provided in the paper.
  • For benchmarks, it should be clear how to run the benchmark and, if necessary, how to interpret the resulting data.
  • For formal proofs, it should be clear (a) how to check that the proofs are axiom-free; (b) which parts of the formal proof correspond to which theorem in the paper; (c) how the notation and definitions used for the formal proof correspond to those used in the paper.

Reusable Badge

Reusable artifacts should satisfy the following requirements in addition to the functional ones. The reusable requirements are necessarily fuzzier than the functional ones. We rely on reviewers’ good judgment to determine whether an artifact can, in fact, be reused by future scientists. However, we ask reviewers to be lenient with regards to the more work-intensive requirements. It is generally easy to find fault with the documentation and code quality of research projects, but artifacts should be evaluated against the state of the art, not against a theoretical ideal of perfect software.

Exercisability

The artifact should work not only inside the VM image, but also in other standard environments. This means:

  • The artifact’s dependencies should be reasonably up to date. The artifact should not unnecessarily depend on specific drivers, instructions sets, etc.
  • The artifact should not depend on undocumented changes to, for example, the operating system or dependencies.
  • The artifact’s packaging should facilitate reuse (e.g. a Coq library could be packaged as an opam package).
  • The artifact should give reasonable error messages if it cannot be executed in some particular environment.

Documentation

The artifact should be documented in a way that facilitates reuse. This means:

  • There should be install instructions for all supported operating systems. Dependencies should be clearly documented.
  • For programs, it should be clear how to run the program on inputs other than those from the paper. E.g. for a compiler, the concrete syntax of the input language should be documented. Any options to the program should be documented. The main parts of the implementation should be documented to a reasonable degree. It should be clear how to run the test suite (if any).
  • For benchmarks, it should be clear how to run the benchmark on inputs other than those from the paper and how to prepare such inputs.
  • For formal proofs, the main parts of the proof (key lemmas and definitions) should be documented, especially if the notation differs from that used in the paper.

Quality

The artifact should be of sufficiently high quality that future scientists could reuse it without major changes. For example:

  • Code should be reasonably clear and consistently formatted.
  • The build process should be as simple as possible.
  • Error messages should be clear enough to facilitate debugging.
  • Web interfaces should work with all modern browsers.