ICFP 2022 - Artifact Evaluation

Authors with a paper accepted to ICFP 2022 are invited to submit an artifact that supports the conclusions of the paper. The Artifact Evaluation Committee will read the paper, explore the artifact, and provide feedback on how easy it would be for future researchers to build on. The ultimate goal of artifact evaluation is to support future researchers in their ability to reproduce and build on today’s work.

If you have a paper accepted at ICFP 2022, please see the Call for Artifacts for instructions on submitting an artifact for evaluation.

Call for Artifacts

The Artifact Evaluation Committee (AEC) invites authors of accepted papers to submit an artifact that supports the conclusions of the paper.

The submission of artifacts for review is voluntary and will not influence the final decision of whether the paper itself is accepted. Papers with successfully reviewed artifacts will receive a seal of approval printed on the first page of the paper in the ICFP proceedings. Authors of papers with successfully reviewed artifacts are encouraged to make the artifact publicly available upon publication of the proceedings, by including it as “source materials” in the ACM Digital Library.

Types of Artifacts

An artifact that supports the paper’s conclusions can take many forms, including any or all of the following:

a working copy of the software and its dependencies, including benchmarks, examples and/or case studies
experimental data sets
a mechanized proof

Paper proofs are not accepted as artifacts for evaluation.

Selection Criteria

Artifacts have two broad purposes: facilitating reproduction of the work by future scientists, and facilitating its reuse. Reuse goes beyond reproduction by allowing future scientists to, for example, extend a tool with new features or inspect the exact definitions used in a formal proof.

To facilitate reproduction and reuse, an artifact should be:

consistent with the claims of the paper and the results it presents,
as complete as possible, supporting all claims of the paper,
well-documented,
future-proof, and
easy to extend and modify.

Artifacts which satisfy these criteria will be awarded at least one of the ACM “Available”, “Functional” and “Reusable” badges. For more details on the badges and the evaluation criteria, see the Evaluation Guidelines.

We expect that most artifacts submitted for review at ICFP will have a few common forms: compilers, interpreters, proof scripts and so on. We have codified common forms of artifacts on a separate page. If you are considering submitting an artifact that does not have one of these forms, we encourage you to contact the Artifact Evaluation chairs before the submission deadline to discuss what is expected.

Submission Process

The evaluation process uses an (optional, lightweight) double-blind system. Of course, authors will not know the names of reviewers. Authors are also encouraged (but not required) to take any reasonably easy steps to anonymize their submissions, and reviewers will be discouraged from trying to learn the names of authors. Note that we do not intend to impose a lot of extra work here: anonymizing artifact submissions is an opportunity to help ensure the reviewing process is fair, but is in no way required, especially if the anonymization would require a lot of work or compromise the integrity of the artifact. There will be a mechanism for free, double-blind communication between reviewers and authors, so that small technical problems can be overcome during the reviewing process. Authors may iteratively improve their artifacts during the process to overcome small technical problems, but may not submit new material that would require substantially more reviewing effort.

We intend for most artifact submissions to include BOTH:

Software installed into a Debian-based QEMU virtual machine (VM) image provided by the committee. See this page for details.
A separate source tarball that includes just the source files.

In most cases, artifacts should include BOTH the extended VM image AND a separate source tarball. The intention is that reviewers who are familiar with certain tools (e.g. Agda or OCaml) can inspect the artifact sources directly, while reviewers who are less familiar with them can still execute the artifact without installing any software on their own machines besides QEMU. The VM image will be archived so that future researchers, say in five years’ time, do not need to worry about version incompatibilities between old tool versions and new operating systems.

The detailed submission process is as follows:

Read the Submission Guidelines page for details on artifact preparation.
Register your intent to submit an artifact on the separate, artifact-only HotCRP site before the end of Thursday 26th May.
Prepare your artifact, building upon the base VM image.
Upload your artifact to Zenodo (recommended), or otherwise make it available via a stable URL (i.e. the URL should not change if you later make updates to the artifact; and ideally, the URL has a good chance of continuing to exist well into the future).
- See here for one recommendation on how to anonymize your submission to Zenodo.
Finalize your submission on HotCRP with a link to your artifact. You should also upload a preprint of your paper, and any additional materials the reviewers may find helpful (e.g. appendices).

For any questions, please feel free to contact the Artifact Evaluation chairs, listed at the end of this page.

Timeline

It takes time to produce a good artifact; thus we allot 8 working days between conditional paper acceptance and artifact submission. These are the key dates (all dates are in the Anywhere on Earth (AOE / UTC-12) timezone):

Event	Date
ICFP conditional acceptance	Sat 21 May
Artifact registration	Thu 26 May
Artifact submission	Wed 1 June
Review and technical clarification	Mon 6 June – Thu 16 June
Preliminary reviews available	Thu 16 June
Further clarification if needed	Thu 16 June – Tue 28 June
Final decision sent to authors	Thu 30 June
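
As a quick sanity check of the “8 working days” figure, the sketch below counts the weekdays after conditional acceptance up to and including the artifact submission deadline. It assumes GNU date (standard on most Linux systems; the BSD date shipped with macOS uses different flags) and ignores public holidays; count_working_days is a name chosen for illustration.

```shell
#!/bin/sh
# Count working days (Mon-Fri) strictly after $1, up to and including $2.
# Assumes GNU date; public holidays are not taken into account.
count_working_days() {
  start=$(date -u -d "$1" +%s)   # midnight UTC of the start date, in seconds
  end=$(date -u -d "$2" +%s)
  days=0
  t=$((start + 86400))
  while [ "$t" -le "$end" ]; do
    dow=$(date -u -d "@$t" +%u)  # 1 = Monday ... 7 = Sunday
    if [ "$dow" -le 5 ]; then
      days=$((days + 1))
    fi
    t=$((t + 86400))
  done
  echo "$days"
}

count_working_days 2022-05-21 2022-06-01   # prints 8
```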

More Information

For additional information, clarification, or answers to questions, please contact the ICFP Artifact Evaluation co-chairs:

Jannis Limperg j.b.limperg@vu.nl
Gabriel Scherer gabriel.scherer@inria.fr

Submission Guidelines

Most artifacts that are submitted for review at ICFP have one of a few common forms, so we provide some advice for authors about how to prepare them. This material should be taken as highly suggestive, but not prescriptive. If you have questions about what is expected, or if your artifact does not fit into any of the categories below, please contact the AEC co-chairs as early as possible.

Instructions for All Artifacts

Artifact Registration

Artifacts must be registered via a separate artifact-only HotCRP instance by May 26th. For the registration, please write a short abstract of the artifact. This allows us to detect in advance artifacts which may have special requirements (e.g. specific hardware). The submission of the full artifact is due on June 1st.

Sources and VM Image

Artifacts should generally consist of two components:

A source tarball.
A virtual machine image containing the same sources, with the artifact’s dependencies already installed.

The VM image primarily facilitates reproduction: it allows future scientists to reproduce the artifact’s results without having to deal with incompatible dependencies, changes to operating system interfaces, etc. The source tarball primarily facilitates reuse.

Both components must contain a README.md file that gives the name of the paper and step-by-step instructions for executing the artifact. For the source tarball, these instructions should also explain how to install the artifact’s dependencies.
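
A minimal sketch of packaging the source tarball, assuming a POSIX shell and a tar that supports --exclude (GNU tar and bsdtar both do). The helper make_source_tarball is hypothetical: it refuses to build the archive when the top-level README.md is missing, and it leaves out the .git directory.

```shell
#!/bin/sh
# Hypothetical helper: package an artifact source directory as a .tar.gz,
# but only if it contains the top-level README.md the guidelines require.
make_source_tarball() {
  src_dir=$1   # e.g. my-artifact
  out=$2       # e.g. my-artifact-src.tar.gz
  if [ ! -f "$src_dir/README.md" ]; then
    echo "error: $src_dir/README.md not found" >&2
    return 1
  fi
  parent=$(dirname "$src_dir")
  name=$(basename "$src_dir")
  # -C keeps archive paths relative (my-artifact/...); version-control
  # metadata is excluded since reviewers do not need it.
  tar -czf "$out" --exclude=.git -C "$parent" "$name"
}
```

For example, make_source_tarball my-artifact my-artifact-src.tar.gz produces an archive whose entries all start with my-artifact/.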

The VM image may be produced by taking our base image, unpacking the source tarball into the VM and executing the first steps of the source tarball’s instructions (e.g. installing the dependencies and building the artifact).

Try to avoid requiring a graphical environment (e.g. the X Window System) to be installed in the VM unless truly necessary: graphical environments in VMs are sometimes slow and unstable.

Readme

In most cases the step-by-step instructions in your README.md should be a list of commands to build and test the artifact on the examples described in the paper, and to reproduce any graphs and benchmarking results. The instructions should call out particular features of the results, such as “this produces the graph in Fig 5 that shows our algorithm runs in linear time”. Try to keep the instructions clear enough that reviewers can work through them in under 30 minutes. Consider providing a top-level Makefile so that the commands to be executed are just make targets that automatically build their prerequisites.

If the build process emits warning messages, perhaps when building libraries that are not under the authors’ control, include a note in the instructions saying so. Without such a note, reviewers may assume something is wrong with the artifact itself.

Separately from the step-by-step instructions, provide other details about what a reviewer should look at. For example, “our artifact extends existing system X and our extension is the code located in file Y”.

Upload to Zenodo

Once you have prepared your artifact, upload it to Zenodo to ensure that it will remain publicly accessible in perpetuity. Similar non-commercial, long-term archives are also acceptable (but not GitHub or your personal website).

Anonymization

We use an optional, lightweight double-blind review process. This means you may, but are not required to, anonymize your artifact to improve the fairness of the reviewing process. See here for how to upload an anonymized artifact to Zenodo. We will ask reviewers to refrain from trying to find out artifact authors’ names.

Revised Papers

Artifacts should, whenever possible, be evaluated against the revised version of the paper. To facilitate this, you can upload the revised version of the paper, or a partially revised draft, when you submit the artifact. Please also add a note to the artifact’s README alerting reviewers to the revisions.

Instructions for Common Types of Artifacts

Command-line Tools

Unix command-line tools should have standard --help style command-line help pages. It is not acceptable for an executable to throw uninformative exceptions when executed with no flags, or with the wrong flags.
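
The sketch below illustrates the expected behaviour with a hypothetical tool, mytool, written as a shell function: --help prints a usage message, while a missing argument or an unknown flag yields a short, informative error and a non-zero exit status (2, a common convention for usage errors) instead of an uncaught exception. In practice, most languages’ argument-parsing libraries provide this behaviour out of the box.

```shell
#!/bin/sh
# Usage text for the hypothetical tool "mytool".
usage() {
  cat <<'EOF'
Usage: mytool [--verbose] FILE
Run the (hypothetical) analysis on FILE.

Options:
  -h, --help     show this help and exit
  -v, --verbose  print intermediate results
EOF
}

mytool() {
  verbose=0
  file=
  if [ "$#" -eq 0 ]; then
    usage >&2          # no arguments: show usage instead of crashing
    return 2
  fi
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -h|--help) usage; return 0 ;;
      -v|--verbose) verbose=1 ;;
      -*) echo "mytool: unknown option '$1' (try --help)" >&2; return 2 ;;
      *) file=$1 ;;
    esac
    shift
  done
  if [ -z "$file" ]; then
    echo "mytool: missing FILE argument (try --help)" >&2
    return 2
  fi
  echo "analysing $file (verbose=$verbose)"
}
```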

Compilers and Interpreters

It should be obvious how to run the tool on new examples that the reviewers write themselves. Do not just hard-code the examples described in the paper.

If your tool consumes expressions in a custom DSL, we recommend supplying a grammar for the concrete syntax, so that reviewers can try the tool on new examples. Papers that describe such languages often give just an abstract syntax, and the full concrete syntax is often not clear from the paper alone.

Proof Scripts

In most cases, the artifact VM should contain an installation of the proof checker and specify a single command (preferably make) to re-check the proof. It is fine to leave the VM itself command-line only, and require reviewers to browse the proof script locally on their own machines. It should not be necessary to have an IDE (e.g. CoqIDE or Emacs) installed into the VM, unless the paper is particularly about IDE functionality.

Include comments in the proof scripts that highlight the main theorems described in the paper. Use comments like “This is Theorem 1.2: Soundness described on page 6 of the paper”. Proof scripts written in “apply style” are typically unreadable without loading them into an IDE, but reviewers will still want to find the main lemmas and understand how they relate.

Reviewers almost always complain about lack of comments in proof scripts. To authors, the logical statements of the lemmas themselves are likely quite readable, but reviewers typically want English prose that repeats the same information.

Common Problems

This section discusses common problems with artifacts. If your artifact has any special requirements, please contact the AEC co-chairs well in advance. We will then discuss how the artifact can be best reviewed. The advice below has been distilled from past experience at a variety of events and does not describe specific papers, artifacts or authors.

Proprietary Software

It is reasonable for artifacts to depend on commercially licensed tools (e.g. MATLAB or some commercial SMT solver) if the paper’s audience would generally have access to these tools. In such cases, we will try to match the artifact with reviewers who also have access to these tools. If this is not possible, we will ask authors to provide an anonymously accessible environment in which the required tools are installed.

If parts of the artifact depend on a proprietary tool or proprietary data which cannot be made available to the reviewers at all, please contact the AEC co-chairs to discuss whether and how the artifact can still be reviewed.

Special Hardware Requirements

Some artifacts require special hardware, e.g. GPUs or a compute cluster. If the hardware is relatively common, e.g. an NVIDIA GPU, we will try to find reviewers who have access to the hardware. If not, it is usually still possible to (partly) evaluate the artifact:

Authors may provide a scaled-down version of the artifact (e.g. a benchmark) which can be executed on a commodity laptop.
Authors may provide anonymous access to an environment which includes the needed hardware (e.g. a compute cluster).
Authors may provide access to a simulator, e.g. for a specific FPGA.

Long-Running Computations

Some artifacts require extensive computations (on the order of days, not hours). In such cases, please provide a scaled-down version of the artifact which can be evaluated within a reasonable amount of time. Reviewers will greatly appreciate it if the computation can be paused and resumed.
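
One simple way to make a long computation pausable is to record each completed unit of work and skip it on the next run. In the sketch below, run_one is a hypothetical placeholder for the artifact’s real per-input command, and the progress file format (one finished input per line) is our own choice. Interrupting the driver (e.g. with Ctrl-C) and re-running it with the same arguments resumes at the first unfinished input; an input that was interrupted midway is simply recomputed.

```shell
#!/bin/sh
# Placeholder for the artifact's real (long-running) per-input command.
run_one() {
  echo "result for $1"
}

# Resumable driver: $1 is the progress file, the remaining arguments are inputs.
run_all() {
  progress=$1; shift
  touch "$progress"
  for input in "$@"; do
    # Skip inputs already listed (whole-line, fixed-string match).
    if grep -qxF "$input" "$progress"; then
      echo "skipping $input (already done)"
      continue
    fi
    run_one "$input" >"$input.out" || return 1
    echo "$input" >>"$progress"   # mark as done only after success
  done
}
```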

Unstable or Dangerous Software

Reviewers may be reluctant to install software which may destabilize their systems (e.g. kernel drivers) or which intentionally performs dangerous actions (e.g. proof-of-concept exploits). In such cases, document very clearly any possible effects on the host system and how to reverse them. If possible, prepare the artifact in such a way that the dangerous software is isolated from the reviewer’s system.

Web Interfaces

If your artifact has a web interface, try to get the server running locally inside the VM and allow the reviewer to connect to it via a web browser running natively on their host machine. Graphical environments installed into VMs are sometimes laggy and unstable, and standard web protocols are stable enough that such artifacts should be usable with any recent browser.
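
With QEMU’s user-mode networking, this can be done by adding another hostfwd rule next to the SSH forward used by the M1 script further below (hostfwd=tcp::5555-:22). The helper netdev_arg is a hypothetical sketch that builds such a -netdev value; 8080 in the usage note is an arbitrary example port.

```shell
#!/bin/sh
# Build a -netdev value for QEMU user-mode networking that keeps the SSH
# forward (host 5555 -> guest 22) and forwards each given guest port to
# the same port on the host.
netdev_arg() {
  arg="user,id=net0,hostfwd=tcp::5555-:22"
  for port in "$@"; do
    arg="$arg,hostfwd=tcp::$port-:$port"
  done
  printf '%s\n' "$arg"
}
```

The result could be passed to QEMU as -netdev "$(netdev_arg 8080)" in place of the existing -netdev option. Inside the VM, the server must listen on all interfaces rather than only on localhost (e.g. python3 -m http.server 8080 --bind 0.0.0.0), after which the reviewer can browse to http://localhost:8080 on the host.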

Programs that Generate Images

If the artifact produces an image file (e.g. a graph), expect the reviewer to use scp or a similar tool to copy it to the host machine and view it there. Authors should test that connecting to the VM (e.g. via SSH) works, so that this is possible.

Evaluation Guidelines

In 2022, for the first time at ICFP, artifact evaluation will not only make a binary decision between ‘artifact accepted’ and ‘artifact not accepted’, but also distinguish between two levels of artifact quality. (This was requested by the ICFP steering committee, following other SIGPLAN conferences.) To facilitate this, we believe it is important to lay out in some detail what these levels mean, in order to guide reviewers, to ensure fairness and to help authors prepare their submissions.

However, this document cannot possibly capture all the nuances of what makes a good artifact. Any prescriptive statement should be read as a guideline, not a fixed rule. The ultimate aim is to produce artifacts which are useful to the research community and if that aim is better served by deviating from these guidelines, authors and reviewers should do so (though if the deviation is substantial, please contact the evaluation co-chairs to discuss it; the earlier the better). Our guidelines are also not necessarily complete: if an artifact satisfies all the criteria below but has other major issues, it may still be denied the corresponding badge.

Badges

Artifacts can earn three badges (familiar from other conferences):

Artifact available
Artifact functional
Artifact reusable

Available Badge

Note to evaluators: this badge is awarded directly by the chairs, since there is little to evaluate. You can therefore focus on the other two badges.

To receive the Artifact available badge, artifacts must be stored in a long-term, publicly accessible archive. We recommend Zenodo for this purpose (the submission guidelines contain instructions on submitting artifacts there). Other archives are also accepted, but they must fulfill the following two criteria:

Long-term archival: the archive must ensure that artifacts are available indefinitely. This excludes commercial offerings, such as GitHub, which make no long-term commitments, as well as personal websites.
Public accessibility: the archive must be freely accessible to the general public.

The available badge is independent of the functional and reusable badges. This means an artifact not deemed to meet the functional standard of quality can still be available. Conversely, an artifact which contains significant proprietary components, and which therefore cannot be submitted to a public archive, can be functional or reusable but not available.

Functional and Reusable Badges

We award two badges for quality: functional and reusable. To be deemed reusable, an artifact must also fulfill all the criteria for functional artifacts. The difference between the two badges is, roughly, the difference between reproducibility and reusability:

Functional artifacts allow future scientists to confirm that any claims from the paper which are supposed to be supported by the artifact are in fact supported by the artifact. E.g. if the artifact is a program, reviewers must be able to build it, to run it and to confirm that it yields the right results.
Reusable artifacts go beyond reproduction by enabling future scientists to build upon the artifact. E.g. programs should run on modern operating systems and have up-to-date dependencies.

In the following, we spell out our expectations for the two badges in some more detail.

Functional Badge

Consistency and Completeness

The artifact should directly implement or support the technical content of the paper (consistency). It should validate any claims made in the paper about the artifact or, if there are no explicit claims in the paper, any claims that one would expect to be validated (completeness).

For programs, the program should work as described in the paper. The program may be an extended version of the one described in the paper, but all examples discussed in the paper should run with at most minimal, clearly documented changes.
For benchmarks, results obtained by running the benchmarks should be consistent (within the expected variance) with the results reported in the paper. All graphs, tables etc. should be reproducible. Exceptions are possible (e.g. when a benchmark takes a very long time to run), but they should be clearly documented.
For formal proofs, the proved statements should match those from the paper. Axioms or incomplete proofs are acceptable if they are documented.
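
For benchmarks, a small helper along these lines can flag results that drift too far from the numbers reported in the paper. It is a sketch that assumes a relative tolerance expressed in percent; the 5% default is arbitrary and should be replaced by whatever variance is expected for the benchmark in question. within_tolerance is a hypothetical name.

```shell
#!/bin/sh
# Succeed if measured value $1 lies within $3 percent (default 5) of the
# reported value $2. awk is used for floating-point arithmetic.
within_tolerance() {
  measured=$1; reported=$2; pct=${3:-5}
  awk -v m="$measured" -v r="$reported" -v p="$pct" 'BEGIN {
    d = m - r; if (d < 0) d = -d   # absolute difference
    a = r;     if (a < 0) a = -a   # magnitude of the reported value
    exit !(d <= a * p / 100)       # exit status 0 means "within tolerance"
  }'
}
```

For example, within_tolerance "$measured" 12.7 10 || echo "warning: result differs from the paper by more than 10%", where 12.7 stands in for a hypothetical value reported in the paper.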

Exercisability

It should be possible to reproduce the artifact’s contribution in any commonly used environment. For executable artifacts, this requirement is satisfied if the artifact is packaged as a VM image containing all relevant software and data sets (as described in the submission guidelines). If external data are required, it should be clear how to access them.

This requirement does not apply to artifacts which necessarily require a non-standard environment, e.g. special hardware or large amounts of computing power. But if at all possible, the artifact should still allow reviewers to partially verify the artifact, for example by

providing simulators for special hardware;
providing anonymous remote access to special hardware or compute clusters;
providing downscaled versions of the artifact which can be run on standard hardware.

Authors of such artifacts should contact us before submission to discuss these issues.

Documentation

The artifact should contain sufficient documentation for reviewers to perform the activities mentioned above.

For programs, it should be clear how to build the program and how to run it on the examples provided in the paper.
For benchmarks, it should be clear how to run the benchmark and, if necessary, how to interpret the resulting data.
For formal proofs, it should be clear (a) how to check that the proofs are axiom-free; (b) which parts of the formal proof correspond to which theorem in the paper; (c) how the notation and definitions used for the formal proof correspond to those used in the paper.

Reusable Badge

Reusable artifacts should satisfy the following requirements in addition to the functional ones. The reusable requirements are necessarily fuzzier than the functional ones, so we rely on reviewers’ good judgment to determine whether an artifact can, in fact, be reused by future scientists. However, we ask reviewers to be lenient with regard to the more work-intensive requirements. It is generally easy to find fault with the documentation and code quality of research projects, but artifacts should be evaluated against the state of the art, not against a theoretical ideal of perfect software.

Exercisability

The artifact should work not only inside the VM image, but also in other standard environments. This means:

The artifact’s dependencies should be reasonably up to date. The artifact should not unnecessarily depend on specific drivers, instruction sets, etc.
The artifact should not depend on undocumented changes to, for example, the operating system or dependencies.
The artifact’s packaging should facilitate reuse (e.g. a Coq library could be packaged as an opam package).
The artifact should give reasonable error messages if it cannot be executed in some particular environment.
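
To give a reasonable error message when the environment is incomplete, a build or run script can check for its dependencies up front instead of failing halfway through. The sketch below relies on the POSIX command -v builtin; require_tools is a hypothetical name, and the tools named in the usage note are examples only.

```shell
#!/bin/sh
# Fail early, with a readable message, if any of the given tools is not
# available on PATH.
require_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "error: required tool(s) not found on PATH:$missing" >&2
    echo "See README.md for installation instructions." >&2
    return 1
  fi
}
```

For instance, a script might start with require_tools make opam coqc || exit 1.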

Documentation

The artifact should be documented in a way that facilitates reuse. This means:

There should be install instructions for all supported operating systems. Dependencies should be clearly documented.
For programs, it should be clear how to run the program on inputs other than those from the paper. E.g. for a compiler, the concrete syntax of the input language should be documented. Any options to the program should be documented. The main parts of the implementation should be documented to a reasonable degree. It should be clear how to run the test suite (if any).
For benchmarks, it should be clear how to run the benchmark on inputs other than those from the paper and how to prepare such inputs.
For formal proofs, the main parts of the proof (key lemmas and definitions) should be documented, especially if the notation differs from that used in the paper.

Quality

The artifact should be of sufficiently high quality that future scientists could reuse it without major changes. For example:

Code should be reasonably clear and consistently formatted.
The build process should be as simple as possible.
Error messages should be clear enough to facilitate debugging.
Web interfaces should work with all modern browsers.

VM Image

We provide a VM image which you may use as a base for your own VM image. If you wish to use this image, follow the instructions in the next section. If you wish to create your own VM image from scratch, follow the instructions further down below.

Creating an Artifact Using the Base Image

Download an archive containing the base image and some supporting files:

base-image.tar.xz

The archive is hosted on Google Drive. To verify the download, create a file sha next to base-image.tar.xz with content

d1988b842d465b6dc3d057e690566df38861928319ffce48460719f6bbcbb9d3d267c946b9f6efd3f9ce8d601e99fcee77ed0216d706adc1ed55a2deb887b0af  base-image.tar.xz

Then run the command

$ sha512sum -c sha
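
Equivalently, the check can be done without creating the intermediate sha file by piping the expected line into sha512sum, which reads it from standard input when given -. This sketch assumes GNU coreutils; on macOS, shasum -a 512 -c should offer a similar check mode. verify_sha512 is a name chosen for illustration.

```shell
#!/bin/sh
# Verify file $2 against the expected SHA-512 digest $1.
verify_sha512() {
  expected=$1; file=$2
  # sha512sum -c expects lines of the form "<digest>  <filename>";
  # "-" makes it read that line from standard input.
  printf '%s  %s\n' "$expected" "$file" | sha512sum -c -
}
```

After a successful check (for the base image, call verify_sha512 with the digest given above and base-image.tar.xz), the archive can be unpacked with tar -xJf base-image.tar.xz.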

Unpack base-image.tar.xz. This requires about 2.2GB of disk space. The unpacked directory contains a file README.md with further instructions.

The base VM image uses a virtual hard disk with a maximum capacity of 16GB. (Its actual size expands dynamically up to this maximum.) If you need more disk space, please create a custom VM image.

Creating an Artifact Using a Custom Image

Download an archive containing the supporting files. (These are the same files contained in base-image.tar.xz, minus the actual VM image.)

supporting-files.tar.xz

The archive is hosted on Google Drive. To verify the download, create a file sha next to supporting-files.tar.xz with content

0aafcae83adf3e8f4a78b1ae08a3377cea37f78c69281d649766335e8978bb6fd8b33a8b67f85aa725219793665ccd28747705da75e1951d9f400439334b6a2c  supporting-files.tar.xz

Then run the command

$ sha512sum -c sha

Unpack the archive. The unpacked directory contains a file ImageCreation.md detailing the steps we took to create the base VM image. When creating your own image, please follow these steps as closely as possible (while making any modifications you need). This makes sure that the reviewers can run your VM image without too many surprises.

Adjust the supporting files as necessary for your custom VM image. Check that the start.sh and start.bat scripts still work. Add a prominent notice for the reviewers to your README.md.

M1 Macs

The script for starting the VM, start.sh, unfortunately does not work on Apple Silicon (M1) Macs. You can use the following script instead. However, it requires macOS 12.4 or later: earlier versions of macOS contain a bug which will lead to a kernel panic on certain M1 chips (at least the M1 Pro and M1 Max).

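#!/bin/sh
# Start the artifact VM on Apple Silicon using macOS's Hypervisor.framework (hvf).
# Host port 5555 is forwarded to the VM's SSH port (22).
# Any additional arguments are passed on to QEMU unchanged.
qemu-system-aarch64 \
    -name   "ICFP 2022 Artifact" \
    -M      virt,highmem=on \
    -accel  hvf \
    -cpu    max \
    -m      4096 \
    -device e1000,netdev=net0 \
    -netdev user,id=net0,hostfwd=tcp::5555-:22 \
    -hda    disk.qcow \
    "$@"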
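
Since the script above must not be run on macOS versions older than 12.4, it could refuse to start on them. The helper below is a hypothetical sketch that compares dot-separated version numbers with awk (treating missing components as 0), written so that it can be tested on any system.

```shell
#!/bin/sh
# Succeed if version $1 >= version $2, comparing up to three numeric
# dot-separated components (e.g. 12.4 or 12.3.1).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "[.]"); m = split(b, y, "[.]")
    for (i = 1; i <= 3; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi != yi) exit !(xi > yi)   # first differing component decides
    }
    exit 0                            # all components equal
  }'
}
```

On the Mac itself, this would be combined with sw_vers -productVersion, which prints the installed macOS version, by placing version_ge "$(sw_vers -productVersion)" 12.4 || { echo "macOS 12.4 or later is required" >&2; exit 1; } before the qemu-system-aarch64 command.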

You can also use the UTM app as a graphical QEMU frontend. Create a VM with the following settings:

4GB of RAM
Use disk.qcow as the virtual hard drive.
Forward port 22 of the VM to port 5555 on the host (if you want to SSH into the VM).


Artifact Evaluation Committee

Alain Delaët-Tixeuil – ENS Lyon, France
Andrés Goens – University of Edinburgh
Artem Pelenitsyn – Northeastern University, United States
Aymeric Fromherz – Carnegie Mellon University
Basile Pesin – Inria Paris, France
Danielle Marshall – University of Kent, United Kingdom
Gabriel Scherer (Co-chair) – INRIA Saclay, France
Hector Suzanne – LIP6, Sorbonne Université & CNRS, France
Hugo Férée – University of Kent, United Kingdom
Ike Mulder – Radboud University Nijmegen
Jaime Arias – CNRS, LIPN, Université Sorbonne Paris Nord, France
Jannis Limperg (Co-chair) – Vrije Universiteit Amsterdam, Netherlands
John Leo – Halfaya Research, United States
Ken Sakayori – University of Bologna, Italy
Lionel Rieg – Verimag
Lourdes del Carmen González-Huesca – National Autonomous University of Mexico, Mexico
Lucas Franceschino – INRIA, France
Mallku Soldevila – FAMAF, UNC / CONICET, Argentina
Mário Pereira – NOVA LINCS & DI, Nova School of Science and Technology
Matthias Güdemann – University of Applied Sciences Munich, Germany
Meven Lennon-Bertrand – Inria, LS2N, Université de Nantes, France
Mistral Contrastin – Facebook London, United Kingdom
Mukesh Tiwari – University of Cambridge, United Kingdom
Neea Rusch – Augusta University, United States
Nick Hu – University of Oxford, United Kingdom
Orestis Melkonian – University of Edinburgh, United Kingdom
Qianchuan Ye – Purdue University