The Reproducibility Initiative at SC'22 - Episode 5

Posted on Friday, May 26, 2023 | reproducibility, supercomputing, hpc
In this episode, Nicole interviews members of the Reproducibility Committee at SC'22. Together, they talk about the history of the initiative, the roles of the various subcommittees, Artifact Description and Artifact Evaluation, how the representation of industry affects reproducibility expectations, and many other aspects of reproducibility.

Show Notes

Nicole: Okay, so welcome. Thanks for coming and meeting me here. I will let you all introduce yourselves, but thank you for volunteering your time here at SC 22.

Rocio: Thank you. So I am Rocío Carratalá Sáez, from Spain, and I'm currently an assistant professor.

Bilel: So, hello. Good afternoon. I'm Bilel Hadri, a computational scientist at the KAUST Supercomputing Core Lab, and I am the Reproducibility Initiative Chair for SC 22.

Le Mai: Hi, my name is Le Mai Weakley. I am a senior technical lead with the research applications team at Indiana University, and I am participating in the Reproducibility Challenge in the Student Cluster Competition and the special journal issue under the reproducibility initiative.

Nicole: Wonderful. Thanks for coming. Today we're going to talk about the reproducibility committee, which you are all a part of in one way or another. The reproducibility committee has been around for a while, so I'd like each of you to start by explaining your role and how it contributes to computational reproducibility in general.

Bilel: To briefly summarize the reproducibility effort at SC: it started in 2015, and it was led by Michela Taufer. At that time there was, if I recall correctly, one paper that shared a brief description of its artifacts.

And after seven years, it is quite a big committee with a lot of tasks, thanks to the efforts of all our predecessors. Now the reproducibility initiative has three major subtracks. The first is artifact description and evaluation, which awards the badges.

We also have the special journal issue, which reviews the reports coming out of the student challenge. And last but not least, there is one subcommittee for the Reproducibility Challenge, which selects a paper from the previous year for the student challenge.

So basically the effort for a single paper spans over three years. And what is new? Since 2021, there is an award for the best paper.

Le Mai: This year I'm doing the special issue for the reproducibility initiative. So, as Bilel was mentioning, this is the last leg of that three-year life cycle of a paper that goes into SC. First, of course, when you submit a paper to SC, because of the reproducibility initiative you also have to submit an artifact description, which will be evaluated, possibly earn the badges, and maybe in the end get an award, depending on what happens. Then, once a paper gets in, the reproducibility challenge selects a paper from the previous year and creates tasks for the Student Cluster Competition.

The students try to reproduce the results reported in that paper. The full circle comes with the special issue, where we invite the teams with the top-scoring reports from the previous Student Cluster Competition and shepherd them into bringing the reports they wrote on the competition floor up to journal quality.

And then we invite the authors of the original paper to revisit it with the perspective of the results the students saw, and to extend it with commentary on those results. So that paper from two years ago finally sees its end, with students actually working on it and trying to reproduce it.

Rocio: Yeah, so I'm co-chairing the SC 22 Artifact Description and Artifact Evaluation Committee this year. This part of the initiative focuses on evaluating all the artifacts provided with technical papers, so we can check whether they should be awarded one, two, or three badges.

The first badge means the artifact is available: it has a DOI and it's accessible. The second means it's functional: you can download it, compile it, and use it. And the third means it's reproducible: the results observed from the execution are those claimed in the paper they belong to.
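
To make that third badge concrete, the check a reviewer ends up doing is essentially "do the numbers I measured match the numbers the paper claims, within some tolerance?" The sketch below only illustrates that idea; the metric names, values, and 5% tolerance are hypothetical, not the committee's actual tooling.

```python
# Illustrative only: the kind of comparison a reviewer might script when
# assessing a "results reproduced" claim. The metric names, values, and the
# 5% tolerance are hypothetical, not SC committee tooling.

def results_match(claimed: dict, measured: dict, rel_tol: float = 0.05) -> bool:
    """Return True if every measured value is within rel_tol of its claimed value."""
    for key, claimed_value in claimed.items():
        if key not in measured:
            print(f"missing result: {key}")
            return False
        deviation = abs(measured[key] - claimed_value) / abs(claimed_value)
        if deviation > rel_tol:
            print(f"{key}: claimed {claimed_value}, measured {measured[key]} ({deviation:.1%} off)")
            return False
    return True

# Hypothetical example: figures claimed in a paper versus what a re-run of the
# artifact produced on the reviewer's system.
claimed = {"gflops": 412.0, "l2_error": 1.30e-6}
measured = {"gflops": 405.7, "l2_error": 1.32e-6}
print("reproduced within tolerance:", results_match(claimed, measured))
```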

Nicole: Great, thank you all for that. And thank you for the history; I feel like that's a really important place to start.

Bilel: We just have to give the credit to the people who have created this initiative.

Nicole: Yeah, so I guess I will introduce myself as well, just to give you the motivation for why I put this together. One year I did the Student Cluster Competition. I focused on the reproducibility challenge, and we got our paper accepted for the journal. That sort of started my inquiry around reproducibility. I think we spend a lot of time just making sure things are well documented, so that if somebody gets an artifact, they can build it or run it again. But I think there are even more complex issues that we talk about less, because they might not be a priority while we're still working on the basics. There are also interesting questions like: at what point is reproducibility acceptable, or when is transparency acceptable?

Because there are some things, especially in the HPC world, where it's just not feasible, for whatever reason. So I feel like it becomes a philosophical issue in a sense. After being a research software engineer for several years, I went back to get my PhD in history and philosophy of science to explore these issues, and I'm interested in researching standards related to reproducibility. So that's my motivation for pulling this together.

Bilel: There is also an important fact: over the years, submissions have improved, thanks to the team that has taken care of the artifact description and evaluation. We try to make life easier for the reviewers, because it's a lot of work. And this year they made it almost compulsory to provide at least a container.

That way, for reviewing, we can at least check and validate some early results, because some reviewers would not have access to exotic software or hardware. With the container, they can reproduce the work on anything from a laptop up to a supercomputer, if they have access to one, and alongside that there should be well-documented sources, or easy access to how everything can be downloaded. Because now the HPC world is like a village, and there are some people who do not have access to these privileged resources. With this transparency initiative, a student anywhere in the world who reads a paper that interests them can get the data. Sometimes we focus only on the code, but reproducibility also has to make sure the data is there, because it's not just about doing computation for its own sake; it's not only about HPC, like what we see here at the conference. Many tracks are oriented towards science, because you don't do computation just to do computation. There is a goal behind it, a problem you want to solve, say in biology or in climate. So you need the data. How else will you reproduce a result? Reproducibility is not only about performance, but also about the scientific results.

Rocio: I would just say that it's not only about open source code, but open science, which involves everything. It's a collection: the code, the data, the logs, the way you can execute it, the requirements, maybe an adjustment to your code so it can be used in a smaller environment. So it's the full collection of everything. Yes.

Bilel: These two have helped a lot, and they went through the journey of starting as reviewers and ending up as chairs, and that's why I selected them. I'm just saying that so they can tell you about their journey. It'll be nice to hear how they transitioned from being reviewers to leading the effort.

And I think for the last three years we have been there.

Le Mai: Yeah. For the reproducibility initiative, I started off with the repro challenge in 2019, just helping out with reviewing. Then in 2020 I did reviewing and grading, and also helped out a little bit with the special issue.

In 2021, I chaired the repro challenge, so I got to actually have my hands on it and work with the initiative, when Carlos was the chair. He was the one who really pushed for a lot of the AD/AE initiatives and brought the award to light. And Anyo and Tanu, I can't remember their last names, but they really created a framework, which Rosa and Rocío have built on, for finding reviewers who will evaluate these artifacts and artifact descriptions.

They also find people who will give access to XSEDE machines, or ACCESS machines at this point, or other kinds of compute time, so that reviewers can evaluate these things. All of this has happened in just the last five years, and the fact that the repro challenge has become a flagship part of the Student Cluster Competition has been really incredible to see.

I think especially the award, and it actually being recognized as an SC award, has really been amazing. And yeah, as Bilel said before, and as I've said, the AD/AE committee does a lot of legwork; it's really impressive to see.

Bilel: So when we look back at 2015, there was one paper that voluntarily shared its artifacts. For SC 22, we have over 37 papers awarded full badges. That means over 45% of the papers have all three badges, which is the best case. So this is already a record. Every year more people are submitting, and by now they understand it takes time to collect the data and package it properly into a container.

And what we did this year, thanks to the previous efforts: before the SC submission deadline, a couple of months ago, we did some training and held seminars, because part of reproducibility is also education. It's about how you can do it, because when people look at containers they get scared: okay, I can't do it.

So we did three seminars in the month of March, and they're all available on YouTube and on the main SC 22 site. One was about tools from ECP, like how you can get E4S. That gives you easy access to many pre-built software packages, so you don't need to reinvent things or figure out how to install certain libraries yourself.

Then we did another one with the Chameleon team, on how to get access to some exotic hardware, from basic CPUs to the latest ARM chips, like the ones similar to Fugaku. And last but not least, we also had a seminar with the Jetstream2 people, so people can have access to larger resources. This made life easier not only for the participants, the authors submitting papers, but also for the reviewers. The seminars were very beneficial to everyone.

Nicole: Yeah, that brings up a great point, and it also goes back to the award. I think education is a really important part of making sure that people are able to do these things, but I think we also have to incentivize people.

So whether it's the badging or the awards, these things all have to go together before you reach that 40% with three badges, which is awesome. I love that. So what do you see in the future? What is the next step, the next piece of education or incentive, the next piece that continues this trend?

Rocio: I think it's going to be a mix of providing good guidelines, accessible to everyone, so everyone knows how to proceed whether they want to reproduce something or make a reproducible contribution. And I also think it's a matter of all of us kind of pushing everyone to make their results and their code accessible. Unless there are legal issues that put up a barrier, of course, why would you submit a paper without making the code available, so the reviewers just have to trust your results? I mean, we live in an HPC environment that provides you with all the necessary tools to share your code, right? So it's a matter of education, and also, from the perspective of the reviewers, the journals, and the conferences, a model of taking responsibility and guiding everything through a mandatory reproducibility initiative, in my opinion.

Bilel: So today, for reproducibility, I see one main challenge: the standards for the badges. Today we have multiple badge systems. You have the ACM one, which is one of the most major and most used, but you also have the IEEE one and the open source ones. So which one? Each has different criteria, and I think the community should ask what our predecessors did for libraries, like what they did for MPI.

As Jack said, when they had PVM, they had so many different libraries for distributed communication. Then they came together and said, we will do the MPI standard. So I think this is the next effort: the community and the leadership come together with the editorial boards of ACM and IEEE, as one family, toward the goal of making a standard.

That will make life easier for the authors submitting something, for the reviewers, and for the people trying to reproduce the results. And hopefully, in the long term, we can also make sure that this data stays around forever.

Le Mai: I think Rocío and Bilel hit a lot of the same things that I was going to say. One thing I'd like to add about the reproducibility initiative at SC proper is what they've described. First, it has been coming from the leadership down, in terms of, "Hey, if you want to submit something, you have to provide this, and we have badges, and we will recognize you if you do this."

So that's coming from there, for the papers, so there's already that framework, plus the message that we will help you get there by building a framework where we will be evaluating this. And then the other part that I thought was really interesting is building it up from below, which comes with the Student Cluster Competition.

We know that these students will, like you, go on to work in HPC or in computing or in the sciences. And with this, they not only gain experience in the professional sense, but because they're doing the reproducibility challenge, they're already thinking about why reproducibility is important.

You know, what is good science? What does open science mean? So we're coming up from the people who are going to be filling our shoes when we retire, from the people who are currently doing the work, and also from the top level, making sure we recognize the people who are pushing this forward.

Nicole: Yeah, that's all really great. I heard a couple of things in there. I think there's both, right? Top down and bottom up. And I think you also mentioned a kind of culture change in how we are incentivizing people to get this done.

You had also mentioned how we talk about reproducibility, right? We have these three different badges. We have at least three different terms by which we speak of reproducibility. But there are also all the standards science-wide.

And I feel personally that there is some disconnect between the way we talk about reproducibility and how people outside computation talk about it; ACM switched its terminology because different communities were using different terms. So I wonder if you have any thoughts about the language surrounding reproducibility and what needs to happen there, so we're more productive, not just in our HPC world, but also when we come together and talk with broader science, so we're having more productive conversations there as well.

Rocio: Well, Jack was saying yesterday that when they had MPI in mind, what they did was meet, if I'm remembering correctly, every six weeks for three days, so they could fully focus on it. During that time they said, these things need to be solved, so let's figure out how to do it.

So I think the best idea would be to combine people from different levels, maybe from students up to the highest levels in the hierarchy of big companies or big conference committees, so they can put together their thoughts, their needs, and the barriers they face.

And it should not depend on a particular organization, but be common thoughts put together by the scientific community, let's say. That's my opinion.

Bilel: Probably one thing, to continue about the challenge: as you said, reproducibility leans toward more open science, but SC cannot happen without industry or vendors. So the next challenge is how industrial partners or vendors will be able to share things. There were nice papers, but unfortunately the authors were not able to share full details about some codes, which we understand, because this is a business. But there should be a common ground, and this is probably where the award, or some kind of standardization, comes in.

Le Mai: It's a really hard question to answer. I think Rocío touched on this: bringing scientists, computational scientists, and, as you mentioned, vendors together to understand the importance of what they're doing, and how, if you're going to use your HPC system to do science or research that will then be peer reviewed, that foundation really matters; without it we are on shaky ground with our results.

Also, scientists have to understand that computation has a lot of barriers in front of it when it comes to sharing open science. There's the industry part, and there's the heterogeneity of the hardware and software we're using, and maybe that does not come across as well in the scientific community. So I think that discussion might help.

Rocio: I just want to add that maybe not everything can be reproducible, so maybe we need to accept that there's open science on one side and companies' software on the other. I'm not saying they are enemies, but they have different targets, different groups of users, and maybe we need to learn that they cannot always coexist in the same space.

Nicole: Yeah, yeah, I agree. And I think there are also things to be learned, right? There are absolutely two different sets of needs there, but there are also maybe things we can borrow from each other. What comes to mind for me is, you say, oh, well, industry has things they can't make available.

Well, there are some sciences, like the biosciences, where we do have to either anonymize something or take other measures so that things are both reproducible and protective of people's privacy, or whatever the issue is. So not all science can be open, as much as we would love it to be.

I guess unless you have anything else to add, I would transition into what your day jobs are and how you feel reproducibility comes up in them. Okay.

Rocio: So, I'm an assistant professor in Spain, at the university, and I combine teaching with research in my position. When it comes to teaching, I'm starting to make the students put their assignments through git. I create private git repositories and have them submit everything there and update it each time they make a change, not only on the last day, because that would mean the work could have been done in any old way; it's good to see the progress. Then, on my research side, I did not realize it at the time, but I was lucky in my PhD because all the software and libraries I was using were open source.

So let's say I learned about reproducibility when I needed it and did not have it; I was so used to having it that I missed it when I lost it. That was when I started doing research with fluid dynamics applications, because I have read more than 20 or 30 papers that claim certain results following specific methods, but I haven't been able to find their source code.

I would say in more than 90% of the cases. In some of them I did find the code, but it was not usable, for different reasons. So that's what I'm facing right now: I want to compare my software with other existing software, and it's proving very difficult. So I hope I can contribute somehow with reproducible software, and that I can get other people to do the same.

Bilel: So, as I said, I'm Bilel, a computational scientist at the KAUST Supercomputing Lab, and my daily job is to make sure that our users are happy with the supercomputer, Shaheen. Basically, to summarize, the machine, the HPC system, should work as expected from when we purchased and accepted it. That means performance, reliability, functionality, and accuracy. From time to time we check that all components are working; we don't want users complaining, we want them all happy. So from time to time we have to check and validate the results. What does that mean? How do we reproduce the results? We have, of course, codes.

We have tests; some of them are open source, some are proprietary, but these are already shared within the team of computational scientists, and also with the sysadmins and the onsite staff, so that on the day I am on vacation, or not here, or at a conference, they don't wake me up at 2:00 AM saying, okay, Bilel, we need your help. And we have to keep some historical data. Are we sustaining the same performance? And what are the criteria for deciding whether it is okay, or whether we have reached a critical point? Because over time some performance may degrade, but it may also improve with new software and so on.

So we always think about code availability and so on, but also about keeping the data, the historical data, and seeing how things change over time.
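
As a rough illustration of that kind of health check, comparing a fresh benchmark run against the historical record might look like the sketch below. The benchmark names, numbers, and the 10% threshold are assumptions made up for this example, not KSL's actual procedure.

```python
import statistics

# Hypothetical historical measurements, e.g. collected at acceptance and after
# each maintenance window. Benchmark names and values are made up.
history = {
    "hpl_gflops":       [2.10e6, 2.08e6, 2.11e6, 2.09e6],
    "stream_triad_gbs": [8.1e3, 8.0e3, 8.2e3],
}

def check_against_history(current: dict, max_loss: float = 0.10) -> dict:
    """Flag any benchmark that fell more than max_loss below its historical median."""
    report = {}
    for name, value in current.items():
        baseline = statistics.median(history.get(name, [value]))
        loss = (baseline - value) / baseline
        report[name] = "OK" if loss <= max_loss else f"degraded by {loss:.1%}"
    return report

# A hypothetical run after a maintenance window: HPL lost ground, STREAM is fine.
print(check_against_history({"hpl_gflops": 1.80e6, "stream_triad_gbs": 8.05e3}))
```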

Le Mai: So at IU, I'm part of the applications team, and quite a bit of my job is also keeping users happy and benchmarking machines for acceptance.

And I'll echo what Bilel says in terms of benchmarking and performance evaluation. Let's say something happens in the hardware later on, and a user comes in and says, oh, suddenly I'm seeing performance degradation. It's so helpful to be able to go back, say, to your old HPL runs from before acceptance.

You can see your configuration file, see what you got then, and say, oh, I have that config file, I'm going to compile it again, or use the old binary you compiled before, and test against that. Another thing: when I first started my job with the applications team, a lot of what I was doing was analytics on data coming in from the systems, in terms of usage, et cetera.

And that was a team-based thing. So if I'm writing the scripts that pull the data and do the analytics, and I move into another job, how can I make something where somebody can see what I did, catch it if I did something wrong, improve on it, pull more data, et cetera?
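
One common way to make a handoff like that easier is to write a small provenance record next to every output, so whoever inherits the scripts can see which input and parameters produced each result. The sketch below is generic; the file names and fields are hypothetical, not IU's actual pipeline.

```python
# Generic sketch: store a provenance record alongside each analytics output so a
# teammate can audit or rerun it later. File names and fields are hypothetical.
import hashlib
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

def save_with_provenance(result: dict, input_path: str, params: dict, out_dir: str = "results") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")

    # The analysis result itself.
    (out / f"usage_report_{stamp}.json").write_text(json.dumps(result, indent=2))

    # Everything a successor needs to know about how it was produced.
    record = {
        "created_utc": stamp,
        "input_file": input_path,
        "input_sha256": hashlib.sha256(Path(input_path).read_bytes()).hexdigest(),
        "parameters": params,
        "python_version": sys.version,
    }
    (out / f"usage_report_{stamp}.provenance.json").write_text(json.dumps(record, indent=2))

# Hypothetical usage on a day's job-accounting dump:
# save_with_provenance({"jobs": 1423, "median_wait_min": 11.2},
#                      "accounting_2022-11-15.csv", {"partition": "general"})
```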

So I guess in that sense, reproducibility exists in my work. Another thing I'm thinking of is working with users at IU. We get a mixed bag, and some of our researchers may not have heard of GitHub but are doing something perfectly fine on the systems. Then one day they bring the systems to a screeching halt, and we ask them, what's changed?

We don't see any changes in your job scripts. And they'll say, oh, I changed my code. It's like, oh, well, can I see what version you're running here so I can make a comparison? And many times they'll say, oh no, I made these changes and I've already changed it again, and I'm not using GitHub. That's an opportunity to talk to them and say, well, it would be really helpful if you did something like this, or we could do that, and we create that partnership.

So yeah. Another thing I thought about was institute-to-institute sharing. At one point, for example, we were bringing a service online that gave a graphical interface to one of our machines, and we were implementing something to do load balancing for when users came on, since it was a shared resource. Later on, our dear rivals, and I think you went to Purdue, is that correct? Yes. So our dear rivals at Purdue were also planning on bringing something similar online, I think with the same product, and they wanted to hear about what we were doing for the load balancing. We were happy to share it, and I believe they've implemented it on their side as well.

So in that sense, IU and Purdue, sure, we'll fight it out on the field, but when it comes to science, it's really important to be able to share what you're doing.

Nicole: Yeah. So I feel that this is also an important aspect of developing standards: that community development, right? Because it is not one size fits all, and so keeping those communications going really needs to happen in the context of specific groups, which is really difficult, right?

Yeah, so to wrap up and summarize, I'd like to hear what reproducibility means to you. I think it typically goes beyond whatever we try to define it as, and I actually think over-defining it is a problematic way to go about it. Especially when we talk about this more broadly, it means a lot of different things.

Whether it's standards or some other component, what does reproducibility mean to you? Or, what are you most passionate about going forward in terms of making reproducibility more accessible?

Le Mai: I think the word that comes to my mind when I think about reproducibility in general is legacy. Right? You're creating a legacy with whatever you're making, something that future generations can build upon and make stronger. So if you have something that is reproducible, you know the ground the next person steps onto is not shaky; it will hold, and they will be able to build that next step. So it is a legacy, a human legacy.

Bilel: For me, I would try to define reproducibility with two words: availability towards sustainability, so that it carries on generation after generation. Basically, Le Mai summarized it: it means we will have a legacy. Without this, we will not have reproducibility.

Rocio: I would say honesty and honor. To paraphrase what Albert Einstein used to say, you know nothing about a field unless you are able to explain it to your grandma. So I would tell my grandma: you know, when we are at school, they ask us to justify every answer, and then we get to publish papers where nothing is justified. So we are missing something.

Bilel: There is a joke about how everyone wants to get results fast. Someone said in his resume, I'm very fast at calculus, and the interviewer was perplexed and said, what do you mean? Yes, okay, you can test it. So the interviewer asked, what is 53 times 27?

And the applicant quickly said, 128. That's wrong! Yes, but I told you I'm fast; I did not tell you I would give you the right answer. So that's not reproducibility.

Le Mai: Yeah, I just wanted to add something. As I was listening to both of you, Rocío and Bilel, talking about what reproducibility means to you in general, one more word came to mind: it's equitable.

Having reproducibility makes science equitable. Being able to leave this behind gives anyone an idea of how they can do this.

Nicole: Great. Thank you all so much for coming. I learned a lot. I think that summary of what reproducibility is really explains how important a human issue this is.

Right? It really is very big and you guys are doing a lot to forward the community. So thank you very much for coming today.

All: Thank you. Thank you for sharing that. It's a lot of work; I know some people, when I try to bring them onto this committee, say it's too much work.

Guests

Le Mai Weakley

Le Mai Weakley received her PhD in Mathematics from Indiana University in 2013. She is part of the Research Applications and Deep Learning team at Indiana University.

Rocío Carratalá Sáez

Dr. Rocío Carratalá Sáez received a B.Sc. degree in Computational Mathematics from Universitat Jaume I (UJI) of Castellón (Spain) in 2015, an M.Sc. degree in Parallel and Distributed Computing from Universitat Politècnica de València (Spain) in 2016, and a Ph.D. in Computer Science from UJI in 2021. She is currently an Assistant Professor in the Department of Computer Science at Universidad de Valladolid (UVa). Her main research interest is High Performance Computing, focused on the parallelization of linear algebra operations to enable optimal executions on high-performance and heterogeneous systems.

Bilel Hadri

Computational scientist with more than 10 years of experience in HPC, including more than 5 years at leadership computing centers. Interested in monitoring, analyzing, and benchmarking applications to use supercomputers effectively and efficiently.

Hosts

Nicole Brewer

I am a software engineer in the Scientific Solutions Group within Research Computing at Purdue University. My primary contribution is to the NSF-funded science gateway GeoEDF, for managing, sharing, analyzing, and visualizing geospatial data. I have also worked in a facilitation role in a research lab, where I implemented and recommended standard practices, trained research scientists, wrote workflow-enabling tools, and deployed complex UIs in Jupyter. I'm passionate about improving the state of science through sustainable software practices, the treatment of software as a first-class research object, improved institutional support for research software engineers, and diversity and inclusion. Catch me conversing about these topics on the Long Tales of Science Podcast, a podcast about women in HPC!