Disciplining Data: A conversation with a school of information sciences dean

Harvard Law School Center on the Legal Profession

Wednesday, December 1, 2021

Eunice Santos, professor and dean of the School of Information Sciences at the University of Illinois Urbana-Champaign, recently sat down with David B. Wilkins, faculty director of the Harvard Law School Center on the Legal Profession, for a conversation about the intersection of information sciences and the law, and how to train students to be effective collaborators and translators between the disciplines.

David Wilkins: Thank you so much, Dean Santos. I'm just absolutely thrilled to be able to talk to you about a topic critical for our two professions: how to think about and use information and data. Since you are the dean of the School of Information Sciences at the University of Illinois, I could not think of a better person to engage with to explore these issues.

I'd like to start by going back a little bit to the origins of the field, which, as I understand, grew out of library science. There's a really interesting connection with law there, because "law" has always been kept in libraries—in fact, I'm currently looking at Langdell Law Library, which is the largest law library in the world. Could you talk a little bit about the origins of information studies and how it emerged from libraries?

Dean Eunice Santos: I'll start with how library and information studies became information sciences. As you note, our School of Information Sciences grew out of library and information science or studies. Many of the core disciplines that we now study—information management, information retrieval, preservation, ethics of information, and others—all find their roots in libraries and from the library science field. Over time, we came to realize that many of the skills in library sciences came to be useful in many, many other areas. This led to an odyssey of opening up the umbrella further so that people can truly be trained and understand the role of information in society. The solutions, the assessments, and the understanding of the implications and ramifications of information are applicable not only for libraries but also for the world. The underlying philosophy has deep roots in library sciences.

Informatics is a part of this. If you talk to three people, they'll give you three different definitions because informatics has many different roots. How do we look at it? When you use information technology, information processing, and solution building and assessment—and when you apply it to a field—that’s when you start to realize there's informatics. And that's where you also put a word in front of it, like bioinformatics, legal informatics, medical informatics, and so on.

Once you have that application of information, and the technology surrounding it, that is when you can start to build solutions or ask the hard questions about what information is useful: How should we use it? What are the privacy aspects? What are the preservation aspects? What about the ethical implications?

All of that starts to build up a specific subfield, like legal informatics or medical informatics, or even things like cheminformatics and social informatics. For us, in the iSchool, information science and informatics are two core disciplines that help us to focus on our vision, mission, and future. We are focused on what it is that we are really trying to understand and the role of information in that process. We want to make sure that we aren't just building something to build something without understanding what's going on under the hood.

Wilkins: That's fascinating and, as you say, very relevant to law. One of the things that I'm interested in is that the field is broad but can also be deeply specialized in particular domains. I wonder if you'd talk a little bit about those two layers. In other words, you say on your website that you prepare students for all kinds of careers in different fields around informatics, but then you also have a number of specialized degree programs, including in fields like bioinformatics. What is common across the fields, and then how do you think about all of this in more specialized ways?

Dean Santos: As you say, we have more general degrees in information management and information sciences as well as, for instance, a master's in bioinformatics. In some ways we take a hybrid approach. You need to have a foundational understanding if you want to specialize. I'll take our undergraduate program as an example. You come out with a BS in information sciences where you learn the foundation of information sciences from a variety of perspectives. You learn it from more of the computational side, the sociological side, the humanistic side, the library side, and information technology. But you have pathways in which you can then specialize that are part of your curriculum. For example, you can specialize in human-computer interaction and user experience. You can specialize in galleries, libraries, archives, and museums. You can specialize in narrative design and game studies. You can specialize in data and society or data analytics and science and other pathways.

Wilkins: I take it from your knowledge and familiarity already that you have seen some applications in law. Do you see students interested in law, either as going from your undergraduate program to becoming lawyers, or perhaps going to become legal technologists, or legal operations?

Dean Santos: I will say that our undergraduate program is actually quite new, so we've only graduated a small number. I'm going to talk more about our graduate students.

A number of our graduates have gone on to be law librarians. Those who have been recruited to the legal profession are individuals who understand the role of information and are able to gather it, process it, and understand how to synergize and preserve it.

We also have had individuals who have law degrees and came to us to get their master's degree in library and information sciences (MS/LIS) specifically to become law librarians or bring it back to their law firm. And, on the flip side, we've had a number of folks who got their MS/LIS degree and then went on to get their JD and became practicing lawyers.

So, we've seen this from many different angles where you have individuals who are incredibly well versed in information, informatics, and law. And this is where we need folks, and we hope to see more of it. We can help to seed and produce hybrid thought leaders, because of all these really critical problems that are coming up.

Wilkins: One of the interests of the coeditors of Legal Informatics is to create these kinds of interdisciplinary and hybrid programs around informatics and law. I'd love to get your thoughts and comments on two big issues underlying this need. The first is what might be thought of as predictive analytics: using the power of machine learning, computational science, and other fields to predict legal outcomes, for instance, court opinions. You see people trying to analyze the reasoning of judges, the wording of judges, and what arguments will appeal to judges. The second is around trying to harness big data and what it means for legal outcomes. For instance, every company has millions of contracts, most of which no one has any idea of what's in them or what their relationship is to each other. Each raises interesting issues, conceptually and normatively. I wonder, do you talk about these things in your school, and how do you think about the ethics of information like this and, as you said already, its implications and ramifications?

Dean Santos: These are critical research areas in information sciences. When we look at machine learning and all of this, a lot of this is how you train algorithms, on what data you train them, and in what ways this may set a certain course of analysis and prediction. There are critical questions at play: What information at your disposal are you using? If you train and you are making assumptions on the data, that's also going to send it in a certain course. At the end of the day, what does accuracy mean here? What does it mean to be predictive, and at what level? How are you really trying to analyze, and are your experiments actually scientifically sound? Are you accidentally instilling bias, even though you don't think you are? Researchers are trying to tackle all of these questions. The other issue relates to whether you can really explain how you got those results: what is the "explainability"? Can you look under the hood and really take a look at how the results were derived? Those are big questions that researchers in the field are looking at. It is very meaty. There are going to be a lot of different individuals who will look at it from different angles to be able to make progress.

Wilkins: This is so helpful. One question that leads me to is what law, which is much more ancient in these discussions, might learn from other fields that have thought about it more, like health care or engineering.

Dean Santos: There's a field in AI called trustworthy AI that is really trying to look at the ethical implications and what it means to have a system that is trustworthy. That's very general, but it has been looked at in a number of different fields. It would be something that legal informatics would really want to have its work overlap with, particularly given all the things law is doing with respect to big data and predictive analytics, as you just described.

Wilkins: I think that sounds like something people should be looking at quite carefully! As you may know, there have been questions around how bail is being set or whether implicit bias is being built in. But there's also a set of questions around, as you say, how to ensure accuracy in these kinds of processes. The second big piece, as I mentioned, is that companies are now trying to figure out how to drive accuracy, predictability, and efficiency within their world of contracting, where the heart of their data lives. Are there any lessons on that side of the ledger?

Dean Santos: There are. When you take a look at data and information, especially when you're looking from the big data or large information scope, just because you have a lot of data doesn't mean that it's all equally useful. It also doesn't mean that it's really accurate. The more information you have, the more likely there's a lot of noise and contradictions. People often think having more data is wonderful—and on one level it is. But it also brings up a lot of complexity and requires a lot of nuance in trying to be able to understand what's really going on. For instance, how do you appropriately deal with contradictory information or data, and when you do that, how does that really affect your outcomes?

Or let's say that you have heterogeneous data sources. A lot of people just think, I'll grab data from everywhere. Well, the way the data is collected can be very different. And some of it may not be that reliable. These are all very critical things coming up. It's not just, grab data, throw it all into a big pot, and then see what you can do with it. There are many quandaries that come up that you have to be aware of.

Wilkins: My guess is, one of the biggest challenges in your field is how to connect with people who are completely unsophisticated around these kinds of questions—and who are, quite frankly, either intimidated by or nervous about data, which in my field is particularly acute because many lawyers went to law school because they didn’t really like numbers—with those, like your graduates, who have the needed skills.

So, what do you wish the users of the data, or the people you have to collaborate with, whether they're lawyers or doctors or engineers or policy analysts, what do you wish that they knew about data science and informatics, and what could we do to help them be able to collaborate better with you and professionals in your discipline?

Dean Santos: I'll start by saying that data and information don't have to be about numbers. Information's information. It can be concepts. It can be words. But, at the same time, it can be scary. Even for people who do research within the field, all this data and not knowing what you can really do with it is a big question.

Here's the thing: people from the information science and informatics side who are branching out into an applied field need to understand what the data and information really mean, how they are collected, and what it is that you're really trying to do with them. What is important about this information becomes very critical. From discipline to discipline, there are different things that they're trying to get out of it. For instance, data visualization is one of the first things often done to help understand your data. But part of this process has to be asking, to what point? What are we trying to do with the data on an applied level?

You've given some examples from the legal profession. There are many others. All of them need to build that bridge of understanding of what your needs are and what our capabilities are. And that generates the fascinating questions that need to be asked about whether solutions exist or not, and how we can jointly get there. Without that you're really not able to create the solutions or the thought leadership of what the solutions are for.

Wilkins: Dean, this is so helpful and so thoughtful. I'll just leave with one last question, which is to ask you maybe to look a little bit in the future, in the next, say, five to 10 years. What do you think are the big trends around your field that people who are in law should really be paying attention to and that are going to reshape the world in the next five to 10 years?

Dean Santos: There are so many things going on, and that's why it's so exciting! We're really starting to understand all the ways that we look at information and technology, and in doing so there's been a mad dash to just create and push the forefront. But we need to take a step back and ask, "Yes, this can be done. But if it's done, what is going to happen? If it's released out to the world, what are its effects?"

I'm a computer scientist by training, and being able to push the forefront of technology is fantastic. But pushing it without doing the deep dive into understanding the implications of doing it is one of the things that are creating problems in society right now. In five to 10 years, I hope that we will have the research needed that defines the ethics and the trustworthiness of what we're doing. We need to gain a better understanding of the things we should be doing in building systems so that ultimately we produce reproducible assessments of overall effectiveness, and so that the solutions created take bias into account. That's where a lot of research is being pushed forward, and that is very timely and critical for any field that has the discipline-plus-informatics approach.

Wilkins: Dean Santos, I have no doubt that our readers will have benefited immeasurably from hearing these really thoughtful comments, and I hope that it will spark in all of them an interest in moving forward and understanding how to put these two disciplines together, which, as you say, have to go together if we're going to make progress on the big issues that we face in our world.