Preserving Integrity in Online Social Media – ADM+S Tech Talks
31 May 2021

Prof Flora Salim, Associate Investigator, UNSW node (chair)
Alon Halevy, Director, Facebook AI
Duration: 1:16:54


Prof Flora Salim:

I’d like to acknowledge the traditional custodians and ancestors of the lands where RMIT conducts its business. This talk is held as part of the Tech Talks series of the Centre of Excellence for Automated Decision-Making and Society. Alon is also here to give a talk for some of our students who are doing a masters of Amsterdam data science at RMIT, and I’d also like to welcome colleagues from the School of Computing Technologies and the School of Engineering, as well as colleagues from the Centre of Excellence for Automated Decision-Making and Society from all over Australia, and possibly the rest of the world, too. So, I’d like to introduce Alon. It’s great to have Alon here. Thank you for being so generous with your time. It’s evening there where you are, and thanks for giving your time to talk about what you’re actually doing right now at Facebook. Alon has been a director at Facebook AI since 2019. Prior to Facebook he was the CEO of Megagon Labs, and before that he led the structured data research group at Google Research, where he developed WebTables and Fusion Tables. And prior to that he was a professor at the University of Washington. So, Alon, go ahead.

Alon Halevy:

Okay, well thanks. Thanks for having me and I’m really sorry I can’t be there in person. I want to start the standard way one should start these talks. Ten years ago I wrote a book about coffee culture around the world, and so I travelled to 30 different countries to drink coffee. And the question that I often got during or after writing the book was, where is the best place in the world for coffee? And I always gave an unequivocal answer, which is Melbourne, Australia. There’s no better city in the world if you just want to spend your time going around from one cafe to another. Now, the first time I gave a talk in Melbourne and said this, I got a standing ovation, so I will just assume that you’re standing and ovating. But today we’re going to talk about something related, which is preserving integrity in online social networks. And I say related because when coffee came onto the scene in the 15th and 16th century in Europe, we saw a lot of the same things that we’re seeing today as social media is taking the world by storm. And let’s go to the next slide.

So, I’m going to give an overview of the field and some of the AI techniques that have been used to preserve integrity in online networks. If you actually want to see a much more detailed and careful presentation of this, there is a paper with the same title on arXiv, and there’s a shorter version of it that will appear in the Communications of the ACM before too long, by the set of authors that you’re seeing here. So, let’s go to the next slide. And I’m happy to take questions anytime.

So, we all know that social networks bring a lot of good to the world. The best way we know that is a story that we’re told on the day that we join Facebook, which is that there was one day when Facebook went down. I guess this was in 2014, and people in Los Angeles were calling 9-1-1 asking what happened to Facebook, and the head of police there had to tweet that, you know, Facebook going down is not a law enforcement issue and they should just wait patiently and talk to each other without their iPhones. So, clearly social networks are doing something very important. But on the other hand – just like any technology – they also bring out some less desirable aspects of humanity. So, people use social networks to propagate hate speech and misinformation, to sell stuff they’re not supposed to sell, to bully people, to show graphic violence. Any kind of violation that you can imagine, humanity has used social networks to effect it. So, the field of integrity is defined as the field of keeping people safe on social networks. That means companies will create a set of policies. These policies are obviously influenced by the laws, but in some cases they go further than the law and say what you can and cannot have on this particular network. And then what the companies do is they set up systems and pipelines and workforces in order to enforce these policies. So, that’s the field of integrity. You should be able to come to a social network and not feel violated in any way. Next slide please. I’m trying to do this here on my thing and it doesn’t work. So, why is this so hard? This won’t come as a surprise. Posts are coming onto social networks at amazing rates. There’s the amount of content, and the diversity of language: over 100 languages. There are the cultural nuances: what is deemed reasonable in one culture may not be reasonable in another. 
The kinds of violations that we see on social networks are many – I’ll soon give you a list of 50 of them. If you do a back-of-the-envelope calculation of, well, can I just throw a bunch of people at the problem? The answer is, even under very conservative estimates, you need 12 and a half million people per day looking at all the content and making sure that it doesn’t violate the policies. And these people need to understand the languages, and you see where this is going. So, AI is not a nice-to-have in this problem; AI is a necessary part of the solution. Without AI today, there would be too much violating content on the network. Now, it’s not like it’s a simple problem to solve, even when you look at content. The decisions that are being made are really difficult. And it really boils down to keeping a balance between free expression, letting people say whatever they want to say, and enforcing certain rules. So, as you can imagine, this is a very deep cultural balance that needs to be struck. And again, the fact that you’re doing this in almost every country in the world doesn’t make this any easier.

Now, I’m going to be talking a lot about integrity as it applies to posts on social media, but integrity doesn’t involve only posts. It can apply to accounts: fake accounts, or accounts that are trying to impersonate, or just to act in bad ways. The violations can be in replies to posts. So, people can, you know, initiate hate speech in replies. The violations can be in point-to-point messaging between pairs of people, and that’s even harder because the messaging applications either are, or are going to be very soon, end-to-end encrypted. Which means you can’t actually look at the content to decide whether something bad is happening. And ads as well. So, ads are also a source of violations of community policies. So, I’m obviously not going to be able to cover everything, but I’m going to try to cover the main trends and some of the challenges that arise in this problem. Now, obviously I’m coming from Facebook, so a lot of what I’m saying is grounded in what we do at Facebook. But nothing of what I’m saying should be taken as an endorsement of any technique by Facebook. If it were, I’d have to talk to many lawyers before I could say this in public. So, I’m going to use Facebook as an example, but just in order to ground the conversation.

Let’s go to the next slide. So, I’m going to spend quite a bit of time on defining the problem, because actually defining it is not trivial. And then I’m going to pose a framework for how we can start thinking about the many aspects of the problem, and try to go through some of the latest techniques that have been used to address integrity. So, let’s start from the problem definition.

Next slide.

Okay, so community policies: Facebook or Twitter or YouTube, or any other company, will publish a set of policies that say what is allowed and what is disallowed on the network. Now, this is just an excerpt from the Facebook community policies, but the point to notice about this is that it’s really subtle. So, for example, there’s a category of violence and criminal behaviour. So, you’re not allowed to have a presence on Facebook if you’re representing an organisation with a violent mission, you’re not allowed to support individuals or organisations with violent missions, and you’re not supposed to incite violence, or to admit to violent activities. Okay, so that all sounds good. But on the other hand, you can provide instructions on how to create or use explosives if there is a clear context that it is for non-violent purposes. So, if you’re a chemistry teacher you can teach your students how to blow up your lab. So now you can see the subtlety here. You can talk about explosives in a very specific context, but not in the vast majority of contexts in which these are spoken about. Or another example: you’re not allowed to sell firearms, you’re not allowed to sell drugs on Facebook, but you are allowed to talk about the policies, you know, the laws that people set on whether guns or drugs should be allowed to be sold. So, even for a human it actually takes quite a bit of attention to look at a post and see whether it is from the second column of this table or the third column of this table. So, now you can imagine, if you’re trying to endow an AI algorithm with that capability, this is going to be pretty challenging. Let’s go on to the next slide.

Now, the subtlety is not only in the policies themselves, but also in the way things are intended or used. So, for example, there is a set of words that I cannot say in a public talk that are used to refer to people of particular protected classes. Okay, so I cannot refer to somebody of some protected class using these words. But on the other hand, these communities have often adopted these words as a way of just sort of speaking casually to each other. Okay, so you can choose your favourite protected class and choose whatever word is used to refer to people of that class, and sometimes you’ll find that people in that class are using that word just colloquially. So, who is using the word and how they’re using it really matters.

The other thing to note is, it’s really hard to set absolute rules. So, we can say look, nude photos of children are definitely not allowed on social media, but if you look at this photo here, that is an iconic Pulitzer Prize-winning photo that was taken during the Vietnam War. You see a nine-year-old naked girl running from explosives in Saigon, and this is part of culture, okay. So, this photo, even though it does contain material that would otherwise be disallowed, is part of culture that we would allow. Okay, so again, extremely subtle. Let’s go to the next slide.

Now, this is just a partial list of kinds of violations, and you’ll get really depressed if you look at this list. So, hate speech and misinformation are at the bottom of the list. They’re actually the most common and the ones that receive the most attention. But there are all kinds of things, like people trying to exploit kids, and human trafficking, and financial scams. Oh, well, those have been around since the 90s, right. People trying to encourage others towards self-injury or suicide, people calling for violence in general. So, when you go into the details, each one of these requires different kinds of techniques and classifiers, and each one is a world unto its own.

Okay, now these violations have been discovered over time as people started using social networks, and companies have set up teams in order to address these violations as they became prominent. Ideally what we’d like now is to actually look at the technical aspects of preserving integrity and find techniques that can apply widely to many of these violations. Okay, let’s go to the next slide.

So, this is maybe my most colourful slide. This is what the enforcement pipeline looks like. Okay, so again, this is similar across a bunch of companies. But what happens is – so stuff happens on the network, right. People create accounts, people post, they make comments, they send messages. These are the events that happen. Now, when these events happen, there are two ways of discovering violations. One is when people, users, will report, saying I think this is a violating post. Okay, the other is these posts go directly into an AI-based violation classifier. So, this is the technology that I’ll be talking about today, which powers these AI systems that look at this content. And they will do one of two things. One is, if the classifier is extremely confident that the post is violating, it will remove it immediately, or possibly downrank it so the probability that somebody will see it is very low. If the level of confidence is not that high, then what will happen is it’ll queue it to a community workforce of people who enforce community rules. So, they will look at posts, they will go through some user interface that enables them to make a decision, hopefully quickly. And if there’s a majority that says that this is violating, the post will be removed. So, this is what happens for most of the violation types except misinformation. Misinformation is when people try to say that Trump won the election or that, I don’t know, Clorox cures coronavirus. Now in principle, one could do the same thing, right, one could treat misinformation as just another violation. The problem is that with misinformation there is some sort of ground truth. And so Facebook or any other social media company doesn’t want to be the arbiter of truth in the world; that would be a really bad situation to be in. 
So, what these companies do is they send the suspected misinformation posts to third-party checkers who, you know, go through all kinds of qualifications, and they come from the broad range of the political map, and they’re not employed by Facebook or the social media company. They are independent organisations, and what they will do is the deeper research that is needed in order to figure out whether the claim is false or not. And usually that research depends on some expertise in politics or medicine, or what have you. And they will make a judgment that will be upheld. Now, many misinformation posts are just variations or different ways of saying a post that we already know was debunked or validated. And so what we do is, if we see a new post that we suspect is misinformation, we will try to match it against a post for which we already know, or claim that we already know, whether it’s true or not. And if we can transfer our learning from that one to the new one, we will perform the appropriate action. Now, what does it mean to enforce? It means you can delete accounts. You can delete content, you can demote content. You may have to call law enforcement if you suspect that somebody is trying to commit suicide, for example. So again, depending on the violation, the enforcement can look different, and of course there’s an appeal process in place. So, affected organisations or individuals can appeal, and those appeals are obviously looked at. Let’s go to the next slide.
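
The routing logic of the enforcement pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Facebook’s implementation: the threshold values, the `Post` fields, and the names `route` and `majority_says_violating` are all hypothetical, and the talk does not say how the confidence cut-offs are chosen.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REMOVE = "remove"
    HUMAN_REVIEW = "human_review"
    FACT_CHECK = "fact_check"
    NO_ACTION = "no_action"


# Hypothetical thresholds, chosen only for illustration.
REMOVE_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.50


@dataclass
class Post:
    post_id: str
    violation_score: float  # classifier confidence in [0, 1]
    suspected_misinformation: bool = False
    user_reported: bool = False


def route(post: Post) -> Action:
    """Pick the next step for a post, following the pipeline in the talk."""
    # Suspected misinformation goes to independent fact-checkers,
    # so the platform itself is not the arbiter of truth.
    if post.suspected_misinformation:
        return Action.FACT_CHECK
    # Very confident classifier: act immediately.
    if post.violation_score >= REMOVE_THRESHOLD:
        return Action.REMOVE
    # Less confident, or flagged by a user: queue for human reviewers.
    if post.violation_score >= REVIEW_THRESHOLD or post.user_reported:
        return Action.HUMAN_REVIEW
    return Action.NO_ACTION


def majority_says_violating(votes: list[bool]) -> bool:
    """Return True when a strict majority of reviewers voted 'violating'."""
    return sum(votes) > len(votes) / 2
```

For example, `route(Post("p1", 0.99))` returns `Action.REMOVE`, while a low-scoring post that a user reported is sent to `Action.HUMAN_REVIEW`, where `majority_says_violating` would decide the outcome.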

Before we go into measurement, let’s take a broad look at enforcement. Okay, so the one thing that is obviously done is, if we find content that violates our policies, we will remove it. Okay, we will remove it completely from the network. It won’t be there anymore. Let’s click, this is an animated slide, yeah. So, when it’s clearly violating, that’s fine and we should remove it. But a lot of content is maybe borderline violating; it may be offensive to some people and not to others. And so, one possibility that you can consider, that is a softer treatment, is to downrank offensive content. So, downrank means when you go to your Facebook feed, you know, it’ll just be ranked much lower in your feed and maybe you will decide to go to sleep before you get there. Let’s click again. Now, the sensitivity is obviously not uniform across people. So, for example, in some cultures it’s okay to see, you know, more skin than in other cultures, or certain phrases may be considered okay in some cultures and offensive in others. So, another way of thinking about enforcement of integrity is to make this a very personal decision. And so, since your feed is already personalised to your preferences, you can imagine that the decision of whether to downrank the content is also a personal one. Now, so these are all good, and this has to do with integrity and material that is harmful to individuals. But now – we can click again – what I would say is that integrity is not solved by any stretch of the imagination. But what we’re seeing is that a lot of the bad experiences that people are having on social networks are not so much because of clearly violating content. 
They’re because the goals and the values that people have when they come to social networks are not necessarily aligned with the values of the ranking systems, of these recommendation systems. So, the new buzzword, if you want, is what’s called human value alignment. How do we align the values of an AI system with the values of the humans who are using it?
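
The personalised downranking idea mentioned above can be illustrated with a toy scoring rule. This is a sketch under assumptions, not a description of any production ranking system: the multiplicative formula, the `sensitivity` setting, and the function names are hypothetical.

```python
def adjusted_score(relevance: float, borderline: float, sensitivity: float) -> float:
    """Scale a feed item's relevance down as its borderline score rises.

    relevance   -- base ranking score from the recommender (higher is shown earlier)
    borderline  -- classifier estimate in [0, 1] that the item is offensive
    sensitivity -- hypothetical per-user setting in [0, 1]; 0 disables downranking
    """
    return relevance * (1.0 - sensitivity * borderline)


def rank_feed(items, sensitivity):
    """Sort (item_id, relevance, borderline) tuples by adjusted score, best first."""
    return sorted(
        items,
        key=lambda item: adjusted_score(item[1], item[2], sensitivity),
        reverse=True,
    )
```

With `sensitivity` set to 0 the feed keeps its original order; as the setting rises, items with a high borderline score sink towards the bottom of the feed rather than being removed.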

So, social networks are one example – or I should say recommendation systems are where we see this problem quite a bit right now. But you can also imagine this with, you know, self-driving cars and any AI technology that is used. How do you make sure that it’s got the right values? So, what I have on this slide is three books that can introduce you to the problem. One is by Stuart Russell, who actually goes through a lot of AI, popularised in a nice way, and then explains the alignment problem. Brian Christian’s book is much more direct; it talks about alignment. And if you want a beautiful book, just fiction – well, it does have to do with this – it’s a beautiful example of where alignment does work well, and a beautiful novel that just came out earlier this year. So, the point of this is that integrity is one aspect of human value alignment. Clearly, seeing violating content is something that would be against human values, and there’s a lot of work on this in social media companies.

Okay, next slide.

So, before we can actually talk about techniques we need to talk about how we’re going to measure success, because if you can’t measure progress then it’s hard to know what to do. And you thought measuring is going to be easy. No, it’s complicated too. So, essentially what you would like is to measure the prevalence. So, now we’re looking at the problem of, we have content that is violating our policies; how prevalent is that content in the platform as a whole? Okay, so you can try to estimate what percent of posts that are on the network are violating. Now, that’s hard to do because, you know, it’s like measuring the recall of web search, right. You can’t go through the entire web and see, okay, I got everything out. So, you know, we take a sample, and within that sample we look at what fraction of violating content there is. But that’s not satisfying because not every post has the same visibility, right. You might have a post that is seen by three people, and others that might be seen by millions of people. And so you want to weigh those differently. So, people talk about prevalence of bad experiences. So, you can think about what is the number of times that a user – any user – has seen a post. So, if a post was seen by a million people, well, that counts for a million. That’s better, but that also does not completely satisfy what you really want, because some violations are very severe and others are, yeah, okay. So, there was maybe a hint of, you know, hate or something like that, but it’s not as severe as something that is clearly hate speech. So, how do you measure the severity of the violation of a post to a particular user? That problem is incredibly hard and I don’t think anybody knows how to do it yet. 
But that’s – at the end of the day, that’s what you want, that also sort of ties it back to the problem of human value alignment because each person has their own sort of tolerance of certain kinds of values and that’s going to be a hard problem to figure out. So, yeah.
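
The difference between the first two prevalence measures described above can be shown with a small Python sketch. It assumes a labelled random sample of posts with view counts; the `SampledPost` type, the function names, and the numbers in the usage note are made up for illustration, and severity weighting is left out because the talk describes it as unsolved.

```python
from dataclasses import dataclass


@dataclass
class SampledPost:
    views: int       # how many times the post was seen
    violating: bool  # label assigned when the sample was reviewed


def post_prevalence(sample):
    """Fraction of sampled posts that violate policy (assumes a non-empty sample)."""
    return sum(p.violating for p in sample) / len(sample)


def view_prevalence(sample):
    """Fraction of sampled views that landed on violating posts.

    Assumes the sample contains at least one view in total.
    """
    total_views = sum(p.views for p in sample)
    bad_views = sum(p.views for p in sample if p.violating)
    return bad_views / total_views
```

For a made-up sample of four posts with 10, 90, 400 and 500 views, where only the 10-view post is violating, `post_prevalence` gives 0.25 while `view_prevalence` gives 0.01: a quarter of the posts violate policy, but only one view in a hundred was a bad experience.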

Prof Flora Salim:

I’m just wondering do you want to take questions in the middle or pull it all at the end?

Alon Halevy:

Ask me now and I’ll see if it’s yeah, ask me now.

Prof Flora Salim:

There’s a question from Lynn. Lynn do you want to ask that question directly?

Participant 1:

Yeah sure. So, when you talk about aligning the software with the values of the users, isn’t there a risk that by aligning with the values of some users who maybe don’t have very good values – eg, far right groups or something – that you end up propagating that kind of value system and material within those groups, and strengthening those kind of values that you actually don’t want?

Alon Halevy:

That’s a great question. There are several issues there. One is, when we look at values, we distinguish between values for an individual and values for society. So, there’s an open question of what you should do when individual values conflict with societal values. Okay, again this is a moral dilemma that is not new to social networks. There’s also a question of, should the social network let you get into extreme material or extreme groups? Now, we actually remove extreme groups, but still there’s borderline stuff. So, should we always give you exactly what you want, or should we protect you in ways that may seem paternalistic to some individuals? So, yeah, all the risks you’re talking about are there.

Participant 2:

And Flora’s just invited me to speak. So, I might just add in a quick comment if that’s all right. So – and thanks to Lynn for raising the question, as well, and thanks for your response. I study hate speech, particularly hate speech directed at women, and I mean, I think I do agree with Lynn, and I think the key concern with target group members, particularly in the hate speech context, is not necessarily, or not just, protecting them from seeing the material themselves, but protecting target group members from others’ behaviour related to other people seeing that content. And as far as the framing of it as an individual values versus societal values dilemma, I guess I object to that slightly, because it’s not so much a question of moral value but a question of material harm. And people who have studied this kind of speech, and also people who regularly experience it, know that it has material constitutive and causal harms that don’t just occur online, but seep into people’s real-world identities. And yeah, I’d be really interested to know what you think Facebook and other platforms can do in that regard. It’s not so much about the individual user experience but about protecting vulnerable groups from harm as a result of other people seeing this speech.

Alon Halevy:

Yeah, no, thank you. Thank you for that question. Just to be very clear, hate speech towards women or any other protected group is not allowed on Facebook, so today, Facebook does the best it can to remove that content for anybody. So, I hope I didn’t give the impression that that’s not the case. Okay, now I think the question surfaces again when there is content that doesn’t violate the policies but might be borderline, okay, and I don’t have an example in mind right now that is relevant to what you just asked. But again, I just want to be very clear. Hate speech towards any protected group that one can imagine, or at least the ones that are in the policies, is not allowed. Not only when it is shown to the group itself, but to anyone else.

Okay, so two guys are not allowed to talk about – I’m not allowed to convert, to post about hate speech that way, okay.

Participant 2:

Thank you. I mean yeah, I guess that then depends on kind of the effectiveness of identifying and responding to those kinds of – and also what the policy definitions are.

Alon Halevy:

But yeah, I agree. And what this talk is about is to show you that it’s not an easy problem. So, it’s actually amazing to me how AI techniques in the last few years have made a huge difference here, but we’re far from perfect, okay. And it’s not that anybody is being lazy; this is a hard technical problem that a lot of people are working on.

Participant 1:

So, yeah. What about the situation, Alon, where it’s not exactly hate speech, but maybe some groups have very particular ideas, say, about the role of women or so on, and they talk about this and perpetuate those ideas? If Facebook aligns to the values of that group and feeds them more of that kind of information, it sort of solidifies those things, which aren’t hate speech, they’re not illegal, but yeah, do you want to be solidifying people’s opinions in that way? It’s the same issue as giving them news posts that fit what they already think.

Alon Halevy:

So, first of all, I want to be very careful. I’m not an expert on the policies, but you can take this example, right, and say let’s talk about political opinions, okay. You might have a class of people in a particular country that have very certain political opinions. It’s their right, their democratic right, to have whatever opinions they want, as long as they don’t, you know, call for violence. They do have a right to express their opinions, okay, and share their opinions. Now, when they do that, that doesn’t mean that Facebook is aligning with those values, or that any other social network is aligning with those values. That’s a very strong leap, okay. But I think what you’re getting at is the tension between free speech and not, okay, and I don’t think any social network wants to be the police force of the world in terms of preventing people from being able to say what they think. Now again, you can argue with the policies, okay, and I should also say the policies change all the time. So, just a few months ago, you know, Facebook finally decided that you’re not allowed to deny the Holocaust on Facebook. Kind of late, but better late than never.

So, these policies are also very dynamic, but it’s a very careful balance between free speech and what you call a violation. It’s not an easy line to tread. These are issues that are constantly on leadership’s mind here at Facebook, and at other companies as well.

Prof Flora Salim:

Thank you Alon. We will let you continue talking. I didn’t realise Lynn was here. Hi Lynn. It’s long – we go way, way back, indeed. Many, many years, yes. Wow, okay. Let’s go to the next slide.

Alon Halevy:

Okay, let’s go to the next slide again.

So, you can click twice; we don’t need the animation here. So, this is a talk about AI, okay. I know it’s much more fascinating to talk about the policies and the implications for society, and we do that all the time, but for the next while let’s focus on the AI. And by the way, let me know when I should stop talking; I’m not quite sure when that is.

So, when we look from the technical perspective, now our problem is we’ve got a post, or we’ve got an ad, or we’ve got a comment on a post, and we need to figure out whether it’s violating or not. There are three sources of information that we can go on. There’s stuff that happens before the post. So, for example, who is the user, or the organisation, that is posting this? Do we know about certain events in the world that are going on that might be, you know, relevant to this post? Are there any coordinated activities between multiple users? Maybe bots that are coordinating some action here. So, this is before the post. Then we have the post itself. In the post itself we have the content: we have the language, the image, the video, and, you know, everything that you can imagine extracting from the post. And then after the post is made, we have the behaviour that users exhibit. So, this can be the reactions like the likes and the loves and the hahas. It can be the reshares, it can be the actual comments that people write there, which can often give you a lot of the intended meaning behind the post, and you can see coordinated activities again. So, you can see people, or organisations, sets of users, that are trying to propagate the post in an artificial way in order for it to get more attention than it deserves. So, I’m going to be talking mostly about the middle column here, the post itself. I’ll touch a little bit upon the behaviour later. So, let’s go to the next slide. So, basically, what we see here is a tour de force of AI, okay. The latest techniques in natural language processing and computer vision are being applied to try to detect violating posts. So, broadly speaking we have natural language processing. We know language is difficult and we know that 100 languages is even harder. 
We have computer vision techniques that are doing amazing things today, and then they come together in memes, multimodal posts, where the text might be benign, the image might be benign, but when you actually put them together it all of a sudden becomes hate speech. And I’ll show you this in a second. Let’s go to the next slide.

So, this is probably pretty obvious to most of you in natural language processing. So, a lot of the information that you see in a post is what people write. You know, the text in that post. Now, we’ve been very fortunate that in the past few years, the huge strides that the natural language processing community has made, basically with pre-trained language representations, have enabled us to identify a violating post with much higher accuracy. Now, the gist of this, for those of you who aren’t experts in NLP, is that previous techniques in natural language processing were looking at words or phrases. They were brittle; they didn’t understand the sort of context that these words were used in. The nice thing about these pre-trained representations is that they’re actually much better at understanding the context in which a word or a phrase is being used. So, you’re actually able to make a much more accurate prediction of whether the particular use of a word or a phrase is violating or not. The other exciting area here is multilingual translation. So, yeah, you have plenty of training data for English, even with funny accents, like you all have. But there are many languages that have much less content on the internet, and therefore the machine learning systems that are trained with the available data are going to be weaker. But translation systems that actually take a sentence from one language, translate it into some internal representation that is – I’m sort of oversimplifying here – sort of language independent, and then spew it out in a different language, have enabled a lot of detection work to happen with much better accuracy for languages with low resources. So, that’s pretty exciting. One example was Burmese, where obviously in Myanmar a lot of integrity violations happened in the last few years.

Let’s go on to the next slide. Now, an area that is relatively less explored is – it’s not just what you say, it’s how you say it. Okay so, if somebody appears to be angry or somebody appears to be sad or excited or depressed, just that kind of affect can give you a pretty strong signal sometimes of whether something is misinformation or hate speech, to take two examples. So, this is the area known as affective computing and it’s now starting to be applied to detecting integrity violations. Some of the papers are linked here – there have been some research papers showing that taking into consideration affect and subjectivity analysis can help you with detecting whether something is abusive or misinformation. Let’s go to the next slide.

Something that is done less – and part of my agenda in this talk is to try to point to areas that are ripe for research by the community – is that we could also incorporate external knowledge, right. If you’re looking at a particular post and you’re trying to figure out whether it’s violating or not, you’re probably missing a lot of context because you’re just looking at the post itself. So, you can imagine increasing levels of using external knowledge. One is you look at a post and you try to find some other post that you’ve already either removed or decided is fine, and see whether it’s really similar to it. So, it’s not really external knowledge, but it’s previous knowledge, and as I was saying with misinformation, we do that and that’s actually a very powerful technique. Another thing you can do – so suppose you’re talking about misinformation – is you can actually look at a post and try to say okay, what is the statement that this person is making? And then you can check that against a database or a knowledge base of known factual knowledge, and if it doesn’t match, or if it contradicts it, then you say this is not good. That’s nice, but the problem is – first of all, extracting a claim is not easy, that’s a huge problem in itself. And the second is that a lot of these claims are pretty complicated. So, a claim doesn’t just match one triple in a knowledge graph, which makes these techniques a little less powerful. The third vision, which I would love to see happen someday, is – you know, all these posts are made in a particular context, right, a particular context of what we know about the world. Now, a lot of what we know about the world is very subjective, right. You know, group A hates group B, or they’ve been in conflict for the last 500 years, or these terms have been used to refer to people of this protected group, and we know that when you show a picture of this person, that creates a very strong negative emotional reaction in group B. Wouldn’t it be nice if we actually had this sort of knowledge base of culture that, you know, is obviously subjective and not very pleasant to look at sometimes, because it’s a reflection of some bad stuff. But if we could create something like that and bring it to bear on integrity violation decisions, that would be very powerful. But to me, that’s a medium-term challenge. Okay, let’s go on to the next slide.
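To make the first level of external knowledge concrete, comparing a new post against posts that have already been reviewed, here is a minimal sketch. It is not Facebook’s method: the bag-of-words cosine similarity, the 0.8 threshold, and the names `match_reviewed` and `REVIEWED` are all assumptions chosen for illustration, and a production system would more plausibly compare learned embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_reviewed(post, reviewed, threshold=0.8):
    """Return (decision, score) for the most similar reviewed post.

    decision is None when no reviewed post reaches the threshold.
    The threshold value is an assumption, not a recommended setting.
    """
    vec = Counter(tokenize(post))
    best_decision, best_score = None, 0.0
    for text, decision in reviewed:
        score = cosine(vec, Counter(tokenize(text)))
        if score > best_score:
            best_decision, best_score = decision, score
    if best_score >= threshold:
        return best_decision, best_score
    return None, best_score


# Hypothetical previously reviewed posts with their recorded decisions.
REVIEWED = [
    ("miracle tea cures every illness overnight", "remove"),
    ("our cafe serves the best flat white in town", "allow"),
]

decision, score = match_reviewed("Miracle tea cures every illness overnight!!", REVIEWED)
print(decision, round(score, 2))
```

A post that matches nothing already reviewed comes back with a decision of `None`, which is the case where a classifier or a human reviewer would still be needed.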

So, we’re talking about computer vision here. One of these pictures is of fried broccoli and the other one is marijuana. Now, the classifiers we have at Facebook – this is from a couple of years ago – trained on the latest and greatest machine learning and machine vision techniques, are able to distinguish which one is which. If we were in a physical room together I would take a straw vote, and half of you would be wrong as to which one is which. I gave a talk similar to this to teenagers a couple of weeks ago, and they were like, who cares, you should ban broccoli on social media too. I don’t actually – I think the one on the left is the broccoli. Let’s go on.

The point is, computer vision techniques have really come a long way and can distinguish between things that are really hard for humans to tell apart, which is awesome. Next slide.

This is an example of multimodal reasoning. So, this is – and again, this is a sterilised meme – but essentially you have a picture of a cemetery and you have a phrase saying everybody in – you know, fill in your favourite ethnic group – belongs here. There’s nothing wrong with the text. There’s nothing wrong with the image. But when you put those two together, this is clearly hate speech. Okay, now the problem is, you thought vision was hard. You thought NLP was hard. Now when you’re putting these two together, this becomes much harder. So, the area of multimodal reasoning is, I wouldn’t say in its infancy, but it’s certainly way behind each of the individual component technologies. But this is really interesting because it’s not just that you’re fusing the text and the image; you’re also probably taking some background knowledge that is not in the text and not in the image, and putting that together. You’ll see an example of that in a moment. And, you know, it’s actually the case that many hate speech posts are memes, so they do make use of text and images. Let’s go on to the next slide.

I’m not expecting you to look at this in detail, but what we’ve done at Facebook – one of the teams here has built a system called Whole Post Integrity Embeddings, which does two things. One is it looks at all the modalities together, okay. And the second thing is it looks at many violations together. So, there’s a total of something like 50-something violations, and so it’s a multimodal and multi-task learning system. And what it does is it creates as output an embedding for every post, and that embedding is very informative about whether that post violates a particular policy or not. So, you can think of this as really taking all the techniques we have and putting them together into a representation of a post. And I think if you go to the next slide, these are examples of posts that would be detected by this system, and these things are drugs. You can go to the next, but you can see it’s actually pretty subtle to see that these things are drugs. Next slide.
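As a toy illustration of the two ideas in this description, one shared representation built from all modalities and one output per violation type, here is a minimal sketch. It is not the actual Whole Post Integrity Embeddings system: the concatenation-based `fuse`, the four-dimensional features, the policy names, and the hand-set weights in `HEADS` are all hypothetical, and a real system would learn the embedding and the heads jointly from labelled data.

```python
import math


def fuse(text_features, image_features):
    """Toy fusion: concatenate the per-modality feature vectors."""
    return list(text_features) + list(image_features)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_policies(embedding, heads):
    """Apply one linear head per policy to the same shared embedding."""
    scores = {}
    for policy, (weights, bias) in heads.items():
        logit = sum(w * x for w, x in zip(weights, embedding)) + bias
        scores[policy] = sigmoid(logit)
    return scores


# Hypothetical hand-set heads over [text_0, text_1, image_0, image_1].
# The hate_speech head only crosses 0.5 when a text cue AND an image cue
# are both present, mirroring the "benign text + benign image" meme case.
HEADS = {
    "hate_speech": ([2.0, 0.0, 2.0, 0.0], -3.0),
    "drugs": ([0.0, 1.5, 0.0, 2.5], -2.0),
}

both = score_policies(fuse([1.0, 0.0], [1.0, 0.0]), HEADS)
text_only = score_policies(fuse([1.0, 0.0], [0.0, 0.0]), HEADS)
print(both["hate_speech"] > 0.5, text_only["hate_speech"] > 0.5)
```

The design point is that every policy head reads the same fused vector, so a signal that only appears when text and image are combined is available to all of the violation classifiers at once.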

So, this is an example – I’m going back to the multimodal challenge – if you look at the picture on the left, right, love the way you smell today. This is not hate speech, but it’s not a very nice thing to say either. This is taken from the Hateful Memes data set, which is a great data set if you want to do some research on hate speech. But again, you’re seeing, you know, love the way you smell today: there’s nothing wrong with the text. There’s nothing wrong with a picture of a skunk. But when you take those two together – again, it’s not hate speech, but it’s not very kind. The point here is that you’re also incorporating the fact that skunks don’t smell very well, okay, and so you’re actually fusing the text, the image, and some background knowledge that is present somewhere else. How you bring all that into an AI system is, again, a challenge that everybody’s still working on. Let’s go to the next slide.

You know, let’s skip this part. Just one more slide, and one more, yeah. This is where I want to be. So, I hope that I’ve convinced you that integrity is a hard technical problem. It’s a hard societal problem. It’s a hard policy problem. It’s a hard problem, and currently, in the work that I’m thinking about personally at Facebook, I’m trying to put integrity in the bigger context of aligning recommender systems and human values. There are a lot of other challenges that I didn’t talk about today. One is, you know, privacy is all the rage, obviously for good reasons, on social networks and messaging – how do you protect users while still preserving their privacy? And privacy can be just that you’re not allowed to look at their data, but you’re also not allowed to try to infer what ethnic group they might belong to.

Okay, we have a human workforce who are looking at posts every day and making judgments about whether they’re violating or not. That’s not a very fun job. How do you do that? First of all, how do you make it efficient so you actually give them only the posts that are possibly violating, and how do you make sure that their well-being is not damaged during this unpleasant job? The other law or rule of integrity is that it changes all the time, so we have new policies all the time. We have new things happening in the world: we have Covid, and we have misinformation about Covid, and all these things that are happening in elections and what have you. So, change is the rule, the lay of the land. It’s happening all the time. But the good news is, if you’re looking at it from the perspective of AI – especially if you’ve been in AI for a long time – it’s inspiring to see that AI is playing a huge, critical role in solving, or partially addressing, a societal problem, and maybe this can give us inspiration for other societal problems where AI can play a role as well. So, I’ll leave you with that parting thought. Thank you.

Prof Flora Salim:

Great. Thank you so much. So, lots of applause. Just wondering if anyone has a question? Ariana?

Participant 3:

Yeah, can you hear me?

Prof Flora Salim:

I can.

Participant 3:

Alright cool. Yeah, I have two questions regarding Facebook’s current approaches to humour. So, a bit more about the work that you guys are doing with image memes. My first question would be: in the example that you gave with the image meme that says love the way you smell today, there’s the complexity of the text and the image. I was wondering how your automated approaches also work with the post that goes with the image meme? So, we have the image, we have the text superimposed on the image, but also the hate speech that could be perceived when we read the image meme together with the written post on Facebook. So, I wonder how you work with this extra layer of complexity?

And my second question is regarding your approaches to satire. So, I know that there are some automated approaches that look at satire from the vantage point of truth and misinformation. So, I don’t know if you would just label the audience post as satirical, but I wonder how you approach satire as a social commentary that can be harmful, especially with regard to voice and live videos, and these kinds of ironic jokes that work to undermine vulnerable communities? So, yeah. Those are my two questions.

Alon Halevy:

Yes. Let me start from the second one. That’s a really good question. In fact, I think there was some analysis that showed that a lot of the errors in misinformation are in places where misinformation and satire look very similar, okay, and so actually trying to identify satire is a hard problem that people are working on. But your question is actually deeper, which is: hold on, some satire could be considered harmful, yeah. And that’s a really difficult problem. Now, I don’t have a great answer for that. The place where this would be resolved, and I say resolved in quotes, is in the way the policies are phrased, okay. I think that if you can show that the post is hateful, then even if it’s funny, that doesn’t help you. But – don’t quote me on this – there’s a much deeper issue, right. I mean, going back to grade school, there are a lot of jokes about different populations that are funny because they are about that population, right. I don’t have to give you examples and embarrass myself in front of all of you. And I don’t know that there is a simple way of resolving that.

Participant 3:

No, there isn’t. But the question would be – so let me rephrase it. So, there’s no automated response to this at the moment, I guess, for these more complex cases. So, it would be rather at the policy level, or if users appeal this kind of content to the oversight board. Is there like – or do you have a team, sorry?

Alon Halevy:

The appeals first go to Facebook itself. The oversight board is like the supreme court – not a supreme court, but it’s a different body. So, the vast majority of appeals go to Facebook itself. Facebook will look at the appeal and compare it against the policy. And if the classifier or the humans were wrong, it will reverse the decision.

Participant 3:

Yeah, okay, yeah. So, my question would be then, whether you have someone working on these issues now within your department, identifying this with some sort of automation, for these difficult cases. Or if it’s something where you would say, well, this is really difficult to automate so we don’t have any team working on these issues.

Alon Halevy:

I don’t exactly know everything that’s going on, so I can’t tell you whether somebody’s working on it or not. I just know that it’s been pointed out through some investigations that this is actually a major issue – and it’s not surprising, right, that misinformation or hate speech and humor are often ambiguous. The cases that they looked at were pretty clear on which side they fell. But I mean, even if you look at something from The Onion – The Onion is an American website that is a satire site – it’s got a political leaning, right. It’s clear that it has a political leaning. And in fact many satire sites do. So, do you disallow them?

Prof Flora Salim:

Yeah, I mean there’s no simple answer. Yeah, there are two hands. So, Danula, do you want to raise the question?

Participant 4:

Thanks Alon, really interesting talk, and it’s definitely a very complex problem. So, my question is on information presentation and user engagement strategies that you could use once you detect that particular content is borderline or is misinformation. Let’s say there’s a bit of work around, like, credibility indicators. And also nudging users to think about accuracy, etc. And also if you take platforms like Twitter, they flag content, with the flag saying that this content might be misinformation. So, I’m just curious about Facebook’s take on that, and also whether you have any ongoing research in that area?

Alon Halevy:

So, I think all these techniques, what you described from Twitter and other techniques, are all on the table. Okay, they’re all under discussion. Some of them are implemented in certain contexts. For example, another technique might be making it harder to reshare. So, if there’s some piece of content that is borderline or downright violating, then resharing will be made harder to do. So, all these techniques are possibilities. I can’t speak for the current state of which ones are being planned or under investigation, and they’re not always very easy to implement either. Take some of the stuff that Facebook has done around Covid information, for example. If you have a Covid-related post, you will immediately see a post after that which has authoritative information about Covid. The company decided that it was important enough to provide the right information. And the same thing was done around the elections, the 2020 elections. Whenever there was a post around the elections, there was a status post around what we currently know about the results, trying to provide a common reality to the users. I also want to point out one other thing. There’s a fraction, I don’t know what size fraction, but there’s a pretty sizable fraction of integrity violations that are not intentional. People don’t intend to harm others. They just speak in ways that are, you know, unacceptable. And when you take down their post and you explain to them that this was violating, they don’t repeat it. So, it’s not like the world is full of really bad people just trying to trick our systems all the time. It’s also an education problem.

Prof Flora Salim:

Thank you. I’m just wondering – oh, how’d you go about time? Do you still have a bit of time?

Alon Halevy:

Yeah, I do.

Prof Flora Salim:

So Alon, would you like to take a question?

Alon Halevy:

Sure.

Participant 5:

Thank you. Thank you Alon, for the great talk. I have a couple of questions but I decided to go with just two, and one of them is actually very much aligned, I think, with the previous question from Danula. When you were showing the three types of enforcement, I was thinking, is there any kind of systematic way of providing explanations to users, instead of just cutting the content or removing it? Just trying to make users aware of why these decisions have been made, to help them understand a bit better. And when you said that maybe there are some talks about making content more difficult to share, I was thinking that everything we would potentially need to decide in such a context is somehow like social engineering. But the second question I have is, is there any desire to run long-term studies of the impact of our decisions, or the decisions made at Facebook, and basically the social impacts of them?

So, just to give you a bit of context. Yesterday I came across a paper which was talking about flags in social media and how they’re structuring the language of users, how they’re limiting how we express our concerns, and how they potentially can change how we are thinking and structuring our ideas around those flags. So, is there any desire to think about the long-term impacts of any of these decisions that we are making?

Alon Halevy:

There is a desire. You know, there’s a concern in the media, there’s a concern among people around the world, about the effects of social media. All kinds of negative effects. And people at Facebook are extremely concerned about this, and there are a multitude of efforts trying to understand those effects and to do something about them. To answer your question about explainability: at the bottom, at the fundamental level, right, a lot of these systems that are doing the classification are deep learning systems. Which means that by and large, you don’t exactly know what’s going on there. Now that’s only part of the story, right. When the system decides to remove something, it does tell you which violation it falls under. Okay, so it can tell you that this is hate speech.

Now, going to the next level, which is telling you this is hate speech because it’s saying something: here’s the phrase that was used, and this is a way of expressing hate towards this particular group. There you need the AI system to be more explainable, and people are working on that, trying to build systems that will look at the AI decisions and give some sort of partial explanation. But at a basic level, when something gets removed it is flagged for a particular violation, and that explanation may not be detailed enough in many cases.

So, now long-term studies. You know, the world is changing so quickly, and there’s another point here, right. I mean, Facebook or other social media, we’re not working in a vacuum, right. There’s a lot of stuff going on in the world. There are a lot of events and there are a lot of conflicts and social issues. And so at some point, how do you measure the real effect that social media is having on the real world? It’s having some effect. In some cases it’s minor, in some cases it’s a little bigger. But actually figuring out what the real effect on the world is, is a tremendously hard problem, right, because even if you’re an avid Facebook user, you might be spending, I don’t know, some number of minutes on Facebook per day. You’ve got the other parts of your day, and the other people who are affecting your mood and your thoughts, and the other media that you’re looking at. So, how do you do the credit assignment, or blame assignment, or whatever you call it? How can you really point to a particular tool, like a social media platform, and blame that for something? I’m not saying we shouldn’t be studying this. I’m just saying that’s a really hard problem.

Prof Flora Salim:

Great, I think there’s one final question from Mark Andrejevic?

Participant 6:

Thanks so much. Thanks for a super interesting talk. I had a quick question from early on, and I hope I didn’t miss the answer. I had to drop out for a little bit, but you mentioned integrity issues with advertising as well. And I know that Facebook has had some issues, and actually had to pay a fine for allowing certain types of advertising, and I’m wondering if you could talk about what integrity initiatives are underway in terms of advertising? And also there were some external efforts – I think it was ProPublica that did some of the work pointing out that ads could be targeted on protected categories for housing and jobs, and so on. So what are the key problems with integrity and advertising, and what initiatives are at work on them?

Alon Halevy:

Yeah, I mean I think the main issues – some of the issues are trying to sell stuff you’re not supposed to sell, like drugs, like guns, okay. Mis-advertising, saying that you’re selling something but actually not. There are a bunch of things about political ads. So, for example, since a few years ago, political ads go through much more scrutiny. If you’re going to put a political ad on Facebook, you need to say who you are, what organisation you are, who’s funding you, a bunch of stuff. And in fact, there is a database where anybody can go and look at all the political ads that went on Facebook, and some stats about them. So, a lot of people try to circumvent that, right. They try to put up an ad that they don’t say is political, so that they don’t have to do all this, and still get through. So, that’s another issue. And if you go one level up, right, what is a political ad? There are so many social issues. If I’m talking about climate change, is that political? Or if I’m trying to sell solar panels, is that? You could say, well, you’re arguing about climate change or something like that. So, the boundary between what is a social issue and what is a political ad is also something that is kind of hard to navigate sometimes. So, those are some of the issues. I don’t know enough about the targeting. I mean, if you go on Facebook and try to create an ad, you can see exactly what your options are: you can select an age range and you can select maybe an area of the world or a city where you want to advertise. But you can’t ask to advertise stuff to Republicans or to black people. You can’t do that.

Prof Flora Salim:

Thank you. Great. Thank you Alon. So, I just have a very final question, and maybe a slightly more technical one. I really like the final slide there where you talk about the difficulty in identifying hateful messages from multimodal signals, and also from the context, even from those meme examples. So, I’m just wondering, how would Facebook – you know, where will the ground truth come from? Do you actually then need to have lots of people annotating this, or how do you get all this historical background information that could be used to train a model to understand this context and background?

Alon Halevy:

No, I mean, so the ground truth comes from labelling. So, we have people labelling posts and we know what is violating and what isn’t. Not only that, because of a variety of rules we can’t keep these annotations forever. So, we need to throw out data after a while, and that means we need to relabel. In fact, when Covid started and everybody went home, a lot of the people who were doing the labelling work couldn’t work for a certain amount of time, and that created a dearth of training data. We recovered, but this is the reality of machine learning today: you need training data everywhere. One could hope that you could create resources that are sort of there forever, right. But that’s something still on the agenda.

Prof Flora Salim:

Great, alright. So, I think it’s time to wrap up the talk. So, I’d just like to thank you Alon, for giving your time in the evening, and hope you have a nice week. And thank you. We’ll have this talk posted on our YouTube channel and we’ll share it with you once it’s done. Thank you.

Alon Halevy:

Thank you. Thanks for all the great questions. Okay, bye, bye.