Evaluations
We evaluate AI and robot systems in terms of their risks and social impact.
Relevant publications
- R. Azeem, A. Hundt, M. Mansouri, and M. Brandao, “LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions,” arXiv preprint arXiv:2406.08824, Jun. 2024.
[arXiv]
#fairness
#safety
Members of the Human-Robot Interaction (HRI) and Artificial Intelligence (AI) communities have proposed Large Language Models (LLMs) as a promising resource for robotics tasks such as natural language interactions, doing household and workplace tasks, approximating ‘common sense reasoning’, and modeling humans. However, recent research has raised concerns about the potential for LLMs to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications. To address these concerns, we conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs. Our evaluation reveals that LLMs currently lack robustness when encountering people across a diverse range of protected identity characteristics (e.g., race, gender, disability status, nationality, religion, and their intersections), producing biased outputs consistent with directly discriminatory outcomes – e.g. ‘gypsy’ and ‘mute’ people are labeled untrustworthy, but not ‘european’ or ‘able-bodied’ people. Furthermore, we test models in settings with unconstrained natural language (open vocabulary) inputs, and find they fail to act safely, generating responses that accept dangerous, violent, or unlawful instructions – such as incident-causing misstatements, taking people’s mobility aids, and sexual predation. Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes and ensure LLMs only operate on robots when it is safe, effective, and just to do so. Data and code will be made available.
- W. Wu, F. Pierazzi, Y. Du, and M. Brandao, “Characterizing Physical Adversarial Attacks on Robot Motion Planners,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024.
[PDF]
#safety
As the adoption of robots across society increases, so does the importance of considering cybersecurity issues such as vulnerability to adversarial attacks. In this paper we investigate the vulnerability of an important component of autonomous robots to adversarial attacks - robot motion planning algorithms. We particularly focus on attacks on the physical environment, and propose the first such attacks to motion planners: "planner failure" and "blindspot" attacks. Planner failure attacks make changes to the physical environment so as to make planners fail to find a solution. Blindspot attacks exploit occlusions and sensor field-of-view to make planners return a trajectory which is thought to be collision-free, but is actually in collision with unperceived parts of the environment. Our experimental results show that successful attacks need only to make subtle changes to the real world, in order to obtain a drastic increase in failure rates and collision rates - leading the planner to fail 95% of the time and collide 90% of the time in problems generated with an existing planner benchmark tool. We also analyze the transferability of attacks to different planners, and discuss underlying assumptions and future research directions. Overall, the paper shows that physical adversarial attacks on motion planning algorithms pose a serious threat to robotics, which should be taken into account in future research and development.
- N. W. Alharthi and M. Brandao, “Physical and Digital Adversarial Attacks on Grasp Quality Networks,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024.
[Code]
[PDF]
#safety
Grasp Quality Networks are important components of grasping-capable autonomous robots, as they allow them to evaluate grasp candidates and select the one with highest chance of success. The widespread use of pick-and-place robots and Grasp Quality Networks raises the question of whether such systems are vulnerable to adversarial attacks, as that could lead to large economic damage. In this paper we propose two kinds of attacks on Grasp Quality Networks, one assuming physical access to the workspace (to place or attach a new object) and another assuming digital access to the camera software (to inject a pixel-intensity change on a single pixel). We then use evolutionary optimization to obtain attacks that simultaneously minimize the noticeability of the attacks and the chance that selected grasps are successful. Our experiments show that both kinds of attack lead to drastic drops in algorithm performance, thus making them important attacks to consider in the cybersecurity of grasping robots.
- Z. Zhou and M. Brandao, “Noise and Environmental Justice in Drone Fleet Delivery Paths: A Simulation-Based Audit and Algorithm for Fairer Impact Distribution,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.
[Code]
[PDF]
#fairness
#wellbeing
Despite the growing interest in the use of drone fleets for delivery of food and parcels, the negative impact of such technology is still poorly understood. In this paper we investigate the impact of such fleets in terms of noise pollution and environmental justice. We use simulation with real population data to analyze the spatial distribution of noise, and find that: 1) noise increases rapidly with fleet size; and 2) drone fleets can produce noise hotspots that extend far beyond warehouses or charging stations, at levels that lead to annoyance and interference of human activities. This, we will show, leads to concerns of fairness of noise distribution. We then propose an algorithm that successfully balances the spatial distribution of noise across the city, and discuss the limitations of such purely technical approaches. We complement the work with a discussion of environmental justice, showing how careless UAV fleet development and regulation can lead to reinforcing well-being deficiencies of poor and marginalized communities.
- R. Eifler, M. Brandao, A. Coles, J. Frank, and J. Hoffmann, “Evaluating Plan-Property Dependencies: A Web-Based Platform and User Study,” in Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 2022.
[DOI]
[PDF]
#transparency
The trade-offs between different desirable plan properties - e.g. PDDL temporal plan preferences - are often difficult to understand. Recent work addresses this by iterative planning with explanations elucidating the dependencies between such plan properties. Users can ask questions of the form ‘Why does the plan not satisfy property p?’, which are answered by ‘Because then we would have to forego q’. It has been shown that such dependencies can be computed reasonably efficiently. But is this form of explanation actually useful for users? We run a large crowd-worker user study (N = 100 in each of 3 domains) evaluating that question. To enable such a study in the first place, we contribute a Web-based platform for iterative planning with explanations, running in standard browsers. Comparing users with vs. without access to the explanations, we find that the explanations enable users to identify better trade-offs between the plan properties, indicating an improved understanding of the planning task.
- M. Brandao, M. Mansouri, A. Mohammed, P. Luff, and A. Coles, “Explainability in Multi-Agent Path/Motion Planning: User-study-driven Taxonomy and Requirements,” in International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2022, pp. 172–180.
[PDF]
#transparency
Multi-Agent Path Finding (MAPF) and Multi-Robot Motion Planning (MRMP) are complex problems to solve, analyze and build algorithms for. Automatically-generated explanations of algorithm output, by improving human understanding of the underlying problems and algorithms, could thus lead to better user experience, developer knowledge, and MAPF/MRMP algorithm designs. Explanations are contextual, however, and thus developers need a good understanding of the questions that can be asked about algorithm output, the kinds of explanations that exist, and the potential users and uses of explanations in MAPF/MRMP applications. In this paper we provide a first step towards establishing a taxonomy of explanations, and a list of requirements for the development of explainable MAPF/MRMP planners. We use interviews and a questionnaire with expert developers and industry practitioners to identify the kinds of questions, explanations, users, uses, and requirements of explanations that should be considered in the design of such explainable planners. Our insights cover a diverse set of applications: warehouse automation, computer games, and mining.
- M. Brandao, G. Canal, S. Krivic, P. Luff, and A. Coles, “How experts explain motion planner output: a preliminary user-study to inform the design of explainable planners,” in IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2021, pp. 299–306.
[DOI]
[PDF]
#transparency
Motion planning is a hard problem that can often overwhelm both users and designers: due to the difficulty in understanding the optimality of a solution, or reasons for a planner to fail to find any solution. Inspired by recent work in machine learning and task planning, in this paper we are guided by a vision of developing motion planners that can provide reasons for their output - thus potentially contributing to better user interfaces, debugging tools, and algorithm trustworthiness. Towards this end, we propose a preliminary taxonomy and a set of important considerations for the design of explainable motion planners, based on the analysis of a comprehensive user study of motion planning experts. We identify the kinds of things that need to be explained by motion planners ("explanation objects"), types of explanation, and several procedures required to arrive at explanations. We also elaborate on a set of qualifications and design considerations that should be taken into account when designing explainable methods. These insights contribute to bringing the vision of explainable motion planners closer to reality, and can serve as a resource for researchers and developers interested in designing such technology.
- R. Eifler, M. Brandao, A. Coles, J. Frank, and J. Hoffmann, “Plan-Property Dependencies are Useful: A User Study,” in ICAPS 2021 Workshop on Explainable AI Planning (XAIP), 2021.
[PDF]
#transparency
The trade-offs between different desirable plan properties - e.g. PDDL temporal plan preferences - are often difficult to understand. Recent work proposes to address this by iterative planning with explanations elucidating the dependencies between such plan properties. Users can ask questions of the form ‘Why does the plan you suggest not satisfy property p?’, which are answered by ‘Because then we would have to forego q’ where not-q is entailed by p in plan space. It has been shown that such plan-property dependencies can be computed reasonably efficiently. But is this form of explanation actually useful for users? We contribute a user study evaluating that question. We design use cases from three domains and run a large user study (N = 40 for each domain, ca. 40 minutes work time per user and domain) on the internet platform Prolific. Comparing users with vs. without access to the explanations, we find that the explanations tend to enable users to identify better trade-offs between the plan properties, indicating an improved understanding of the task.
- M. Brandao, G. Canal, S. Krivic, and D. Magazzeni, “Towards providing explanations for robot motion planning,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 3927–3933.
[DOI]
[PDF]
#transparency
Recent research in AI ethics has put forth explainability as an essential principle for AI algorithms. However, it is still unclear how this is to be implemented in practice for specific classes of algorithms - such as motion planners. In this paper we unpack the concept of explanation in the context of motion planning, introducing a new taxonomy of kinds and purposes of explanations in this context. We focus not only on explanations of failure (previously addressed in motion planning literature) but also on contrastive explanations - which explain why a trajectory A was returned by a planner, instead of a different trajectory B expected by the user. We develop two explainable motion planners, one based on optimization, the other on sampling, which are capable of answering failure and contrastive questions. We use simulation experiments and a user study to motivate a technical and social research agenda.
- J. Grzelak and M. Brandao, “The Dangers of Drowsiness Detection: Differential Performance, Downstream Impact, and Misuses,” in AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2021.
[DOI]
[PDF]
#fairness
Drowsiness and fatigue are important factors in driving safety and work performance. This has motivated academic research into detecting drowsiness, and sparked interest in the deployment of related products in the insurance and work-productivity sectors. In this paper we elaborate on the potential dangers of using such algorithms. We first report on an audit of performance bias across subject gender and ethnicity, identifying which groups would be disparately harmed by the deployment of a state-of-the-art drowsiness detection algorithm. We discuss some of the sources of the bias, such as the lack of robustness of facial analysis algorithms to face occlusions, facial hair, or skin tone. We then identify potential downstream harms of this performance bias, as well as potential misuses of drowsiness detection technology - focusing on driving safety and experience, insurance cream-skimming and coverage-avoidance, worker surveillance, and job precarity.
- M. Brandao, “Fair navigation planning: a humanitarian robot use case,” in KDD 2020 Workshop on Humanitarian Mapping, 2020.
[arXiv]
[PDF]
#fairness
In this paper we investigate potential issues of fairness related to the motion of mobile robots. We focus on the particular use case of humanitarian mapping and disaster response. We start by showing that there is a fairness dimension to robot navigation, and use a walkthrough example to bring out design choices and issues that arise during the development of a fair system. We discuss indirect discrimination, fairness-efficiency trade-offs, the existence of counter-productive fairness definitions, privacy and other issues. Finally, we conclude with a discussion of the potential of our methodology as a concrete responsible innovation tool for eliciting ethical issues in the design of autonomous systems.
- M. Brandao, M. Jirotka, H. Webb, and P. Luff, “Fair navigation planning: a resource for characterizing and designing fairness in mobile robots,” Artificial Intelligence (AIJ), vol. 282, 2020.
[DOI]
[PDF]
#fairness
In recent years, the development and deployment of autonomous systems such as mobile robots have been increasingly common. Investigating and implementing ethical considerations such as fairness in autonomous systems is an important problem that is receiving increased attention, both because of recent findings of their potential undesired impacts and a related surge in ethical principles and guidelines. In this paper we take a new approach to considering fairness in the design of autonomous systems: we examine fairness by obtaining formal definitions, applying them to a system, and simulating system deployment in order to anticipate challenges. We undertake this analysis in the context of the particular technical problem of robot navigation. We start by showing that there is a fairness dimension to robot navigation, and we then collect and translate several formal definitions of distributive justice into the navigation planning domain. We use a walkthrough example of a rescue robot to bring out design choices and issues that arise during the development of a fair system. We discuss indirect discrimination, fairness-efficiency trade-offs, the existence of counter-productive fairness definitions, privacy and other issues. Finally, we elaborate on important aspects of a research agenda and reflect on the adequacy of our methodology in this paper as a general approach to responsible innovation in autonomous systems.
- M. Brandao, “Age and gender bias in pedestrian detection algorithms,” in Workshop on Fairness Accountability Transparency and Ethics in Computer Vision, CVPR, 2019.
[Dataset]
[arXiv]
[PDF]
#fairness
#safety
In this paper we evaluate the age and gender bias in state-of-the-art pedestrian detection algorithms. These algorithms are used by mobile robots such as autonomous vehicles for locomotion planning and control. Therefore, performance disparities could lead to disparate impact in the form of biased crash outcomes. Our analysis is based on the INRIA Person Dataset extended with child, adult, male and female labels. We show that all of the 24 top-performing methods of the Caltech Pedestrian Detection Benchmark have higher miss rates on children. The difference is significant and we analyse how it varies with the classifier, features and training data used by the methods. Algorithms were also gender-biased on average but the performance differences were not significant. We discuss the source of the bias, the ethical implications, possible technical solutions and barriers to "solving" the issue.