Vision-based Autonomous Control of a 7 DOF Franka Emika Robot Arm

Programmed a 7 DOF Franka Emika Robot Arm in ROS 2 and Python to detect and knock down colored bowling pins.

Overview

In this project, the Franka arm scans the environment for randomly placed bowling pins of known colors (blue, green, red, yellow) and knocks them down. An Intel RealSense D435i depth camera mounted on the arm observes the environment, while a Nerf blaster held in the gripper is aimed to knock over the identified targets. The user specifies the initial color of the bowling pins for the arm to knock over. Once all pins of the selected color are knocked down, the arm pauses and awaits a subsequent input before targeting pins of a different color. The arm can also switch between two Nerf blasters if ammunition runs low during operation.

Control Flow and Nodes

Control: Main node that calls services provided by the other nodes to carry out the overall task.

Shoot: Node that carries out all MoveIt-interface services, such as Cartesian planning, IK planning, and gripper requests.

Yolo: Node that runs YOLO (You Only Look Once) to find the colored bowling pins with respect to the base of the Franka arm and display them as colored markers in Rviz2.

User_Input: Node that listens to the user’s audio input to set the color of the targeted pins.

Trigger: Node that controls the Arduino for the gun’s trigger.

Apriltag_node: Node that scans for AprilTags and provides their coordinates.

The control flow diagram is shown below:

Control Flow Diagram

User Input

User input is captured with the pyaudio package and the Google Web Speech API, which listen for the user's spoken choice of target and transcribe it into a string used to aim at the chosen target. The accepted inputs are limited to red, blue, green, and yellow.
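The sketch below illustrates this step. It assumes the speech_recognition package (which uses pyaudio for microphone access and calls the Google Web Speech API via recognize_google); the exact packaging inside the User_Input node may differ.

```python
import speech_recognition as sr

VALID_COLORS = {"red", "blue", "green", "yellow"}

def listen_for_color():
    """Record one utterance and return a recognized pin color, or None."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        # Google Web Speech API transcription.
        text = recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return None
    # Only accept the four colors the pipeline is trained on.
    for word in text.split():
        if word in VALID_COLORS:
            return word
    return None
```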

Image Pipeline

The image pipeline comprises two major parts: scanning (object detection and classification) and AprilTags. Object detection and classification is done with the YOLOv8 (You Only Look Once) model, a real-time object detection and image segmentation model. The model was trained on more than 300 images spanning five classes: blue_pins, red_pins, yellow_pins, green_pins, and not_pins, and is used to detect the colored bowling pins in the environment. With the Intel RealSense D435i depth camera mounted on the Franka's arm, the depth of each pin as well as its x, y coordinates are calculated with respect to the Franka's base and published as markers in Rviz2.
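The sketch below shows one way to put the pixel-plus-depth to base-frame conversion and marker publishing together: back-project the detection's centroid through the pinhole camera model, then transform the resulting point with tf2. The frame names, intrinsics handling, and topic name are assumptions rather than the project's exact code.

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PointStamped
from visualization_msgs.msg import Marker
import tf2_ros
import tf2_geometry_msgs  # registers PointStamped with tf2's transform()


class PinLocator(Node):
    def __init__(self):
        super().__init__("pin_locator")
        self.tf_buffer = tf2_ros.Buffer()
        self.tf_listener = tf2_ros.TransformListener(self.tf_buffer, self)
        self.marker_pub = self.create_publisher(Marker, "pin_markers", 10)

    def pixel_to_base(self, u, v, depth_m, fx, fy, cx, cy):
        """Back-project a pixel + aligned depth through the pinhole model,
        then transform the point into the Franka base frame."""
        pt = PointStamped()
        pt.header.frame_id = "camera_color_optical_frame"  # assumed frame name
        pt.point.x = (u - cx) * depth_m / fx
        pt.point.y = (v - cy) * depth_m / fy
        pt.point.z = depth_m
        return self.tf_buffer.transform(pt, "panda_link0")  # assumed base frame

    def publish_marker(self, pt_base, rgb, marker_id):
        """Publish a sphere marker at the pin's location for Rviz2."""
        m = Marker()
        m.header.frame_id = "panda_link0"
        m.id = marker_id
        m.type = Marker.SPHERE
        m.action = Marker.ADD
        m.pose.position = pt_base.point
        m.pose.orientation.w = 1.0
        m.scale.x = m.scale.y = m.scale.z = 0.05
        m.color.r, m.color.g, m.color.b = rgb  # floats in [0, 1]
        m.color.a = 1.0
        self.marker_pub.publish(m)
```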

AprilTags are used to detect the positions of the Nerf blasters so that they can be picked up once the pin scanning is done. The AprilTags are located using the apriltag_ros package.
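Since apriltag_ros broadcasts a TF frame for each detected tag, a blaster's pose can be read with a plain tf2 lookup, roughly as sketched below; the tag and base frame names are assumptions.

```python
import tf2_ros
from rclpy.node import Node
from rclpy.duration import Duration
from rclpy.time import Time


class BlasterLocator(Node):
    def __init__(self):
        super().__init__("blaster_locator")
        self.tf_buffer = tf2_ros.Buffer()
        self.tf_listener = tf2_ros.TransformListener(self.tf_buffer, self)

    def blaster_transform(self, tag_frame="tag36h11:0", base_frame="panda_link0"):
        """Return the transform from the Franka base to the tag on a blaster,
        or None if the tag is not currently visible."""
        try:
            return self.tf_buffer.lookup_transform(
                base_frame, tag_frame, Time(), timeout=Duration(seconds=1.0))
        except tf2_ros.TransformException:
            return None
```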

Shown below are the blue and yellow pins published as colored markers in Rviz2.

Rviz markers

Motion Planning

Motion planning is done using the MoveItApi, a Python wrapper for basic MoveIt functionality offered by the MoveIt ROS API. It uses Cartesian planning to pick up the blasters and inverse kinematics for the scanning positions.
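As an illustration only, the planning flow looks roughly like the sketch below; the method names are hypothetical stand-ins for the wrapper's interface, not its actual API.

```python
from geometry_msgs.msg import Pose


async def pick_up_blaster(moveit_api, scan_pose: Pose, grasp_pose: Pose):
    """Hypothetical calls showing where IK vs. Cartesian planning is used."""
    # IK plan to a scanning pose so the camera can see the pins and tags.
    await moveit_api.plan_to_pose(scan_pose, execute=True)            # hypothetical
    # Straight-line Cartesian approach to the blaster's handle.
    await moveit_api.plan_cartesian_path([grasp_pose], execute=True)  # hypothetical
    # Close the Franka gripper around the handle.
    await moveit_api.grasp(width=0.04, force=50.0)                    # hypothetical
```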

Control

The main control node integrates all of the system's functionality. It starts with the user input to select which pins get knocked down. Scanning of the bowling pins comes next, calling the YOLO node to detect the pins and publish markers for them. Once the pins are detected, the Nerf blasters are located and picked up using the gun_pickup service, and the targets are aimed at. The final service call is to the trigger mechanism, which fires the Nerf blaster at the aimed target.
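Condensed into code, the sequence looks roughly like the sketch below; apart from gun_pickup, the service names and message types are assumptions.

```python
import rclpy
from rclpy.node import Node
from std_srvs.srv import Empty, Trigger


class Control(Node):
    def __init__(self):
        super().__init__("control")
        self.scan_client = self.create_client(Empty, "scan_pins")    # assumed name
        self.pickup_client = self.create_client(Empty, "gun_pickup")
        self.fire_client = self.create_client(Trigger, "fire")       # assumed name

    async def run(self, target_color: str):
        """Scan for pins of the chosen color, grab a blaster, aim, and fire.
        Intended to run as an executor task so the futures resolve while spinning."""
        self.get_logger().info(f"Targeting {target_color} pins")
        await self.scan_client.call_async(Empty.Request())    # YOLO node publishes pin markers
        await self.pickup_client.call_async(Empty.Request())  # locate a blaster via AprilTag, pick it up
        await self.fire_client.call_async(Trigger.Request())  # Arduino actuates the blaster trigger
```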

Future Work

  • One of the biggest issues is the success rate of hitting targets: even when the gun is correctly pointed at the target, error from the gun itself can cause a miss. This could be addressed by increasing the gun's accuracy so that a properly aimed shot reliably hits, by firing more darts to raise the chance of a hit, or by moving the arm while shooting to compensate for the gun's error.
  • 360-degree scanning and shooting. Currently, the arm can only scan and shoot targets in front of itself, so the next step would be adding the ability to target in any direction. Further testing of target height limits would also be needed to determine how high and how low a target can be while the arm can still successfully shoot it.
  • Dynamic aiming. The current assumption is that targets are stationary and unable to move, so using the camera to continuously scan for targets would enable two things: a dynamic environment, where targets are moved before the shot but are stationary when hit, and moving targets, whose motion the arm can track while shooting.

The Team

  • Rahul Roy (YOLO, Computer vision, Image Pipeline)
  • Maximiliano Palay
  • Sophia Schiffer
  • Jialu Yu
  • Joel Goh