Get a Grip: Intel Neuromorphic Chip Used to Give Robotics Arm a Sense of Touch

This novel robotic system developed by NUS researchers comprises an artificial brain system that mimics biological neural networks, which can be run on a power-efficient neuromorphic processor such as Intel’s Loihi chip, and is integrated with artificial skin and vision sensors. Credit: National University of Singapore

By John Russell

Moving neuromorphic technology from the laboratory into practice has proven slow-going. This week, National University of Singapore (NUS) researchers moved the needle forward by demonstrating an event-driven, visual-tactile perception system that uses Intel’s Loihi chip to control a robotic arm combining tactile sensing and vision. Notably, they also ran the exercise on a GPU system and reported that the Loihi-based system performed slightly better and at much lower power.

NUS researchers presented their results today at the virtual Robotics: Science and Systems conference being held this week. The combination of tactile sensing (grip) with vision (location) is expected to significantly enhance robotic arm precision and delicacy of grip when handling objects. The use of neuromorphic technology also promises progress in efforts to reduce the power consumption required for robotics, which is a central goal of the field.

“We’re excited by these results. They show that a neuromorphic system is a promising piece of the puzzle for combining multiple sensors to improve robot perception. It’s a step toward building power-efficient and trustworthy robots that can respond quickly and appropriately in unexpected situations,” said Harold Soh, an NUS professor and an author of a paper describing the work (Event-Driven Visual-Tactile Sensing and Learning for Robots).

Intel has long been at the forefront of efforts to commercialize neuromorphic technology, and its Loihi (chip)/Pohoiki (system) is among the most developed platforms. Neuromorphic systems mimic natural systems such as the brain in that they use spiking neural networks (SNN) to process information instead of the artificial neural networks (ANN) more commonly used in machine/deep learning.

Mike Davies, director of Intel’s Neuromorphic Computing Lab, said, “This research from National University of Singapore provides a compelling glimpse to the future of robotics where information is both sensed and processed in an event-driven manner combining multiple modalities. The work adds to a growing body of results showing that neuromorphic computing can deliver significant gains in latency and power consumption once the entire system is re-engineered in an event-based paradigm spanning sensors, data formats, algorithms, and hardware architecture.” Intel also posted an account of the work.

This excerpt from the NUS paper nicely describes the challenge and contribution:

“Many everyday tasks require multiple sensory modalities to perform successfully. For example, consider fetching a carton of soymilk from the fridge; humans use vision to locate the carton and can infer from a simple grasp how much liquid the carton contains. They can then use their sense of sight and touch to lift the object without letting it slip. These actions (and inferences) are performed robustly using a power-efficient neural substrate; compared to the multi-modal deep neural networks used in current artificial systems, human brains require far less energy.

“In this work, we take crucial steps towards efficient visual-tactile perception for robotic systems. We gain inspiration from biological systems, which are asynchronous and event-driven. In contrast to resource-hungry deep learning methods, event-driven perception forms an alternative approach that promises power-efficiency and low-latency, features that are ideal for real-time mobile robots. However, event-driven systems remain under-developed relative to standard synchronous perception methods.”

The value of multi-modal sensing has long been recognized as an important component for advancing robotics. However, limitations in the use of spiking neural networks have impeded the use of neuromorphic chips in real-time sensing functions.

“Event-based sensors have been successfully used in conjunction with deep learning techniques. The binary events are first converted into real-valued tensors, which are processed downstream by deep ANNs (artificial neural networks). This approach generally yields good models (e.g., for motion segmentation, optical flow estimation, and car steering prediction), but at high compute cost,” write the researchers.

“Neuromorphic learning, specifically Spiking Neural Networks (SNNs), provide a competing approach for learning with event data. Similar to event-based sensors, SNNs work directly with discrete spikes and hence, possess similar characteristics, i.e., low latency, high temporal resolution and low power consumption. Historically, SNNs have been hampered by the lack of a good training procedure. Gradient-based methods such as backpropagation were not available because spikes are non-differentiable. Recent developments in effective SNN training, and the nascent availability of neuromorphic hardware (e.g., IBM TrueNorth and Intel Loihi) have renewed interest in neuromorphic learning for various applications, including robotics. SNNs do not yet consistently outperform their deep ANN cousins on pseudo-event image datasets, and the research community is actively exploring better training methods for real event-data.”

Another obstacle was simply developing adequate tactile sensing devices. “Although there are numerous applications for tactile sensors (e.g., minimal invasive surgery and smart prosthetics), tactile sensing technology lags behind vision. In particular, current tactile sensors remain difficult to scale and integrate with robot platforms. The reasons are twofold: first, many tactile sensors are interfaced via time-divisional multiple access (TDMA), where individual taxels are periodically and sequentially sampled. The serial readout nature of TDMA inherently leads to an increase of readout latency as the number of taxels in the sensor is increased. Second, high spatial localization accuracy is typically achieved by adding more taxels in the sensor; this invariably leads to more wiring, which complicates integration of the skin onto robot end-effectors and surfaces,” according to the paper.
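
The scaling argument about serial readout can be checked with simple arithmetic: if taxels are sampled one after another, one full scan takes roughly the taxel count multiplied by the per-taxel sampling time. The sketch below assumes a hypothetical sampling time of 50 microseconds per taxel; that figure and the function name are illustrative and do not come from the paper.

```python
def tdma_scan_latency_us(num_taxels, sample_time_us=50):
    """Time for one full sequential scan of all taxels, in microseconds.

    sample_time_us is a hypothetical per-taxel sampling time.
    """
    return num_taxels * sample_time_us


if __name__ == "__main__":
    for n in (39, 390, 3900):
        print(n, "taxels ->", tdma_scan_latency_us(n) / 1000, "ms per scan")
```

Under this assumption, latency grows linearly with the number of taxels, so a tenfold denser skin waits ten times longer between successive readings of the same taxel.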

The researchers developed their own novel “neuro-inspired” tactile sensor (NeuTouch): “The structure of NeuTouch is akin to a human fingertip: it comprises ‘skin’, and ‘bone’, and has a physical dimension of 37×21×13 mm. This design facilitates integration with anthropomorphic end-effectors (for prosthetics or humanoid robots) and standard multi-finger grippers; in our experiments, we use NeuTouch with a Robotiq 2F-140 gripper. We focused on a fingertip design in this paper, but alternative structures can be developed to suit different applications,” wrote the researchers.

NeuTouch’s tactile sensing is achieved via a layer of electrodes with 39 taxels and a graphene-based piezoresistive thin film. The taxels are elliptically-shaped to resemble the human fingertip’s fast-adapting (FA) mechano-receptors, and are radially-arranged with density varied from high to low, from the center to the periphery of the sensor.

“During typical grasps, NeuTouch (with its convex surface) tends to make initial contact with objects at its central region where the taxel density is the highest. Correspondingly, rich tactile data can be captured in the earlier phase of tactile sensing, which may help accelerate inference (e.g., for early classification). The graphene-based pressure transducer forms an effective tactile sensor, due to its high Young’s modulus, which helps to reduce the transducer’s hysteresis and response time,” report the researchers.

The primary goal, say the researchers, was to determine if their multi-modal system was effective at detecting differences in objects that were difficult to isolate using a single sensor, and whether the weighted spike-count loss resulted in better early classification performance. “Note that our objective was not to derive the best possible classifier; indeed, we did not include proprioceptive data which would likely have improved results, nor conduct an exhaustive (and computationally expensive) search for the best architecture. Rather, we sought to understand the potential benefits of using both visual and tactile spiking data in a reasonable setup.”
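
The article does not spell out the loss, so the following is only a rough sketch of how weighting spike counts over time could favor early classification. It assumes a squared error between time-weighted output and target spike counts, with a weight that decreases linearly over the time window. The functional form, the weight schedule, and all names are assumptions; the paper should be consulted for the actual definition.

```python
def weighted_spike_count_loss(output_spikes, target_spikes,
                              start_weight=2.0, end_weight=0.5):
    """Squared error between time-weighted spike counts, summed over neurons.

    output_spikes and target_spikes are lists (one entry per output neuron)
    of equal-length 0/1 spike trains. The linear weight schedule is a
    hypothetical choice made for illustration.
    """
    num_steps = len(output_spikes[0])
    # Earlier time steps receive larger weights than later ones.
    step = (end_weight - start_weight) / (num_steps - 1) if num_steps > 1 else 0.0
    weights = [start_weight + step * t for t in range(num_steps)]
    loss = 0.0
    for out_train, tgt_train in zip(output_spikes, target_spikes):
        out_count = sum(w * s for w, s in zip(weights, out_train))
        tgt_count = sum(w * s for w, s in zip(weights, tgt_train))
        loss += 0.5 * (out_count - tgt_count) ** 2
    return loss


if __name__ == "__main__":
    target = [[1, 1, 1, 1]]
    early = [[1, 1, 0, 0]]   # correct neuron fires early
    late = [[0, 0, 1, 1]]    # correct neuron fires late
    print(weighted_spike_count_loss(early, target))
    print(weighted_spike_count_loss(late, target))
```

In this toy case both outputs contain two spikes, but the train that fires early incurs the smaller loss, so training under such a loss would push the correct output neuron to respond sooner.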

They used four different containers: a coffee can, Pepsi bottle, cardboard soy milk carton, and metal tuna can. The robot was used to grasp and lift each object 15 times and classify the object and determine its weight. The multi-modal SNN model achieved the highest score (81 percent), which was about ten percent better than any of the single-mode tests.

In comparing the Loihi neuromorphic chip with the GPU (Nvidia GeForce RTX 2080), the researchers found overall performance broadly similar, but the Loihi-based system used far less power (see table). The latest work is a significant step forward.

It’s best to read the full paper, but here is an overview of the experiment taken from the paper.

Robot Motion. The robot would grasp and lift each object class fifteen times, yielding 15 samples per class. Trajectories for each part of the motion were computed using the MoveIt Cartesian Pose Controller. Briefly, the robot gripper was initialized 10cm above each object’s designated grasp point. The end-effector was then moved to the grasp position (2 seconds) and the gripper was closed using the Robotiq grasp controller (4 seconds). The gripper then lifted the object by 5cm (2 seconds) and held it for 0.5 seconds.

Data Pre-processing. For both modalities, we selected data from the grasping, lifting and holding phases (corresponding to the 2.0s to 8.5s window in Figure 4), and set a bin duration of 0.02s (325 bins) and a binning threshold value Smin = 1. We used stratified K-folds to create 5 splits; each split contained 240 training and 60 test examples with equal class distribution.

Classification Models. We compared the SNNs against conventional deep learning, specifically Multi-layer Perceptrons (MLPs) with Gated Recurrent Units (GRUs) [54] and 3D convolutional neural networks (CNN-3D) [55]. We trained each model using (i) the tactile data only, (ii) the visual data only, and (iii) the combined visual-tactile data. Note that the SNN model on the combined data corresponds to the VT-SNN. When training on a single modality, we use Visual or Tactile SNN as appropriate. We implemented all the models using PyTorch.
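
The timing and binning figures in the excerpt can be cross-checked with a short sketch. The phase durations, the 2.0 s to 8.5 s window, the 0.02 s bin duration, and the threshold of 1 come from the text above. The phase names, the function names, the example event timestamps, and the reading of Smin as "a bin is set to 1 if it holds at least Smin events" are assumptions made for illustration. Times are kept in integer milliseconds to avoid floating-point rounding at bin edges.

```python
# Phase durations in milliseconds, taken from the excerpt (phase names are ours).
PHASES_MS = [("reach", 2000), ("grasp", 4000), ("lift", 2000), ("hold", 500)]

WINDOW_START_MS = 2000   # start of the grasping phase
WINDOW_END_MS = 8500     # end of the holding phase
BIN_MS = 20              # 0.02 s bins
S_MIN = 1                # binning threshold


def phase_boundaries(phases):
    """Return (name, start_ms, end_ms) for each phase in sequence."""
    boundaries, t = [], 0
    for name, duration in phases:
        boundaries.append((name, t, t + duration))
        t += duration
    return boundaries


def bin_events(event_times_ms, start=WINDOW_START_MS, end=WINDOW_END_MS,
               bin_ms=BIN_MS, s_min=S_MIN):
    """Convert event timestamps into a binary spike train, one value per bin."""
    num_bins = (end - start) // bin_ms
    counts = [0] * num_bins
    for t in event_times_ms:
        if start <= t < end:
            counts[(t - start) // bin_ms] += 1
    # Assumed thresholding rule: a bin fires if it holds at least s_min events.
    return [1 if c >= s_min else 0 for c in counts]


if __name__ == "__main__":
    print(phase_boundaries(PHASES_MS))
    # Hypothetical event timestamps (ms) from a single taxel or pixel.
    spikes = bin_events([1990, 2000, 2005, 2030, 8499, 8500])
    print(len(spikes), sum(spikes))
```

The phase durations sum to 8.5 seconds, so the selected window starts when the gripper begins to close and ends when the hold finishes. Dividing that 6.5-second window into 0.02-second bins gives the 325 bins quoted in the excerpt.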