The traditional distinction between voice and RF applications is disappearing as old-school voice-directed warehouse applications give way to multi-modal voice applications that combine voice direction, speech recognition, barcode scanning, and device displays. This transition started almost 10 years ago, but it has accelerated in recent years as a byproduct of GS1 data standardization initiatives and the drive for better product tracking and traceability within and beyond the four walls of the distribution center (DC). Multi-modal voice applications provide greater flexibility and efficiency in capturing product-level data, helping DCs meet traceability objectives without adding to costs.
Just as no two DCs are alike, not all “multi-modal” voice solutions are created equal. For example, some solutions offer voice direction and scanning, but do not include speech recognition. Others may offer complete voice capabilities with barcode scanning, but require a special-purpose hardware device that lacks a screen and keyboard (for log-in or for displaying product images). True multi-modal solutions don’t require DCs to purchase dedicated hardware or to limit their data capture and display options. Rather, they permit the interchangeable use of speech recognition and barcode scanning, and the selective use of device displays, in addition to key and touchscreen entry, on a broad choice of standard multi-purpose hardware platforms.
As described in this article, the genesis of multi-modal voice applications was tied to hardware technology developments in the early 2000s, but the compelling business driver for multi-modal processes is related to product tracking and traceability initiatives.
From voice-only or scan-only to multi-modal
Foodservice and grocery distributors were among the early adopters of voice-directed warehouse applications as a highly accurate, efficient, and ergonomic means for order selection and other warehouse processes. In the typical voice process, selectors are directed by voice and confirm their activity by speaking a location- or item-based check string (typically a two- or three-digit number) as they grab items from a location. The voice system recognizes the user’s speech using advanced recognition technology running on a mobile, belt-worn computing device.
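The check-string confirmation described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the function names, the digit vocabulary, and the sample check strings are all assumptions, and the speech recognizer is assumed to have already turned the selector's speech into text.

```python
# Hypothetical sketch of check-string confirmation in a voice-directed pick.
# All names and values here are illustrative assumptions.

# Assumed vocabulary: spoken digit words mapped to digit characters.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def normalize_spoken_digits(utterance: str) -> str:
    """Convert recognized text such as 'four seven' into the string '47'."""
    digits = []
    for token in utterance.lower().split():
        if token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
        elif token.isdigit():
            digits.append(token)
        else:
            raise ValueError(f"not a digit: {token!r}")
    return "".join(digits)


def confirm_location(expected_check: str, utterance: str) -> bool:
    """Return True only if the spoken check string matches the expected one."""
    try:
        return normalize_spoken_digits(utterance) == expected_check
    except ValueError:
        # Anything that is not a digit sequence is treated as a failed confirmation.
        return False
```

For example, `confirm_location("47", "four seven")` accepts the pick, while `confirm_location("47", "four eight")` rejects it so the system can re-prompt the selector.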
Early voice applications used single-purpose, voice-only “appliances” that were built specifically for speech recognition and did not have a screen or scanner. Those voice-only solutions were often adopted as a replacement for, or an alternative to, RF systems running on multi-purpose wearable or handheld devices.
The clean separation between voice and RF solutions started to break down in the middle of the last decade when the major manufacturers of rugged mobile computers used for RF upgraded and optimized their hardware to support speech recognition. Since then, standard, multi-modal mobile computers have steadily taken a larger share of the total hardware market for voice applications.
More and more voice solutions have been delivered on multi-modal hardware, but not all of these applications utilize scanning, screens or keypads. Similarly, some RF/scanning systems have incorporated voice direction in their solutions, but they do not utilize speech recognition; instead they rely on barcode scanning (and key entry) to confirm activity and capture information.
Although voice-only or scan-only approaches work for capturing item-level information, neither technology is best for every situation. As a result, any DC that relies on one technology to the exclusion of the other may be settling for a sub-optimal solution. The operational benefits of a multi-modal approach using both voice and scanning are accentuated as data capture requirements increase, due in large part to new product traceability and GS1 data standards initiatives.