AI as Human Augmentation – How Smartphone VR Is a Game-Changer for Weak AI and Touchless Technology

Jean-Baptiste Guignard is a Senior Cognitive Scientist at UTC-Sorbonne Universités, where he supervises the work of several Computer Vision and AI PhD candidates. A former invited scholar at Princeton University (Green Hall) and CTO & Head of Research of CLAY – SDK solutions for Gesture Recognition on smartphones – he is also a mentor for the IBM Watson AI XPRIZE.

AI has largely been mystified and is commonly viewed as self-thinking “strong” AI (the Ghost in the Shell fantasy), when operational AI is, and remains, a set of probability tools and classifiers, however complex or bio-inspired. The three waves of AI each carry specific assets and liabilities, but invariably turn out to be “weak”: instructional, predetermined and purpose-oriented. Heuristics, fuzzy logic, expert systems, neural networks, RNNs and their newly patented architectures are as old in the history of Computer Science as they are relevant in today’s entrepreneurial world.
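To make the point concrete, here is a deliberately mundane toy – not from Clay, and with made-up features – showing what “operational AI” amounts to: a classifier that does nothing but map numbers to class probabilities.

```python
# Minimal illustration: operational "AI" as a probability tool.
# The feature names and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy per-frame hand features: [finger_spread, palm_velocity]
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array([0, 0, 1, 1])  # 0 = "open hand", 1 = "swipe"

clf = LogisticRegression().fit(X, y)
# No intellection here - just class probabilities for a new observation.
print(clf.predict_proba([[0.85, 0.15]]))
```

However elaborate the model becomes, its output stays of this kind: a purpose-oriented probability estimate, never an understanding.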

I shall argue instead for a prosthetic conception of AI – if human cognition is embodied, distributed and non-computational, then AI has to be a tool for human augmentation (not imitation or full substitution), the way an instrument is for a musician. As such, it is a full constituent of one’s understanding of daily patterns and routines – one does not see the world by staring at one’s glasses, but only perceives it by looking through them. The metaphor extends to VR, where feeling immersed rests on the (prosthetic) tools one appeals to for adequate perception.

Clay VR is an SDK for gesture recognition on smartphones that works from any embedded lens – no additional hardware required. It is designed for VR and distal control, drastically enriching the user’s interaction possibilities: it displays the user’s own hands, contoured, inside the virtual environment. Such a touchless experience means control without the pains of extra hardware, touch, pointing or remote controls, and it secures immersion by preserving self-perception – but it raises massive technical difficulties. Constrained by the device’s limitations, the approach has to be minimal, and data often has to be rebuilt or inferred. To that end, it centrally relies on Computer Vision and AI: Computer Vision for real-time image interpretation, and AI for learning (from ever-changing capture environments) and automation (scoring, gesture validation, depth processing, etc.).
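To give a sense of what “gesture recognition from any embedded lens” involves at its most basic, here is a generic Python/OpenCV sketch: fixed-threshold skin segmentation plus contour extraction from a single camera. It is emphatically not Clay’s (proprietary, learned) pipeline; the thresholds and the largest-blob heuristic are illustrative assumptions, and the sketch mainly shows why fixed rules break in ever-changing capture environments and why a learning layer is needed. It assumes OpenCV 4.

```python
# Generic single-lens hand-contour sketch (OpenCV 4). Clay's actual
# pipeline is proprietary; the principle shown here is only that every
# signal must be extracted from one ordinary RGB camera.
import cv2

cap = cv2.VideoCapture(0)  # the device's only embedded lens
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Crude skin segmentation in YCrCb space - a stand-in for learned
    # models. Fixed thresholds like these fail as lighting and skin
    # tones vary, which is exactly what the AI layer must compensate for.
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)  # assume largest blob = hand
        cv2.drawContours(frame, [hand], -1, (0, 255, 0), 2)  # contoured hand
    cv2.imshow("hand contour", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```

Note that no depth comes out of a single lens: distance, like much else, has to be inferred downstream, which is where the learning and automation components take over.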

The overall architecture is a multi-image, feature-fed RNN that in turn feeds a heuristics-constrained expert system, but most of the intelligence provided only serves stability – both technically and in the perceptual loops that the AI-boosted external system sustains for the user. AI thereby becomes a constituent of distributed human cognition: it augments the user’s capacities, but never replaces intellection itself.
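As a hypothetical sketch of that division of labour – a recurrent network that scores frame sequences, gated by an explicit rule layer – consider the following. Every dimension, weight and threshold is an illustrative assumption, not Clay’s actual model.

```python
# Hypothetical sketch: a small RNN scores per-frame hand features, and a
# heuristics-constrained rule layer ("expert system") decides whether a
# gesture is validated. All sizes and thresholds are made up.
import numpy as np

rng = np.random.default_rng(0)
F, H, G = 8, 16, 3  # feature dim, hidden dim, number of gestures
Wx = rng.normal(size=(H, F))
Wh = rng.normal(size=(H, H))
Wo = rng.normal(size=(G, H))

def rnn_scores(frames):
    """Run a vanilla RNN over a sequence of per-frame feature vectors."""
    h = np.zeros(H)
    for x in frames:
        h = np.tanh(Wx @ x + Wh @ h)   # recurrent state update
    logits = Wo @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()                 # gesture probabilities (softmax)

def expert_validate(probs, frames_seen, hand_visible):
    """Heuristic gating: the rule layer can veto the network's guess."""
    best = int(np.argmax(probs))
    if probs[best] < 0.6:       # confidence floor
        return None
    if frames_seen < 5:         # a gesture needs a minimal duration
        return None
    if not hand_visible:        # no hand in frame, no gesture
        return None
    return best

frames = [rng.normal(size=F) for _ in range(12)]  # fake feature stream
probs = rnn_scores(frames)
print(expert_validate(probs, frames_seen=len(frames), hand_visible=True))
```

The point of the gating layer is precisely the one made above: the network proposes, explicit heuristics dispose, and the whole assembly only stabilises the user’s loop of perception and action.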