When we come into this world, it takes us three years or less to build up in our brain a model box with which to reconstruct, in our mind, real-time representations of the reality immediately surrounding us.  We are then able to speak and be conscious of that reality, act purposefully, and start laying down a personal history in our memory.

The raw amount of sensory information we have taken up by that time is less than 200 million sensory patterns – counting a handful per waking second.  And all that with very scant supervision and in a rather simple and static nursery environment!
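
A back-of-envelope version of that count, assuming roughly twelve waking hours a day and taking "a handful" to mean about four patterns per second (both figures are my assumptions, not measurements):

    # Rough estimate of sensory patterns absorbed in the first three years,
    # assuming ~12 waking hours per day and ~4 patterns per waking second.
    waking_seconds = 3 * 365 * 12 * 3600   # about 47 million seconds awake
    patterns = 4 * waking_seconds          # about 189 million patterns
    print(f"{patterns / 1e6:.0f} million sensory patterns")  # -> 189 million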

A decent nursery could probably be recreated by a compact virtual reality system no bigger, in information content, than the human genome – some three billion base pairs, i.e. less than a gigabyte.

Compare this to the flood of information needed to train a deep learning system on a single task like sorting photographs of objects into one of a thousand classes – a benchmark that typically takes on the order of a million labeled images.

True, at the age of three we cannot name a thousand object types.  But at that age we are fast learners, able to recognize a new type of object after inspecting just one instance, unfazed by variation in shape, material, color, perspective or illumination.

How Can Humans Learn So Fast?

What is it that makes humans learn so much faster than our technical systems?  And, more importantly, what enables the infant brain to quickly make its way into purposeful behavior, language, scene reconstruction and consciousness?

Don’t expect final answers in a blog. But a few things are clear, to be sure.

Computer graphics systems are able to create an infinite variety of realistic-looking visual scenes on a compact PlayStation.  They do so by dealing separately with different aspects – shape, texture, pose, spatial arrangement, movement, illumination – thus opening up a universe of individual scenes by combining the same elements and transformations in ever new ways.
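
As a toy illustration of this combinatorial principle (the factor vocabularies below are made up for the example), a few independent choices per aspect already multiply into a large space of distinct scenes:

    from itertools import product

    # Hypothetical vocabularies for a few independent scene aspects.
    shapes   = ["cube", "sphere", "cylinder"]
    textures = ["wood", "metal", "fur"]
    poses    = ["upright", "tilted"]
    lights   = ["noon", "dusk"]

    # Every combination is a distinct scene description.
    scenes = list(product(shapes, textures, poses, lights))
    print(len(scenes))   # 3 * 3 * 2 * 2 = 36
    print(scenes[0])     # ('cube', 'wood', 'upright', 'noon')

Ten options along each of five such aspects would already yield a hundred thousand distinct scenes from just fifty stored elements – the asymmetry that makes compositionality so powerful.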

The visual system evidently does the same in reverse, decomposing the input into those aspects, such that shape can be learned independently of texture, texture independently of illumination, and so on.

Moreover, structured objects are decomposed into shape primitives.  Infants spend their first years learning those primitives and their patterns of arrangement, such that by the age of three they have a model box from which to recreate whatever shape and whatever scene comes their way.

Language is digested the same way.  

Only that, in addition, the brain is able to pick up meaning!

The fragments into which a scene is decomposed during learning may span the different senses, so that the sight and sound and the whole feeling of a scene get connected to meaning in terms of intentions, emotions – or words.  

This makes sense only if the fragments of the scene, of the inner senses and of spoken words that are picked out by attention are of mutual relevance, so that when they merge, words, emotions and sensations are tied into meaningful building blocks for our mental life.

Why Haven't Electronic Organisms Been Grown Yet?

Everybody knows by introspection that the mind works like this.  But if it is so obvious, why is it that we still don't have electronic organisms emulating the mind in silico?  The missing ingredient very clearly is a data structure, a language, a neural code that is able to express all mental structure in the different senses.  A neural code that can shape itself into hierarchical complex symbols: to represent visual scenes composed of objects composed of familiar primitives composed of local features, to represent motion patterns obeying complex grammars, to represent social constellations full of emotional meaning, and to represent, of course, language in the proper sense, with all its intricate meaning-laden phrases and expressions.  Maybe the most crucial aspect of the neural code will have to be the ability to express in a generic form not only all structures in the different senses but also the relationships between them, mapping elements onto elements, relations onto relations.
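
Purely as an illustration of what such hierarchical, relation-bearing symbols could look like as a data structure – a sketch under my own assumptions, not a proposal for the actual neural code – consider:

    from dataclasses import dataclass, field

    @dataclass
    class Symbol:
        """A node in a compositional hierarchy: a label, its parts,
        and labeled relations binding elements to elements."""
        label: str
        parts: list["Symbol"] = field(default_factory=list)
        relations: list[tuple[str, "Symbol", "Symbol"]] = field(default_factory=list)

    # A scene composed of objects composed of primitives...
    handle = Symbol("handle")
    body = Symbol("body")
    cup = Symbol("cup", parts=[body, handle],
                 relations=[("attached-to", handle, body)])
    scene = Symbol("breakfast-scene", parts=[cup])

    # ...and a cross-domain mapping, tying a spoken word
    # to the visual object it names.
    word_cup = Symbol("word:'cup'")
    scene.relations.append(("names", word_cup, cup))

Nothing in this sketch is neural, of course; the open question is precisely how dynamic structures of this kind could be expressed, and rapidly rebound, in neural tissue.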

Expressed this way, it seems not too outlandish to expect that the missing link, the neural code, might be just around the corner, a mere step away from us.

And as soon as it is found we will see an explosion, a tsunami: the emergence of a totally new technology of autonomous agents, transforming our life more than profoundly.
