
Thoughts on the search for the math of intelligence
 
 
  [ # 31 ]

PART 9: CANONICAL FORMS IN FORMOLOGY

The presence of a previously undiscovered branch of math based on shape—the branch I’m calling “formology”—answers some questions I had earlier (questions nobody here asked me): Where does geometry fit into all this? What would be the single abstract concept on which geometry is founded? Wouldn’t formology just be part of geometry?

I believe the general answer is that geometry is one of those mixed branches of math, like Galois theory, that I mentioned before. If you read about the history of geometry (http://en.wikipedia.org/wiki/Geometry), you’ll learn that it started off with surveying the earth (hence its “geo-” prefix). Studies of triangles, angles, parallel lines, etc. allowed accurate and convenient ways to solve practical problems related to earth measurement, like how to measure the width of a river without having to cross it. It was neither a pure study of shape nor a pure study of measurement, but a practical mixture of both that took the simplest parts of two branches of math (formology for shape, numerocentric math for measurement) without considering the foundations of either topic. Formology seems to me to be the piece of the puzzle that allows the other pieces to fall into place, which in turn allows math itself to take on more structure, whereby branches of math might now be combined in deliberate ways rather than being accidents of history. Currently, if you look at a list of the main branches of math (e.g., http://www.math-atlas.org/), you’ll find little organizing structure, which seems odd for a study that is so extremely well organized at the more detailed level.

Formology is less than two months old and is an almost pristine mathematical landscape waiting to be explored, so this would be a prime topic for somebody looking for a Ph.D. dissertation in math, in my opinion. I personally don’t have the time to explore it in depth unless somebody wants to pay me to do so, and even then I might not have the mathematical credentials to do so successfully. For me this entire thread is just an intellectual exercise because I’m getting bored of coding in Visual Python month after month!

Maybe a good intro would be discussion of the Holy Grail (ultimate product) of formology. For me, formology’s Holy Grail would include:

- a canonical (= standard) representation of every shape that is theoretically possible
- a representation that is invariant to translation, rotation, and scaling
- a metric or set of metrics for measuring how close any given shape is to another shape
- all of the above based on exact methods arising from math itself rather than ad hoc approaches arising from engineering or biology
- possibly: naturally arising categories of shape, especially categories that have never been thought of before
- ideally: a naturally suggested set of features to consider when recognizing or categorizing objects
- ideally: practical applications in pattern recognition and AI, especially computationally tractable approaches
- ideally: approaches that aid in the solution of famous, longstanding unsolved math problems
- ideally: a piece of the puzzle that allows all branches of math to form a more organized overall structure and to become more interrelated

One realization that is very tantalizing and promising is that formology is much more specific than topology; therefore, if topology could become so well developed on such a bizarrely flexible foundation, formology should be able to do much better. You’ll probably soon notice that topics in topology keep popping up in conjunction with the topics of shape and formology, maybe because topology is not too far distant from the study of shape: both look for general properties of an object whose points may have been displaced. Topology is sometimes called “rubber sheet geometry” because in topology it is general connectedness that is important, rather than exact shape, but note that topology would default to formology whenever objects were not deformed at all. That is a good sign: in one sense, formology is a special case of topology. If so much good mathematics can be produced from flexible rubber sheets, think what could be produced from solid steel sheets.

One major difference between topology and formology is that topology is largely concerned with classifying surfaces rather than measuring them or measuring the differences between them. That’s understandable since topology is dealing with such a high level of abstraction that not much measurement can be done; classification is about all you can do if you’re incapable of measuring something. The same is not true of formology, where shapes are considered rigid configurations that can be measured in many ways.

I believe the first steps in formology should be to reduce the extreme amount of variety in the concept of “shape” down to a much smaller number of possibilities via constraining definitions and canonical forms. For starters, I would recommend that formological “shape” be defined only as contours, to the exclusion of any pattern within the contour. For example, a smiley face would reduce to a circle, and an oil painting would reduce to the rectangular shape of its picture frame. Each object inside those structures could be considered independently, but trying to combine shapes conjunctively seems to invite unmanageable complexity and strikes me as counterproductive. Also, although “shape” applies to objects of any dimension from 2D on up, I believe the most rapid progress in the field would be made by starting with 2D objects; any discoveries or theorems about 2D contours could be extended to higher dimensions later.

Two major remaining problems for formology would be infinite shapes and fractal shapes. I would assume that one of the main goals of formology would be to apply it eventually to real-world objects, which are generally well-behaved, meaning finite-sized with clear-cut contours.

As for infinite shapes, very few real-world objects could be considered infinite with respect to their visible extent, but those few (say the shape of a laser beam projected into outer space) could be reduced to finite size by different methods. One such method would be to represent only the cross-section of the object (a circle, in the case of a cylindrical laser beam) and to label that cross-section diagram with an infinity symbol along the long axis to indicate the infinite extension. For a theoretical object like a pole in a plot of a complex zeta function (http://www.storyofmathematics.com/images2/riemann_hypothesis.gif), any kind of hyperbolic image transformation would be able to convert it to a finite object/shape, with maybe the singular point represented by an open circle, which would be inconsequential as far as affecting the shape description. In either case, no information would be lost, since any given shape, whether infinite or finite, consists of aleph(1) many points along its contour.
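To make the “reduce infinity to finite size” step concrete, here is one radial compactification (just one possible choice of transform, not a canonical one): map a point at distance r from the origin to distance r/(1+r), which squashes the whole plane into the open unit disk while preserving directions.

import numpy as np

def compactify(points):
    # Map R^2 (or R^n) into the open unit ball: radius r -> r / (1 + r).
    pts = np.asarray(points, dtype=float)
    r = np.linalg.norm(pts, axis=1, keepdims=True)
    return pts / (1 + r)   # directions preserved; "points at infinity" approach the unit circle

Since r -> r/(1+r) is strictly increasing and directions are unchanged, no two distinct points collide, so the contour information survives the squashing.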

As for fractal shapes, I believe those will be a problem for a long time, maybe indefinitely. As we saw, shape theory became extremely complex because it attempted to deal with such misbehaved shapes, and formology probably would, too. The best way I know to deal with fractals is with fractional dimensions such as box-counting dimension, information dimension, correlation dimension, etc. (http://en.wikipedia.org/wiki/Fractal_dimension). I think those methods of measuring fractional dimension are great ideas, though they strike me as artificial or ad hoc rather than based on naturally arising formulas from pure math, which is probably why so many such measures have been suggested. In any case, they might be the only tools currently available.
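For reference, the box-counting dimension from that Wikipedia link is simple to estimate numerically: cover the shape with grids of ever-smaller boxes and fit the slope of log(box count) against log(grid resolution). A rough sketch, assuming the shape is given as sampled contour points:

import numpy as np

def box_counting_dimension(points, scales=(2, 4, 8, 16, 32, 64)):
    pts = np.asarray(points, dtype=float)
    pts = (pts - pts.min(axis=0)) / np.ptp(pts, axis=0)   # normalize into the unit square
    counts = []
    for s in scales:
        idx = np.minimum((pts * s).astype(int), s - 1)    # which grid box each point falls in
        counts.append(len(np.unique(idx, axis=0)))        # number of occupied boxes
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope   # ~1 for a smooth curve, ~1.26 for Koch-like data

The estimate is only as good as the sampling: too few points and the finer grids run out of occupied boxes before the scaling law shows itself.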

With the above suggestions, nearly all shapes could be reduced to contours of finite size without much difficulty, which would render them similar to the shapes of the real-world objects to which formology will be applied. That is a great start, but then the real work begins: describing a finite shape, usually one with a well-defined contour.

Of course all shapes considered in formology will be defined independent of location, size, or rotation, since intuitively we would like an oriented shape (like a teardrop-shaped drop of falling water) to have the same description in any orientation (like a teardrop-shaped bullet, say). Therefore orientation invariance of any shape must be implemented. Size invariance could just use the simple convention of scaling the longest distance to 1. For orientation invariance I would recommend using circles as the basic representation framework, since circles don’t have any rotation invariance: they are radially symmetric. Then by projecting rays out from the center of such a circle, the rays would fall upon any features of interest, and by considering the angles between rays falling upon important features, rotation would become irrelevant, and rotation invariance would be implemented. At this stage it wouldn’t matter how you drew the object with respect to size or orientation, because everything would be based on radial symmetry, such as angles of rays, and all reference distances would be relative from the center of the circle.

With the above scheme for rotation invariance, the only remaining issue would be where to place the reference circle relative to a given object. The simplest and most obvious answer is that the center of the circle should be placed at the object’s centroid. Every finite object has exactly one centroid, so that location is a unique point for each object, and the meaning of a centroid corresponds naturally to the concept of a circle anyway, since it is from the centroid that a shape can be rotated without ever becoming off-balance. (A “centroid” is basically the center of an object; it is the same as the “center of mass” unless the object has uneven weight within it: http://en.wikipedia.org/wiki/Centroid)

In summary, a canonical representation of any formological shape should be obtainable by this method based on a unit circle:

(1) Remove all interior details of the object; retain only the outer contour (or surface, if the object is 3D or higher).
(2) Convert any infinitely large objects to finite objects via symmetry shorthand or a hyperbolic transform.
(3) Determine the location of the resulting object’s centroid.
(4) Overlay a unit circle with the center of that unit circle at the centroid.
(5) Find the contour point most distant from the centroid, scale that distance to 1, and run the unit circle through that point.
(6) Project rays from the centroid to coincide with any important identifying contour points, and label such rays as desired, including the (normalized) distances from the centroid at which those features lie.

The result should be a unique signature for the given shape, independent of translation, rotation, or scale. No contour information is lost, since aleph(1) points in the original object are mapped to aleph(1) points in the canonical representation, even for infinite objects. With this method, almost any shape can be reduced to a set of angles, distances, and labels, which makes shapes much more amenable to analysis.
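Here is a rough sketch of steps (3)-(6) for a 2D contour given as sampled points. Two simplifications to flag: it uses the mean of the contour samples as a stand-in for the true centroid, and it gets rotation invariance by measuring all angles from the ray to the farthest point (which is ambiguous if several points tie for farthest).

import numpy as np

def canonical_signature(contour):
    # contour: (N, 2) array of points along a closed 2D outline
    pts = np.asarray(contour, dtype=float)
    c = pts.mean(axis=0)                       # step 3: centroid (contour-sample mean)
    d = pts - c
    r = np.linalg.norm(d, axis=1)
    r = r / r.max()                            # step 5: farthest point scaled to 1
    theta = np.arctan2(d[:, 1], d[:, 0])       # step 6: ray angles from the centroid
    theta = np.mod(theta - theta[np.argmax(r)], 2 * np.pi)   # measure from the farthest ray
    order = np.argsort(theta)
    return np.column_stack([theta[order], r[order]])         # (angle, radius) pairs

Two shapes could then be compared, per the metric item in the Holy Grail list, by resampling both signatures onto a common set of angles and taking, say, the mean squared difference of the radii.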

No doubt a lot of work has been done in the field of image recognition, and no doubt many of my suggestions have already been considered and implemented in many systems. Much of this prior work is unknown to me, so I’m sure I’m reinventing the wheel in many respects. The main difference is that object recognition has until now been an eclectic engineering field, whereas here I’m suggesting that there exist much more rigorous foundations to the field, which together constitute a new branch of math that would form the core of any “equations of intelligence”.

 

 
  [ # 32 ]

Some items you may find interesting as you think about shapes.
http://en.wikipedia.org/wiki/Computational_geometry
http://en.wikipedia.org/wiki/Wallpaper_group
http://en.wikibooks.org/wiki/Algorithm_Implementation/Geometry/Convex_hull/Monotone_chain
http://en.wikipedia.org/wiki/Point–line–plane_postulate

A lot has been done in 3D rendering; the same geometry is used in ray tracing.
http://geomalgorithms.com/a05-_intersect-1.html

Cosine similarity is used to evaluate topic similarity based on vectors.
http://en.wikipedia.org/wiki/Cosine_similarity
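For completeness, cosine similarity is just the cosine of the angle between two vectors, so it ignores their magnitudes:

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([1, 0, 1], [1, 1, 0]))   # 0.5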

Some of your discussion reminds me of the issues we hit when trying to digitize and render fonts.
There are a number of representations (bitmap, line & arc, raster, Bézier curves, etc.). Finding compact, easy-to-manipulate representations was a constant area of research for many years.

 

 
  [ # 33 ]

ERRATA ON TRANSFINITE NUMBERS

- I checked Gamow’s book and learned that my original quote from Gamow was correct: he *did* omit aleph(0) from his list. That is bad for didactic purposes, but then I realized he probably did that to make the list consistent with his cute story about Hottentots counting “1, 2, 3”, which also matches the title of his book: “1, 2, 3… Infinity”.
- My aleph(3) example was very poorly worded, so I plan to post a full explanation about that later, though it’s mostly an irrelevant detail with respect to the main topic.
—————
ERRATA ON FORMOLOGY

- “circles don’t have any rotation invariance” ...should be… “circles don’t have any rotation variance”
- It might have been more accurate to say “*inverse* hyperbolic transformation” rather than “hyperbolic transform”, since the goal was to cut infinite size down to finite size, not the other way around. A logarithmic transform is one example of what I am calling an inverse hyperbolic transform.
- Similarly, the zeta function picture in my link is actually the *reciprocal* of the zeta function, not the original zeta function. I think math people like to show the reciprocal function more often since it makes the zeroes pop up as infinite poles, which makes them easier to see than flat spots on the plane.


By the way, my assessment of the difficulty of the problem of defining fractional dimension seems to have been right on target, because here’s a quote I came across since I posted that comment:

(p. 105)
  We have mentioned already that a straight line and the Koch curve are topologically the same. Moreover, a straight line is the prototype of an object which is of dimension one. Thus, if the concept of dimension is a topological notion, we would expect that the Koch curve also has topological dimension one. This is, however, a rather delicate matter and it troubled mathematicians around the turn of the century.
  The history of the various notions of dimension involves the greatest mathematicians of that time: men like H. Poincaré, H. Lebesgue, L. E. J. Brouwer, G. Cantor, K. Menger, W. Hurewicz, P. Alexandroff, L. Pontrjagin, G. Peano, P. Urysohn, E. Čech, and D. Hilbert. That history is very closely related to the creation of the early fractals. Hausdorff remarked that the problem of creating the right notion of dimension is very complicated. People had an intuitive idea about dimension: the dimension of an object, say X, is the number of independent parameters (coordinates), which are required for the unique description of its points.

(“Chaos and Fractals: New Frontiers of Science, Second Edition”, Heinz-Otto Peitgen & Hartmut Jurgens & Dietmar Saupe, 2004)

Also, another possible measurement of fractal dimension, namely Ljapunov exponents, was not listed in the Wikipedia link I provided:

(p. 688)
  Based on numerical experiments made around 1978 James L. Kaplan and James A. Yorke came to the conclusion that in most cases it is possible to predict the dimension of a strange attractor from the knowledge of the Ljapunov exponents of the corresponding transformation. Although their formula for the dimension has not been rigorously proven (except for some special cases) it opens up the door to the experimental study of dimensions for many dynamical systems. This so called Kaplan-Yorke conjecture has been tested and discussed in many research papers. It is such an important topic because in many dynamical systems the various dimensions of the attractors are hard to compute, while the Ljapunov exponents are relatively accessible.

(“Chaos and Fractals: New Frontiers of Science, Second Edition”, Heinz-Otto Peitgen & Hartmut Jurgens & Dietmar Saupe, 2004)

(“Ljapunov” is supposed to be pronounced roughly “lyah-poo-NOF”, but I’ve always heard it pronounced “lee-AP-uh-nahf” (http://www.nku.edu/~longa/classes/mat305/resources/pronunciation.html).)

More reason to put off worrying about describing fractal shapes until much later!

Since it’s clear I’m about to sink into a highly detailed description of my thoughts on formology, I’ll try to wrap up this thread in 1-2 more posts, then post such thoughts as addenda.

 

 

 
  [ # 34 ]
Mark Atkins - Sep 8, 2013:

Feigenbaum’s two constants look like the best candidates to me, but I don’t believe a formula for them exists, and they haven’t yet popped up outside of narrow contexts within chaos theory.

That was a good assessment, Mark. grin Here’s a quote I came across recently that suggests the same thing:

(p. 546)
  The number delta = 4.6692… is a constant of chaos comparable only to the fundamental importance of numbers like pi. Feigenbaum’s discovery was the first of many footprints by which the tracks of chaos are now recognized. The number delta has been observed in systems as varied as dripping faucets, the oscillation of liquid helium, and the fluctuation of gypsy moth populations. It is a predictable constant in the world of chaos.

(“Chaos and Fractals: New Frontiers of Science, Second Edition”, Heinz-Otto Peitgen & Hartmut Jurgens & Dietmar Saupe, 2004)

 

 

 
  [ # 35 ]

PART 10: COMMONSENSE REASONING

Regardless of which type of mathematics is chosen for modelling intelligence, even for the complexities of vision, the showstopper will likely be the well-known AI problem called “commonsense reasoning” (http://en.wikipedia.org/wiki/Commonsense_reasoning), which was recognized as far back as the 1950s. Commonsense reasoning is the kind of reasoning humans do almost all day long without thinking, exactly the kind of reasoning that computers have so much trouble doing. Minsky gives this example of a computer trying to stack blocks as a child might:

(p. 22)
It would have to decide whether there are enough blocks to accomplish its goal and whether they are strong and wide enough to support the others that will be placed on them.
  What if the tower starts to sway? A real builder must guess the cause. It is [sic] because some joint inside the column isn’t square enough? Is the foundation insecure, or is the tower too tall for its width? Perhaps it is only because the last block was placed too roughly.
  All children learn about such things, but we rarely ever think about them in our later years. By the time we are adults we regard all of this to be simple “common sense.” But that deceptive pair of words conceals almost countless different skills.

Common sense is not a simple thing. Instead, it is an immense society of hard-earned practical ideas—of multitudes of life-learned rules and exceptions, dispositions and tendencies, balances and checks.

  If common sense is so diverse and intricate, what makes it seem so obvious and natural? This illusion of simplicity comes from losing touch with what happened during infancy, when we formed our first abilities. As each new group of skills matures, we build more layers on top of them. As time goes on, the layers below become increasingly remote until, when we try to speak of them in later life, we find ourselves with little more to say than “I don’t know.”

(“The Society of Mind”, Marvin Minsky, 1986)

So far I’m aware of only a few general approaches to commonsense reasoning, namely…

(1) the Cyc project of Doug Lenat, which attempts massive brute force coding of all commonsense knowledge that might be relevant
(2) Marvin Minsky’s agents, the approach quoted above
(3) formal logic, favored by John McCarthy
(4) reflexive reasoning of Lokendra Shastri
(5) my own approach

...but note that these solutions are all over the spectrum, which suggests there is absolutely no consensus on how to approach this deliciously complicated problem, and mathematics seems to be at a total loss to tackle this domain adequately.

My opinions of the above approaches are:

(1) Cyc substitutes inference for memory, or time for space, and will therefore run afoul of the 300 msec inference limit imposed by practical constraints.
(2) Minsky’s approach is convincing, but is just more automata theory that doesn’t address the difficult problem of vision in sufficient detail.
(3) Formal logic almost certainly is an inappropriate method of representation for the real world:

(p. 24)
  There are, however, two important differences between the mathematical mindset and the mindset of research in commonsense reasoning. First, mathematics is driven by abstraction; it looks for common abstract structures that underlie seemingly very different phenomena. It is one of the glories of mathematics that phenomena as different as small-arc pendulums, masses on springs, LRC circuits, and time-varying populations can all be characterized by the same kinds of differential equations. In commonsense theories, by contrast, while it is important that structural similarities exist between representations in different domains so that analogies can be constructed and applied, it is equally important that the concrete, individual aspects be kept in sight. If two very different kinds of things look the same in your theory, then you have abstracted some important issues away. Things that are commonsensically different should be described by a different kind of commonsense theory. (This argument is contested in [Hobbs 1987].)
(p. 25)
  Another mathematical goal that is not generally carried over to AI is that of axiomatic parsimony. Mathematicians like to find the smallest, weakest set of independent axioms that will give the desired results. Partly, this is a matter of mathematical aesthetics; partly, a desire to make theories as general as possible; partly, a desire to make the axioms as self-evidently true and consistent as possible. None of this matters in AI. Knowledge bases in commonsense domains will, in any case, have so many axioms (many of them just stating contingent facts like “John loves Mary”) that nothing will make them aesthetically appealing, or self-evidently true. Generality hardly matters, since AI systems are resolutely focused on the specific.

(“Representations of Commonsense Knowledge”, Ernest Davis, 1990)

(4) Shastri’s system uses programmed networks—more automata theory—and doesn’t address the problem of vision.
(5) My own approach hasn’t been developed to the extent that it can even describe 2D objects, and my approach hasn’t been disclosed yet, anyway.

Yes, commonsense reasoning is a very difficult problem!

By the way, here are some great examples of the kind of obvious (to humans) knowledge that needs to be explicitly taught to a computer, without which a computer could produce some very humorous envisionments:

(p. 240)
  Stemming from an admission of defeat, Cyc (short for encyclopedia) is a $25-million research project that will last for two person-centuries. Lenat had become convinced that no amount of finessing and fancy footwork would ever let a machine discover by itself such elementary facts as “Nothing can be in two places at once,” or “Animals don’t like pain,” and “People live for a single solid interval of time.” The most salient discovery in AI since the Dartmouth conference is that we need to know a colossal number of these common-sense assertions to get by in the world. Lenat convinced his MCC sponsors that the woes of the new discipline stemmed from repeatedly trying to wriggle out of the need to encode this knowledge manually, tedious fact after painful assert, in machine-usable form.

(“AI: The Tumultuous History of the Search for Artificial Intelligence”, Daniel Crevier, 1993)

(p. 207)
  Ideally, all the relevant knowledge would be included in rules such as these, and the program would be able to work out logically the consequences of the rules to understand whatever situations came up. But decades of experience with GOFAI have proved that this is hopeless. Hand-coding all the knowledge people have, including implicit facts such as that waitresses drink with their mouths not with their feet, is an immense undertaking and would build a huge network if it could be done at all. Reasoning from this huge network to work out consequences of the various rules would be incredibly slow if it could be done at all.

(“What Is Thought?”, Eric B. Baum, 2004)

(p. 167)
  The root of all the trouble, and the Achilles heel of AI that McDermott zeroed in on, was what had become known as the common-sense knowledge problem. When you or I or anyone else reason, we use all sorts of common sense knowledge of the world. We do so unreflectingly and in general unknowingly, and we do it all the time. The “if you die, then you are dead” and “if you are dead, then you stay dead” rules we noticed earlier are just two of what is literally an endless list of such common-sense knowledge facts.

(“Goodbye, Descartes: The End of Logic and the Search for a New Cosmology of the Mind”, Keith Devlin, 1997)

What does commonsense reasoning have to do with the mathematics of intelligence? One of the fundamental goals of AI is to get machines to understand the real world, but to understand the real world there must be some sort of sensory interface to get a copy of the real world into the machine, the cleanest and most obvious interface of which is vision. But vision is just mechanical classification or identification of objects, without any inherent understanding, whereas understanding and thought are more central to the notion of the intelligence we want to achieve. Therefore vision is just one step toward intelligence.

However, while there are some mathematical approaches to vision, there may not exist a mathematical approach to commonsense reasoning, since such an approach would have to describe nearly every possible cause-and-effect scenario and nearly every possible functional relationship an organism is likely to encounter, which at first glance seems outrageously intractable, even if done statistically. It would be difficult enough to code all the applicable known laws of physics (i.e., mass, gravity, friction, momentum, elasticity, light reflections, buoyancy, etc.) into a computer, so modeling prediction based on psychology, animal behavior, interactions of common objects, etc. would likely be even more difficult. Imagine trying to predict the outcome of a person getting angry and clenching his or her fists, or the outcome of a monkey discovering a jar with an object in it, or even the outcomes of bizarre scenarios that have never been witnessed before, even involving things that don’t exist, like the result of dropping a unicorn into the bottom of an infinite inverted cone, or scrambling an egg as it rolls down an infinite sheet of heated sandpaper. That our brains can easily make reasonable predictions of the outcomes of such scenarios is phenomenal; mathematics would be at an extreme disadvantage in describing even the objects involved, much less their projected outcomes. Yet commonsense reasoning is said to lie at the heart of many AI problems, including vision and language:

(p. 1)
In order for an intelligent creature to act sensibly in the real world, it must know about that world and be able to use its knowledge effectively. The common knowledge about the world that is possessed by every schoolchild and the methods for making obvious inferences from this knowledge are called common sense. Commonsense knowledge and commonsense reasoning are involved in most types of intelligent activities, such as using natural language, planning, learning, high-level vision, and expert-level reasoning. How to endow a computer program with common sense has been recognized as one of the central problems of artificial intelligence since the inception of the field [McCarthy 1959].
  It is a very difficult problem. Common sense involves many subtle modes of reasoning and a vast body of knowledge with complex interactions.

(p. 2)
In short, most of what we know and most of the conscious thinking we do has its roots in common sense. Thus, a complete theory of common sense would contain the fundamental kernel of a complete theory of human knowledge and intelligence.

(“Representations of Commonsense Knowledge”, Ernest Davis, 1990)

If somebody is looking for “equations of thought”, then it makes more sense to look for “equations of commonsense reasoning” instead, but if the complexity of commonsense reasoning defies mathematical description, then that seems to kill any hope of approaching AI via equations.

All of the above suggests that math-as-we-know-it is not the proper route to strong AI, and I’ll give yet more evidence of this later.

 

 
  [ # 36 ]

PART 11: BEYOND MATHEMATICS

In all my recent book reading I’ve gradually become aware of a consensus among authors that seems to be materializing independently: mathematics is not going to take us to true AI. Basically, we need to go beyond mathematics, either beyond any of its currently existing forms or beyond math altogether, to reach an understanding of human-level thought. Several books I found suggest this situation in various contexts…

(1)
The most clear-cut argument was probably from Keith Devlin, who believes that commonsense reasoning does not involve any kind of math that is currently known. He claims that the traditional Western approach to all problems, namely rationality and mathematics, is failing us in its applicability to AI:

(p. 181)
  Western culture is dominated by an approach to knowledge that goes back to Plato, and to his teacher, Socrates. Their love of mathematics and of precise definitions led them to discount any human talent, ability, activity, or skill that could not be defined and explained and subjected to rational argument.

(p. 183)
One circumstantial argument in favor of this conclusion that I personally find appealing is that right from the word go, the field of AI attracted some of the smartest thinkers around. When so many very bright minds, provided with enormous resources, failed to achieve their goal, it makes sense to look for a reason. And the most obvious explanation in this case is that the goal is an impossible one.

  Pascal put the point quite clearly—and bluntly—in his collection Pensées, written in 1670:

These principles [involved in reasoning] are so fine and so numerous that a very delicate and very clear sense is needed to perceive them, and to judge rightly and justly when they are perceived, without for the most part being able to demonstrate them in order as in mathematics. . . . Mathematicians wish to treat matters of perception mathematically, and make themselves ridiculous . . . the mind . . . does it tacitly, naturally, and without technical rules.

  Pascal’s words might seem unnecessarily cruel when quoted in reference to twentieth-century work in AI, but remember that Pascal was himself a mathematician. He was not decrying mathematics or mathematicians. He was simply pointing out that, although mathematical thinking has many uses, it cannot be applied to everything, and the functioning of the human mind is one of the things to which it cannot be applied.

(“Goodbye, Descartes: The End of Logic and the Search for a New Cosmology of the Mind”, Keith Devlin, 1997)

(2)
As I mentioned before, Ernest Davis’ comments about mathematics being driven by abstraction/generality and axiomatic parsimony, both of which are strongly lacking in real-world/AI problems, imply that mathematics and knowledge bases are not suitable tools for AI:

(p. 25)
None of this matters in AI. Knowledge bases in commonsense domains will, in any case, have so many axioms (many of them just stating contingent facts like “John loves Mary”) that nothing will make them aesthetically appealing, or self-evidently true. Generality hardly matters, since AI systems are resolutely focused on the specific.

(“Representations of Commonsense Knowledge”, Ernest Davis, 1990)

(3)
Also as I mentioned before, James Bailey believes conventional math is obsolete, and will be replaced in the future with something he calls “intermaths”, which are exotic forms of math that are performed on extremely fast computers:

(p. 9)
  The designers of the first electronic computers deliberately designed them to carry out the same numerocentric operations of algebra and calculus that had been developed back when all computers were people and when available computing capacity was minuscule. These equational maths came to prominence in the Renaissance, replacing the circles and diagrams of geometry. Their entry into the schools almost three hundred years ago was the last substantial change in the secondary mathematics curriculum.
  In an age when computing power is abundant, these maths are obsolete.

(p. 26)
One of the great opportunities I think for the next few decades is the development of a mathematics which is suitable to social systems, which the sort of eighteenth-century mathematics which we mostly use is not. The world is topological rather than numerical. We need non-Cartesian algebra as we need non-Euclidean geometry, where minus minus is not always plus, and where the bottom line is often an illusion.

  Technology itself was critical in the transition from Part One to Part Two, and it is critical again today. For classical scientists, the vocabulary of rate of change, or pace, was out of reach because they had no clocks. Similarly, for Industrial Age scientists, the vocabulary of adaptation was out of reach because they had no electronic computers and few data to train on. The new adaptive maths presuppose massive amounts of computing power, far more than was available even fifty years ago, when all computers were still people.

These vast sprawling formulations may then be adapted and changed in tiny, but persistent, ways. A whole new set of such sprawling maths is now being created. Collectively called evolutionary maths or intermaths or emergent behavior maths or net maths, they presuppose the electronic computing milieu. They take full advantage both of the elbow room and the erasability of modern computer storage as they do their work. Each tiny piece of an intermath formulation may be meaningless by itself, but the changing aggregation of millions of them interacting in parallel is proving able to save the appearances of phenomena heretofore unserved by numbers and equations:

(p. 28)
  These new intermaths have still-unfamiliar names like neural networks, genetic algorithms, simulated annealing, artificial life, and cellular automata. Pioneered by the computer scientist John Holland in the
(p. 29)
1960s and 1970s, they blossomed with the advent of parallel computers in the 1980s.

(“After Thought: The Computer Challenge to Human Intelligence”, James Bailey, 1996)

(4)
Finally, Stephen Wolfram, in his thousand-page book “A New Kind of Science”, discusses how his 20 years of cellular automata research suggest that a new kind of science is beginning, one in which almost all processes encountered in nature are realized to be extremely complex, a property he calls being “computationally irreducible”...

(p. 717)
  One might have assumed that among different processes there would be a vast range of different levels of computational sophistication. But the remarkable assertion that the Principle of Computational Equivalence makes is that in practice this is not the case, and that instead there is essentially just one highest level of computational sophistication, and this is achieved by almost all processes that do not seem obviously simple.

(p. 742)
  So when computational irreducibility is present it is inevitable that the usual methods of traditional science will not work. And indeed I suspect the only reason that their failure has not been more obvious in the past is that theoretical science has typically tended to define its domain specifically in order to avoid phenomena that do not happen to be simple enough to be computationally reducible.

(p. 743)
  So the result is that computational irreducibility can in the end be expected to be common, so that it should indeed be effectively impossible to outrun the evolution of all sorts of systems.

(p. 745)
  So the result of this is that if there is a traditional mathematical formula for the outcome of a process then almost always this means that the process must show great computational reducibility.

(“A New Kind of Science”, Stephen Wolfram, 2002)

This is roughly analogous to the mathematical fact that almost all real numbers are transcendental, which is known to be a true statement (http://sprott.physics.wisc.edu/pickover/trans.html), so I suspect Wolfram is correct. Even though Wolfram is focused on the computational complexity demonstrated by cellular automata, one implication is that the human brain is also computationally irreducible, which means that there will be no fast way to simulate it: it will be its own fastest simulator.

Although I can’t find the quote at the moment, I believe it was this book and author that predicted that most math problems in the future will be undecidable in the Gödel sense. That’s a rather dire prediction, suggesting we’re reaching the limit of what math-as-we-know-it can do. (I’ll post the quote when I find it.)

This general notion that math is becoming outdated, because most important modern problems are undecidable or impossible to simulate, and that math is therefore beginning to reach the limits of its applicability in the real world, especially in AI, is fascinating but not unprecedented: Gödel’s undecidability proof and chaos theory are two recent concepts the modern world has had to absorb, and both are unsettling ideas that require looking at our world in a very different, more pessimistic way. I believe the extremely difficult problem of commonsense reasoning will be one of those problems that cannot be solved by math in any reasonable way, and since commonsense reasoning is at the heart of AI, that in turn means that intelligence cannot be understood by purely mathematical approaches. I believe the commonsense reasoning problem will be solved convincingly, and fairly soon, but the solution will turn out to be an engineering solution that just works because its components were put together logically to tackle that specific problem. In a sense, I believe that going “beyond mathematics” will require the same kind of ad hoc solutions that nature painstakingly discovered—computationally irreducible solutions—whether for brain operation, organic chemistry, or anything else. That seems logical to me, since the real world is much more complex than the virtual worlds of digital computers or the idealized worlds of geometry and math.

This nascent suspicion among authors that math will not help us produce AI is just one reason I believe that seeking equations of thought is seriously misguided. I’ll list several more reasons in my summary & conclusion post, hopefully my next post.

 

 
  [ # 37 ]

Follow-up to “Beyond Mathematics”...

I found the Wolfram quote(s) I mentioned about his predictions of the future of mathematics:

(p. 782)
  In the early 1900s it was widely believed that this would effectively be the case in all reasonable mathematical axiom systems. For at the time there seemed to be no limit to the power of mathematics, and no end to the theorems that could be proved.
  But this all changed in 1931 when Gödel’s Theorem showed that at least in any finitely-specified axiom system containing standard arithmetic there must inevitably be statements that cannot be proved either true or false using the rules of the axiom system.
  This was a great shock to existing thinking about the foundations of mathematics. And indeed to this day Gödel’s Theorem has continued to be widely regarded as a surprising and rather mysterious result.

(p. 791)
  But from the discoveries in this book it now seems quite certain that vastly simpler examples also exist. And it is my strong suspicion that in fact of all the current unsolved problems seriously studied in number theory a fair fraction will in the end turn out to be questions that cannot ever be answered using the normal axioms of mathematics.

(p. 791)
  And indeed from the Principle of Computational Equivalence I strongly believe that in general undecidability and unprovability will start to occur in practically any area of mathematics almost as soon as one goes beyond the level of questions that are always easy to answer.
  But if this is so, why then has mathematics managed to get as far as it has? Certainly there are problems in mathematics that have remained unsolved for long periods of time. And I suspect that many of these will in fact in the end turn out to involve undecidability and
(p. 792)
unprovability. But the issue remains why such phenomena have not been much more obvious in everyday work in mathematics.
  At some level I suspect the reason is quite straightforward: it is that like most other fields of human inquiry mathematics has tended to define itself to be concerned with just those questions that its methods can successfully address. And since the main methods traditionally used in mathematics have revolved around doing proofs, questions that involve undecidability and unprovability have inevitably been avoided.

(p. 821)
  So what this means is that in the future, when the ideas and methods of this book have successfully been absorbed, the field of mathematics as it exists today will come to be seen as a small and surprisingly uncharacteristic sample of what is actually possible.

(“A New Kind of Science”, Stephen Wolfram, 2002)

Almost paradoxically, however, Wolfram believes that the brain might be built on simple principles and that the complex systems mentioned in his Principle of Computational Equivalence may not specifically apply to the brain:

(p. 628)
  But from the discoveries in this book we now know that highly complex behavior can in fact arise even from very simple basic rules. And from this it immediately becomes conceivable that there could in reality be quite simple mechanisms that underlie human thinking.
  Certainly there are many complicated details to the construction of the brain, and no doubt there are specific aspects of human thinking that depend on some of these details. But I strongly suspect that there is a definite core to the phenomenon of human thinking that is largely independent of such details—and that will in the end turn out to be based on rules that are rather simple.

(“A New Kind of Science”, Stephen Wolfram, 2002)

Other interesting quotes from him relate to the inherent problem of using logic for modelling intelligence, and to commonsense reasoning and implications for a computer passing the Turing test:

(p. 627)
  In the past it was often thought that logic might be an appropriate idealization for all of human thinking. And largely as a result of this, practical computer systems have always treated logic as something quite fundamental. But it is my strong suspicion that in fact logic is very far from fundamental, particularly in human thinking.

(p. 629)
  But a crucial point is that on their own such processes will most likely not be sufficient to create a system that one would readily recognize as exhibiting human-like thinking. For in order to be able to relate in a meaningful way to actual humans, the system would almost certainly have to have built up a human-like base of experience.
(p. 630)
  No doubt as a practical matter this could to some extent be done just by large-scale recording of experiences of actual humans. But it seems not unlikely that to get a sufficiently accurate experience base, the system would itself have to interact with the world in very much the same way as an actual human—and so would have to have elements that emulate many elaborate details of human biological and other structure.

(“A New Kind of Science”, Stephen Wolfram, 2002)

By the way, from what I’ve heard on forums, it sounds like some people get totally obsessed with cellular automata. Also from what I’ve heard, the reason is that in cellular automata, very complicated, visually interesting, even Turing-complete systems arise from very simple, mindless rules that specify only which patterns of 3 cells will map to a new “color” (shaded cell) in the next stage of output.
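For anyone who wants to see this concretely, an elementary cellular automaton is a few lines of code: the rule number’s 8 bits give the new cell value for each of the 8 possible 3-cell neighborhoods (this sketch uses wraparound edges, one common convention):

import numpy as np

def run_elementary_ca(rule=110, width=64, steps=32):
    table = [(rule >> n) & 1 for n in range(8)]   # bit n = new state for neighborhood n
    row = np.zeros(width, dtype=int)
    row[width // 2] = 1                           # single seed cell
    history = [row]
    for _ in range(steps):
        left, right = np.roll(row, 1), np.roll(row, -1)
        row = np.array([table[n] for n in 4 * left + 2 * row + right])
        history.append(row)
    return history

for r in run_elementary_ca():
    print(''.join('#' if c else '.' for c in r))

Rule 110, used here, is the famous one that has been proven Turing complete.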

 

 
  [ # 38 ]

Hi Mark

Wow! I tried to read over your incredibly large set of posts... and I’m still confused!

Let me give some thoughts on the topics; I am actually working on this AI and intelligence stuff.

Statistics and AI-math are thought of as a new “way” to machine-learn over raw data, discovering order amid chaos. They are all some kind of regression calculus, applied in this or that way. All are somehow equivalent: even neural networks are similar to SVMs, and an HMM is easily understood as modelling statistical behavior in a super-simplistic way (short dependencies). We (life on earth) have done similar stuff in our brains + chemical logic since we were simple amoebas!

Actually I am focused on a simple “understanding” of the supposed logic that may lie behind the human brain.

For example, a formidable problem is to understand the complex “date-time” specifications made in natural language.

The number of usable combinations is practically unbounded (or at least very high): you can use many modifiers like after, previous, then, when, while, etc., upon which you might try to impose some logic.

Many libraries have attempted this, often using regular expressions (even at the word/POS level); many are in JavaScript, and many others are open source. But they all have a high failure ratio (>50%) when you really express a date/time as your mind dictates, in each and any situation, with no constraints.

So I tried not to theorize too much and constructed an essay of a logical math for this kind of stuff.

The steps are simple, following the KISS principle (a toy sketch follows after the list):

1) Transform “words” into an abstract chain of math representations (vectors with attributes, many of which can operate if chained)
2) Sort this “chain” in an “intelligible logical way” (my own deduction, no background theory), even allowing multiple representations (ambiguity)
3) Assemble the chain, operating upon the context, discarding most of the ambiguity, weighting the “plausibility” of each time-date logic operation, and doing the pertinent date-time math
4) Output the result as a date-time period/logic (an instanced class)
5) Convert this class into operable data (intersections, time-date points, recurrence)
6) Generate appropriate natural language
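To illustrate just step 1 and part of step 3, here is my own drastically simplified toy, not Andres’s system: it handles only day-level anchor words and the modifiers “after”/“before”.

from datetime import date, timedelta

# Step 1 (toy): each word becomes either a date anchor or a shift modifier.
OPS = {
    'today':     lambda d: d,
    'yesterday': lambda d: d - timedelta(days=1),
    'tomorrow':  lambda d: d + timedelta(days=1),
    'after':     +1,
    'before':    -1,
}

def parse(expr, base=None):
    d = base or date.today()
    shift = 0
    for word in expr.lower().split():
        op = OPS.get(word)                     # unknown words ("the", "day") are ignored
        if callable(op):
            d = op(d) + timedelta(days=shift)  # step 3 (toy): apply pending modifiers
            shift = 0
        elif op is not None:
            shift += op
    return d

print(parse('the day after yesterday'))        # resolves to today

The real pipeline would of course need weighted ambiguity handling, recurrence, and period logic, which this toy entirely omits.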

Ah... the results?

A somewhat greedy, buggy algorithm, but generally nice; it works acceptably… I have tried to patch all the leaks (but it still has many...), but...

It understands even strange things like these (translated to English; it actually does it in Spanish):

“the day after yesterday in the morning”
“second monday of the third month of the last year at night”
“first thursday of the last month of the year after 2003”
“next friday 13th”
“last saturday night” (fever..)
“year 2002 last month first wednesday”

..enjoy!

and that’s weird enough for me!

 

That’s it!
 
  [ # 39 ]
Andres Hohendahl - Oct 5, 2013:

Statistics and AI-math are thought of as a new “way” to machine-learn over raw data, discovering order amid chaos. They are all some kind of regression calculus, applied in this or that way.

That is how Google is treating the math applied to AI, too, and I believe that is going astray with respect to understanding the essence of intelligence. For example, I can’t believe that when a bee rotates an incoming image to match it to a learned visual pattern, statistics is the essence of that operation. At the very least, animal brains do something like affine transformations, which have nothing to do with statistics. Statistics is more useful for making sense of large quantities of data about which one has no prior applicable knowledge, which is a useful applied direction of research for existing digital computers, but a truly intelligent machine would not need such statistical operations, since it would have understood all that data right from the start, as it was being input! I’m “going for the jugular”, trying to make a breakthrough in AI that challenges the very essence of the math, representations, and machines we’ve been using that have been so unsuccessful at producing strong AI.
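The bee example can be made concrete with a brute-force rotation search: apply candidate rotations (affine transforms, no statistics involved) to the incoming pattern and keep the angle that best matches the stored template. A toy sketch of my own, assuming both patterns are centered point sets with corresponding point order:

import numpy as np

def rotate(pts, angle):
    # Rotate row-vector points counterclockwise by `angle` radians.
    c, s = np.cos(angle), np.sin(angle)
    return pts @ np.array([[c, -s], [s, c]]).T

def best_rotation_match(pattern, template, n_angles=360):
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    errors = [np.linalg.norm(rotate(pattern, a) - template) for a in angles]
    return angles[int(np.argmin(errors))]   # angle giving the closest match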

 

 

 
  [ # 40 ]

Mark, forgive me, but I don’t agree with you on the thought that “statistics cannot describe true AI”.

Let me explain a little: our brains are the result of evolution, which was shaped by interaction with nature and the environment; interaction is naturally statistical, thus statistics modelled our brains over the long term.

On the other hand (short term), let’s speak of dynamic AI: statistics IS the way neural networks learn patterns over time. Those patterns then become organized into specialized clusters, and when these clusters prove successful on some repetitive task, the whole species includes this information, as pre-organization, in its offspring’s DNA through successful induced mutations. Thus the complex configuration of even a small insect’s brain is modelled by statistical behavior. So it’s all there.

Human languages are a classical example of this: we have Broca’s area, a verbal area, and many specialized areas inside of which very similar complex linguistic understanding and generation phenomena take place across all human cultures. This is built and burned into our DNA. Any human, under any circumstances, does indeed develop a language that is similar in structure; almost all human languages share structural and logical affinity. We are also capable of learning several languages, but they land in different areas (spare clusters) of our brains, having similar structure (left and right of Broca’s area).

But there is a distinction between long-term and short-term statistics.

Here lies the confusion:

Long-term statistics burns “software into hardware”, i.e., environmental influence -> DNA.

Then DNA constructs those “engines” (similar to learned algorithms) and builds successful brains (and all other organs) in the offspring, so they are successful entities from day 0. This is like an instinct ROM; the microprogram says:
“what to eat or not, where or whether to hide under certain circumstances, etc.” This gives the successful punch to the evolution of primitive animals.

Then more evolved animals rely on secondary structures (hardware, built by DNA) which are capable of adapting within a lifetime and performing complex calculations, based on sensory information, to survive, procreate, and get food.

They even do some planning (even the simplest beings are not totally random at all); they have chemical logic built in, in a distributed way (amoebas, hydras, etc.).

True AI, for me, is like the correct hardware (bootstrap algorithms) on which the software + data will produce more abstract calculations, resulting in meta^N auto-modification (learning within the lifetime) and a successful run.

This might be true AI.

so, for me, statistics rocks!!

 

 

 
  [ # 41 ]
Andres Hohendahl - Oct 7, 2013:

On the other hand (short term), let’s speak of dynamic AI: statistics IS the way neural networks learn patterns over time.

Not all neural networks. There exist different types of memory, and artificial neural networks use only one type of learning: the slow, statistical type. Declarative memory, which is used for instantaneous binding of inputted facts (the type of facts you memorize when studying for a test), does not use statistical learning but rather binding done in a single “trial”. (I might post a thread that covers the binding problem one of these days, but I’ve been too busy.) This is one of the things that AI researchers, and Google in particular, are overlooking, and that is partly why I’m focusing on that area.

Declarative memory is fast, specialized in learning things quickly. It makes connections among different stimuli, helping us model the world around us: what things are, how they work, what events we have personally observed or participated in.

http://www.dana.org/news/brainhealth/detail.aspx?id=10020
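A caricature of the distinction in code: declarative memory binds a fact in one trial, while the statistical kind only drifts toward the target over many exposures (the class names and update rule here are my own illustration, not any specific model):

import numpy as np

class DeclarativeMemory:
    def __init__(self):
        self.store = {}
    def learn(self, key, value):
        self.store[key] = value          # one exposure is enough
    def recall(self, key):
        return self.store.get(key)

class StatisticalMemory:
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr
    def learn(self, target):
        self.w += self.lr * (np.asarray(target) - self.w)   # small step per exposure

After one call to learn(), the declarative store recalls the fact exactly, while the statistical weights have moved only 10% of the way toward the target.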

By the way, part of the confusion might come from the fact that I haven’t yet concluded my thread; my most recent post was not my conclusion yet. I plan to write 1-2 more posts before I get to my summary & conclusion, and in those I will discuss a lot more about what you mentioned.

(p. 26)
  I thought the field would move on to more realistic networks, but it didn’t. Because these simple neural networks were able to do interesting things, research seemed to stop right there, for years. They had found a new and interesting tool, and overnight thousands of scientists, engineers, and students were getting grants, earning PhDs, and writing books about neural networks. Companies were formed to use neural networks to predict the stock market, process loan applications, verify signatures, and perform hundreds of other pattern classification applications. Although the intent of the founders of the field might have been more general, the field became dominated by people who weren’t interested in understanding how the brain works, or understanding what intelligence is.

(p. 39)
  Connectionists intuitively felt the brain wasn’t a computer and that its secrets lie in how neurons behave when connected together. That was a good start, but the field barely moved on from its early successes. Although thousands of people worked on three-layer networks, and many still do, research on cortically realistic networks was, and remains, rare.

(“On Intelligence”, Jeff Hawkins with Sandra Blakeslee, 2004)

(p. 125)
  Hill climbing is a general optimization method loosely modeled on evolution: start at some random point and take small steps uphill on the fitness landscape until a peak is reached. This will generally not be the tallest peak but is likely to be quite a bit higher than a random point. Intuitively, this happens for (at least) two reasons. One is that hill climbing allows sustained progress toward optimality (for a while), which rapidly produces a solution much better than almost all random solutions. The other is that hill climbing assigns credit: each small change is evaluated as to whether it works in the context of the entire system, so the whole system painstakingly settles into a configuration where its elements are cooperating.
  Back-propagation is a sophisticated hill-climbing method used to train neural nets. It suffices for generalizing on interesting practical problems but may not be useful for producing human-scale thought.

(“What Is Thought?”, Eric B. Baum, 2004)
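Baum’s description translates almost line for line into code; here is a minimal hill climber of my own, maximizing a toy one-dimensional fitness function:

import random

def hill_climb(fitness, start, neighbor, steps=1000):
    x, fx = start, fitness(start)
    for _ in range(steps):
        y = neighbor(x)                  # take a small random step
        fy = fitness(y)
        if fy > fx:                      # keep it only if fitness improves
            x, fx = y, fy
    return x, fx

# Example: maximize -(x - 3)^2; the climber settles near x = 3.
best, score = hill_climb(lambda x: -(x - 3)**2,
                         random.uniform(-10, 10),
                         lambda x: x + random.uniform(-0.1, 0.1))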

(p. 4)
From the above discussion, it is apparent that a neural network derives its computing power through, first, its massively parallel distributed structure and, second, its ability to learn and therefore generalize; generalization refers to the neural network producing reasonable outputs for inputs not encountered during training (learning). These two information-processing capabilities make it possible for neural networks to solve complex (large-scale) problems that are currently intractable. In practice, however, neural networks cannot provide the solution working by themselves alone. Rather, they need to be integrated into a consistent system engineering approach. Specifically, a complex problem of interest is decomposed into a number of relatively simple tasks, and neural networks are assigned a subset of the tasks (e.g., pattern recognition, associative memory, control) that match their inherent capabilities. It is important to recognize, however, that we have a long way to go (if ever) before we can build a computer architecture that mimics a human brain.

(“Neural Networks: A Comprehensive Foundation”, Simon Haykin, 1994)

 

 

 
  [ # 42 ]
Merlin - Sep 12, 2013:

Some items you may find interesting as you think about shapes.
...
Some of your discussion reminds me of the issues we hit when trying to digitize and render fonts.
There are a number of representations (bitmap, line & arc, raster, Bézier curves, etc.). Finding compact, easy-to-manipulate representations was a constant area of research for many years.

Thanks. I finally had time to look at your links. The kind of thing I’d like to find, and that I suspect already exists, is an article in the field of pattern recognition, probably in some conference proceedings, probably written by a math Ph.D. who proved a theorem about some numerical relationship within shapes that he discovered, a theorem that mathematically relates features of images. For example, a formula that relates one or more of perimeter length, area, angles between peaks or valleys, and curvature. Such a formula would fall exactly into the subfield I would describe as “algebraic formology”. Such a formula is something that brains *might* be able to discover, either during evolution or during environmental learning early in life, and it would allow them to very usefully categorize and map shapes without too much ambiguity, overlap, conflict, or uncertainty. Just one such formula would be a terrific start in the field of formology and would prove that my intuition was on the right track about this being a productive field of inquiry.
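One theorem of exactly this flavor already exists: the isoperimetric inequality, which says that for any closed plane curve 4*pi*A <= P^2, with equality only for the circle. The ratio 4*pi*A/P^2 is therefore a translation-, rotation-, and scale-invariant “circularity” score, computable directly from a polygonal contour:

import numpy as np

def compactness(polygon):
    # polygon: (N, 2) array of vertices of a closed, non-self-intersecting contour
    pts = np.asarray(polygon, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    area = 0.5 * abs(np.sum(x * yn - xn * y))      # shoelace formula
    perim = np.sum(np.hypot(xn - x, yn - y))
    return 4 * np.pi * area / perim**2             # 1.0 for a circle, less otherwise

print(compactness([(0, 0), (1, 0), (1, 1), (0, 1)]))   # square: ~0.785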

Not only fonts, but maps: makers of digitized maps, like for Google maps, “cheat” by using arcs of circles to represent any kind of bend in a road. Similarly, engineers “cheat” by using splines to approximate curves of unknown mathematical description. The brain loves to cheat since it needs fast shortcuts in order to compete with other organisms, so such solutions were probably discovered by our brains. That is in contrast to a non-cheating mathematical solution that is *exact* in its description, the direction I proposed called formology. I plan to discuss this more in a later post.

 

 
  [ # 43 ]

OK, let’s see the next posts!

Just to shake up the mind (yours, Mark), you should read about Skousen’s Analogical Modeling.

It is an interesting point of view and can describe and mimic complex linguistic behavior based on a small number of examples.

It’s based on something called quantum analogical modelling, which mimics something like the holy grail of calculations and uses concepts like minimized entropy and energy to extract answers (they use other words, like supracontext, infracontext, blah, blah!), but it’s the same thing: just complexity gain/loss.

smile cheers!

 

 
  [ # 44 ]

so, for me, statistics rocks!!

I’d agree with that.

 

 
  [ # 45 ]
Jan Bogaerts - Oct 10, 2013:

so, for me, statistics rocks!!

I’d agree with that.

Jan, just an addendum...

I don’t mean just any statistics...
I mean really smart statistics...

In fact, stats is nothing more than memory-based observation of the data, making deductions from medians, deviations, and other number-massaging stuff, interpreted by a non-linear action (energy, remember? it’s quadratic). So it is all based on accumulating the good stuff in some kind of reduced, processed memory... which resembles... experience!

In fact experience seems to be smart data reduction…

On behalf of this, what we must do is give machines the capability to gain experience!
And we need to teach them well; this means incorporating our experience into the systems...
... in a smart (i.e., compact) way!

To put it best:

smart statistics... rocks!

smile

 

 
