3.9 Simple and Complicated Data

Note: This is an excerpt from a draft of my thesis, A Computer Model of National Behavior. The introduction and table of contents are also available.

Robert Sargent wrote in 1997, “it is usually difficult, time consuming, and costly to obtain sufficient, accurate, and appropriate data.” Later in this thesis the data for this model, shown as attributes of entities, will be described. This “simple” or “objective” data (e.g. population and population density) is very well established and has been tracked by governments in Europe for centuries.

However, some of the attributes are more obscure. Four of the attributes of a single entity can be described as “complex” or “subjective,” and several attributes of other entities are as well. Supplying values for these subjective attributes presents a challenge.

This problem is not new. In 1970, Clema and Kirkham cite Guetzkow and note that “[empirical] data, to a large extent, is lacking in the political sciences.” Because these subjective attributes are important, their values must be accurate; otherwise there would be no point in including them in the model. Yet because they are not objective, no value can be known for certain, and different experts might make good arguments for dramatically different values for the same place. Several options for obtaining values for these attributes are discussed below. Whatever method is used must be coherent and fair; it must not allow picking and choosing.

One approach is to use a very small number of experts who agree with each other, so that their values reflect a single world view. It might not matter much which world view is reflected, so long as it is a developed and reasonable one; different political philosophies might be equally valid given different facts, and this approach would reflect that. This approach might be termed Bayesian because, as Clema and Kirkham note in citing Kleinmuntz, “[experiments] have shown that human opinions do in fact change when new information is acquired in close accordance with Baye’s theorem and that human opinion change is a quite orderly process.” However, this method’s reliance on the subjective judgments of a few experts is a significant drawback.
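To make the Bayesian reading concrete, the following C sketch applies Bayes' theorem to a single yes/no belief held by one expert. The function name `bayes_update` and the probabilities used with it are hypothetical; the sketch only shows how a prior opinion shifts in an orderly way when new evidence arrives, not how this model assigns attribute values.

```c
#include <assert.h>

/*
 * Hypothetical helper: update an expert's belief P(H) in hypothesis H
 * after observing evidence E, using Bayes' theorem:
 *   P(H|E) = P(E|H) P(H) / (P(E|H) P(H) + P(E|not H) P(not H))
 */
double bayes_update(double prior, double p_e_given_h, double p_e_given_not_h)
{
    assert(prior >= 0.0 && prior <= 1.0);
    double joint_h = p_e_given_h * prior;                  /* P(E and H)     */
    double joint_not_h = p_e_given_not_h * (1.0 - prior);  /* P(E and not H) */
    double evidence = joint_h + joint_not_h;               /* P(E)           */
    assert(evidence > 0.0);                                /* E must be possible */
    return joint_h / evidence;
}
```

For example, an expert who starts at 0.5 and sees evidence four times as likely under H as under its negation, `bayes_update(0.5, 0.8, 0.2)`, moves to 0.8; evidence that is equally likely either way leaves the belief unchanged.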

Another possible approach is the Delphi method. In 1990, Roth and Wood noted “A major unresolved issue in the knowledge acquisition literature is the appropriateness of using several experts as knowledgeable sources.” For a research project, Roth and Wood used “a series of questionnaires to aggregate the knowledge, judgments, or opinions of experts in order to address to complex questions.” In other words, knowledgeable people could be polled on what they believe the correct values for attributes are, and their answers would be averaged. By incorporating the opinions of many leading thinkers in a field and giving greatest weight to what is most agreed on, the Delphi method is often successful.
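The averaging step described above, and one round of feedback, can be sketched in C as follows. The names are hypothetical, and the revision rule is an assumption made only for illustration: real Delphi panelists revise their answers by their own judgment after seeing the group's results, which is modeled here as a simple pull of strength `alpha` toward the group mean.

```c
#include <assert.h>
#include <stddef.h>

/* Average of n expert estimates for one attribute. */
double delphi_mean(const double *estimates, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += estimates[i];
    return sum / (double)n;
}

/*
 * One simulated questionnaire round. ASSUMPTION: each expert moves a
 * fraction alpha (0..1) of the way toward the group mean after seeing it.
 * Under this rule the mean stays fixed while the spread of answers shrinks.
 */
void delphi_round(double *estimates, size_t n, double alpha)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    double mean = delphi_mean(estimates, n);
    for (size_t i = 0; i < n; i++)
        estimates[i] += alpha * (mean - estimates[i]);
}
```

With three experts answering 2, 4 and 6, the average is 4; after one round with `alpha = 0.5` the answers become 3, 4 and 5, so the consensus value is unchanged but the panel agrees more closely.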

However, this model will use neither a small group of like-minded experts nor the Delphi method. Instead, values that cannot be found directly will be derived from information that can be.

Orser and Zimmerman take a mathematical approach, treating one variable as dependent and another as independent; for example, “…and the formula Y= 15.06(X) + 194.23 for villages that contained over 88 lodges.” A similar tactic will be used for this model. However, given the wealth of European census information compared with the data available for century-old Arikara settlements, more than one independent variable may be used for each dependent variable.
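The C sketch below contrasts the single-variable form quoted above with the multi-variable form this model may use. The function names are hypothetical, and the coefficients passed to `linear_estimate` are placeholders rather than values from the model.

```c
#include <assert.h>
#include <stddef.h>

/* Orser and Zimmerman's quoted formula, stated for villages with over 88 lodges. */
double orser_zimmerman(double x)
{
    return 15.06 * x + 194.23;
}

/*
 * Several independent variables: y = b0 + b1*x1 + ... + bn*xn.
 * The coefficients are supplied by the caller; the model's actual
 * coefficients would be chosen while the model is being built.
 */
double linear_estimate(double intercept, const double *coef,
                       const double *x, size_t n)
{
    assert(n == 0 || (coef != NULL && x != NULL));
    double y = intercept;
    for (size_t i = 0; i < n; i++)
        y += coef[i] * x[i];
    return y;
}
```

For instance, `orser_zimmerman(100.0)` gives about 1700.23, and `linear_estimate(1.0, (double[]){2.0, 3.0}, (double[]){4.0, 5.0}, 2)` gives 1 + 2·4 + 3·5 = 24.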

These formulae will be determined during the construction of the model. Different coefficient values will be tried, and the final formulae will be those that give the most reasonable results. The formulae will not change during a run of the model because, unlike a nation’s intelligence, they affect the nature of the world, not how entities interact with it.
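One way the "try different values" step could be made systematic is sketched below in C: each candidate coefficient for a hypothetical formula y = k·x is scored against reference data, and the best one is kept. The function names are hypothetical, and judging "most reasonable" by squared error against known values is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squared errors of y = k * x against n reference points. */
static double squared_error(double k, const double *x, const double *y, size_t n)
{
    double err = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = k * x[i] - y[i];
        err += d * d;
    }
    return err;
}

/*
 * Try each of m candidate coefficients and return the one with the smallest
 * error. ASSUMPTION: "most reasonable" is taken to mean "closest to the
 * reference data".
 */
double calibrate(const double *candidates, size_t m,
                 const double *x, const double *y, size_t n)
{
    assert(m > 0);
    double best = candidates[0];
    double best_err = squared_error(best, x, y, n);
    for (size_t j = 1; j < m; j++) {
        double e = squared_error(candidates[j], x, y, n);
        if (e < best_err) {
            best = candidates[j];
            best_err = e;
        }
    }
    return best;
}
```

Once chosen, the coefficient would be stored as a constant (for example a `static const double`) so that, as described above, it cannot change during a run.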

Computer Science Thesis Index

3.8 Genetic Programming

Genetic Programming (GP) is a significant extension of GAs. Gibbs states that GP was developed by Dr. John Koza in 1992 “to evolve entire programs of virtually any size and form.” Though all GP programs (GPPs) can be seen as GAs, Dobbs shows that GPPs are special in two related ways: GPPs deal with populations of variable-length chromosomes, and every member of the population is itself a program. These are important distinctions, because they allow for much more variation.
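A fixed-length GA swaps genes at matching positions in two equal-length parents. The C sketch below instead shows cut-and-splice crossover, one common way to recombine variable-length chromosomes, using integer genes. It illustrates how children of new lengths arise; it is not the operator Koza's GP uses, which recombines program trees by swapping subtrees. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Cut-and-splice crossover: parent a is cut at cut_a and parent b at cut_b,
 * independently. The child is the head of a followed by the tail of b, so
 * its length can differ from both parents.
 * `child` must have room for cut_a + (len_b - cut_b) genes.
 * Returns the child's length.
 */
size_t cut_and_splice(const int *a, size_t len_a, size_t cut_a,
                      const int *b, size_t len_b, size_t cut_b,
                      int *child)
{
    assert(cut_a <= len_a && cut_b <= len_b);
    size_t n = 0;
    for (size_t i = 0; i < cut_a; i++)      /* head of parent a */
        child[n++] = a[i];
    for (size_t i = cut_b; i < len_b; i++)  /* tail of parent b */
        child[n++] = b[i];
    return n;
}
```

Cutting a 4-gene parent after its third gene and a 6-gene parent after its first gene yields an 8-gene child, longer than either parent.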

The difference between GPPs and GAs is best understood through an example. Consider breeding the following two C programs:

int x;
for (x = 0; x < 10; x++) { printf("Hello, world!\n"); }

and

int x;
x = 5;
if (x < 5) { printf("X is less than 5.\n"); }

Simplified to pseudocode, they reduce to the following:

x := 0
while:
    if x >= 10 then
        goto end
    end if
    print "Hello world!"
    x := x + 1
    goto while
end:

and

x := 5
if x < 5 then
    print "X is less than 5"
end if

DNA has rules for how genes can be paired. GP can have similar rules, so that no illegal statements are made and the system wastes no time attempting to run a program that will not even compile. Even so, there is still a very large number of possible combinations. To continue the example, the genetic pairing of the two parent programs might produce the following child, which shares most of its structure with the first program but takes its conditional block from the second.

x := 0
while:
    if x < 5 then
        print "X is less than 5"
    end if
    print "Hello world!"
    x := x + 1
    goto while
end:
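The pairing rule described above can be sketched in C by treating each whole statement or whole if-block as one gene, so that swapping genes can never split a block and produce an illegal program. This representation and the function name are hypothetical; the sketch builds a child like the one above by copying the first parent and replacing its if-block with the second parent's if-block.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical representation: a program is an array of genes, where each
 * gene is one complete statement or one complete if-block (stored as text).
 * Because blocks are never split, every child stays syntactically legal.
 * The child copies parent a, replacing gene i with gene j of parent b.
 * `child` must have room for len_a genes.
 */
void swap_in_block(const char **a, size_t len_a, size_t i,
                   const char **b, size_t len_b, size_t j,
                   const char **child)
{
    assert(i < len_a && j < len_b);
    for (size_t k = 0; k < len_a; k++)
        child[k] = (k == i) ? b[j] : a[k];
}
```

Applied with i pointing at the first parent's if-block and j at the second parent's if-block, this keeps the first parent's loop, print statement and counter while importing the `x < 5` condition.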

The GP system would then test this program to see whether it is more or less fit. If the goal is to maximize text output, it succeeds, because it prints an unlimited amount of text. If the goal is to print the letter X as often as possible, it also succeeds: it does so five times, more than either parent. If the goal is to minimize execution time, it fails abysmally, because this GP-generated program can never end.
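Because the child never halts, any fitness test has to stop it somehow. The C sketch below runs the child program from the example under a fixed budget of loop iterations and reports the quantities the three goals above care about. The struct, the function name and the idea of a step budget are assumptions made for illustration, not details taken from the sources cited here.

```c
#include <assert.h>

/* What one budgeted run of the child program produced. */
struct run_result {
    long lines_printed;  /* total lines of output                 */
    long x_lines;        /* lines reading "X is less than 5"      */
    long steps_used;     /* loop iterations consumed before cutoff */
};

/*
 * Hypothetical evaluator: executes the child program with at most
 * max_steps passes through the loop (ASSUMED budget).
 */
struct run_result run_child(long max_steps)
{
    assert(max_steps >= 0);
    struct run_result r = {0, 0, 0};
    long x = 0;                          /* x := 0                     */
    while (r.steps_used < max_steps) {   /* while: ... goto while      */
        if (x < 5) {                     /* if-block from parent 2     */
            r.x_lines++;                 /* print "X is less than 5"   */
            r.lines_printed++;
        }
        r.lines_printed++;               /* print "Hello world!"       */
        x = x + 1;                       /* x := x + 1                 */
        r.steps_used++;
    }
    /* No statement jumps to end:, so only the budget stops the loop. */
    return r;
}
```

Raising the budget from 100 to 1000 steps raises the total output from 105 to 1005 lines while the X count stays at five, and every run consumes its entire budget, which is the pattern of success and failure described above.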

Thus the selection mode is non-tournament, the data manipulated is code, and the chromosomes are of variable length, but otherwise GP is merely a subset of GA.

These seemingly small changes from GAs to GP have huge effects. Instead of changing some values, GP changes the logic itself. This is very powerful, and GP would bring some advantages to the model. Instead of changing their behavior only within the narrow ranges of their attributes, the modeled nations could possess complex internal logic. GP also allows the developer of the system to know relatively little about how to implement that logic, because the system itself takes care of implementation.

However, GP is not applicable in this case. Two serious faults prevent it from being implemented in the model. GP would both limit the applicability of the model and make the model unintelligible.

For all of its innovation, GP is a limiting tool, especially when compared with GAs. Because GAs modify only values, small changes tend to have relatively small effects. The magnitudes of those effects will of course differ, but wildly unpredictable results would indicate a serious problem, and wildly unpredictable results are exactly what can be expected under GP. Discussing a relatively simple attempt to produce satellite guidance systems, Gibbs notes that “when [the GP system] was presented with situations it had never encountered, the program failed, a common problem with evolved software.” Among intelligent creatures, by contrast, outright failure due to an inability to adapt is thankfully rare.
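The contrast between changing values and changing logic can be illustrated in C with a hypothetical one-line decision rule: a nation acts when its strength exceeds a threshold. A GA-style mutation nudges the threshold value; a GP-style mutation changes the logic by swapping the comparison operator. The rule, the function names and the 0 to 99 strength scale are assumptions made only for this illustration.

```c
#include <assert.h>

/* Hypothetical rule: act when strength exceeds the threshold. */
static int acts_gt(int strength, int threshold) { return strength > threshold; }

/* The same rule after a GP-style mutation that swaps the operator. */
static int acts_lt(int strength, int threshold) { return strength < threshold; }

typedef int (*rule_fn)(int strength, int threshold);

/* Count strengths 0..max_strength-1 for which two rule variants disagree. */
int disagreements(rule_fn r1, int t1, rule_fn r2, int t2, int max_strength)
{
    assert(max_strength > 0);
    int count = 0;
    for (int s = 0; s < max_strength; s++)
        if (r1(s, t1) != r2(s, t2))
            count++;
    return count;
}
```

Over strengths 0 to 99, moving the threshold from 50 to 51 changes the decision for a single input, while flipping `>` to `<` changes it for 99 of the 100 inputs.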

GP programs can be very strange and inscrutable, like DNA. Willihnganz warns that evolved software programs are “often convoluted, bizarrely multilayered creations, nothing like the software a human might write.” One AI researcher discovered that GP had correctly programmed the interface to an artificial limb in a single line of incomprehensible code. Because it would be impossible to explain why something is happening under a GP approach, the model would not be useful.
