```
```

```
```

# Section 1: Multiple choice

Answer all 20 multiple-choice questions on the multiple-choice answersheet. 2 marks each. There is no negative marking. Total 40 marks.

Only the filled-in boxes will be looked at. No additional commentary or analysis will be read.

```
```
1. One of these is known to be untrue:
1. Computers will never be better than humans at football.
2. AI (the construction of intelligent machines) is impossible.
3. AI is possible.
4. In the future, humans will live forever on earth in artificial bodies, regularly backing up their brains in order to cheat death.
5. In the future, immortal humans will travel to distant stars.
6. Computers will never be better than humans at chess.
7. In the future, intelligent machines will exterminate the human race.

2. Consider the Travelling salesman problem with n cities. We do not consider paths that visit the same city twice. The number of possible paths is:
1. n!
2. (n-1)!
3. 2n
4. n2
5. 2(n-1)
6. nn

3. We are learning from exemplars. We experience 3 exemplars:
When input is (x1) output should be (a1).
When input is (x2) output should be (a1).
When input is (x1) output should be (a2).
How can this happen?
1. The 2nd exemplar can happen but not the 3rd.
2. There are missing inputs, or the world is probabilistic.
3. This cannot happen.
4. The 3rd exemplar can happen but not the 2nd.
5. This always happens.
6. The machine is badly programmed.

4. We do machine evolution as follows:

Repeat:
Test all 100 for fitness.
Pick the 10 best.
Make 10 copies of each to construct the next generation.
Loop again.

The problem with this is:

1. It will end up focused on one solution.
2. No new solution is ever made.
3. 10 is too small.
4. We lose the best solutions after finding them.
5. 100 is too small.
6. There is no problem.
7. Too many solutions reproduce.
8. We should pick according to probabilities rather than an absolute number.

5. A perceptron has 2 inputs, taking values 0 or 1, and 1 output. Its 2 weights are 0 and 1. The output threshold is -0.5. This perceptron implements:
1. OR
2. Constant output 0.
3. XOR
4. NOT
5. AND
6. None of the above.
7. Constant output 1.

6. In machine evolution, we have 4 individuals, where individual i has fitness f(i):
` i     1   2   3   4 `
` f(i)  0  -1  -2  -1 `
If we use a "Soft max" probability distribution with a "temperature" parameter T to choose who reproduces, then as   T -> infinity   the probability of each individual reproducing goes to:

1. ` 1    0     0     0 		`
2. ` 1/4  1/4   1/4   1/4 		`
3. ` 0    1     1     1 			`
4. ` 0    0     1     0 		`
5. ` 0    1/4   1/2   1/4 		`
6. ` 0    1     0     1 			`
7. ` 0    1/2   0     1/2 		`

7. In a multi-layer neural network, we have n hidden units with binary outputs, 1 output unit, weights w1,..,wn connecting them, and threshold t for the output unit. Which one of the following implements an output unit that fires if and only if all the hidden units fire.
1. w1=w2=..=wn=0,   t=1
2. w1=w2=..=wn=W, W > 0,   t=n
3. w1=w2=..=wn=W, W > 0,   t=W
4. w1=w2=..=wn=W, W > 0,   t=nW
5. w1=w2=..=wn=1,   t=0
6. w1=w2=..=wn=W, W > 0,   t=W+n

8. In neural networks, we need thresholds. Otherwise if all inputs are 0, then output of every single hidden node is:
1. 1
2. 1/2
3. infinity
4. 0
5. -infinity

9. We have two exemplars:

Temperature 22 degrees, pressure 1000 mb. Rain.
Temperature 15 degrees, pressure 990 mb. No rain.

Then the question is:

Temperature 17 degrees, pressure 1010 mb. Will there be rain?

We predict by Hamming distance to the nearest exemplar, (a) with pressure in millibars, (b) with pressure in bars. What does it predict?

1. (a) Rain, (b) Rain
2. (a) Rain, (b) No rain
3. (a) No rain, (b) No rain
4. (a) No rain, (b) Rain

10. Given a line in 2-dimensional space:
y = a x + b
Define a perceptron that receives x and y as input and fires if and only if the point is on this side of the line:
y ≤ a x + b
1. weights: -a, 1, threshold: b
2. weights: a, -1, threshold: b
3. weights: -a, 1, threshold: -b
4. weights: a, -1, threshold: -b

11. An upper bound on the state-space complexity of Xs and Os is:
1. 9!
2. 93
3. 33
4. 99
5. 3!
6. 39

12. One of these is not a sceptic of AI:
1. Searle
2. Brooks
3. Edelman
4. Godel
5. Penrose
6. Dreyfus
7. Eccles

13. Human mate choice is a heuristic because:
1. Humans do not implicitly store all their memories.
2. Humans cannot search the entire search space.
3. Humans cannot search the partial search space.
4. Humans do not run algorithms.
5. Humans do not operate according to logic.
6. Humans are not machines.
7. Humans cannot compare different solutions in the search space.
8. Humans do not explicitly store all their memories.

14. In traversing a search tree, nodes that are placed on the tree to the left are searched before nodes placed to the right. Does the ordering of the nodes introduce a bias?
1. Yes. But it is never a problem.
2. Yes. But it is alright because all nodes will be searched eventually.
3. Yes. But it is rarely a problem.
4. Best-first search will eliminate all bias.
5. Yes. But a heuristic can be used to eliminate all bias.
6. No.
7. Yes. But a heuristic can be used to produce a bias that is better than random.
8. Not if done right.

15. In machine evolution with asexual reproduction, mutation might break the parent and make it unfit. So therefore:
1. We should only mutate about 1 in a million bits.
2. We should set an appropriate mutation rate for the problem.
3. We should only mutate at points that will improve the parent.
4. We should mutate the unfit, not the fit.
5. We should not mutate.
6. The original statement is not true.
7. We should mutate the fit, not the unfit.
8. It is not something to worry about.

16. The set of inequalities that define a perceptron to implement NOT are:
1. t ≤ w < 0
2. w ≤ t < 0
3. w < t ≤ 0
4. 0 ≤ w < t
5. 0 < t ≤ w
6. t < w ≤ 0
7. 0 ≤ t < w
8. 0 < w ≤ t

17. In machine evolution, our chromosome is a binary string of length n. We interpret it as encoding an integer x. To convert this to encoding a real number in the interval 0 to 1, we calculate:
1. x/(2(n-1)-1)
2. x/(2(n-1))
3. x/(2n)
4. x/(2n-1)

18. In machine evolution, we have 4 individuals, where individual i has fitness f(i):
` i     1   2   3   4  `
` f(i)  0  -1  -2  -1 `
If we use a "Soft max" probability distribution with a "temperature" parameter T to choose who reproduces, then as   T -> 0   the probability of each individual reproducing goes to:

1. ` 1/4  1/4   1/4   1/4 		`
2. ` 0    1/2   0     1/2 		`
3. ` 1    0     0     0 		`
4. ` 0    1/4   1/2   1/4 		`
5. ` 0    1     1     1 			`
6. ` 0    0     1     0 		`
7. ` 0    1     0     1 			`

19. Given a line in 2-dimensional space:
a x + b y = c
Define a perceptron that receives x and y as input and fires if and only if the point is on this side of the line:
a x + b y ≤ c
1. weights: a, b, threshold: -c
2. weights: a, b, threshold: c
3. weights: -a, -b, threshold: -c
4. weights: -a, -b, threshold: c

20. Why would we in practice not learn the function y = sin(x) + (1/x) from exemplars?
1. We would learn it from exemplars.
2. There is nothing to learn. The problem is solved.
3. Learning it from exemplars is impossible.
4. Divide by zero error.
5. It would take too long.
6. It is a chaotic function.
7. Learning it from exemplars is impossible for some values of x.

```
```

# Section 2: Long questions

Total 60 marks.

Answer 3 of the 5 long questions. 20 marks each. Total 60 marks.

```
```
 Question 1 [Total marks: 20]

 1(a) Imagine a heuristic search to maximise a function f(x) by trial and error. The heuristic repeatedly proposes a value for x and receives a value for f(x). Explain with a graph how no heuristic that works like this can guarantee to find the global optimum. Does that mean all heuristics are the same? [4 marks]

 1(b) Is it hard to learn the function: ``` f(x) = sin(x) ``` from exemplars? Is it hard to learn the function: ``` f(x) = 5x + 7 ``` from exemplars? Is it hard to learn the function: ``` f(x) = cos(1/x) ``` from exemplars? [6 marks]

 1(c) Invent a heuristic for either car buying or job hunting. Explain how it may fail. [4 marks]

 1(d) Consider searching a "search tree": When a problem is represented as a State space Search problem, a "search tree" is constructed. Explain. In traversing a search tree, does not the order of the search tree introduce a bias? (For example, nodes that are placed towards the left-hand side are searched first.) [6 marks]

 Question 2 [Total marks: 20]

 2(a) Chess has a branching factor of about 35. The average length of a chess game is about 80 plies. Explain what these terms mean. Why do they mean that exhaustive search of chess is impossible? If exhaustive search of the future moves of a chess game is impossible, what do chess programs do instead? [3 marks]

 2(b) Consider breadth-first search: Breadth-first search is a form of uninformed search. What does this mean? Explain with a diagram how breadth-first search works. Discuss the properties, advantages and disadvantages of breadth-first search. If the branching factor is large, the breadth-first search "OPEN list" may run out of memory. But if the number of states is so large, the statespace graph probably cannot be drawn or represented in memory. So is it not the case that depth-first search (or any other search) will run out of memory too? Explain the answer. [8 marks]

 2(c) Consider how heuristic search picks the next state to examine: If you always pick the immediate next state with the highest heuristic evaluation, explain what type of search you are doing. Are there any limitations of this search? Give an example in some problem not mentioned on this course of where one might need to pick a state that scores lower according to the heuristic evaluation function. Best-first search picks the state that scores highest according to the heuristic evaluation function. Explain how best-first search, though, is different to simple hill-climbing. [3 marks]

 2(d) Consider using Best-first search: Imagine the following situation in Best-first search. The heuristic evaluation function tells us a state is 4 steps away from the goal. We examine the children of this state and the heuristic evaluation function tells us all of them are 5 steps away from the goal. Can this happen? Imagine the following situation. We go down a path where the heuristic evaluation function tells us we are 2 steps away from the goal - and no matter how far down this path we go, the heuristic evaluation function still tells us we are 2 steps away from the goal. Can this happen? Why is the second situation above worse than the first situation? [6 marks]

 Question 3 [Total marks: 20]

 3(a) Explain the concept of a genetic algorithm (GA). List what needs to be defined precisely in order to code the algorithm. Why can a GA not guarantee to find the global maximum? A GA is search by "a population of hill-climbers". Explain. Why is this better than / different to search by a single hill-climber? What would happen if you had a variable (rather than a constant) population size in a genetic algorithm? [5 marks]

 3(b) In a genetic algorithm (GA), let the probability of an individual being picked for reproduction be: ```p = my fitness --------------------------------- sum of fitnesses of all creatures ``` Fitness cannot be both positive and negative. Why? Fitness cannot be zero. Why? Given fitnesses all above zero, what might still be wrong with this scheme? [3 marks]

 3(c) In a genetic algorithm (GA), let pc be the probability of crossover. What would happen if pc = 1? What would happen if pc = 0, but you still had mutation? What would happen if pc = 0 and there was no mutation? [6 marks]

 3(d) In a genetic algorithm (GA), let each individual   i   have fitness f(i), and let its probability of reproducing, p(i), be: where T is the "temperature". Consider the individuals: ```i 1 2 3 4 f(i) 0 0 0 1 ``` Calculate the values p(i) if T=1. Calculate the values p(i) if T=10. Show that as T -> infinity, the individuals are all equally likely to reproduce. [6 marks]

 Question 4 [Total marks: 20]

 4(a) Consider the character recognition problem for handwriting. Discuss some of the problems to be solved in handwriting recognition. When a character is isolated, there may be a problem with missing inputs when we try to recognise it. Explain. [4 marks]

 4(b) You are using the Hamming distance as a simple distance metric to recognise faces. Inputs are images of 10 pixels by 10 pixels. Pixels are black or white. Explain the Hamming distance metric. What do input vectors look like? Show an example input vector. How exactly could we use a distance metric to recognise faces? [8 marks]

 4(c) Given exemplars: ```Temperature 22 degrees. No rain. Temperature 15 degrees. Rain. ``` Answer the following questions. Can you answer the question: ```Temperature 17 degrees. Will there be rain? ``` Explain your answer. Say we gathered more weather data. Let the exemplars be: ```Temperature 22 degrees, pressure 1000 mb. No rain. Temperature 15 degrees, pressure 990 mb. Rain. ``` Then the question is: ```Temperature 17 degrees, pressure 1010 mb. Will there be rain? ``` Explain your answer. What would the Hamming distance predict for the previous? If we use units of 1000 mb (units of K mb) to measure pressure, so that, e.g., 1010 mb is written as 1.010 K mb, then will there be rain? What would the Hamming distance predict? [4 marks]

 4(d) In a Neural Network, search can make a directed move. How come? In a Genetic Algorithm search can only make a random move. Why? Why, in both Neural Nets and Genetic Algorithms, can we never have a method that is guaranteed to find the global minimum (or maximum)? [4 marks]

 Question 5 [Total marks: 20]

 5(a) In neural networks, why is it not a constraint to say that all networks are fully connected? Derive the set of inequalities that define a Perceptron for OR. Supply an example of weights and thresholds that will satisfy these inequalities. If the threshold has to be set to 5 for some reason, derive the weights that will implement OR. [6 marks]

 5(b) Define the 2 perceptrons that fire for the 2 sides of the line: ``` y = a x + b ``` [3 marks]

 5(c) In the following multi-layer network, the threshold values are circled. The weights are listed along each link. The input units take values 0 or 1. Demonstrate the output that is generated for each possible input. What function does this network implement? If the -2 weight was changed to +2, what function would this network implement? If the -2 weight was changed to +2, and the output unit threshold to 1.5, what function would this network implement? If the -2 weight was changed to +2, and the output unit threshold to 4.5, what function would this network implement? [5 marks]

 5(d) Using the notation on this course, here is the backprop rule for the output layer of a multi-layer network: Calculate all the =   yk ( 1 - yk ) ( yk - Ok ) Calculate all the = yj For all j,k: What happens if all weights in the output layer start off very large? What happens if all weights in the output layer start off at zero? How should we initialise weights? [6 marks]

```

```

# CA686 - 2020

### Black out the square that matches the correct answer.

```
```
Exam No:_____________________

Class:_____________________

Seat No:_____________________

```
```

 Question 1. 2. 3. 4. 5. 6. 7. 8. 1 2 3 4 5 6 7 8 9 10
 Question 1. 2. 3. 4. 5. 6. 7. 8. 11 12 13 14 15 16 17 18 19 20