Genetic Algorithm for Evolution of Neural Network Configuration

In this study, each individual in the population represents a possible configuration for a neural network. Each individual has four components describing different aspects of the configuration: the number of hidden nodes, the learning rate from the input to the hidden layer, the learning rate from the hidden to the output layer, and the momentum term. A real-valued representation was used to describe these components. Each component was assigned maximum and minimum values, based on previous experiments and experience with the dataset; these are detailed in Table 1.

To calculate the fitness of a member of the population, a neural network was built from the information encoded within that individual. The network was then trained for 300 epochs and tested on the wavelet-compressed data. The root mean squared error in prediction (RMSEP), calculated using leave-one-out cross-validation, was used as the fitness measure for each individual.

For this work, a parallel island model was used [18], with 19 islands, each holding a fixed population of 50 individuals. The island model was adopted so that the analysis could be distributed across multiple computers; it is not strictly necessary for the technique employed here. Our breeding strategy involves elitism, crossover, mutation and migration, as follows.
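The chromosome and fitness evaluation described above can be sketched as follows. The component bounds shown here are placeholders, not the values from Table 1, and `evaluate_rmsep` is a hypothetical stand-in for the 300-epoch training and leave-one-out cross-validation step:

```python
import random

# Hypothetical component bounds; the actual minima/maxima are given in Table 1.
BOUNDS = {
    "hidden_nodes":     (2, 30),     # number of hidden nodes (rounded to int when used)
    "lr_input_hidden":  (0.01, 0.9), # learning rate, input -> hidden layer
    "lr_hidden_output": (0.01, 0.9), # learning rate, hidden -> output layer
    "momentum":         (0.0, 0.9),  # momentum term
}

def random_individual():
    """Real-valued chromosome: one float per configuration component."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def fitness(individual, evaluate_rmsep):
    """Build and train a network from the chromosome and return its RMSEP
    (lower is better). `evaluate_rmsep` stands in for training the network
    for 300 epochs and scoring it by leave-one-out cross-validation on the
    wavelet-compressed data."""
    return evaluate_rmsep(
        hidden_nodes=round(individual["hidden_nodes"]),
        lr1=individual["lr_input_hidden"],
        lr2=individual["lr_hidden_output"],
        momentum=individual["momentum"],
    )
```

Because RMSEP is an error, the GA treats smaller fitness values as better.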
• Elitism: The two fittest individuals from each population are copied without mutation into the next generation, to ensure a steady progression in fitness.
• Crossover: The preceding population was sorted by descending fitness, and two randomly selected individuals from the top 10 were crossed over. This was repeated until a new population was produced.
• Mutation: The mutation rate was set at 10%, i.e. one mutation in every 10 individuals. This high level of mutation is used to counter-balance the combined effects of using elitism and restricting crossovers to component boundaries.
• Migration: The migration rate between islands was set at 5%, i.e. in every generation there was a 5% probability of copying the best individual from a randomly chosen island population (A) into a population (B), overwriting the worst individual in population (B).

The initial weights in the neural networks were set randomly. They were initially set using the Nguyen and Widrow algorithm; however, this was not found to improve the predictive power of the network or to decrease the training time.
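The breeding steps above can be sketched as one generation of the island GA. This is a minimal illustration, assuming chromosomes are lists of four floats, that mutation re-draws one component uniformly within its bounds, and that migration is checked once per generation; the helper names are ours, not the paper's:

```python
import random

COMPONENTS = 4          # hidden nodes, two learning rates, momentum
TOP_N = 10              # crossover pool: the 10 fittest individuals
MUTATION_RATE = 0.10    # roughly one mutated individual in every 10
MIGRATION_RATE = 0.05   # per-generation chance of an inter-island migration

def crossover(a, b):
    """One-point crossover restricted to component boundaries."""
    point = random.randint(1, COMPONENTS - 1)
    return a[:point] + b[point:]

def next_generation(population, fitness, other_islands, bounds):
    """One GA generation: elitism, crossover, mutation, migration.
    `population` is a list of real-valued chromosomes (lists of floats);
    lower fitness (RMSEP) is better."""
    ranked = sorted(population, key=fitness)
    # Elitism: top two copied unchanged into the next generation.
    new_pop = [list(ranked[0]), list(ranked[1])]
    # Crossover between random pairs from the top 10 until the population is refilled.
    while len(new_pop) < len(population):
        a, b = random.sample(ranked[:TOP_N], 2)
        child = crossover(a, b)
        # Mutation: re-draw one component uniformly within its bounds (assumed form).
        if random.random() < MUTATION_RATE:
            i = random.randrange(COMPONENTS)
            child[i] = random.uniform(*bounds[i])
        new_pop.append(child)
    # Migration: best of a random other island overwrites this island's worst.
    if other_islands and random.random() < MIGRATION_RATE:
        donor = random.choice(other_islands)
        best = min(donor, key=fitness)
        worst_i = max(range(len(new_pop)), key=lambda i: fitness(new_pop[i]))
        new_pop[worst_i] = list(best)
    return new_pop
```

In the full setup, 19 such populations of 50 individuals evolve in parallel, each calling `next_generation` once per generation with the other islands passed in as `other_islands`.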