Generating images with Keras and TensorFlow eager execution

The recent announcement of TensorFlow 2.0 names eager execution as the number one central feature of the new major version. What does this mean for R users? As shown in our recent post on neural machine translation, you can already use eager execution from R, together with Keras custom models and the datasets API. It's good to know you can use it, but why should you? And in what cases?

In this and the next few posts, we want to show how eager execution can make developing models a lot easier. The degree of simplification will depend on the task, and just how much easier you'll find the new way may also depend on your experience using the functional API to model more complex relationships. Even if you think that GANs, encoder-decoder architectures, or neural style transfer didn't pose any problems before the advent of eager execution, you may find that the alternative is a better fit to how we humans mentally picture problems.

For this post, we're porting code from a recent Google Colaboratory notebook implementing the DCGAN architecture (Radford, Metz, and Chintala 2015). No prior knowledge of GANs is required. We'll keep this post practical (no maths) and focus on how to achieve your goal: mapping a simple and vivid concept to a surprisingly small number of lines of code.

As in the post on machine translation with attention, we first have to cover some prerequisites. By the way, there's no need to copy out the code snippets: you'll find the complete code in eager_dcgan.R.

Prerequisites

The code in this post depends on the latest CRAN version of the TensorFlow R packages. You can install these packages as follows:
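A minimal sketch of the installation step, assuming the packages meant are the three used in the rest of this post (tensorflow, keras and tfdatasets). Note that the code below relies on TensorFlow 1.x APIs such as tf$contrib and tf$train$AdamOptimizer, so a 1.x release of TensorFlow is assumed; install_tensorflow() accepts a version argument for that purpose.

```r
# Assumption: these are the R packages the post depends on
install.packages(c("tensorflow", "keras", "tfdatasets"))

# Assumption: the TensorFlow backend itself still has to be installed;
# pass a `version` argument to pin a 1.x release
tensorflow::install_tensorflow()
```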

We also use the tfdatasets package for our input pipeline. So we end up with the following preamble to set things up:
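A sketch of what such a preamble might look like, assuming the TensorFlow 1.x era R packages, where eager execution had to be switched on explicitly and the Keras implementation bundled with TensorFlow had to be selected. Treat the exact calls and their order as assumptions rather than as the authors' verbatim setup.

```r
library(keras)
# Assumption: use the Keras implementation that ships with TensorFlow
use_implementation("tensorflow")

library(tensorflow)
# Assumption: eager execution is enabled once, before any other TensorFlow call
tfe_enable_eager_execution(device_policy = "silent")

library(tfdatasets)
```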

That's it. Let's get started.

So what's a GAN?

GAN stands for Generative Adversarial Network (Goodfellow et al. 2014). It is a setup of two agents, the generator and the discriminator, that act against each other (thus, adversarial). It is generative because the goal is to generate output (as opposed to, say, classification or regression).

In human learning, feedback, whether direct or indirect, plays a central role. Say we wanted to forge a banknote (as long as those still exist). Assuming we can get away with unsuccessful trials, we would get better and better at forgery over time. By optimizing our technique, we would end up rich. This concept of optimizing from feedback is embodied in the first of the two agents, the generator. It gets its feedback from the discriminator, in an upside-down way: if it can fool the discriminator into believing that the banknote was real, all is fine. If the discriminator notices the fake, the generator has to do things differently. For a neural network, that means it has to update its weights.

How does the discriminator know what is real and what is fake? It too has to be trained, on real banknotes (or whatever kind of objects are involved) and on the fake ones produced by the generator. So the complete setup is a competition between two agents, one striving to produce realistic-looking fakes, and the other to expose the deception. The goal of training is to have both evolve and get better, each in turn causing the other to get better, too.

In this system, there is no objective minimum of the loss function: we want both components to learn and get better “in lockstep”, rather than one winning out over the other. This makes optimization difficult. In practice, therefore, tuning a GAN can seem more like alchemy than science, and it often makes sense to lean on practices and “tricks” reported by others.

In this example, just like in the Google Notebook we're porting, the goal is to generate MNIST digits. While this may not sound like the most exciting task one can imagine, it allows us to focus on the mechanics, and allows us to keep the computation and memory requirements (comparatively) low.

Let's load the data (only the training set is required) and then, look at the first actor of our play, the generator.

Training data

mnist <- dataset_mnist()
c(train_images, train_labels) %<-% mnist$train

train_images <- train_images %>% 
  k_expand_dims() %>%
  k_cast(dtype = "float32")

# normalize images to [-1, 1] because the generator uses tanh activation
train_images <- (train_images - 127.5) / 127.5
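To make the effect of the normalization concrete, here is a small base R sketch on a few hypothetical pixel values (not actual MNIST data). It shows that the 8-bit range 0..255 is mapped onto [-1, 1], the output range of tanh, and that the inverse transformation used later for plotting recovers the original values.

```r
# Hypothetical pixel values spanning the 8-bit range
pixels <- c(0, 127.5, 255)

# Same transformation as applied to train_images above
normalized <- (pixels - 127.5) / 127.5
print(normalized)

# Inverse transformation, as used when plotting generated images
restored <- normalized * 127.5 + 127.5
print(restored)
```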

Our complete training set will be streamed once per epoch:

buffer_size <- 60000
batch_size <- 256
batches_per_epoch <- (buffer_size / batch_size) %>% round()

train_dataset <- tensor_slices_dataset(train_images) %>%
  dataset_shuffle(buffer_size) %>%
  dataset_batch(batch_size)

This input will only be given to the discriminator.
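The batch arithmetic can be checked in plain R. The sketch below assumes that dataset_batch keeps the final, smaller batch (which, as far as we know, is its default behaviour); under that assumption the number of batches actually streamed per epoch is one more than batches_per_epoch.

```r
buffer_size <- 60000
batch_size <- 256

full_batches <- buffer_size %/% batch_size          # batches holding exactly 256 images
leftover <- buffer_size %% batch_size               # images in the final, smaller batch
total_batches <- ceiling(buffer_size / batch_size)  # assumes the partial batch is kept
batches_per_epoch <- round(buffer_size / batch_size)

cat(full_batches, "full batches,", leftover, "leftover images,",
    total_batches, "batches in total\n")
```

If that assumption holds, the per-epoch loss averages reported later divide a sum over 235 batches by 234, which hardly matters for progress reporting.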

The generator

Both generator and discriminator are Keras custom models. In contrast to custom layers, custom models allow you to construct models as independent units, complete with custom forward pass logic, backprop and optimization. The model-generating function defines the layers the model (self) wants assigned, and returns the function that implements the forward pass.

As we will see shortly, the generator is passed vectors of random noise as input. Each vector is transformed to 3d (height, width, channels) and then successively upsampled to the required output size of (28, 28, 1).

generator <-
  function(name = NULL) {
    keras_model_custom(name = name, function(self) {
      
      self$fc1 <- layer_dense(units = 7 * 7 * 64, use_bias = FALSE)
      self$batchnorm1 <- layer_batch_normalization()
      self$leaky_relu1 <- layer_activation_leaky_relu()
      self$conv1 <-
        layer_conv_2d_transpose(
          filters = 64,
          kernel_size = c(5, 5),
          strides = c(1, 1),
          padding = "same",
          use_bias = FALSE
        )
      self$batchnorm2 <- layer_batch_normalization()
      self$leaky_relu2 <- layer_activation_leaky_relu()
      self$conv2 <-
        layer_conv_2d_transpose(
          filters = 32,
          kernel_size = c(5, 5),
          strides = c(2, 2),
          padding = "same",
          use_bias = FALSE
        )
      self$batchnorm3 <- layer_batch_normalization()
      self$leaky_relu3 <- layer_activation_leaky_relu()
      self$conv3 <-
        layer_conv_2d_transpose(
          filters = 1,
          kernel_size = c(5, 5),
          strides = c(2, 2),
          padding = "same",
          use_bias = FALSE,
          activation = "tanh"
        )
      
      function(inputs, mask = NULL, training = TRUE) {
        self$fc1(inputs) %>%
          self$batchnorm1(training = training) %>%
          self$leaky_relu1() %>%
          k_reshape(shape = c(-1, 7, 7, 64)) %>%
          self$conv1() %>%
          self$batchnorm2(training = training) %>%
          self$leaky_relu2() %>%
          self$conv2() %>%
          self$batchnorm3(training = training) %>%
          self$leaky_relu3() %>%
          self$conv3()
      }
    })
  }
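As a sanity check on the shapes, the following base R sketch traces the spatial size through the generator. It relies on the fact that a transposed convolution with "same" padding multiplies height and width by the stride; strides and filter counts are copied from the model definition above.

```r
# Output of the dense layer, before reshaping to (7, 7, 64)
dense_units <- 7 * 7 * 64
cat("dense output length:", dense_units, "\n")

strides <- c(1, 2, 2)    # strides of conv1, conv2, conv3
filters <- c(64, 32, 1)  # filters of conv1, conv2, conv3

size <- 7
for (i in seq_along(strides)) {
  # "same"-padded transposed convolution: spatial size is multiplied by the stride
  size <- size * strides[i]
  cat("after conv", i, ":", size, "x", size, "x", filters[i], "\n")
}
```

The final line confirms that the generator emits single-channel 28 x 28 images, matching the MNIST digits.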

The discriminator

The discriminator is just a pretty normal convolutional network that outputs a score. Using “score” instead of “probability” here is on purpose: if you look at the last layer, it is fully connected, of size 1, but lacks the usual sigmoid activation. The reason is that, unlike Keras' loss_binary_crossentropy, the loss function we will use here, tf$losses$sigmoid_cross_entropy, works with raw logits, not with the outputs of a sigmoid.

discriminator <-
  function(name = NULL) {
    keras_model_custom(name = name, function(self) {
      
      self$conv1 <- layer_conv_2d(
        filters = 64,
        kernel_size = c(5, 5),
        strides = c(2, 2),
        padding = "same"
      )
      self$leaky_relu1 <- layer_activation_leaky_relu()
      self$dropout <- layer_dropout(rate = 0.3)
      self$conv2 <-
        layer_conv_2d(
          filters = 128,
          kernel_size = c(5, 5),
          strides = c(2, 2),
          padding = "same"
        )
      self$leaky_relu2 <- layer_activation_leaky_relu()
      self$flatten <- layer_flatten()
      self$fc1 <- layer_dense(units = 1)
      
      function(inputs, mask = NULL, training = TRUE) {
        inputs %>% self$conv1() %>%
          self$leaky_relu1() %>%
          self$dropout(training = training) %>%
          self$conv2() %>%
          self$leaky_relu2() %>%
          self$flatten() %>%
          self$fc1()
      }
    })
  }
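The discriminator mirrors the generator's shape progression in reverse. A short base R sketch, assuming the usual rule that a "same"-padded convolution yields ceiling(input / stride) outputs per spatial dimension:

```r
# "same"-padded strided convolution: output size is ceiling(input / stride)
conv_same <- function(size, stride) ceiling(size / stride)

size <- 28
filters <- c(64, 128)  # filters of conv1 and conv2
for (i in seq_along(filters)) {
  size <- conv_same(size, 2)
  cat("after conv", i, ":", size, "x", size, "x", filters[i], "\n")
}

# Length of the vector handed to the final dense layer of size 1
flat_units <- size * size * filters[2]
cat("flattened length:", flat_units, "\n")
```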

Setting the scene

Before we can start training, we need to create the usual components of a deep learning setup: the model (or models, in this case), the loss function(s), and the optimizer(s).

Model creation is just a function call, with a little extra:

generator <- generator()
discriminator <- discriminator()

# wrap the forward passes in defun to compile them into TensorFlow graphs
generator$call = tf$contrib$eager$defun(generator$call)
discriminator$call = tf$contrib$eager$defun(discriminator$call)

defun compiles an R function (once per different combination of argument shapes and non-tensor object values) into a TensorFlow graph, and is used to speed up computations. This comes with side effects and possibly unexpected behavior, so please consult the documentation for details. Here, we were mainly curious about how much of a speedup we would see when using it from R. In our example, it resulted in a speedup of 130%.

On to the losses. Discriminator loss consists of two parts: does it correctly identify real images as real, and does it correctly spot fake images as fake? Here real_output and generated_output contain the logits returned from the discriminator, that is, its judgment of whether the respective images are fake or real.

discriminator_loss <- function(real_output, generated_output) {
  real_loss <- tf$losses$sigmoid_cross_entropy(
    multi_class_labels = k_ones_like(real_output),
    logits = real_output)
  generated_loss <- tf$losses$sigmoid_cross_entropy(
    multi_class_labels = k_zeros_like(generated_output),
    logits = generated_output)
  real_loss + generated_loss
}

Generator loss depends on how the discriminator judged its creations: the generator would hope for all of them to be seen as real.

generator_loss <- function(generated_output) {
  tf$losses$sigmoid_cross_entropy(
    tf$ones_like(generated_output),
    generated_output)
}
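To see what the two loss functions above compute, here is a base R sketch of sigmoid cross-entropy on raw logits. It uses the numerically stable formulation and averages over the batch, which we assume corresponds to the default behaviour of tf$losses$sigmoid_cross_entropy. The logits are made-up values, not actual discriminator output.

```r
# Numerically stable sigmoid cross-entropy on raw logits, averaged over the batch
# (assumption: this mirrors tf$losses$sigmoid_cross_entropy with default settings)
sigmoid_xent <- function(labels, logits) {
  mean(pmax(logits, 0) - logits * labels + log1p(exp(-abs(logits))))
}

# Hypothetical discriminator scores for four real and four generated images
real_output <- c(2.5, 1.0, 3.2, 0.3)
generated_output <- c(-1.8, -0.4, 0.9, -2.7)

# Discriminator: real images should be labelled 1, generated images 0
disc_loss <- sigmoid_xent(rep(1, 4), real_output) +
  sigmoid_xent(rep(0, 4), generated_output)

# Generator: wants its images to be labelled 1
gen_loss <- sigmoid_xent(rep(1, 4), generated_output)

# The same generator loss via explicit probabilities, for comparison
naive_gen_loss <- -mean(log(plogis(generated_output)))

cat("discriminator loss:", disc_loss, "\n")
cat("generator loss:    ", gen_loss, "\n")
```

With these made-up scores the discriminator mostly sees through the fakes, so the generator loss is the larger of the two.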

Now we need to define an optimizer for each model.

discriminator_optimizer <- tf$train$AdamOptimizer(1e-4)
generator_optimizer <- tf$train$AdamOptimizer(1e-4)

Training loop

There are two models, two loss functions and two optimizers, but there is just one training loop, because the two models depend on each other. The training loop will be over MNIST images streamed in batches, but we still need input to the generator: in this case, a random vector of size 100.
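For orientation, the generator's input for one batch has the following shape. This is a base R sketch using rnorm in place of k_random_normal, with the batch size and noise dimension taken from the text.

```r
set.seed(1)  # only to make the sketch reproducible
batch_size <- 256
noise_dim <- 100

# One row of standard-normal noise per image to be generated
noise <- matrix(rnorm(batch_size * noise_dim), nrow = batch_size)
dim(noise)
```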

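Let's take the training loop step by step. There will be an outer and an inner loop, one over epochs and one over batches. At the start of each epoch, we create a fresh iterator over the dataset, using make_iterator_one_shot. For every batch, the generator creates images from random noise, the discriminator scores both the real and the generated images, and both losses are recorded under gradient tapes; the complete loop further down shows these steps. The resulting gradients are then handed to the optimizers, and the losses are added to the running totals:

generator_optimizer$apply_gradients(purrr::transpose(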
  list(gradients_of_generator, generator$variables)
))
discriminator_optimizer$apply_gradients(purrr::transpose(
  list(gradients_of_discriminator, discriminator$variables)
))
      
total_loss_gen <- total_loss_gen + gen_loss
total_loss_disc <- total_loss_disc + disc_loss

This ends the loop over batches. We finish off the loop over epochs by displaying the current losses and saving a few of the generator's artworks:

cat("Time for epoch ", epoch, ": ", Sys.time() - start, "\n")
cat("Generator loss: ", total_loss_gen$numpy() / batches_per_epoch, "\n")
cat("Discriminator loss: ", total_loss_disc$numpy() / batches_per_epoch, "\n\n")
if (epoch %% 10 == 0)
  generate_and_save_images(generator,
                           epoch,
                           random_vector_for_generation)

Here's the training loop again, shown as a whole. Even including the lines for reporting on progress, it is remarkably concise, and allows for a quick grasp of what is going on:

train <- function(dataset, epochs, noise_dim) {
  # use the function's own arguments rather than the globals num_epochs / train_dataset
  for (epoch in seq_len(epochs)) {
    start <- Sys.time()
    total_loss_gen <- 0
    total_loss_disc <- 0
    iter <- make_iterator_one_shot(dataset)
    
    until_out_of_range({
      batch <- iterator_get_next(iter)
      noise <- k_random_normal(c(batch_size, noise_dim))
      with(tf$GradientTape() %as% gen_tape, { with(tf$GradientTape() %as% disc_tape, {
        generated_images <- generator(noise)
        disc_real_output <- discriminator(batch, training = TRUE)
        disc_generated_output <-
          discriminator(generated_images, training = TRUE)
        gen_loss <- generator_loss(disc_generated_output)
        disc_loss <-
          discriminator_loss(disc_real_output, disc_generated_output)
      }) })
      
      gradients_of_generator <-
        gen_tape$gradient(gen_loss, generator$variables)
      gradients_of_discriminator <-
        disc_tape$gradient(disc_loss, discriminator$variables)
      
      generator_optimizer$apply_gradients(purrr::transpose(
        list(gradients_of_generator, generator$variables)
      ))
      discriminator_optimizer$apply_gradients(purrr::transpose(
        list(gradients_of_discriminator, discriminator$variables)
      ))
      
      total_loss_gen <- total_loss_gen + gen_loss
      total_loss_disc <- total_loss_disc + disc_loss
      
    })
    
    cat("Time for epoch ", epoch, ": ", Sys.time() - start, "\n")
    cat("Generator loss: ", total_loss_gen$numpy() / batches_per_epoch, "\n")
    cat("Discriminator loss: ", total_loss_disc$numpy() / batches_per_epoch, "\n\n")
    if (epoch %% 10 == 0)
      generate_and_save_images(generator,
                               epoch,
                               random_vector_for_generation)
    
  }
}
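The apply_gradients calls in the loop expect a list of (gradient, variable) pairs, which is what purrr::transpose produces from the two parallel lists. The base R sketch below illustrates that pairing with Map; the values are hypothetical stand-ins for tensors and variables, and Map is used here only so the sketch runs without purrr.

```r
# Hypothetical stand-ins for the gradients and variables of a model with three weights
gradients <- list(0.1, -0.2, 0.05)
variables <- list("kernel", "bias", "gamma")

# Base R equivalent of purrr::transpose(list(gradients, variables)) for this case:
# turns two parallel lists into one list of (gradient, variable) pairs
pairs <- Map(list, gradients, variables)
str(pairs)
```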

Here is the function to save the created images…

generate_and_save_images <- function(model, epoch, test_input) {
  predictions <- model(test_input, training = FALSE)
  png(paste0("images_epoch_", epoch, ".png"))
  par(mfcol = c(5, 5))
  par(mar = c(0.5, 0.5, 0.5, 0.5),
      xaxs = 'i',
      yaxs = 'i')
  for (i in 1:25) {
    img <- predictions[i, , , 1]
    img <- t(apply(img, 2, rev))
    image(
      1:28,
      1:28,
      img * 127.5 + 127.5,
      col = gray((0:255) / 255),
      xaxt = 'n',
      yaxt = 'n'
    )
  }
  dev.off()
}
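Two details of the plotting function are easy to miss: the t(apply(img, 2, rev)) idiom and the rescaling. R's image() draws a matrix rotated by 90 degrees counter-clockwise relative to its printed layout, so the matrix is first rotated clockwise to compensate. A base R sketch on a tiny hypothetical "image":

```r
# A tiny 2 x 3 "image" with values in [-1, 1], the range produced by tanh
img <- matrix(c(-1, 1, -0.5, 0.5, 0, 0), nrow = 2)

# Reverse each column, then transpose: a 90-degree clockwise rotation
rotated <- t(apply(img, 2, rev))

# Map [-1, 1] back to the 0..255 pixel range, inverting the earlier normalization
pixels <- rotated * 127.5 + 127.5
print(pixels)
```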

… and we are ready to go!

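num_epochs <- 150
noise_dim <- 100  # size of the generator's random input vector
random_vector_for_generation <- k_random_normal(c(25, noise_dim))  # fixed noise for the 25 saved samples
train(train_dataset, num_epochs, noise_dim)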

Results

Here are some generated images after training for 150 epochs:

[Image: generated digits after epoch 150]

As they say, your results will definitely vary!

Conclusion

While tuning GANs will certainly remain a challenge, we hope we were able to show that mapping concepts to code is not difficult when using eager execution. If you've played around with GANs before, you may have noticed that you needed to pay careful attention to setting up the losses the right way, freezing the discriminator's weights when needed, etc. With eager execution, as the training loop above shows, this bookkeeping is not needed. In upcoming posts, we'll show more examples where its use simplifies model development.

Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. “Generative Adversarial Nets.” In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, 2672–80. http://papers.nips.cc/paper/5423-generative-adversarial-nets.

Radford, Alec, Luke Metz, and Soumith Chintala. 2015. “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.” CoRR abs/1511.06434. http://arxiv.org/abs/1511.06434.
