Eager execution and neural style transfer with Keras


What would your summer vacation pictures look like if Edvard Munch had painted them? (Perhaps it's better not to know.) Let's take a more comforting example: what would a nice, tranquil river scene look like if it were painted by Katsushika Hokusai?

Style transfer on photographs is nothing new, but it got a boost when Gatys, Ecker, and Bethge (Gatys, Ecker, and Bethge 2015) showed how to do it successfully with deep learning. The main idea is straightforward: create a hybrid that is a tradeoff between the content image we want to manipulate and a style image we want to imitate, optimizing for resemblance to both at the same time.

If you have read the chapter on neural style transfer in Deep Learning with R, you may recognize some of the code snippets that follow. However, there is one important difference: this post uses TensorFlow eager execution, allowing for an imperative style of coding that makes it easy to map concepts to code. Just like previous posts on eager execution on this blog, it is a port of a Google Colaboratory notebook that performs the same task in Python.

As always, please make sure you have the required package versions installed. And there's no need to copy the snippets – you'll find the complete code among the Keras examples.

Prerequisites

The code in this post depends on recent versions of several TensorFlow R packages. You can install these packages as follows:
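A minimal sketch of the setup (exact package versions aren't pinned here; the snippet assumes the packages used by the code below – keras, tensorflow, purrr, and glue – together with the TensorFlow 1.x eager execution API):

install.packages(c("keras", "tensorflow", "purrr", "glue"))

library(keras)       # image_load(), application_vgg19(), the k_* backend functions
library(tensorflow)  # access to tf$..., GradientTape
library(purrr)       # map(), walk(), transpose()
library(glue)        # formatting the progress messages
# keras also re-exports the %<-% destructuring operator used below

# depending on your keras version, you may need to tell it to use the
# TensorFlow implementation:
# use_implementation("tensorflow")

# eager execution (TensorFlow 1.x API) has to be enabled before any tensors are created
tfe_enable_eager_execution(device_policy = "silent")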

We will be working with images scaled down to 128 × 128 pixels:

img_shape <- c(128, 128, 3)

Here is the content image – feel free to replace it with a picture of your own:

content_path <- "isar.jpg"

content_image <-  image_load(content_path, target_size = img_shape[1:2])
content_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

isar

And this is the style image, Hokusai's The Great Wave off Kanagawa, which you can download from Wikimedia Commons:

style_path <- "The_Great_Wave_off_Kanagawa.jpg"

style_image <-  image_load(style_path, target_size = img_shape[1:2])
style_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

The Great Wave off Kanagawa

We create wrappers that load and preprocess the input images for us. As we will be working with VGG19, a network trained on ImageNet, we need to preprocess our input images in the same way that was used for its training. Later, we'll apply the inverse transformation to our combination image before displaying it.

load_and_process_image <- function(path) {
  image_load(path, target_size = img_shape[1:2]) %>%
    image_to_array() %>%
    k_expand_dims(axis = 1) %>%
    imagenet_preprocess_input()
}

deprocess_image <- function(x) {
  x <- x[1, , ,]
  # Remove zero-center by mean pixel
  x[, , 1] <- x[, , 1] + 103.939
  x[, , 2] <- x[, , 2] + 116.779
  x[, , 3] <- x[, , 3] + 123.68
  # 'BGR'->'RGB'
  x <- x[, , c(3, 2, 1)]
  x[x > 255] <- 255
  x[x < 0] <- 0
  x[] <- as.integer(x) / 255
  x
}

Setting the scene

We're going to use a neural network, but we won't be training it. Neural style transfer is a bit unusual in that we don't optimize the network's weights, but back-propagate the loss to the input layer (the image), moving it in the desired direction.

We will be interested in two types of outputs from the network, corresponding to our two goals. First, we want to keep the combination image similar to the content image on a high level. In a convnet, upper layers map to more holistic concepts, so we pick a layer high up in the graph to compare the outputs from the source and the combination.

Second, the generated image should "look like" the style image. Style corresponds to lower-level features like textures, shapes, and strokes, so to compare the combination against the style example, we choose a set of lower-level conv blocks for comparison and aggregate the results.

content_layers <- c("block5_conv2")
style_layers <- c("block1_conv1",
                 "block2_conv1",
                 "block3_conv1",
                 "block4_conv1",
                 "block5_conv1")

num_content_layers <- length(content_layers)
num_style_layers <- length(style_layers)

get_model <- function() {
  vgg <- application_vgg19(include_top = FALSE, weights = "imagenet")
  vgg$trainable <- FALSE
  style_outputs <- map(style_layers, function(layer) vgg$get_layer(layer)$output)
  content_outputs <- map(content_layers, function(layer) vgg$get_layer(layer)$output)
  model_outputs <- c(style_outputs, content_outputs)
  keras_model(vgg$input, model_outputs)
}
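As a quick sanity check (a sketch, not part of the original pipeline), the model wired up above should expose six outputs: the five style layers followed by the single content layer.

model <- get_model()
length(model$outputs)  # 6: five style outputs, then one content output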

Losses

When optimizing the input image, we will consider three types of loss. First, the content loss: how different is the combination image from the source? Here, we're using the sum of squared errors for comparison.

content_loss <- function(content_image, target) {
  k_sum(k_square(target - content_image))
}

Our second concern is having the styles match as closely as possible. Style is commonly operationalized as the Gram matrix of flattened feature maps in a layer. We thus assume that style is related to how the maps in a layer correlate with one another.

We therefore compute the Gram matrices of the layers we're interested in (defined above) for the source image as well as the optimization candidate, and compare them, again using the sum of squared errors.

gram_matrix <- function(x) {
  features <- k_batch_flatten(k_permute_dimensions(x, c(3, 1, 2)))
  gram <- k_dot(features, k_transpose(features))
  gram
}

style_loss <- function(gram_target, combination) {
  gram_comb <- gram_matrix(combination)
  k_sum(k_square(gram_target - gram_comb)) /
    (4 * (img_shape[3] ^ 2) * (img_shape[1] * img_shape[2]) ^ 2)
}
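To make the shapes concrete, here's a quick sketch on random data (not part of the actual pipeline): block1_conv1 has 64 filters, so its Gram matrix is 64 × 64, independently of the spatial size of the feature map.

# hypothetical feature map with 64 channels, like the output of block1_conv1
fake_features <- k_random_uniform(c(128L, 128L, 64L))
dim(gram_matrix(fake_features))  # 64 64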

Third, we don't want the combination image to look overly pixelated, so we're adding a regularization component: the total variation of the image:

total_variation_loss <- function(image) {
  y_ij  <- image[1:(img_shape[1] - 1L), 1:(img_shape[2] - 1L),]
  y_i1j <- image[2:(img_shape[1]), 1:(img_shape[2] - 1L),]
  y_ij1 <- image[1:(img_shape[1] - 1L), 2:(img_shape[2]),]
  a <- k_square(y_ij - y_i1j)
  b <- k_square(y_ij - y_ij1)
  k_sum(k_pow(a + b, 1.25))
}
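To get an intuition for what this penalizes, here's a quick sketch on synthetic data (assuming eager execution is enabled as described in the prerequisites): a perfectly flat image has zero total variation, while random noise scores high.

# constant image: neighbouring pixels are identical, so the penalty is 0
total_variation_loss(k_zeros(c(128L, 128L, 3L)))
# random noise: large differences between neighbours, large penalty
total_variation_loss(k_random_uniform(c(128L, 128L, 3L)))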

The tricky part is deciding how to weight these losses. We've obtained acceptable results with the following weights, but feel free to experiment:

content_weight <- 100
style_weight <- 0.8
total_variation_weight <- 0.01

Get model outputs for the content and style images

We need the model's output for the content and style images, but here it suffices to do this just once. We concatenate both images along the batch dimension, pass that input to the model, and get back a list of outputs, where every element of the list is a 4-d tensor. For the style image, we're interested in the style outputs at batch position 1, whereas for the content image, we need the content output at batch position 2.

In the code comments below, note that dimensions 2 and 3 will differ if you load images of a different size.

get_feature_representations <-
  function(model, content_path, style_path) {
    
    # dim == (1, 128, 128, 3)
    style_image <-
      load_and_process_image(style_path) %>% k_cast("float32")
    # dim == (1, 128, 128, 3)
    content_image <-
      load_and_process_image(content_path) %>% k_cast("float32")
    # dim == (2, 128, 128, 3)
    stack_images <- k_concatenate(list(style_image, content_image), axis = 1)
    
    # length(model_outputs) == 6
    # dim(model_outputs[[1]]) = (2, 128, 128, 64)
    # dim(model_outputs[[6]]) = (2, 8, 8, 512)
    model_outputs <- model(stack_images)
    
    style_features <- 
      model_outputs[1:num_style_layers] %>%
      map(function(batch) batch[1, , , ])
    content_features <- 
      model_outputs[(num_style_layers + 1):(num_style_layers + num_content_layers)] %>%
      map(function(batch) batch[2, , , ])
    
    list(style_features, content_features)
  }

Computing the losses

At every iteration, we need to pass the combination image through the model, obtain the style and content outputs, and compute the losses. Again, the code is commented extensively with tensor sizes for easy verification, but please keep in mind that the exact numbers presuppose you're working with 128×128 images.

compute_loss <-
  function(model, loss_weights, init_image, gram_style_features, content_features) {
    
    c(style_weight, content_weight) %<-% loss_weights
    model_outputs <- model(init_image)
    style_output_features <- model_outputs[1:num_style_layers]
    content_output_features <-
      model_outputs[(num_style_layers + 1):(num_style_layers + num_content_layers)]
    
    # style loss
    weight_per_style_layer <- 1 / num_style_layers
    style_score <- 0
    # dim(style_zip[[5]][[1]]) == (512, 512)
    style_zip <- transpose(list(gram_style_features, style_output_features))
    for (l in 1:length(style_zip)) {
      # for l == 1:
      # dim(target_style) == (64, 64)
      # dim(comb_style) == (1, 128, 128, 64)
      c(target_style, comb_style) %<-% style_zip[[l]]
      style_score <- style_score + weight_per_style_layer * 
        style_loss(target_style, comb_style[1, , , ])
    }
    
    # content loss
    weight_per_content_layer <- 1 / num_content_layers
    content_score <- 0
    content_zip <- transpose(list(content_features, content_output_features))
    for (l in 1:length(content_zip)) {
      # dim(comb_content) ==  (1, 8, 8, 512)
      # dim(target_content) == (8, 8, 512)
      c(target_content, comb_content) %<-% content_zip[[l]]
      content_score <- content_score + weight_per_content_layer *
        content_loss(comb_content[1, , , ], target_content)
    }
    
    # total variation loss
    variation_loss <- total_variation_loss(init_image[1, , ,])
    
    style_score <- style_score * style_weight
    content_score <- content_score * content_weight
    variation_score <- variation_loss * total_variation_weight
    
    loss <- style_score + content_score + variation_score
    list(loss, style_score, content_score, variation_score)
  }

Calculating gradients

As soon as we have the losses, obtaining the gradient of the overall loss with respect to the input image is just a matter of calling tape$gradient on the GradientTape. Note that the nested call to compute_loss, and thus the call of the model on our combination image, happens inside the GradientTape context.

compute_grads <- 
  function(model, loss_weights, init_image, gram_style_features, content_features) {
    with(tf$GradientTape() %as% tape, {
      scores <-
        compute_loss(model,
                     loss_weights,
                     init_image,
                     gram_style_features,
                     content_features)
    })
    total_loss <- scores[[1]]
    list(tape$gradient(total_loss, init_image), scores)
  }

The training phase

Now it's time to train! While the natural continuation of this sentence would be "… the model", the model we're training here is not VGG19 (that one we're just using as a tool), but a minimal setup of just:

  • a Variable holding the image being optimized
  • the loss functions we described above
  • an optimizer that will apply the calculated gradients to the image Variable (tf$train$AdamOptimizer)

Below, we obtain the style features (of the style image) and the content feature (of the content image) just once, then iterate over the optimization process, saving the output every 100 iterations.

In contrast to the original article and the Deep Learning with R book, but following the Google notebook instead, we're not using L-BFGS for optimization but Adam, as our goal here is to provide a concise introduction to eager execution. However, you could plug in a different optimization method if you wanted, replacing

optimizer$apply_gradients(list(tuple(grads, init_image)))

with an algorithm of your choice (and, of course, assigning the result of the optimization to the Variable holding the image).
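For instance (a sketch, not from the original post), you could swap in a different TensorFlow 1.x optimizer and leave the rest of the training loop untouched:

# e.g. RMSProp instead of Adam; the learning rate is a guess and would need tuning
optimizer <- tf$train$RMSPropOptimizer(learning_rate = 0.1)  # would replace the AdamOptimizer below
# the update step inside the loop stays exactly the same:
# optimizer$apply_gradients(list(tuple(grads, init_image)))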

run_style_transfer <- function(content_path, style_path, num_iterations = 1000) {
  model <- get_model()
  walk(model$layers, function(layer) layer$trainable = FALSE)
  
  c(style_features, content_features) %<-% 
    get_feature_representations(model, content_path, style_path)
  # dim(gram_style_features[[1]]) == (64, 64)
  gram_style_features <- map(style_features, function(feature) gram_matrix(feature))
  
  init_image <- load_and_process_image(content_path)
  init_image <- tf$contrib$eager$Variable(init_image, dtype = "float32")
  
  optimizer <- tf$train$AdamOptimizer(learning_rate = 1,
                                      beta1 = 0.99,
                                      epsilon = 1e-1)
  
  c(best_loss, best_image) %<-% list(Inf, NULL)
  loss_weights <- list(style_weight, content_weight)
  
  start_time <- Sys.time()
  global_start <- Sys.time()
  
  norm_means <- c(103.939, 116.779, 123.68)
  min_vals <- -norm_means
  max_vals <- 255 - norm_means
  
  for (i in seq_len(num_iterations)) {
    # dim(grads) == (1, 128, 128, 3)
    c(grads, all_losses) %<-% compute_grads(model,
                                            loss_weights,
                                            init_image,
                                            gram_style_features,
                                            content_features)
    c(loss, style_score, content_score, variation_score) %<-% all_losses
    optimizer$apply_gradients(list(tuple(grads, init_image)))
    clipped <- tf$clip_by_value(init_image, min_vals, max_vals)
    init_image$assign(clipped)
    
    end_time <- Sys.time()
    
    if (k_cast_to_floatx(loss) < best_loss) {
      best_loss <- k_cast_to_floatx(loss)
      best_image <- init_image
    }
    
    if (i %% 50 == 0) {
      glue("Iteration: {i}") %>% print()
      glue(
        "Total loss: {k_cast_to_floatx(loss)},
        style loss: {k_cast_to_floatx(style_score)},
        content loss: {k_cast_to_floatx(content_score)},
        total variation loss: {k_cast_to_floatx(variation_score)},
        time for 1 iteration: {(Sys.time() - start_time) %>% round(2)}"
      ) %>% print()
      
      if (i %% 100 == 0) {
        png(paste0("style_epoch_", i, ".png"))
        plot_image <- best_image$numpy()
        plot_image <- deprocess_image(plot_image)
        plot(as.raster(plot_image), main = glue("Iteration {i}"))
        dev.off()
      }
    }
  }
  
  glue("Total time: {Sys.time() - global_start} seconds") %>% print()
  list(best_image, best_loss)
}

Ready to run

Now, we are ready to start the process:

c(best_image, best_loss) %<-% run_style_transfer(content_path, style_path)
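To display the final composite right away, the returned best_image (an eager Variable) can be converted back with deprocess_image – a quick sketch:

best_image$numpy() %>%     # eager Variable -> R array of shape (1, 128, 128, 3)
  deprocess_image() %>%    # undo the VGG19 preprocessing, scale to [0, 1]
  as.raster() %>%
  plot()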

In our case, the results didn't change much after ~1000 iterations, and our river scene looked like this:

style epoch 1000

…definitely more inviting than had it been painted by Edvard Munch!

Conclusion

With neural style transfer, some fiddling around may be needed until you get the result you want. But as our example shows, this doesn't mean the code has to be complicated. In addition to being easy to understand, eager execution also lets you add debugging output and step through the code line by line to check tensor shapes. Until next time in our eager execution series!

Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. 2015. "A Neural Algorithm of Artistic Style." CoRR abs/1508.06576. http://arxiv.org/abs/1508.06576.

