Getting started with the Keras functional API

The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.

This guide assumes that you are already familiar with the Sequential model.

Let's start with something simple.


First example: a densely-connected network

The Sequential model is probably a better choice to implement such a network, but it helps to start with something really simple.

  • A layer instance is callable (on a tensor), and it returns a tensor
  • Input tensor(s) and output tensor(s) can then be used to define a Model
  • Such a model can be trained just like Keras Sequential models.

from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
output_1 = Dense(64, activation='relu')(inputs)
output_2 = Dense(64, activation='relu')(output_1)
predictions = Dense(10, activation='softmax')(output_2)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Placeholder data for illustration: 1000 random samples with one-hot labels
import numpy as np
from keras.utils import to_categorical
data = np.random.random((1000, 784))
labels = to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)

model.fit(data, labels)  # starts training

All models are callable, just like layers

With the functional API, it is easy to reuse trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just reusing the architecture of the model, you are also reusing its weights.

x = Input(shape=(784,))
# This works, and returns the 10-way softmax we defined above.
y = model(x)

This can allow, for instance, to quickly create models that can process sequences of inputs. You could turn an image classification model into a video classification model, in just one line.

from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# This applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)
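
As a quick sanity check, we can wrap this graph into a model of its own and run random data through it (the wrapper model and the random batch below are purely illustrative, not part of the original example):

import numpy as np

# Illustrative only: a model from the sequence input to the per-timestep predictions
sequence_model = Model(inputs=input_sequences, outputs=processed_sequences)
dummy_batch = np.random.random((2, 20, 784))
print(sequence_model.predict(dummy_batch).shape)  # (2, 20, 10)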

Multi-input and multi-output models

Here's a good use case for the functional API: models with multiple inputs and outputs. The functional API makes it easy to manipulate a large number of intertwined datastreams.

Let's consider the following model. We seek to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc. The model will also be supervised via two loss functions. Using the main loss function earlier in a model is a good regularization mechanism for deep models.

Here's what our model looks like:

[Figure: multi-input and multi-output model graph]

Let's implement it with the functional API.

The main input will receive the headline, as a sequence of integers (each integer encodes a word). The integers will be between 1 and 10,000 (a vocabulary of 10,000 words), and the sequences will be 100 words long.

import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
import numpy as np
np.random.seed(0)  # Set a random seed for reproducibility

# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# This embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# A LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)

Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even though the main loss will be much higher in the model.

auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

At this point, we feed into the model our auxiliary input data by concatenating it with the LSTM output:

auxiliary_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, auxiliary_input])

# We stack a deep densely-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# And finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

This defines a model with two inputs and two outputs:

model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])

We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify different loss_weights or loss for each different output, you can use a list or a dictionary. Here we pass a single loss as the loss argument, so the same loss will be used on all outputs.

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])

We can train the model by passing it lists of input arrays and target arrays:

headline_data = np.round(np.abs(np.random.rand(12, 100) * 100))
additional_data = np.random.randn(12, 5)
headline_labels = np.random.randn(12, 1)
additional_labels = np.random.randn(12, 1)
model.fit([headline_data, additional_data], [headline_labels, additional_labels],
          epochs=50, batch_size=32)

Since our inputs and outputs are named (we passed them a "name" argument), we could also have compiled the model via:

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# And trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': headline_labels, 'aux_output': additional_labels},
          epochs=50, batch_size=32)

To use the model for inference, use:

model.predict({'main_input': headline_data, 'aux_input': additional_data})

or,

pred = model.predict([headline_data, additional_data])
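
Since the model has two outputs, predict returns a list of two arrays, in the same order as the outputs passed to Model:

main_pred, aux_pred = pred
print(main_pred.shape, aux_pred.shape)  # (12, 1) (12, 1)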

Shared layers

Another good use for the functional API are models that use shared layers. Let's take a look at shared layers.

Let's consider a dataset of tweets. We want to build a model that can tell whether two tweets are from the same person or not (this can allow us to compare users by the similarity of their tweets, for instance).

One way to achieve this is to build a model that encodes two tweets into two vectors, concatenates the vectors, and then adds a logistic regression; this outputs a probability that the two tweets share the same author. The model would then be trained on positive tweet pairs and negative tweet pairs.

Because the problem is symmetric, the mechanism that encodes the first tweet should be reused (weights and all) to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.

Let's build this with the functional API. We will take as input for a tweet a binary matrix of shape (280, 256), i.e. a sequence of 280 vectors of size 256, where each dimension in the 256-dimensional vector encodes the presence or absence of a character (out of an alphabet of 256 frequent characters).

import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))

To share a layer across different inputs, simply instantiate the layer once, then call it on as many inputs as you want:

# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)

# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)

# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)

# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Placeholder data for illustration: 10 random "tweet" pairs and binary labels
import numpy as np
data_a = np.random.random((10, 280, 256))
data_b = np.random.random((10, 280, 256))
labels = np.random.randint(2, size=(10, 1))

model.fit([data_a, data_b], labels, epochs=10)

Let's pause to take a look at how to read the shared layer's output, or its output shape.


The concept of layer "node"

Whenever you call a layer on some input, you are creating a new tensor (the output of the layer), and you are adding a "node" to the layer, linking the input tensor to the output tensor. When you call the same layer multiple times, that layer owns multiple nodes indexed as 0, 1, 2...

In previous versions of Keras, you could obtain the output tensor of a layer instance via layer.get_output(), or its output shape via layer.output_shape. You still can (except get_output() has been replaced by the property output). But what if a layer is connected to multiple inputs?

As long as a layer is only connected to one input, there is no confusion, and .output will return the one output of the layer:

a = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)

assert lstm.output == encoded_a

Not so if the layer has multiple inputs:

a = Input(shape=(280, 256))
b = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)

lstm.output
>> AttributeError: Layer lstm_1 has multiple inbound nodes,
hence the notion of "layer output" is ill-defined.
Use `get_output_at(node_index)` instead.

Okay then. The following works:

assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b

Simple enough, right?

The same is true for the properties input_shape and output_shape: as long as the layer has only one node, or as long as all nodes have the same input/output shape, then the notion of "layer output/input shape" is well defined, and that one shape will be returned by layer.output_shape/layer.input_shape. But if, for instance, you apply the same Conv2D layer to an input of shape (32, 32, 3), and then to an input of shape (64, 64, 3), the layer will have multiple input/output shapes, and you will have to fetch them by specifying the index of the node they belong to:

from keras.layers import Conv2D, Input

a = Input(shape=(32, 32, 3))
b = Input(shape=(64, 64, 3))

conv = Conv2D(16, (3, 3), padding='same')
conved_a = conv(a)

# Only one input so far, the following will work:
assert conv.input_shape == (None, 32, 32, 3)

conved_b = conv(b)
# now the `.input_shape` property wouldn't work, but this does:
assert conv.get_input_shape_at(0) == (None, 32, 32, 3)
assert conv.get_input_shape_at(1) == (None, 64, 64, 3)
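
The same accessor pattern works for output shapes. With 16 filters and 'same' padding, the conv layer above has these per-node output shapes:

assert conv.get_output_shape_at(0) == (None, 32, 32, 16)
assert conv.get_output_shape_at(1) == (None, 64, 64, 16)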

More examples

Code examples are still the best way to get started, so here are a few more.

Inception module

For more information about the Inception architecture, see Going Deeper with Convolutions.

import keras
from keras.layers import Conv2D, MaxPooling2D, Input

input_img = Input(shape=(256, 256, 3))

tower_1 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)
tower_1 = Conv2D(64, (3, 3), padding='same', activation='relu')(tower_1)

tower_2 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)
tower_2 = Conv2D(64, (5, 5), padding='same', activation='relu')(tower_2)

tower_3 = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(input_img)
tower_3 = Conv2D(64, (1, 1), padding='same', activation='relu')(tower_3)

# Concatenate along the channel axis (assuming the default channels_last data format)
output = keras.layers.concatenate([tower_1, tower_2, tower_3], axis=-1)
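
To inspect the module, one option (illustrative, not part of the original example) is to wrap it into a Model and print its summary:

from keras.models import Model

inception_module = Model(inputs=input_img, outputs=output)
inception_module.summary()  # final output shape: (None, 256, 256, 192)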

Residual connection on a convolution layer

For more information about residual networks, see Deep Residual Learning for Image Recognition.

import keras
from keras.layers import Conv2D, Input

# input tensor for a 3-channel 256x256 image
x = Input(shape=(256, 256, 3))
# 3x3 conv with 3 output channels (same as input channels)
y = Conv2D(3, (3, 3), padding='same')(x)
# this returns x + y.
z = keras.layers.add([x, y])
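
The residual addition requires x and y to have identical shapes, which is why the convolution uses 3 output channels and 'same' padding; a quick illustrative check:

from keras.models import Model

residual_block = Model(inputs=x, outputs=z)
assert residual_block.output_shape == (None, 256, 256, 3)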

Shared vision model

This model reuses the same image-processing module on two inputs, to classify whether two MNIST digits are the same digit or different digits.

import keras
from keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten
from keras.models import Model

# First, define the vision modules
digit_input = Input(shape=(27, 27, 1))
x = Conv2D(64, (3, 3))(digit_input)
x = Conv2D(64, (3, 3))(x)
x = MaxPooling2D((2, 2))(x)
out = Flatten()(x)

vision_model = Model(digit_input, out)

# Then define the tell-digits-apart model
digit_a = Input(shape=(27, 27, 1))
digit_b = Input(shape=(27, 27, 1))

# The vision model will be shared, weights and all
out_a = vision_model(digit_a)
out_b = vision_model(digit_b)

concatenated = keras.layers.concatenate([out_a, out_b])
out = Dense(1, activation='sigmoid')(concatenated)

classification_model = Model([digit_a, digit_b], out)
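
A minimal training sketch on random placeholder data (the arrays below are hypothetical stand-ins for real MNIST digit pairs):

import numpy as np

classification_model.compile(optimizer='rmsprop',
                             loss='binary_crossentropy',
                             metrics=['accuracy'])

# Hypothetical placeholder data: 10 pairs of 27x27 single-channel images
pairs_a = np.random.random((10, 27, 27, 1))
pairs_b = np.random.random((10, 27, 27, 1))
same_digit = np.random.randint(2, size=(10, 1))
classification_model.fit([pairs_a, pairs_b], same_digit, epochs=1)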

Visual question answering model

This model can select the correct one-word answer to a natural-language question about a picture.

It works by encoding the question into a vector, encoding the image into a vector, concatenating the two, and training on top a logistic regression over some vocabulary of potential answers.

import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model, Sequential

# First, let's define a vision model using a Sequential model.
# This model will encode an image into a vector.
vision_model = Sequential()
vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
vision_model.add(Conv2D(64, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(128, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Flatten())

# Now let's get a tensor with the output of our vision model:
image_input = Input(shape=(224, 224, 3))
encoded_image = vision_model(image_input)

# Next, let's define a language model to encode the question into a vector.
# Each question will be at most 100 words long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)
encoded_question = LSTM(256)(embedded_question)

# Let's concatenate the question vector and the image vector:
merged = keras.layers.concatenate([encoded_question, encoded_image])

# And let's train a logistic regression over 1000 words on top:
output = Dense(1000, activation='softmax')(merged)

# This is our final model:
vqa_model = Model(inputs=[image_input, question_input], outputs=output)

# The next stage would be training this model on actual data.
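
As a sketch of that next stage, the compile step might look like this (categorical_crossentropy matches the 1000-way softmax; the choice of optimizer is an assumption):

vqa_model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])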

Video question answering model

Now that we have trained our image QA model, we can quickly turn it into a video QA model. With appropriate training, you will be able to show it a short video (e.g. 100 frames of human action) and ask a natural-language question about the video (e.g. "what sport is the boy playing?" -> "football").

from keras.layers import TimeDistributed

video_input = Input(shape=(100, 224, 224, 3))
# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# This is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(inputs=question_input, outputs=encoded_question)

# Let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# And this is our video question answering model:
merged = keras.layers.concatenate([encoded_video, encoded_video_question])
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)