本节主要从两方面学习模型的训练和评估:使用内建的API进行训练和评估或者是自定义函数实现训练和评估。不管使用哪种方法,不同方式构建的模型的训练和评估方式是一样的。

使用内建API

当我们使用内建的API来进行训练和评估时,我们传入的数据必须是Numpy arrays或者是tf.data.Dataset对象,在接下来的几个小节里,我们将会使用MNIST数据集作为示例。

内建API总览

首先创建一个模型,如下:

1
2
3
4
5
6
7
8
9
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

接下来,我们定义一个如下一个数据集:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

然后配置训练参数:

1
2
3
4
5
model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
# Loss function to minimize
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
# List of metrics to monitor
metrics=['sparse_categorical_accuracy'])

接下来按照小批次的数目(batch_size)来训练这个模型,迭代整个数据集次数通过epochs设置:

1
2
3
4
5
6
7
8
9
10
11
print('# Fit model on training data')
history = model.fit(x_train, y_train,
batch_size=64,
epochs=3,
# We pass some validation for
# monitoring validation loss and metrics
# at the end of each epoch
validation_data=(x_val, y_val))

print('\nhistory dict:', history.history)
>> history dict: {'loss': [0.34013055738687514, 0.15638909303188325, 0.11687878879904746], 'sparse_categorical_accuracy': [0.90308, 0.95404, 0.96512], 'val_loss': [0.18770194243788718, 0.13478265590667723, 0.11865641107037664], 'val_sparse_categorical_accuracy': [0.9454, 0.9615, 0.9672]}

返回的对象记录了训练过程中损失值(loss value)和度量值(metrics)。下面的代码用于评估和预测:

1
2
3
4
5
6
7
8
9
10
# Evaluate the model on the test data using `evaluate`
print('\n# Evaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)

# Generate predictions (probabilities -- the output of the last layer)
# on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape:', predictions.shape)

定义损失函数,评价指标和优化器

为了训练模型,我们需要定义损失函数,评价指标和优化器。我们可以再模型编译期间传入这些参数:

1
2
3
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['sparse_categorical_accuracy'])

注意,metrics参数必须是一个列表,可以传入多个评价指标。对于含有多个输出的模型,我们可以分别为其定义损失函数,评价指标和优化器。同时,一些默认的参数值我们也可以使用字符串。为了重用,我们定义如下代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def get_uncompiled_model():
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
return model

def get_compiled_model():
model = get_uncompiled_model()
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['sparse_categorical_accuracy'])
return model

内建的损失函数,评价指标和优化器

内建优化器:SGD,RMSprop,Adam;内建损失函数:MeanSquareError,KLDivergence,CosineSimilarity;内建评估指标:AUC,Precision,Recall。

自定义损失函数

有两种方法来自定义我们的损失函数,第一种是定义一个函数,接受y_truey_pred参数:

1
2
3
4
5
6
7
def basic_loss_function(y_true, y_pred):
return tf.math.reduce_mean(tf.abs(y_true - y_pred))

model.compile(optimizer=keras.optimizers.Adam(),
loss=basic_loss_function)

model.fit(x_train, y_train, batch_size=64, epochs=3)

另外一种方法是构造tf.keras.losses.Loss的子类,并且实现以下两个方法:

  • __init__:接受传向损失函数的参数
  • call(self, y_true, y_pred):用于计算模型的损失

传向__init__的参数可以被call方法调用。以下方法实现实现了BinaryCrossEntropy损失函数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
class WeightedBinaryCrossEntropy(keras.losses.Loss):
"""
Args:
pos_weight: Scalar to affect the positive labels of the loss function.
weight: Scalar to affect the entirety of the loss function.
from_logits: Whether to compute loss from logits or the probability.
reduction: Type of tf.keras.losses.Reduction to apply to loss.
name: Name of the loss function.
"""
def __init__(self, pos_weight, weight, from_logits=False,
reduction=keras.losses.Reduction.AUTO,
name='weighted_binary_crossentropy'):
super().__init__(reduction=reduction, name=name)
self.pos_weight = pos_weight
self.weight = weight
self.from_logits = from_logits

def call(self, y_true, y_pred):
ce = tf.losses.binary_crossentropy(
y_true, y_pred, from_logits=self.from_logits)[:,None]
ce = self.weight * (ce*(1-y_true) + self.pos_weight*ce*(y_true))
return ce

由于数据集由10个类别,但我们使用的是二元损失,所以我们只考虑每个类别的预测,这样就能基于二元损失来计算了。首先创建独热码:

1
one_hot_y_train = tf.one_hot(y_train.astype(np.int32), depth=10)

接下俩训练模型:

1
2
3
4
5
6
7
8
9
model = get_uncompiled_model()

model.compile(
optimizer=keras.optimizers.Adam(),
loss=WeightedBinaryCrossEntropy(
pos_weight=0.5, weight = 2, from_logits=True)
)

model.fit(x_train, one_hot_y_train, batch_size=64, epochs=5)

自定义评估指标

可以通过创建Metric来实现自定义的评价指标,需要实现下列四种方法:

  • __init__:用于创建状态变量
  • update_state(self, y_true, y_pred, sample_weight=None):用于更新状态
  • result(self):使用状态变量计算最终结果
  • reset_states(self):重新初始化状态

下面是一个实现了CategoricalTruePositive评价指标:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
class CategoricalTruePositives(keras.metrics.Metric):

def __init__(self, name='categorical_true_positives', **kwargs):
super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
self.true_positives = self.add_weight(name='tp', initializer='zeros')

def update_state(self, y_true, y_pred, sample_weight=None):
y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
values = tf.cast(y_true, 'int32') == tf.cast(y_pred, 'int32')
values = tf.cast(values, 'float32')
if sample_weight is not None:
sample_weight = tf.cast(sample_weight, 'float32')
values = tf.multiply(values, sample_weight)
self.true_positives.assign_add(tf.reduce_sum(values))

def result(self):
return self.true_positives

def reset_states(self):
# The state of the metric will be reset at the start of each epoch.
self.true_positives.assign(0.)

下面是使用评价指标的代码:

1
2
3
4
5
6
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=[CategoricalTruePositives()])
model.fit(x_train, y_train,
batch_size=64,
epochs=3)

处理非常规的损失函数和评价指标

可以通过y_predy_true来计算损失函数和评价指标,然而,并非对所有的损失函数和评价指标都是如此。比如,一个正则化的损失函数可能需要某个层的激励值,而这个激励值并非是模型的输出。处理此类问题,我们可以再自定义层中的call方法中加入self.add_loss(loss_value):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
class ActivityRegularizationLayer(layers.Layer):

def call(self, inputs):
self.add_loss(tf.reduce_sum(inputs) * 0.1)
return inputs # Pass-through layer.

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)

# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)

x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# The displayed loss will be much higher than before
# due to the regularization component.
model.fit(x_train, y_train,
batch_size=64,
epochs=1)

同样,对于评价指标也是如此:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class MetricLoggingLayer(layers.Layer):

def call(self, inputs):
# The `aggregation` argument defines
# how to aggregate the per-batch values
# over each epoch:
# in this case we simply average them.
self.add_metric(keras.backend.std(inputs),
name='std_of_activation',
aggregation='mean')
return inputs # Pass-through layer.


inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)

# Insert std logging as a layer.
x = MetricLoggingLayer()(x)

x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train,
batch_size=64,
epochs=1)

再函数式API中,我们可以通过model.add_loss(loss_tensor)model.add_metric(metric_tensor, name, aggregation)来实现:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
inputs = keras.Input(shape=(784,), name='digits')
x1 = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x2 = layers.Dense(64, activation='relu', name='dense_2')(x1)
outputs = layers.Dense(10, name='predictions')(x2)
model = keras.Model(inputs=inputs, outputs=outputs)

model.add_loss(tf.reduce_sum(x1) * 0.1)

model.add_metric(keras.backend.std(x1),
name='std_of_activation',
aggregation='mean')

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train,
batch_size=64,
epochs=1)

自动设置验证集

再第一个实例中,我们使用validation_data来手动设置验证集。其实我们还可以使用validation_split参数来定义我们验证集的比例,需要注意的是,验证集在fit之前选取原数据集的前$ x% $比例的数据作为验证集。validation_split参数只能在训练Numpy数据集时使用:

1
2
model = get_compiled_model()
model.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=1, steps_per_epoch=1)

从Datasets中训练和评估

在TF2中,tf.data下的API用于加载数据和数据预处理。我们可以直接将Dataset的实例传给fit,evaluate,predoct等函数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
model = get_compiled_model()

# First, let's create a training Dataset instance.
# For the sake of our example, we'll use the same MNIST data as before.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Now we get a test dataset.
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(64)

# Since the dataset already takes care of batching,
# we don't pass a `batch_size` argument.
model.fit(train_dataset, epochs=3)

# You can also evaluate or predict on a dataset.
print('\n# Evaluate')
result = model.evaluate(test_dataset)
dict(zip(model.metrics_names, result))

注意Dataset在每次迭代结束后都会被重置,以此让我们在下次迭代中可以重新使用。如果我们想要定义每次迭代的步数,我们可以使用take参数。在达到指定的步数时,Dataset不会重置,除非它已经被遍历完了:

1
2
3
4
5
6
7
8
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Only use the 100 batches per epoch (that's 64 * 100 samples)
model.fit(train_dataset.take(100), epochs=3)

使用测试数据集

可以给fit函数传入validation_data:

1
2
3
4
5
6
7
8
9
10
11
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

model.fit(train_dataset, epochs=3, validation_data=val_dataset)

同样,如果我们定义每次迭代时使用验证集的次数,可以定义validation_steps:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
model = get_compiled_model()

# Prepare the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

# Prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

model.fit(train_dataset, epochs=3,
# Only run validation using the first 10 batches of the dataset
# using the `validation_steps` argument
validation_data=val_dataset, validation_steps=10)

注意,此时不管测试数据集是否遍历完,都会被重置。

其他输入格式的数据

除了Numpy中的数组和TF2中的Dataset对象,我们还可以使用Pandas的dataframs,或者是Python的generator(能够yield小批次数据)。

总体来说,对于少量数据,可以在内存中保存的,推荐使用Numpy中array,否则使用TF2中Dataset对象。

使用样本权重和类标权重

我们可以在使用fit方法的时候传入样本的权重和类标的权重:

  • 当使用Numpy数据时:通过sample_weightclass_weight参数
  • 当使用TF2的Dataset时:让其返回一个这样的元组:(input_batch, target_batch, sample_weight_batch)

下面是使用Numpy数据进行训练的例子:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import numpy as np

class_weight = {0: 1., 1: 1., 2: 1., 3: 1., 4: 1.,
# Set weight "2" for class "5",
# making this class 2x more important
5: 2.,
6: 1., 7: 1., 8: 1., 9: 1.}
print('Fit with class weight')
model.fit(x_train, y_train,
class_weight=class_weight,
batch_size=64,
epochs=4)

# Here's the same example using `sample_weight` instead:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
print('\nFit with sample weight')

model = get_compiled_model()
model.fit(x_train, y_train,
sample_weight=sample_weight,
batch_size=64,
epochs=4)

下面是使用Dataset数据集的例子:

1
2
3
4
5
6
7
8
9
10
11
12
13
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.

# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
(x_train, y_train, sample_weight))

# Shuffle and slice the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model = get_compiled_model()
model.fit(train_dataset, epochs=3)

将数据传入多输入输出模型

考虑这样一个模型:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
from tensorflow import keras
from tensorflow.keras import layers

image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')

x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)

x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)

x = layers.concatenate([x1, x2])

score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, name='class_output')(x)

model = keras.Model(inputs=[image_input, timeseries_input],
outputs=[score_output, class_output])

这个模型有两个输入两个输出。在模型编译期间,我们传入不同的损失函数:

1
2
3
4
model.compile(
optimizer=keras.optimizers.RMSprop(1e-3),
loss=[keras.losses.MeanSquaredError(),
keras.losses.CategoricalCrossentropy(from_logits=True)])

同样可以传入不同的评价指标:

1
2
3
4
5
6
7
model.compile(
optimizer=keras.optimizers.RMSprop(1e-3),
loss=[keras.losses.MeanSquaredError(),
keras.losses.CategoricalCrossentropy(from_logits=True)],
metrics=[[keras.metrics.MeanAbsolutePercentageError(),
keras.metrics.MeanAbsoluteError()],
[keras.metrics.CategoricalAccuracy()]])

由于我们已经为输出层赋予了 名字,我们可以使用字典的方式传递参数:

1
2
3
4
5
6
7
model.compile(
optimizer=keras.optimizers.RMSprop(1e-3),
loss={'score_output': keras.losses.MeanSquaredError(),
'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)},
metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
keras.metrics.MeanAbsoluteError()],
'class_output': [keras.metrics.CategoricalAccuracy()]})

同样的,我们可以定义损失权重:

1
2
3
4
5
6
7
8
model.compile(
optimizer=keras.optimizers.RMSprop(1e-3),
loss={'score_output': keras.losses.MeanSquaredError(),
'class_output': keras.losses.CategoricalCrossentropy(from_logits=True)},
metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
keras.metrics.MeanAbsoluteError()],
'class_output': [keras.metrics.CategoricalAccuracy()]},
loss_weights={'score_output': 2., 'class_output': 1.})

我们可以为某个输出定义损失函数,而另外一个输出不定义损失函数:

1
2
3
4
5
6
7
8
9
# List loss version
model.compile(
optimizer=keras.optimizers.RMSprop(1e-3),
loss=[None, keras.losses.CategoricalCrossentropy(from_logits=True)])

# Or dict loss version
model.compile(
optimizer=keras.optimizers.RMSprop(1e-3),
loss={'class_output':keras.losses.CategoricalCrossentropy(from_logits=True)})

传入训练数据集的方式和上述介绍的方式差不多:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
model.compile(
optimizer=keras.optimizers.RMSprop(1e-3),
loss=[keras.losses.MeanSquaredError(),
keras.losses.CategoricalCrossentropy(from_logits=True)])

# Generate dummy Numpy data
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))

# Fit on lists
model.fit([img_data, ts_data], [score_targets, class_targets],
batch_size=32,
epochs=3)

# Alternatively, fit on dicts
model.fit({'img_input': img_data, 'ts_input': ts_data},
{'score_output': score_targets, 'class_output': class_targets},
batch_size=32,
epochs=3)

而Dataset的使用方式如下:

1
2
3
4
5
6
train_dataset = tf.data.Dataset.from_tensor_slices(
({'img_input': img_data, 'ts_input': ts_data},
{'score_output': score_targets, 'class_output': class_targets}))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model.fit(train_dataset, epochs=3)

使用回调

回调对象可以在不同的时间点(每轮迭代的开始,每个批次的结束,每个迭代的结束)被调用来实现不同的功能。回调对象可以被传入fit:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
model = get_compiled_model()

callbacks = [
keras.callbacks.EarlyStopping(
# Stop training when `val_loss` is no longer improving
monitor='val_loss',
# "no longer improving" being defined as "no better than 1e-2 less"
min_delta=1e-2,
# "no longer improving" being further defined as "for at least 2 epochs"
patience=2,
verbose=1)
]
model.fit(x_train, y_train,
epochs=20,
batch_size=64,
callbacks=callbacks,
validation_split=0.2)

内建的回调对象

  • ModelCheckpoint: 定时保存模型检查点
  • EarlyStopping: 当评估指数没有改进的时候提前停止
  • TensorBoard: 记录模型的参数值
  • CSVLogger: 将模型的损失值和评价指标保存到CSV文件中

自建回调对象

我们可以通过继承Callback对象实现自定义的回调对象,回调对象可以通过self.model获取相关联的模型,下面是一个实例:

1
2
3
4
5
6
7
class LossHistory(keras.callbacks.Callback):

def on_train_begin(self, logs):
self.losses = []

def on_batch_end(self, batch, logs):
self.losses.append(logs.get('loss'))

保存模型的检查点

当我们的训练集很大的时候,我们需要定期保存模型的检查点,最简单的方式是使用ModelCheckpoint回调:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
model = get_compiled_model()

callbacks = [
keras.callbacks.ModelCheckpoint(
filepath='mymodel_{epoch}',
# Path where to save the model
# The two parameters below mean that we will overwrite
# the current checkpoint if and only if
# the `val_loss` score has improved.
save_best_only=True,
monitor='val_loss',
verbose=1)
]
model.fit(x_train, y_train,
epochs=3,
batch_size=64,
callbacks=callbacks,
validation_split=0.2)

使用学习速率表

深度学习中一个常见的训练模式是递减我们的学习速率,学习速率递减可以实静态的或者是动态的。

将学习速率表传给优化器

我们可以将一个静态的学习速率表通过参数传递给优化器:

1
2
3
4
5
6
7
8
initial_learning_rate = 0.1
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
initial_learning_rate,
decay_steps=100000,
decay_rate=0.96,
staircase=True)

optimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)

内建的学习速率表还有:ExponentialDecay, PiecewiseConstantDecay, PolynomialDecay, and InverseTimeDecay

使用回调实现动态学习速率表

由于优化器不能获取评估指标,所以动态的学习速率表不能通过内建的学习速率表实现。但是,我们可以通过回调来实现动态学习速率表,因为回调能够获取所有的评价指标。实际上,这已经内建在ReduceLROnPlateau 这个回调中了。

可视化损失和评估值

最好的方法是使用TensorBoard,它可以帮助我们实时可视化损失值和评估值。启动Tensorboard的方法如下:

1
tensorboard --logdir=/full_path_to_your_logs

使用Tensorboard回调

最简单的方法就是在fit的时候传入Tensorboard回调:

1
2
tensorboard_cbk = keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs')
model.fit(dataset, epochs=10, callbacks=[tensorboard_cbk])

Tensorboard还有一些其他参数可供选择:

1
2
3
4
5
keras.callbacks.TensorBoard(
log_dir='/full_path_to_your_logs',
histogram_freq=0, # How often to log histogram visualizations
embeddings_freq=0, # How often to log embedding visualizations
update_freq='epoch') # How often to write logs (default: once per epoch)

编写自己的训练和评估方法

使用GradientTape

在GradientTape作用域中调用模型会使你很容易得到相关参数的梯度值。下面是一个MNIST模型:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Get the model.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

训练方法如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
epochs = 3
for epoch in range(epochs):
print('Start of epoch %d' % (epoch,))

# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

# Open a GradientTape to record the operations run
# during the forward pass, which enables autodifferentiation.
with tf.GradientTape() as tape:

# Run the forward pass of the layer.
# The operations that the layer applies
# to its inputs are going to be recorded
# on the GradientTape.
logits = model(x_batch_train, training=True) # Logits for this minibatch

# Compute the loss value for this minibatch.
loss_value = loss_fn(y_batch_train, logits)

# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)

# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))

# Log every 200 batches.
if step % 200 == 0:
print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
print('Seen so far: %s samples' % ((step + 1) * 64))

实现自定义评估值

接着我们添加自定义的评估值,下面是工作流:

  • 在每次迭代前初始化评价指标
  • 在每个批次结束时调用metric.update_state
  • 在需要展示结果的时候调用metric.result
  • 在需要重置的时候(如每次迭代的末尾)调用metric.reset_states

接下来手动实现SparseCategoricalAccuracy 评价指标,下面是模型创建时的代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Get model
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)

自定义训练的迭代如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
epochs = 3
for epoch in range(epochs):
print('Start of epoch %d' % (epoch,))

# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
with tf.GradientTape() as tape:
logits = model(x_batch_train)
loss_value = loss_fn(y_batch_train, logits)
grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))

# Update training metric.
train_acc_metric(y_batch_train, logits)

# Log every 200 batches.
if step % 200 == 0:
print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
print('Seen so far: %s samples' % ((step + 1) * 64))

# Display metrics at the end of each epoch.
train_acc = train_acc_metric.result()
print('Training acc over epoch: %s' % (float(train_acc),))
# Reset training metrics at the end of each epoch
train_acc_metric.reset_states()

# Run a validation loop at the end of each epoch.
for x_batch_val, y_batch_val in val_dataset:
val_logits = model(x_batch_val)
# Update val metrics
val_acc_metric(y_batch_val, val_logits)
val_acc = val_acc_metric.result()
val_acc_metric.reset_states()
print('Validation acc: %s' % (float(val_acc),))

处理额外的损失值

在前面的小节中我们在call方法中调用self.add_loss(values)来正则化损失值,通常来说我们需要将这些额外的损失值也考虑在内,下面是我们实现的其中一个模型:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class ActivityRegularizationLayer(layers.Layer):

def call(self, inputs):
self.add_loss(1e-2 * tf.reduce_sum(inputs))
return inputs

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

当我们调用模型的时候:

1
logits = model(x_train)

在前向传播过程中的损失值会被加到model.losses属性中。

为了将额外的损失值考虑在内,我们需要修改我们自定义的训练循环体中的代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

epochs = 3
for epoch in range(epochs):
print('Start of epoch %d' % (epoch,))

for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
with tf.GradientTape() as tape:
logits = model(x_batch_train)
loss_value = loss_fn(y_batch_train, logits)

# Add extra losses created during this forward pass:
loss_value += sum(model.losses)

grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))

# Log every 200 batches.
if step % 200 == 0:
print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
print('Seen so far: %s samples' % ((step + 1) * 64))