Visualize Machine Learning metrics with Tensorflow and Tensorboard

or how to save a Machine Learning Engineer’s life

Armin Hadzalic
Dec 26, 2020

An important step in understanding machine learning processes and evaluations is the interpretation of various metrics, such as the loss and accuracy of a model on its training and validation data.

It can be very time-consuming to read individual results numerically, particularly if the model has been training for many epochs.

Tensorboard can offer a remedy for this. It is a convenient and graphical tool for:

  • Visually tracking model performance in real time
  • Reviewing historical model performances
  • Comparing the performance of different model architectures and hyperparameter settings trained on the same data

In this post I want to show how Tensorboard enables you to, epoch over epoch, visually track your model’s cost (loss) and accuracy (acc) across both your training data and your validation (val) data.

Simple computational graph in Tensorboard

But let’s start with the basic framework Tensorboard needs in order to visualize computational graphs: the Tensorflow code.
Every Tensorflow program written in the graph-based (V1) style consists of two essential steps:

  • build a graph that represents the data flow of the computation
  • run a session that executes the computation from the graph and provides the corresponding resources (CPU, GPU, TPU).

What does such a graph look like?

A simple example:

f(x,y) = x * y + 10

The computational graph for the above function looks like this in Tensorboard:

simple example for the graph f(x,y) = x * y + 10

One can take a closer look at the elements by double-clicking and opening them:

more details within the objects

Our graph example can be created, e.g., in a Jupyter notebook with the following code. I am using Tensorflow V2, which is why the compatibility namespace is used. If you are using V1, just omit the "compat.v1." part:

import tensorflow as tf
import datetime

# Time-stamped log directory, so that every run gets its own folder.
LOGDIR = '../logger/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S') + '-computational_graph'
with tf.compat.v1.Session() as sess:

    # Build a graph.
    x = tf.compat.v1.Variable(5, name='x')
    y = tf.compat.v1.Variable(10, name='y')
    ten = tf.constant(10, name='ten')

    # f(x, y) = x * y + 10
    f = tf.add(tf.multiply(x, y, 'multiply'), ten)

Before logging starts, a session and the log directory must be defined. In addition, the variables must be initialized.

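    # Write the graph of the current session to the log directory.
    writer = tf.compat.v1.summary.FileWriter(LOGDIR, sess.graph)

    # Op that initializes the variables x and y once it is run.
    init = tf.compat.v1.global_variables_initializer()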

Now the session is started and the computational graph is stored in the LOGDIR directory.

    sess.run(init)
    result = sess.run(f)
    print(result) # --> 60
    writer.close()

The dashboard is started with tensorboard --logdir=LOGDIR in any console or terminal program, where LOGDIR stands for the path of the log directory. Afterwards, localhost can be opened in the browser on port 6006 and the graph can be displayed.

Tensorboard’s GRAPHS tab

Graphs for accuracy and loss in Tensorboard

Let’s move on to a workflow in which the data has already been prepared in a previous step.
The data may have been extracted from a CSV file, for example. This does not matter for what follows, so the PreProcessing part can be skipped here. It is only important that the data already contains the features Xn and the corresponding labels Yn.

The main program could look like this:

# Main program
x, y = PreProcessing(rawfile)
ExecuteNeuralNetwork(x, y)

Let’s dive into the details of the ExecuteNeuralNetwork method:

code example for Tensorboard logdir by author

Here we see some important processes and hyperparameters that are essential for training. First, the data is separated into a training and a test portion (line 8). This is done with the train_test_split function from the scikit-learn library and its test_size parameter. The function shuffles the data by default; the random_state parameter, set to e.g. 42, fixes the seed of that shuffle so that the split is reproducible. Only with shuffle=False would the list be used as it is, i.e. the first 80% would be processed as training data and the remaining 20% as test data. This can sometimes have an unfavourable effect on the result.

In our example, a neural network is used as the underlying architecture and algorithm (lines 15–20). Of course, other methods such as RandomForest or the DecisionTreeClassifier are also conceivable. Here, one must proceed according to the objectives and make the choice very carefully. However, that is not the subject of this post; you can find examples for the mentioned algorithms on my GitHub.

The only important thing for our consideration is that the results of the training end up in our Tensorboard object and its log directory, which is passed via the callbacks parameter (lines 29 and 32). It helps to time-stamp all runs and to concatenate the timestamp with hyperparameter strings. This makes it possible to see at a glance which parameters were used in the model and to display them more easily in the dashboard. In addition, creating one subfolder per run in the log directory makes earlier runs available for historical display in the Tensorboard dashboard.
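The naming scheme described above can be sketched in a few lines. This is only an illustration: the helper make_run_logdir and the hyperparameter names batch and lr are hypothetical and do not come from the original listing.

```python
import datetime
import os


def make_run_logdir(base_dir, **hparams):
    """Build a run folder such as base_dir/20201226-101500-batch32-lr0.001."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    # Sort the hyperparameters so the folder name does not depend on call order.
    parts = [stamp] + [f"{name}{value}" for name, value in sorted(hparams.items())]
    return os.path.join(base_dir, "-".join(parts))


# Hypothetical hyperparameters of one training run.
print(make_run_logdir("../logger", batch=32, lr=0.001))
```

Because every run ends up in its own subfolder of the same base directory, pointing Tensorboard at the base directory lists all runs side by side.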

additional metrics if validation_data is used

In order to also get the metrics for the validation data, one must set the validation_data parameter in line 32 accordingly. They will appear as separate fields in the Tensorboard dashboard.
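The listing itself is not reproduced in this text, so here is a minimal sketch of how such a method could look, assuming Keras (tf.keras) for the network and scikit-learn for the split. The function name execute_neural_network, the log_dir parameter, the layer sizes and the binary 0/1 labels are assumptions made for illustration, and the line numbers mentioned in the text refer to the original listing, not to this sketch.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split


def execute_neural_network(x, y, log_dir, epochs=20, batch_size=32):
    """Train a small binary classifier and log its metrics for Tensorboard."""
    # Hold out 20% of the samples; random_state only fixes the shuffle seed.
    x_train, x_val, y_train, y_val = train_test_split(
        x, y, test_size=0.2, random_state=42)

    # Assumed architecture: one hidden layer, sigmoid output for 0/1 labels.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # The callback writes loss and accuracy per epoch into log_dir.
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

    # validation_data adds the validation metrics to the same logs.
    history = model.fit(
        x_train, y_train,
        epochs=epochs, batch_size=batch_size,
        validation_data=(x_val, y_val),
        callbacks=[tensorboard],
        verbose=0)
    return history
```

The returned History object holds the same per-epoch values that the dashboard plots; with this setup its history dictionary contains the keys loss, accuracy, val_loss and val_accuracy.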

The results of the different settings can even be observed during the training run. To do this, simply refresh the web browser from time to time. You can also filter the results by selecting the relevant runs on the middle left side of the dashboard.

final results with two selected training runs (each over 20 epochs)

The architecture of the neural network or the underlying algorithm can be viewed in the GRAPHS tab, just as we did with the simple graph at the beginning of this article:

structure of algorithms can be shown in the GRAPHS-tab

Finally, let’s look at the explanation of the two metrics we achieved with our training:

Accuracy is a method of measuring the performance of a classification model. It is usually expressed as a percentage and is easier to interpret than loss. Basically, it is the share of predictions where the predicted value is equal to the actual value. For a single sample, the outcome is binary (true/false).

As already mentioned above, accuracy can be plotted and monitored in Tensorboard at runtime of the training phase, although the value is often associated with the overall or final model accuracy.
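As a small worked example of this definition, accuracy can be computed by hand in plain Python. The labels and predictions below are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


labels = [1, 0, 1, 1, 0]       # made-up true classes
predictions = [1, 0, 0, 1, 0]  # made-up model output, one mistake
print(accuracy(labels, predictions))  # 4 of 5 correct -> 0.8
```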

A loss function, also called a cost function, takes into account the probabilities of a prediction. It is based on how much the prediction deviates from the true value. This gives us a more sophisticated view of how well the model is performing.

Unlike accuracy, loss is not a percentage; it is a summation (usually reported as an average) of the errors made for each sample in the training or validation sets. Loss can be used in the training process to find the "best" parameter values for the model (e.g. weights in a neural network). During the training process, the goal is to minimize this value and obtain a falling curve.

The most common loss functions are the logarithmic loss and the cross-entropy loss (which give the same result when the predictions are probabilities between 0 and 1), as well as the mean squared error and the likelihood loss.
Unlike accuracy, loss can be used in both classification and regression problems.
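To make two of these loss functions concrete, here is a plain-Python sketch of the binary cross-entropy (log loss) for classification and the mean squared error for regression. Both are averaged over the samples here, which is a common convention; the input values are made up for illustration.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean log loss for 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences, a common regression loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(binary_cross_entropy([1, 0], [0.9, 0.2]))
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # (1 + 4) / 2 -> 2.5
```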

Most often, one observes that accuracy increases as loss decreases, but this is not always the case. Accuracy and loss have different definitions and measure different things. At first glance, they often appear to be inversely proportional, but there is no fixed mathematical relationship between these two metrics. In our example, the two runs are almost identical, at least in terms of accuracy. In this respect, one cannot really speak of an advantage of one hyperparameter setting over the other. Of course, this can look quite different for other data sets.
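The point that accuracy and loss measure different things can be illustrated with a small sketch. The two sets of predicted probabilities below are made up; they stand for two hypothetical models that classify every sample correctly at a threshold of 0.5 but differ in how confident they are.

```python
import math


def log_loss(y_true, p_pred):
    """Mean binary cross-entropy for probabilities strictly between 0 and 1."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, p_pred)) / len(y_true)


def accuracy(y_true, p_pred, threshold=0.5):
    """Share of samples whose thresholded probability matches the label."""
    hits = sum(1 for t, p in zip(y_true, p_pred) if (p >= threshold) == bool(t))
    return hits / len(y_true)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # hypothetical model A
hesitant = [0.6, 0.4, 0.55, 0.45]  # hypothetical model B

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(y_true, probs), round(log_loss(y_true, probs), 3))
```

Both models reach the same accuracy, while the loss of the hesitant one is clearly higher. This is why two runs can look identical in the accuracy plot and still differ in the loss plot.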

All in all, I can really say that Tensorboard and its various overviews greatly simplify the evaluation of Machine Learning algorithms and their metrics.

I would be pleased if you could tell me your methods for displaying ML metrics. What advantages and disadvantages have you found?

I hope this post gave you a better understanding of how my approach with Tensorboard works. Have fun exploring further.

See also more details on LinkedIn
