NaN output from Bidirectional LSTM in Keras

by cjbayron   Last Updated October 06, 2018 07:19 AM

I am creating a bi-directional LSTM using tf.keras APIs:

input_layer = tf.reshape(emb_seqs, [const.TRN_BATCH_SIZE, -1, const.VECTOR_SIZE])

lstm_layer = tf.keras.layers.LSTM(units=LSTM_UNITS, return_sequences=True)
outputs = tf.keras.layers.Bidirectional(lstm_layer, input_shape=(const.TRN_BATCH_SIZE,
            None, const.VECTOR_SIZE), merge_mode='ave')(input_layer)
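
For reference, the `-1` passed to `tf.reshape` is inferred from the total number of elements in the tensor. The arithmetic can be sketched in plain Python; the helper name `infer_time_steps` is hypothetical (it is not a TensorFlow function), and the numbers assume a batch size of 1, 6 time steps, and a vector size of 100:

```python
def infer_time_steps(total_elements, batch_size, vector_size):
    """Return the value that -1 resolves to in [batch_size, -1, vector_size]."""
    per_step = batch_size * vector_size
    if per_step <= 0 or total_elements % per_step != 0:
        # A reshape to this target shape would fail for such an element count.
        raise ValueError("element count does not divide evenly into the target shape")
    return total_elements // per_step

# Hypothetical debug case: 1 sequence, 6 word embeddings, 100 values each.
steps = infer_time_steps(total_elements=600, batch_size=1, vector_size=100)
print(steps)
```

With a batch size larger than 1, this only works if every sequence in the batch has the same length (for example, after padding).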

The input is a batch of sequences, where each element of a sequence is the embedding of a word:

emb_seqs =
(sequence 1)  [embedding_of_word_13 , ... , embedding_of_word_56]
...
(sequence 64) [embedding_of_word_134 , ... , embedding_of_word_1]

Now for debugging purposes, I used a batch size of 1, and I fed the following sequence as input to the bi-directional LSTM:

[[[-7.87589252e-02  3.81238684e-02  2.46017668e-02 ...  3.05491805e-01
    5.58324933e-01  2.69501805e-01]
  [-7.59337842e-03  2.51456839e-03  5.05969720e-03 ...  1.22368988e-02
    2.25961581e-02  8.48024990e-03]
  [-1.43043918e-03  6.58374513e-03 -2.07518134e-03 ...  1.81920007e-02
    2.98911836e-02  1.51892835e-02]
  [-1.32404581e-01  5.95593601e-02  3.21036987e-02 ...  4.91197437e-01
    8.74117851e-01  4.24527436e-01]
  [ 1.19205553e-03  2.33205431e-03  3.17622209e-03 ...  6.20844401e-03
    3.24445940e-03  1.87554935e-04]
  [-8.04784670e-02  3.15198787e-02  1.78723559e-02 ...  3.03947985e-01
    5.50726295e-01  2.61890590e-01]]]

Here comes the WEIRD PART. The output is not deterministic. When I feed the SAME sequence as input on different runs, I sometimes get plausible values from the LSTM and sometimes NaN.

Output 1:

[[[ 2.1026004e-02 -4.6658158e-02 -2.1737166e-02 ... -1.6085088e-02
    2.0144433e-02 -3.3039725e-03]
  [ 2.2544309e-04 -7.4539674e-03 -1.1727371e-02 ... -7.7428427e-03
    1.0690016e-02  3.1484920e-04]
  [ 6.9675897e-04  3.6862645e-01 -1.4593299e-02 ... -1.3194597e-02
   -3.6737880e-01 -3.8110828e-01]
  [ 3.7775612e-03 -3.9620440e-02 -2.6467213e-02 ... -4.1612733e-02
    2.6258381e-02 -7.2092814e-03]
  [ 1.8386112e-03 -1.2701134e-02 -8.0263400e-03 ... -1.1949232e-02
    1.1301106e-02 -2.6325984e-03]
  [ 1.2903613e-03 -2.1743327e-02 -1.2070607e-02 ... -2.0296849e-02
    1.4726696e-02 -3.5181935e-03]]]

Output 2:

[[[nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]
  [nan nan nan ... nan nan nan]]]
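
One way to narrow down intermittent NaNs like these is to scan both the fed input and the fetched output on every run. The following is only a minimal sketch using the standard library; it assumes the arrays have first been converted to nested lists (for example with NumPy's `ndarray.tolist()`), and `count_non_finite` is a hypothetical helper name:

```python
import math

def count_non_finite(values):
    """Recursively count NaN and +/-inf entries in a nested list of floats."""
    if isinstance(values, (list, tuple)):
        return sum(count_non_finite(v) for v in values)
    return 0 if math.isfinite(values) else 1

# Made-up sample data shaped like [batch, time, features].
good = [[[0.021, -0.046], [0.0002, -0.0074]]]
bad = [[[float("nan"), float("nan")], [float("nan"), 0.5]]]

print(count_non_finite(good), count_non_finite(bad))
```

If the count for the input is zero on a run where the count for the output is not, that would suggest the NaNs arise inside the graph (weights or operations) rather than in the data being fed.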

Has anyone encountered the same issue? Is there something wrong with the code?


  1. I used LSTM_UNITS = 512 and VECTOR_SIZE = 100
  2. I have already tried re-scaling the embedding vector values by a factor of 10, thinking that the initial values were too small, but I still occasionally get NaN values after doing this.
