Neural network for sequence prediction

Zatnikitelman · Jun 27, 2019

Hey guys, I know I've been absent from Orbiter itself for a long time, but I'm working on that. But I still poke around here and hope to get back into flights and dev work one day. So with that, I know neural networks have been a topic around these parts a number of times so I'm hoping someone can point me in the right direction.

I have two problems, one is a simple time-series prediction, I get a list of numbers (time interval is irrelevant), and I'd like to predict the next one in the series. Ideally, I'd get some kind of probability back rather than just a single number. From the research I've done, Long Short-Term Memory seems to be the way to go, but I haven't found great examples that I can go off of. I'm most experienced in .NET, but I was poking around in JavaScript the other day trying (and largely failing) to implement a TensorFlow solution. The example here would be my church's kid's program's attendance. It'd be nice to help schedule volunteers if we had a better idea what the next week's attendance will be.

The second problem is a time-series pair problem. I already have a something, and I want to predict the most likely number (again, a probability) associated to that number. The real world example would be after-church events. Let's say problem #1 predicts 50 kids this week. Given 50 kids, how many are staying after church?

So between the two problems, I already have data that looks like this (but lots more):

Week, Attendance, After-Church attendance
1, 51, 23
2, 48, 14
3, 65, 25
4, 56, 28

The output I'm going after I think would hopefully be something like Attendance: 50-55 80%, 55-60 70% ... and same for after-church attendance.

Can anyone point me in the direction of good resources online, or a good book? Looking up "neural network" on Amazon or Barnes and Noble yields lots of results and I'd rather not sink money into something blind, and I have about 18 tabs open right now from different search results so the internet isn't great either. This isn't an assignment per-se, we already do some rough statistics, like averages in our spreadsheets, but I think it'd be fun to do more, plus, it'd be a good intro into Neural Networks for me!

Thanks!

Urwumpe · Jun 27, 2019

I just have an older German textbook around about the basics, so no really good recommendations for reading here.

Do you want results or do you want to learn neural networks? In the first case, you shoul focus on modelling and just use a library.

Zatnikitelman · Jun 28, 2019

Using a library to get results is fine and really what I've been searching for the most.
https://www.codeproject.com/Articles/1265477/TensorFlow-js-Predicting-Time-Series-Using-Recurre This is the tutorial that I think has been the closest to what I'm looking for, but I still haven't made much progress with it. Ideally, I'd like something in .NET. Eventually, sure I'd like to get under the hood, but for now, just getting input, training, and output will be nice.

Urwumpe · Jun 28, 2019

Well, the biggest problem is how to model the input. You could use just one value of course for each number - but what does that number describe then?

For example, you can be sure, that on special holidays, the attendance will be vastly higher - how could you teach the network what a prediction for a holiday is? You would need to describe not just what the past values are, but also which of of those samples is a holiday, maybe even for better predictions, which kind of holiday (EG here, christmas would be overwhelming with attendance, but only very few people would stay afterwards).

Or capacity - could you get more attendance than people could enter? Sure not.

And of couse, as you see in the example, they did not just use one input neuron for the series, but 100, each getting one sample of the time series.

Zatnikitelman · Jun 28, 2019

Yea, I know there will be odd spikes, but I have to start somewhere and I figure these two sequence prediction problems are good places for me to start. I guess to make it simple, for now I'd just like to take an arbitrarily long sequence, train a neural net with it, and predict the next number in the sequence. I'd also like to compute a pairing so given {30,20},{31,25}, {41,15}, ..., {33,15}, {19,X} what is X likely to be?

Zatnikitelman · Jun 30, 2019

Well, I've found a tutorial for LSTMs using the Keras library in Python. It seems fairly good and straightforward, but it doesn't delve too deeply into a lot of the "why" of each step. https://stackabuse.com/time-series-analysis-with-lstm-using-pythons-keras-library/ My graph is also not matching theirs. The prediction might not be expected to match exactly, but the actual data should.

RisingFury · Jun 30, 2019

LSTM networks work by feeding in some input then they give you a predicted output. This works sort of well for text, speech or music, where you put in say 10 notes, it pumps out the 11th, then you put in the 9 old ones, dropping the first one and appending the note it generated. Same with words or audio samples.

If you want to use LSTM and don't want to spend months learning the SGD algorithm to produce a working network, I'd suggest downloading some code by Andrej Karpathy.

This project of your is most likely going to fail horribly, though. Neural networks are only as good as the data you feed into them. The CNN I wrote myself when learning SGD got about 8000 images that were 64x64. I trained different configurations of the network and filtered the data in different ways for a total of 60 training runs, then picked out the 11 best networks and made them vote on the outcome. When making a distinction between a nut or a bolt, the neural network ensemble got it right pretty much every time. I have not seen it get anything wrong.

But you simply don't have enough data for the network to train effectively. Generally you need hours of audio, thousands of pages of text or millions of lines of data in order for LSTMs to pick out a pattern from that.

If you want to learn the very basics of neural networks, then this online book by Dr. Michael Nielsen is REALLY good (and it's free):

http://neuralnetworksanddeeplearning.com/

A series of videos illustrating and explaining it, by 3Blue1Brown:

---------- Post added at 10:21 ---------- Previous post was at 08:27 ----------

Ok, so I should probably go a bit more into how neural networks work and what they do...

Neural networks are used to approximate mathematical functions.

We all know a few examples of mathematical functions:
f(x) = a*x^2 + b*X + c
f(t) = a*sin(w*t)
...

While some functions are easy to express in such mathematical terms, others are not. Driving a car is a function.

The inputs are: Road direction, traffic signs, presence of other commuters, weather, state of the car, current velocity, level of fatigue, date and time,...

The outputs are:
Steering, accelerator, brake, clutch, transmission, signal lights,...

The function can then be written as
f(inputs) = outputs
but it's damn near impossible to come up with a mathematical model to approximate this function.

This is what neural networks are good at. When told the inputs and the desired outputs, they find the function in between.

Here's your problem: You're giving the network input in the form of week number and then giving it the desired output in the form of attendance. But church attendance is not only a function of week number. Everything comes into play: Date, weather, vacations, state of economy, state of religiosity,...

You only gave the network one input and assumed that this is already enough. You're forcing the network to find a mathematical model that connects the inputs and the outputs you gave it.

The network will find them. And that right there is your next problem. It's called overfitting.

Let's say we have some data in the form of date and temperature:
March 1st: 13°c
March 2nd: 15°C
March 3rd: 14°C
March 4th: 15°C
March 5th: 16°C
March 6th: 16°C

We see a fairly steady rise in temperature. If we fit a model T(t) = k*t + T0 to this, we see that on average, the temperature changes at a rate of about 0.5°C per day. This model might describe March 7th and 8th quite well, predicting temperatures around 17°C, but by mid August, it'd predict temperatures in the 80 to 90°C.

You'll also notice that T(t) = k*t + T0 is a linear model and does not describe the given data accurately. You can find a high order polynomial that does, though. You'd need a polynomial of 6th order. That model would fit the data perfectly, but would be even more wildly wrong by August. This tendency for the networks to find really complicated models for the data is called overfitting, because they literally try too hard

You only have a limited data set, so in order for you to get any kind of training out of your network, you'd need to train it over and over and over on the same dataset, increasing the tendency for your network to overfit.

---------- Post added at 11:35 ---------- Previous post was at 10:21 ----------

Zatnikitelman said:
Well, I've found a tutorial for LSTMs using the Keras library in Python.

Oh, good God no!!! No no no! There are things that Python does ok, but number crunching is not it! I had to rewrite my convolutional neural network from Python to C++ because Python was literally 100 times slower. You need to embrace hardcore C/C++ for this.

CNNs can be parallelised, LSTMs cannot be. You'll have to train on the CPU.

Zatnikitelman · Jul 5, 2019

I finally have something to play with: https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
That site breaks it down enough that I actually know what I'm doing now, and I can play around and see what happens when I do things. I know Python isn't fast, even in the simple sequences I'm using right now I can see the need for speed so to speak. But at least it's a start! I know there are a lot of caveats, you've done a great job breaking those down RisingFury, and I appreciate the dose of reality. I'm still going to try it, see what happens. Even if it doesn't end up achieving my specific goal, I'll have learned some good info along the way.

Thorsten · Jul 6, 2019

I have two problems, one is a simple time-series prediction, I get a list of numbers (time interval is irrelevant), and I'd like to predict the next one in the series.

Chiming with what RisingFury has written - I kind of don't see how this is a time series where a number can be predicted from the previous numbers in the first place.

I mean, how many people plan their church visit based on how many people were in church the last week?

There's plenty of outside (unknown) factors at play, so the actual series you have is very likely a random fluctuation around a average - which you can model by knowing the average and the standard deviation.

The problem of a neural net is that it will fit anything and produce a result - but there's no reason to assume that result would mean anything.

It would be far more rewarding to apply the technique to something which can actually be predicted and learned - just my 2 cents.

Enjo · Jul 6, 2019

Thorsten said:
There's plenty of outside (unknown) factors at play, so the actual series you have is very likely a random fluctuation around a average - which you can model by knowing the average and the standard deviation.

Agreed. Don't underestimate the power of simple models, when you don't have enough data for a complicated model like NNet to succeed i.e. be able to generalize instead of overfitting.

This looks like a good read for you currently:
https://towardsdatascience.com/overfitting-vs-underfitting-a-complete-example-d05dd7e19765

---------- Post added at 02:49 PM ---------- Previous post was at 09:49 AM ----------

RisingFury said:
Oh, good God no!!! No no no! There are things that Python does ok, but number crunching is not it! I had to rewrite my convolutional neural network from Python to C++ because Python was literally 100 times slower. You need to embrace hardcore C/C++ for this.

CNNs can be parallelised, LSTMs cannot be. You'll have to train on the CPU.

So what C++ libraries would you use for a GPU-parallelized CNN?

Zatnikitelman · Jul 9, 2019

Or just any C++ libraries that are faster than Python? Preferably as easy as Keras? Really starting to feel the performance problems now :uhh:

RisingFury · Jul 9, 2019

Zatnikitelman said:
I know there are a lot of caveats, you've done a great job breaking those down RisingFury, and I appreciate the dose of reality. I'm still going to try it, see what happens. Even if it doesn't end up achieving my specific goal, I'll have learned some good info along the way.

I hope you try it. There's a lot of valuable knowledge to be learned from it. Though if you're in it for learning, you should download some fairly large and good data sets. A classic data set to learn on is MNIST. 60 000 images that are 24x24 pixels that contain hand written digits 0 - 9. They're all centered and cleaned up, so if your data is "clean", the knowledge you gain from it is actually knowledge of neural networks and not that of manipulating data to fit them.

Thorsten said:
Chiming with what RisingFury has written - I kind of don't see how this is a time series where a number can be predicted from the previous numbers in the first place.

I mean, how many people plan their church visit based on how many people were in church the last week?

People don't plan their visits based on how many people there were last week, but there might be a pattern in the numbers that comes as a result of hidden behavior. A neural network could pick up on it.

Thorsten said:
It would be far more rewarding to apply the technique to something which can actually be predicted and learned - just my 2 cents.

That's the thing - humans are really terrible at figuring out "what can be learned", because we don't see patterns in anything more than 3 dimensional data - and even that's tough. Neural networks can deal with thousands of dimensions and pick out a pattern from there.

Enjo said:
So what C++ libraries would you use for a GPU-parallelized CNN?

I wouldn't!

Invest in knowledge of parallel computing. You don't have to do it straight away. I can already tell you that C++ is hundreds of times faster than Python and that a GPU can speed up certain types of neural networks about tens of times.

Thorsten · Jul 9, 2019

Neural networks can deal with thousands of dimensions and pick out a pattern from there.

Disclaimer: I worked on Neural Networks waaay back during a holiday workstudy at a plasma physics institute - they were just 'the new thing' back then (I seem to remember Windows NT had just appeared and was installed on the PC I was using for the sim) - so we have a history.

Having said that - they're not magic. They're a particular form of a fit function with parameters adjusted to a data set. So they don't really 'pick patterns' - they fit parameters to data. That's all really - take the buzzwords out, and its mathematically the sibling of a spline function. Or Fourier coefficients. And like pretty much every fit function, they come with their own pitfalls.

Enjo · Jul 14, 2019

RisingFury said:
I wouldn't!

Invest in knowledge of parallel computing. You don't have to do it straight away. I can already tell you that C++ is hundreds of times faster than Python and that a GPU can speed up certain types of neural networks about tens of times.

I guess everybody is on their own.

Urwumpe · Jul 15, 2019

Enjo said:
I guess everybody is on their own.

I am sure there is a linear algebra package that could solve a huge sparse matrix on a GPU in a short time... but I don't remember which one it was that I saw some years ago.

What about this one?

http://viennacl.sourceforge.net/viennacl-about.html

What has a BLAS level should not be bad.

Zatnikitelman · Jul 16, 2019

Um, I hope it's ok to post here, things were going ok until I realized I needed to normalize my data and I'm a bit stuck. I created a stackoverflow question that sums up my issue. It's probably something stupid I'm just missing. https://stackoverflow.com/questions/57023871/how-to-normalize-test-data-for-use-in-keras

Enjo · Jul 31, 2019

Urwumpe said:
I am sure there is a linear algebra package that could solve a huge sparse matrix on a GPU in a short time... but I don't remember which one it was that I saw some years ago.

What about this one?

http://viennacl.sourceforge.net/viennacl-about.html

What has a BLAS level should not be bad.

There's actually something high level already available:
http://blog.dlib.net/2016/06/a-clean-c11-deep-learning-api.html

dlib said:
Dlib 19.0 is out and it has a lot of new features, like new elastic net and quadratic program solvers. But the feature I'm most excited about is the new deep learning API. There are a lot of existing deep learning frameworks, but none of them have clean C++ APIs. You have to use them through a language like Python or Lua, which is fine in and of itself. But if you are a professional software engineer working on embedded computer vision projects you are probably working in C++, and using those tools in these kinds of applications can be frustrating.

So if you use C++ to do computer vision work then dlib's deep learning framework is for you. It makes heavy use of C++11 features, allowing it to expose a very clean and lightweight API

although I find this scary anyway:

dlib said:
I've also included a pretrained ResNet34A model and this example shows how to use it to classify images. This pretrained model has a top5 error of 7.572% on the 2012 imagenet validation dataset, which is slightly better than the results reported in the original paper Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun. Training this model took about two weeks while running on a single Titan X GPU.

Enjo · Aug 24, 2019

Another good example of a high level C++ or bare C library would be ArrayFire. Have a look at the examples here:
https://github.com/arrayfire/arrayfire/tree/master/examples/machine_learning

ArrayFire is a general-purpose library that simplifies the process of developing software that targets parallel and massively-parallel architectures including CPUs, GPUs, and other hardware acceleration devices.

(...)

ArrayFire provides software developers with a high-level abstraction of data which resides on the accelerator, the af::array object. Developers write code which performs operations on ArrayFire arrays which, in turn, are automatically translated into near-optimal kernels that execute on the computational device.

ArrayFire is successfully used on devices ranging from low-power mobile phones to high-power GPU-enabled supercomputers. ArrayFire runs on CPUs from all major vendors (Intel, AMD, ARM), GPUs from the prominent manufacturers (NVIDIA, AMD, and Qualcomm), as well as a variety of other accelerator devices on Windows, Mac, and Linux.

Notice the bold text. The majority of time lost due to GPU usage is shifting data back and forth.

Orbiter 2024 has been released!

Neural network for sequence prediction

Zatnikitelman

Addon Developer

Urwumpe

Not funny anymore

Zatnikitelman

Addon Developer

Urwumpe

Not funny anymore

Zatnikitelman

Addon Developer

Zatnikitelman

Addon Developer

RisingFury

OBSP developer

Zatnikitelman

Addon Developer

Thorsten

Active member

Enjo

Mostly harmless

Zatnikitelman

Addon Developer

RisingFury

OBSP developer

Thorsten

Active member

Enjo

Mostly harmless

Urwumpe

Not funny anymore

Zatnikitelman

Addon Developer

Enjo

Mostly harmless

Enjo

Mostly harmless

Similar threads