In this post we take a look at the course projects from the Neural Networks course in the autumn semester of 2017/2018. The course was given by Raul Vicente and assisted by Tambet Matiisen and Ardi Tampuu. The projects listed here are not ordered by quality; instead, we have tried to order them in a way that demonstrates their diversity. Indeed, the topics were amazingly diverse, and as teachers we also learned a lot from these works.
Music generation from MIDI datasets
Moritz Hilscher, Novin Shahroudi
(Report)
The goal of this project was to generate music from scratch. The training data consisted of works by Bach and Mozart, written down in MIDI format. After a lot of preprocessing (for example, taking into account the tempo of the music) and data augmentation, the authors trained GRU-based RNNs to generate music based on a short “burn-in” sample. The generated tunes played during the project presentation were unexpectedly harmonious and music-like (though perhaps Bach and Mozart experts would disagree). The authors also validated that the generated music was not just memorized training samples. You can listen to one example here:
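To give a flavour of how such a generator produces output, here is a minimal sketch of burn-in priming followed by temperature sampling, with a hypothetical toy function standing in for the trained GRU (the real model, note vocabulary and weights are the authors' and are not shown here):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Turn logits into a probability distribution; lower temperature -> more conservative sampling."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

def generate(model_step, burn_in, length, temperature=1.0, seed=0):
    """Prime the model on a burn-in sequence, then sample `length` new tokens."""
    rng = np.random.default_rng(seed)
    state = None
    for token in burn_in:                  # burn-in: outputs discarded, state kept
        logits, state = model_step(token, state)
    sequence = []
    for _ in range(length):
        token = int(rng.choice(len(logits), p=softmax(logits, temperature)))
        sequence.append(token)
        logits, state = model_step(token, state)
    return sequence

# Toy stand-in for a trained recurrent step: strongly prefers token (previous + 1) mod 4.
def toy_step(token, state):
    logits = np.full(4, -5.0)
    logits[(token + 1) % 4] = 5.0
    return logits, state

print(generate(toy_step, burn_in=[0, 1], length=4, temperature=0.5))
```

The temperature knob is the usual way such generators trade off between repetitive-but-safe and varied-but-noisy output.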
You can find more samples on the project home page.
Toy Car Racing via Imitation Learning
Martin Liivak, Meri Liis Treimann, Markus Loide, Sebastian Värv
(Report)
Training a neural network to drive a car is one of the hottest topics in AI. Creating self-driving cars equipped with high-tech sensors costs a fortune, money that our students did not want to spend. So they cooperated with RC Snail and got access to remote-controlled toy cars equipped with simple cameras. They used behavioral cloning: they recorded the video stream and the commands the car received while it was driven by a human driver, and then trained a neural network to imitate the driver's commands based solely on the video feed from the camera. With this approach they created a network that can guide the toy car through a whole lap on the training circuit, albeit with multiple crashes against the walls. Both the team and the course instructors would like to thank Rainer Paat from RC Snail for giving the students this excellent opportunity.
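Behavioral cloning reduces driving to supervised learning: camera frames are the inputs, the human driver's commands are the labels. A deliberately tiny sketch with a linear softmax policy standing in for the team's actual network (toy data and assumed names, not their code):

```python
import numpy as np

def train_clone(frames, commands, n_actions, lr=0.5, epochs=200):
    """Fit a linear softmax policy mapping a (flattened) frame to the driver's command."""
    X = frames.reshape(len(frames), -1)
    W = np.zeros((X.shape[1], n_actions))
    b = np.zeros(n_actions)
    Y = np.eye(n_actions)[commands]          # one-hot command labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)    # predicted action probabilities
        grad = (P - Y) / len(X)              # cross-entropy gradient
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def act(W, b, frame):
    """The cloned policy: pick the command with the highest score."""
    return int(np.argmax(frame.reshape(-1) @ W + b))

# Toy "frames": bright pixel 0 -> the driver steered left (0), bright pixel 1 -> right (1).
frames = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]])
commands = np.array([0, 1, 0, 1])
W, b = train_clone(frames, commands, n_actions=2)
```

The real project replaces the linear map with a convolutional network, but the training signal, imitating recorded commands frame by frame, is the same.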
Generating poetry using deep neural networks
Tanel Kiis, Markus Kängsepp
(Report)
In this work the students set out to generate poetry. Using online databases of poems (some of which are not of very high artistic quality) and character- and sub-word-level recurrent neural networks, the authors managed to generate quite a few poem-like pieces of text. The models learned to switch lines and to end the poem, but they did not figure out the 5-7-5 rule of haikus. See these examples (handpicked from among many meaningless “poems”):

Haiku: daybreak a cold snow falls into the sound of the bay

Poem: Fatherful and broken throne. The sight of the conqueror seems; The spring is stiff and strong, And the small feet of things are still enlightened, And these his ears of soul in the sun ‘sing. The wind came on to see, he looks back, In the hills Where the boys can’t have a stream of space
Learning 2048 using AlphaGo Zero
Karl Allik, Raul-Martin Rebane, Robert Sepp, Lembit Valgma
(Report) (Code)
The team used the well-known AlphaGo Zero algorithm to tackle the game 2048. They struggled to get the self-play algorithm working, but successfully trained a neural network policy to imitate a hard-coded algorithm. The best networks managed to reach the 2048 tile 90% of the time, with an average score of 31,000.
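Both the hard-coded teacher and the self-play experiments need a 2048 simulator. Its core, sliding and merging one row, can be sketched as follows (a hypothetical helper, not the team's code):

```python
def merge_left(row):
    """Slide one 2048 row to the left, merging equal neighbours once each,
    and return the new row plus the score gained by the merges."""
    tiles = [t for t in row if t != 0]       # drop empty cells
    out, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)
            score += tiles[i] * 2
            i += 2                            # both tiles consumed by the merge
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out)), score
```

The other three move directions are just this function applied to reversed rows or transposed columns, which is why most 2048 engines implement only the left move.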
You can see the algorithm playing here (click Auto-run).
ICLR Reproducibility Challenge: Synthesizing Robust Adversarial Examples
Prabhant Singh
(Report) (Code)
The International Conference on Learning Representations (ICLR) invites people to try to reproduce the results of submitted articles. This is a noble goal, helping to ensure that the science in them is solid, reproducible, and generalizes beyond the datasets the authors used. Robust adversarial examples are images (or 3D objects) that confuse object-recognition networks. For example, there is a 3D-printed turtle with a specific colour and texture that networks consistently classify as a rifle, no matter the angle or lighting conditions of the photo. Prabhant took up the challenge of generating similar adversarial examples. In particular, he created images that confuse object-recognition networks regardless of rotation, zoom or crop (see the rotated cat classified as guacamole in the side figure). He also tested the transferability of adversarial examples, i.e. whether adversarial images created for ResNet50 are also adversarial against VGG19. One interesting take-away from the project was that simpler networks like VGG16 were more resilient to transferred adversarial attacks.
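The reproduced paper builds robustness by optimizing over an expectation of transformations; the underlying idea of a gradient-based adversarial perturbation can be illustrated far more simply with a fast-gradient-sign step against a toy linear classifier (this is the basic, non-robust variant, and not Prabhant's code):

```python
import numpy as np

def fgsm_linear(w, b, x, epsilon):
    """Fast-gradient-sign step against a linear scorer f(x) = w.x + b.
    For a linear model the input-gradient of the score is just w, so moving
    every pixel by -epsilon * sign(w) maximally lowers the score within an
    L-infinity budget of epsilon."""
    return x - epsilon * np.sign(w)

# Toy "classifier": a positive score means class "cat".
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.5, -0.2, 0.4])               # clean input, scored as "cat"
x_adv = fgsm_linear(w, b, x, epsilon=0.6)    # small perturbation flips the label
score = lambda v: float(w @ v + b)
```

Robust adversarial examples repeat such a step while averaging gradients over random rotations, zooms and crops, so the perturbation survives all of them.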
First impression based personality analysis
Jelena Gorbova
(Report)
Psychologists claim it is possible to describe people's personalities using just five axes (extroversion, neuroticism, openness, agreeableness, and conscientiousness). It sounds somewhat dystopian to judge people based on these five numbers, but that is exactly what HR departments of companies wish to do. The goal of this work was to investigate whether the scores of these five personality traits can be estimated from a 10-second video clip of a person. In a similar way, “employability” could be predicted. Using key-frame selection, face detection and a pre-trained VGG-Face model, Jelena achieved results comparable with the best-performing solutions in an online challenge.
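Key-frame selection can be as simple as keeping the frames that change the most between consecutive time steps. A minimal sketch of that idea (the project's actual selection criterion may well differ):

```python
import numpy as np

def select_key_frames(frames, k):
    """Pick the k frames that differ most from their predecessor, a simple
    stand-in for key-frame selection in a video pipeline."""
    frames = np.asarray(frames, dtype=float)
    # Total absolute pixel change between frame i and frame i+1.
    diffs = np.abs(np.diff(frames, axis=0)).reshape(len(frames) - 1, -1).sum(axis=1)
    # Frame i+1 is credited with the change from frame i; frame 0 is never picked here.
    picked = np.argsort(diffs)[::-1][:k] + 1
    return sorted(int(i) for i in picked)

# Four tiny 2x2 "frames": the big visual change happens at frame 2.
frames = [[[0, 0], [0, 0]],
          [[0, 0], [0, 1]],
          [[9, 9], [9, 9]],
          [[9, 9], [9, 8]]]
```

The selected frames are then cropped to the detected face and fed to the pre-trained feature extractor.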
Towards Emergence of Grid Cells by Deep Reinforcement Learning
Aqeel Labash, Daniel Majoral, Martin Valgur
(Report)
The brain is the smartest machine there is. If we have an idea of how the brain performs some computation, we should aim to make our computers do the same. In visual processing it has been shown that the brain and AI use somewhat similar approaches to recognizing objects. We also know how the brain encodes location, so it is natural to ask whether AI agents (trained by deep RL) have similar spatial representations to ours. In this work the students investigated whether place-cell- or grid-cell-like neurons appear in artificial neural networks trained to solve a navigation task. Unfortunately they could not confirm the existence of grid cells (the obtained place fields are shown in the figure), but the effort was earnest and the search continues.
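The place fields mentioned above are usually visualised as spatial rate maps: the unit's average activation in each spatial bin of the arena. A minimal sketch of that computation, with assumed array shapes (not the team's code):

```python
import numpy as np

def rate_map(positions, activations, arena_size, bins):
    """Average a unit's activation over spatial bins of a square arena, the
    standard way to visualise place- or grid-cell-like firing fields."""
    total = np.zeros((bins, bins))
    visits = np.zeros((bins, bins))
    for (x, y), a in zip(positions, activations):
        i = min(int(x / arena_size * bins), bins - 1)
        j = min(int(y / arena_size * bins), bins - 1)
        total[i, j] += a
        visits[i, j] += 1
    # Unvisited bins become NaN so they can be masked out when plotting.
    return np.where(visits > 0, total / np.maximum(visits, 1), np.nan)
```

Grid cells would show up in such a map as a hexagonal lattice of firing bumps; a place cell shows a single localized bump.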
2D Racing Game with Reinforcement Learning
Henry Teigar, Miron Storožev, Janar Saks
(Report)
The students set out to train an agent to drive a car in a video game via deep reinforcement learning. They used two racing environments. The first they coded themselves, with distance sensors helping the car detect obstacles and with better control over the rewards the RL agent receives. The other was a benchmark racing game from OpenAI Gym. Whereas in their own environment the RL learner managed to drive reasonably well, the Gym environment (with only visual input) proved quite challenging: they were able to learn it only with imitation learning from human demonstrations.
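The distance sensors of the self-made environment can be sketched as simple ray casts over a grid map (a hypothetical toy track, not the team's environment):

```python
def distance_sensor(grid, x, y, dx, dy):
    """Step along direction (dx, dy) from cell (x, y) and return the number
    of free cells before the first wall ('#') or the edge of the map."""
    dist = 0
    while True:
        x, y = x + dx, y + dy
        if not (0 <= y < len(grid) and 0 <= x < len(grid[0])) or grid[y][x] == "#":
            return dist
        dist += 1

# A tiny track: '#' is wall, '.' is free road.
track = ["#####",
         "#...#",
         "#.#.#",
         "#####"]
```

A handful of such readings in different directions gives the agent a compact, low-dimensional observation, which is one reason the self-made environment was much easier to learn than the pixels-only Gym game.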
Replicating DCGAN
Liis Kolberg, Mari-Liis Allikivi
(Report)
Generating images is a very challenging task, but it seems an important stepping stone towards true AI (recall the well-known Richard Feynman quote: “What I cannot create, I do not understand”). The students set out to generate images of classrooms, inspired by existing work on generating images of bedrooms using deep convolutional generative adversarial networks (DCGAN). Due to the low quality of the dataset, the generated classroom scenes were somewhat less impressive than the bedrooms created in the reference work. Nevertheless, taming GANs is nontrivial and the authors can be proud of their work.
Smart alarm clock powered by Deep Learning
Artem Bachynskyi, Maksym Sukhorukhov
(Report)
It is known that how tired one feels when waking up in the morning depends on the sleep phase the person was in when the alarm clock rang. The long-term goal of the authors is to create an alarm clock that uses wrist-worn accelerometer data to decide which sleep stage the person is currently in and whether it is a good time to wake them up. In the scope of this course project they investigated how well the sleep phase can be predicted from accelerometer data. While the overall accuracy of the best model was only 50%, the REM phase (the best moment for waking up) was detected with 70% precision. Is that enough for an alarm clock? We let the clients decide.
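A typical first step for such a classifier is to cut the accelerometer signal into fixed-length epochs (sleep staging conventionally uses 30-second windows) and compute summary features per epoch. A minimal sketch (the authors' actual features may differ):

```python
import numpy as np

def epoch_features(signal, epoch_len):
    """Split a 1-D accelerometer magnitude signal into fixed-length epochs and
    compute per-epoch mean and standard deviation, simple movement features
    that a sleep-stage classifier can consume."""
    n = len(signal) // epoch_len             # drop the trailing partial epoch
    epochs = np.asarray(signal[: n * epoch_len], dtype=float).reshape(n, epoch_len)
    return np.stack([epochs.mean(axis=1), epochs.std(axis=1)], axis=1)
```

Low variance over many consecutive epochs suggests deep sleep, while the near-waking phases show more wrist movement, which is the signal the alarm clock would exploit.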
Kaggle: CDiscount Image Classification
Team 1: Anton Potapchuk, Vladyslav Fediukov, Maksym Semikin, Viktor Mysko
(Report)
Team 2: Ardi Loot
(Report) (Code)
Team 3: Yurii Toma, Oliver-Matis Lill
(Report)
Three teams participated in the CDiscount Image Classification Kaggle competition. The competition challenged the teams with a huge dataset: the training set was 58.2 GB of images from 5720 categories. This really pushed the limits of our HPC infrastructure, especially the limited number of GPUs. The teams mostly relied on pre-trained models in Keras and made use of data generators and multi-GPU training. In particular, Anton, Vladyslav, Maksym and Viktor ensembled models based on ResNet50, ResNet101, InceptionV3, InceptionResNetV2 and Inception3 pre-trained models. Their final accuracy was 0.73083, which earned them 53rd place out of 627. Ardi used the Xception model and implemented gradient accumulation in Keras to work with bigger batch sizes. His final accuracy was 0.72582, which earned him 64th place out of 627. Yurii and Oliver-Matis trained ResNet50-, Xception- and Inception-based models. Their best validation result was 0.73 with Xception, but they did not manage to submit it in time, so they have no official standing. In general, all teams did good work, given the massive dataset and the limited number of GPUs.
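Gradient accumulation trades memory for effective batch size: the gradients of several micro-batches are averaged before a single optimizer step is applied. Ardi implemented this inside Keras; the idea itself can be sketched framework-free with a toy gradient function (assumed names, not his code):

```python
import numpy as np

def accumulated_step(param, micro_batches, grad_fn, lr):
    """Simulate a large batch on limited GPU memory: average the gradients
    computed on several micro-batches, then apply one optimizer step."""
    grads = [grad_fn(param, mb) for mb in micro_batches]
    return param - lr * np.mean(grads, axis=0)

# Toy objective: mean squared distance to the batch values, so grad = param - mean(batch).
grad_fn = lambda p, batch: p - np.mean(batch)
param = 0.0
param = accumulated_step(param, [[1.0, 3.0], [5.0, 7.0]], grad_fn, lr=1.0)
```

Because the micro-batch gradients are averaged, the update equals the one a single step on the full batch would produce, which is exactly what makes the trick safe to use when GPU memory is the bottleneck.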