April 22-23

Kyiv 2017


Data Fest is a conference that unites researchers, engineers and developers associated with data science and related areas.

  • Two-day program: both lectures and a workshop
  • Eight speakers. Four countries. State-of-the-art results and cutting-edge technologies
  • Broad coverage of topics, non-overlapping domains
  • Sentiment analysis "from scratch": from data preparation to deep neural networks
  • An excellent audience and even more networking within the data science community

Both days of Data Fest will be held at the Ciklum Kyiv office, Sky Point, 19th floor.





April 22

A conference day of practical theory and its application to real-world cases.

  • Sergey

  • 10:00 And Why Did You Make This Prediction, Machine?

    Often, users want to know why an algorithm made a specific prediction. Why do we need to listen to those requests? And how can we make the predictions of “black box” models easier to understand?

  • Sergey

  • 11:00 Face Recognition and Search at Scale

    The talk will discuss the current state of face recognition technology, common pipelines and the nuts and bolts of building robust, large-scale face recognition systems like the FindFace app.

  • Alex

  • 12:00 Time-series Analysis in the ML Era: an Overview

    Forecasting has been the go-to application for several generations of statisticians, machine learners and domain specialists. We will start with a review of the most popular methods and approaches for time-series analysis, and will discuss modern models used in practice. In particular, we will play around with different feature engineering approaches used for forecasting-like problems in common machine learning tasks: financial predictions, industrial monitoring, neuro-prosthetics and a few others.

    Coffee break

  • You

  • 13:00 Open Mic

    The aim of this brief session is to expand your data science connections. It is an excellent opportunity for everybody to introduce themselves to the audience and say a few words about their projects and interests related to data science. The format is a two-minute talk followed by one minute of questions and answers.

    Lunch

  • Yuriy

  • 15:00 Automatic Annotation of Speakers in Phone Conversations

    Indexing and searching multimedia content is becoming increasingly important for many analytical tasks. One such technique is speaker diarization, the purpose of which is to answer the question "Who spoke when?" without any a priori information about the speakers present in the audio recording. In this talk, we'll discuss the approaches and challenges of speaker diarization for phone conversations.

  • Roman

  • 16:00 ML Test Rubric

    ML test rubric: how to handle machine learning in production.

    Coffee break

  • Orest

  • 17:00 Motion Deblurring Using Generative Adversarial Networks

    We will present a new GAN-based solution to the problem of image motion deblurring. We’ll briefly talk about existing solutions in blind image deblurring and discuss their pros and cons. We will also describe a multi-scale CNN architecture and give a quick overview of the Wasserstein Generative Adversarial Network model, which claims to solve most of the problems related to GAN training. Finally, we’ll discuss new synthetic blur methods which help to overcome the problem of limited datasets and achieve state-of-the-art performance.

  • Grammarly

  • 18:00 Neural Attention Mechanism in NLP Applications

    In this talk, we will showcase the neural attention mechanism, which helps to achieve state-of-the-art results on a number of sequence-to-sequence modelling tasks. We will briefly review the modern NLP pipeline in deep learning and highlight some limitations that the attention mechanism helps to overcome. Along the way, we will present the results of applying attention to machine translation and textual entailment tasks.


April 23

A workshop day with a lot of practice and a small competition at the end.

  • Oleksandr

  • 10:00 Collecting Data

    1. Business problem identification
    2. Web scraping. Pitfalls and possible problems
    3. Basic analysis of parsed data

  • Natalia

  • 10:30 Data Preprocessing. Fundamentals

    1. Purpose of data preprocessing
    2. Python libraries for NLP tasks: NLTK, Pattern, spaCy, TextBlob
    3. Preprocessing features overview

    Coffee break

  • Fred

  • 11:00 Feature Extraction Techniques. Advanced

    1. Bag-of-Words approach. Hashing. TF-IDF
    2. Custom problem-related features

  • Oleksandr

  • 11:30 Data Preprocessing. Word Embeddings

    1. Word2Vec models and examples
    2. Brief overview of GloVe and Hellinger PCA

    Coffee break

  • Evgeny

  • 12:00 Metrics

    1. Classification metrics overview
    2. Appropriate metrics choice

  • Sergii

  • 12:30 Linear Models I

    1. Naive Bayes
    2. Passive Aggressive Classifier

  • Oleksandr

  • 13:00 Linear Models II

    1. Logistic Regression
    2. SVM

    Lunch

  • Valentina

  • 14:30 Non-linear Algorithms

    1. Tree-based algorithms. Decision trees, random forest
    2. Boosting. XGBoost, LightGBM

  • Andrii

  • 15:00 FastText

    1. FastText overview
    2. Data preprocessing for FastText
    3. Parameter tuning

    Coffee break

  • Ievgen

  • 15:30 Word-based LSTM and Recursive Neural Networks

    1. LSTM, Bidirectional LSTM
    2. LSTM with attention
    3. Recursive neural networks

  • Iryna

  • 16:00 Word-based CNN

    1. CNN intro
    2. CNN for text classification

  • Vitaliy

  • 16:30 Char-RNN and Char-CNN

    1. Data preprocessing for char-based models
    2. Char-RNN and Char-CNN training
    3. More power with char-based models

    Coffee break

  • Kaggle

  • 17:00 Competition

    To sum up the knowledge gained during the workshop, a brief competition will be held for all participants, giving them one more opportunity to practice and have even more fun.

    Other contributors


    Participation is absolutely free! However, you have to register to get in, so hurry up!

    Our location


    The FAQ section is incomplete but is being updated on a regular basis (using stochastic gradient descent, he-he).

    Do I have to register to attend Data Fest?

    You bet! If there are too many registrations, we will have to select the N best candidates based on their questionnaires, to make sure that participation brings attendees the maximum benefit.

    Does the Facebook page's check-in count as registration?

    Nope, it does not count. Only those with Ciklum's confirmation are going to be considered for an invitation.

    Are there any registration fees?

    No, both the lectures and the workshop are free for all participants. There are no registration charges. But you have to register.

    What if I got a refusal?

    Unfortunately, you will not be able to attend the event onsite. Still, don't be upset: all our presentations will be broadcast! Moreover, feel free to join the live discussion via the Open Data Science #Slack channels.

    When will I receive an event invitation?

    It will take the organizers some time to review all applications, and the process can be long, so please be patient. The first half of the invitations will be sent two weeks before the event and the rest approximately one week before the event start date. Also, several days before the event, we will contact all participants to confirm their attendance. In the unlikely case that somebody declines, we will give their invitation to the people at the top of the waiting list.

    What are the prerequisites for the conference and workshop?

    Ideally, we would like you to have mid-level or higher experience for the conference, because we will try to cover more advanced topics. As for the workshop, you are required to be comfortable working in Jupyter Notebooks with Python 3. Knowledge of NumPy and scikit-learn will be a plus. Prior knowledge of Keras, TensorFlow or other deep learning tools is NOT required to attend the workshop. The invitation letter will include detailed instructions on how to install the necessary software, along with other important details about the workshop.

    What will be the language of the event?

    During the conference, the slides will be in English, and the speakers will talk in whichever language is most convenient for them (mostly Russian). The language of the workshop will be Russian, with notebooks in English.

    What are the nearest nice places to have lunch?

    Fortunately for you, you will be served lunch during both days of the event. The exact menu will be available several days before the event and we will let you know shortly after.

    Will the conference materials be available later?

    Yes! The presentation videos and slides will be fully available in 2-3 weeks. For example, all the materials from the previous Data Fest meetings can be found on the YouTube channel Компьютерные науки and on the Mail.Ru website.

    So, you said all your talks will be broadcast?

    For sure! The whole conference and workshop will be broadcast online. A live webcast link will be available on our site, as well as via our social media pages (Facebook, Slack).

    If there are not enough seats, what will be the application selection process?

    The main criteria we rely on are your experience in data science, as well as your motivation to attend the event.


    Who are they? What do they want from me?