Data Fest is a conference that unites researchers, engineers and developers associated with data science and related areas.
Former #1 on Kaggle
A conference day covering practical theory and its application to real-world cases.
Often, users want to know why an algorithm made a specific prediction. Why do we need to listen to those requests? And how can we make “black box model” predictions easier to understand?
The talk will discuss the current state of face recognition technology, common pipelines and the nuts and bolts of building robust, large-scale face recognition systems like the FindFace app.
Forecasting has been the go-to application for several generations of statisticians, machine learners and domain specialists. We will start the day with a review of the most popular methods and approaches for time-series analysis, and will discuss modern models used in practice. In particular, we will play around with different feature engineering approaches used for forecasting-like problems in common machine learning tasks: financial predictions, industrial monitoring, neuro-prosthetics and a few others.
A coffee break
The aim of this brief session is to expand your data science connections. It is an excellent opportunity for everybody to introduce themselves to the audience and say a few words about their projects and interests related to data science. The presentation format is a two-minute talk followed by one minute of questions and answers.
Indexing and searching multimedia content is becoming increasingly important for many analytical tasks. One such technique is speaker diarization, the purpose of which is to answer the question "Who spoke when?" without any a priori information about the speakers present in the audio recording. In this talk, we'll discuss the approaches and challenges of speaker diarization for phone conversations.
ML test rubric: how to handle machine learning in production
A coffee break
We will present a new GAN-based solution to the problem of image motion deblurring. We’ll briefly talk about existing solutions in blind image deblurring and discuss their pros and cons. Also, we will describe a multi-scale CNN architecture and give a quick overview of the Wasserstein Generative Adversarial Network model, which claims to solve most of the problems related to GAN training. Finally, we’ll discuss new synthetic blur methods which help to overcome the problem of limited datasets and achieve state-of-the-art performance.
In this talk, we will showcase the neural attention mechanism, which helps to achieve state-of-the-art results on a number of sequence-to-sequence modelling tasks. We will briefly review the modern NLP pipeline in deep learning and highlight some limitations that the attention mechanism helps to overcome. Along the way, we will present the results of applying attention to machine translation and textual entailment tasks.
A workshop day with a lot of practice and a small competition at the end.
1. Business problem identification
2. Web scraping. Pitfalls and possible problems
3. Basic analysis of parsed data
1. Purpose of data preprocessing
2. Python libraries for NLP tasks. NLTK, Pattern, spaCy, TextBlob
3. Preprocessing features overview
A coffee break
1. Bag-of-Words approach. Hashing. TF-IDF
2. Custom problem-related features
1. Word2Vec models and examples
2. Brief overview of GloVe and Hellinger PCA
A coffee break
1. Classification metrics overview
2. Appropriate metrics choice
1. Naive Bayes
2. Passive Aggressive Classifier
1. Logistic Regression
1. Tree-based algorithms. Decision trees, random forest
2. Boosting. XGBoost, LightGBM
1. FastText overview
2. Data preprocessing for FastText
3. Parameter tuning
A coffee break
1. LSTM, Bidirectional LSTM
2. LSTM with attention
3. Recursive neural networks
1. CNN intro
2. CNN for text classification
1. Data preprocessing for char-based models
2. Char-RNN and Char-CNN training
3. More power with char-based models
A coffee break
To sum up the knowledge gained during the workshop, a brief competition will be held for all participants, giving them one more opportunity to practice and have even more fun.
The FAQ section is incomplete but is being updated on a regular basis (using stochastic gradient descent, he-he).
Do I have to register to attend Data Fest?
You bet! If there are too many registrations, we will have to select the N best candidates based on their questionnaires, to make sure that those who attend get the most out of the event.
Does the Facebook page's check-in count as registration?
Nope, it does not count. Only those with Ciklum's confirmation are going to be considered for an invitation.
Are there any registration fees?
No, both the lectures and the workshop are free for all participants. You do have to register, though.
What if I got a refusal?
Unfortunately, you will not be able to attend the event onsite. Still, don't be upset: all our presentations will be broadcast! Moreover, feel free to join the live discussion via the Open Data Science #Slack channels.
When will I receive an event invitation?
It will take the organizers some time to review all applications, and the process can be long, so please be patient. The first half of the invitations will be sent two weeks before the event and the rest approximately one week before the event start date. Also, several days before the event, we will contact all participants to confirm their attendance. In the unlikely case that somebody declines, we will give their invitation to the people at the top of the waiting list.
What are the prerequisites for the conference and workshop?
Ideally, we would like you to have a middle+ level of experience for the conference because we will try to cover more advanced topics. As for the workshop, you are required to be comfortable working in Jupyter Notebooks with Python 3. Knowledge of NumPy and scikit-learn will be a plus. Prior knowledge of Keras, TensorFlow or other deep learning tools is NOT required to attend the workshop. The invitation letter will include detailed instructions on how to install the necessary software, along with other important details about the workshop.
What will be the language of the event?
During the conference, the speakers will present their slides in English and will use whatever language is more convenient for them (Russian mostly). The language of the workshop will be Russian with notebooks in English.
What are the nearest nice places to have lunch?
Fortunately for you, you will be served lunch during both days of the event. The exact menu will be available several days before the event and we will let you know shortly after.
Will the conference materials be available later?
Yes! The presentation videos and slides will be fully available in 2-3 weeks. For example, all the materials from the previous Data Fest meetings can be found on the YouTube channel Компьютерные науки and on the Mail.Ru website.
So, you said all your talks will be broadcast?
If there are not enough seats, what will be the application selection process?
The main criteria we rely on are your experience in data science, as well as your motivation to attend the event.