VDSG Knowledgefeed - Natural Language Understanding and Feature Engineering

Sep 27, 2023 6:00 PM — Jun 2, 2022 8:30 PM
Wagramer Str.1 - 1220 Vienna
Wagramer Str.19, Vienna, Vienna 1220

We invite you to learn why deep neural networks are not always the best solution in natural language processing, and how to improve machine learning workflows and feature engineering using the transform design pattern.

Gábor Recski: Transparent Natural Language Understanding

Transformer-based deep learning models have become the most widely used tool in natural language processing (NLP). When the goal is to extract structured information from text, nearly all solutions train such neural networks on human-annotated data and then apply the resulting models directly to text processing. But the black-box nature of these solutions greatly limits their applicability in domains that require transparency, predictability, or configurability. Rule-based solutions can offer all of these, but for complex domains they are difficult and costly to build and maintain. Our group at TU Wien has developed an approach to information extraction that uses human-in-the-loop (HITL) learning for the semi-automatic creation of rule-based solutions. Our tool allows domain experts to build white-box solutions in highly technical domains such as legal or medical NLP. The method is based on graph-based representations of natural language syntax and semantics, which we will introduce together with one or two recent use cases.

Gábor Recski is a computational linguist and a postdoctoral researcher at TU Wien. His research focuses on symbolic models of natural language semantics and their applications to information extraction tasks. He has published over 60 peer-reviewed papers with more than 400 citations.
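
To give a feel for what a white-box rule over a graph representation might look like, here is a minimal sketch. It is not the speaker's tool or method: the edge format, the `match_rule` helper, the example graph, and the rule itself are all hypothetical and chosen only for illustration. A sentence is assumed to be available as a set of labeled edges, and a rule is a list of edge patterns in which `None` acts as a wildcard.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    """One labeled edge of a sentence graph (hypothetical format)."""
    head: str
    label: str
    dependent: str


# Hand-written graph for "The tenant must pay the rent." (illustrative only).
sentence_graph = {
    Edge("pay", "nsubj", "tenant"),
    Edge("pay", "obj", "rent"),
    Edge("pay", "aux", "must"),
}

# Hypothetical rule: an obligation needs a "must" auxiliary and some subject.
# Each pattern is (head, label, dependent); None matches anything.
OBLIGATION_RULE = [
    (None, "aux", "must"),
    (None, "nsubj", None),
]


def match_rule(rule, graph) -> Optional[list]:
    """Return one matching edge per pattern, or None if any pattern fails.

    The returned edges are the evidence for the decision, which is what
    makes the rule inspectable by a domain expert.
    """
    evidence = []
    for pattern in rule:
        hit = next(
            (
                edge
                for edge in sorted(graph)  # sorted for deterministic output
                if all(p is None or p == v for p, v in zip(pattern, edge))
            ),
            None,
        )
        if hit is None:
            return None
        evidence.append(hit)
    return evidence


evidence = match_rule(OBLIGATION_RULE, sentence_graph)
print(evidence)
```

Because the rule is plain data, an expert can read it, edit it, and see exactly which edges triggered a match. In a human-in-the-loop setting, a system would propose such rules and the expert would accept or refine them.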

Heinz Eckert: Elegant Feature Engineering with the Transform Design Pattern

Inspired by insights from the book “Machine Learning Design Patterns” by Lakshmanan et al., this session will highlight how the Transform Design Pattern can improve your machine learning workflows. We will discuss how clearly separating raw input values from transformed features can yield significant advantages in model maintenance, code organization, and deployment speed. Using a hands-on example from the telecommunications industry, we will see how building custom transformation elements into your model pipeline can help you streamline your projects and improve overall efficiency.

Heinz Eckert has worked as a data scientist in the telecommunications sector for over four years and currently leads Magenta’s data science team. Combining a background in psychology and computer science, he is passionate about delivering sustainable data applications and continuous improvement.
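
The separation of raw inputs from transformed features described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's example: the `RawCustomer` fields, the `transform` function, the `ChurnModel` class, and the weights are all hypothetical. The point is only that the transformation travels with the model, so callers pass raw values at both training and serving time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RawCustomer:
    """Raw inputs as a client would send them (hypothetical telco fields)."""
    contract_start: date
    monthly_minutes: float
    monthly_fee_eur: float


def transform(raw: RawCustomer, today: date) -> dict:
    """Turn raw inputs into model features; kept in one place on purpose."""
    tenure_months = (
        (today.year - raw.contract_start.year) * 12
        + (today.month - raw.contract_start.month)
    )
    return {
        "tenure_months": float(tenure_months),
        # Guard against division by zero for customers with no usage.
        "eur_per_minute": raw.monthly_fee_eur / max(raw.monthly_minutes, 1.0),
    }


class ChurnModel:
    """Toy linear scorer that bundles the transform with the prediction."""

    def __init__(self, weights: dict, bias: float):
        self.weights = weights  # illustrative values, not fitted to data
        self.bias = bias

    def predict(self, raw: RawCustomer, today: date) -> float:
        # Callers never see the features; the model applies the transform.
        features = transform(raw, today)
        return self.bias + sum(
            self.weights[name] * value for name, value in features.items()
        )


model = ChurnModel({"tenure_months": -0.5, "eur_per_minute": 10.0}, bias=5.0)
customer = RawCustomer(date(2022, 1, 15), monthly_minutes=200.0, monthly_fee_eur=20.0)
print(model.predict(customer, today=date(2023, 1, 15)))
```

Keeping `transform` inside the model boundary means a change to a feature definition is made once and cannot drift between training code and the serving path.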

We are an association promoting knowledge about data science as a nonprofit. We connect data scientists in Europe and all around the world. Our members are passionate data scientists from various areas of research and industry.