Technical Solution Specialist for Parallel Data Warehouse at Microsoft
<p>Stefan has a PhD in Astrophysics. After university he joined Siemens Communications, where he spent 17 years in different roles, mostly in Munich but also in Silicon Valley. Starting in the Central Laboratories, he held positions in Marketing, Systems Engineering, Assistant to the Board of Directors, Strategic Investment, Product Line Management and Partner Management. In 2007 he joined a small software and analysis company, where he held a position in Systems Engineering and headed the Department for Advanced Analytics. In 2012 he joined Microsoft and is responsible for the introduction of Microsoft’s Big Data appliance, the Analytics Platform System, in Central and Eastern Europe.</p>
Engineer, CTO and VP, Technical Leadership, IBM CEE
Technical Solutions Professional at Microsoft
Petra Korica-Pehserl works in technical pre-sales and consulting for Application Platform (databases, business intelligence and Azure) for the CEE Multi-Country region (24 countries across CEE). Petra has rich international experience and worked for Microsoft in Austria and France before joining the CEE region. She has a software development and AI background.
Technical Account Manager at Hewlett-Packard
<p><span>Andrius Markvaldas has over 20 years of information technology and data management experience on a variety of platforms. Prior to joining HP, Andrius defined enterprise data strategies and built and managed world-class data capabilities for proprietary trading firms. His broad experience includes engineering large-scale multi-petabyte data systems with extreme performance requirements. </span></p> <p><span>He currently works in presales and solution architecture with HP Software’s Big Data Platform group. His focus is on full lifecycle development and operation of Big Data systems in both financial services and social media advertising.</span></p> <p><span>He holds an M.S. degree in Distributed Systems from DePaul University and is pursuing an M.S. degree in Predictive Analytics from Northwestern University.</span></p>
Founder & Core Developer at iTraff Technology and Programa.pl
<p>Former PhD student in the doctoral program "Theoretical and Practical Issues of Computing Science". Prefers solving practical problems to theoretical considerations. Interested in new technologies, especially in the areas of databases and knowledge discovery from data.<br />One of the core engineers of the fastest image recognition technology. Finalist of the poll "Poles with Verve" in the category "Innovation in Business". Honorary blood donor, runner and reader.</p>
Director of Product and Data Science at Exacaster
<p>Egidijus is a co-founder of Exacaster and has spent the last 9 years working with leading telecoms and retailers in the United States, Latin America and the Baltic States. Most of his work is about figuring out how to unlock business opportunities using big data analytics, machine learning and data-driven thinking. Though technological capabilities are more powerful than ever, companies are still struggling to leverage their data to create an unfair business advantage, mostly because of a lack of analytical education. Egidijus will share ROMI calculation techniques and the biggest challenges he has encountered in companies that are implementing state-of-the-art analytical thinking in their daily business processes.</p>
Data Scientist at SmartRecruiters
<p>Katarzyna is a Data Scientist with a scientific background in Physics. Shortly after finishing her PhD at the University of Science and Technology in Krakow she joined SmartRecruiters, where she has been working since 2013.</p>
ERP division manager at Affecto Lietuva
Šarūnas studied Software Engineering and has more than 10 years of experience in IT systems implementation projects, 7 of which he spent on large-scale ERP implementation projects for public and private companies. In 2008 he joined Affecto Lietuva as an ERP implementation consultant and continued his career as a project manager. Currently Šarūnas manages the ERP Implementation Division at Affecto Lietuva and is studying for an MBA at the International School of Management.
Co-founder at Reach.ly
<p><span>Ernest is the founder and CEO of <a href="http://reach.ly/" target="_blank" title="http://reach.ly/">Reach.ly</a>, a real-time behavioral analytics company for e-retailers. The <a href="http://reach.ly/" target="_blank" title="Reach.ly">Reach.ly</a> team has developed real-time infrastructure and algorithms to understand the context of a site visitor's visit and its potential outcome. Real-time analysis also enables store owners to engage with visitors while they are on-site and influence their behavior.</span></p>
Álvaro Hernández Tortosa
CEO at 8Kdata Technology
<p>Álvaro is a 36-year-old IT entrepreneur based in Madrid, Spain. Founder and CTO at <a href="http://www.8kdata.com" target="_blank" title="www.8kdata.com">8Kdata</a>, a database R&D company, he spends most of his time working on the <a href="http://www.torodb.com" target="_blank" title="www.torodb.com">ToroDB</a> project, the first NoSQL-on-SQL database, a MongoDB-compatible database that runs on top of PostgreSQL. He is a passionate software developer and open source advocate. Álvaro is a Java software developer and a member of JavaSpecialists.eu, but also a DBA, trainer and frequent lecturer at international conferences. He also founded the PostgreSQL Spanish User Group (<a href="http://www.postgrespaña.es" target="_blank" title="www.postgrespaña.es">www.postgrespaña.es</a>), nowadays the 4th largest PUG in the world, with around 500 members.</p>
Director of Developer Advocacy (EMEA) at Couchbase
Leading the developer relations team for Couchbase in EMEA, Matthew Revell has been active in open source since the early 2000s. Before Couchbase he ran EMEA developer relations at Basho, and before that he worked with developer communities at Canonical.
Director and Senior Trainer at Data Miner
<p>Having worked with Big Data solutions, Cloudera Hadoop, Microsoft SQL Server Analysis Services, Data Mining and MDX for more than 12 years, Ernestas Sysojevas is a mentor and senior trainer at DATA MINER and one of the founders of the company. He has worked with database, Business Intelligence and CRM systems in a variety of industries, including the pharmacy, retail and telecommunication sectors. Ernestas holds a Master of Computer Science and professional certifications such as Cloudera (CCDH), Cloudera (CCAH), MCDBA, MCITP SQL, MCSE and PMP. Currently Ernestas specializes in advanced training and consulting services in the field of Big Data and Business Intelligence, with a focus on analytics. He is experienced in consulting, project delivery, mentoring, training, public speaking and providing hands-on labs at conferences. Ernestas is the main author and developer of the MDX and Data Mining courses provided by DATA MINER. His mission is to transfer his unique knowledge to a broader audience of busy professionals.</p>
Senior Software Engineer at Percona
<p>Laurynas is a senior software engineer and Percona Server lead whose primary interest is InnoDB performance. He joined Percona in March 2011 as a member of the Percona Server and Percona XtraBackup development teams.</p> <p>He has been programming in C and C++ for almost two decades. Laurynas' past industrial experience includes interning at Google as a compiler software engineer, and in academia he has contributed research on physical database indexes, including large-scale spatial models of the brain.</p> <p>Laurynas lives in Vilnius, Lithuania, with his wife.</p>
Data scientist at Vinted
<p>Data scientist at Vinted with 6+ years of experience in data analysis and a background in mathematics. Previously worked on stock forecasting, credit risk modelling. Organizer of Vilnius Big Data meetup.</p>
Software engineer at Vinted
<p>Ruby veteran. Passionate about programming languages, data and distributed systems. Love tackling complex issues.</p>
Chief Operating Officer at Adform
<p>Mats Persson joined Adform early in 2011 as Chief Operating Officer. He has more than 15 years of experience as a leader in the IT industry. Mats has a deep knowledge of the entire life cycle of software development, from sales and requirement specification to implementation, maintenance and support.</p>
Program Development Manager at Adform
Database expert with 15+ years of database design, development and optimization experience. Dionizas specializes in Microsoft and Sybase platforms but enjoys working with all platforms. Current areas of interest include scalable RDBMS systems and NoSQL development while Dionizas has a particular passion for working on benchmarking and high availability testing.
Senior Developer at Adform
Ramunas Balukonis joined Adform 4 years ago as a senior developer. With more than a decade spent in the database world, Ramunas has worked as a database administrator, ETL consultant and business intelligence developer. Currently, Ramunas leads the Vertica ROLAP team. He is responsible for maintaining a 100TB analytic database running under Vertica and Microsoft Analysis Services ROLAP. He focuses on optimizing queries, building ETL solutions and enlarging the cluster. Ramunas strives to provide developers with exciting resources to learn new technologies. He likes diving into database design where he is well versed in potential benefits, possible pitfalls, and can provide best practices and exciting ideas.
Software Architect at Adform
<p>Ramūnas Urbonas works as a software architect in Adform. However, he always introduces himself as a developer first. For the last 5 years he has been working with Big Data and warehouses. He started with Adform ETL process optimization and moved to Hadoop stack eventually. He is passionate about the topics on data availability and the related social responsibility. He is currently driving his team to build a Big Data Lake in Adform and make it as available as possible.</p>
Developer at Adform
Full stack developer with 5+ years of experience in the finance and digital advertising industries. Rokas started as a pure .Net developer, but has accumulated more widespread and general knowledge over the past several years and can now choose from a wide selection of technologies to identify the best-suited technology solutions to solve any problem. He is highly interested in micro services and system-wide monitoring.
CTO, Chief Architect at LeadBullet S.A.
<p><span>Mariusz Gil is an architect and CTO focused on high-performance and scalable web applications. Trainer, consultant and conference speaker. He has worked for several companies on PHP projects for millions of active users, from the biggest social network and instant-messaging software in Poland to a multi-billion PV content personalization and discovery platform. Mariusz is also a member of the 4Developers and PHPcon Programme Committees and one of the core members behind PHPers, open meetups for PHP developers in many cities in Poland. Big Data enthusiast and data scientist wannabe. After hours, biker and rock <span>guitarist.</span></span></p>
CEO at Ivinco
<p>Mindaugas is the founder and CEO of the boutique consulting firm Ivinco Ltd. (<a href="http://ivinco.com/">ivinco.com</a>). Ivinco is an international team specialising in Big Data, Search and Real-Time, High-Performance application/API development and backend performance consulting. We consult established businesses and partner up with startups at various stages, helping them overcome complex technical issues. Ivinco works with a couple of selected interesting projects a year, focusing on partner success.</p>
Data-warehouse geek at Vinted
<p><span>Interested in programming languages, algorithms, databases and distributed systems.</span></p>
Senior Developer at Adform
Senior Developer at Adform
Registration & Coffee
Welcome to the Event
Dionizas Antipenkovas, Big Data Program Development Manager at Adform
Dionizas Antipenkovas, Adform and Mats Persson, Adform
How to Improve Product Using Data? Data Science at SmartRecruiters.
Katarzyna Senderowska-Zemła, SmartRecruiters
SmartRecruiters is the Hiring Success Platform providing everything that companies need to transform their recruiting into effective talent marketing and sales machines and hire the best candidates. It includes Recruitment Marketing, Collaborative Hiring and a Modern Platform that solves the unique customization, compliance, integration and analytics needs. In this talk we discuss several aspects of optimizing a product by using data. We will start by looking at how best to search for patterns in customer behavior and usage data, followed by how to make predictions based on historical data, before ending with how to fine-tune your UI based on your findings. At the end of the presentation, several interesting conclusions about recruiting drawn from data will be shown.
Slaying the ROMI Monster with Big Data Analysis
Egidijus Pilypas, Exacaster
War stories on measuring Return On Marketing Investment for leading retailers and telecoms.
Starting Big Data B2B Company from Scratch
Ernests Stals, reach.ly
Everyone is talking about Big Data nowadays and the challenges of working with large amounts of data. But how does one go from a plain Big Data idea to an actual company? The presentation will cover the challenges, failures and hard lessons learned while building such a company. Some parts will be applicable to project managers in larger companies starting Big Data initiatives.
Image Recognition on a Daily Basis
Marcin Szajek, iTraff Technology
Humans use their senses to gather data from the world. There are multiple ways we could interact with computers. Text search engines are pretty old: Google is 17 years old. How about visual search?
We will present some practical applications of image recognition technology to show that in the next few years visual search could be helping us in day-to-day life.
Rocky Road to Big Data Analytics
Jonas Jarutis and Saulius Grigaliūnas, Vinted
Vinted is an online lifestyle marketplace and a social network geared towards young women. We currently have 10 million members and we process and analyze up to 1 billion events daily. To overcome the limitations imposed by the implementation of our initial analytics solution based on MySQL, we evaluated Hive, Impala and Scalding, finally arriving at a solution built on Spark, Kafka and HBase.
This is a lessons-learnt talk about our bumpy road to Big Data analytics on the Hadoop platform. We will cover our Kafka-based data ingestion pipeline, fact table preparation, data aggregation and talk about how all of this leads to a sub-second slicing of pre-aggregated data cubes. In addition, we will mention how our pipeline is reused for ad-hoc data analysis with the help of interactive notebooks.
Smart as Einstein (Adform Data API)
Rokas Balevicius, Adform
Big Data without access is just a very expensive set of ones and zeros. For a great reporting solution it is paramount that your users can access data in a way that matters to them, and that they can access it simply and naturally.
An API is a great solution to such a problem. But the devil is in the details. It sounds easy until you understand that you have to support multiple data warehouse solutions and ad hoc data slices.
In this presentation I will share the experience and lessons learned from building and consuming such a data API (project code name - Einstein).
Developing a Database Server: Software Engineer's View
Laurynas Biveinis, Percona
Part one of this talk will take a look at the development of Percona Server, a MySQL drop-in replacement. To provide context, we will also cover the MySQL ecosystem including major forks, patches and users.
Part two will cover one of the major challenges faced when developing a database server: Defining and implementing the right feature set, given the resource constraints of a small development team and strong competition, while still achieving quality goals.
Part three will focus on performance and look at our strategy for bridging the gap between the peak performance numbers used in marketing graphs and actually running a fast server in production. More specifically, this talk will home in on strategies for dealing with the hazards of stalls and performance variance.
Registration & Coffee
Welcome to the Event
Dionizas Antipenkovas, Big Data Program Development Manager at Adform
Dionizas Antipenkovas, Adform and Mats Persson, Adform
Refining ETL. Going realtime with Storm
Simonas Gelazevicius and Ernestas Vaiciukevičius, Adform
Sometimes serving reports with data that is a few hours or even a few minutes old is not enough: clients want to see what happens in real time. Traditional batch-based ETL (extract - transform - load) techniques, which served for years, are unable to cope with our current needs. In this talk we will describe our journey from batch-based ETL to stream processing. So, if you want to hear how we run a cluster, manage resources with Mesos and do stream processing with Storm, come to this presentation.
Navigating through BigData Swamp
Ramunas Urbonas, Adform
This is a war story. It tells about a small victory in a war that we're currently losing. The story starts like this: "Adform Big Data Lake was born as a small and beautiful pool. A single spring of fresh water was feeding the Lake. Time passed and more and more springs were bringing the precious water to the Lake. People of the Adform were very happy to see it. But one day terrible Evil started growing in the darkest depths of the Lake..."
Big Data Search with MySQL and Sphinx
Mindaugas Zukas, Ivinco
While talking about one of Ivinco's projects I will introduce Sphinx Search and MySQL as an efficient alternative to today's more traditional big data systems (Hadoop, Solr, etc.). This talk will describe our architecture decisions when building a scalable backend for a social media data search engine. Starting with a MySQL/InnoDB cluster which now stores 120TB+ of text data (sharding strategies for different types of data, adding large amounts of incoming data with low latency, ensuring high availability), I'll introduce Sphinx Search and its capabilities (indexing strategies, configuration, advanced features, how Sphinx compares to ElasticSearch and Solr). Finally I'll outline how we monitor system health and ensure great system performance.
Journey to Vertica+ROLAP Solution
Ramunas Balukonis, Adform
Most companies still use old-fashioned MS MOLAP solutions, wasting time on development, scaling, HW and licenses. In this session we'll talk about Adform's experiences transitioning to a Vertica+ROLAP solution. The talk will cover:
* Why we decided to start using a disruptive technology instead of optimizing existing ones;
* When sexy SQL defeats elegant MDX;
* The nightmare we faced before the transition and how we reconnected wagons from one running train to another. We'll also discuss the resource benefits that resulted from the transition;
* Vertica+ROLAP optimization tips;
* How Vertica falls within the Hadoop data lake and our philosophy for Big Data disaster recovery.
Building a Simple, Flexible and Scalable Data-cubing Solution with Spark, Algebird and HBase
Vidmantas Zemleris, Vinted
Once Vinted.com (a peer-to-peer marketplace to sell, buy and swap clothes) grew larger, demanding more advanced analytics, we needed a simple, yet scalable and flexible data-cubing engine. The existing alternatives (e.g. Cubert, Kylin, Mondrian) seemed not to fit, being too complex or not flexible enough, so we ended up building our own with Spark. We'll present:
- how DataFrames have proven to be the most flexible tool for fact preparation and cube input (cf. typesafe Parquet-Avro schemas)
- how we support multivalued dimensions
- how we use Algebird aggregators for defining and computing our metrics
- how simple it is to get good cubing performance by pre-aggregating input before cubing with the help of Algebird aggregators, which are Semigroup-additive for free
- our HBase key design and optimizations such as bulk-loading to HBase, and how we read the cube back from HBase
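As a rough illustration of the pre-aggregation idea in the list above, the following Python sketch merges fact rows that share the same dimension values before expanding them into cube cells. It is not Vinted's code and does not use Algebird (a Scala library); the metric layout `(count, revenue)`, the sample facts and the `ALL` rollup marker are assumptions made for the example. Because the merge operation is associative, cubing the pre-aggregated rows should give the same cells as cubing every raw row.

```python
from itertools import combinations

ALL = "*"  # hypothetical marker for a rolled-up (aggregated-away) dimension


def merge(a, b):
    """Associative merge of two metric tuples: (count, revenue)."""
    return (a[0] + b[0], a[1] + b[1])


def pre_aggregate(rows):
    """Combine rows with identical dimension values into a single row each."""
    combined = {}
    for dims, metrics in rows:
        combined[dims] = merge(combined[dims], metrics) if dims in combined else metrics
    return list(combined.items())


def cube(rows):
    """Expand each row into every rollup of its dimensions, merging metrics."""
    cells = {}
    for dims, metrics in rows:
        n = len(dims)
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(dims[i] if i in kept else ALL for i in range(n))
                cells[key] = merge(cells[key], metrics) if key in cells else metrics
    return cells


# Hypothetical facts: ((country, platform), (count, revenue))
facts = [
    (("LT", "android"), (1, 10.0)),
    (("LT", "android"), (1, 5.0)),
    (("LT", "ios"), (1, 7.0)),
    (("PL", "ios"), (1, 3.0)),
]

direct = cube(facts)
via_pre_aggregation = cube(pre_aggregate(facts))
```

Here the cubing step processes three rows instead of four; with real event data, where many events share the same dimension values, the reduction would typically be much larger.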
Registration & Coffee
Welcome to the Event
Microsoft’s Analytics Solutions Handling Big Data
Stefan Cronjaeger, Microsoft
The presentation introduces the Analytics Platform System (APS), a Massively Parallel Appliance for Big Data, built on a shared-nothing architecture. It is fully integrated with other products of Microsoft, i.e., Azure, the BI stack, ETL, and Complex Event Processing.
Columnar Storage and Polybase will be presented. Columnar storage is a technology heavily optimized for reporting and analytics. Polybase is a query extension that allows data stored in the database and files on Hadoop to be queried simultaneously, both via T-SQL. This allows BI tools to be applied to information stored in Hadoop.
The role of APS in large analytic projects will be discussed looking into a real-life example.
ToroDB: NoSQL on SQL
Álvaro Hernández Tortosa, 8Kdata
In the last 5-10 years, the industry has witnessed dozens of new NoSQL databases emerge, turning topics such as schema-less design and scaling into buzzwords. These NoSQL databases have taken a different approach to solving current scaling and Big Data problems, sometimes offering niche products, sometimes innovating on a given aspect, sometimes compromising on their CAP guarantees. However, and surprisingly to some, NoSQL databases share at least one common pattern: they were all built from scratch. Their storage engines, replication techniques, journaling and ACID support (if any) were all coded from zero.
However, these are among the most complex problems in the software industry, yet they were implemented without leveraging the previously existing state of the art. From an engineering perspective, this is not what we all have been told: DRY. Wouldn't it be possible to construct a NoSQL database by layering it on top of a relational database? Wouldn't it be possible to "tune" a relational database to behave as a NoSQL database, so as to easily focus on being schema-less, scalable and anything else needed, but without re-inventing the wheel on "basic" stuff such as journaling or durability?
Enter ToroDB. ToroDB is an open source project that behaves as a NoSQL database but runs on top of PostgreSQL, one of the most respected and reliable relational databases. ToroDB offers a document interface and implements the MongoDB wire protocol, hence being compatible with existing MongoDB drivers and applications. But ToroDB stores data in PostgreSQL - something which is transparent to database clients. Rather than storing JSON documents as a blob or using PostgreSQL 9.4's fantastic jsonb data type, ToroDB explores an innovative approach: transforming document data to a relational representation in a fully automated way that does not require user intervention or configuration. The benefits of storing document data as relational are quite significant.
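To make the idea of a relational representation more concrete, here is a small Python sketch of one way a nested document could be split into table-like rows. It is only an illustration under assumed conventions: the function name `shred`, the path-based table names and the `doc_id`, `position` and `value` columns are hypothetical and do not describe ToroDB's actual schema or algorithm.

```python
def shred(doc, table="root", doc_id=1, rows=None):
    """Split a nested dict into {table_name: [row, ...]}.

    Scalar fields become columns of the current table. Each nested dict
    becomes a row in a child table named after its path, and each list
    element becomes a row in a child table with a `position` column.
    """
    if rows is None:
        rows = {}
    row = {"doc_id": doc_id}
    for key, value in doc.items():
        child_table = f"{table}_{key}"
        if isinstance(value, dict):
            shred(value, child_table, doc_id, rows)
        elif isinstance(value, list):
            for position, item in enumerate(value):
                # Wrap scalar list elements so they fit the same row shape.
                element = item if isinstance(item, dict) else {"value": item}
                shred({"position": position, **element}, child_table, doc_id, rows)
        else:
            row[key] = value
    rows.setdefault(table, []).append(row)
    return rows


# Hypothetical sample document
document = {
    "name": "Alice",
    "address": {"city": "Vilnius", "zip": "04215"},
    "tags": ["db", "nosql"],
}
tables = shred(document)
```

In this sketch the document ends up in three tables (`root`, `root_address` and `root_tags`), each row carrying the same `doc_id` so the original document could be reassembled with joins.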
Knowledge Discovery – HP Big Data Platform
Andrius Markvaldas, HP
This presentation will give an overview and provide examples on two components of HP Predictive Analytics platform: HP Vertica and HP Distributed R. Distributed R is an open-source, scalable and high-performance engine for the R language. Designed for data scientists, HP Distributed R accelerates large-scale machine learning, statistical analysis, and graph processing. The secret is in how HP Distributed R splits tasks between multiple processing nodes to vastly reduce execution time and enables users to analyze much larger data sets.
Best of all, HP Distributed R retains the familiar R look and feel, and data scientists can continue to use their existing statistical packages. The Vertica Analytics Platform is easy to use and deploy, so users across an organization (not just DBAs) can get up and running quickly and immediately analyze mission-critical data. As a distributed shared-nothing database, Vertica has built-in statistical and data mining tools that can be used from SQL, and it can be extended to scale your existing R code.
Global Technology Outlook 2015 – Data Transforming Industries
Mike Starkey, IBM Watson Analytics
Introduction to Spark - Next Generation Data Processing Framework in Hadoop
Ernestas Sysojevas, Data Miner
With its popularity, ease of development and performance benefits, Apache Spark is primed to become the next general processing layer for Hadoop, succeeding MapReduce. We will discuss the main benefits of Spark, the Resilient Distributed Datasets (RDD) programming paradigm and the Scala functional programming language, designed to express common distributed programming patterns in a concise and elegant way never before seen in the Hadoop world.
Connecting the Dots with Big Data
Petra Korica-Pehserl, Microsoft
Handling Big Data doesn't have to be a Big Challenge. Use Microsoft's technologies based on open source technologies like Hadoop, advanced Machine Learning algorithms and easy to use visualization tools to get started fast on Big Data. In this session we will discuss examples and use-cases from various industry verticals and show how to build a Big Data solution from collecting data to visualizing data.
Lithuanian Exhibition and Congress Centre LITEXPO, Laisves ave. 5, LT-04215 Vilnius, Lithuania
Four Topics: Gaming, Social Networks, Digital Advertising and Stock Exchange.
This event was the first major event of its type in the Baltics. The event featured 17 speakers from all over the world representing companies such as Aerospike, IBM, Amazon, RabbitMQ, and many others. More than 600 data-minded individuals attended the event, shared their experiences, and had a great time.
Director of Application Engineering at Aerospike
This is the premier conference for IT and computing in the Baltic States.
Senior Software Engineer at Tokutek
I came to High Load today to see some of Lithuania, meet the people here and find out how they are using large scale systems and solve interesting problems. The conference is exciting and full of fun people.
Technical Lead at Starling
Big, there is a huge amount of modern things going on and it’s really interesting.
The event attracted around 350 people, and 11 speakers from four different countries: Lithuania, Belarus, the United Kingdom, and the United States. Representatives from various companies, such as HP Vertica, Aerospike, Exacaster, Bing!, Expedia, and Altoros, also attended.
The success of Big Data Strategy Vilnius prompted Adform to organize another conference in Minsk with help from Dev.By. Over 400 people attended this event. Once again, speakers from all over the world participated. The companies represented include Data Miner, HP Vertica, Aerospike, Exacaster, Bing!, Expedia, and Altoros.