Databricks is a company founded by the creators of Apache Spark that helps clients with cloud-based big data processing using Spark. Databricks grew out of the AMPLab project at the University of California, Berkeley, which created Apache Spark, a distributed computing framework written in Scala. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks. In addition to building the Databricks platform, the company co-organizes massive open online courses about Spark and runs the largest Spark conference, Spark Summit.

Wikipedia
Databricks
Blog Post
  • London, as a financial center and cosmopolitan city, has historical charm, cultural draw, and technical allure for everyone, whether you are an artist, entrepreneur, or high-tech engineer. As such, we are excited to announce that London is our next stop for Spark + AI Summit Europe, from October 2-4, 2018, so prepare yourself for […]

Databricks
SlideShare Presentation
  • Spark SQL is a highly scalable and efficient relational processing engine with easy-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate and analyze data from diverse data sources (e.g., Hive, Cassandra, Kafka and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk will dive into the technical details of Spark SQL ...
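The core idea of the abstract above, querying data that arrived in different formats through one relational interface, can be sketched outside Spark using only Python's standard library (the tables, column names, and data here are invented for illustration; this is not Spark SQL's API):

```python
import csv, io, json, sqlite3

# Two "sources" in different formats, much as Spark SQL might see CSV and JSON inputs.
csv_data = "id,name\n1,alice\n2,bob\n"
json_data = '[{"id": 1, "amount": 10.5}, {"id": 2, "amount": 7.25}]'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# Normalize both formats into rows before querying them relationally.
for row in csv.DictReader(io.StringIO(csv_data)):
    conn.execute("INSERT INTO users VALUES (?, ?)", (int(row["id"]), row["name"]))
for rec in json.loads(json_data):
    conn.execute("INSERT INTO orders VALUES (?, ?)", (rec["id"], rec["amount"]))

# A single SQL query spanning data that arrived in two different formats.
rows = conn.execute(
    "SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.id ORDER BY u.name"
).fetchall()
print(rows)  # [('alice', 10.5), ('bob', 7.25)]
```

Spark SQL does this at cluster scale with pluggable connectors; the sketch only shows the shape of the idea.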

Databricks
SlideShare Presentation
  • Nowadays, people are creating, sharing and storing data at a faster pace than ever before, so effective data compression and decompression can significantly reduce the cost of data usage. Apache Spark is a general distributed computing engine for big data analytics, and it stores and shuffles large amounts of data across the cluster at runtime, so the data compression/decompression codecs can impact t...

Databricks
SlideShare Presentation
  • In recent years, increasing computational power has made possible larger scientific experiments with high computing demands, such as brain tissue simulations. In general, larger simulations imply generating larger amounts of data that then need to be analyzed by neuroscientists. Currently, simulation reports are analyzed by neuroscientists with the help of Python scripts, thanks to its...

Databricks
SlideShare Presentation
  • What do cranberry farmers, bugs, deep learning and a mid-western creative agency have in common? Join me to learn how with today’s tools, and a sense of why not, you can jump in and help solve real world needs. In this session, I am going to dive into how we are using deep learning to help cranberry farmers in the mid-west solve real world problems. A part of it is technology, but an equal emphas...

Databricks
SlideShare Presentation
  • Deep neural network training is time consuming, often taking days or weeks, and a hard topic to master. Selecting the right hyper-parameters is difficult, but important since it directly affects the behavior of the training algorithm and has a significant impact on performance and accuracy. In this talk, we will discuss a novel approach using distributed Spark to explore the vast hyper-paramet...
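The hyper-parameter exploration the abstract describes boils down to evaluating many configurations and keeping the best. A minimal local sketch (the parameter names, values, and toy objective are invented; in Spark each configuration would become one parallel task rather than an iteration of a local loop):

```python
import itertools

# Hypothetical search space; a distributed version would ship each config
# to an executor instead of evaluating them in this local loop.
learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [32, 64]
configs = [{"lr": lr, "batch": b}
           for lr, b in itertools.product(learning_rates, batch_sizes)]

def evaluate(cfg):
    # Stand-in for a full training run: a toy loss minimized at lr=0.01, batch=64.
    return abs(cfg["lr"] - 0.01) + 0.001 * abs(cfg["batch"] - 64)

best = min(configs, key=evaluate)
print(best)  # {'lr': 0.01, 'batch': 64}
```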

Databricks
SlideShare Presentation
  • This tech talk deals with how we leveraged Spark Streaming and Spark Machine Learning models to build and operationalize real-time credit card approvals for a major bank. We plan to cover ML capabilities in Spark and what a typical ML pipeline looks like. We are going to talk about the domain and the use case of how a major credit card provider is using Spark to calculate card eligibility in rea...

Databricks
SlideShare Presentation
  • The American Diabetes Association states that 29.1 million Americans and 300+ million people worldwide have diabetes. Diabetic medication management is always challenging. Based on a doctor’s prescription, patients take an insulin dose one hour before breakfast, lunch or dinner. But in real-world scenarios, insulin intake can change based on the blood glucose level and calorie intake on a speci...

Databricks
SlideShare Presentation
  • The more time you spend developing within a framework such as Apache Spark, the more you learn there are additional features that would be helpful to have given the context and details of your specific use case. Spark supports a very concise and readable coding style using functional programming paradigms. Wouldn’t it be awesome to add your own functions into the mix using the same style? Well you can! I...
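The fluent, functional style the abstract alludes to can be sketched without Spark: wrap the data in a small class whose chaining method accepts user-defined functions (the class and function names below are invented for illustration and are not Spark's API):

```python
# A minimal, Spark-free sketch of the chaining pattern: user-written functions
# plug into the pipeline with the same fluent style as built-in operations.
class Pipeline:
    def __init__(self, rows):
        self.rows = rows

    def transform(self, fn):
        # fn takes and returns a list of rows, so custom steps compose freely.
        return Pipeline(fn(self.rows))

# User-defined steps, written once and reused across pipelines.
def drop_nulls(rows):
    return [r for r in rows if r is not None]

def double(rows):
    return [r * 2 for r in rows]

result = Pipeline([1, None, 3]).transform(drop_nulls).transform(double).rows
print(result)  # [2, 6]
```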

Databricks
SlideShare Presentation
  • A mobile application is only as good as our design and how customers use it. But how do they use it? We’ve got over 35 million devices running our mobile banking platform, and we need to understand each and every one of them. Is the customer enjoying their experience, are they lost, or are they a fraudulent hacker 3000 miles away? We developed an algorithm to examine the user’s workflow so we ca...

Databricks
SlideShare Presentation
  • Apache Spark 2.0 set the architectural foundations of structure in Spark: unified high-level APIs, Structured Streaming, and performant underlying components like the Catalyst optimizer and Tungsten engine. Since then, Spark community contributors have continued to build new features and fix numerous issues in the Spark 2.1 and 2.2 releases. Continuing in that spirit, Apache Spark 2.3 has ...

Databricks
SlideShare Presentation
  • At Capital One, we use Spark to detect fraud. Recently we have started implementing real-time fraud detection using machine-learned models. One of Capital One’s fraud detection microservices was an early adopter of Structured Streaming. As part of this implementation, the microservice ran into several roadblocks. In this talk, we describe those roadblocks and how we got around them. Caching of ...

Databricks
SlideShare Presentation
  • Apache Spark 2.2 shipped with a state-of-the-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, max/min, avg/max length, etc.) to improve the quality of query execution plans. Skewed data distributions are often inherent in many real-world applications. In order to deal with skewed distr...
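The per-column statistics the abstract lists are straightforward to picture. A minimal local sketch of collecting them (the column and sample values are invented; a real optimizer gathers these at scale and feeds them to the query planner):

```python
# Sketch of the per-column statistics a cost-based optimizer might collect.
def column_stats(values):
    non_null = [v for v in values if v is not None]
    return {
        "cardinality": len(values),            # total row count
        "num_nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),        # number of distinct values
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

ages = [34, 28, None, 34, 51]  # hypothetical column sample
stats = column_stats(ages)
print(stats)
```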

Databricks
SlideShare Presentation
  • Learn from someone who has made just about every basic Apache Spark mistake possible so you don’t have to! We’ll go over some of the most common things users end up doing that cause unnecessary pain, and explain how to avoid them. Confused about serialization? Not sure what is meant by “use a singleton to share connections”? Together we will walk through concrete examples of h...
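The "singleton to share connections" advice mentioned above is worth unpacking: connection objects generally cannot be serialized and shipped to executors, so each worker process should create its own lazily and reuse it. A minimal sketch (sqlite3 stands in for whatever database the tasks talk to):

```python
import sqlite3

_conn = None  # module-level cache: one connection per worker process

def get_connection():
    """Lazily create one connection per process instead of serializing one."""
    global _conn
    if _conn is None:
        # Opened on the worker itself, so the connection object is never
        # pickled and shipped across the cluster.
        _conn = sqlite3.connect(":memory:")
    return _conn

# Every task running in the same process reuses the same connection object.
assert get_connection() is get_connection()
```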

Databricks
SlideShare Presentation
  • As a data-driven company, we use Machine Learning algorithms and A/B tests to drive all of the content recommendations for our members. To improve the quality of our personalized recommendations, we try an idea offline using historical data. Ideas that improve our offline metrics are then pushed as A/B tests, which are measured through statistically significant improvements in core metrics such as memb...

Databricks
SlideShare Presentation
  • Workday Prism Analytics enables data discovery and interactive Business Intelligence analysis for Workday customers. Workday is a “pure SaaS” company, providing a suite of Financial and HCM (Human Capital Management) apps to about 2000 companies around the world, including more than 30% from Fortune-500 list. There are significant business and technical challenges to support millions of concurren...

Databricks
SlideShare Presentation
  • We’ve all heard that AI is going to become as ubiquitous in the enterprise as the telephone, but what does that mean exactly? Everyone in IBM has a telephone; and everyone knows how to use her telephone; and yet IBM isn’t a phone company. How do we bring AI to the same standard of ubiquity — where everyone in a company has access to AI and knows how to use AI; and yet the company is not an AI co...

Databricks
SlideShare Presentation
  • Clickstream data is messy. A single user session in a Zynga game can generate thousands of events, with each game, client version and OS having their own event schemas. Unfortunately, most ML models require their training data to be formatted as a uniform matrix, with each user having the exact same columns. It’s a time consuming challenge to develop feature sets that capture all the nuanced tren...
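The reshaping problem the abstract describes, ragged per-user event streams into a uniform matrix, can be sketched in a few lines (the event names and counts are invented for illustration; Zynga's actual pipeline is far richer):

```python
# Flatten per-user event streams with ragged schemas into a uniform feature matrix.
from collections import Counter

events = [  # hypothetical clickstream records
    {"user": "u1", "event": "level_up"},
    {"user": "u1", "event": "purchase"},
    {"user": "u2", "event": "level_up"},
    {"user": "u1", "event": "level_up"},
]

# Fix the column order so every user gets the exact same feature vector.
columns = sorted({e["event"] for e in events})
per_user = {}
for e in events:
    per_user.setdefault(e["user"], Counter())[e["event"]] += 1

matrix = {user: [counts.get(c, 0) for c in columns]
          for user, counts in per_user.items()}
print(columns, matrix)  # ['level_up', 'purchase'] {'u1': [2, 1], 'u2': [1, 0]}
```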

Databricks
SlideShare Presentation
  • What we call the public cloud was developed primarily to manage and deploy web servers. The target audience for these products is Dev Ops. While this is a massive and exciting market, the world of Data Science and Deep Learning is very different — and possibly even bigger. Unfortunately, the tools available today are not designed for this new audience and the cloud needs to evolve. This talk woul...

Databricks
SlideShare Presentation
  • Apache Spark is rapidly becoming the dominant big data processing framework for data engineering and data science applications. The simplicity of programming big data applications in Spark and the speed gained from in-memory processing are key factors behind this popularity. However, tools that help developers build and debug Spark applications have not kept pace. For instance, a Spark applicatio...

Databricks
SlideShare Presentation
  • At Apple we rely on processing large datasets to power key components of Apple’s largest production services. Spark is continuing to replace and augment traditional MR workloads with its speed and low barrier to entry. Our current analytics infrastructure consists of over an exabyte of storage and close to a million cores. Our footprint is also growing further with the addition of new elastic ser...

Databricks
SlideShare Presentation
  • Time is the one thing we can never get in front of. It is rooted in everything, and “timeliness” is now more important than ever especially as we see businesses automate more and more of their processes. This presentation will scratch the surface of streaming discovery with a deeper dive into the telecommunications space where it is normal to receive billions of events a day from globally distrib...

Databricks
SlideShare Presentation
  • Deep Learning is all the rage these days, but where does the reality of what Deep Learning can do end and the media hype begin? In this talk, I will dispel common myths about Deep Learning that are not necessarily true and help you decide whether you should practically use Deep Learning in your software stack. I’ll begin with a technical overview of common neural network architectures like CNNs,...

Databricks
SlideShare Presentation
  • As a general computing engine, Spark can process data from various data management/storage systems, including HDFS, Hive, Cassandra and Kafka. For flexibility and high throughput, Spark defines the Data Source API, which is an abstraction of the storage layer. The Data Source API has two requirements. 1) Generality: support reading/writing most data management/storage systems. 2) Flexibility: c...
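The storage-abstraction idea behind a data source API can be sketched simply: every source implements a common read interface, and the engine consumes rows without knowing the underlying format. The class and method names below are illustrative only, not Spark's actual Data Source API:

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    @abstractmethod
    def read(self):
        """Yield rows as dicts, whatever the underlying storage is."""

class InMemorySource(DataSource):
    def __init__(self, rows):
        self.rows = rows
    def read(self):
        yield from self.rows

class CsvLineSource(DataSource):
    def __init__(self, lines, fields):
        self.lines, self.fields = lines, fields
    def read(self):
        for line in self.lines:
            yield dict(zip(self.fields, line.split(",")))

def count_rows(source: DataSource):
    # The "engine" only knows the interface, never the storage format.
    return sum(1 for _ in source.read())

total = (count_rows(InMemorySource([{"a": 1}]))
         + count_rows(CsvLineSource(["1,2", "3,4"], ["x", "y"])))
print(total)  # 3
```

Generality comes from the shared interface; flexibility comes from each source deciding how to produce its rows.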

Databricks
SlideShare Presentation
  • Machine Learning is everywhere, but translating a data scientist’s model into an operational environment is challenging for many reasons. Models may need to be distributed to remote applications to generate predictions, or in the case of re-training, existing models may need to be updated or replaced. To monitor and diagnose such configurations requires tracking many variables (such as performanc...

Databricks
SlideShare Presentation
  • ING bank is a Dutch multinational, multi-product bank that offers banking services to 33 million retail and commercial customers in over 40 countries. At this scale, ING naturally faces a multitude of data consolidation tasks across its disparate sources. A common consolidation problem is fuzzy name matching: given a name (streaming) or a list of names (batch), find out the most similar name(s) f...
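A toy version of the fuzzy name matching problem the abstract poses can be written with Python's standard library (the names are invented; ING's production system uses far more sophisticated, scalable techniques than pairwise string similarity):

```python
# Minimal sketch: rank candidate names by string similarity to a query name.
from difflib import SequenceMatcher

def best_matches(query, candidates, top_k=2):
    scored = [(name, SequenceMatcher(None, query.lower(), name.lower()).ratio())
              for name in candidates]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

names = ["Jon Smith", "John Smyth", "Jane Smith", "Bob Jones"]
matches = best_matches("John Smith", names)
print(matches)  # the two closest spellings rank first
```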

Databricks
SlideShare Presentation
  • Personalized product recommendation, cross-sell selection, and customer churn and purchase prediction are becoming more and more important for e-commerce companies such as JD.com, which is one of the world’s largest B2C online retailers with more than 300 million active users. Apache Spark is a powerful tool as a standard framework/API for machine learning feature generation, model training ...
