Databricks is a company founded by the creators of Apache Spark that aims to help clients with cloud-based big data processing using Spark. Databricks grew out of the AMPLab project at the University of California, Berkeley, which was involved in creating Apache Spark, a distributed computing framework built atop Scala. Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks. In addition to building the Databricks platform, the company co-organizes massive open online courses about Spark and runs the largest conference about Spark, Spark Summit.

Wikipedia
Databricks
Blog Post
  • With over 4 billion subscribers, Viacom is focused on delivering amazing viewing experiences to their global audiences. Core to this strategy is ensuring petabytes of streaming content is delivered flawlessly through web, mobile and streaming applications. This is critically important during popular live events like the MTV Video Music Awards. Streaming this much video can […] The post Viacom’s Jo...

Databricks
Blog Post
  • Convergence of Knowledge: For any Apache Spark enthusiast, these summits are the convergence of Spark knowledge. Used by a growing global community of enterprises, academics, contributors, and advocates, attendees have convened at these summits since 2013 to share knowledge. And this summer attendees will return to San Francisco, to an expanded scope and agenda. Expansion of […] The post 5 Reasons t...

Databricks
Blog Post
  • In collaboration with the local chapter of the Women in Big Data Meetup, and as part of the Databricks diversity team's continuing effort to bring more women in the big data space forward as speakers to share their subject-matter expertise, we hosted our second meetup, featuring a diverse group of highly accomplished women in their respective technical fields as speakers […] The post Women in Big Data and Apache Spark: Bay Area Apa...

Databricks
SlideShare Presentation
  • Imagine we have Ada, our data science intern. Let's run through a very simple word-count Spark job and find a handful of potential failure points. Dozens of failures can and should happen when running Spark jobs on commodity hardware. Given the basic foundation for infrastructure-level expectations, this talk gives Ada tools to ensure her job isn’t caught dead. Once the simple example job runs r...

Databricks
YouTube Video
  • Imagine we have Ada, our data science intern. Let's run through a very simple word-count Spark job and find a handful of potential failure points. Dozens of failures can and should happen when running Spark jobs on commodity hardware. Given the basic foundation for infrastructure-level expectations, this talk gives Ada tools to ensure her job isn’t caught dead. Once the simple example job runs rel...

Databricks
YouTube Video
  • At Uber, location data is our biggest asset. How do we create data visualizations with rich location data, render a million points of events in the blink of an eye, and, most importantly, derive insights from them? In this presentation, you'll get a behind-the-scenes look at the tools and data visualizations we use at Uber to inform business decisions. I will walk you through an overview of the dat...

Databricks
YouTube Video
  • With the new Apache Arrow integration in PySpark 2.3, it is now starting to become reasonable to look to the Python world and ask “what else do we want to steal besides TensorFlow”, or as a Python developer look and say “how can I get my code into production without it being rewritten into a mess of Java?” Regardless of your specific side(s) in the JVM/Python divide, collaboration is getting a lot f...

Databricks
SlideShare Presentation
  • We are in the midst of a Big Data Zeitgeist in which data comes at us fast, in myriad forms and formats at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created a notion of writing a streaming application that reacts and interacts with data in real-time. We call this a continuous application. In this talk we will explore the ...

Databricks
SlideShare Presentation
  • Enterprises today face a daily barrage of cyberattacks. Responding quickly to threats is crucial to avoiding a serious breach. To do this successfully, security teams need to monitor and analyze billions of data signals or events each day. These come in different forms and formats. Yet existing security tools are struggling to keep up. Threats are going unnoticed, and remediation timelines are bein...

Databricks
Blog Post
  • As a digital society built around data and devices, we have reached a pivotal juncture where data and Artificial Intelligence must be accessible to everyone. Riding this trend, many homes now contain smart devices such as the Amazon Echo or Google Home. Yet these devices only offer limited computational power and AI capabilities. To remedy […] The post Introducing Data Brick™: The Building Block o...

Databricks
Blog Post
  • Click is an open-source tool that lets you quickly and easily run commands against Kubernetes resources, without copy/pasting all the time, and that easily integrates into your existing command line workflows. At Databricks we use Kubernetes, a lot. We deploy our services (of which there are many) in unique namespaces, across multiple clouds, in multiple […] The post Introducing Click: The Command...

Databricks
Blog Post
  • The confluence of cloud, data, and AI is driving unprecedented change. The ability to utilize data and turn it into breakthrough insights is foundational to innovation today. Our goal is to empower organizations to unleash the power of data and reimagine possibilities that will improve our world. To enable this journey, we are excited to […] The post Azure Databricks, industry-leading analytics pl...

Databricks
Blog Post
  • Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons. First, it made the developer’s experience with the APIs simpler: the APIs did not have to account for micro-batches. Second, it allowed developers to treat a stream as an infinite table to which they could issue queries as […] The post Introducing Low-latency Continuous Processin...

Databricks
SlideShare Presentation
  • Deep Learning has shown tremendous success, yet it often requires a lot of effort to leverage its power. Existing Deep Learning frameworks require writing a lot of code to work with a model, let alone in a distributed manner. We’ll survey the state of Deep Learning at scale and introduce Deep Learning Pipelines, a new open-source package for Apache Spark. This package simplifies D...

Databricks
Blog Post
  • Early last month, we announced our agenda for Spark + AI Summit 2018, with over 180 selected talks across 11 tracks and training courses. For this summit, we have added four new tracks to expand its scope to include Deep Learning Frameworks, AI, Productionizing Machine Learning, Hardware in the Cloud, and Python and Advanced Analytics. […] The post Selected Sessions to Watch for at Spark + AI Summit...

Databricks
Blog Post
  • Over the past few years, the demand for artificial intelligence (AI) and machine learning capabilities has surged with innovations in natural language processing, task automation, and predictions. From autonomous cars to a more personalized shopping experience, big data and artificial intelligence are at the forefront of new solutions that are delighting customers, improving business operations […]...

Databricks
Blog Post
  • We are excited to announce the general availability of Databricks Cache, a Databricks Runtime feature that is part of the Unified Analytics Platform and can improve the scan speed of your Apache Spark workloads by up to 10x, without any application code changes. In this blog, we introduce the two primary focuses of this new feature: ease-of-use […] The post Databricks Cache Boosts Apache Spark Performance...

Databricks
YouTube Video
  • Learn how to easily build an end-to-end data pipeline for high-volume streaming use cases like mobile game analytics with Databricks. In this demo we will:
      - Build a mobile gaming data pipeline using AWS services such as API Gateway, Lambda, and Kinesis Streams
      - Build a stream ingestion service using Spark Structured Streaming
      - Use Databricks Delta as a sink for our streaming operations
      - Explor...

Databricks
Blog Post
  • Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner joins and some types of outer joins) between a streaming and a static DataFrame/Dataset. With the release of Apache Spark 2.3, available in Databricks Runtime 4.0 as part of the Databricks Unified Analytics Platform, we now support stream-stream joins. In this […] The post Introducing Stream-Stream Joins in A...

Databricks
Blog Post
  • In recent years, machine learning has become ubiquitous in industry and production environments. Both academic and industry institutions had previously focused on training and producing models, but the focus has shifted to productionizing the trained models. Now we hear more and more machine learning practitioners really trying to find the right model deployment options. In […] The post Announcing...

Databricks
Blog Post
  • This is a community blog from Anirudh Ramanathan and Palak Bhatia, software engineer and product manager respectively at Google, working on the Kubernetes team. They are part of the group of companies that contributed to native Kubernetes support for Apache Spark 2.3. This post is cross-posted on blog.kubernetes.io. Kubernetes and Big Data: The open […] The post Apache Spark 2.3 with Native Kube...

Databricks
Blog Post
  • Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of the Databricks Runtime 4.0 beta. We want to thank the Apache Spark community for all their valuable contributions to the Spark 2.3 release. Continuing with the objective of making Spark faster, easier, and smarter, Spark 2.3 marks a major […] The post Introducing Apache Spark 2.3 appeared first on Databricks.

Databricks
SlideShare Presentation
  • Apache Spark 2.0 set the architectural foundations of Structure in Spark, Unified high-level APIs, Structured Streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then, the Spark community has continued to build new features and fix numerous issues in the Spark 2.1 and 2.2 releases. Continuing forward in that spirit, the upcoming release of Apache Spa...

Databricks
Blog Post
  • It’s hard to believe that we are already three weeks into 2018. If you’re still struggling to get valuable insights from your data, now is the perfect time to try something new! We recently announced Azure Databricks, a fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure. With Azure Databricks, you can help your […] The post Accelerate Innovation with Microsoft...

Databricks
Blog Post
  • In an earlier blog post, we analyzed the performance impact of Meltdown and Spectre on big data workloads in the cloud. In this blog post, we explain these exploits, their mitigation strategies and how they impact Databricks from a security and performance perspective. Meltdown: Meltdown breaks a fundamental assumption in operating system security: an […] The post Meltdown and Spectre: Exploits a...
