Databricks is a company founded by the creators of Apache Spark that aims to help clients with cloud-based big data processing using Spark. Databricks grew out of the AMPLab project at the University of California, Berkeley, which was involved in creating Apache Spark, a distributed computing framework built atop Scala. Its co-founders are: Ali Ghodsi, CEO and University of California, Berkeley adjunct professor; Andy Konwinski; Ion Stoica, Executive Chairman, University of California, Berk...

Wikipedia
Databricks
Databricks
Blog Post
  • Adding new software to an enterprise is a difficult process.  In the past, choosing new software only required budget approval before it could be adopted.  Today’s enterprises have adopted processes that require security approval before a new product is approved for purchase.  On the surface, it seems like more bureaucracy, but software has changed. Software […] The post 3 Things CISO’s expect fro...

Databricks
Databricks
Blog Post
  • This is a community blog and effort from the engineering team at John Snow Labs, explaining their contribution to an open-source Apache Spark Natural Language Processing (NLP) library. The blog expounds on three top-level technical requirements and considerations for this library. Apache Spark is a general-purpose cluster computing framework, with native support for distributed SQL, […] The post I...

Databricks
Databricks
Blog Post
  • This is a guest post from Matt Hogan, Sr. Director of Engineering, Analytics and Reporting at McGraw-Hill Education. McGraw-Hill Education is a 129-year-old company that was reborn with the mission of accelerating learning through intuitive and engaging experiences – grounded in research. When we began our journey, our data was locked in silos managed within […] The post Using Databricks to Democr...

Databricks
Databricks
Blog Post
  • This is the seventh post in a multi-part series about how you can perform complex streaming analytics using Apache Spark and Structured Streaming. Introduction Most data streams, though continuous in flow, have discrete events within streams, each marked by a timestamp when an event transpired. As a consequence, this idea of “event-time” is central to […] The post Arbitrary Stateful Processing in ...

Databricks
Databricks
Blog Post
  • This is a guest post from Movile. Eiti Kimura and Flavio Clésio share their highlights and what they’re looking forward to the most at Spark Summit EU in Dublin, Ireland. About the Authors: Eiti Kimura has over 15 years of experience working with software development. He holds a Master’s Degree in Electrical Engineering. He has […] The post Crossing The Ocean for Spark Summit EU appeared first on ...

Databricks
Databricks
Blog Post
  • Benchmarking is a crucial and common process for evaluating the performance of systems. What makes a benchmark credible is its reproducibility. Many existing benchmarks are hard to reproduce for a couple of reasons: The code that was used to generate certain results is not publicly available. The hardware used to generate certain results is not easily […] The post Benchmarking Structured Streaming on...

Databricks
Databricks
Blog Post
  • SAVE 15% ON REGISTRATION* WITH CODE “DATABRICKS”. This year’s Spark Summit Europe is poised to be the biggest yet and we here at Databricks could not be more excited to be hosting this event. The level of participation and continued growth of the summit is a clear indicator of how critical Apache Spark has become […] The post The Biggest EU Summit Ever: Over One Hundred Presentations, Two Conferen...

Databricks
Databricks
Blog Post
  • At Databricks we strive to make our Unified Analytics Platform the best place to run big data analytics. For big data, Apache Spark has become the de-facto computing engine, while for advanced analytics, R is one of the most widely used languages and environments. R’s package ecosystem boasts more than 10k packages ranging from implementation […] The post Accelerating R Workflows on Databricks app...

Databricks
Databricks
Blog Post
  • Introduction Big data practitioners often post recurring questions on Quora: What is data engineering? How to become a data scientist? What’s a data analyst? Apart from understanding these roles and respective responsibilities, more important questions to pose are: How can three different personas, three different experiences, and three different requirements collaborate and combine their efforts?...

Databricks
Databricks
Blog Post
  • On September 7th, we held our monthly Bay Area Apache Spark Meetup (BASM) at HPE/Aruba Networks in Santa Clara. We had two Apache Spark related talks: one from Aruba Networks’ Data Engineering team and the other from Databricks’ Machine Learning team. For those who missed the meetup, below is the video and link to each presentation […] The post Bay Area Apache Spark Meetup at HPE/Aruba Networks Summar...

Databricks
Databricks
YouTube Video
  • Graph analytics has a wide range of applications, from information propagation and network flow optimization to fraud and anomaly detection. The rise of social networks and the Internet of Things has given us complex web-scale graphs with billions of vertices and edges. However, in order to extract the hidden gems within those graphs, you need tools to analyze the graphs easily and efficiently. A...

Databricks
Databricks
YouTube Video
  • The correlation of multiple streaming data sources is a difficult problem, especially when data can arrive out-of-order or delayed, and when the correlation logic can be complicated. In this talk, we describe a generic correlation framework built on PySpark and HDFS that handles these issues. Each data record passes through two stages of processing. In the first, stateless processing is performed...

Databricks
Databricks
Blog Post
  • Since Apache Spark 1.6, as part of Project Tungsten, we started an ongoing effort to substantially improve the memory and CPU efficiency of Apache Spark’s backend execution and push performance closer to the limits of modern hardware. This effort culminated in Apache Spark 2.0 with Catalyst optimizer and whole-stage code generation. Because Spark […] The post Learn about Apache Spark’s Mem...

Databricks
Databricks
Blog Post
  • First I’ll start with the sad truth. The technology industry at large has taken many hits over the years for discriminatory practices and underrepresentation of both women and minorities. Ageism, too, is a beast that lurks in the Valley. So as an employee, I’m happy to announce that Databricks has formed a Diversity Committee to address […] The post Databricks invites Colleen Lewis to Speak about ...

Databricks
Databricks
Blog Post
  • We are very excited today as we announce a partnership between Databricks and Looker. We have seen customers using these products together to provide an easy and intuitive way for business users to visualize and discover the powerful analytics results of Spark. Using Looker and Databricks, you can experience the following benefits: Easy to Use […] The post Looker and Databricks Partner to Bring Da...

Databricks
Databricks
SlideShare Presentation
  • Graph analytics has a wide range of applications, from information propagation and network flow optimization to fraud and anomaly detection. The rise of social networks and the Internet of Things has given us complex web-scale graphs with billions of vertices and edges. However, in order to extract the hidden gems within those graphs, you need tools to analyze the graphs easily and efficiently. ...

Databricks
Databricks
Blog Post
  • Since Apache Spark 1.3, Spark and its APIs have evolved to make them easier, faster, and smarter. The goal has been to unify concepts in Spark 2.0 and beyond so that big data developers are productive and have fewer concepts to grapple with. Built atop the Spark SQL engine, with Catalyst optimizer and whole-stage code […] The post Learn about Apache Spark APIs and Best Practices appeared first on ...

Databricks
Databricks
Blog Post
  • At the Spark Summit in San Francisco in June, we announced an open-source project Deep Learning Pipelines. Deep Learning Pipelines provides high-level APIs for scalable deep learning in Python with Apache Spark, and the library leverages Spark for its two strongest facets: In the spirit of Spark and Spark MLlib, it provides easy-to-use APIs that […] The post Build, Scale, and Deploy Deep Learning ...

Databricks
Databricks
Blog Post
  • This summer, I worked at Databricks as a software engineering intern on the Growth team. By introducing two new features, user groups and API tokens, I simplified the user management experience and improved security for API authentication. In this blog, I briefly discuss their use and merits and share my personal experience as an intern […] The post A Summer of Personal and Professional Growth at ...

Databricks
Databricks
Blog Post
  • At the Spark Summit in San Francisco in June, we announced that Apache Spark’s Structured Streaming is marked as production-ready and shared benchmarks to demonstrate its performance compared to other streaming engines. Structured Streaming is a novel way to process streams. Not only does this new way make it easy to build end-to-end streaming applications, […] The post Do your Streaming ETL at Sc...

Databricks
Databricks
Blog Post
  • This is a joint engineering effort between Databricks’ Apache Spark engineering team (Sameer Agarwal and Wenchen Fan) and Huawei’s engineering team (Ron Hu and Zhenhua Wang). Apache Spark 2.2 recently shipped with a state-of-the-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, […] The po...

Databricks
Databricks
Blog Post
  • Developing custom Machine Learning (ML) algorithms in PySpark—the Python API for Apache Spark—can be challenging and laborious. In this blog post, we describe our work to improve PySpark APIs to simplify the development of custom algorithms. Our key improvement reduces hundreds of lines of boilerplate code for persistence (saving and loading models) to a single […] The post Developing Custom Machi...

Databricks
Databricks
Blog Post
  • On August 22, we held our monthly Bay Area Apache Spark Meetup (BASM) at Pinterest in San Francisco. In all, we had three Apache Spark related talks: two from Pinterest’s Data Engineering team, and one from Databricks’ Machine Learning team. For those who missed the meetup, below is the video and links to the presentation […] The post Bay Area Apache Spark Meetup at Pinterest Summary appeared firs...

Databricks
Databricks
YouTube Video
  • Tech-Talk 1: Large-scale batch processing at Pinterest with Apache Spark Abstract: Pinterest is a data product and we rely heavily on processing a large amount of data for various use cases ranging from discovery products to business metric computation. Spark has been present at Pinterest since 2014, but it was only last year when it started to attract large scale use cases and the use cases a...

Databricks
Databricks
SlideShare Presentation
  • Deep Learning has shown tremendous success, yet it often requires a lot of effort to leverage its power. Existing Deep Learning frameworks require writing a lot of code to work with a model, let alone in a distributed manner. In this talk, we’ll survey the state of Deep Learning at scale and introduce Deep Learning Pipelines, a new open-source package for Apache Spark. This packa...

Databricks
Databricks
Blog Post
  • Older anthologies collated contributions from various authors around a theme—bound then as a journal or periodical. Newer anthologies, however, include multiple modes of expression—digitized now as an ebook or a blog. Both offer an exposition of the subject matter. No matter their form, they provide a single source of focused content. In […] The post Anthology of Technical Asse...
