Databricks is a company founded by the creators of Apache Spark that aims to help clients with cloud-based big data processing using Spark. Databricks grew out of the AMPLab project at the University of California, Berkeley, which was involved in creating Apache Spark, a distributed computing framework built atop Scala. Its co-founders are: Ali Ghodsi, CEO and adjunct professor at the University of California, Berkeley; Andy Konwinski; Ion Stoica, Executive Chairman, University of California, Berk...

Wikipedia
Databricks
Blog Post
  • Over the past few years, the demand for artificial intelligence (AI) and machine learning capabilities has surged with innovations in natural language processing, task automation, and predictions. From autonomous cars to a more personalized shopping experience, big data and artificial intelligence are at the forefront of new solutions that are delighting customers, improving business operations […]...

Databricks
Blog Post
  • In an earlier blog post, we analyzed the performance impact of Meltdown and Spectre on big data workloads in the cloud. In this blog post, we explain these exploits, their mitigation strategies and how they impact Databricks from a security and performance perspective. Meltdown Meltdown breaks a fundamental assumption in operating system security: an […] The post Meltdown and Spectre: Exploits a...

Databricks
Blog Post
  • Last week, the details of two industry-wide security vulnerabilities, known as Meltdown and Spectre, were released. These exploits enable cross-VM and cross-process attacks by allowing untrusted programs to scan other programs’ memory. On Databricks, the only place where users can execute arbitrary code is in the virtual machines that run Apache Spark clusters. There, cross-customer […] The post M...

Databricks
Blog Post
  • We are excited to announce the general availability of DBIO caching, a Databricks Runtime feature as part of the Unified Analytics Platform that can improve the scan speed of your Apache Spark workloads up to 10x, without any application code change. In this blog, we introduce the two primary focuses of this new feature: ease-of-use […] The post Databricks Runtime’s New DBIO Cache Boosts Apache Sp...

Databricks
Blog Post
  • At Databricks we welcome the dawn of the New Year 2018 by reflecting on what we achieved collectively as a company and community in 2017. In this blog, we elaborate on the three themes: unification, expansion, and collaboration. Year of Unification Unification has been a pivotal and founding tenet of Apache Spark from its genesis. […] The post Databricks and Apache Spark 2017 Year in Review appear...

Databricks
Blog Post
  • Today we released our Databricks Unified Analytics Platform video. This short video illustrates to analytics leaders how Databricks can unify their analytics efforts onto one platform. This unification makes analytics teams more efficient and enables them to tackle tougher analytics problems. Many organizations are moving to Spark for the awesome big data processing power it […] The post Unifying ...

Databricks
Blog Post
  • This is a guest post from Chris Robison, Head of Marketing Data Science at Overstock.com. At Overstock.com we’ve never had a problem with a lack of data. At 19 years old, we have one of the most expansive user datasets in all of e-commerce. As Lead Data Scientist in Marketing, I can look back through […] The post Overstock Marketing + Databricks = Data Science at Scale appeared first on Databricks...

Databricks
Blog Post
  • This is a community guest blog from Jakub Wozniak, a software engineer and project technical lead at CERN physics laboratory, further expounding and complementing his keynote at Spark Summit EU in Dublin. CERN is a physics laboratory founded in 1954 focused on research, technology, and education in the domain of Fundamental Physics and Standard Model […] The post The Architecture of the Next CERN ...

Databricks
SlideShare Presentation
  • Continuous integration and continuous delivery (CI/CD) enables an organization to rapidly iterate on software changes while maintaining stability, performance, and security. Many organizations have adopted various tools to follow the best practices around CI/CD to improve developer productivity, code quality, and software delivery. However, following the best practices of CI/CD is still challengi...

Databricks
Blog Post
  • We’re excited to announce that Spark Summit is expanding its coverage in 2018 to include in-depth content on artificial intelligence. We are also renaming the conference Spark + AI Summit. AI has always been one of the most exciting applications of big data and Apache Spark, so with this change, we are planning to bring […] The post Spark Summit is Becoming the Spark + AI Summit appeared first on ...

Databricks
YouTube Video
  • Enterprises face a daily barrage of cyber attacks. Responding quickly to threats is crucial to avoid a breach. To do this successfully, security teams need to monitor and analyze billions of data signals each day. Yet, existing security tools are struggling to keep up. Overcoming these challenges requires a new approach to threat detection rooted in data science. This webinar covers: • Why cyberse...

Databricks
Blog Post
  • High-profile cybersecurity breaches dominated headlines in 2017. In the first half of the year, over 1.9B records were stolen. That’s more than 7,000 records breached every minute. And the fallout from a single event can be staggering. Customer attrition, negative PR and regulatory fines amount to millions in financial losses. In fact, according to recent […] The post Improving Threat Detection in...

Databricks
Blog Post
  • Big data workloads require access to disk space for a variety of operations, generally when intermediate results will not fit in memory. When the required disk space is not available, the jobs fail. To avoid job failures, data engineers and scientists typically waste time trying to estimate the necessary amount of disk via trial and […] The post Transparent Autoscaling of Instance Storage appeared...

Databricks
Blog Post
  • Today we announced that Amazon has awarded Databricks with the Amazon Web Services (AWS) Machine Learning (ML) Competency status. This designation recognizes Databricks for enabling data scientists and machine learning practitioners with tools to take their data, train predictive models and make predictions on new data. Attaining the AWS ML Competency demonstrates to customers that Databricks […] ...

Databricks
SlideShare Presentation
  • Apache Spark performance is notoriously difficult to reason about. Spark’s parallelized architecture makes it difficult to identify bottlenecks when jobs are running, and as a result, users often struggle to determine how to optimize their jobs for the best performance. This talk will take a deep dive into techniques for identifying resource bottlenecks in Spark. I’ll begin with the past, and ...

Databricks
Blog Post
  • When Fei-Fei Li, the director of Stanford’s AI Lab and now a chief scientist at Google Cloud, was asked in an interview in the MIT Technology Review: The Artificial Intelligence Issue why she advocated more women be involved in technical fields, and AI in particular, she said: “When you are making a technology this pervasive […] The post Women in Big Data, Apache Spark and AI: Bay Area Spark Meetup...

Databricks
SlideShare Presentation
  • Deep learning has shown tremendous successes, yet it often requires a lot of effort to leverage its power. Existing deep learning frameworks require writing a lot of code to run a model, let alone in a distributed manner. Deep Learning Pipelines is an Apache Spark Package library that makes practical deep learning simple based on the Spark MLlib Pipelines API. Leveraging Spark, Deep Learning Pipe...

Databricks
Blog Post
  • Databricks and Microsoft have jointly developed a new cloud service called Microsoft Azure Databricks, which makes Apache Spark analytics fast, easy, and collaborative on the Azure cloud. Not only does this new service allow data scientists and data engineers to be more productive and work collaboratively with their respective teams, but it also gives them […] The post Cloud-based Relational Datab...

Databricks
YouTube Video
  • Deep learning has shown tremendous successes, yet it often requires a lot of effort to leverage its power. Existing deep learning frameworks require writing a lot of code to run a model, let alone in a distributed manner. Deep Learning Pipelines is an Apache Spark Package library that makes practical deep learning simple based on the Spark MLlib Pipelines API. Leveraging Spark, Deep Learning Pipel...

Databricks
YouTube Video
  • Apache Spark performance is notoriously difficult to reason about. Spark’s parallelized architecture makes it difficult to identify bottlenecks when jobs are running, and as a result, users often struggle to determine how to optimize their jobs for the best performance. This talk will take a deep dive into techniques for identifying resource bottlenecks in Spark. I’ll begin with the past, and disc...

Databricks
YouTube Video
  • The stereotypes of computer scientists just aren't flattering. Probably every computer scientist can think of dimensions of the stereotype that just don't fit. Why do these stereotypes of computer scientists matter? And how might we change them and the tech industry more broadly? Learn about how Harvey Mudd College went about changing the culture of CS to go from a major with about 10% women in ...

Databricks
SlideShare Presentation
  • The stereotypes of computer scientists just aren't flattering. Probably every computer scientist can think of dimensions of the stereotype that just don't fit. Why do these stereotypes of computer scientists matter? And how might we change them and the tech industry more broadly? Learn about how Harvey Mudd College went about changing the culture of CS to go from a major with about 10% women in...

Databricks
SlideShare Presentation
  • TensorFrames: Spark + TensorFlow: Since the creation of Apache Spark, I/O throughput has increased at a faster pace than processing speed. In a lot of big data applications, the bottleneck is increasingly the CPU. With the release of Apache Spark 2.0 and Project Tungsten, Spark runs a number of control operations close to the metal. At the same time, there has been a surge of interest in using GP...
