Spark Archives - Page 3 of 10

MapR to Autoscale Spark and Drill Via Prebuilt Kubernetes Containers

MapR Technologies today announced a technology preview of pre-built containers for Kubernetes that will give customers new capabilities for dynamically scaling their containerized Spark and Drill applications based on de Read more…

A Decade Later, Apache Spark Still Going Strong

Don't look now, but Apache Spark is about to turn 10 years old. The project began quietly at UC Berkeley in 2009 before emerging as an open source project in 2010. For the past five years, Spark has been on an Read more…

Data Engineering Continues to Move the Employment Needle

Interested in a career in big data? You could do well by investing your time and effort in acquiring data science skills. But you may do even better by turning yourself into a data engineer, which is a title that continu Read more…

Microsoft Invests in Databricks

Databricks, the high-flying analytics startup founded by the creators of Apache Spark, announced yet another venture funding haul this week as it hustles to meet what it says is growing demand for its analytics platform. Read more…

Presto Backers Bolster Its Open Source Origins

A new industry group will promote Presto, the popular open source distributed SQL query engine launched by Facebook engineers in 2012 as a follow-on to Apache Hive. The Presto Software Foundation launched on Thursday Read more…

Build on the AWS Cloud with Your Eyes Wide Open

Building data applications on public clouds like Amazon Web Services is a no-brainer for many organizations these days. The tools for ingesting, storing, and processing data in the cloud are rapidly maturing, and best of Read more…

Movie Recommendations with Spark Collaborative Filtering

Collaborative filtering (CF)[1] based on the alternating least squares (ALS) technique[2] is another algorithm used to generate recommendations. It produces automatic predictions (filtering) about the interests of a user Read more…

Nvidia Platform Pushes GPUs into Machine Learning, High Performance Data Analytics

GPU leader Nvidia, generally associated with deep learning, autonomous vehicles and other higher-end AI-related workloads (and gaming, of course), is mounting an open source end-to-end GPU acceleration platform and ecosy Read more…

Attunity Brings CDC to Google Cloud

Enterprises that are looking to push transactional data from on-premise systems into Google's cloud environment may want to check out the latest from Attunity, which today announced support for Google Cloud Platform with Read more…

Machine Teaching Will Drive Crowdsourced Cognition into the AI Pipeline

Building high-quality artificial intelligence (AI) is hard work. It’s a specialized discipline that historically has required highly skilled specialists, aka data scientists. Any time you require some highly skilled Read more…

Project Hydrogen Unites Apache Spark with DL Frameworks

The folks behind Apache Spark today unveiled Project Hydrogen, a new endeavor that aims to eliminate barriers preventing organizations from using Spark with deep learning frameworks like TensorFlow and MXnet. It's tou Read more…

How Disney Built a Pipeline for Streaming Analytics

The explosion of on-demand video content is having a huge impact on how we watch television. You can now binge watch an entire season's worth of Grey's Anatomy at one sitting, if that suits your fancy. For a media giant Read more…

Presto Use Surges, Qubole Finds

Don't look now, but Presto, the SQL engine developed by Facebook as a follow-on to Hive, is starting to catch on in a big way. According to a new survey of big data-as-a-service customers by Qubole, Presto logged impress Read more…

Making Hadoop Relatable Again

There has been much debate over the future of Hadoop in recent months. Should it work more like a cloud object store? Should it support GPUs and FPGAs, Docker or Kubernetes (or both)? Should compute and storage be separa Read more…

Weighing Open Source’s Worth for the Future of Big Data

The open source software movement began in earnest 20 years ago, when a group of technology leaders in Silicon Valley coined the term as an alternative to the repugnant "free software." Fast forward to 2018, and the conc Read more…

DataTorrent Glues Open Source Componentry with ‘Apoxi’

Building an enterprise-grade big data application with open source components is not easy. Anybody who has worked with Apache Hadoop ecosystem technology can tell you that. But the folks at DataTorrent say they've found Read more…

The Hybrid Database Capturing Perishable Insights at Yiguo

Yiguo.com is the largest B2C fresh produce online marketplace in China, serving close to 5 million users and more than 1,000 enterprise customers. We have long devoted ourselves to providing fresh food for ordinary consu Read more…

ParallelM Aims to Close the Gap in ML Operationalization

A startup named ParallelM today unveiled new software aimed at relieving data scientists of the burden of manually deploying, monitoring, and managing machine learning pipelines in production. Dubbed MLOps, Parall Read more…

Snowflake Taps Qubole for Deep Machine Learning in the Cloud

Organizations storing big data in Snowflake's cloud data warehouse can now run machine learning and deep learning algorithms against that data thanks to a new partnership with Qubole. The two companies today announced Read more…

Dr. Elephant Leads the Performance Parade

I started working on big data infrastructure in 2009 when I joined Cloudera, which at the time was a small startup with about 10 engineers. It was a fun place to work. My colleagues and I got paid to work on open source Read more…


