Quote: Core Data Engineering Terms

A recent viral thread broke down over 30 essential data engineering terms. The glossary covers everything from ACID and APIs to modern stack tools like dbt, Delta Lake, Spark, Kafka, and Airflow, serving as a comprehensive refresher on foundational concepts.

The company behind dbt, dbt Labs, raised $222 million in a Series D funding round, reaching a $4.2 billion valuation with backing from investors like Altimeter, Andreessen Horowitz, Sequoia, and notably, Databricks and Snowflake. This investment aims to expand dbt Cloud's capabilities and build out the dbt Semantic Layer. While dbt Core remains free and open-source, the paid dbt Cloud Team plan costs around $100 per developer per month.

Apache Spark's 4.0 release, announced in May 2025, introduces significant updates including a new lightweight Python client, native plotting capabilities, and a Python Data Source API. This version enhances machine learning workflows by allowing ML model training on Spark Connect, which decouples the client from the cluster, and provides mature support for distributed deep learning with PyTorch through the TorchDistributor API.

For actuaries and underwriters, machine learning is revolutionizing risk assessment by analyzing vast, complex datasets to identify nuanced patterns in lifestyle, genetics, and even unstructured data from policy documents. This shift moves the actuarial role from manual modeling towards becoming a bridge between data scientists and C-suite decision-makers, translating advanced models into actionable business strategy.

The transition from an individual contributor (IC) to an engineering manager requires a fundamental mindset shift from personal achievement to team success. New managers often struggle with delegating tasks, learning to communicate effectively across different departments, and establishing authority with former peers. The focus moves from hands-on coding to empowering and unblocking the team.

In consumer tech, AI-powered recommendation systems are a key driver of sales, with 80% of consumers more likely to buy from brands offering personalized experiences. These systems use techniques like collaborative and content-based filtering to analyze browsing history, past purchases, and demographic data to predict customer interests in real-time.

Google's recent AI advancements, showcased at its I/O conference, include integrating the Gemini model into products like Chrome and Android for on-device features and launching "AI Overviews" in Search. The company is also developing Project Astra, a real-time, multimodal AI assistant.

New York's tech scene is thriving, with an ecosystem value exceeding $694 billion and over 25,000 startups. Recent significant funding rounds in Q4 2024 included AI-powered data security platform Cyera raising $300 million and B2B payments company Melio securing $150 million. Companies like Flatfile and Bubble are actively hiring in the city.

For strength training, recent science points towards the importance of mechanical tension for muscle hypertrophy. This involves lifting challenging weights and progressively overloading muscles over time. While various rep ranges can be effective, focusing on proper form and lifting close to muscular failure are key principles for maximizing growth.
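
For readers curious about the collaborative filtering mentioned in the recommendation-systems item above, here is a minimal sketch of the user-based variant in plain Python. It is an illustration under stated assumptions, not how any particular retailer does it: the `RATINGS` data, the user and item names, and the `cosine` and `recommend` helpers are all hypothetical, and production systems typically work from far larger, implicit-feedback datasets.

```python
from math import sqrt

# Hypothetical toy data (not from the briefing): user -> {item: rating}.
RATINGS = {
    "ana":   {"laptop": 5, "mouse": 4, "monitor": 5},
    "ben":   {"laptop": 4, "mouse": 5},
    "chloe": {"mouse": 4, "monitor": 4, "keyboard": 5},
    "dev":   {"laptop": 5, "keyboard": 4},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by the similarity-weighted
    average rating given by other users."""
    seen = ratings[user]
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in rated items
        for item, rating in other_ratings.items():
            if item in seen:
                continue  # only suggest items new to this user
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(recommend("ben", RATINGS))
```

With this toy data, "ben" is most similar to "ana", so the monitor that "ana" rated highly ranks ahead of the keyboard. Content-based filtering would instead compare item attributes (category, price band, and so on) with what the user already bought.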

Shared from Scout - Be the smartest in the room.