This hackathon has ended

This event is no longer accepting registrations or submissions.

COMM STEM X SUDATA HACKATHON

COMM STEM X SUDATA

Prize Pool

$1.3K

Location

Online

Status

Ended

Date Range

Apr 18, 2026 - Apr 19, 2026

Submission Period

Apr 18 - 19, 2026

About the Hackathon

A data-driven hackathon where teams aim to build innovative products to pitch to venture capital firms.

SUBMISSION DEADLINE PUSHED BACK TO MONDAY 20TH 8 AM

Important note:
Users may combine their chosen prompt with external data sources or APIs to enhance their solution, provided that all incorporated data is authentic and not synthetically generated. However, the prompt itself must remain the clear primary driver of the solution, meaning it is used significantly, directly, and in an obvious way rather than being overshadowed by external inputs.

Theme 1: Decoding Virality in 2026: What makes a YouTube Trend?

The Problem Statement/Challenge:
Every day, millions of videos compete for one of the most coveted spots on the internet - YouTube's Trending page. Some videos explode overnight. Others quietly accumulate millions of views before anyone notices. However, most disappear entirely.

Your challenge: Use real 2026 YouTube trending data to uncover what separates the videos that succeed from the ones that don't - and build something useful with that knowledge.

This isn't about finding the highest view count. It’s about trying to understand the mechanics behind what makes a video viral - from its tags and topics to its viewer engagement. Whoever cracks that code holds real power in the modern attention economy.

What You’re Doing:
You are a product team at a startup that's just been handed a goldmine of YouTube trending data. Your investors aren't asking for a report. They're asking for a working product - something tangible a user could open, interact with, and walk away from with a meaningful advantage.

Your job is to identify a real problem, use the data to understand it deeply, and then build a product that solves it. That product could be a web app, a browser tool, a dashboard, a recommendation engine, an API, or anything else your team can ship in the time available. The data informs the product - but the product is the deliverable.

Build for a real user. Solve a real problem.
Show it working.

Your Dataset:
You are working with a snapshot dataset of YouTube's trending videos from 2026, collected across multiple trending cycles for 11 different countries. This dataset provides country-wise YouTube trending video snapshots with engagement metrics and metadata.

YouTube Trending Videos Dataset (2026):
https://www.kaggle.com/datasets/bsthere/youtube-trending-videos-stats-2026

Your Tasks:
There is no single correct answer here. You choose the problem, the angle, and the solution. Below are suggested directions - treat them as starting points, not constraints.

1. Define Your User & Their Problem
Who are you building for - a solo creator, a brand marketing team, a media agency, a platform analyst?
What specific question are you answering with this data?
Why does it matter - what's the real-world consequence of getting it right or wrong?

2. Design & Build Your Product
What does your product actually do? Define its core feature set clearly.
What frustration, inefficiency, or blind spot does your target user have today?
Your product should be interactive or demonstrable. A live demo, a functional prototype, or a working interface will always beat a static slide deck.

3. Validate & Pitch Your Solution
Who would use this, and how would they discover it?
How do you know it works? Show evidence - test cases, sample outputs, user scenarios.
What do future versions look like? A credible roadmap shows you've thought beyond the hackathon.

Examples & Suggestions (optional section, a bit extra)
These are just some illustrative ideas to get your creative juices flowing. You are encouraged to go beyond them:

Trend Velocity Analysis: Using publish_time and trending_date, identify how quickly different categories trend and whether speed correlates with longevity on the trending page.
Engagement Fingerprinting: Build category-level "engagement profiles" using like, dislike, and comment ratios to identify which content types provoke action vs. passive consumption.
Tag & Title Intelligence: Apply NLP to tags, title, and description to identify linguistic patterns (word choice, length, punctuation, capitalisation) that appear more frequently in trending videos.
Creator Tier Analysis: Segment channels by size using channel_id and examine whether smaller creators trend differently (faster, shorter, different categories) than established ones.
Creator's Cheat Sheet: Synthesise your findings into a practical, data-backed guide: the best category, optimal publish window, recommended engagement targets, and title strategies most associated with trending.
Controversy vs. Positivity: Explore whether high dislike ratios help or hurt trending performance, and what that means for content strategy.

Theme 2: The Geography of Everything

Pick any decision a business makes and you'll find geography underneath it. Where to open the next store. Who the competitors really are. How long the delivery driver actually spends in traffic. Which suburbs are saturated with cafés and which ones haven't seen a new gym in five years. Until recently, this kind of information lived inside Google, Apple, and a handful of paid data vendors who charged tens of thousands a year for the privilege. Then Foursquare quietly open-sourced their entire database of points of interest, all 100 million of them, and the playing field shifted.

Your job for this hackathon is to take that dataset and turn it into something a real person or business would actually want to use. The brief is deliberately wide. There is no fixed user, no fixed product, no right answer. A franchise expansion manager has one question, a tourist arriving in Sydney has another, a council planner thinking about service deserts has a third, and a logistics team mapping delivery zones has a fourth. Your job is to pick a question worth answering, build the thing that answers it, and show it working.

The product is the deliverable, not the analysis.
A web app, a dashboard, a recommendation engine, a planning tool, an API, a search experience: anything that someone could open and walk away from with an answer. How sharply you scope the user and the question matters more than how clever your model is. A focused product solving a specific problem will beat a sprawling piece of analysis every time. Australia is the obvious starting point, but the dataset covers the entire planet, and if your product makes more sense pointed at Tokyo, Jakarta, or Cape Town, defend that choice and run with it.

Your Data Playground: Foursquare Open Source Places (Advanced)
https://huggingface.co/datasets/foursquare/fsq-os-places

100 million real global points of interest, refreshed monthly. Restaurants, gyms, clinics, retailers, schools, parks, hotels, anything with a name on a map. Each record carries latitude and longitude, a hierarchical category, an address, websites and social handles where available, and a confidence score for whether the place still exists. Filtered to Australia, you'll get somewhere between 2 and 4 million rows. The same data quietly powers products at Uber, Coca-Cola, and hundreds of location-based companies.

The dataset ships as Parquet rather than CSV, which is fine because Python handles Parquet natively through libraries like polars, pandas, and pyarrow. Hugging Face gates the dataset behind a one-click access request that auto-approves within a few minutes, so create a free account and request access before you start coding. Read the schema documentation here before you write a single line:
https://docs.foursquare.com/data-products/docs/places-os-data-schema

Theme 3: Improving Transportation in NSW

Prompt:
Urban transportation systems are designed to move millions of people efficiently, yet everyday journeys are often unpredictable, fragmented, or inconvenient. Delays, missed connections, and inconsistent information can disrupt even the simplest trips.
Travellers frequently rely on multiple sources to plan and adjust their journeys, but these tools do not always reflect the real-world experience of navigating the network. Thus, commuters must constantly make decisions on the go (when to leave, which route to take, and how to adapt when plans change), often with incomplete or unreliable information.

Using open transport and mobility data, design a functional product, tool, or prototype (e.g. website, app, or interactive dashboard) that helps people better navigate, understand, or improve their journeys throughout the state. Your solution should demonstrate clear user interaction and practical usability.

To start, you can try thinking about…

Identify the problem:
What transport-related problem are you investigating?
Why does this problem matter economically, socially, or operationally?

Identify the user/use-case:
Who experiences this problem (commuters, planners, operators)?
Realistically, how would they use your solution?

Identify your solution:
What features and functionality does your tool provide to solve this problem?
Would data help power these features?

Datasets:
Although Open Data Transport NSW has a wide array of datasets that can help your product, not all of them are clean and easy to work with. The following are some datasets we suggest you start with, as they’re more up-to-date and are suitable for coding, but feel free to browse through the database at your own leisure to discover more exciting connections and opportunities.

The * links have APIs, which can be used to build dashboards, if you’re feeling ambitious. Not all are updated in real time, though, so read carefully!

Routes (mostly shapefiles, so use in combination with other datasets!): All*, General Bus, Major Event Bus, Train, Bicycle Cycleways
Traffic Related: Live Traffic Hazards*, Roads Traffic Volume Counts*, NSW Crash Data
Timetables & Public Transport Trackers: Realtime Vehicle Positions v2*, Realtime Public Transport Alerts v2*, Realtime Trip Updates v2*, Trip Planner APIs*, Timetables
Infrastructure: Bike Parking Locations (Sheds & Lockers), Bus Shelter Locations, Rest Areas, EV Charging Locations, Street Light Locations
Transport Station Locations: All, Trains
Surveys/Reports: Fare Compliance Survey, Household Travel Survey
Opal Recorded Trips (aggregated from tap on/offs): All, Bus, Light Rail, Train and Metro, Ferry

Examples / Suggestions to help you get started:
A route optimisation tool that suggests better routes based on congestion + delays
A commuter dashboard that predicts delays and suggests alternatives
A planner tool that identifies underserved areas and simulates new routes
A safety app highlighting crash hotspots and safer travel paths
A fare optimisation tool helping users minimise travel costs
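
As one possible starting point for the crash-hotspot suggestion above, the sketch below bins coordinates into a coarse latitude/longitude grid and counts points per cell. It is only an illustration under stated assumptions: the brief does not describe the fields in NSW Crash Data, so the input is assumed to be plain (latitude, longitude) pairs you have already extracted, the function names and the 0.01 degree cell size (roughly 1 km) are hypothetical choices, and the sample coordinates are made-up placeholders rather than real crash records.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Return the integer (row, column) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def top_hotspots(points, cell_deg=0.01, n=3):
    """Count points per grid cell and return the n busiest cells with their counts."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
    return counts.most_common(n)


# Placeholder coordinates for illustration only (NOT real crash data).
sample_points = [
    (-33.8688, 151.2093),
    (-33.8690, 151.2095),
    (-33.8685, 151.2091),
    (-33.9173, 151.2313),
    (-33.9175, 151.2310),
    (-33.7004, 150.9006),
]

hotspots = top_hotspots(sample_points)
for cell, count in hotspots:
    print(cell, count)
```

A fixed grid is the simplest option and is easy to draw on a map. A real product would probably want proper spatial clustering and some normalisation by traffic volume, since busy roads will naturally record more incidents.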