Alberto Gimeno Judges YC × DeepMind Multimodal Hackathon

Recently, Invofox CEO Alberto Gimeno participated as a judge at the YC x Google DeepMind Multimodal Frontier Hackathon, an in-person event held in San Francisco that brought together founders, engineers, and researchers working at the cutting edge of multimodal AI.

The event, hosted by Y Combinator in collaboration with Google DeepMind, focused on exploring new types of applications enabled by the latest generation of multimodal AI systems. Participants were encouraged to build products that combine audio, video, and image generation capabilities, moving beyond the traditional “chatbot” paradigm that has dominated much of the current AI landscape.

More details about the event can be found on the official page.

Participants during the YC x Google DeepMind Multimodal Frontier Hackathon in San Francisco

Evaluating the next generation of multimodal AI projects

Alberto joined a judging group that included Y Combinator founders and Google DeepMind engineers to review the projects built during the hackathon.

In the initial evaluation round, Alberto reviewed and scored several submitted projects across a number of criteria, including:

Technical feasibility
Innovation and novelty
Real-world applicability
Market potential and fundability

Alberto scored each project independently and gave feedback on its technical execution, originality, and potential real-world impact.

This early review helped determine which teams would move forward to the final stage of the competition. The final judging panel consisted exclusively of Google engineers, who selected the overall winners from the shortlisted projects.

During the preliminary judging round, Alberto evaluated several emerging ideas exploring new applications of multimodal AI, including:

AI systems that generate cinematic video narratives from user-provided images
Tools for dynamically generating alternative story outcomes for media content
AI-powered contract interpretation and legal analysis
Robotics-focused applications
Accessibility tools designed to support navigation for visually impaired users

These projects demonstrated how quickly multimodal AI capabilities are evolving and how developers are experimenting with entirely new categories of software built on top of these models.

Selected finalist: Solus Forge

Following the preliminary evaluation, Solus Forge was selected as a finalist and advanced to the final round of judging.

The project stood out for its strong combination of technical feasibility, innovation, and real-world applicability, and was among the top-scoring projects in Alberto’s evaluation.

Exploring the next generation of multimodal AI

The hackathon encouraged teams to experiment with advanced multimodal AI technologies developed by Google DeepMind, including:

Gemini 3.1, featuring expanded long-context reasoning and native agentic vision capabilities
Lyria, DeepMind’s model for high-fidelity music and audio generation
NanoBanana 2, designed for advanced image composition, character consistency, and detailed text rendering

By building with these tools, participants explored new forms of multimodal applications that combine visual, audio, and language-based reasoning.

For industry practitioners like Alberto, who works closely with AI systems that extract and structure information from complex documents at Invofox, events like this are a valuable opportunity to evaluate emerging ideas and contribute expertise to the broader AI ecosystem.

Supporting innovation in applied AI

Hackathons like the YC x Google DeepMind Multimodal Frontier Hackathon serve as important environments for experimentation and collaboration between founders, engineers, and researchers.

By bringing together experts from leading organizations and early-stage builders, these events help accelerate the development of new AI applications and highlight the expanding possibilities of multimodal systems.

For Invofox, staying closely connected to these developments helps inform how next-generation AI capabilities may shape the future of document automation and data extraction technologies.
