This Week in NLP #319
Keep up with what happened in NLP in the week ending Friday 27th December 2024.
Above the Fold
Overwhelmed? Here’s our pick for five things to know about this week.
OpenAI announced two new AI models, ChatGPT o3 and o3-mini, featuring improved performance across coding, mathematics, and science benchmarks, with public release expected in January. [AI Tools Club]
Gary Marcus argues that OpenAI's presentation of o3’s ARC test results created misleading impressions about AGI progress by inadequately explaining pretraining methods and omitting key comparative data. [Substack]
Google released an experimental Gemini 2.0 AI model to Advanced subscribers, offering enhanced capabilities in coding, math, and reasoning, though with limited features and access. [The Decoder]
Microsoft's agreement with OpenAI defines AGI achievement as generating US$100B in profits. [TechCrunch]
xAI has raised US$6B from 97 investors, including major firms like Andreessen Horowitz and BlackRock, doubling its total funding to US$12B. [TechCrunch]
Now read on for everything else that happened in NLP this week.
Making News:
What makes this newsletter different?
This Week’s Topics:
The Generative AI Wars
Accenture achieved record-breaking generative AI bookings of US$1.2B in Q1 FY25, bringing total bookings to US$4.2B since September 2023. [Analytics India Magazine]
Alibaba is dividing its AI operations between consumer products and cloud services, following competitors ByteDance and Baidu in restructuring to commercialize AI technology. [South China Morning Post]
Google accelerated its AI releases to compete with OpenAI's ‘12 days’ campaign, overwhelming users with rapid-fire product launches and updates. [Ars Technica]
Google released Gemini 2.0 Flash Thinking Experimental, a reasoning-based AI model that self-checks responses but shows accuracy issues in early testing. [Ars Technica]
Google's Gemini Deep Research tool, now available in 100+ countries, creates comprehensive web-based reports using AI-powered research planning and analysis. [Engadget]
Google Gemini is using Anthropic's Claude to improve its AI. [TechCrunch]
Google's rumored Gemini Live integration into Chrome browser would provide AI-powered assistance through a background-running interface with microphone and location access. [TechRadar]
Microsoft is seeking alternative AI models for its underwhelming 365 Copilot, citing OpenAI's GPT-4 as too expensive and slow for enterprise customers. [Gizmodo]
Nvidia and AMD's investment in xAI creates potential tension with their existing AI customers Microsoft, Google, and Amazon. [Yahoo Finance]
Here’s the Ars Technica recap of the 12 days of OpenAI. [Ars Technica]
OpenAI concluded its ‘12 days of shipmas’ event by unveiling o3, its most advanced reasoning model, alongside various features and improvements across multiple products. [ZDNet]
OpenAI granted ChatGPT Plus subscribers unlimited access to Sora, its AI video generator, during the holiday season as a surprise bonus gift. [TechRadar]
OpenAI’s GPT-5 project, code-named Orion, is behind schedule and running up huge bills. [The Wall Street Journal]
OpenAI's move away from nonprofit control could cost billions of dollars. [The Decoder]
OpenAI CEO Sam Altman criticized former co-founder Elon Musk, calling him a bully who seeks conflicts with other billionaires. [Fortune]
Salesforce has launched Agentforce 2.0, expanding its AI platform’s capabilities with workflow automation, Slack integration, and enhanced reasoning abilities across enterprise systems. [The Decoder]
Tencent is partnering with smartphone maker Honor for cloud computing and AI integration, while Apple still seeks a Chinese partner for its AI features. [Yahoo Finance]
xAI is expanding access to its Grok chatbot through a new iOS app and upcoming website, offering AI features including text generation, Q&A, and unrestricted image creation. [TechCrunch]
xAI's supercomputer has received a 150MW power boost from the Tennessee Valley Authority, despite local concerns about grid stability and community impact. [TechRadar]
Australia’s government plans to develop a national AI strategy by late 2025, aiming to boost capabilities, attract investment, and unlock US$600B in GDP growth. [TechRepublic]
Feature Creeps
Anthropic's updated Claude Analysis tool now handles larger Excel files, offers mobile support, and enables easier data visualization while maintaining a streamlined user experience. [AI Tools Club]
Google expanded Gemini’s in-depth research mode to 40 languages. [TechCrunch]
Google's Files app now allows Gemini Advanced subscribers to ask questions about PDFs they’re viewing, alongside similar context-aware features for web pages and videos. [The Verge]
Meta plans to incorporate small displays into its Ray-Ban smart glasses by late 2025 to show notifications and AI assistant responses. [Engadget]
Microsoft Teams Rooms enhances hybrid workspaces with digital signage, AI features, and third-party platform integration for improved collaboration and communication. [TechRadar]
OpenAI has expanded ChatGPT desktop app integrations, adding support for numerous IDEs, terminals, and text apps while moving towards computer automation capabilities. [VentureBeat]
And OpenAI’s ChatGPT Mac app update introduces voice commands and direct integration with coding and note-taking applications. [TechRadar]
Hardware
SemiAnalysis conducted a five-month evaluation finding AMD's MI300X GPU underperforms Nvidia's H100/H200 in real-world AI training despite better specs, due to significant software stack limitations and poor out-of-box experience. [SemiAnalysis]
Apple aims to develop its own AI chip by 2026, seeking independence from Nvidia following decades of tense relations marked by disputes and mutual distrust. [AppleInsider]
Broadcom has revealed that three hyperscalers plan to deploy one million AI accelerators each by 2027, with two more customers in development. [TechRadar]
Intel has added the Jaguar Shores AI accelerator to its Gaudi AI Accelerator roadmap as it seeks to compete more fiercely against AMD and Nvidia. [TechRadar]
Nvidia has unveiled the GB200 NVL4, an advanced platform designed to meet the needs of modern data centers and computational workloads. [TechRadar]
Nvidia enlisted Supermicro and Dell to investigate AI chip smuggling into China through spot checks on Southeast Asian customers’ servers. [Yahoo Finance]
Pocket, a US$79 AI device from Open Vision Engineering, offers affordable conversation recording, transcription, and organization through a compact, smartphone-attachable design. [TechRadar]
2025 will be smart glasses all the way down. [Wired]
It’s Only a Model
Alibaba’s QVQ, an open-weight model built on Qwen2-VL-72B, achieves superior visual reasoning capabilities with a 70.3 MMMU score. [Qwen]
DeepSeek has released DeepSeek-V3, a 671B-parameter AI model that outperforms leading open-source competitors and rivals closed-source models from major companies. [VentureBeat]
Hugging Face researchers demonstrated how small language models can outperform larger ones, with their 3B Llama model surpassing the 70B version in math problems. [VentureBeat]
Hume Research's OCTAVE combines speech and language capabilities to generate diverse AI personalities and voices from brief prompts or recordings. [Hume]
Meta plans multiple Llama 4 releases in 2025, focusing on reasoning and voice capabilities. [The Decoder]
This piece looks at how small LMs can beat their bigger, resource-intensive cousins. [VentureBeat]
Whose Data?
Creative industry representatives strongly oppose the UK government’s plan to allow AI companies to use copyrighted works without permission, demanding proper licensing and payment instead. [The Guardian]
Here’s Computer Weekly’s top 10 data and ethics stories of 2024. [Computer Weekly]
The LLM Ecosystem
Apple's ReDrafter technology, integrated with Nvidia GPUs, accelerates LLM token generation by 2.7 times, reducing hardware costs and improving AI response times. [AppleInsider]
Equinix is partnering with Dell Technologies to offer a private AI infrastructure solution using Nvidia technology across 260+ data centers for secure, scalable enterprise deployment. [AiThority]
Google's Android XR operating system, launching with Samsung in 2025, will power AR/VR devices with AI integration, spatial computing features, and dedicated development tools. [InfoQ]
Microsoft launched a free AI Skills Initiative certificate program with LinkedIn, offering multilingual generative AI training modules and professional certification through 2025. [TechRepublic]
Mindgard is developing specialized testing solutions that protect against vulnerabilities in AI systems during runtime. [TechCrunch]
Pinokio's 3.0 update enhances the open-source AI model browser with a customizable interface, improved package management, and automated browser interactions. [The Decoder]
Tray.ai, provider of an AI-ready composable integration platform, announced Tray Merlin Agent Builder to speed the creation and deployment of high-value, production-ready AI agents. [MarTech Series]
Writer's new RAG tool enables developers to build production-ready AI applications with automated data retrieval through simple API calls to their Knowledge Graph. [Writer]
Other LLM Sightings
Bridgeline Digital, a provider of AI-driven marketing technology, has announced a new Smart Response feature for HawkSearch that analyzes PDF content and delivers specific answers to user queries. [MarTech Series]
Certivo, a new spinout from Pioneer Square Labs, is using AI to rethink compliance management. [GeekWire]
Instagram plans to launch AI video editing tools, allowing creators to modify video elements using text prompts powered by Meta's Movie Gen technology. [TechCrunch]
Interact Marketing has expanded its AI-powered marketing solutions for 2025, focusing on campaign optimization, video production, and content generation services. [PRWeb]
Promeo's new AI-powered marketing tool creates complete campaigns and promotional materials using local processing for enhanced privacy and speed. [TechRadar]
Zoom is transforming from a video conferencing company into an AI-first work platform, expanding its AI capabilities across multiple services to enhance productivity and user experience. [Computer Weekly]
Risks
Anthropic research reveals that AI safety guardrails can be easily bypassed through automated methods like randomized capitalization, spelling errors, and audio/visual modifications. [404 Media]
ChatGPT’s confident, detailed responses are increasingly cited as authoritative sources despite being unreliable prediction tools, sparking a concerning trend in public discourse. [The Verge]
Edelman’s AI Center leader Gary Grossman warned that Trump’s 2024 election victory empowers AI accelerationists, potentially doubling existential risks from rapid development. [VentureBeat]
Meta's Llama AI model has been repurposed by Chinese military researchers to create ChatBIT, an unauthorized military intelligence tool, despite Meta’s explicit non-military licensing terms. [TechRadar]
A 15-year-old autistic teen became emotionally attached to an AI chatbot girlfriend, raising concerns about vulnerable users’ ability to distinguish between AI and reality. [The Atlantic]
Responses
Google Chrome is testing an on-device AI tool that analyzes web pages to detect potential scams while maintaining user privacy. [TechRadar]
IBM's red team successfully used AI to expedite vulnerability detection in a tech manufacturer’s HR portal. [The Register]
The Saudi Data & AI Authority has launched an online self-assessment tool helping organizations evaluate their AI systems’ ethical compliance across seven fundamental principles. [EIN Presswire]
Regulation
The European Data Protection Board published guidance on AI data protection, outlining requirements for model anonymity, legitimate data processing without consent, and consequences of GDPR violations. [TechRepublic]
Google proposes relaxing its default search engine agreements with Apple and others while rejecting broader government demands to sell Chrome following the recent antitrust ruling. [Yahoo Finance]
Nvidia's acquisition of Run:ai, an Israeli GPU orchestration software provider, received European Commission approval after investigation found no competition concerns. [TechRepublic]
OpenAI faces a €15m fine from Italian regulators for ChatGPT privacy violations, including inadequate data processing and child protection measures. [Reuters]
A bipartisan US Congressional task force released a 253-page report with 89 recommendations for US AI policy, balancing innovation with safeguards against potential harms. [The Register]
Voice News
ElevenLabs has introduced Flash, a fast speech generation model supporting multiple languages, designed for building interactive voice agents with low latency. [ElevenLabs]
Home Assistant launched Voice PE, a US$59 privacy-focused voice control device that operates locally and supports over 50 languages for smart home control. [The Verge]
Returned.com launched AI voice technology that handles customer service calls for online shopping returns, eliminating hold times and automating the entire process. [PRWeb]
Voiseed's Revoiceit platform has introduced new AI dubbing features, including emotion detection, video mixing, and translation integration, to enhance global content localization. [Slator]
Translation
Netix Ltd has launched Popnie, an AI language assistant offering translations, summaries, and communication tools across 100+ languages with context-aware capabilities. [EIN Presswire]
Wilby.AI has launched Think1.AI, offering instant video translations in 165 languages while preserving speakers’ voices through advanced voice-cloning technology. [EIN Presswire]
This piece asks whether large language models are ready for legal translation. [Slator]
Search
Apple rejects creating its own search engine due to high costs and time investment, while defending its lucrative Google search deal amid DOJ antitrust scrutiny. [The Verge]
Google plans to add an AI Mode to Search, offering conversational answers from a Gemini-like chatbot alongside traditional results, joining other companies launching AI search features. [PYMNTS]
Here’s a detailed analysis comparing OpenAI’s ChatGPT search with Google's performance. [Search Engine Land]
Perplexity acquired Carbon's retrieval framework to help enterprises connect their internal data sources to AI search capabilities. [VentureBeat]
Health Tech
Butterfly Wellness has launched Shammah, an AI-powered chatbot offering 24/7 psychological support for people with skin conditions through psychodermatology-based care. [EIN Presswire]
Casect launched a HIPAA-compliant, AI-powered platform that helps medical professionals efficiently document and track surgical cases. [PR Newswire]
Ed Tech
Turnitin's AI detection tool has processed 130 million papers and flagged 3.5 million as AI-written, but faces criticism over reliability and bias against certain students. [The Guardian]
Unbound Academy will replace human teachers with AI for two daily hours of personalized instruction at an Arizona charter school, testing claims of accelerated learning. [TechRadar]
Funding
Anysphere, creator of coding assistant Cursor, raised a US$100m Series B at a US$2.6B valuation. [TechCrunch]
Backflip raised US$30m to develop AI technology that converts text and photos into 3D-printable models, aiming to revolutionize physical product design and manufacturing. [AiThority]
Blue J, a Toronto-based AI tax research platform serving major accounting firms, secured Series C funding to expand globally and enhance capabilities. [Feed the AI]
Coralogix acquired AI observability platform Aporia to integrate AI and software application insights. [Verdict]
AI startup Decart raised US$32m in Series A funding for its GPU optimization software and AI gaming platform, reaching a US$500m valuation two months after launch. [TechCrunch]
EzDubs raised US$4.2m in seed funding for its real-time call translation app supporting 20 source and 34 target languages, with voice cloning and emotion preservation. [Slator]
While AI startups command high valuations, other tech companies struggle to raise funds, with Carta data showing significant valuation disparities and only 9% of Series A companies securing Series B funding. [TechCrunch]
There’s More
Ehud Reiter has a new book on Natural Language Generation that focuses on enduring principles rather than just current technology trends. [Springer]
Former Twitch CEO Emmett Shear has launched Stem AI, a startup developing AI that aligns with human behavior. [TechCrunch]
Sam Altman revealed he had indirect OpenAI ownership through Sequoia and Y Combinator funds, contradicting previous claims of having no equity in the company. [TechCrunch]
Macron’s 2025 AI Summit in France will help shift global AI discourse from nationalism toward cooperation, reflecting similar changes at the UN and US-China level. [Wired]
And Wired argues that generative AI still needs to prove its usefulness. [Wired]
The Back Cover
Here’s Herbie Hancock and Norah Jones with a cover of Joni Mitchell’s Court and Spark.
If you know of a great cover version of a back-catalogue classic, drop an email to us at backcover@language-technology.com and we’ll consider it for inclusion here.
Got this from someone else?
Did you find this newsletter useful? If so, you might forward it to a friend; or you could email us at news@language-technology.com to tell us what you want more of.
Did you hate this newsletter? If so, you could forward it to an enemy; or you could email us at news@language-technology.com to tell us why and help us make it better!