
AI in EdTech Weekly


by avergin · 92 sources

Weekly intelligence briefing on how artificial intelligence and technology are transforming education and learning - covering AI tutors, adaptive learning, online platforms, policy developments, and the researchers shaping how people learn.

Guided AI Use Gains Evidence as Schools Set Guardrails
Mar 30
7 min read · 1670 docs · AI in Education Podcast, Ethan Mollick, Sal Khan +13
Research and implementation are drawing the same line: supervised, purpose-built AI can help learning, while open-ended answer tools often undermine it. This brief covers new tutoring evidence, NYC's traffic-light policy, narrow deployments in reading and counseling, and the global turn toward stronger edtech evaluation.

Structured AI is pulling ahead of generic chat

The clearest signal this week is not that AI tutoring works in the abstract. It is that constrained, teacher-mediated AI looks very different from open-ended answer machines. That distinction matters even more because model releases are now moving faster than traditional efficacy studies: panelists pointed to a flood of new GenAI studies, capability jumps every 5-7 months, and the practical problem that a model can change before a long RCT is even finished.

In one of the strongest classroom findings this week, ED reported a randomized trial comparing human tutoring with a supervised AI tutor on its platform. Across more than 3,200 conversations, students tutored by the human-in-the-loop AI did better on the next math question than students tutored only by humans. The AI exchanges were longer and more Socratic, with more questions that surfaced student thinking and misconceptions.

A very different result showed up in the Wharton math study shared by Ethan Mollick: students given ChatGPT during practice solved more problems, but the basic ChatGPT group later scored 17% worse on a no-AI exam than the no-tech group. Researchers found many students were simply asking for the answer and later believed that had not hurt their learning.

At the same time, Mollick pointed to a separate RCT showing that well-prompted AI tutors can boost learning, reinforcing the idea that prompt design and use constraints are not minor details; they are the difference between scaffolding and shortcutting.

“Learning needs to feel like a struggle. If you're struggling, you're learning. If it feels easy, you're not learning.”

That principle also surfaced in practitioner commentary: panelists argued that narrow, teacher-mediated AI can help with scaffolding, reading, writing, math, and teacher time-saving, while wide, unscaffolded student-facing AI use can undermine cognitive and social-emotional development and make cheating easier.

Policy is moving from abstract debate to usable rules

New York City's Education Department moved the conversation from general concern to an actual framework. Its preliminary guidance uses a traffic-light system: green-light uses include brainstorming lesson plans and drafting non-critical communications; yellow-light uses include finding trends in student data, translation, and adapting materials for students with disabilities with trained human review; red-light uses ban AI from grading, special education and 504 planning, discipline, counseling and crisis intervention, and academic placement decisions.
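
To make that tiering concrete, here is a minimal sketch of how a district vetting team might encode a traffic-light policy as data so tools can apply it consistently. The categories paraphrase NYC's published buckets; the type names, helper function, and wording are illustrative assumptions, not an official schema.

```typescript
// Minimal sketch: a traffic-light AI-use policy encoded as data.
// Categories paraphrase NYC's preliminary guidance; all names are illustrative.
type RiskTier = "green" | "yellow" | "red";

const policy: Record<string, RiskTier> = {
  "brainstorm lesson plans": "green",
  "draft non-critical communications": "green",
  "find trends in student data": "yellow",
  "translate materials": "yellow",
  "adapt materials for students with disabilities": "yellow",
  "grade student work": "red",
  "special education / 504 planning": "red",
  "discipline decisions": "red",
  "counseling or crisis intervention": "red",
  "academic placement": "red",
};

function checkUse(useCase: string): string {
  const tier: RiskTier | undefined = policy[useCase];
  if (tier === undefined) return "unlisted: route through the vetting process";
  if (tier === "red") return "banned";
  if (tier === "yellow") return "allowed only with trained human review";
  return "allowed";
}

console.log(checkUse("translate materials")); // allowed only with trained human review
```

A real policy engine would also record who reviewed each use and when, since the yellow tier hinges on trained human review.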

The city also drew a hard line on privacy. Personal student information cannot be entered into AI tools, approved products must go through a formal vetting process, and final guidance is due in June after public feedback. One unresolved issue: free tools do not go through the same contract review process.

School-level practice in New York is already converging around the same supervised-use logic. East Side Community School prohibits the unsupervised use of generative AI for schoolwork and assessments, while Brooklyn Collaborative asks teachers to label each assignment with green, yellow, and red AI permissions. Many English and social studies teachers have also moved back toward in-class handwritten writing to reduce AI-assisted cheating, despite the time costs.

The broader U.S. policy picture remains fragmented. Tina Austin, who advises on California education AI policy, described a landscape of framework fever, uneven district access to enterprise tools, and widespread confusion about using consumer AI with student data under FERPA and COPPA. Her practical advice is to start with local problems and school-approved tools rather than chase generic frameworks.

The most credible deployments are narrow, grounded, and workflow-specific

A useful counterweight to the hype came from EdSurge's conversations with 17 teachers: most are not reorganizing their classrooms around generative AI. They are using it first for productivity — lesson planning, newsletters, and administrative drafting — while testing instructional use cases more cautiously.

Where classroom use does look promising, it is usually tied to a specific learning job. Google Read Along is a good example. The tool's AI tutor, Diya, supports phonemic awareness, phonics, fluency, vocabulary, and comprehension through leveled and decodable texts, read-aloud/silent/listen modes, real-time feedback, and comprehension checks.

Inside Google Classroom, teachers can see accuracy, phonics gaps, fluency, comprehension patterns, and progress over time. Gemini can also help teachers create custom stories, re-level text, generate quizzes, and add their own content in multiple languages.

Just as important are the limits. Read Along is framed as a supplement, not a replacement for teachers, and its strongest value comes from targeted practice and feedback rather than open-ended conversation. Google says the product has already supported hundreds of millions of stories read by tens of millions of learners, and highlighted pilots in India and the Philippines that found significant reading improvement, along with differentiated deployments in Pakistan, Malaysia, and Australia.

Outside direct instruction, North Kitsap School District in rural Washington is using AI to strengthen multi-tiered systems of support. Staff use AI across well-being, academic, attendance, and behavior data to spot patterns, synthesize long plans, identify outlier interventions, and generate action steps. The district paired that with tiered professional development, including 27 lighthouse teachers who support classroom adoption.

That kind of data work still needs human interpretation. Another Tech & Learning analysis warned that AI can surface patterns in underused school data, but schools should stay data-informed rather than data-driven because the same pattern can mean very different things depending on context, and leaders still need direct observation and professional judgment to validate what AI finds.

Student support is also becoming a serious AI use case. High schools in New York are piloting CounselorGPT and EVA to answer procedural college-going questions, surface labor-market information, link students to resources, and give counselors better visibility into what students are asking. The goal is explicitly to free humans for fit, trust, and encouragement — not to automate the relationship itself.

Global edtech is getting more evidence-conscious

A second important shift this week is around how systems decide what to fund and scale. At UNESCO's Global Education Coalition meeting, participants argued for pedagogy-driven edtech transformation grounded in the science of learning and backed by evidence-informed investment, especially as AI-driven labor-market changes increase pressure on education systems to respond.

That logic is starting to show up in financing and evaluation. ICEI launched an EdTech Financing Advisory Facility to help governments assess cost-effectiveness, learning outcomes, equity, ethics, and environmental considerations when making edtech decisions.

UNICEF's Blue Unicorn portfolio is an even clearer signal. Its first cohort of edtech tools will be deployed across Egypt, Ghana, Malaysia, Rwanda, Uzbekistan, and Zimbabwe, with an explicit focus on foundational literacy, numeracy, teacher effectiveness, and inclusion. ICEI is running a quasi-experimental evaluation with about 600 lower-primary learners per intervention, using EGRA/EGMA-style measures, teacher surveys, and implementation data such as dosage, fidelity, and engagement.
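
For readers new to the design, a quasi-experimental evaluation of this kind typically compares score gains between an intervention group and a matched comparison group rather than randomizing assignment. A minimal sketch of that core computation, with invented numbers and field names (the actual ICEI analysis will be far more involved):

```typescript
// Minimal sketch of a baseline/endline gain comparison, the core of many
// quasi-experimental edtech evaluations. All numbers below are invented.
interface Learner { baseline: number; endline: number }

const meanGain = (group: Learner[]): number =>
  group.reduce((sum, l) => sum + (l.endline - l.baseline), 0) / group.length;

const intervention: Learner[] = [
  { baseline: 41, endline: 58 },
  { baseline: 35, endline: 49 },
  { baseline: 50, endline: 66 },
];
const comparison: Learner[] = [
  { baseline: 40, endline: 48 },
  { baseline: 37, endline: 44 },
  { baseline: 52, endline: 59 },
];

// Difference in mean gains; a real evaluation would also adjust for
// baseline differences and report dosage/fidelity alongside the effect.
const effect = meanGain(intervention) - meanGain(comparison);
console.log(`estimated effect: ${effect.toFixed(1)} points`);
```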

For education leaders and edtech investors, the question is moving from 'Does this tool have AI?' to 'Can this tool show learning gains, equitable access, and realistic implementation conditions?'

What This Means

  • For schools: Treat AI as a design choice, not a category. The best results this week came from systems that kept AI narrow, supervised, and tied to a clear instructional or support role.
  • For policy and procurement: Low-risk drafting, medium-risk support, and high-risk student decisions need different rules. NYC's traffic-light model is one concrete template, but privacy and vendor review still need as much attention as pedagogy.
  • For teachers and learning designers: The current sweet spot is selective use — revision support, guided reading practice, data synthesis, and procedural advising — while keeping humans responsible for interpretation, motivation, and relationship-building.
  • For higher ed and workforce learning: Institutions should expect private AI use and redesign around explanation, coaching, and real judgment. Mollick argues students already use AI quietly, universities are still figuring out how to teach in that world, and the traditional apprenticeship model is starting to fray when interns route first-draft work through AI.
  • For researchers and investors: Evidence cycles will have to speed up. If model capabilities change every few months, long trials alone will not be enough; faster research sprints and implementation-aware evaluations are becoming more important.

Watch This Space

  • Faster evaluation models. Investors and researchers are actively experimenting with research sprints and quasi-experimental designs because traditional RCT timelines no longer match model update cycles.
  • AI-native mastery platforms. Khan Academy says its reimagined product is rolling out with clearer learning paths, a more central Khanmigo, proactive teacher assistance, and early pilot signals of higher skill growth when teachers assign yearlong units. Summit Public Schools argues AI only improves systems when schools are already clear about outcomes, adult roles, and the whole model.
  • Hands-on AI literacy. BBC Bitesize built an AI guide around young people stress-testing AI in real time, against a backdrop where 47% of surveyed students already use AI for homework or revision and 24% say they do not know where to find trusted information about it. Teachers in EdSurge's study are also using AI literacy lessons to teach prompting, fact-checking, and bias rather than treat AI as authoritative.
  • Higher-ed and workforce pipelines. Mollick's warning is bigger than cheating: if novices stop doing the early work that builds judgment, companies and universities may need new ways to develop deep knowledge, wide knowledge, taste, and agency.

Structured AI Tutoring Shows Gains as Schools Build Capacity and Workflow Tools
Mar 23
6 min read · 1968 docs · Austin Way, Andrej Karpathy, Sarah Guo +12
Structured AI tutors and coaches delivered some of the week’s strongest evidence, while districts and universities focused on teacher capacity and workflow integration rather than novelty alone. The bigger gap now is evaluation, consistent policy, and helping learners use AI without losing human judgment.

Structured AI tutoring is starting to show measurable gains

The clearest learning signal this week came from structured AI support, not generic chat. A five-month randomized controlled experiment across 770 students in 10 Taipei high schools found that a GPT-4o-powered tutor that personalized problem sequencing improved final exam performance by 0.15 SD — roughly six to nine months of additional schooling by some estimates. Effects were larger for beginners, the gains appeared to come from stronger engagement and more productive AI use, and the result came without increasing instruction time or teacher workload.
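
The months-of-schooling translation is simple arithmetic once you pick a benchmark for typical annual growth. A sketch, assuming (purely for illustration; the benchmark is not from the study) that high schoolers gain roughly 0.2 to 0.3 SD per school year:

```typescript
// Converting an effect size into "months of schooling": a sketch.
// Assumption (not from the study): typical annual growth of 0.2-0.3 SD,
// a common benchmark range in effect-size conversions.
const effectSD = 0.15;
const annualGrowthLow = 0.2;   // SD gained per school year, conservative
const annualGrowthHigh = 0.3;  // SD gained per school year, optimistic

const monthsLow = (effectSD / annualGrowthHigh) * 12;  // 6 months
const monthsHigh = (effectSD / annualGrowthLow) * 12;  // 9 months
console.log(`${monthsLow.toFixed(0)} to ${monthsHigh.toFixed(0)} months of additional schooling`);
```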

AI support is also showing up beyond academic content. A preregistered study of 968 people found almost no relationship between feeling empathic and communicating empathy, but a single practice session with an AI coach made people measurably better at expressing empathy.

On the product-building side, a 17-year-old Alpha High student said he used Qwen 3 8B models with simulated human memory to teach social science content to 100,000 simulated students. Their average AP practice score reportedly rose from 3 to 4.43 in two weeks. That is not evidence from human learners, but it does point to a new kind of curriculum-testing loop for edtech teams.

Educator capacity is becoming infrastructure

The second major shift is that institutions are putting real weight behind educator enablement. The NSF awarded CSTA $11M to expand AI professional development for U.S. K-12 teachers.

At the district level, Mead School District’s four-part AI PD series starts with AI literacy before tool choice, then moves through cheating and assessment redesign, student use, and teacher workflow. Post-training data showed a 50% increase in teacher confidence using AI with students and a 48% boost in preparedness to teach AI ethics.

Higher ed is testing lighter-weight models. At the University of Michigan-Dearborn, “No-Prep” GenAI sessions combine a quick intro, about 20 minutes of tinkering, and discussion using a Four T framework — touch base, tinker time, talk, transition — designed to work for both skeptical and enthusiastic faculty.

"faculty are hungry to talk to each other about GenAI"

Those conversations are not just about adoption. Faculty also raised concerns about workload from fabricated citations, trust gaps between students, faculty, and administrators, and the risk of offloading too much thinking to AI.

In K-12 practice, educators interviewed by ISTE described moving from fear of AI-written student papers to using Gemini as a thought partner for lesson design, NotebookLM to turn dense readings into podcasts, and explanation-based assessment to check understanding — while still emphasizing ethics, bias checks, and the need to preserve the human element.

AI is moving into the daily workflow — for teachers, students, and support staff

The most useful product news this week was not another all-purpose chatbot. It was AI tied to specific jobs inside the learning workflow.

  • Kira 2.0 calls itself an “AI operating system for education.” Its Student Atlas maps skills and gaps and can generate interventions, lessons, and IEP drafts; Course Studio can build standards-aligned courses; and its assessment builder feeds grading and feedback back into the same system. The upside is consolidation. The caveat, from early district leaders, is that deployment requires strong instructional leadership and broad AI literacy to keep use consistent across teachers.
  • A university pilot of an AI support platform inside the LMS and student portal reduced repetitive queries, freed staff for harder cases, and gave students 24/7 help with routine issues. But it was intentionally restricted to approved institutional content, every answer had a traceable source, and the team stressed that it complements rather than replaces human advisors — and only works with a well-managed knowledge base.
  • ChatDOC lets users chat with PDFs, summarize dense texts, search specific sections, generate quiz questions, and click through to cited passages. Its limitation is the same one many educators worry about: summaries can become repetitive or too abstract, and students may rely on the summary instead of reading the source.
  • NotebookLM rolled out Cinematic Video Overviews to all Pro users in English, extending note synthesis into shareable video summaries. Useful scope expansion; still limited by plan and language availability.

Adoption is outrunning policy — and many learners are still unconvinced

The 2026 EDUCAUSE Students and Technology Report, based on about 8,600 students across 41 institution types, found that only 14% expect to use generative AI to a great extent in their future careers. Students also reported feeling less prepared in AI and related technological competencies than in other professional skills, often because of restricted use, limited exposure, and inconsistent course policies.

The practical message is not simply “add more AI.” Students want AI in the disciplines, clearer guidance on good and bad uses, technology simplicity, and strong instructor presence. They also want fewer tools and more intentional integration across courses.

National policy is not closing that gap yet. A White House AI legislative framework highlighted child safety and federal preemption of state AI laws, but an independent science AI evaluator noted that it still does not tell schools whether a classroom AI tool is scientifically accurate, whether it fails silently, or how to evaluate tools already in use.

What This Means

  • For school systems: this week’s most concrete implementation signals came from PD models and support structures — district series, no-prep faculty sessions, and NSF-backed teacher training — not just feature launches.
  • For instructional designers and teachers: AI looks strongest when it sequences practice, coaches communication, surfaces sources, or forces explanation. It looks weaker when it simply replaces reading, writing, or judgment.
  • For buyers and product teams: grounding, source traceability, leadership requirements, and human fallback are now core product questions, not edge cases.
  • For higher ed and workforce leaders: do not assume students already see AI as career-critical. They may need discipline-specific examples and more consistent policy before access turns into real skill-building.

Watch This Space

  • AI-native technical education. New curricula like Beyond Vibe Code are being built specifically for learners who already use AI coding apps, with 35 modules/projects and 250+ interactive lessons designed to work alongside those tools while going deeper under the hood. Andrej Karpathy is pushing the same idea further: education may need to teach humans to instruct agents to write software, not just write every line themselves.
  • Sustainability and discernment. One education podcast guest said data centers now consume more than 50% of Dublin’s electricity, and argued schools should teach students to ask whether an AI use genuinely improves learning or simply reduces productive struggle.
  • Human feedback as a differentiator. Students in multiple conversations said they still value teacher feedback over AI-generated responses, and educators warned against losing the human relationship at the center of learning.
  • Evaluation frameworks. Policy is moving faster than school-level evaluation, especially for subject-specific classroom tools.

Ohio’s AI Policy Mandate and the New Proof-of-Learning Problem
Mar 16
8 min read · 1580 docs · Ethan Mollick, Luis von Ahn, Justin Reich +19
This week’s clearest shift was governance: Ohio became the first state to require school districts to adopt formal AI policies, while other systems wrestled with guidance, transparency, and student voice. At the same time, AI tools became more workflow-specific — and the hardest questions moved to reliability, assessment, coaching, and proof of learning.

The lead — formal AI governance arrives

Ohio became the first U.S. state to require traditional public school districts, community schools, and STEM schools to adopt an official AI policy by July 1, backed by a state model policy covering AI literacy, ethical use, and data privacy. Columbus City Schools CIO Christopher Lockhart’s implementation advice is notably practical: secure superintendent-level backing, build a cross-functional working group that includes teachers, administrators, experts, and students, keep the policy general rather than naming tools, and plan for ongoing professional development as the technology changes.

"If we’re not teaching them the proper ethical safe way to use it, they’re going to just be out there on their own."

The same governance pressure is showing up elsewhere. New York City is proposing its first public high school focused on AI and computer science, but families and Panel for Educational Policy members are pushing back over unclear AI involvement, limited community engagement, and the lack of citywide AI guidance; the Education Department says guidance is expected in the coming weeks, followed by a 45-day feedback window.

The broader lesson is that schools still do not have settled “best practices” to copy. Justin Reich argues schools should adopt an experimental mindset and test policies and instructional practices with humility rather than pretend the right model is already known. Lance Eaton makes a parallel point in higher ed: many classrooms are adapting, but institutions are hesitating, and students should be part of defining responsible use instead of being left to navigate inconsistent rules across courses.

Theme 2 — The tool stack is getting more specific about the learning job it serves

This week’s most useful product news was less about generic chat and more about purpose-built learning workflows.

  • Microsoft Teach is positioning itself as a single hub inside Microsoft 365 for lesson plans, quizzes, standards alignment, content modification, and study aids such as flashcards and fill-in-the-blanks. It supports lesson planning from prompts or files, standards from 35+ countries, editable Word outputs, and Forms-based quizzes that can be used in Teams or an LMS. Boundary: access requires an educator login and Copilot Chat; some study-aid features need grounding content rather than a loose prompt, and student self-creation is limited to users 13+ who have Copilot Chat access.
  • Lincoln AI is being marketed as a curriculum-driven K-12 coach that guides inquiry rather than giving direct answers. It offers worksheet upload, voice or text interaction, teacher dashboards, safety alerts, and automatic adjustment to student Lexile/mastery levels. Boundary: it is intentionally designed not to write essays or simply provide answers; Lincoln Learning also reports a 99.7% “no hallucinations” rate because the model is trained on its own curriculum.

  • NotebookLM continues to expand its study workflow with ePub uploads, upgraded quizzes and flashcards, and custom infographic styles. Boundary: a science-education audit found that broken EPA/NOAA URLs and image-only PDFs could appear as loaded sources with no warning, meaning a notebook may look grounded when it is not; the same audit said NGSS alignment still needs subject-matter verification and some 5th-grade material pulled from middle-school content.

  • OpenAI’s new interactive visual explanations bring a different kind of learning support into ChatGPT: learners can manipulate variables and watch formulas and graphs change in real time across 70+ core STEM topics. Current scope: the rollout begins with those 70+ STEM topics rather than a broader subject range.

Theme 3 — Reliability and proof of learning are the real bottlenecks

AI-powered cheating remains a live classroom problem. Chalkbeat notes it is rampant and that most teens say peers cheat using AI at least “somewhat often”. Teacher accounts this week describe students defaulting to AI for essays, homework, and even basic sentence-level work, pushing some teachers toward paper-based writing, in-class assessments, process grading, and student conferences to establish what work is actually theirs.

But a simple retreat to pen and paper is not a full strategy. Another Tech & Learning piece argues that banning AI repeats the old laptop debate: AI changes when thinking happens, so the more durable response is learning design that asks students to brainstorm, test ideas, revise drafts, critique outputs, and ask better questions. Higher ed is running into the same issue from a different angle. One analysis argues that generative AI has exposed how much colleges rely on completion, grades, and polished outputs as proxies for learning; the proposed fix is explicit competencies, calibrated rubrics, and durable artifacts such as portfolios, capstones, clinical evaluations, and research presentations.

Some schools are answering the reliability problem by teaching verification directly. At Kensington Health Science Academy in Philadelphia, students built Project FACTS — “find out where a post is from, analyze it, challenge it, think for yourself before you share” — into homeroom/advisory lessons, assemblies, and a student club tackling AI slop, medical misinformation, and political rhetoric.

Educators are also using imperfect media generators as literacy tools. One teacher experimenting with Google’s Veo found its history and science clips inaccurate enough to become useful for classroom critique, including spotting historical mistakes and discussing deepfakes and misinformation. Boundary: Veo currently sits behind Gemini Pro at $19.99/month with three video prompts per day, requires much more specific prompting than text chat, and that teacher said they would share teacher-generated videos rather than give students direct access.

Theme 4 — AI is moving into coaching, accessibility, and system operations

The most concrete system-level deployment came from Broward County Public Schools, which said it rolled out 20,000 Microsoft 365 Copilot licenses to staff and teaching-and-learning teams. Teachers report using Copilot to complete assignments more quickly and reinvest time in differentiated support and challenge. Students are also building with it: one student created an AI agent to help seniors understand graduation requirements and enrollment steps, with reminders for students, parents, and counselors. Beyond teaching and learning, district leaders estimate a conservative $40 million to $50 million in facilities savings over five years from AI-assisted analysis of inefficient operations.

That same pattern — AI handling structured support so humans can focus on higher-value interaction — appears in adult learning too. New York City Public Schools’ partnership with BetterUp offers optional human and AI coaching to central-office staff; some younger leaders prefer AI role-play because it feels like a safe, nonjudgmental space, and leaders report stronger work products and stronger connections between central offices and schools. Andrew Ng argues this broader division of labor is likely to matter: when AI or digital media take on more content delivery, teachers can spend more time on social-emotional support and more child-centered experiences.

In higher ed, Notre Dame’s evaluation of Meta smart glasses shows what accessibility-first AI can look like in practice. A PhD student with a visual impairment used them to identify ingredients, medicine, and mail, translate Korean instructions, summarize Latin texts, and explore ways to route captured text to a Braille device. Boundary: translation output is still clunky, film-production experiments ran into phone tethering and short recording limits, and privacy concerns remain around recording people and exposing sensitive documents.

What This Means

  • For K-12 leaders: policy is becoming infrastructure, not paperwork. Ohio’s mandate and NYC’s debate suggest districts will need living AI governance with student voice, general principles, and frequent administrative updates rather than school-board policies tied to today’s tool names.

  • For buyers and edtech teams: specificity is winning over generic chat. Lesson planning, worksheet coaching, standards alignment, study aids, and visual explanations are more actionable than all-purpose assistants — but only if products make grounding, grade-level control, and guardrails visible to the user.

  • For assessment design: the question is shifting from “Did the student submit something polished?” to “What can the student actually demonstrate?” That points toward process evidence, live explanation, portfolios, and performance artifacts rather than overreliance on AI detection alone.

  • For tutoring and coaching: the strongest upside still appears to be structured support with humans in the loop. Ethan Mollick points to large impacts from AI tutoring in World Bank work in Nigeria and Turkey and says the opportunity is big enough to justify policy attention, especially in settings where teachers remain part of the system.

  • For self-directed and lifelong learners: the promising pattern is deliberate practice, not content dumping. Duolingo says short daily sessions beat cramming, close-reading notebooks keep questions attached to context, and newer coaching and visualization tools appear most useful when they extend practice rather than replace it.

"The future belongs to schools that use AI to amplify teachers, not sideline them."

Watch This Space

  • Will other states follow Ohio? The combination of formal mandates, pending district guidance, and moratorium pressure suggests AI governance is moving from optional to expected.

  • Can AI tutor and coach systems prove impact at scale? Products like Lincoln AI are getting more structured, while Mollick is calling for public or nonprofit investment in universal tutoring systems rather than leaving the field entirely to commercial actors.

  • Will source-grounded study tools fix validation gaps as they expand formats? NotebookLM is adding ePub support and better flashcards, but the audit shows grounding UX is now mission-critical.

  • Will accessibility use cases push wearables into mainstream education workflows? Smart glasses already show promise in reference work, translation, and lab support, but privacy and accuracy norms are unresolved.

  • Will student-led AI literacy programs spread? Project FACTS offers one concrete model for teaching students to question sources, algorithms, and AI-generated media rather than only banning tools.

  • Will evidence-building become a bigger part of edtech scaling globally? Latin America’s Brilla competition is a useful signal: Umaximo and Swarmob used funding and mentoring to run studies, build certifications, and strengthen AI-enabled products before broader expansion.

AI in Learning Turns Toward Guided Practice as Schools Scale and Safety Rules Tighten
Mar 9
8 min read · 1769 docs · Machine Learning Street Talk, Sal Khan, Justin Reich +20
This week’s strongest signal is a design shift: the most credible uses of AI in learning are moving away from instant answers and toward guided practice, teacher control, and clearer guardrails. The brief also covers new school and university deployments, rising workforce pressure for AI competence, new tools such as NotebookLM video overviews, and the policies now shaping student-facing AI.

The lead — AI is being redesigned around guided practice, not just answers

Several sources this week converged on the same design principle: AI helps learning most when it keeps the learner thinking, gives teachers more control, and fits inside a human system of practice and accountability.

Chris Yue Fu’s eight-week study of 15 undergraduates found students often used AI summaries as the thing they read, not as support for reading, and only 4.3% of their prompts used effective strategies even after instruction. Fu’s takeaway was not to remove AI from reading, but to redesign it so teachers can set goals in advance and the system pushes students into higher-order questions instead of ending the task after one answer.

"When a student asks for a summary and gets one, the system has done its job, but the student hasn’t done theirs."

The same pattern showed up elsewhere. Ethan Mollick argues that learners gain when AI supports coding, but not when it replaces the intellectual work; he also pointed to findings that vibe coding can hurt developers’ ability to read, write, debug, and understand code, without producing a statistically significant speed gain. Justin Reich’s tutoring work frames the distinction similarly: good tutors do not just answer questions; they question answers, and schools still need small experiments rather than sweeping assumptions about best practice. One cited high-school math study made the tradeoff stark: ChatGPT access increased correct practice problems by 48% but lowered actual test performance by 17%.

Some schools are already building around that insight. In Italy’s GEMI project, teachers who began skeptical of Gemini ended up using it as a thinking partner, including generating deliberate errors in a literature text for students to detect, while keeping the educational relationship central and using NotebookLM to support students with specific learning needs.

Theme 2 — AI is becoming part of the school operating model

The most concrete AI-school expansion this week came from Alpha School, which is opening a new K-8 campus in The Woodlands in fall 2026. The operational detail is notable: leadership says demand is not the main growth constraint; real estate is. Alpha also added Nate Eliason to expand AI and entrepreneurship at the high-school level.

Elsewhere, the infrastructure is getting more local. A school in China reportedly repurposed M1 Ultra Macs, clustered them with Exo, ingested its full school corpus, and gave each student and teacher a personalized, free, private AI agent grounded in real school data. Alpha leadership highlighted that example as part of global benchmarking for what school AI may look like outside the U.S. mainstream.
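
The grounding pattern described there is retrieval: index the school's documents, pull the most relevant passages for each query, and keep everything on local hardware. A toy sketch of the retrieval step, using word overlap in place of a real local embedding model; all names, data, and the scoring method are illustrative, not details of the reported deployment.

```typescript
// Toy sketch of corpus grounding: score documents against a query and
// prepend the best matches to the prompt sent to a local model.
// Word-overlap scoring stands in for real embeddings; names are invented.
interface Doc { title: string; text: string }

const tokenize = (s: string): Set<string> =>
  new Set(s.toLowerCase().match(/[a-z]+/g) ?? []);

function overlapScore(query: string, doc: Doc): number {
  const q = tokenize(query);
  const d = tokenize(doc.text);
  let hits = 0;
  for (const w of q) if (d.has(w)) hits++;
  return hits / Math.max(q.size, 1);
}

function groundedPrompt(query: string, corpus: Doc[], k = 2): string {
  const top = [...corpus]
    .sort((a, b) => overlapScore(query, b) - overlapScore(query, a))
    .slice(0, k);
  const context = top.map(d => `[${d.title}]\n${d.text}`).join("\n\n");
  return `Answer using only the school documents below.\n\n${context}\n\nQuestion: ${query}`;
}

const corpus: Doc[] = [
  { title: "Lab safety handbook", text: "Goggles are required for all chemistry labs." },
  { title: "Exam calendar", text: "Midterm exams run the second week of January." },
];
console.log(groundedPrompt("When are midterm exams?", corpus));
```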

Khan Academy’s recent signals show how much implementation still matters. Khanmigo is framed as immediate-feedback tutoring in math and writing, but Sal Khan says the learning gains come from more practice at a student’s learning edge with teachers in the loop, not from AI alone. Reported thresholds of roughly 18 hours of use a year, or 60 skills brought to proficiency, are associated with meaningful gains; an India study found a 0.44 effect size at reasonable dosage, and one Newark district area reported twice the average state-test growth. The district playbook is decidedly human: training, leadership, support for teachers, and soft accountability rather than punitive mandates, with support priced at about $15 per student per year.

Theme 3 — AI competence is moving into curricula, credentials, and workforce training

Higher education is moving from generic AI talk to domain-specific expectations. Purdue now requires AI competency for all graduates, with each discipline defining what job-ready use looks like. The University of Sydney and UTS partnered with Harvey AI to prepare law students for a legal AI system already used in professional practice. The University of Manchester says it is rolling out Microsoft 365 Copilot and training to all 65,000 staff and students.

Self-directed and professional learning are following the same path. DeepLearning.AI released a free AI Skill Builder to help learners assess what to learn next, alongside a new JAX course on building and training a 20M-parameter MiniGPT-style model and a broader roadmap centered on agents, external data, evaluation loops, alignment with human intent, and interaction with tools. EMERGai opened applications for an NSF-funded institute that gives early- and mid-career STEM education researchers at U.S. resource-limited institutions stipends, training, and support to use GenAI ethically across literature review, data collection, analysis, interpretation, and writing.

Corporate training is scaling too. Gauntlet AI says it has worked with more than 80 training and hiring partners in its first year and expects to more than double that figure, while also finding that bringing product managers alongside engineers matters because the shift is cultural and tooling-related, not just technical. The labor-market pressure behind these moves is showing up in anecdotes from computing education as well: in one account of a Berkeley CS cohort, 31 of 340 majors had offers, while postings increasingly asked for AI/ML experience and the ability to review AI-generated code.

Theme 4 — Guardrails are becoming a first-order product requirement

Child safety and governance are moving from side discussions to product requirements. Under the UK Online Safety Act, Ofcom says services must use age assurance for harmful content, prevent algorithms from recommending that content to children, and carry out child risk assessments when they launch generative AI features. Regulators also say some services should simply not be available to children.

Experts in the same discussion flagged newer harms that are harder to regulate cleanly: emotional dependency on chatbots, harmful advice, deepfakes, explicit chatbot conversations, bias, and personal-data exposure. Their recommendation was that schools discuss AI ethics and safeguarding explicitly and bring parents into those conversations, especially since AI was the top issue in one survey of 800 schools about online-safety conversations with families.

That regulatory lens is colliding with product reality. Google’s Gemini API terms say developers must not use the service in apps directed toward or likely to be accessed by people under 18. In student mental health, Alongside is now used in more than 200 U.S. schools and costs about $10 per student per year; one Florida counselor credited it with surfacing a severe self-harm alert and helping handle routine problems so human staff could focus on crises. But clinicians and researchers cited by EdSurge warn that AI lacks human discernment, should not substitute for counseling, and can encourage parasocial attachment if it signals emotional reciprocity.

In higher ed, adoption is already ahead of policy. A survey discussed on the AI in Education Podcast found 73% of respondents using AI daily or weekly, more than half using tools not provided by their institution, and only 13% saying their university measures ROI. A complementary argument from edtech researchers is that AI tools should be judged on efficacy, effectiveness, equity, ethics, and environment — not adoption alone.

Theme 5 — The tool layer keeps expanding, but capability and limitation are arriving together

NotebookLM’s biggest education-facing release this week was Cinematic Video Overviews. Gemini chooses a format and visual style, critiques its own footage, and turns source material into bespoke videos from a user’s sources. The limitation is equally clear: the feature is fully rolled out only to Ultra users in English for now, with Pro users still waiting.

Groovelit shows the opposite end of the stack: a free grades 4-10 writing platform where students write in timed rounds and get live AI feedback on grammar, relevance, vocabulary, and engagement. Teachers can align prompts to curriculum, review aggregated data, and support English language learners with adjusted difficulty.

At the curriculum-engineering layer, Austen Allred says he is testing AI against 21 learning-science requirements, including spaced repetition, retrieval practice, semantic tree traversal, and mastery-based progression. His own limitation note is blunt: the software harness was easy; getting the AI to follow it reliably — and stop inventing fake reviews or user numbers — was not.
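
Of those requirements, spaced repetition is the easiest to make concrete. A minimal Leitner-style scheduler, shown as one illustration of the kind of rule such a harness has to enforce; the intervals and field names are assumptions, not Allred's actual spec.

```typescript
// Minimal Leitner-style spaced-repetition scheduler.
// Intervals and field names are illustrative, not a published spec.
interface Card { id: string; box: number; dueDay: number }

const INTERVALS = [1, 2, 4, 8, 16]; // days until next review, per box

function review(card: Card, correct: boolean, today: number): Card {
  // Correct answers promote the card to a longer interval;
  // misses send it back to box 0 for near-term retrieval practice.
  const box = correct ? Math.min(card.box + 1, INTERVALS.length - 1) : 0;
  return { ...card, box, dueDay: today + INTERVALS[box] };
}

function dueToday(deck: Card[], today: number): Card[] {
  return deck.filter(c => c.dueDay <= today);
}

// Example: a missed card comes back tomorrow; a recovered one waits longer.
let card: Card = { id: "fractions-07", box: 2, dueDay: 10 };
card = review(card, false, 10); // -> box 0, due day 11
card = review(card, true, 11);  // -> box 1, due day 13
```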

What This Means

  • For K-12 leaders: prioritize tools that preserve productive struggle. The consistent pattern across reading, tutoring, coding, and classroom examples is that AI is strongest when it asks better questions, provides feedback, or scaffolds practice — not when it substitutes for reading, writing, or reasoning.
  • For school systems and edtech operators: treat AI adoption as an operating-model problem, not just a product choice. Reported gains around Khanmigo depend on training, leadership, dosage, and teacher engagement, while Alpha’s next bottleneck is physical expansion capacity, not interest.
  • For higher ed and workforce programs: move from generic AI literacy to domain-specific workflows. Purdue, Harvey AI in law, Manchester’s Copilot rollout, and Gauntlet’s mixed PM/engineer cohorts all point to AI competence becoming contextual, team-based, and tied to real tools learners will encounter at work.
  • For edtech buyers and investors: under-18 access rules, child-risk design, and measurable impact are moving to the center of procurement. Gemini’s age terms, Ofcom’s expectations, Alongside’s limits in mental-health use, and the low rate of ROI measurement in higher ed all point to a more demanding buying environment.
  • For self-directed learners: build sooner, but keep ownership of the thinking. DeepLearning.AI warns against staying in tutorial mode, and multiple sources this week warned that outsourcing cognition to AI weakens learning even when it makes the task feel easier.

Watch This Space

  • Teacher-controlled tutors that keep the conversation going: especially tools that ask follow-up questions, set teacher goals, and steer students toward reasoning rather than one-shot answers.
  • Whether private, local school agents grounded in institutional data move beyond isolated examples: the China deployment is a concrete model to monitor.
  • Whether discipline-specific AI requirements move beyond the current university rollouts: this is already visible in graduation requirements, campus-wide copilots, and workplace-linked tools.
  • How youth-facing companions and mental-health-adjacent bots are governed: one reported figure says 72% of teens have used AI for companionship at least once and 52% do so daily, even as regulators and clinicians flag emotional dependency as an emerging harm.
  • How multimodal study aids are used in practice: from NotebookLM video generation to dual-voice podcasts for text comprehension, AI is expanding how source material gets remixed for learners.

Google’s 6M-teacher AI push meets a new reality: prove learning outcomes, not just adoption
Mar 2
7 min read · 2253 docs · liemandt, Dario Amodei, MacKenzie Price +15
This week’s biggest signal: Google is rolling out free AI literacy training and tools (Gemini, NotebookLM) to all 6 million U.S. educators—accelerating adoption while raising the stakes on outcomes and governance. We also track AI-native school models (Alpha, Flourish), assessment redesign to handle AI, and new evidence on impact (including an RCT suggesting AI can narrow skill gaps).

The lead — AI is going mainstream for educators, but outcomes and governance are becoming the gating factors

Google is rolling out free AI literacy training and tools (including Gemini and NotebookLM) to all 6 million K–12 and higher-ed teachers in the U.S. At the same time, multiple signals this week point to the same friction point: adoption is moving fast, but schools are increasingly demanding training, visibility, and evidence of learning impact—not just “time saved.”

Theme 1 — AI literacy is shifting from optional PD to core infrastructure

Teacher training and policy readiness are still uneven

A synthesis of 25 studies on K–12 generative AI use (2023–2025) found that 25–87% of teachers report using AI (depending on how the question is asked). But institutional support is lagging: ~50–52% report no formal AI training, and 60% of schools offer no guidance.

The same synthesis argues that teacher use is primarily back-end work (planning, creating assessments, editing, communications), with minimal student-facing use (Utah: 17.3% personalization; 9.5% chatbots). It also highlights a major evidence gap: none of the studies in that set measured whether teacher AI use improves student learning outcomes.

A district example: “companion, not compliance”

Franklin Township Schools (Indiana) described building an AI culture where AI is treated as a learning companion rather than “another initiative to fear”. Their approach started with a year of foundational professional development explaining how tools relate to underlying large language models, and then a pilot of School AI with special education and English language learners (including help drafting thesis statements and translating for a Punjabi-speaking third-grader).

They chose School AI partly because teachers can see student chat interactions, unlike Google’s Gemini integration (as characterized in the article).

“We don't view AI as an initiative, we view it as a companion.”

Higher ed is formalizing norms too

In higher ed, Lance Eaton maintains an AI syllabi policy repository with 200+ policies and argues teaching needs to move away from a “production line” model toward relationship-building—because generative AI will outcompete humans on speed, scale, and efficiency.

Theme 2 — “AI-native school design” is getting more concrete (and more expensive)

Alpha School: closed-loop personalization, with strict controls to reduce cheating

New details from an Alpha School interview describe a K–12 model delivering a full academic program in ~2 hours/day using AI tutors (not chatbots) and mastery-based lesson plans on its Timeback platform, with $100M+ invested.

Operationally, Alpha described:

  • A closed-loop data cycle: implement learning-science ideas, generate lessons, measure learning, and adapt based on standardized test results.
  • Screen monitoring via vision models, with reported spend of ~$10,000 per student in AI tokens to detect guessing/scrolling/skipping explanations and coach self-driven learning habits.
  • A gamified attention metric: 1 XP = 1 minute of focused learning.
  • Guardrails around chat: chat disabled during morning academics due to cheating risk; framed as useful/expected in afternoon workshops.

Alpha also reported high-stakes outcome claims for SAT performance: 1410 average across the high school (freshmen through seniors) and 1535 for seniors.

Flourish microschools: AI as Tier 1 foundations to create “teacher luxury time”

Flourish Schools (AI-native microschools for grades 6–8) reported using conversational AI tutors for Tier 1 foundational skills (reading, writing, math). The stated design goal is to free teacher time during a one-hour “foundations block” so teachers can work 1:1 with students who need extra support (including special ed and ELL students) while others progress with AI.

Theme 3 — Assessment and integrity: schools are redesigning the work, not just detecting the output

“You can’t do that anymore”: new assessment patterns

Jon Bergmann’s “Mastery Flip” framework argues traditional assignments (like research papers) no longer work because AI can generate them instantly. His proposed response combines:

  • AI Engines for independent learning (with teacher-controlled tutors that ask questions).
  • Analog Roots in class time to protect “productive struggle”.
  • Human Checks that validate the cognitive journey (e.g., students build a trebuchet, then explain the physics live on a whiteboard).

Classroom reality: trust, cheating, and false accusations

Across teacher and student accounts, a recurring pattern is that unclear expectations create friction:

  • A teacher described students rejecting an explanation until ChatGPT confirmed it: “Ok, he’s right.”
  • A Year 9 student reported being accused of AI use; even after a Chromebook history check showed no evidence, the teacher capped the grade because they “couldn’t prove” the student didn’t use AI. Suggested remedies included version history and in-person writing samples.
  • MIT TeachLab’s interviews (90+ teachers, 30+ students) describe student ambivalence: performance pressure + temptation to offload work, paired with confusion from contradictory messaging and a request for clearer boundaries.

A blunt warning from AI builders

Anthropic CEO Dario Amodei described students having AI write essays as “basically just cheating on homework” and said internal studies showed deskilling in coding can occur depending on how models are used.

Theme 4 — Tools moving into daily workflows (with clear limitations)

NotebookLM: “Slide Revisions” reaches everyone (including mobile)

NotebookLM announced that Slide Revisions have rolled out 100% to all users and are now fully rolled out on the mobile app as well.

Classroom creation tools: branching narratives and film-based micro-lessons

  • SceneCraft: teachers can create interactive branching narratives designed to explore cause/effect, character motivation, and ethical dilemmas; it includes AI-assisted scenario generation and is often framed as engaging for students who like game-like formats.
  • Reel Genius: a platform combining micro-lessons with film clips and AI-based reflection questions, designed for in-class high school use where teachers assign lessons (including “entrepreneurial modules”).

Teacher productivity still needs structure and safeguards

  • A university reading/writing instructor described using Claude to generate an HTML portfolio template, then using a Claude-generated Google Apps Script to push daily prompts from Google Sheets into individual Google Docs portfolios (a minimal sketch of that hand-off follows this list).
  • Practical guidance for teachers emphasized process (role + task + format; multiple versions; constraints), plus privacy practices like avoiding identifiable student info in prompts.
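
That Sheets-to-Docs hand-off maps naturally onto Google Apps Script. A minimal sketch, assuming a sheet where column A holds each student's portfolio Doc ID and column B the day's prompt; the sheet layout and function name are assumptions, not the instructor's actual script, though SpreadsheetApp and DocumentApp are standard Apps Script services.

```typescript
// Apps Script sketch: push each day's prompt from a Sheet into student Docs.
// Assumed layout (not the instructor's actual script): a "Prompts" sheet,
// column A = portfolio Doc ID, column B = today's prompt.
function pushDailyPrompts(): void {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Prompts");
  if (!sheet) return;
  const rows = sheet.getDataRange().getValues(); // [[docId, prompt], ...]
  for (const [docId, prompt] of rows) {
    if (!docId || !prompt) continue; // skip header rows and blanks
    const body = DocumentApp.openById(String(docId)).getBody();
    body.appendParagraph(new Date().toDateString())
      .setHeading(DocumentApp.ParagraphHeading.HEADING2);
    body.appendParagraph(String(prompt));
  }
}
// A daily time-driven trigger on pushDailyPrompts() would automate delivery.
```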

Theme 5 — Evidence is emerging, but “show me the outcomes” is now the standard

A new RCT result: AI may narrow skill gaps

A randomized experiment discussed by Ethan Mollick reported AI reduced the gap between more- and less-educated participants on a business task by 75%—with the caveat raised in the same post: it’s worth asking whether AI is simply doing the work. (Paper link shared: https://www.nber.org/papers/w34851)

Literacy: an outcomes-first bar is being set explicitly

One analysis argued AI can strengthen literacy instruction only when it accelerates mastery of foundational skills (e.g., identifying decoding gaps early and enabling progress monitoring), and that tools should be held to measurable improvements in phonemic awareness, decoding, fluency, and comprehension. It warns against AI used as a shortcut for engagement or text generation without evidence of independent reading ability.

What This Means (practical implications)

  • For district and school leaders: The next phase is less about “allow vs. ban” and more about building capacity (training + clear expectations) and selecting tools that provide visibility into student use where appropriate. Google’s 6M-teacher rollout increases baseline access, but it doesn’t resolve local governance decisions.

  • For edtech builders and investors: The market is rewarding products that connect to outcomes and operational constraints (teacher workflow, student engagement, guardrails), not just model access. Alpha and Flourish highlight an “AI-native school” category where product, staffing model, and assessment design are inseparable.

  • For higher ed and L&D: Policy repositories and institutional engagement are growing, but the deeper shift is designing learning around what humans still uniquely provide—relationships, coaching, and high-integrity assessment—while acknowledging AI changes the cost and shape of producing work.

  • For learners and parents: Students are already using AI broadly (including for homework and advice in some contexts), and confusion about “what counts” drives both misuse and unfair accusations. Clear norms and process-based evidence (e.g., drafts/version history, live explanations) are becoming practical protections.

Watch This Space

  • AI literacy at scale: whether mass training efforts (like Google’s) translate into consistent classroom practice—and whether schools pair training with clear, student-facing expectations.
  • Assessment redesign becoming mainstream: more “human checks” and productively constrained in-class work, alongside AI-supported independent practice.
  • AI-native school economics: high-touch personalization models (including expensive real-time monitoring) versus leaner “AI foundations + teacher time” designs.
  • Proof of learning, not proof of non-AI: growing use of process artifacts (drafts, transcripts, live explanation) to reduce both cheating and false accusations.

Two-hour mastery models surge, as safety benchmarks and workflow AI reshape adoption
Feb 23
7 min read · 1716 docs · Dario Amodei, MacKenzie Price +22
This week’s most consequential signal: Alpha School’s two-hour, AI-powered academic model is now paired with unusually strong NWEA MAP achievement-and-growth claims, intensifying interest in “time back” learning architectures. We also track the hardening of child safety and integrity into benchmarks, AI moving deeper into teacher workflows, and global scale-ups centered on India.

The lead — Alpha School’s 2-hour academic model is now backed by unusually strong achievement and growth claims

Alpha School’s mid-year results (as reported via NWEA MAP data) describe K–12 students scoring at the 99th percentile across Math, Reading, Language Usage, and Science, with the school landing between the 99.5th and 99.9th percentile when compared at the school level—roughly top 130 to 650 out of ~130,000 U.S. K–12 schools.

The same report emphasizes something harder to produce at the top end: continued growth despite the “ceiling effect,” including kindergarteners growing 4.36 standard deviations above predicted in one semester and continued gains in grades 9–11 where reading growth is often described as zero or negative nationally.

Operationally, Alpha’s model is consistently described as:

  • ~2 hours/day on core academics (math, science, reading, language), broken into short bursts with breaks—finishing academics by lunch
  • AI tutor personalization “under the surface” (not student-facing chatbots during academic time) to reduce cheating risk while still adapting instruction
  • A teacher role shift toward motivation and emotional connection—teachers as guides rather than primarily lecture/grading engines
  • Afternoons devoted to other skills (e.g., public speaking, financial literacy, leadership), described as additive rather than a trade-off

A related set of signals (from Alpha’s principal and parents) reinforces that the model is being treated as measurable and auditable at the individual student level, not just as a school-wide narrative: parents reportedly took the prompt used to analyze results across Alpha schools and ran it on their own child’s data.


Theme 1 — “Time back” models are expanding from outcomes claims to new markets

The “two-hour mastery” framing is no longer confined to K–12 private-school experiments:

  • A Reddit team is explicitly building an adult version of Alpha School: “pure-online 2-Hour Mastery” using adaptive AI learning on “high-ROI skills,” with pre-sales and a “Wizard-of-Oz” pilot planned in weeks. An Alpha School engineer endorsed the direction while noting Alpha’s core focus on K–12 scale (~1B students).
  • Alpha-aligned messaging has increasingly centered “TimeBack” as a productable philosophy—2 hours of focused AI learning with “2x the outcomes,” freeing the remainder of the day.

This expansion matters because it shifts the competitive set: instead of “edtech tools vs. classrooms,” it becomes “new time architectures for learning” across K–12, higher ed, and workforce upskilling.


Theme 2 — AI is moving “into the workflow” for teachers and knowledge workers (not just into apps)

Several updates this week point to AI becoming a layer inside planning, authoring, and synthesis workflows.

Classroom planning and content creation

  • Khan Academy inside ChatGPT: Khan Academy says it’s one of the first education apps integrated into ChatGPT, enabling teachers to generate standards-aligned math questions directly where they plan. Usage is framed as “Khan Academy + your prompt”.

Synthesis and presentation tools (NotebookLM)

NotebookLM continues to push into “turn sources into deliverables”:

  • In chat, users can ask NotebookLM to create an infographic summarizing points, with the Q&A context used for customization; the same workflow is pitched for audio/video overviews, slide decks, flashcards, and quizzes.
  • Prompt-based slide revisions are rolling out broadly (tweak text/color/visuals by prompting) and NotebookLM also added PPTX export for slide decks.
  • The mobile app now supports customizing video overviews.

Agentic tools for teacher productivity (with cautions)

Tech & Learning reviewed OpenClaw, a free/open-source agentic assistant designed to run on a personal computer (positioned as more private/controllable but also harder to set up) . Even in a browser-based version, the reviewer found it strong at research and class prep (e.g., summarizing lesson plans and categorizing research topics), while emphasizing it’s worth exploring cautiously on personal devices rather than school-issued ones .


Theme 3 — Integrity and child safety are hardening into measurable requirements

A new benchmark: KORA (AI child safety)

KORA describes itself as the first public benchmark for AI child safety. Two findings are especially relevant to schools:

  • Educational integrity is a major blind spot: models were inadequate in 76% of cheating/academic dishonesty scenarios .
  • Avoiding anthropomorphism correlates with emotional safety (r = 0.84): models that maintain clear boundaries (not “pretending to be human”) score better across safety categories .
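
For readers unfamiliar with the statistic, the r value here is a plain Pearson correlation across models’ per-category scores. A minimal sketch of that computation is below, on invented numbers (it will not reproduce KORA’s 0.84, since the underlying scores aren’t published in this brief):

```python
# Illustrative only: how a benchmark-level correlation like KORA's r = 0.84
# between "avoids anthropomorphism" and "emotional safety" could be computed.
# The per-model scores below are invented.
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-model scores on a 0-1 scale.
anthropomorphism_avoidance = [0.9, 0.7, 0.4, 0.8, 0.3]
emotional_safety = [0.85, 0.75, 0.5, 0.8, 0.35]

print(f"r = {pearson_r(anthropomorphism_avoidance, emotional_safety):.2f}")
```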

On-the-ground cheating signals (and anti-cheating product responses)

  • A teacher reported suspected quiz cheating based on odd notation (e.g., m\*G\*H where m is mass), with commenters pointing to LLM escape characters and copy/paste artifacts as the likely source . A detection sketch follows this list.
  • Wayground AI reportedly added anti-cheating settings.
  • Ethan Mollick argued that some AI-generated student work is straightforward to identify and that educators will shift toward methods that evaluate student—not AI—performance.
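
As a minimal illustration of the artifact the commenters described, the heuristic below flags markdown-escape residue (such as \* sequences) in pasted answers. The patterns are assumptions for demonstration, not a validated detector:

```python
# Sketch of a copy/paste-artifact check, not a cheating detector.
# The patterns below are assumptions based on the thread's examples.
import re

ARTIFACT_PATTERNS = [
    r"\\[*_#\[\]]",      # backslash-escaped markdown characters, e.g. \* or \_
    r"\*\*[^*\n]+\*\*",  # raw bold markers pasted into plain text
    r"\\\(|\\\)",        # LaTeX inline-math delimiters \( and \)
]

def flag_artifacts(answer: str) -> list[str]:
    """Return artifact substrings found in a submitted answer."""
    hits = []
    for pattern in ARTIFACT_PATTERNS:
        hits += re.findall(pattern, answer)
    return hits

print(flag_artifacts(r"PE = m\*G\*H where m is mass"))  # ['\\*', '\\*']
```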

Governance and “responsible AI” infrastructure

Institutions are also responding at policy/process level:

  • IIT Delhi described creating a committee to promote responsible and ethical AI use by faculty and students , alongside a School of AI and expanded programs .
  • EDUCAUSE discussions highlight governance anxiety around accountability and data handling (e.g., retention policies and FERPA data) .

Theme 4 — Global deployments: India as a focal point for “AI at scale” in education

Multiple sources emphasized India as a center of gravity for education-oriented AI deployments:

  • Google DeepMind’s Demis Hassabis said DeepMind is partnering with Atal Tinkering Labs to bring GenAI assistance to 10,000+ Indian schools and 11 million students, focused on robotics and coding in classrooms .
  • In a fireside chat at IIT Delhi, Sam Altman cited India as OpenAI’s second largest market, claiming 100 million ChatGPT users, one-third of them students.
  • Anthropic CEO Dario Amodei described partnerships with nonprofits including Pratham and Central Square Foundation to use Claude models to advance education (alongside digital infrastructure, agriculture, and health) across the Global South . Anthropic also described benchmarking Claude’s performance on India’s regional languages for practical tasks including educational content.
  • Anthropic also signed an MOU with the Government of Rwanda to bring AI to health, education, and other public sectors .

Theme 5 — Evidence updates: what current research says works (and what fails) in learning workflows

A research roundup (8 papers) surfaced patterns that map cleanly to practical adoption:

  • AI grading isn’t reliable as a sole grader: in grading 184 university student projects, ChatGPT gave the highest marks and was an outlier vs. peers and lecturers; the conclusion was not to use ChatGPT alone for grading, especially for final marks—use it for formative feedback and structured checks, with humans for final synthesis .
  • AI tutoring can improve writing when students ask targeted questions: students using an AI chatbot asked more direct, specific questions than with a human tutor and produced higher quality essays; results were tied to the quality of questions.
  • “Question-only” AI for planning: a custom GPT that only asked sequential questions (and didn’t write) helped 17 high school students plan writing by pulling thinking “out of you” . A prompt sketch follows this list.
  • Disclosure penalty: across 16 experiments with 27,000 participants, identical creative writing was rated lower when labeled as AI-written; the bias was hard to remove .
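
The study’s actual custom-GPT instructions weren’t shared, but the “question-only” constraint is easy to sketch. Below is a hypothetical system prompt wired to the standard OpenAI chat-completions API; the wording and model name are placeholders, not the researchers’ prompt:

```python
# Hypothetical "question-only" planning assistant, sketched with the
# OpenAI Python SDK. The prompt wording is invented for illustration.
from openai import OpenAI

QUESTION_ONLY_PROMPT = (
    "You are a writing-planning assistant. Ask the student exactly one "
    "question at a time, in a deliberate sequence: topic, audience, claim, "
    "evidence, structure. Never draft, rewrite, or suggest sentences for "
    "the student's text. If asked to write, decline and ask the next "
    "planning question instead."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": QUESTION_ONLY_PROMPT},
        {"role": "user", "content": "Help me plan my essay on school uniforms."},
    ],
)
print(response.choices[0].message.content)
```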

A separate higher-ed framework argued for more intentional GenAI use—especially protecting “meaning-making” as a human responsibility and watching when AI removes learning-relevant friction .


What This Means (practical implications across learning contexts)

  • For K–12 operators and investors: Alpha’s reported NWEA MAP outcomes (99th percentile across subjects; top-of-scale growth) raise the bar for “AI school” claims: compression + growth is the differentiator, not just personalization language . If these results hold up over time, expect more “two-hour mastery” competitors and adult-market adaptations .

  • For district leaders: Safety and integrity aren’t abstract—KORA’s benchmark shows models often fail in cheating scenarios (76% inadequate), and emotional safety correlates with avoiding anthropomorphism . Procurement checklists are likely to evolve from “does it have guardrails?” to “what’s your measured performance on integrity/safety benchmarks?”

  • For teachers: The cheating thread illustrates that enforcement is often about practical signals (copy/paste artifacts, escape characters) rather than perfect detection . At the same time, research suggests a productive alternative: design AI supports that keep students doing the work (question-only planning; formative feedback; targeted questions) .

  • For higher ed: EDUCAUSE points to a “messy middle” where pedagogy and governance are lagging fast tool adoption, especially around assessment validity and data handling . The most stable near-term pattern is hybridization: AI for drafts/checks, humans for judgment and meaning.

  • For L&D and workforce upskilling: Talent pipelines are being built around AI fluency as a skill. Gauntlet AI describes a highly selective training funnel (thousands screened, 10-week “gauntlet,” tiny completion rate) alongside broader team training offerings . This aligns with broader advice emphasizing staying current with tools and building projects, not just learning static technical knowledge .


Watch This Space

  • “Two-hour mastery” architectures moving from boutique schools into adult upskilling products (and whether outcomes can be measured credibly outside controlled environments) .
  • Benchmarks as procurement inputs: KORA-style child safety and integrity scoring becoming a standard part of vendor evaluation .
  • AI in-chat “app” distribution (e.g., Khan Academy in ChatGPT) reshaping how teachers discover and use trusted content .
  • India-first education deployments: large-scale school partnerships (10,000+ schools) and language benchmarking for educational tasks .
  • Assessment redesign via workflow choices: more “question-first” planning, formative AI feedback, and human synthesis—rather than betting on automated grading or detectors .
OECD’s call for Socratic AI, school-wide platforms, and education-specific model benchmarks
Feb 16
9 min read
1851 docs
Austen Allred
Anthropic
Andrew Ng
+11
This week’s developments center on a clear message from the OECD: AI improves education outcomes only when it’s designed with learning guardrails (Socratic tutoring, process visibility, and age-appropriate behavior). We also cover new “AI inside the workflow” moves (Khan Academy in ChatGPT, MagicSchool’s AI Operating System), education-specific model benchmarking, early browser/agent tools for educators, and AI’s expanding role in advising and career guidance.

The lead — The OECD’s message: AI in classrooms needs learning guardrails, or outcomes can fall

The OECD’s Digital Education Outlook 2026 (as discussed on the AI in Education podcast) frames a core tradeoff schools are already feeling: AI can help teachers with planning, but it can also undermine academic integrity and learning if it’s used as an “answer engine.”

Key points highlighted:

  • Teacher impact (mixed): 6 in 10 teachers say AI helps with writing or improving lesson plans; 7 in 10 believe it can harm academic integrity by enabling students to pass off AI work as their own .
  • Student learning impact depends on design: “Unguided” generative AI use can lead to lower exam results because students substitute AI for their own learning; Socratic tutors (e.g., systems instructed not to give answers) can improve exam results by guiding learning instead .
  • A named risk: the podcast describes “metacognitive laziness”—where learners stop “thinking about thinking” because AI does that work for them .
  • Practical recommendations: use AI with pedagogical intent; build tools with learning guardrails (not generic ChatGPT); pursue stronger regulation (including age-appropriate behavior for under-16s); and ensure equitable access as digital learning expands .

Policy and implementation are moving in the same direction: the podcast also summarizes UK DfE AI standards (Jan 2026) that call for content filtering/monitoring/alerts, distress detection with human referral, and “no cognitive substitution,” alongside rules discouraging persuasive design and anthropomorphic “personhood” cues (e.g., names/avatars) . It also notes £23M in funding to support development of AI tutoring tools, with a stated goal that disadvantaged students in Years 9–11 have access to an AI tutor by the end of 2027 .


Theme 1 — “AI inside the workflow”: trusted content and school-wide platforms

Khan Academy appears in ChatGPT (planning flow integration)

Khan Academy says it is one of the first education apps integrated into ChatGPT, bringing trusted, standards-aligned math questions directly into teachers’ planning flow . The post positions the Khan Academy + OpenAI partnership as “less prep time, more teaching” with trusted content .

Khan Academy also launched a Writing Coach essay prompt library, letting teachers filter prompts by grade/subject and assign them instantly .

MagicSchool’s “AI Operating System for schools” framing

MagicSchool AI announced an AI Operating System for schools built around four “pillars”: classroom-designed safety, a context-aware educator workspace, a student workspace that meets learners where they are, and a district workspace connecting the system .

It also emphasized moving “from time-saving to outcome-driving,” naming writing feedback, Magic Quizzes, personalized learning profiles, and a knowledge graph intended to explain “why” something happened and suggest next actions . MagicSchool reported 1,200+ educators and district leaders attended the webinar unveiling it, and said 7 million educators have been “building this alongside us” .

What to watch in this category: these announcements shift the conversation from “try this chatbot” toward embedded, end-to-end systems (planning → instruction → assessment → district oversight), where governance and classroom constraints become product requirements rather than add-ons .


Theme 2 — Process over product: making thinking visible (and assessable)

Classroom evidence: assess the interaction, not just the output

A 4-week Grade 12 pilot in Switzerland used Comparative Transcript Analysis (CTA), treating AI chat transcripts as the assessable artifact—focusing on prompts, reasoning, and reflection rather than only the final student product .

Self-reported results from the 21-student pilot included:

  • 85.7% reported changing their approach to AI use
  • 47.6% reported becoming significantly more strategic in interactions (e.g., thinking more before hitting “send”)
  • 81% endorsed continuing the method in schools

Practical moves described include comparing “strong vs. weak” transcripts in class and requiring students to add rationale and ask at least one “why” question back to the AI . The author notes limitations: it was a small, self-report pilot in one classroom measuring short-term shifts, and should be read as preliminary .

A complementary message in broader commentary

An EdSurge piece argues that with AI now widely accessible in classrooms, tasks like summarizing and drafting are becoming baseline capabilities rather than reliable indicators of mastery . It suggests learning measures should move upward toward interpreting nuance, evaluating credibility, and connecting ideas across disciplines .

Teacher reality check: “paper-only” isn’t a universal fix

Teacher discussions continue to reflect pressure to shift assessment formats because students can photograph assignments and ask AI for answers . At the same time, another thread argues “just do everything on paper” can be a logistical and equity challenge (absences, accommodations, and the volume of writing), even for teachers actively policing AI misuse .


Theme 3 — Benchmarking what models can actually do on education tasks (cost included)

Edtech Insiders highlights that The Learning Agency launched an AI and Education Leaderboard evaluating LLMs on education-relevant tasks using zero-shot prompting to show “out of the box” performance . It includes cost comparisons intended to reflect school budget constraints .

Two initial benchmarks:

  • ASAP 2.0 (automated essay scoring): Gemini 2.5 Pro leads (quadratic weighted kappa, or QWK, of 0.585), while Gemini 2.0 Flash is presented as best cost-performance (QWK 0.562, $0.25 per 1,000 essays, 0.73s latency) . The post notes “thinking models” show no clear advantage for essay scoring and may drift from rubrics .
  • Eedi MAP (math misconception annotation): thinking models dominate; GPT-5 Mini (thinking) is cited as best value among top performers (mean average precision at 3, or MAP@3, of 0.622 at $1.26 per 1,000 inferences), but all models trail competition winners significantly . Both metrics are sketched below.
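
For context on the two metrics: quadratic weighted kappa measures agreement between model and human scores on an ordinal scale, and MAP@3 credits each prediction by the reciprocal rank of the correct label within the top three guesses. The sketch below implements both as they are standardly defined, on invented data; the leaderboard’s actual evaluation code isn’t shown here:

```python
# Standard definitions of the leaderboard's two metrics, on toy data.

def quadratic_weighted_kappa(a: list[int], b: list[int], lo: int, hi: int) -> float:
    """QWK between two raters' ordinal scores in the range [lo, hi]."""
    n = hi - lo + 1
    obs = [[0] * n for _ in range(n)]           # observed agreement matrix
    for x, y in zip(a, b):
        obs[x - lo][y - lo] += 1
    total = len(a)
    hist_a = [sum(row) for row in obs]
    hist_b = [sum(obs[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2     # quadratic disagreement weight
            num += w * obs[i][j]
            den += w * hist_a[i] * hist_b[j] / total  # chance-expected counts
    return 1.0 - num / den

def map_at_3(ranked_preds: list[list[str]], truths: list[str]) -> float:
    """Mean of 1/rank when the true label appears in the top 3, else 0."""
    total = 0.0
    for preds, truth in zip(ranked_preds, truths):
        for rank, p in enumerate(preds[:3], start=1):
            if p == truth:
                total += 1.0 / rank
                break
    return total / len(truths)

human = [2, 3, 4, 2, 5]
model = [2, 3, 3, 2, 5]
print(round(quadratic_weighted_kappa(human, model, 1, 6), 3))

preds = [["M12", "M7", "M3"], ["M5", "M1", "M9"]]
truth = ["M7", "M9"]
print(round(map_at_3(preds, truth), 3))  # (1/2 + 1/3) / 2 ≈ 0.417
```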

The same piece cites a RAND survey: in 2025, 54% of students and 53% of ELA/math/science teachers reported using AI for school (a 15-point increase over the prior 1–2 years) .


Theme 4 — Agents and “delegate-able work” for educators (useful, but uneven)

Google’s Auto Browse agent: early promise, mixed execution

Tech & Learning describes Google’s Auto Browse as an AI agent integrated with Chrome and Gemini that can browse the web and use information from Google services (like Calendar and email) to complete tasks . In one educator’s test, it successfully pulled travel dates from Google Calendar, searched email for contacts, and generated a useful list of writing professor jobs, but failed to provide working Airbnb links even when it identified plausible options . The author concludes it’s “fun to experiment with,” but not yet especially useful—and suggests most educators may want to wait for maturity .

Claude Cowork: reusable “skills” and delegated tasks

Educators experimenting with Claude Cowork describe turning tasks into reusable skills—then reusing the skill by sharing a folder of content for automation . Another post describes using it for tasks like adding accessibility tags for images and other processing work, while limiting its access to large folders or full desktops .

Teacher role shift, captured in an interview clip

A Getting Smart interview suggests that as AI can increasingly “know a lot about a subject,” the teacher role may shift toward mentorship—“moving alongside” learners and caring about where they get to next .


Theme 5 — Advising, counseling, and admin: AI as a scaling layer (and a trust test)

Career/college counseling in the face of counselor shortages

EdSurge reports that counselor shortages are a driver for AI experimentation, citing 378 students per counselor in Georgia (vs. a recommended 250:1 ratio) . It profiles EduPolaris AI’s “Eddie” platform (counselor/student/parent portals), which is being piloted in some Title I high schools and raised $1M in early investments .

Reported use cases include dashboards that let counselors track progress (e.g., whether students completed reference letters) and send nudges—reducing the number of meetings required . The article also includes skepticism: one example describes a general-purpose chatbot veering into irrelevant advice when asked about schools strong in dermatology , and a counselor argues the work is primarily relational and not easily replicated by AI .

Enrollment/advising “digital safety net” in higher ed

An EvoLLLution piece describes WSU Tech using AI in the enrollment CRM/tech stack to remove administrative friction and free advisors for human connection—while explicitly cautioning against over-automation . It argues AI can help make sense of data across ERPs, financial aid, LMS, attendance, and engagement to intervene “with the right student at the right time” .

In-school evaluation workflows (and privacy concerns)

Teachers also report administrators using AI to generate observation write-ups aligned to district standards, sometimes described as buzzword-heavy and not personalized . A related comment describes a school recommending uploading SPED IEPs to MagicSchool for goal-writing and data tracking via district prompts, raising concerns about individualized plans and privacy .


Theme 6 — New pathways for AI skills (from college partnerships to high-intensity bootcamps)

  • Anthropic + CodePath: Anthropic announced a partnership with CodePath to bring Claude and Claude Code to 20,000+ students at community colleges, state schools, and HBCUs . Details: https://www.anthropic.com/news/anthropic-codepath-partnership.

  • Gauntlet AI (engineer training): Austen Allred describes a free 10-week program covering travel, housing, food, and laundry, aimed at training engineers to use AI and connecting graduates to $200k+ jobs; he also describes it as exclusive and intense (80–100 hour weeks) . He notes Gauntlet generally accepts around 2% of applicants .

  • Curriculum and app-building with AI: Allred argues AI course-building can match or exceed a team of 25 full-time curriculum developers with the right prompt, and he shared an “OpenClaw” course that Claude turned into a formatted free online course . He also described a “breakthrough” where an AI system built multiple mobile apps end-to-end (some fully one-shotted, some requiring a small human tweak) .

  • Workforce framing: Andrew Ng writes that “workers who use AI” will replace workers who don’t, and says developers proficient with AI coding tools are increasingly in demand .


What This Means (practical takeaways)

  • For K–12 and district leaders: The OECD framing points to a concrete design requirement: if student-facing AI behaves like an answer engine, learning outcomes can suffer; if it behaves like a tutor with guardrails, outcomes can improve . Procurement and policy are starting to encode this (e.g., monitoring, distress escalation, and “no cognitive substitution”) .

  • For teachers and instructional designers: The strongest classroom-aligned pattern this week is process visibility—from transcript-based assessment of AI interaction to writing workflows that preserve student drafting before AI feedback enters .

  • For edtech builders and investors: Education-specific benchmarks plus cost/latency data are becoming a practical decision layer—not just “which model is best,” but “which model is affordable and reliable enough for this task” .

  • For higher ed and student support teams: AI can reduce admin load (nudges, checklists, early-warning signals), but the trust boundary matters—especially where advising is relational or where data sensitivity is high .


Watch This Space

  • In-chat “education apps” and workflow embedding (e.g., standards-aligned content inside ChatGPT) as a distribution channel for classroom materials .
  • School-wide AI platforms that bundle safety, educator tools, student tools, and district oversight into one operating model .
  • Guardrailed tutoring norms (Socratic-by-default, age-appropriate behavior) moving from recommendations into policy and product requirements .
  • Education-specific model evaluation becoming a standard step in adoption decisions, especially when cost/latency tradeoffs are explicit .
  • Agentic tooling for staff productivity (browser agents, delegated “skills”)—useful today for some tasks, but still unreliable enough to require careful human verification .
From AI detection to observable thinking: assessment redesign, ‘time back’ schools, and safer student-facing AI
Feb 9
8 min read
1796 docs
Sal Khan
Justin Reich
MacKenzie Price
+20
This week’s biggest shift is how institutions are responding to AI’s impact on assessment: moving from detection to designs that make thinking observable (live defenses, in-class work, interactive evaluation). We also cover ‘time back’ learning models (Alpha School, Khanmigo), curriculum accessibility tools, student-safety and data infrastructure, and new guardrails for student-facing AI.

The lead — Assessment is shifting from “did you make this?” to “show me how you think”

Across K–12, higher ed, admissions, hiring, and even corporate compliance, multiple sources converge on the same problem: AI has severed the link between producing an artifact and demonstrating understanding, making “cheating” easier and harder to detect . Evidence cited this week includes:

  • 84% of high school students using generative AI for schoolwork
  • A UK university study where 94% of AI-written submissions went undetected and scored half a grade boundary higher than real students
  • Teachers reporting rampant AI-assisted submissions (including many “0” grades), with some moving assessments back to pen-and-paper/in-class work

In response, the most practical pattern isn’t better detection—it’s more observable thinking: live defenses, in-class work, and interactive assessment designs that require students (or candidates) to explain and justify their work in real time .


Theme 1 — “Observable cognition” is becoming the new baseline

Detection is a dead end (and creates its own harms)

One argument is explicit: you won’t be able to reliably detect AI use in homework, so schools should stop building policies around it . Related evidence includes AI-written submissions passing undetected at high rates and educators describing how quickly students learn to route around enforcement (or how enforcement is constrained by grading policies) .

What replaces detection: defendable work

Several concrete “defense” patterns surfaced:

    • Caltech admissions: applicants who submit research projects appear on video and are interviewed by an AI-powered voice; faculty and admissions staff review recordings to assess whether the student can “claim this research intellectually” .
  • Anchored samples in admissions: Princeton and Amherst requiring graded high school writing samples as a baseline for authentic writing .
  • Classroom moves that build friction and visibility:
    • Boston College professor Carlo Rotella brought back in-class exams (“Blue books are back”), arguing the “point of the class is the labor” and that the “real premium” is “friction” .
    • A high school Spanish teacher had students use AI to adjust the reading level of Spanish sources (still reading in Spanish) and required a link to their chat history in the bibliography .

A related higher-ed complaint: AI-generated student email is described as “rampant” and “inauthentic,” prompting strategies like focusing on the content (“what do you mean by ‘reliable time’?”) rather than trying to prove origin .


Theme 2 — Personalized “time back” learning models are scaling (but governance choices matter)

Alpha School: 2-hour academics + human motivation layer

Alpha School is described as a network of private K–12 schools using AI to deliver 1:1 mastery-based tutoring and compress core academics into ~2 hours/day, with the rest of the day focused on projects and life skills supported by human guides . A recurring design choice: no chatbots (“chatbots…are cheat bots”) .

Operational details shared this week include:

  • A “Time Back” dashboard that ingests standardized assessments (NWEA/MAP) to build personalized lesson plans and route students into specific apps (e.g., Math Academy; Alpha Math/Read/Write) . A toy routing sketch follows this list.
  • A vision model monitoring engagement patterns (e.g., scrolling to the bottom, answering too fast) and nudging students (e.g., “slow down…read the explanation”) .
  • A reported platform cost of roughly $10,000 per student per year.
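
A toy sketch of that routing idea, assuming invented percentile thresholds (the app names come from the post; the rules and numbers do not):

```python
# Toy illustration of assessment-driven routing, not Alpha's actual logic.
from dataclasses import dataclass

@dataclass
class MapScores:
    math_pct: int     # NWEA MAP percentile, 0-99
    reading_pct: int

def route(scores: MapScores) -> dict[str, str]:
    """Map percentile bands to apps; thresholds here are hypothetical."""
    plan = {}
    plan["math"] = "Math Academy" if scores.math_pct >= 90 else "Alpha Math (gap-fill)"
    plan["reading"] = "Alpha Read (stretch texts)" if scores.reading_pct >= 90 else "Alpha Read"
    return plan

print(route(MapScores(math_pct=95, reading_pct=72)))
```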

Alpha School’s model also got mainstream attention: a TODAY show segment highlighted a Miami campus pilot program described as “teaching kids with AI instead of teachers,” with reported admissions demand spiking after the segment .

Khan Academy: “Socratic” tutoring with testing and error tracking

Khan Academy’s Khanmigo is positioned as an AI tutor/teaching assistant that nudges learners without giving answers (a “Socratic tutor”) . The team describes building infrastructure around difficult evaluation edge cases and tracking error rates (reported sub-5%, in many cases sub-1%) . They also cite efficacy research: 30–50% learning acceleration with ~60 minutes/week of personalized practice over a school year .

Self-directed learning at scale: “use AI to figure stuff out”

OpenAI shared a usage claim that 300M+ people use ChatGPT weekly to learn how to do something , and that more than half of U.S. ChatGPT users say it helps them achieve things that previously felt impossible . In parallel, Austen Allred argued there’s an “extreme delta” between people who plug their questions into AI and those who don’t .


Theme 3 — Curriculum and content are being redesigned for comprehension and inclusion

Math word problems, rewritten for comprehension without reducing rigor

M7E AI described an AI-powered curriculum intelligence platform that evaluates and revises math content to remove unintentional linguistic and cultural barriers while maintaining standards alignment and mathematical rigor . The team framed the problem as a “comprehension crisis,” citing 61% of 50M K–12 students below grade level in math and noting 1 in 4 bilingual students .

The platform produces district-level summaries, deep evaluations, and revisions (including pedagogical/formatting recommendations and image/diagram feedback), and is offered free for district leaders/schools to use .

Localization and translation as distribution

  • Google’s Learn X team described YouTube auto-dubbing as a way to expand global access to education content by letting learners watch videos in their own language .
  • Canva described “Magic Translate” as localization beyond language—ensuring template elements reflect local festivals and people students recognize .

Theme 4 — District “plumbing” and student safety: more AI depends on more data (and transparency)

A key operational claim from an edtech infrastructure discussion: there is an “insatiable appetite” for more student data (beyond basic rostering) to make AI systems like tutoring and safety tools work . Examples cited:

  • Attendance and family engagement: TalkingPoints described using attendance data to message families when students miss school/periods and to help schools intervene before chronic absenteeism/truancy . They also described an AI feature (“message mentor”) that suggests improvements to teacher-family communications .
  • Student safety: Securely described using AI to scan student Google Docs for potential suicide notes and raise flags quickly, while emphasizing privacy/transparency and framing a benefit as “no human has to ever become aware of the student’s private thoughts” unless a flag is raised .
  • Admin reduction in special needs: Trellis described transcribing child plan meetings and drafting a child’s plan/minutes (with time-bound, measurable actions), piloting across Scottish councils to reduce the 1.5–2 hour teacher write-up burden and improve teacher presence/eye contact in meetings .

A separate classroom-side warning: one educator described a “tech-powered system that never sleeps,” where AI is already embedded (text-to-speech, translation, writing supports) and constant measurement/feedback can erode pause and reflection, increasing pressure on students .


Theme 5 — AI literacy is being reframed: less “prompting,” more domain knowledge + visible practice

Two complementary takes stood out:

  • Evaluate output through domain knowledge: Justin Reich argued that what’s hard is not using AI, but evaluating outputs—and that domain knowledge is a bigger differentiator than AI-specific tricks .
  • Treat AI chats as texts: Mike Kentz proposed teaching AI use via comparative textual analysis of chat transcripts (students compare two AI interactions, identify differences, vote using a partially built rubric, then refine the rubric together) . He reports “promising” results across middle school through college but highlights gaps (transcript design, facilitation quality, and adapting beyond humanities) .

Teacher reality check: 79% of teachers reportedly have tried AI tools in class (up from 63% last year), while “less than half of schools” have provided training .

Student-facing AI: “instructional tool, not a companion”

MagicSchool AI released a white paper arguing student-facing AI should function as instructional technology, not a companion, to reduce risks like companionship and sycophancy . Their framing aligns with a broader principle that role clarity matters as AI enters classrooms .

Policy signals touched this too: Pennsylvania Gov. Josh Shapiro directed his administration to explore legal options requiring AI chatbot developers to implement age verification and parental consent.


What This Means (practical takeaways)

  • For K–12 leaders: If AI use is widespread and hard to detect , the most actionable lever is assessment design—more in-class work, live explanation, and structured reflection (rather than relying on detectors) .

  • For higher ed: Expect more hybrid “artifact + defense” models (e.g., video interviews, oral exams, anchored writing) to become normal ways to validate ownership .

  • For edtech builders and investors: The next wave of defensibility may be less about a chatbot UX and more about: (1) measurable learning loops (practice, feedback, progress), and (2) reliable integration into district workflows and data standards—plus clear transparency promises when products touch sensitive domains like safety .

  • For L&D / employers: The same authenticity problem shows up in hiring (AI-written résumés; rising cost/time to hire), reinforcing a shift toward early, live validation of skills .

  • For learners: Advantage goes to people who can ask good questions, verify outputs, and use AI as a scaffold rather than outsourcing thinking—skills echoed across classroom practice and workforce framing .


Watch This Space

  • Live/interactive assessment spreading from admissions to everyday classroom practice (video defenses, oral exams, transcript-based evaluation) .
  • AI “time back” models that combine personalization with human motivation layers (and how they handle engagement, cheating, and trust) .
  • Student-facing safety and role clarity—instructional tool vs companion—and whether age-gating and consent become baseline requirements .
  • Curriculum accessibility tooling (especially for multilingual and low-context learners) moving upstream into procurement and publisher workflows .
  • Data governance under load as more AI products demand extended data for tutoring, attendance, and safety use cases—and districts push for transparency .
AI learning tools expand—while evidence sharpens the case for guardrails, verification, and real mastery
Feb 2
9 min read
1846 docs
DeepLearning.AI
OpenAI
Andrej Karpathy
+16
This week’s signal: AI is accelerating learning workflows (tests, flashcards, simulations, and agent-driven building), but evidence and practitioner commentary sharpen the warning that speed can come at the cost of understanding without deliberate guardrails. We cover what’s new in classroom practice tools, simulation-based learning, and the policy/safety pressures that are increasingly shaping adoption.

The lead: AI can speed up work—but it can also reduce learning if you don’t design for understanding

A randomized controlled trial by Anthropic found that junior engineers using AI assistance completed a novel coding task slightly faster (by about two minutes; not statistically significant) but scored 17% lower on a concept quiz (roughly two letter grades) . In the same study, participants who still scored highly while using AI tended to ask conceptual and clarifying questions rather than delegating the task to the model .

This learning tradeoff is showing up across the week’s coverage: leaders are shipping more “practice and feedback” tools into everyday workflows, while practitioners warn that guardrails, verification, and human judgment aren’t optional.


Theme 1 — Mastery learning with guardrails: Alpha School’s “bright spot” framing

Geoffrey Hinton is cited praising Alpha School as a potentially positive use of AI in education—described as notable given his usual warnings about AI risks . Alpha School’s positioning emphasizes that AI is:

  • Harmful when it becomes “screens everywhere” and chatbots become “CheatBots”
  • Powerful when used as a focused “1:1 mastery system” with “strong guardrails”

"This frees adults to do the human work - coaching, relationships, and life skills - while kids gain superpowers in learning."

From Alpha’s own description, its AI tutor runs in the background as a personalized, mastery-based platform that adapts lessons by level and pace, measures learning, and fills knowledge gaps—while explicitly saying it does not use a GPT or a chatbot that kids interact with . The same post claims Alpha schools have less screen time than a traditional school and “way better results” .

Operational signals also showed up in social posts:

  • A weekend hackathon at Alpha School reportedly had students building impressive apps “after a few hours,” drawing the response: “AI gives kids superpowers” .
  • Alpha School shared that students use AI to pursue passions, e.g., one student learning to code a cooking app .
  • Alpha School is described as bringing 100 Stanford and MIT students to Austin for an intensive summer to build AI apps aimed at transforming education for 1 billion kids .

Theme 2 — Practice and feedback at scale: tests, flashcards, and bite-sized skill builders

Gemini expands standardized test practice (SAT + JEE)

Google says Gemini now offers full-length practice SATs and mock JEE Main tests at no cost, with feedback and study tips . The JEE practice is described as grounded in “rigorously vetted content” in partnership with Physics Wallah and Careers360, with immediate feedback on strengths and study needs .

Microsoft rolls out AI-powered flashcards across M365 (with classroom insights)

Microsoft has rolled out AI-powered flashcards in the Learning Activities app across Microsoft 365 apps for students and educators . Teachers can generate flashcards from text (up to 50,000 characters) and from Word documents or PDFs, choose language and card types, add hints, and pull images via Bing .

For classroom use, it also supports sharing by link/join code and provides educator-facing insights (e.g., how many students started/completed, average score, challenging cards) .

Limitations to keep in mind: The flow is highly generative (create → regenerate → tweak), which can speed up production—but it also means review and editing are central to quality control .

AI-generated “minigames” as a practice format

Ethan Mollick shared a prompt to Claude Code to “figure it out” and create something “awesome,” resulting in a set of 21 minigames intended to teach a broad list of practical skills .


Theme 3 — Simulation-first learning: role-play, field verification, and realistic training environments

A multimodal agent in medical simulation

Mollick highlighted a paper testing a multimodal AI agent (using Gemini 2.5) in a realistic medical simulation used to train physicians, reporting it matched or exceeded 14,000 medical students in case completion and secondary outcomes like time and diagnostic accuracy .

Higher ed role-play: where guardrails have to be “castle walls”

In a Substack interview, one contributor argued that high-risk domains (clinical psychology, nursing, drug-abuse counseling) require more than guardrails—“castle walls”—including HIPAA compliance and making sure what a student says “never, ever leaves the classroom” and “can never be used in court against them,” plus extensive testing . The same discussion suggests chatbots open the door to cognitive simulations and role-plays across fields like criminal justice and interviewing, with an LMS role-play that can look things up on the internet and behave in character (including in other languages) .

A concrete example: nursing faculty using role-play so students practice assertive communication with a simulated coworker that adapts responses, followed by debriefing with a communication coach and in-class discussion .

AI as a “mirror” for student thinking in public health

At Duquesne University, Dr. Urmi Ashar described a public health assignment where students adopted personas and used chatbots to explore whether someone should move to the Sheraden neighborhood, then compared outputs against Google Maps and a “windshield survey” (experiencing the neighborhood firsthand) . The exercise surfaced student assumptions and emphasized verification: “the map is not the terrain” .

Ashar describes AI as “more like a mirror” reflecting questions, assumptions, and blind spots, with the instructor shifting from expert to coach .


Theme 4 — Governance and safety: deepfakes, bias, and misinformation literacy become operational concerns

“AI is like corn syrup”: districts treating AI as unavoidable in procurement

An EdSurge piece quotes a K–12 CTO: “AI is like corn syrup; it’s going to be in everything,” framing AI as embedded in edtech whether districts are ready or not . The same piece notes districts are pushing harder on data governance and asking students to learn prompting and critical consumption of information .

AI, education, and the law: bias + deepfake risk

A Tech & Learning practitioner guide flags legal and ethical challenges including algorithmic bias—citing evidence that AI detection tools can be “near perfect” for native English speakers while falsely flagging 61% of essays by non-native speakers as AI-generated . It also cites data that nearly half of students and more than a third of teachers are aware of school-related deepfakes .

The same piece points to a “human in the loop” approach and suggests leaders ask whether systems have biases, whether student data is used to train third-party models, and whether tools minimize data collection .

Parallel discussion in teacher communities tracked enforcement challenges alongside policy:

  • The “Take It Down Act” is described as making revenge porn and AI deepfakes a federal crime, and the Senate is described as having passed a related bill unanimously .
  • South Korea is described as passing a 2024 law in response to deepfake pornographic videos of teachers and students, with prison terms of 5–7 years for creating/distributing and penalties for watching/possessing .

Misinformation literacy: AI-generated “pink slime” news

Tech & Learning described “pink slime journalism” as sites masquerading as local news while pushing an agenda, and reported Yale research in which just under half of participants preferred AI-generated fake local news sites over legitimate ones . Recommended responses include teaching students to check “About Us,” assess authorship and sourcing, and apply a cybersecurity-style skepticism to unfamiliar content .

Governance friction in practice: NYC votes down AI contracts

Chalkbeat reported that NYC’s Panel for Educational Policy repeatedly bucked City Hall in recent months, including voting down “millions worth of AI contracts” .


Theme 5 — Agents as a workforce skill: management, reusable skills, and new workspaces

“Programming in English” and the need for oversight

Andrej Karpathy described moving rapidly to a workflow of ~80% agent coding and ~20% manual edits, calling it the biggest change to his coding workflow in ~two decades . He also warned that current “agent swarm” hype is too much: models still make subtle conceptual errors and often run with wrong assumptions without seeking clarifications, requiring careful oversight in an IDE . He noted early signs of atrophy in manual code-generation ability (distinct from reading/reviewing) .

“Management as AI superpower” in higher ed entrepreneurship

In an experimental University of Pennsylvania executive MBA class, students built working startup prototypes from scratch in four days, using Claude Code and Google Antigravity for coding and ChatGPT/Claude/Gemini for idea generation, market research, pitching, and financial modeling . Mollick attributed much of the success to management skills—scoping problems, defining deliverables, and recognizing when outputs were off—turning “soft” skills into the hard ones .

Reusable “skills” for agents

Andrew Ng and DeepLearning.AI promoted a short course, “Agent Skills with Anthropic,” describing “skills” as structured folders of instructions that agents load on demand, designed to move workflow logic out of prompts and into reusable components . The course description highlights deploying across Claude.ai, Claude Code, the Claude API, and the Claude Agent SDK .
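
As a rough sketch of the “skills as folders” pattern, the loader below reads a SKILL.md file whose frontmatter names and describes the skill (the convention Anthropic documents); the folder layout and parsing here are simplified assumptions:

```python
# Minimal sketch of loading agent "skills" from structured folders.
# Layout assumed: skills/<skill-name>/SKILL.md with YAML-style frontmatter.
from pathlib import Path

def load_skill(folder: Path) -> dict[str, str]:
    """Read name/description frontmatter plus body instructions from SKILL.md."""
    text = (folder / "SKILL.md").read_text()
    meta: dict[str, str] = {}
    if text.startswith("---"):
        frontmatter = text.split("---", 2)[1]
        for line in frontmatter.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    meta["instructions"] = text.split("---", 2)[-1].strip()
    return meta

skills_root = Path("skills")  # hypothetical skills directory
if skills_root.exists():
    # An agent would scan this directory and load only what a task needs.
    for skill_dir in skills_root.iterdir():
        if (skill_dir / "SKILL.md").exists():
            skill = load_skill(skill_dir)
            print(skill.get("name"), "-", skill.get("description"))
```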

PLTW: treating AI as a “colleague” and building AI literacy into STEM pathways

Project Lead The Way described a one-semester high school course (“Principles of AI”) as the foundation of a four-pillar AI framework, covering AI/ML history, how data and LLMs work, and ethical reasoning . In the same conversation, PLTW described an organizational expectation of treating AI “as a colleague, as a team member,” while emphasizing judgment and ethical boundaries—especially for educator- and student-facing content .

Research workspaces also get “AI-native”

OpenAI introduced Prism, a free cloud-based LaTeX-native workspace “powered by GPT-5.2” for scientists to write and collaborate on research, with GPT-5.2 working inside projects with access to paper structure, equations, references, and surrounding context . Prism is described as removing version conflicts and setup overhead, and is available on the web for ChatGPT personal accounts (with Education plans “coming soon”) .


What This Means

  • For K–12 leaders: The “AI tutor” conversation is shifting from whether to use AI to how to design it—toward mastery systems with explicit guardrails and adult-led coaching, and away from unsupervised chatbots . At the same time, legal and reputational risk is rising (deepfakes, detection bias, data practices), making “human in the loop” governance and procurement questions practical requirements .

  • For higher ed and workforce learning: Simulations and role-plays are emerging as high-leverage use cases—but only where privacy and safety requirements can be met (HIPAA “castle walls,” classroom containment, and testing) .

  • For product builders and investors: The learning tradeoff in AI assistance is now harder to ignore: tools that help people finish faster may reduce understanding unless they’re built to elicit conceptual questions and reflection . Features that produce practice + insight loops (full-length tests with feedback; classroom flashcard analytics) are one concrete path to value .

  • For learners: Expect “AI literacy” to look less like memorizing prompts and more like building the habit of verification, asking clarifying questions, and treating AI output as draft work that needs judgment and editing .


Watch This Space

  • Learning-first AI design: whether more products adopt patterns that push learners to ask clarifying/conceptual questions (instead of “answer now”), reflecting the Anthropic study’s high-performer behavior .

  • Standardized test prep inside general AI assistants: Gemini’s full-length SAT/JEE tests suggest “assessment-as-a-feature” will spread beyond dedicated test-prep platforms .

  • Deepfake enforcement vs. school reality: policy is tightening, but teacher discussions point to prosecution and enforcement gaps in practice .

  • Simulation ecosystems: medical, nursing, and public health examples are converging on a theme—AI can simulate scenarios, but educators still need the debrief, verification, and judgment layer .

  • Agent skills as the new professional development layer: reusable skills, structured workflows, and “AI as colleague” expectations are turning into training products and curricula (from PLTW to short courses to MBA classes) .

AI embeds into core school workflows as safety rules and assessment redesign accelerate
Jan 26
8 min read
1913 docs
Austen Allred
Andrew Ng
Anthropic
+8
This week’s developments show AI shifting from “chatbots in class” to platform-native workflows, new safety and policy constraints, and assessment redesign. We highlight what’s changing in Google Classroom, Microsoft’s new Learning Zone, child-safety guardrails, and where evidence of impact is starting to surface.

The lead: AI is being embedded into the “systems of school” (not just used as a chatbot)

This week’s clearest signal isn’t a new model—it’s where AI shows up.

  • Google is pushing AI deeper into the tools teachers already use (Google Classroom + Gemini) with admin controls and privacy commitments .
  • Microsoft is shipping an on-device lesson builder (Learning Zone) tied to Copilot+ PCs, plus assignment, reporting, and content libraries .
  • “Non-classroom” applications—like master scheduling optimization—are being positioned as high-leverage AI because they shape student experience without students interacting with AI directly .

Theme 1 — Platform-native AI: planning, differentiation, and content generation inside existing workflows

Google Classroom + Gemini: from prompts to pre-built teacher tools

A teacher-facing walkthrough reports 28 pre-prompted AI tools inside Google Classroom’s Gemini dashboard, organized by planning, instructional materials, assessments, student support, and administrative tasks . Examples include outlining lesson plans, releveling text, generating quizzes, drafting newsletters, and creating PD plans .

Two features that stood out in coverage:

  • Classroom context inside Gemini: educators can connect Google Classroom to the Gemini app so Gemini can reference class roster, assignments, and grades when helping adjust lessons .
  • Audio lessons: Google described a Classroom feature that turns content into a student–teacher dialogue audio lesson designed to go deeper into misconceptions (distinct from podcast-style audio) .

Google also described Gemini for Education as free access to its “highest-end reasoning model” for Google for Education customers, with a data-protection claim that student data isn’t used for training .

A separate pilot example cited up to 10 hours/week of time savings for educators in Northern Ireland after rolling out Gemini .

Limitations to keep in mind: a teacher blog explicitly frames outputs as drafts—useful for skipping the blank page, but still requiring review and editing .

Microsoft Learning Zone: on-device lesson generation + classroom analytics

Microsoft introduced Learning Zone, an AI-powered app for Copilot+ PCs that uses a local small language model to generate interactive lessons in minutes .

Key workflow pieces from the demo:

  • Grounding with sources: teachers can upload Word/PDF files, attach OneDrive files, or use vetted resources such as OpenStax .
  • Editability: lessons generate as a mix of content and practice “slides,” but teachers can edit slides, add question types, generate distractors, and simplify language .
  • Assignment + LMS hooks: share via join code/link/QR and share into Teams assignments or Google Classroom; Microsoft also said Learning Zone lesson attachment in Teams/LTI is being worked on for spring 2026 .
  • Reports: per-lesson and per-student performance insights (e.g., % correct, time), drill-down by exercise type, and identification of students needing support .

Theme 2 — Safety, privacy, and regulation: guardrails are becoming product requirements

Two policy “fronts” reshaping AI + edtech

Edtech Insiders highlighted two broad vectors:

  • California AI + minors: OpenAI and Common Sense Media announced plans for the “Parents & Kids Safe AI Act,” including age assurance, a ban on targeted advertising to minors, limits on sharing children’s data without parental consent, and content safeguards against harmful AI content. The piece notes these rules would also apply to AI-powered educational tools . Enforcement would flow through the Attorney General and financial penalties (moving away from a private right of action) .
  • Screentime scrutiny spilling into edtech: an NTIA inquiry is questioning whether federal subsidies are pushing schools toward more screens without evidence of learning benefit . The same article links a broader political trend (e.g., Kids Off Social Media Act proposals) to increasing regulation of what happens on school-issued devices .

“Red lines” in product design: LEGO Education’s stance on generative AI

LEGO Education described “red lines” for bringing AI into classrooms:

  • Generative AI tools may be made safer, but they “cannot be guaranteed to be safe,” so LEGO Education will not bring them into classrooms until that gap can be closed .
  • They avoid anthropomorphizing AI (no faces/names; not describing AI as creative) .
  • They emphasize local processing and keeping child data from leaving the classroom or being transmitted over the internet .

Procurement reality: compliance-first vs “public” tools

A SchoolAI community manager emphasized COPPA/FERPA compliance, stating SchoolAI does not use student data to train models or sell data . The same comment warns that using public-facing tools with student identifying information (e.g., student names for a seating chart) can break federal law .


Theme 3 — Assessment is being rethought for an AI era (and vendors are rushing in)

BETT: assessment shifts from “pattern recognition” toward what’s harder to test

At BETT, one discussion argued that the most valuable things are becoming harder to assess, while the least valuable are increasingly easy for AI (pattern recognition). The takeaway was the need to “measure what we treasure” .

In the same coverage, Vicki Merrick described pilots using machine learning-enabled comparative judgment (holistic pairwise comparisons by teacher judges) for more reliable assessment of subjective Key Stage 3 work. In one pilot: 40 judges assessed 2,000 Year 7 art items across 14 academies in less than an hour and achieved a 0.89 reliability score. Teachers reported greater confidence because their judgments were one of many .
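
Comparative judgment typically turns many pairwise “which is better?” decisions into a scale by fitting a Bradley-Terry model, which is the generic technique sketched below on invented judgments; it is not the vendor’s actual pipeline:

```python
# Generic Bradley-Terry fit (minorization-maximization updates) over
# pairwise judgments. The judges' decisions below are invented.
from collections import defaultdict

def bradley_terry(pairs: list[tuple[str, str]], items: set[str], iters: int = 200) -> dict[str, float]:
    """pairs: (winner, loser) judgments. Returns a strength score per item."""
    strength = {i: 1.0 for i in items}
    wins = defaultdict(int)
    for winner, _ in pairs:
        wins[winner] += 1
    for _ in range(iters):
        new = {}
        for i in items:
            denom = 0.0
            for winner, loser in pairs:
                if i in (winner, loser):
                    other = loser if i == winner else winner
                    denom += 1.0 / (strength[i] + strength[other])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())  # renormalize to keep scores comparable
        strength = {i: s * len(items) / total for i, s in new.items()}
    return strength

judgments = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
print(bradley_terry(judgments, items={"A", "B", "C"}))
```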

AI tooling for assessment creation + feedback loops

  • Kahoot AI Generator: creates quizzes from prompts, slides, or PDFs, with “over 13” question types and modes like Accuracy Mode (points for correctness rather than speed) . Kahoot also cited “almost 200 independent research studies” and claimed grades improve by a letter grade on an average test .
  • Red Pen AI: a formative assessment workflow that starts with uploading photos of handwritten student work, identifies urgent curriculum gaps, generates editable feedback, and tracks class progress on a dashboard—aiming to reduce teacher workload without requiring 1:1 devices .
  • Teacher feedback prompts: Monica Burns shared copy-and-paste prompts to draft student-friendly feedback faster, emphasizing drafting + revising with professional judgment .

Theme 4 — AI literacy is shifting toward fundamentals, agency, and durable human skills

From “how to use AI” to “how to understand and judge it”

  • LEGO Education’s new Computer Science and AI product line aims to teach AI/CS/robotics fundamentals from kindergarten, including probability, statistics, machine representation, and algorithmic bias—explicitly pushing away from “throwing conversational chat bots in front of children” .
  • The National Literacy Trust launched a “National Year of Reading” campaign after reporting that only 1 in 3 children said they like reading and 1 in 5 read every day in a survey of 17,000 children . In the same coverage, an expert argued literacy becomes more important in an AI-driven world because students must write accurate prompts and evaluate whether AI output is accurate and truthful .

Evidence emerging on what teachers actually build with AI

A SchoolAI study analyzing 23,000 teacher-created AI learning experiences reports that over 75% were anchored in core curriculum and designed to prompt students to reason, evaluate, and make decisions—not just recall information .

“Human skills” framing is hardening into leadership language

A Tech & Learning piece proposed the C.A.R.E.S. framework (cultural competence, adaptability, relationships, ethical judgment, scholarly discernment) as the “irreplaceable” human core as AI drafts lessons, analyzes student work, and generates feedback .


Theme 5 — Upskilling is scaling: educators, engineers, and whole workforces

  • Anthropic + Teach For All: a partnership to bring AI training to educators in 63 countries, enabling teachers serving over 1.5 million students to use Claude for curriculum planning, customized assignments, and tool-building, and to provide feedback to shape Claude’s evolution .
  • Gauntlet AI: positions itself as free immersive training for engineers (travel to Austin plus covered housing/food) with employer matching for $200k+ roles; it states participants never pay under any circumstances .
  • Gemini CLI training: Andrew Ng promoted a DeepLearning.AI short course on Gemini CLI (an open-source agent) focused on multi-step workflows from the terminal, including orchestrating tools via MCP and automating coding tasks .

What This Means

  1. For K–12 and district leaders: AI adoption is accelerating where it can be governed—inside platforms with admin controls, protected data terms, and teacher-facing “draft” workflows (e.g., Classroom+Gemini, Learning Zone). Expect procurement to increasingly center on privacy posture and control surfaces, not feature lists .

  2. For assessment and curriculum teams: “AI in assessment” is splitting into two lanes: (a) automating creation and feedback loops (quizzes, formative feedback), and (b) redesigning what’s assessed (contextual, non-deterministic work) using methods like comparative judgment .

  3. For edtech builders and investors: The regulatory environment is converging on child-focused requirements (age assurance, data sharing constraints, content safeguards) that will apply to AI edtech, not just social platforms . Products with explicit safety “red lines” and local processing claims (or on-device models) may gain advantage in K–12 contexts .

  4. For learners and L&D professionals: The “AI capability” gap is widening—multiple sources frame value as judgment, curation, and the ability to evaluate output quality (not just generating text quickly) .


Watch This Space

  • Age assurance + AI edtech compliance: whether California’s proposed standards become a de facto requirement for AI products used by minors .
  • On-device education AI: tools that rely on local models (e.g., Copilot+ PC workflows) as a response to privacy, cost, and offline constraints .
  • Assessment redesign at scale: comparative judgment pilots and other methods that claim reliable evaluation of subjective work without over-indexing on what AI can do best .
  • AI literacy as fundamentals + agency: product lines and curricula that emphasize how systems work (and how to judge them) rather than putting chatbots “in front of children” .
  • Training models for the AI workforce: partnerships and “free training + job outcomes” models expanding across educators and engineers .