T-Twice — Informal Test Observations

01

Dataset currently available

Small sample, real product traces.

The POC database gives early usage signals. These numbers are useful for product validation, but they are not statistical evidence of learning impact.

60

Logged sessions

Student sessions recorded in the POC database.

284

Messages

143 student/user messages and 141 assistant replies.

55

Reasoning events

Recorded error or reasoning classifications.

19

Keyboard conversions

Natural-language-to-math notation events.
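
As a quick consistency check on the figures above, the short sketch below recomputes the message total and a few per-session averages. The averages are descriptive only and assume each logged session counts once; the variable names are illustrative and do not come from the POC database.

```python
# Figures reported in the POC summary above.
sessions = 60
student_messages = 143
assistant_replies = 141
total_messages = 284
reasoning_events = 55
keyboard_conversions = 19

# The two message counts should add up to the reported total.
assert student_messages + assistant_replies == total_messages

# Simple per-session averages (descriptive only, not evidence of impact).
messages_per_session = total_messages / sessions
events_per_session = reasoning_events / sessions
conversions_per_session = keyboard_conversions / sessions

print(f"messages per session: {messages_per_session:.2f}")
print(f"reasoning events per session: {events_per_session:.2f}")
print(f"keyboard conversions per session: {conversions_per_session:.2f}")
```

With 60 sessions, these averages come out at roughly 4.7 messages, 0.9 reasoning events and 0.3 keyboard conversions per session, which is consistent with reading the dataset as a small sample.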

Scope of the POC

The environment includes four calibrated mathematical levels and several demo or anonymised student identifiers. It was built to test the interaction model: students write reasoning first, T-Twice diagnoses the reasoning, then asks a targeted question.

Terminale: Introductory, intuitive

Licence 1: Intermediate, broad support

Master 1: Recurrence, proof training

Master 2: Formal rigor
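
The interaction model described above (reasoning first, then a diagnosis, then one targeted question) can be sketched as follows. This is a minimal illustration under stated assumptions, not the T-Twice implementation: `Level`, `Diagnosis`, `diagnose` and `next_turn` are hypothetical names, and the keyword check stands in for whatever diagnosis the POC actually performs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    # The four calibrated levels listed above.
    TERMINALE = "Terminale"
    LICENCE_1 = "Licence 1"
    MASTER_1 = "Master 1"
    MASTER_2 = "Master 2"


@dataclass
class Diagnosis:
    category: str  # a category name from this document, e.g. "Missing hypothesis"
    detail: str    # short note on where the reasoning breaks


def diagnose(reasoning: str) -> Optional[Diagnosis]:
    """Hypothetical stand-in for the diagnosis step: a toy keyword check only."""
    if "continuous" not in reasoning.lower():
        return Diagnosis("Missing hypothesis", "continuity is never stated")
    return None


def next_turn(level: Level, reasoning: str) -> str:
    """Return one targeted question about the student's reasoning, never the answer."""
    diagnosis = diagnose(reasoning)
    if diagnosis is None:
        return f"[{level.value}] Which step of your argument would you justify further?"
    return (
        f"[{level.value}] {diagnosis.category}: "
        "which hypothesis lets you apply this theorem here?"
    )


print(next_turn(Level.LICENCE_1, "By the intermediate value theorem, f has a root."))
```

The point of the sketch is the ordering: the student's written reasoning is the input, and the output is a question about one weakness rather than a corrected solution.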

Ethical reading of the data

The profiles stored in the POC are learning-behaviour profiles, not medical diagnoses. They indicate how students interact with mathematical reasoning, writing, notation and proof structure.

The data should be read as qualitative design evidence — not as proof that the tool improves grades or long-term learning outcomes.

02

Recorded error patterns

The problem was not only “wrong answers”.

The most frequent events were about writing, hypotheses, theorem use, and rigor. This supports a core T-Twice idea: reasoning failure needs a finer vocabulary than right or wrong.

Top recorded reasoning/error categories

Poor writing: 14
Missing hypothesis: 8
Theorem misuse: 7
Definition confusion: 5
Logical jump: 5
Case forgotten: 4
Calculation error: 3
Quantifier error: 3
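
To put these counts in proportion, the sketch below computes each category's share of the 55 recorded reasoning events. It assumes that the 55 events reported earlier are the right denominator and that each event carries exactly one category; the source does not state either point explicitly.

```python
reasoning_events = 55  # total reported in the dataset summary

top_categories = {
    "Poor writing": 14,
    "Missing hypothesis": 8,
    "Theorem misuse": 7,
    "Definition confusion": 5,
    "Logical jump": 5,
    "Case forgotten": 4,
    "Calculation error": 3,
    "Quantifier error": 3,
}

listed = sum(top_categories.values())   # 49 events fall in the top categories
unlisted = reasoning_events - listed    # 6 events fall outside this list

for name, count in top_categories.items():
    share = 100 * count / reasoning_events
    print(f"{name:<22}{count:>3}  {share:4.1f}%")
print(f"{'Other (not listed)':<22}{unlisted:>3}")
```

Under those assumptions, poor writing accounts for about a quarter of all recorded events, and the eight listed categories cover 49 of the 55.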

Interpretation

The largest category was poor mathematical writing. That does not mean students were incapable of understanding the mathematics. It means the POC repeatedly surfaced a gap between partial understanding and rigorous expression.

This supports the need for guided proof structure, contextual notation support, natural-language conversion, and a clear separation between reasoning quality and writing quality.

In mathematics, students may fail not because they do not think, but because they cannot yet formalise what they think.

03

Observed learning-behaviour profiles

Patterns became visible over time.

The POC stores profile categories that describe learning behaviour. They are not medical labels and should not be interpreted as diagnoses.

Profile

Intuitive but not rigorous

Good mathematical intuition, but insufficient formal precision in writing.

Profile

Blocked in writing

Conceptual understanding appears stronger than proof production.

Profile

Procedural, weakly conceptual

Procedures are applied before hypotheses are understood.

Profile

Formal logic difficulties

Frequent confusion around quantifiers, implications or equivalences.

Profile

Not enough data

Some students did not have enough repeated events for a profile.
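
One way to read the "Not enough data" category is as a minimum number of repeated events before any profile is assigned. The sketch below illustrates that idea only. The threshold `MIN_EVENTS` and the mapping from error categories to profiles are hypothetical, since the source does not describe how the POC derives profiles.

```python
from collections import Counter

MIN_EVENTS = 5  # hypothetical threshold; the source does not give one

# Hypothetical mapping from a student's dominant error category to a profile name.
PROFILE_BY_CATEGORY = {
    "Poor writing": "Blocked in writing",
    "Missing hypothesis": "Procedural, weakly conceptual",
    "Quantifier error": "Formal logic difficulties",
    "Logical jump": "Intuitive but not rigorous",
}


def profile_for(events: list[str]) -> str:
    """Return a learning-behaviour profile, or 'Not enough data' below the threshold."""
    if len(events) < MIN_EVENTS:
        return "Not enough data"
    dominant, _ = Counter(events).most_common(1)[0]
    # Categories without a mapped profile also fall back to 'Not enough data'.
    return PROFILE_BY_CATEGORY.get(dominant, "Not enough data")


print(profile_for(["Poor writing", "Poor writing"]))
print(profile_for(["Quantifier error"] * 4 + ["Poor writing"] * 2))
```

Keeping an explicit "Not enough data" outcome avoids assigning a profile from one or two isolated events.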

Central product insight

T-Twice does not need to label the student. It needs to identify the friction in the reasoning process and reduce it.

04

What we observed

Early qualitative signals worth testing further.

These are design observations from the POC. They are useful because they show where the product creates a different learning behaviour from answer-giving AI.

Observation 01

Students stayed inside the reasoning process longer.

Partial feedback such as “your reasoning is 80% correct” reopened the exercise instead of closing it. Students were pushed to inspect one precise weakness rather than receive the final answer.

Observation 02

Socratic questioning changed behaviour.

Targeted questions encouraged rereading, correcting a missing hypothesis, identifying a forgotten case, and reformulating the proof more rigorously.

Observation 03

The main friction was not always conceptual.

Many blocks appeared at the level of formalisation, notation, proof structure or writing effort — not only at the level of mathematical understanding.

Observation 04

The keyboard became part of cognition.

Natural-language-to-symbol conversion reduced input friction. The keyboard is not only an interface feature; it protects the continuity of mathematical thought.
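
As an illustration of what natural-language-to-symbol conversion can look like, here is a minimal phrase-substitution sketch. It is not the T-Twice keyboard logic: the phrase table and the `to_notation` function are assumptions made for this example, and a real converter would need to handle context and ambiguity.

```python
import re

# Illustrative phrase table; not the T-Twice conversion rules.
PHRASES = {
    "for all": "∀",
    "there exists": "∃",
    "belongs to": "∈",
    "implies": "⇒",
    "if and only if": "⇔",
    "less than or equal to": "≤",
    "the real numbers": "ℝ",
}


def to_notation(text: str) -> str:
    """Replace known phrases with symbols, longest phrases first."""
    for phrase in sorted(PHRASES, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, PHRASES[phrase], text, flags=re.IGNORECASE)
    return text


print(to_notation("for all x belongs to the real numbers, there exists y"))
# prints: ∀ x ∈ ℝ, ∃ y
```

Even in this toy form, the student types words in the order they think them and the notation appears without a symbol palette, which is the input friction this observation refers to.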

Observation 05

Teacher calibration changed the environment.

The POC includes calibrated classes, professor comments and professor corrections. This supports the positioning of T-Twice as a teacher-aligned learning space rather than a standalone chatbot.

Observation 06

Recognition mattered more than gamification.

The strongest engagement signal was not badges or points. It was the student seeing how they reason, where they repeatedly block, and what they can improve.

05

Implemented vs planned

What is testable in the POC, and what remains roadmap.

This section separates live, inspectable product work from documented concepts that are not yet implemented as working POC features.

Implemented and testable

Engagement systems

The core engagement systems are implemented in the POC and can be inspected through the live platform and logged interactions.

Student engagement: Socratic dialogue, partial feedback, reasoning-first interaction.
Teacher engagement: calibrated classes, professor dashboard logic, comments and corrections.
Interface and keyboard engagement: natural-language-to-symbol conversion and reduced notation friction.
Reasoning/error tracking: stored categories, sessions, messages and learning-behaviour profiles.

Documented, not yet implemented

Neuroadaptive system

The neuroadaptive system is designed and documented, but it is not yet implemented as an automated real-time adaptation engine in the current POC.

Automatic adaptation for dys profiles (dyslexia, dyspraxia, dyscalculia and related learning differences), attention and cognitive friction remains planned.
Real-time adjustment of interface, rhythm and modality remains roadmap work.
Dys and neurodivergent effectiveness still requires dedicated validation.
The POC currently shows the logic and data traces needed to build this layer, not the finished adaptive engine.

Therefore: the engagement architecture is implemented and testable; the neuroadaptive architecture is documented and planned. This distinction is intentional, so the submission does not overclaim what the POC currently proves.

06

Honest limits

What this document does not claim.

T-Twice has promising qualitative signals, not proof of impact at scale. This section states what the POC suggests and what it cannot yet support.

What the POC suggests

Refusing immediate answers can work when the feedback is well designed.
Reasoning errors can be classified more precisely than right or wrong.
Writing and notation are major cognitive frictions in mathematics.
Longitudinal patterns can reveal learning difficulties that stay hidden in isolated answers.
Teacher calibration can make AI tutoring closer to the real course context.

What we cannot conclude

No statistically significant learning improvement can be claimed.
No exam performance gain has been demonstrated yet.
No long-term retention gain has been measured.
No validated causal effect on dys or neurodivergent profiles is claimed.
No superiority over other AI tutoring systems has been proven.

07

Next validation milestone

From promising traces to rigorous evidence.

Large-scale empirical validation is one of the first roadmap milestones once the project is funded.

I

Controlled comparison

Compare T-Twice against generic AI and no-AI conditions with pre-defined outcome measures.

II

Longitudinal tracking

Measure whether reasoning quality improves across several weeks and whether AI dependency decreases.

III

Independent review

Run the evaluation with academic supervision and publish results regardless of outcome.

Current position

T-Twice is not yet a proven large-scale educational intervention. It is a working proof of concept with early qualitative evidence suggesting that AI can be designed to protect reasoning instead of replacing it.

The purpose of this document is transparency: to distinguish clearly between what is implemented, what has been observed, and what remains to be scientifically validated.