GRID+: a structured framework for multi-dimensional research evaluation

A framework for transforming qualitative prompts into structured, repeatable evaluation pipelines.

Purpose

GRID+ addresses the instability and opacity of prompt-based evaluation using large language models. Prompt-driven assessments are often difficult to reproduce, compare, or audit, and can be highly sensitive to phrasing and context. Existing evaluation approaches tend to rely either on rigid quantitative indicators or on loosely specified qualitative judgement. GRID+ is designed as a middle layer between these extremes: a structured, multi-dimensional framework that preserves interpretability while improving consistency and reuse.

Structure

Dimension	Description
G	General characteristics of the research output, such as scope, type, or positioning
R	Research quality and rigour, including methods, validity, and robustness
I	Influence or impact, including scholarly, societal, or translational reach
D	Directionality or novelty, including originality, risk, or divergence from existing work
+	Extension layer for context-specific dimensions, adjustments, or refinements

The framework is modular. The four core dimensions provide a baseline structure, while the extension layer allows adaptation to domain, dataset, or evaluation objective.
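One way to picture this modularity is a record with a fixed core and an open-ended extension slot. The sketch below is illustrative only: the class name `GridProfile`, its fields, and the `open_data` extension are assumptions, not part of any published GRID+ implementation.

```python
from dataclasses import dataclass, field

# Core dimension keys and short labels, taken from the table above.
CORE_DIMENSIONS = {
    "G": "General characteristics",
    "R": "Research quality and rigour",
    "I": "Influence or impact",
    "D": "Directionality or novelty",
}


@dataclass
class GridProfile:
    """Hypothetical container for one output's assessments."""
    output_id: str
    core: dict[str, str] = field(default_factory=dict)        # keyed by G, R, I, D
    extensions: dict[str, str] = field(default_factory=dict)  # the "+" layer

    def missing_core(self) -> list[str]:
        """Return the core dimensions that have not been assessed yet."""
        return [key for key in CORE_DIMENSIONS if key not in self.core]


profile = GridProfile(output_id="Paper A")
profile.core["G"] = "Broad interdisciplinary scope"
# Hypothetical context-specific dimension added through the extension layer.
profile.extensions["open_data"] = "Dataset deposited in a public repository"
print(profile.missing_core())  # ['R', 'I', 'D']
```

Keeping extensions in a separate mapping means the baseline structure stays comparable across domains even when the extension layer differs.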

From prompt to pipeline

GRID+ formalises a shift from single, unstructured prompts to decomposed evaluation workflows. Rather than asking a model to produce a holistic judgement in one step, evaluation is distributed across explicit dimensions, each treated as a separate analytical operation. This creates a pipeline in which reasoning is segmented, outputs are structured, and intermediate assessments can be inspected, compared, and reused. The result is a system that is less sensitive to prompt variation and better suited to reproducible evaluation.
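The decomposition described above can be sketched as one prompt per dimension instead of a single holistic prompt. The instruction wording and the function name `build_dimension_prompts` are assumptions made for illustration; the actual GRID+ prompts are not public.

```python
# Hypothetical per-dimension instructions, paraphrased from the dimension table.
DIMENSION_INSTRUCTIONS = {
    "G": "Describe the scope, type, and positioning of this research output.",
    "R": "Assess the methods, validity, and robustness of this research output.",
    "I": "Assess the scholarly, societal, or translational reach of this research output.",
    "D": "Assess the originality, risk, and divergence from existing work.",
}


def build_dimension_prompts(text: str) -> dict[str, str]:
    """Return one self-contained prompt per dimension for a single output."""
    return {
        key: f"{instruction}\n\nResearch output:\n{text}"
        for key, instruction in DIMENSION_INSTRUCTIONS.items()
    }


prompts = build_dimension_prompts("Example abstract text.")
for key, prompt in prompts.items():
    print(f"--- {key} ---\n{prompt}\n")
```

Because each prompt is built independently, a change to one dimension's wording leaves the other three untouched, which is what makes the intermediate assessments separately inspectable.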

How it is applied

Application follows a staged process. A set of research outputs is first selected, such as publications, projects, or other research artefacts. Each output is then assessed across the GRID+ dimensions using prompts or instructions defined for each dimension rather than for the object as a whole. The results are captured in a structured format, producing a multi-dimensional profile for each item. These profiles can then be compared, aggregated, or analysed further depending on the evaluation objective.
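The staged process above can be sketched as a small loop. This is a minimal sketch under stated assumptions: `assess_outputs` and `stub_evaluate` are hypothetical names, and the stub stands in for whatever model call an actual pipeline would make.

```python
import json
from typing import Callable

DIMENSIONS = ["G", "R", "I", "D"]


def assess_outputs(
    outputs: dict[str, str],
    evaluate: Callable[[str, str], str],
) -> list[dict[str, str]]:
    """Assess every output on every dimension; return one profile per output."""
    profiles = []
    for output_id, text in outputs.items():      # stage 1: the selected outputs
        profile = {"output": output_id}
        for dimension in DIMENSIONS:             # stage 2: one assessment per dimension
            profile[dimension] = evaluate(dimension, text)
        profiles.append(profile)                 # stage 3: structured capture
    return profiles


def stub_evaluate(dimension: str, text: str) -> str:
    """Placeholder for a model call; returns a traceable dummy assessment."""
    return f"[{dimension}] assessment of {len(text.split())}-word text"


selected = {
    "Paper A": "Abstract of paper A",
    "Paper B": "A longer abstract of paper B",
}
profiles = assess_outputs(selected, stub_evaluate)
print(json.dumps(profiles, indent=2))
```

Passing the evaluator in as a callable keeps the staging logic independent of any particular model or prompt set, so the same loop can be rerun with a different evaluator and the resulting profiles compared.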

Worked example

Output	G	R	I	D
Paper A	Broad interdisciplinary scope	Strong methodological clarity	Moderate citation uptake	High conceptual novelty
Paper B	Narrow domain focus	Moderate rigour	High policy relevance	Low novelty

This example is simplified and does not represent a full implementation.

Interpretation

GRID+ outputs are inherently multi-dimensional. They are not intended to be collapsed into a single composite score. Instead, they support comparative reasoning across dimensions, making differences in profile explicit, such as high novelty with limited uptake, or strong influence with lower originality. This allows evaluation to remain differentiated rather than flattened into aggregate indicators.
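A dimension-by-dimension comparison can be sketched without computing any composite score. The profiles below are copied from the worked example; the function name `compare_profiles` is an assumption made for illustration.

```python
# Profiles copied from the worked example table.
PAPER_A = {
    "G": "Broad interdisciplinary scope",
    "R": "Strong methodological clarity",
    "I": "Moderate citation uptake",
    "D": "High conceptual novelty",
}
PAPER_B = {
    "G": "Narrow domain focus",
    "R": "Moderate rigour",
    "I": "High policy relevance",
    "D": "Low novelty",
}


def compare_profiles(
    left: dict[str, str], right: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Pair the two assessments for each shared dimension, in left's order."""
    return [(dim, left[dim], right[dim]) for dim in left if dim in right]


for dim, a, b in compare_profiles(PAPER_A, PAPER_B):
    print(f"{dim}: {a}  |  {b}")
```

The comparison returns one row per dimension and never sums or averages across them, so a contrast such as high novelty against high policy relevance stays visible.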

Scope and limits

The framework depends on the quality and completeness of the underlying data. Dimension definitions introduce an element of interpretation, especially for concepts such as novelty, influence, or directionality. Results may not transfer directly across domains without adaptation through the extension layer. Decomposing evaluation into dimensions improves clarity, but it can also reduce coherence if interactions between dimensions are ignored.

Implementation status

Full scoring logic, pipelines, and operational details are not public.

Related work

GRID+ is related to broader efforts to structure LLM-based reasoning into explicit workflows rather than single prompts. It also connects to work on AI-assisted metascience, evaluation systems, and validation tools such as AVA, where assessment is decomposed into inspectable stages.

Depth

overview: available
methodological detail: partial
full protocol and implementation: not public
