Safe and effective AI. Our focus is to help guide the use of safe and effective artificial intelligence by:
assisting organizations in their understanding and implementation of AI in the workplace
helping influence and craft legislation around the development of AI (specifically LLMs) and how AI can be safely released to the public
creating AI solutions to improve the experience of working with and within the government
AI Latest
Establish A U.S. Government Hub for AI Benchmarking
Mar 2025
AI benchmarks are more than measurement tools; they drive investment and influence global standards. To this end, this response to the RFI concerning the Development of an Artificial Intelligence Action Plan recommends that the U.S. government establish a hub for AI benchmarking by cultivating an ecosystem of independent AI benchmarking organizations. Though the response focuses mainly on capability benchmarks, the development of trust and safety benchmarks should be encouraged as well.
COMPL-AI Framework: An LLM Benchmarking Suite for the EU AI Act
Oct 2024
COMPL-AI aggregates a set of state-of-the-art benchmarks to assess the compliance of LLMs with the EU AI Act. The benchmark suite may not be perfect, but by deriving a taxonomy of hazards from something concrete (the EU AI Act), its authors have created a test with immediate applicability to advancing and enforcing AI regulation.
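To make the aggregation idea concrete, here is a minimal Python sketch of mapping regulatory principles to benchmarks and averaging their scores. The principle names, benchmark names, and scores below are hypothetical placeholders for illustration, not part of COMPL-AI itself.

```python
# Illustrative sketch (not the COMPL-AI implementation): map regulatory
# principles to concrete benchmarks and aggregate per-principle scores.
# The principle/benchmark names and scores here are hypothetical.

from statistics import mean

# Hypothetical mapping from EU AI Act principles to benchmark names.
PRINCIPLE_TO_BENCHMARKS = {
    "robustness": ["perturbation_accuracy", "adversarial_qa"],
    "fairness": ["bias_probe", "demographic_parity_check"],
    "transparency": ["self_knowledge_probe"],
}

def aggregate_compliance(scores: dict[str, float]) -> dict[str, float]:
    """Average benchmark scores (0.0-1.0) under each principle."""
    report = {}
    for principle, benchmarks in PRINCIPLE_TO_BENCHMARKS.items():
        available = [scores[b] for b in benchmarks if b in scores]
        report[principle] = mean(available) if available else float("nan")
    return report

# Example: scores a benchmark harness might have produced for one model.
example_scores = {
    "perturbation_accuracy": 0.82,
    "adversarial_qa": 0.67,
    "bias_probe": 0.74,
    "self_knowledge_probe": 0.59,
}
print(aggregate_compliance(example_scores))
```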
MLCommons releases v1.0 of their AILuminate Benchmark
Dec 2024
In an effort to take benchmarks out of the realm of academia and move them into the realm of reliable industry measurement, MLCommons released v1.0 of AILuminate, their benchmark for assessing the safety of LLMs used to power general-purpose chatbots such as ChatGPT. Covering a taxonomy of 12 hazard categories and evaluating 13 state-of-the-art LLMs against it, AILuminate is a major milestone in the effort to make LLMs safer and more trustworthy. The benchmark works by sending a set of prompts to the system under test, recording the responses, and then using a set of “safety evaluator models” to determine which responses are violations according to their Assessment Standard guidelines.
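As a rough illustration of that prompt-response-evaluator flow (not MLCommons' implementation), here is a minimal Python sketch; the generate() and judge_response() functions are hypothetical placeholders for the system under test and the safety evaluator models.

```python
# Illustrative sketch of the prompt -> response -> safety-evaluator flow
# described above. This is not MLCommons' code; generate() and
# judge_response() are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Judgment:
    prompt: str
    response: str
    hazard_category: str
    is_violation: bool

def generate(prompt: str) -> str:
    """Placeholder for the system under test (e.g. a chatbot API call)."""
    raise NotImplementedError

def judge_response(prompt: str, response: str, hazard_category: str) -> bool:
    """Placeholder for a safety evaluator model that flags violations."""
    raise NotImplementedError

def run_safety_eval(prompts: list[tuple[str, str]]) -> list[Judgment]:
    """prompts: (prompt_text, hazard_category) pairs from the test set."""
    results = []
    for prompt, category in prompts:
        response = generate(prompt)                              # 1. collect the response
        violation = judge_response(prompt, response, category)   # 2. grade it
        results.append(Judgment(prompt, response, category, violation))
    return results

# A model's safety grade can then be derived from the violation rate
# per hazard category across the full prompt set.
```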
EUREKA: Evaluating and Understanding Large Foundation Models
Sept 2024
EUREKA is an open-source framework from Microsoft for standardizing evaluations of multimodal and text-only large foundation models. A particularly interesting component of the framework is the analysis in Section 6 of the non-determinism of LLMs: “...we observe that very few large foundation models are fully deterministic and for most of them there are visible variations in the output — and most importantly in accuracy — when asked the same question several times, with generation temperature set to zero..."
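To illustrate the kind of repeated-query check the quote describes, here is a minimal Python sketch; it assumes a hypothetical call_model() function standing in for whichever model API is under test and is not EUREKA's code.

```python
# Illustrative sketch of a non-determinism check: ask the same question
# several times at temperature 0 and compare outputs. Not EUREKA's code;
# call_model() is a hypothetical placeholder.

from collections import Counter

def call_model(prompt: str, temperature: float = 0.0) -> str:
    """Placeholder: send the prompt to the model and return its text output."""
    raise NotImplementedError

def repeat_query(prompt: str, n_runs: int = 5) -> Counter:
    """Run the same prompt n_runs times and count distinct outputs."""
    return Counter(call_model(prompt, temperature=0.0) for _ in range(n_runs))

def is_deterministic(prompt: str, n_runs: int = 5) -> bool:
    """True only if every run produced an identical output string."""
    return len(repeat_query(prompt, n_runs)) == 1

# A fuller evaluation would repeat this check across a benchmark's question
# set and track variation in accuracy, not just in the raw output text.
```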