The proposal introduces Neural Seal, an AI transparency solution: a standardized “nutrition facts” labeling framework that informs users of the level of AI involvement in products and services. Visibility into AI involvement is limited yet necessary in domains like finance, healthcare, and social media, and Neural Seal aims to fill that gap with a universal labeling standard for AI transparency. The solution is a standardized structured evaluation: companies complete a multi-step questionnaire on their level of AI usage, which generates a color-coded rating (A/B/C/D) that conveys AI involvement in a digestible manner. The team showcases a demo build of the questionnaire and mockups of the labeling system, along with a one-year plan for building a prototype, integrating interpretability techniques, and driving widespread adoption of the standard.
I think this is a great idea! The proposal highlights an important need, and the team understands that the consumer-facing side of the product requires simple, intuitive metrics and ratings. Some things to consider that could greatly strengthen the proposal:
1. Different industries and use cases could be classified as low/medium/high risk and might require different rubrics to assess the impact of AI involvement.
2. Guidelines or examples of how the level of AI involvement is categorized would help the companies filling out the questionnaire.
3. Details on how a product is broken down into different areas, how AI usage in each area is categorized, and how the overall impact score is calculated would add a strong algorithmic component to the proposal (one possible scheme is sketched after this list).
4. The vision of a single standardized metric across all industries might require extended research. I would suggest starting with a few use cases, building a proof of concept for those areas (which might require different metrics, language, and jargon that the area-specific target users are familiar with), and using those insights to inform what a universal standardized metric might look like.
5. A visual indicator of AI involvement (like the chemical/biological/fire hazard symbols on instruments and chemicals) could be an additional, accessible way to showcase the rating.
6. On the technical side, a React-based framework would be relatively easy to implement and flexible to modify and build upon; Claude may be helpful here, since it can natively render React components (see the component sketch after this list).
7. For explainable AI methods, it might be important to consider interpretability approaches that have shown great promise for large language models (e.g., activation patching, probing, sparse autoencoders), since future systems will almost certainly include an LLM- or multimodal-AI-based agent, and compliance-supported explainable AI methods like LIME/SHAP might not be sufficient to explain the inner workings of these highly capable systems.
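
To make points 1–3 concrete, here is a minimal sketch of what a risk-tiered scoring scheme could look like. Everything here is a hypothetical assumption on my part (the area names, usage levels, weights, risk multipliers, and grade thresholds), not something taken from the proposal:

```ts
// Hypothetical scoring sketch: area names, weights, risk multipliers,
// and grade thresholds are illustrative placeholders, not from the proposal.

type RiskTier = "low" | "medium" | "high";

// Per-area AI usage level as answered in the questionnaire:
// 0 = none, 1 = assistive, 2 = substantial, 3 = fully automated.
type UsageLevel = 0 | 1 | 2 | 3;

interface AreaAssessment {
  area: string;   // e.g. "decision-making", "content generation"
  usage: UsageLevel;
  weight: number; // relative importance within the product; weights sum to 1
}

// High-risk industries (per point 1) could be graded more strictly.
const riskMultiplier: Record<RiskTier, number> = {
  low: 0.8,
  medium: 1.0,
  high: 1.25,
};

// Weighted average of normalized usage, scaled by risk, clamped to [0, 1].
function impactScore(areas: AreaAssessment[], tier: RiskTier): number {
  const raw = areas.reduce((sum, a) => sum + a.weight * (a.usage / 3), 0);
  return Math.min(1, raw * riskMultiplier[tier]);
}

// Map the normalized score onto the proposal's A/B/C/D rating.
function toRating(score: number): "A" | "B" | "C" | "D" {
  if (score < 0.25) return "A";
  if (score < 0.5) return "B";
  if (score < 0.75) return "C";
  return "D";
}

// Example: a high-risk product with substantial AI in decision-making.
const rating = toRating(impactScore(
  [
    { area: "decision-making", usage: 2, weight: 0.6 },
    { area: "content generation", usage: 1, weight: 0.4 },
  ],
  "high",
)); // -> "C"
```

The key design choice in a scheme like this is keeping the per-area answers interpretable on their own (so the questionnaire stays transparent) while letting the risk tier shift the final grade.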
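
And for point 6, a minimal sketch of the color-coded label as a React component; the component name, colors, and styling are my own placeholder assumptions, not the team's mockups:

```tsx
import React from "react";

type Rating = "A" | "B" | "C" | "D";

// Placeholder colors; the real palette would come from the design mockups.
const ratingColors: Record<Rating, string> = {
  A: "#2e7d32", // green: minimal AI involvement
  B: "#f9a825", // yellow
  C: "#ef6c00", // orange
  D: "#c62828", // red: heavy AI involvement
};

// A simple badge rendering the color-coded rating (hypothetical component).
export function NeuralSealBadge({ rating }: { rating: Rating }) {
  return (
    <span
      style={{
        backgroundColor: ratingColors[rating],
        color: "white",
        padding: "0.25em 0.75em",
        borderRadius: "0.5em",
        fontWeight: "bold",
      }}
      aria-label={`AI involvement rating: ${rating}`}
    >
      AI {rating}
    </span>
  );
}

// Usage: <NeuralSealBadge rating="B" />
```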
I might have more comments and suggestions, so feel free to reach out if you need any further feedback. Good luck to the team!