Policy Playbook Configuration
Policy-Centric Moderation in Nima

Nima is built around a policy-centric moderation model. In this model, moderators do not directly choose a moderation action (e.g., "remove," "warn," "restrict access"). Instead, they select a policy, such as "hate speech," "harassment," or "misinformation," that represents the platform rule being enforced.

Policy centricity helps Nima deliver a consistent, transparent, and adaptable moderation flow, as every moderation decision is grounded in clear, documented policies rather than inconsistent, ad hoc actions. This builds trust and transparency, ensures fairness and consistency, and keeps platforms ready for new risks and regulations.

From Policy → Action to Policy → Strike Engine

As detailed in docid\ lsfkyoqsi365x alowit1, each policy or sub-policy can be linked to a single moderation action. This policy → action model ensures consistent enforcement: once a policy is matched, the assigned action is automatically applied.

However, this model has limitations. It does not take into account whether a user has previously violated other policies, nor does it support progressive enforcement for repeated violations of the same policy.

The strike system addresses these gaps. It allows Nima to track users' violations over time and apply progressive consequences based on their history. For example, a first violation might trigger a warning, a second violation could result in a temporary suspension, and repeated violations could escalate to permanent account restrictions.

Strike System Glossary

Before detailing how to set up your strike system, a few key notions:

[A glossary table appears here in the original document; its contents could not be recovered.]

How to Set Up a Strike System

Strike systems are configured in Settings > Account Strike System.
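The progressive enforcement described above can be sketched in code. This is a minimal, hypothetical illustration, not Nima's implementation: the tier actions, the 30-day reset window, and the `StrikeRecord` class are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Illustrative only: these names and thresholds are not part of Nima's API.
TIER_ACTIONS = {1: "warn", 2: "suspend_24h", 3: "ban_permanent"}
RESET_WINDOW = timedelta(days=30)  # strikes older than this expire


@dataclass
class StrikeRecord:
    """Tracks one user's violation timestamps for a single strike system."""
    timestamps: list = field(default_factory=list)

    def add_violation(self, now: datetime) -> str:
        # Drop expired strikes first, so the count can decrease over time.
        self.timestamps = [t for t in self.timestamps if now - t < RESET_WINDOW]
        self.timestamps.append(now)
        # Escalate with the number of active strikes, capped at the top tier.
        tier = min(len(self.timestamps), max(TIER_ACTIONS))
        return TIER_ACTIONS[tier]
```

With this sketch, three violations inside the window escalate from a warning to a suspension to a permanent ban, while a violation arriving after the reset window starts the user back at a warning.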
1. Define the scope

If you want the strike system to target a specific subset of content, use the criteria. Similarly to the rule engine, you can define the scope based on detection source, labels, content types, or any custom attributes. This allows you to create strike systems for different parts of your platform independently. If you will be using only one strike system, you do not have to set up criteria.

2. Create tiers

Each strike system can have multiple tiers, representing different levels of severity of the policies you will map to the strike system. For each tier, configure:

- Actions – the moderation action applied when a user reaches a precise number of violations. You can select any moderation action you have configured.
- Reset time – how long a tier remains active before violations expire, allowing the user's strike count to decrease over time (e.g., 30 days).

3. Map your policies to tiers

Once your strike system is configured, map your policies to its tiers in the policy configuration. For each policy or sub-policy, choose the tier that matches its severity within the strike system. Milder policies should map to lower tiers (requiring more violations to escalate), while severe policies map to higher tiers (escalating after fewer violations).

When a user violates a mapped policy, they will either enroll in the tier or continue progressing within it, according to the number of accumulated violations.

Setting Up Policies and Moderation Outcomes

To enable this model, the platform's trust & safety team must translate its existing policy playbook or enforcement guidelines into the Nima system. This involves two key steps.

Nima adopts a policy-centric moderation model: content is assessed against your policies. When a policy is selected, either through a docid\ hfx29ae9o5ktdj2 dokxb or by moderators reviewing content, Nima automatically applies the docid\ lsfkyoqsi365x alowit1 assigned to that policy. Therefore, moderators and rules don't choose outcomes or actions directly; they determine which policy applies to a case (for example, violence or hate speech).

Where to Configure Policies

Setting up your policies is among the key steps to start using Nima. You can configure policies and sub-policies in Settings > Policies. Each policy or sub-policy is mapped to two key values:

- docid\ lsfkyoqsi365x alowit1 or docid\ imksiysjewzvi5ne93o7n – the moderation outcome that will automatically be triggered when the policy is applied (e.g., removal, warning, restriction). This is a mandatory requirement to ensure your policies are enforced through Nima.
- DSA infringement category – used to classify every decision for backend storage and reporting, enabling automatic inclusion in DSA docid\ ncsbihsujuk1eaeey bh2.

The policies configured will appear in the docid\ angnoc xkytjfghpyhxmz, in the docid\ azsiick6eehflzwytql v, in the rule engine, and in all the other relevant areas of Nima. Note that if a policy has sub-policies, then actions, infringement categories, and rules are set at the sub-policy level.

How Policies Are Enforced

Policies are enforceable through:

- Automated moderation – when content is considered violative according to a docid\ hfx29ae9o5ktdj2 dokxb (classified by the AI engine in the 🔴 threshold), Nima automatically applies a policy or a sub-policy and the associated action or tier.
- Human moderation – when a moderator reviewing a case in a queue selects a policy or a sub-policy. Moderators receive cases in queues according to the docid\ hfx29ae9o5ktdj2 dokxb or docid\ hfx29ae9o5ktdj2 dokxb (when the content is classified by the AI engine in the 🟠 threshold).

For step-by-step guidance on creating rules, go to docid\ hfx29ae9o5ktdj2 dokxb.

How to Visualise Policies in Moderation Queues

In human moderation, policies are selected by moderators when triaging cases in queues. In Nima, you can set up which policies appear in which queue through docid\ azsiick6eehflzwytql v.

To delete a policy, first remove all rules assigned to it. Once the policy has no attached rules, you will be able to delete it from the admin interface.

Set Up Moderation Outcomes and Link to Policies

Define all possible moderation outcomes within your platform (e.g., remove content, suspend user, add label, escalate to human review). To ensure consistent enforcement, Nima offers two distinct methods for defining moderation outcomes and linking them to your policies:

- docid\ lsfkyoqsi365x alowit1 – a streamlined approach where specific policies or sub-policies are tied directly to a fixed outcome or docid\ lsfkyoqsi365x alowit1. This is ideal for "zero tolerance" violations where a specific breach (e.g., hate speech) always triggers a specific response (e.g., immediate content removal).
- docid\ imksiysjewzvi5ne93o7n – a more flexible, progressive model designed to manage recurring behavior. Instead of a single fixed outcome, violations are tracked over time: users move through escalating docid\ imksiysjewzvi5ne93o7n, such as starting with an initial warning, followed by a 24-hour suspension, and eventually a permanent ban if the behavior persists.
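The two enforcement models above can be contrasted with a small sketch. This is a hypothetical illustration under assumed names: the policy keys, actions, and tier numbers are invented for the example and do not reflect Nima's actual configuration schema.

```python
# Illustrative only: two routes a matched policy can take.
# Zero-tolerance policies map to a fixed action; others enroll the
# user in a strike tier whose consequences escalate over time.

FIXED_ACTIONS = {
    "hate_speech": "remove_content",  # always the same outcome
}

STRIKE_TIERS = {
    "spam": 1,        # mild policy: enrolls the user in a low tier
    "harassment": 2,  # severe policy: enrolls directly in a higher tier
}


def resolve_outcome(policy: str) -> str:
    """Return the enforcement route for a matched policy."""
    if policy in FIXED_ACTIONS:
        return f"action:{FIXED_ACTIONS[policy]}"
    if policy in STRIKE_TIERS:
        return f"strike_tier:{STRIKE_TIERS[policy]}"
    raise KeyError(f"policy {policy!r} has no configured outcome")
```

The design point the sketch makes explicit: every policy must resolve to exactly one of the two routes, which is why assigning an outcome (action or tier) is mandatory when configuring a policy.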