Policy Playbook Configuration
Policy-Centric Moderation in Nima

Nima is built around a policy-centric moderation model. In this model, moderators do not directly choose a moderation action (e.g., "remove," "warn," "restrict access"). Instead, they select a policy, such as "hate speech," "harassment," or "misinformation," that represents the platform rule being enforced.

Policy centricity helps Nima deliver a consistent, transparent, and adaptable moderation flow, as every moderation decision is grounded in clear, documented policies rather than inconsistent, ad hoc actions. This builds trust and transparency, ensures fairness and consistency, and keeps platforms ready for new risks and regulations.

From Policy → Action to Policy → Strike Engine

As detailed in docid\ lsfkyoqsi365x alowit1, each policy or sub-policy can be linked to a single moderation action. This policy → action model ensures consistent enforcement: once a policy is matched, the assigned action is automatically applied.

However, this model has limitations. It does not take into account whether a user has previously violated other policies, nor does it support progressive enforcement for repeated violations of the same policy.

The strike system addresses these gaps. It allows Nima to track users' violations over time and apply progressive consequences based on their history. For example, a first violation might trigger a warning, a second violation could result in a temporary suspension, and repeated violations could escalate to permanent account restrictions.

Strike System Glossary

Before detailing how to set up your strike system, a few key notions:

[A glossary table appears here in the original document; its contents could not be recovered.]

How to Set Up a Strike System

Strike systems are configured in Settings > Account Strike System.
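The progressive enforcement described above can be sketched in code. This is a minimal, hypothetical illustration, not Nima's implementation: the tier actions, the 30-day reset window, and the `StrikeRecord` class are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Illustrative only: these names and thresholds are not part of Nima's API.
TIER_ACTIONS = {1: "warn", 2: "suspend_24h", 3: "ban_permanent"}
RESET_WINDOW = timedelta(days=30)  # strikes older than this expire


@dataclass
class StrikeRecord:
    """Tracks one user's violation timestamps for a single strike system."""
    timestamps: list = field(default_factory=list)

    def add_violation(self, now: datetime) -> str:
        # Drop expired strikes first, so the count can decrease over time.
        self.timestamps = [t for t in self.timestamps if now - t < RESET_WINDOW]
        self.timestamps.append(now)
        # Escalate with the number of active strikes, capped at the top tier.
        tier = min(len(self.timestamps), max(TIER_ACTIONS))
        return TIER_ACTIONS[tier]
```

With this sketch, three violations inside the window escalate from a warning to a suspension to a permanent ban, while a violation arriving after the reset window starts the user back at a warning.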
1. Define the scope

If you want the strike system to target a specific subset of content, use the criteria. Similarly to the rule engine, you can define the scope based on detection source, labels, content types, or any custom attributes. This allows you to create strike systems for different parts of your platform independently. If you will be using only one strike system, you do not have to set up criteria.

2. Create tiers

Each strike system can have multiple tiers, representing different levels of severity of the policies you will map to the strike system. For each tier, configure:

- Actions – the moderation action applied when a user reaches a precise number of violations. You can select any moderation action you have configured.
- Reset time – how long a tier remains active before violations expire, allowing the user's strike count to decrease over time (e.g., 30 days).

3. Map your policies to tiers

Once your strike system is configured, map your policies to its tiers in the policy configuration. For each policy or sub-policy, choose the tier that matches its severity within the strike system. Milder policies should map to lower tiers (requiring more violations to escalate), while severe policies map to higher tiers (escalating after fewer violations).

When a user violates a mapped policy, they will either enroll in the tier or continue progressing within it, according to the number of accumulated violations.

Setting Up Policies and Moderation Outcomes

To enable this model, the platform's trust & safety team must translate its existing policy playbook or enforcement guidelines into the Nima system. This involves two key steps.

Nima adopts a policy-centric moderation model: content is assessed against your policies. When a policy is selected, either through a docid\ hfx29ae9o5ktdj2 dokxb or by moderators reviewing content, Nima automatically applies the docid\ lsfkyoqsi365x alowit1 assigned to that policy. Therefore, moderators and rules don't choose outcomes or actions directly; they determine which policy applies to a case (for example, violence or hate speech).

Where to Configure Policies

Setting up your policies is among the key steps to start using Nima. You can configure policies and sub-policies in Settings > Policies. Each policy or sub-policy is mapped to two key values:

- docid\ lsfkyoqsi365x alowit1 or docid\ imksiysjewzvi5ne93o7n – the moderation outcome that will automatically be triggered when the policy is applied (e.g., removal, warning, restriction). This is a mandatory requirement to ensure your policies are enforced through Nima.
- DSA infringement category – used to classify every decision for backend storage and reporting, enabling automatic inclusion in DSA docid\ ncsbihsujuk1eaeey bh2.

The policies configured will appear in the docid\ angnoc xkytjfghpyhxmz, in the docid\ azsiick6eehflzwytql v, in the rule engine, and in all the other relevant areas of Nima. Note that if a policy has sub-policies, then actions, infringement categories, and rules are set at the sub-policy level.

How Policies Are Enforced

Policies are enforceable through:

- Automated moderation – when content is considered violative according to a docid\ hfx29ae9o5ktdj2 dokxb (classified by the AI engine in the 🔴 threshold), Nima automatically applies a policy or a sub-policy and the associated action or tier.
- Human moderation – when a moderator reviewing a case in a queue selects a policy or a sub-policy. Moderators receive cases in queues according to the docid\ hfx29ae9o5ktdj2 dokxb or docid\ hfx29ae9o5ktdj2 dokxb (when the content is classified by the AI engine in the 🟠 threshold).

For step-by-step guidance on creating rules, go to docid\ hfx29ae9o5ktdj2 dokxb.

How to Visualise Policies in Moderation Queues

In human moderation, policies are selected by moderators when triaging cases in queues. In Nima, you can set up which policies appear in which queue through docid\ azsiick6eehflzwytql v.

To delete a policy, first remove all rules assigned to it. Once the policy has no attached rules, you will be able to delete it from the admin interface.

Set Up Moderation Outcomes and Link to Policies

Define all possible moderation outcomes within your platform (e.g., remove content, suspend user, add label, escalate to human review). To ensure consistent enforcement, Nima offers two distinct methods for defining moderation outcomes and linking them to your policies:

- docid\ lsfkyoqsi365x alowit1 – a streamlined approach where specific policies or sub-policies are tied directly to a fixed outcome or docid\ lsfkyoqsi365x alowit1. This is ideal for "zero tolerance" violations where a specific breach (e.g., hate speech) always triggers a specific response (e.g., immediate content removal).
- docid\ imksiysjewzvi5ne93o7n – a more flexible, progressive model designed to manage recurring behavior. Instead of a single fixed outcome, violations are tracked over time: users move through escalating docid\ imksiysjewzvi5ne93o7n, such as starting with an initial warning, followed by a 24-hour suspension, and eventually a permanent ban if the behavior persists.
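The two enforcement models above can be contrasted with a small sketch. This is a hypothetical illustration under assumed names: the policy keys, actions, and tier numbers are invented for the example and do not reflect Nima's actual configuration schema.

```python
# Illustrative only: two routes a matched policy can take.
# Zero-tolerance policies map to a fixed action; others enroll the
# user in a strike tier whose consequences escalate over time.

FIXED_ACTIONS = {
    "hate_speech": "remove_content",  # always the same outcome
}

STRIKE_TIERS = {
    "spam": 1,        # mild policy: enrolls the user in a low tier
    "harassment": 2,  # severe policy: enrolls directly in a higher tier
}


def resolve_outcome(policy: str) -> str:
    """Return the enforcement route for a matched policy."""
    if policy in FIXED_ACTIONS:
        return f"action:{FIXED_ACTIONS[policy]}"
    if policy in STRIKE_TIERS:
        return f"strike_tier:{STRIKE_TIERS[policy]}"
    raise KeyError(f"policy {policy!r} has no configured outcome")
```

The design point the sketch makes explicit: every policy must resolve to exactly one of the two routes, which is why assigning an outcome (action or tier) is mandatory when configuring a policy.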