Key Responsibilities:
- Extract structured variables from unstructured legal text datasets using regular expressions (regex) and other text-parsing techniques.
- Manually review extracted variables to ensure accuracy, correcting any misclassifications or incomplete extractions.
- Develop and apply validation tests to systematically check that extracted data aligns with legal definitions, case contexts, and expected formats.
- Iterate on extraction rules based on observed patterns, edge cases, and evolving dataset needs, refining regex and other text-processing approaches.
- Maintain detailed documentation of extraction logic, validation procedures, and any manual review criteria for transparency and reproducibility.
- Collaborate with analysts and researchers to align text extraction with broader research goals, ensuring outputs are fit for downstream analysis.
- Identify and address anomalies by systematically checking for missing, incorrect, or inconsistent values in extracted datasets.
- Automate parts of the extraction process where possible, while ensuring that human review remains a core quality-control step.
- Enforce a zero-tolerance standard for errors in final outputs through a combination of automated checks, peer review, and thorough manual validation.
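
As an illustration of the extract-then-validate workflow described above, here is a minimal sketch in Python. The docket format, the "filed on" date phrasing, and the names `extract_record` and `validate_record` are all hypothetical and chosen only for this example. They are not taken from the project's actual data or codebase, and the same approach could be written in R with stringr.

```python
import re
from datetime import datetime

# Hypothetical docket format for illustration only, e.g. "No. 21-cv-01234".
DOCKET_RE = re.compile(r"\bNo\.\s*(\d{2})-(cv|cr)-(\d{4,5})\b", re.IGNORECASE)
# Hypothetical filing-date phrasing, e.g. "filed on 03/15/2021".
DATE_RE = re.compile(r"\bfiled\s+(?:on\s+)?(\d{1,2}/\d{1,2}/\d{4})\b", re.IGNORECASE)


def extract_record(text):
    """Pull a docket number, case type, and filing date out of free text."""
    docket = DOCKET_RE.search(text)
    date = DATE_RE.search(text)
    return {
        "docket": docket.group(0) if docket else None,
        "case_type": docket.group(2).lower() if docket else None,
        "filed": date.group(1) if date else None,
    }


def validate_record(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if record["docket"] is None:
        problems.append("missing docket")
    if record["filed"] is None:
        problems.append("missing filing date")
    else:
        try:
            # Matching the pattern is not enough: 13/45/2021 has the right
            # shape but is not a real calendar date.
            datetime.strptime(record["filed"], "%m/%d/%Y")
        except ValueError:
            problems.append("invalid filing date")
    return problems


if __name__ == "__main__":
    sample = "Case No. 21-cv-01234, filed on 03/15/2021 in the district court."
    record = extract_record(sample)
    print(record, validate_record(record))
```

Records that come back with a non-empty problem list would be the ones routed to manual review, so the automated check narrows the reviewer's workload without replacing the human step.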
Application Requirements:
- Familiarity with specific software, tools, and languages:
  - Proficient in R, including dplyr, stringr, and other relevant text-processing packages.
  - Fluent in using ChatGPT, Claude, and similar AI tools for troubleshooting, automating data extraction, and refining regex-based methods.
  - Fluent in using GitHub.
- Work sample: R scripts or a GitHub codebase for review.
- Location: Remote.
- Compensation: Unpaid.
- Time Commitment (expected number of hours per week): 10-20.
- Hiring Manager Contact Email Address: odedoren@scrutinize.org