In this package, you can use LLM decontaminator to quantify a dataset's rephrased samples relative to a benchmark. Based on the detection results, you can estimate the contamination of rephrased ...
You might want to split the extracted audio into multiple parts for sampling purposes (e.g., training AI voice models such as Tortoise-tts) ...
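As a rough illustration of splitting audio into parts, here is a minimal sketch that uses only Python's standard-library `wave` module. It assumes the extracted audio is an uncompressed PCM WAV file. The function name `split_wav`, its parameters, and the `<stem>_NNN.wav` output naming are hypothetical and are not part of this package. The sketch cuts the audio into consecutive fixed-length chunks, and the last chunk may be shorter.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=10.0):
    """Hypothetical helper: split a PCM WAV file into consecutive chunks.

    Each chunk holds at most `chunk_seconds` of audio; the final chunk may be
    shorter. Returns the list of output paths in order.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        # Number of audio frames (one sample per channel) in each chunk.
        frames_per_chunk = max(1, int(chunk_seconds * params.framerate))
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:  # empty bytes means end of file
                break
            out_path = out_dir / f"{src.stem}_{index:03d}.wav"
            with wave.open(str(out_path), "wb") as writer:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected automatically when the writer closes.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

For example, `split_wav("extracted.wav", "clips", chunk_seconds=8.0)` would write `clips/extracted_000.wav`, `clips/extracted_001.wav`, and so on. Fixed-length cuts can land in the middle of a word. For voice-model training data, splitting on silence may be preferable. If the audio is not WAV, ffmpeg's `segment` muxer (`-f segment -segment_time`) is another way to cut it into parts.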