file-deduplicator
Find and remove duplicate files intelligently. Save storage space, keep your system clean. Perfect for digital hoarders and document management.
Install via ClawdBot CLI:
clawdbot install Michael-laffin/file-deduplicator

Vernox Utility Skill - Clean up your digital hoard.
File-Deduplicator is an intelligent file duplicate finder and remover. Uses content hashing to identify identical files across directories, then provides options to remove duplicates safely.
clawhub install file-deduplicator
const result = await findDuplicates({
  directories: ['./documents', './downloads', './projects'],
  options: {
    method: 'content', // content-based comparison
    includeSubdirs: true
  }
});
console.log(`Found ${result.duplicateCount} duplicate groups`);
console.log(`Potential space savings: ${result.spaceSaved}`);
const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest', // keep newest, delete oldest
    action: 'delete', // or 'move' to archive
    autoConfirm: false // show confirmation for each
  }
});
console.log(`Removed ${result.filesRemoved} duplicates`);
console.log(`Space saved: ${result.spaceSaved}`);
const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest',
    action: 'delete',
    dryRun: true // Preview without actual deletion
  }
});
console.log('Would remove:');
result.duplicates.forEach((dup, i) => {
  console.log(`${i + 1}. ${dup.file}`);
});
findDuplicates
Find duplicate files across directories.
Parameters:
- directories (array|string, required): Directory paths to scan
- options (object, optional):
  - method (string): 'content' | 'size' | 'name' - comparison method
  - includeSubdirs (boolean): Scan recursively (default: true)
  - minSize (number): Minimum size in bytes (default: 0)
  - maxSize (number): Maximum size in bytes (default: 0)
  - excludePatterns (array): Glob patterns to exclude (default: ['.git', 'node_modules'])
  - whitelist (array): Directories to never scan (default: [])

Returns:
- duplicates (array): Array of duplicate groups
- duplicateCount (number): Number of duplicate groups found
- totalFiles (number): Total files scanned
- scanDuration (number): Time taken to scan (ms)
- spaceWasted (number): Total bytes wasted by duplicates
- spaceSaved (number): Potential savings if duplicates are removed

removeDuplicates
Remove duplicate files based on findings.
Parameters:
- directories (array|string, required): Same as findDuplicates
- options (object, optional):
  - keep (string): 'newest' | 'oldest' | 'smallest' | 'largest' - which copy to keep
  - action (string): 'delete' | 'move' | 'archive'
  - archivePath (string): Where to move files when action='move'
  - dryRun (boolean): Preview without actual action
  - autoConfirm (boolean): Auto-confirm deletions
  - sizeThreshold (number): Don't remove files larger than this

Returns:
- filesRemoved (number): Number of files removed/moved
- spaceSaved (number): Bytes saved
- groupsProcessed (number): Number of duplicate groups handled
- logPath (string): Path to action log
- errors (array): Any errors encountered

analyzeDirectory
Analyze a single directory for duplicates.
Parameters:
- directory (string, required): Path to directory
- options (object, optional): Same as findDuplicates options

Returns:
- fileCount (number): Total files in directory
- totalSize (number): Total bytes in directory
- duplicateSize (number): Bytes in duplicate files
- duplicateRatio (number): Percentage of files that are duplicates

config.json:
{
  "detection": {
    "defaultMethod": "content",
    "sizeTolerancePercent": 0, // exact match only
    "nameSimilarity": 0.7, // 0-1, lower = more similar
    "includeSubdirs": true
  },
  "removal": {
    "defaultAction": "delete",
    "defaultKeep": "newest",
    "archivePath": "./archive",
    "sizeThreshold": 10485760, // 10MB threshold
    "autoConfirm": false,
    "dryRunDefault": false
  },
  "exclude": {
    "patterns": [".git", "node_modules", ".vscode", ".idea"],
    "whitelist": ["important", "work", "projects"]
  }
}
const result = await findDuplicates({
  directories: '~/Documents',
  options: {
    method: 'content',
    includeSubdirs: true
  }
});
console.log(`Found ${result.duplicateCount} duplicate sets`);
result.duplicates.slice(0, 5).forEach((set, i) => {
  console.log(`Set ${i + 1}: ${set.files.length} files`);
  console.log(`  Total size: ${set.totalSize} bytes`);
});
const result = await removeDuplicates({
  directories: '~/Documents',
  options: {
    keep: 'newest',
    action: 'delete'
  }
});
console.log(`Removed ${result.filesRemoved} files`);
console.log(`Saved ${result.spaceSaved} bytes`);
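spaceSaved and the other size fields are raw byte counts. A small helper (hypothetical, not part of the skill's API) can make them readable:

```javascript
// Hypothetical helper: convert a raw byte count into a human-readable string.
function formatBytes(bytes) {
  const units = ['B', 'KB', 'MB', 'GB', 'TB'];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i += 1;
  }
  return `${value.toFixed(1)} ${units[i]}`;
}

console.log(formatBytes(10485760)); // → 10.0 MB (the default sizeThreshold)
console.log(formatBytes(512)); // → 512.0 B
```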
const result = await removeDuplicates({
  directories: '~/Downloads',
  options: {
    keep: 'newest',
    action: 'move',
    archivePath: '~/Documents/Archive'
  }
});
console.log(`Archived ${result.filesRemoved} files`);
console.log(`Safe in: ~/Documents/Archive`);
const result = await removeDuplicates({
  directories: '~/Documents',
  options: {
    dryRun: true // Just show what would happen
  }
});
console.log('=== Dry Run Preview ===');
result.duplicates.forEach((set, i) => {
  console.log(`Would delete: ${set.toDelete.join(', ')}`);
});
Won't remove files larger than a configurable threshold (default: 10MB). This prevents accidental deletion of important large files.
Move files to archive directory instead of deleting. No data loss, full recoverability.
All deletions/moves are logged to file for recovery and audit.
Log file can be used to restore accidentally deleted files (limited undo window).
License: MIT
Find duplicates. Save space. Keep your system clean. 🔮
Generated Mar 1, 2026
Content creators often accumulate duplicate photos, videos, and project files across multiple drives or cloud storage. This skill can scan directories to identify and remove redundant media, freeing up storage space for new projects and reducing backup costs. It helps maintain a clean digital workspace by keeping only the latest or highest-quality versions.
Legal firms handle numerous versions of contracts, briefs, and case files, leading to duplicate documents that cause confusion and waste storage. This skill can detect and archive or delete older duplicates, ensuring only the most current versions are accessible. This improves efficiency, reduces errors in document retrieval, and optimizes backup systems.
Developers often have duplicate source code files, libraries, or build artifacts across projects, consuming valuable SSD space. This skill can scan project directories to remove redundant files like duplicate node_modules or cached builds, speeding up development workflows. It also helps in maintaining cleaner repositories and reducing deployment sizes.
IT administrators manage backup systems that may contain redundant copies of files across multiple backups, increasing storage costs and complexity. This skill can analyze backup directories to identify and remove duplicate files, saving storage space and improving backup efficiency. It ensures critical data is preserved while eliminating unnecessary bloat.
Home users often have duplicate downloads, photos, and documents scattered across devices, leading to clutter and wasted storage. This skill can scan personal directories to find and remove duplicates, helping users reclaim disk space and organize their digital files. It's ideal for maintaining a tidy system and preventing data loss from disorganization.
Offer a basic version for free with limited scans or features, and a paid tier for advanced options like batch processing, cloud integration, and priority support. This model attracts individual users and small businesses, generating revenue from subscriptions while building a user base through free access.
Sell licenses to large organizations such as corporations or government agencies, providing custom integrations, dedicated support, and enhanced security features. This model targets industries with high storage needs, like media or IT, and generates significant revenue through one-time or annual license fees.
Package the skill as a white-label tool for other software companies to integrate into their products, such as backup software or file management systems. This model generates revenue through licensing fees or royalties, expanding market reach without direct customer acquisition costs.
💬 Integration Tip
Integrate this skill into existing file-management workflows through its API functions, findDuplicates and removeDuplicates, and be sure to test with dry-run mode first to prevent accidental data loss.