uddup

URL de-duplication tool for better recon.

UDdup - URLs Deduplication Tool

The tool takes a list of URLs and removes "duplicate" pages, meaning URLs whose patterns are probably repetitive and point to the same web template.

For example:

https://www.example.com/product/123
https://www.example.com/product/456
https://www.example.com/product/123?is_prod=false
https://www.example.com/product/222?is_debug=true

All of the above probably point to the same product "template". Therefore, it should be enough to scan only some of these URLs with our various scanners.

The result of the above after UDdup should be:

https://www.example.com/product/123?is_prod=false
https://www.example.com/product/222?is_debug=true
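
The example above can be reproduced with a small sketch. This is not UDdup's actual implementation; it is one possible heuristic, assumed here for illustration: numeric path segments are treated as interchangeable IDs, URLs are grouped by host and path pattern, and within a group only one URL per distinct set of query parameter names is kept, dropping parameter-less URLs when a parameterized one exists. The names path_pattern and dedup are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def path_pattern(path):
    # Assumption: a purely numeric segment is an ID, so replace it with a placeholder.
    return "/".join("{id}" if seg.isdigit() else seg for seg in path.split("/"))


def dedup(urls):
    # (host, path pattern) -> {set of query parameter names: first URL seen}
    groups = {}
    for url in urls:
        parts = urlsplit(url)
        key = (parts.netloc, path_pattern(parts.path))
        names = frozenset(
            name for name, _ in parse_qsl(parts.query, keep_blank_values=True)
        )
        groups.setdefault(key, {}).setdefault(names, url)

    result = []
    for variants in groups.values():
        # Prefer URLs that carry query parameters; fall back to the bare URL.
        with_params = [u for names, u in variants.items() if names]
        result.extend(with_params or variants.values())
    return result


urls = [
    "https://www.example.com/product/123",
    "https://www.example.com/product/456",
    "https://www.example.com/product/123?is_prod=false",
    "https://www.example.com/product/222?is_debug=true",
]
for url in dedup(urls):
    print(url)
```

Under these assumptions, the four input URLs collapse to the two parameterized URLs listed above.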

Why do I need it?

Mostly for a better (automated) reconnaissance process, with less noise (for both the tester and the target).

Examples

Take a look at demo.txt, the raw URL file, which produces demo-results.txt.


Installation

pip install uddup

Manual (from code)

# Clone the repository.
git clone https://github.com/rotemreiss/uddup.git

# Install the Python requirements.
cd uddup
pip install -r requirements.txt

Usage

uddup -u demo.txt -o ./demo-result.txt

More Usage Options

uddup -h

Short Form    Long Form        Description
-h            --help           Show this help message and exit
-u            --urls           File with a list of URLs
-o            --output         Save results to a file
-s            --silent         Print only the result URLs
-fp           --filter-path    Filter paths by a given regex

Filter Paths by Regex

Allows filtering out custom path patterns. For example, to filter out all paths that start with /product, run:

# Single Regex
uddup -u demo.txt -fp "^product"

Input:

https://www.example.com/
https://www.example.com/privacy-policy
https://www.example.com/product/1
https://www.example2.com/product/2
https://www.example3.com/product/4

Output:

https://www.example.com/
https://www.example.com/privacy-policy
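
The filtering behavior shown above can be approximated in a few lines. This is a sketch, not UDdup's code. It assumes the regex is applied to the URL path without its leading slash, which is inferred from "^product" matching /product in the example. The function name filter_paths is hypothetical.

```python
import re
from urllib.parse import urlsplit


def filter_paths(urls, pattern):
    # Drop every URL whose path (leading slash removed) matches the regex.
    regex = re.compile(pattern)
    kept = []
    for url in urls:
        path = urlsplit(url).path.lstrip("/")
        if not regex.search(path):
            kept.append(url)
    return kept


urls = [
    "https://www.example.com/",
    "https://www.example.com/privacy-policy",
    "https://www.example.com/product/1",
    "https://www.example2.com/product/2",
    "https://www.example3.com/product/4",
]
for url in filter_paths(urls, "^product"):
    print(url)
```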

Advanced Regex with multiple path filters

uddup -u demo.txt -fp "(^product)|(^category)"
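
When the list of prefixes grows, the alternation can be built programmatically before passing it to -fp. The helper below is a hypothetical convenience, not part of UDdup, and the sample paths are made up for illustration.

```python
import re


def combine_patterns(patterns):
    # Wrap each pattern in a group and join the groups with "|".
    return "|".join(f"({p})" for p in patterns)


combined = combine_patterns(["^product", "^category"])
print(combined)  # (^product)|(^category)

# Quick check against sample paths (leading slash already removed).
for path in ["product/1", "category/shoes", "privacy-policy"]:
    print(path, bool(re.search(combined, path)))
```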

Contributing

Feel free to fork the repository and submit pull-requests.


Support

Create a new GitHub issue

Want to say thanks? :) Message me on LinkedIn


License

License
