This model is a fine-tuned version of t5-small, trained on a collection of repositories from Kaggle/vatsalparsaniya/github-repositories-analysis. Tag generation is usually formulated as a multi-label classification problem, but this model treats it as a text2text generation task (inspiration and reference: fabiochiu/t5-base-tag-generation). Fine-tuning notebook reference: the Hugging Face summarization notebook.

The Inference API expects cleaned README text; the code for cleaning the README is outlined below. Install the dependencies with `pip install transformers nltk clean-text beautifulsoup4`.

Imports. `from transformers import AutoTokenizer, AutoModelForSeq2SeqLM`

Preprocessing. A script converts Markdown to plain text (reference: Stack Overflow). It defines `unmark_element(element, stream=None)` and `unmark(text)`, and sets `_md.stripTopLevelTags = False`.

README extraction. The repository page is fetched with `html_content = requests.get(github_repo_url).text` and parsed with `soup = BeautifulSoup(html_content, "html.parser")`. The README link is turned into a raw URL with `readme_raw_url = readme_url.replace("/blob/", "/")` followed by `readme_raw_url = readme_raw_url.replace("", "")`. The content is then downloaded with `readme_html_content = requests.get(readme_raw_url).text` and parsed with `readme_soup = BeautifulSoup(readme_html_content, "html.parser")`. The extractor can return the sentinel string `"README_NOT_MARKDOWN"`. Cleaning is done by `clean_readme(readme)`.

Postprocess Tags. `post_process_tags(tag_string)` tests each tag with `if tag.strip() in final_tags or len(tag.strip()) <= 1:`, which matches duplicates and tags of at most one character.

Main Function. `github_tags_generate(github_repo_url)` calls `readme = readme_extractor(github_repo_url)`, tokenizes with `inputs = tokenizer(..., max_length=1536, truncation=True, return_tensors="pt")`, generates with `output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, ...)`, and decodes with `decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)`.

Of the 1000 articles in the dataset, only 870 had tags and a README longer than 50 characters. These were kept, and their README.md content was scraped using BeautifulSoup.

The results might contain duplicate tags, which must be handled when postprocessing the results.
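The duplicate handling that the post calls for can be sketched around the one condition that survives from `post_process_tags`. This is a minimal reconstruction, not the author's exact code: the comma separator and the sample string are assumptions, since the source shows only the `if` test.

```python
def post_process_tags(tag_string, separator=","):
    """Split a generated tag string; drop duplicates and very short tags.

    The separator is an assumption: the source does not show how the
    decoded string is split.
    """
    final_tags = []
    for tag in tag_string.split(separator):
        tag = tag.strip()
        # Condition taken from the source: skip tags already collected and
        # tags of at most one character (this also drops empty strings).
        if tag in final_tags or len(tag) <= 1:
            continue
        final_tags.append(tag)
    return final_tags


if __name__ == "__main__":
    # Hypothetical decoded model output, for illustration only.
    decoded = "python, machine learning, python, c, nlp, "
    print(post_process_tags(decoded))  # ['python', 'machine learning', 'nlp']
```

Collecting into a list rather than a set keeps the tags in the order the model produced them, which a set-based deduplication would not guarantee.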
Machine Learning model to generate Tags for Github Repositories based on their Documentation.

This project is open to all kinds of contribution in all of its categories. You can add more features and bug fixes to this code. DO NOT send PRs that rename files and variables, reformat code, or make other low-quality changes.

If you found this project helpful, show some support by ⭐ starring the repo and subscribing to my YouTube channel and newsletter for the latest updates in the dev world. It will encourage me to make more videos and tutorials. Comment on the YouTube channel for more tutorials.

Project Created & Maintained By SHIVAM SRIVASTAVA: Mobile Solution Architect, #Android and #Flutter Developer, #Dart, Maybe #Go, #Founder of Navoki.