<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://me.tjh.sg/blog</id>
    <title>Jing Hua Blog</title>
    <updated>2023-04-10T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://me.tjh.sg/blog"/>
    <subtitle>Jing Hua Blog</subtitle>
    <icon>https://me.tjh.sg/img/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA[BetterChatGPT - An amazing UI for OpenAI's ChatGPT]]></title>
        <id>betterchatgpt</id>
        <link href="https://me.tjh.sg/blog/betterchatgpt"/>
        <updated>2023-04-10T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Better ChatGPT is the ultimate destination for anyone who wants to experience the limitless power of conversational AI. Completely free for everyone and with no limits, our app harnesses the full potential of OpenAI's ChatGPT API to offer you an unparalleled chatbot experience.]]></summary>
        <content type="html"><![CDATA[<p>See BetterChatGPT here: <a href="https://github.com/ztjhz/BetterChatGPT" target="_blank" rel="noopener noreferrer">https://github.com/ztjhz/BetterChatGPT</a></p>]]></content>
        <author>
            <name>Jing Hua</name>
            <uri>https://github.com/ztjhz</uri>
        </author>
        <author>
            <name>Ayaka</name>
            <uri>https://github.com/ayaka14732</uri>
        </author>
        <category label="chatgpt" term="chatgpt"/>
        <category label="nlp" term="nlp"/>
        <category label="deeplearning" term="deeplearning"/>
        <category label="ai" term="ai"/>
        <category label="react" term="react"/>
        <category label="tailwindcss" term="tailwindcss"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[ZilKin - The Future of Zilliqa Smart Contracts]]></title>
        <id>zilkin</id>
        <link href="https://me.tjh.sg/blog/zilkin"/>
        <updated>2023-01-10T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How I played a part in paving the way for a new era of innovation in the world of Zilliqa]]></summary>
        <content type="html"><![CDATA[<p><em>My Journey in Paving the Way for a New Era of Innovation in the World of Zilliqa</em></p><p><img loading="lazy" alt="zilkin" src="/assets/images/zilkin-landing-bc513e0df753c0629a5e28dd1ec1b3ff.png" width="1090" height="641" class="img_ev3q"></p><div class="theme-admonition theme-admonition-info alert alert--info admonition_LlT9"><div class="admonitionHeading_tbUL"><span class="admonitionIcon_kALy"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_S0QG"><p>ZilKin is a Scilla Contracts Deployment Tool that empowers you to deploy your Scilla contracts effortlessly via (a) our Contracts Wizard (Interactive Code Generator) or (b) our Automatic Contract Deployment feature.</p></div></div><p><img loading="lazy" alt="zilhive" src="/assets/images/zilhive-6f36f95c5ae907d0d48072566cf5934b.png" width="1400" height="788" class="img_ev3q"></p><blockquote><p>Over the 2022 winter (Dec 2022 - Jan 2023), I was beyond thrilled to have been handpicked as one of the <strong>elite 20</strong> undergraduate students from the APAC region to participate in the highly acclaimed <a href="https://www.zilhive.org/zilhive-student-practicum/" target="_blank" rel="noopener noreferrer">ZilHive Student Practicum</a>. 
This program was an absolute game-changer for me, providing me with unparalleled opportunities to immerse myself in the world of cutting-edge blockchain and Web3 development.</p></blockquote><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-spark-of-inspiration">The Spark of Inspiration<a class="hash-link" href="#the-spark-of-inspiration" title="Direct link to heading">​</a></h2><p>As a developer, I am driven by a burning passion to craft solutions that not only simplify the lives of my peers but also reshape the world as we know it. That's why, when I was privileged to be a part of the infrastructure track at the ZilHive Student Practicum, I knew that this was a chance to leave an indelible mark on the developer community. My objective was to conceive a tool that would make deploying Scilla smart contracts a breeze and accessible to developers of all skill levels.</p><p><img loading="lazy" alt="scilla" src="/assets/images/scilla-5258586b92c0954f2a1dda6d0d07fee7.png" width="1200" height="600" class="img_ev3q"></p><p>The inspiration for this idea came to me when I realized that <a href="https://scilla-lang.org/" target="_blank" rel="noopener noreferrer">Scilla</a>, a powerful and versatile language, was often underutilized due to its complexity and lack of resources. If I could create a tool that made it easy for developers to deploy Scilla smart contracts, I believed it would have a transformative impact on the developer community.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="building-a-masterpiece">Building a Masterpiece<a class="hash-link" href="#building-a-masterpiece" title="Direct link to heading">​</a></h2><p>To bring this idea to life, my team and I decided to harness the power of React, tailwindcss, and Scilla. 
<a href="https://github.com/facebook/react" target="_blank" rel="noopener noreferrer">React</a>, a widely adopted JavaScript library, was chosen for building the user interface, while <a href="https://github.com/tailwindlabs/tailwindcss" target="_blank" rel="noopener noreferrer">tailwindcss</a>, a CSS framework, was used to give the tool an elegant, polished look. <a href="https://github.com/Zilliqa/scilla" target="_blank" rel="noopener noreferrer">Scilla</a>, a smart contract language, was chosen for its powerful features and versatility. However, to fully grasp Scilla and create a successful solution, I had to immerse myself in it, learning about smart contracts and the intricacies of the language. It was a challenging but enlightening journey.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="unveiling-zilkin-a-revolutionary-scilla-smart-contracts-deployment-tool">Unveiling Zilkin: A Revolutionary Scilla Smart Contracts Deployment Tool<a class="hash-link" href="#unveiling-zilkin-a-revolutionary-scilla-smart-contracts-deployment-tool" title="Direct link to heading">​</a></h2><p>After a journey of tireless dedication and relentless innovation, my team and I stand before you, beaming with pride as we unveil <a href="https://zilkin.tjh.sg/" target="_blank" rel="noopener noreferrer">Zilkin</a> - a revolutionary tool that shatters the barriers of Scilla smart contract deployment. 
With our state-of-the-art <a href="https://zilkin.tjh.sg/contracts-wizard" target="_blank" rel="noopener noreferrer">Contracts Wizard</a> (Interactive Code Generator) and <a href="https://zilkin.tjh.sg/deploy" target="_blank" rel="noopener noreferrer">Automatic Contract Deployment</a> features, we have empowered developers to weave their ideas into the fabric of the Zilliqa ecosystem with unprecedented ease and speed.</p><p><img loading="lazy" alt="contracts wizard" src="/assets/images/contracts-wizard-sample-792c2e2aad38f2da13faffde98f34421.png" width="1623" height="1071" class="img_ev3q"></p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="unveiling-the-future-of-scilla">Unveiling the Future of Scilla<a class="hash-link" href="#unveiling-the-future-of-scilla" title="Direct link to heading">​</a></h2><p><img loading="lazy" alt="future of scilla" src="/assets/images/inspirational-190a511ec52952c5c4fe6477f75e203b.jpg" width="5616" height="3744" class="img_ev3q"></p><p>As my team and I stood before the esteemed mentors and fellow developers of ZilHive, we knew we were presenting something truly special. When we unveiled our solution, a tool of unparalleled elegance and simplicity, it was met with gasps of wonder and applause. The user-friendly interface, intuitive design, and ease of use captivated the audience, and they could see the impact it could have on the developer community. With its revolutionary capabilities, this tool would change the game, ignite a new era of development, and empower a new generation of developers to bring their wildest dreams to life. 
Zilkin is not just a tool, but a bridge to a brighter future, where the complexities of Scilla are no longer a barrier, and the true potential of the Zilliqa ecosystem can be fully realized!</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="a-new-era-of-scilla">A New Era of Scilla<a class="hash-link" href="#a-new-era-of-scilla" title="Direct link to heading">​</a></h2><p>I am proud that our innovation has become a beacon of hope, a shining light that illuminates the path to a brighter future for Scilla. It is a revolutionary tool that will make the deployment of Scilla smart contracts as effortless as a gentle breeze on a warm summer's day. It will empower developers, both the seasoned veterans and the eager newcomers, to harness the full potential of Scilla's power and versatility. With this solution, the horizon for Scilla is limitless and the possibilities are endless. I am honored to be a part of this transformative change that will shape the developer community for years to come and pave the way for a new era of innovation in the world of Scilla.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="mentored-by-the-masters">Mentored by the Masters<a class="hash-link" href="#mentored-by-the-masters" title="Direct link to heading">​</a></h2><p>With gratitude in my heart, I extend my deepest thanks to the esteemed mentors of Zilliqa - <a href="https://www.linkedin.com/in/kevin-meyer-44016399/" target="_blank" rel="noopener noreferrer">Kevin Meyer</a>, <a href="https://www.linkedin.com/in/tom-fleetham-34579122/" target="_blank" rel="noopener noreferrer">Tom Fleetham</a>, <a href="https://www.linkedin.com/in/elliottgreencompsci/" target="_blank" rel="noopener noreferrer">Elliott Green</a>, and <a href="https://www.linkedin.com/in/bradley-laws-aa23aa19/" target="_blank" rel="noopener noreferrer">Bradley Laws</a> - for their unwavering guidance and support. My journey would not have been possible without their invaluable wisdom and expertise. 
I am also deeply grateful to the brilliant trainer, <a href="https://www.linkedin.com/in/leeweimeng/" target="_blank" rel="noopener noreferrer">Wei-Meng Lee</a>, for imparting his knowledge and skills to me and helping me reach new heights. Their mentorship has been a true blessing, and I truly value the opportunity to learn from such esteemed individuals in the industry.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-bridge-to-a-brighter-future">The Bridge to a Brighter Future<a class="hash-link" href="#the-bridge-to-a-brighter-future" title="Direct link to heading">​</a></h2><p>As I look back on the journey that led me to the creation of Zilkin, I am filled with a sense of wonder and gratitude. The road was long, and the challenges were many, but the rewards have been immeasurable. I have not only developed a tool that will change the way Scilla smart contracts are deployed, but I have also grown as a developer and as a person.</p><p>Through the mentorship and guidance of the esteemed Zilliqa team, I have been able to tap into my full potential and create something truly special. The future is bright for Scilla and the developer community, and I am honored to have played a small part in that.</p><p>As the future unfolds, I can't wait to see the impact Zilkin will have and the doors it will open for developers everywhere. This is just the beginning of a new era, where the power of Scilla will be accessible to all and the true potential of the Zilliqa ecosystem will be fully realized. 
Thank you for joining me on this journey, and I look forward to seeing where Zilkin will take us next.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="links">Links<a class="hash-link" href="#links" title="Direct link to heading">​</a></h2><ul><li><a href="https://github.com/xJQx/zilkin" target="_blank" rel="noopener noreferrer">Source code</a></li><li><a href="https://zilkin.tjh.sg/" target="_blank" rel="noopener noreferrer">Website</a></li></ul><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-team">The team<a class="hash-link" href="#the-team" title="Direct link to heading">​</a></h2><p><a class="container_T1_I" href="https://github.com/ztjhz" title="ztjhz"><img src="https://github.com/ztjhz.png" height="60" width="60"></a></p><div><a class="container_T1_I" href="https://github.com/ztjhz" title="ztjhz">Jing Hua</a></div><a class="container_T1_I" href="https://github.com/xjqx" title="xjqx"><img src="https://github.com/xjqx.png" height="60" width="60"><div>Jing Qiang</div></a><p></p>]]></content>
        <author>
            <name>Jing Hua</name>
            <uri>https://github.com/ztjhz</uri>
        </author>
        <category label="web3" term="web3"/>
        <category label="blockchain" term="blockchain"/>
        <category label="react" term="react"/>
        <category label="tailwindcss" term="tailwindcss"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[ByteVid - Deep Learning Hackathon 1st Place]]></title>
        <id>bytevid</id>
        <link href="https://me.tjh.sg/blog/bytevid"/>
        <updated>2022-10-03T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Say goodbye to long and boring videos! 👋 - MLDA Deep Learning Hackathon 2022 🥇 1st Place]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="Prize Presentation" src="/assets/images/prize-presentation-c8b69937659d770b867ac4576a1ad9cb.jpeg" width="3265" height="1837" class="img_ev3q"></p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="say-goodbye-to-long-and-boring-videos-">Say goodbye to long and boring videos! 👋<a class="hash-link" href="#say-goodbye-to-long-and-boring-videos-" title="Direct link to heading">​</a></h2><div class="theme-admonition theme-admonition-info alert alert--info admonition_LlT9"><div class="admonitionHeading_tbUL"><span class="admonitionIcon_kALy"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_S0QG"><p>Powered by cutting-edge deep learning technologies from 2022, ByteVid transforms long, boring videos into fun, byte-sized content. Be it a one-hour lecture or a 30-minute Zoom meeting, ByteVid can transcribe the video, summarise its content, extract keywords, detect and extract important slides, and translate the transcript into other languages.</p></div></div><blockquote><p>Over the weekend (1 Oct 2022 - 3 Oct 2022), our team, comprising <a href="https://github.com/ztjhz" target="_blank" rel="noopener noreferrer">Jing Hua</a>, <a href="https://github.com/xjqx" target="_blank" rel="noopener noreferrer">Jing Qiang</a>, and <a href="https://github.com/ayaka14732" target="_blank" rel="noopener noreferrer">Ayaka</a>, participated in the MLDA Deep Learning Week Hackathon 2022. We worked tirelessly through exhausting days and sleepless nights to build ByteVid in 48 hours. We went through 2 rounds of pitching and eventually came out on top, finishing 1st out of 120 teams 🥇. 
Here is how we did it...</p></blockquote><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="inspiration">Inspiration<a class="hash-link" href="#inspiration" title="Direct link to heading">​</a></h2><p class="centered"><img loading="lazy" alt="smart-nation" src="/assets/images/smart-nation-a636357bdba61dd6874cb596d2fcd12d.png" class="img_ev3q"></p><p>When we first encountered the topic of ‘AI and Smart Nation’, we were extremely excited, as there were tons of areas we could explore: mobility, healthcare, media and entertainment, agriculture, social, sustainability, and more. We spent hours looking for something that intrigued us, but without success. How could we balance our skills and aspirations with the problem we wanted to work on?</p><p><img loading="lazy" alt="Problem" src="/assets/images/problem-1284e416f59872bb0907096b506247cd.png" width="1809" height="784" class="img_ev3q"></p><p>The idea struck us as we went back to our roots as ‘students’. With Singapore’s push towards becoming a Smart Nation, recorded online lectures and meetings are becoming increasingly prevalent. We struggled with them because it was often difficult to understand what the speaker was saying. Some spoke in non-native languages. Some had unfamiliar accents. Others did not even provide slides for us students to follow along with. We struggled even more given that the average audience attention span is about 7 minutes, and that Zoom’s current transcription feature is not very accurate. 
We could not follow the video content properly or comfortably 😥</p><p>This inspired and motivated us to build a project that would help us and others extract video information efficiently!</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-plan">The plan<a class="hash-link" href="#the-plan" title="Direct link to heading">​</a></h2><p>We went to the drawing board and spent hours planning which features we needed and how we could build them. This is what we came up with:</p><p><img loading="lazy" alt="Diagram" src="/assets/images/diagram-4b54e9274693a4a2750a2f31ccfa2fff.png" width="1026" height="720" class="img_ev3q"></p><ol><li><p>We first obtain the video file from the user, either directly or through a YouTube link. If the user supplies a YouTube link, we utilise <a href="https://github.com/ytdl-org/youtube-dl" target="_blank" rel="noopener noreferrer">youtube-dl</a> to download the video onto our server.</p></li><li><p>We then use ffmpeg to compress the video and speed it up by 1.6x to optimise the performance of the speech recognition model.</p></li><li><p>The compressed video is then passed into <a href="https://github.com/openai/whisper" target="_blank" rel="noopener noreferrer">Whisper</a>, a state-of-the-art (SOTA) speech recognition model, generating a <code>transcript</code>.</p></li><li><p>Sentences are extracted from the <code>transcript</code> using the <a href="https://github.com/microsoft/BlingFire" target="_blank" rel="noopener noreferrer">BlingFire</a> model to generate an <code>article</code>. 
At the same time, the <code>transcript</code> is translated into various languages using the <a href="https://api.fanyi.baidu.com/doc/21" target="_blank" rel="noopener noreferrer">Baidu Translate API</a>.</p></li><li><p>The <code>article</code> is then passed into the <a href="https://huggingface.co/ml6team/keyphrase-extraction-kbir-inspec" target="_blank" rel="noopener noreferrer">KBIR-inspec</a> model, which extracts <code>key phrases</code>, and the <a href="https://pypi.org/project/bert-extractive-summarizer/" target="_blank" rel="noopener noreferrer">Bert Extractive Summarizer</a>, which generates a <code>summary</code>.</p></li><li><p>With the <code>summary</code> and the <code>timestamps</code> from the transcript, we utilise OpenCV to extract <code>images</code> from the video, one for each sentence in the summary.</p></li><li><p>These images are then passed into our <a href="https://github.com/ztjhz/yolov7-slides-extraction" target="_blank" rel="noopener noreferrer">YOLOv7</a> model, a SOTA object detection model that we fine-tuned on manually labelled data, to generate <code>slide images</code>.</p></li></ol><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="seeing-it-in-action">Seeing it in action<a class="hash-link" href="#seeing-it-in-action" title="Direct link to heading">​</a></h2><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="speech-recognition-and-natural-language-processing">Speech recognition and natural language processing<a class="hash-link" href="#speech-recognition-and-natural-language-processing" title="Direct link to heading">​</a></h3><p>This is what our speech recognition and natural language processing look like in action:</p><p><img loading="lazy" alt="Speech and NLP Demo" src="/assets/images/speech-nlp-demo-73721c0bb0112048daa40e0600755394.png" width="1796" height="932" class="img_ev3q"></p><p>Using 4 different deep learning models and a translation API, the video is transcribed, translated, transformed into an article, summarised, and 
distilled into key phrases.</p><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="slides-extraction-using-computer-vision">Slides extraction using computer vision<a class="hash-link" href="#slides-extraction-using-computer-vision" title="Direct link to heading">​</a></h3><p>This is what our slide extraction using computer vision looks like in action:</p><p><img loading="lazy" alt="CV Demo" src="/assets/images/cv-demo-8e7d1cc52137bee98467bc72becff491.png" width="1488" height="842" class="img_ev3q"></p><p>As you can see, the slides have been extracted from the video.</p><div class="theme-admonition theme-admonition-info alert alert--info admonition_LlT9"><div class="admonitionHeading_tbUL"><span class="admonitionIcon_kALy"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_S0QG"><p>In fact, our slides detection model is even better than existing solutions!</p><p><img loading="lazy" alt="CV Better Demo" src="/assets/images/cv-better-demo-f70478b9ebcf1a59256e9120c6a6e97d.png" width="1488" height="829" class="img_ev3q"></p></div></div><p>To achieve such an amazing result, we had to fine-tune the YOLOv7 model for lecture slides detection. To do that, we downloaded a diverse dataset of 200 lecture videos, ranging from computer science lectures to business seminars and Zoom meetings. We then manually labelled each and every one of them using <a href="https://github.com/Cartucho/OpenLabeling" target="_blank" rel="noopener noreferrer">labelling software</a>. 
Subsequently, we trained our <a href="https://github.com/ztjhz/yolov7-slides-extraction" target="_blank" rel="noopener noreferrer">own lecture slides detection model</a> on our GPU server and achieved fantastic results!</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="bringing-bytevid-to-the-user">Bringing ByteVid to the user<a class="hash-link" href="#bringing-bytevid-to-the-user" title="Direct link to heading">​</a></h2><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="frontend">Frontend<a class="hash-link" href="#frontend" title="Direct link to heading">​</a></h3><p>To make our solution easily accessible to users, we created a <a href="https://github.com/xJQx/ByteVidFrontend" target="_blank" rel="noopener noreferrer">web application</a> (built with <code>React.js</code> and <code>Tailwindcss</code>, and deployed on <code>GitHub Pages</code>) as well as a <a href="https://github.com/ztjhz/ByteVidExtension" target="_blank" rel="noopener noreferrer">browser extension</a>:</p><p><img loading="lazy" alt="Frontend Demo" src="/assets/images/frontend-demo-12b143689631c18c5a5eec3906ac45ee.jpg" width="1280" height="582" class="img_ev3q"></p><p>The web application features an intuitive user interface, where users can either submit a YouTube link or upload a video file. The language of the video and the translation language of the transcript can also be customised to their liking.</p><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="backend">Backend<a class="hash-link" href="#backend" title="Direct link to heading">​</a></h3><p>Our <a href="https://github.com/ayaka14732/ByteVid" target="_blank" rel="noopener noreferrer">backend</a> utilises a <code>Flask</code> server, which is deployed on a GPU machine. Since our GPU machine has no Internet access, we set up a relay server with autossh port forwarding to forward our GPU server's port to an Internet-facing VPS. 
We then utilised an Nginx reverse proxy to integrate our GPU server with our existing web service API. We also utilised Cloudflare for site protection.</p><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-result">The result<a class="hash-link" href="#the-result" title="Direct link to heading">​</a></h3><p>Here is a demo of our live website:</p><p style="width:100%;height:360px"></p><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="source-code">Source code<a class="hash-link" href="#source-code" title="Direct link to heading">​</a></h3><ul><li><a href="https://github.com/ayaka14732/ByteVid" target="_blank" rel="noopener noreferrer">ByteVid</a></li><li><a href="https://github.com/xJQx/ByteVidFrontend" target="_blank" rel="noopener noreferrer">ByteVid Front-end</a></li><li><a href="https://github.com/ztjhz/ByteVidExtension" target="_blank" rel="noopener noreferrer">ByteVid Extension</a></li><li><a href="https://github.com/ztjhz/yolov7-slides-extraction" target="_blank" rel="noopener noreferrer">ByteVid YOLOv7 Slides Extraction</a></li></ul><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-benefits">The benefits<a class="hash-link" href="#the-benefits" title="Direct link to heading">​</a></h2><p>ByteVid comes with several benefits:</p><ol><li><p>It works for both business Zoom meetings and recorded school lectures, saving the time and energy spent watching long videos.</p></li><li><p>Our translation can overcome language and accent barriers, allowing businesses to strengthen their overseas collaboration and students to learn from lectures in any language.</p></li><li><p>Our extracted summary and slides serve as a template on which both business executives and students can build their notes.</p></li></ol><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="deep-learning-models-used">Deep learning models used<a class="hash-link" href="#deep-learning-models-used" title="Direct link to heading">​</a></h2><ol><li><a 
href="https://github.com/openai/whisper" target="_blank" rel="noopener noreferrer">Whisper</a>: SOTA speech recognition (Sep 2022)</li><li><a href="https://github.com/WongKinYiu/yolov7" target="_blank" rel="noopener noreferrer">YOLOv7</a>: SOTA object detection (Jul 2022)</li><li><a href="https://huggingface.co/ml6team/keyphrase-extraction-kbir-inspec" target="_blank" rel="noopener noreferrer">KBIR-inspec</a>: key phrase extraction (Dec 2021)</li><li><a href="https://pypi.org/project/bert-extractive-summarizer/" target="_blank" rel="noopener noreferrer">Bert Extractive Summarizer</a>: summarisation (Jun 2019)</li><li><a href="https://github.com/microsoft/BlingFire" target="_blank" rel="noopener noreferrer">BlingFire</a>: sentence extraction</li><li><a href="https://api.fanyi.baidu.com/doc/21" target="_blank" rel="noopener noreferrer">Baidu Translate API</a>: translation</li></ol><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="reflection">Reflection<a class="hash-link" href="#reflection" title="Direct link to heading">​</a></h2><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="accomplishments-that-were-proud-of">Accomplishments that we’re proud of<a class="hash-link" href="#accomplishments-that-were-proud-of" title="Direct link to heading">​</a></h3><ul><li>Building and deploying a fully functional AI product in less than 2 days</li><li>Combining three exciting fields of AI in one product: computer vision, natural language processing and speech processing</li><li>Building our own lecture slides dataset and a CV model that outperforms existing solutions</li></ul><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="challenges">Challenges<a class="hash-link" href="#challenges" title="Direct link to heading">​</a></h3><ul><li>There is no existing solution for lecture slides detection - we manually labelled hundreds of videos and images to train our own lecture slides detection model.</li><li>Our GPU server has no Internet access - we set up 
a relay server with autossh port forwarding.</li><li>The ffmpeg commands were complicated - when we finally succeeded in demystifying them, we felt a sense of achievement.</li><li>The speech recognition model is relatively slow - we noticed that professors usually speak slowly, so we optimised the performance by speeding up lecture videos by 1.6x before passing them into the speech recognition model.</li><li>We used up our free Baidu translation API quota during testing - we paid S$10 to buy extra quota.</li><li>The Baidu translation API has a rate limit - we split paragraphs into chunks of sentences and sent requests at a moderate rate.</li><li>There is no simple method to split paragraphs into sentences (e.g. 3.14 will become two sentences when split by periods) - we utilised the BlingFire model to solve this problem.</li></ul><h3 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-we-learned">What we learned<a class="hash-link" href="#what-we-learned" title="Direct link to heading">​</a></h3><ul><li>Deploying deep learning models on a cloud server</li><li>Speech knowledge for speech recognition and video transcription</li><li>NLP knowledge for machine translation, summarisation and keyword extraction</li><li>CV knowledge for object detection and lecture slide extraction</li><li>Developing a browser extension</li></ul><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="whats-next-for-bytevid">What’s next for ByteVid<a class="hash-link" href="#whats-next-for-bytevid" title="Direct link to heading">​</a></h2><ul><li>Auto-navigation to certain timestamps in videos based on keywords</li><li>Support for URLs other than YouTube</li><li>Implement a Telegram bot</li><li>Implement a mobile application</li></ul><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="hackathon-submission">Hackathon Submission<a class="hash-link" href="#hackathon-submission" title="Direct link to heading">​</a></h2><p><a href="https://devpost.com/software/bytevid" 
target="_blank" rel="noopener noreferrer">Devpost</a></p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="references">References<a class="hash-link" href="#references" title="Direct link to heading">​</a></h2><p>[1] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust Speech Recognition via Large-Scale Weak Supervision,” arXiv preprint arXiv:2212.04356, 2022.</p><p>[2] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” arXiv preprint arXiv:2207.02696, 2022.</p><p>[3] M. Kulkarni, D. Mahata, R. Arora, and R. Bhowmik, “Learning rich representation of keyphrases from text,” arXiv preprint arXiv:2112.08547, 2021.</p><p>[4] D. Miller, “Leveraging BERT for extractive text summarization on lectures,” arXiv preprint arXiv:1906.04165, 2019.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-team">The team<a class="hash-link" href="#the-team" title="Direct link to heading">​</a></h2><p><a class="container_T1_I" href="https://github.com/ztjhz" title="ztjhz"><img src="https://github.com/ztjhz.png" height="60" width="60"></a></p><div><a class="container_T1_I" href="https://github.com/ztjhz" title="ztjhz">Jing Hua</a></div><a class="container_T1_I" href="https://github.com/ayaka14732" title="ayaka14732"><img src="https://github.com/ayaka14732.png" height="60" width="60"><div>Ayaka</div></a><a class="container_T1_I" href="https://github.com/xjqx" title="xjqx"><img src="https://github.com/xjqx.png" height="60" width="60"><div>Jing Qiang</div></a><p></p>]]></content>
        <author>
            <name>Jing Hua</name>
            <uri>https://github.com/ztjhz</uri>
        </author>
        <category label="hackathon" term="hackathon"/>
        <category label="nlp" term="nlp"/>
        <category label="cv" term="cv"/>
        <category label="deeplearning" term="deeplearning"/>
        <category label="ai" term="ai"/>
        <category label="whisper" term="whisper"/>
        <category label="computervision" term="computervision"/>
        <category label="speechrecognition" term="speechrecognition"/>
        <category label="objectdetection" term="objectdetection"/>
        <category label="machinetranslation" term="machinetranslation"/>
        <category label="yolov7" term="yolov7"/>
        <category label="bert" term="bert"/>
        <category label="react" term="react"/>
        <category label="flask" term="flask"/>
        <category label="tailwindcss" term="tailwindcss"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[TrAVis - Transformer Attention Visualiser]]></title>
        <id>travis</id>
        <link href="https://me.tjh.sg/blog/travis"/>
        <updated>2022-09-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How we created an in-browser BERT attention visualiser without a server]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="Result" src="/assets/images/result-d51c7e47c5c036d4ae94a4833adc2efe.png" width="1488" height="991" class="img_ev3q"></p><p>How we created an in-browser BERT attention visualiser without a server</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="what-is-this">What is this?<a class="hash-link" href="#what-is-this" title="Direct link to heading">​</a></h2><p>TrAVis is a Transformer Attention Visualiser. The idea of visualising the attention matrices is inspired by <a href="https://arxiv.org/abs/1409.0473" target="_blank" rel="noopener noreferrer">Neural Machine Translation by Jointly Learning to Align and Translate</a>.</p><p class="centered" style="height:500px"><img loading="lazy" alt="transformer" src="/assets/images/transformer-30b4218690677d30fe3b34e2b964f95b.png" class="img_ev3q"></p><p>The paper that introduced the Transformer model is titled <a href="https://arxiv.org/abs/1706.03762" target="_blank" rel="noopener noreferrer">Attention Is All You Need</a>, which underlines how central the attention mechanism is to <a href="https://huggingface.co/docs/transformers/model_summary" target="_blank" rel="noopener noreferrer">Transformer-based models</a>. While computing attention, these models produce attention matrices that record how strongly each token attends to every other token in the input. The matrices therefore reveal how the model processes its input and can be seen as a concrete representation of the attention mechanism.</p><p>For example, the <a href="https://arxiv.org/abs/1810.04805" target="_blank" rel="noopener noreferrer">BERT</a> Base Uncased model has 12 Transformer layers with 12 attention heads each, and each head generates one attention matrix, so a single input produces 144 attention matrices. 
TrAVis is a tool for visualising these attention matrices.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="why-is-it-important">Why is it important?<a class="hash-link" href="#why-is-it-important" title="Direct link to heading">​</a></h2><p>Despite the popularity of Transformer-based models, people often use them by simply running training scripts, without looking at what is going on inside the model. TrAVis helps us understand how Transformer-based models work internally, which enables us to apply them more effectively to our problems and can inspire improvements to the model architecture.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="how-did-we-do-it">How did we do it?<a class="hash-link" href="#how-did-we-do-it" title="Direct link to heading">​</a></h2><p>The project consists of four parts.</p><p>Firstly, we <a href="https://github.com/ayaka14732/bart-base-jax" target="_blank" rel="noopener noreferrer">implemented</a> the <a href="https://arxiv.org/abs/1910.13461" target="_blank" rel="noopener noreferrer">BART</a> model from scratch using <a href="https://github.com/google/jax" target="_blank" rel="noopener noreferrer">JAX</a>. We chose JAX because it lets us write clear source code, and because its NumPy-style API means the code can easily be converted to NumPy, which can be executed in-browser. 
We chose the BART model because it is a complete encoder-decoder model, so it can easily be adapted to other models, such as the encoder-only BERT, by taking a subset of the source code.</p><p><a href="https://github.com/ayaka14732/bart-base-jax" target="_blank" rel="noopener noreferrer"><img loading="lazy" alt="Bart Base Jax" width="500px" src="/assets/images/bart-base-jax-63d409385bb87c172fe59eb952ff2178.png" class="img_ev3q"></a></p><p>Secondly, we <a href="https://github.com/ztjhz/word-piece-tokenizer" target="_blank" rel="noopener noreferrer">implemented</a> the <a href="https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer" target="_blank" rel="noopener noreferrer">HuggingFace BERT Tokeniser</a> in pure Python, since pure Python code is easier to execute in-browser. We also optimised the tokenisation algorithm, making our implementation faster than the original HuggingFace one.</p><p><a href="https://github.com/ztjhz/word-piece-tokenizer" target="_blank" rel="noopener noreferrer"><img loading="lazy" alt="Word Piece Tokenizer" width="500px" src="/assets/images/word-piece-tokenizer-2993a5e4efc62ffbf6a067dd14dcf2ff.png" class="img_ev3q"></a></p><p>Thirdly, we use <a href="https://pyodide.org" target="_blank" rel="noopener noreferrer">Pyodide</a> to run our Python code in the browser. 
Pyodide can run most packages written in pure Python, with <a href="https://pyodide.org/en/stable/usage/packages-in-pyodide.html" target="_blank" rel="noopener noreferrer">additional support</a> for a number of packages that include compiled code, such as NumPy and SciPy.</p><p>Fourthly, we visualise the attention matrices in our web application using <a href="https://d3js.org/" target="_blank" rel="noopener noreferrer">d3.js</a>.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="result">Result<a class="hash-link" href="#result" title="Direct link to heading">​</a></h2><p>Our Transformer model runs entirely in the browser, without the need for a server.</p><p><img loading="lazy" alt="Result" src="/assets/images/result-d51c7e47c5c036d4ae94a4833adc2efe.png" width="1488" height="991" class="img_ev3q"></p><p>When users enter sentences into the web application, the model loaded in the browser generates the attention matrices for those sentences, which are then visualised as a heatmap. 
Users can then select which Transformer layer and attention head to visualise using the range slider.</p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="source-code">Source code<a class="hash-link" href="#source-code" title="Direct link to heading">​</a></h2><p>The source code of our visualiser is published on GitHub.</p><p><a href="https://github.com/ayaka14732/TrAVis" target="_blank" rel="noopener noreferrer"><img loading="lazy" alt="TrAVis" width="500px" src="/assets/images/TrAVis-f631d2f962545c4a4c2bfb263ffdf1c0.png" class="img_ev3q"></a></p><h2 class="anchor anchorWithHideOnScrollNavbar_WYt5" id="the-team">The team<a class="hash-link" href="#the-team" title="Direct link to heading">​</a></h2><p><a class="container_T1_I" href="https://github.com/ztjhz" title="ztjhz"><img src="https://github.com/ztjhz.png" height="60" width="60"></a></p><div><a class="container_T1_I" href="https://github.com/ztjhz" title="ztjhz">Jing Hua</a></div><a class="container_T1_I" href="https://github.com/ayaka14732" title="ayaka14732"><img src="https://github.com/ayaka14732.png" height="60" width="60"><div>Ayaka</div></a><p></p>]]></content>
        <author>
            <name>Jing Hua</name>
            <uri>https://github.com/ztjhz</uri>
        </author>
        <author>
            <name>Ayaka</name>
            <uri>https://github.com/ayaka14732</uri>
        </author>
        <category label="nlp" term="nlp"/>
        <category label="ai" term="ai"/>
        <category label="deeplearning" term="deeplearning"/>
        <category label="machinelearning" term="machinelearning"/>
        <category label="pyodide" term="pyodide"/>
        <category label="d3.js" term="d3.js"/>
        <category label="bert" term="bert"/>
        <category label="huggingface" term="huggingface"/>
        <category label="tokeniser" term="tokeniser"/>
        <category label="jax" term="jax"/>
        <category label="transformer" term="transformer"/>
    </entry>
</feed>