Jing Hua Blog

BetterChatGPT - An amazing UI for OpenAI's ChatGPT

2023-04-10T00:00:00.000Z

Better ChatGPT is the ultimate destination for anyone who wants to experience the limitless power of conversational AI. With no limits and completely free to use for all, our app harnesses the full potential of OpenAI's ChatGPT API to offer you an unparalleled chatbot experience.

ZilKin - The Future of Zilliqa Smart Contracts

2023-01-10T00:00:00.000Z

My Journey in Paving the Way for a New Era of Innovation in the World of Zilliqa

ZilKin is a Scilla Contracts Deployment Tool that empowers you to deploy your Scilla contracts effortlessly via our a) Contracts Wizard (Interactive Code Generator) or b) Automatic Deployment Contract.

Over the winter of 2022 (Dec 2022 - Jan 2023), I was beyond thrilled to have been handpicked as one of the elite 20 undergraduate students from the APAC region to participate in the highly acclaimed ZilHive Student Practicum. This program was an absolute game-changer for me, providing me with unparalleled opportunities to immerse myself in the world of cutting-edge blockchain and Web3 development.

The Spark of Inspiration

As a developer, I am driven by the burning passion to craft solutions that not only simplify the lives of my peers but also reshape the world as we know it. That's why, when I was privileged to be a part of the infrastructure track at the ZilHive Student Practicum, I knew that this was a chance to leave an indelible mark on the developer community. My objective was to conceive a tool that would make the process of deploying Scilla smart contracts a breeze, and accessible for developers of all skill levels.

The inspiration for this idea came to me when I realized that Scilla, a powerful and versatile language, was often underutilized due to its complexity and lack of resources. I realized that if I could create a tool that would make it easy for developers to deploy Scilla smart contracts, it would have a transformative impact on the developer community.

Building a Masterpiece

To bring this idea to life, my team and I decided to harness the power of React, Tailwind CSS, and Scilla. React, a widely adopted JavaScript library, was chosen for building a user-friendly interface, while Tailwind CSS, a CSS framework, was used to give the tool an elegant look. Scilla, a smart contract language, was chosen for its powerful features and versatility. However, to fully grasp Scilla and create a successful solution, I had to immerse myself in the language, learning about smart contracts and the intricacies of Scilla. It was a challenging but enlightening journey.

Unveiling ZilKin: A Revolutionary Scilla Smart Contracts Deployment Tool

After a journey of tireless dedication and relentless innovation, my team and I stand before you, beaming with pride as we unveil ZilKin - a revolutionary tool that shatters the barriers of Scilla smart contract deployment. With our state-of-the-art Contracts Wizard (Interactive Code Generator) and Automatic Contract Deployment feature, we have empowered developers to effortlessly weave their ideas into the fabric of the Zilliqa ecosystem with unprecedented ease and speed.

Unveiling the Future of Scilla

As I stood before the esteemed mentors and fellow developers of ZilHive, my team and I knew we were presenting something truly special. When we unveiled our solution, a tool of unparalleled elegance and simplicity, it was met with gasps of wonder and applause. The user-friendly interface, intuitive design, and ease of use had captivated the audience, and they could see the potential impact it could have on the developer community. With its revolutionary capabilities, this tool would change the game, ignite a new era of development, and empower a new generation of developers to bring their wildest dreams to life. ZilKin is not just a tool, but a bridge to a brighter future, where the complexities of Scilla are no longer a barrier, and the true potential of the Zilliqa ecosystem can be fully realized!

A New Era of Scilla

I am proud that our innovation has become a beacon of hope, a shining light that illuminates the path to a brighter future for Scilla. It is a revolutionary tool that will make the deployment of Scilla smart contracts as effortless as a gentle breeze on a warm summer's day. It will empower developers, both the seasoned veterans and the eager newcomers, to harness the full potential of Scilla's power and versatility. With this solution, the horizon for Scilla is limitless and the possibilities are endless. I am honored to be a part of this transformative change that will shape the developer community for years to come and pave the way for a new era of innovation in the world of Scilla.

Mentored by the Masters

With gratitude in my heart, I extend my deepest thanks to the esteemed mentors of Zilliqa - Kevin Meyer, Tom Fleetham, Elliott Green, and Bradley Laws - for their unwavering guidance and support. My journey would not have been possible without their invaluable wisdom and expertise. I am also forever grateful to the brilliant trainer, Wei-Meng Lee, for imparting his knowledge and skills to me and helping me to reach new heights. Their mentorship has been a true blessing, and I am eternally grateful for the opportunity to learn from such esteemed individuals in the industry.

The Bridge to a Brighter Future

As I look back on the journey that led me to the creation of ZilKin, I am filled with a sense of wonder and gratitude. The road was long, and the challenges were many, but the rewards have been immeasurable. I have not only developed a tool that will change the way Scilla smart contracts are deployed, but I have also grown as a developer and as a person.

Through the mentorship and guidance of the esteemed Zilliqa team, I have been able to tap into my full potential and create something truly special. The future is bright for Scilla and the developer community, and I am honored to have played a small part in that.

As the future unfolds, I can't wait to see the impact ZilKin will have and the doors it will open for developers everywhere. This is just the beginning of a new era, where the power of Scilla will be accessible to all and the true potential of the Zilliqa ecosystem will be fully realized. Thank you for joining me on this journey, and I look forward to seeing where ZilKin will take us next.

The team

Jing Hua

Jing Qiang

ByteVid - Deep Learning Hackathon 1st Place

2022-10-03T00:00:00.000Z

Say goodbye to long and boring videos! 👋

Powered by the cutting-edge deep learning technologies of 2022, ByteVid transforms long, boring videos into fun byte-sized content. Be it a one-hour lecture or a 30-minute Zoom meeting, ByteVid can transcribe it, summarise the content, extract keywords, detect and extract important slides from the video, and translate the transcript into other languages.

Over the weekend (1 Oct 2022 - 3 Oct 2022), our team, comprising Jing Hua, Jing Qiang, and Ayaka, participated in the MLDA Deep Learning Week Hackathon 2022. We worked tirelessly through exhausting days and sleepless nights to come up with ByteVid in 48 hours. We went through 2 rounds of pitching and eventually came out on top, finishing 1st out of 120 teams 🥇. Here is how we did it...

Inspiration

When we first encountered the topic of ‘AI and Smart Nation’, we were extremely excited as there were tons of areas that we could explore: mobility, healthcare, media and entertainment, agriculture, social, sustainability, etc. We spent hours and hours trying to find something that intrigued us, but we were unsuccessful. How could we balance our skills and aspirations with the problem we wanted to work on?

The idea struck us as we went back to our roots as ‘students’. With Singapore’s effort in promoting Smart Nation, online recorded lectures and meetings are becoming increasingly prevalent. We struggled with online recorded lectures and meetings because it was difficult to understand what the other person was saying. Some spoke in non-native languages. Some spoke with unfamiliar accents. Some did not even provide slides for us students to follow the lecture with. We struggled even more given that the average audience attention span is about 7 minutes, and that the current Zoom transcription feature is not very accurate. We could not understand the video content properly or comfortably 😥

Therefore, we were inspired and motivated to build a project that would help us and others extract video information efficiently!

The plan

We got to the drawing board and planned for hours on what features we needed and how we could get there. This is what we came up with:

We first obtain the video file from the user, either directly or through a YouTube link. If the user supplies a YouTube link, we utilise youtube-dl to download the video onto our server.
We then use ffmpeg to compress and speed up the video by 1.6x to optimise the performance of the speech recognition model.
The compressed video is then passed into Whisper, a state-of-the-art (SOTA) speech recognition model, generating a transcript.
Sentences are extracted from the transcript using the BlingFire model to generate an article. At the same time, the transcript is translated into various languages using the Baidu Translate API.
The article is then passed into the KBIR-inspec model, which extracts key phrases, and the Bert Extractive Summarizer, which generates a summary.
With the summary and the timestamps from the transcript, we utilise OpenCV to extract an image from the video for each sentence in the summary.
We then pass the images into our YOLOv7, a SOTA object detection model that we fine-tuned with manually labelled data, to generate slide images.
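
As a rough illustration of the speed-up step, the sketch below builds an ffmpeg command that plays both video and audio 1.6 times faster. It is a minimal sketch rather than the exact command we ran: the function names and file names are made up for illustration, and compression settings are left out. It also assumes that transcript timestamps are measured on the sped-up file, in which case they would need to be scaled back before frames are taken from the original video.

```python
from typing import List

SPEED = 1.6  # speed-up factor quoted in the plan above


def build_speedup_command(src: str, dst: str, speed: float = SPEED) -> List[str]:
    """Return an ffmpeg argument list that plays `src` `speed` times faster."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        # setpts rescales the video timestamps; atempo changes the audio tempo
        "-filter:v", f"setpts=PTS/{speed}",
        "-filter:a", f"atempo={speed}",
        dst,
    ]


def to_original_time(t_fast: float, speed: float = SPEED) -> float:
    """Map a timestamp in the sped-up video back to the original video (assumption)."""
    return t_fast * speed


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    print(" ".join(build_speedup_command("lecture.mp4", "lecture_fast.mp4")))
    print(to_original_time(10.0))
```

The argument list can be handed to `subprocess.run` on a machine where ffmpeg is installed.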

Seeing it in action

Speech recognition and natural language processing

This is what our speech recognition and natural language processing look like in action:

Using 4 different deep learning models and a translation API, the video is transcribed, translated, turned into an article, and summarised, and its key phrases are extracted.

Slides extraction using computer vision

This is what our slides extraction using computer vision looks like in action:

As you can see, the slides have been extracted from the video.

In fact, our slides detection model is even better than existing solutions!

To achieve such an amazing result, we had to fine-tune the YOLOv7 model for lecture slides detection. To do that, we downloaded a diverse dataset of 200 lecture videos, ranging from computer science lectures to business seminars and Zoom meetings. We then manually labelled each and every one of them using labelling software. Subsequently, we trained our own lecture slides detection model on our GPU server, and achieved fantastic results!
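
For readers unfamiliar with how such labels are stored: YOLO-style training code commonly reads one text line per bounding box, holding a class index followed by the box centre, width, and height, all normalised by the image size. The sketch below converts a pixel-coordinate box into that form. The function name, the single slide class with index 0, and the example coordinates are assumptions for illustration, not details of our dataset.

```python
def to_yolo_label(class_id: int,
                  x_min: float, y_min: float,
                  x_max: float, y_max: float,
                  img_w: int, img_h: int) -> str:
    """Convert a pixel-coordinate box to a normalised YOLO-style label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A hypothetical slide occupying the middle of a 1920x1080 frame
    print(to_yolo_label(0, 320, 180, 1600, 900, 1920, 1080))
```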

Bringing ByteVid to the user

Frontend

To make our solution easily accessible to the user, we created a web application (built with React.js and Tailwind CSS, and deployed on GitHub Pages) as well as a browser extension:

The web application features an intuitive user interface, where users can either submit a YouTube link or upload a video file. The language of the video and the translation language of the transcript can also be customised to their liking.

Backend

Our backend utilises a Flask server, which is deployed to a GPU machine. Since our GPU machine has no Internet access, we set up a relay server with autossh port forwarding to relay our GPU server port to an Internet-facing VPS. We then utilised an Nginx reverse proxy to integrate our GPU server into our existing web service API. We also utilised Cloudflare for site protection.
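
To make the relay idea more concrete, here is a minimal sketch of how a reverse tunnel command could be assembled. It assumes the relay server can reach both the GPU machine and the VPS, and that the command is run on the relay. The host names, ports, and user name are placeholders, and the helper function is hypothetical rather than taken from our setup.

```python
from typing import List


def build_reverse_tunnel_command(vps_host: str, vps_port: int,
                                 gpu_host: str, gpu_port: int,
                                 user: str = "tunnel") -> List[str]:
    """Return an autossh argument list exposing gpu_host:gpu_port on the VPS."""
    return [
        "autossh",
        "-M", "0",                        # disable autossh's own monitoring port
        "-N",                             # no remote command, forwarding only
        "-o", "ServerAliveInterval=30",   # let ssh itself detect a dead link
        "-o", "ServerAliveCountMax=3",
        # Connections to vps_port on the VPS are forwarded to the GPU server
        "-R", f"{vps_port}:{gpu_host}:{gpu_port}",
        f"{user}@{vps_host}",
    ]


if __name__ == "__main__":
    # Placeholder addresses, for illustration only
    print(" ".join(build_reverse_tunnel_command("vps.example.com", 8000, "10.0.0.5", 5000)))
```

With a tunnel like this in place, a reverse proxy on the VPS would only need to forward requests to the forwarded local port.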

The result

Here is a demo of our live website:

Source code

The benefits

ByteVid comes with several benefits:

It works for both business Zoom meetings and recorded school lectures, saving the time and energy spent on watching long videos.
Our translation can overcome language and accent barriers, allowing businesses to enhance their overseas inter-racial collaboration, and students to learn from lectures in any language.
Our extracted summary and slides serve as a template for both business executives and students to build notes upon.

Deep learning models used

Whisper: SOTA speech recognition (Sep 2022)
YOLOv7: SOTA object detection (Jul 2022)
KBIR-inspec: key phrase extraction (Dec 2021)
Bert Extractive Summarizer: summarisation (Jun 2019)
BlingFire: sentence extraction
Baidu Translate API: translation

Reflection

Accomplishments that we’re proud of

Building and deploying a fully functional AI product in less than 2 days
Our product combines three exciting fields of AI: computer vision, natural language processing and speech processing
We built our own lecture slides dataset and a CV model that is better than existing solutions

Challenges

There is no existing solution for lecture slides detection - we manually labelled hundreds of videos and images to train our own lecture slides detection model.
Our GPU server has no Internet access - we set up a relay server with autossh port forwarding.
The ffmpeg commands were complicated - when we finally succeeded in demystifying them, we felt a sense of achievement.
The speech recognition model is relatively slow - we noticed that professors usually speak slowly, so we optimised the performance by speeding up lecture videos by 1.6x before passing them into the speech recognition model.
We used up our Baidu translation API free quota during testing - we paid S$10 to buy extra quota.
The Baidu translation API has a rate limit - we split paragraphs into chunks of sentences and sent requests at a moderate rate.
There is no simple method to split paragraphs into sentences (e.g. 3.14 will become two sentences when split by periods) - we utilised the BlingFire model to solve this problem.
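
The last two challenges can be sketched in a few lines. The first function shows why splitting on periods breaks on numbers such as 3.14. The other two pack already-split sentences into chunks under a character budget and pause between requests. This is a minimal sketch under stated assumptions: the character budget, the delay, and the `translate` callable are placeholders, not the real Baidu client or its actual limits.

```python
import time
from typing import Callable, Iterable, List


def naive_split(text: str) -> List[str]:
    """Split on periods; shown only to illustrate the failure mode."""
    return [part.strip() for part in text.split(".") if part.strip()]


def chunk_sentences(sentences: Iterable[str], max_chars: int) -> List[str]:
    """Greedily pack sentences into chunks of at most max_chars where possible."""
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # budget exceeded: close the current chunk
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_throttled(chunks: List[str],
                        translate: Callable[[str], str],
                        delay_s: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call `translate` once per chunk, pausing between consecutive requests."""
    results = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            sleep(delay_s)
        results.append(translate(chunk))
    return results


if __name__ == "__main__":
    print(naive_split("Pi is about 3.14. It is irrational."))
    print(chunk_sentences(["aaaa.", "bbbb.", "cccc."], max_chars=11))
```

Passing `sleep` as a parameter keeps the throttling easy to exercise without actually waiting.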

What we learned

Deploying deep learning models on a cloud server
Speech knowledge for speech recognition and video transcription
NLP knowledge for machine translation, summarisation and keyword extraction
CV knowledge for object detection and lecture slide extraction
Developing a browser extension

What’s next for ByteVid

Auto-navigation to certain timestamps in videos based on keywords
Support video URLs from sources other than YouTube
Implement a Telegram bot
Implement a mobile application

Hackathon Submission

Devpost

References

[1] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust Speech Recognition via Large-Scale Weak Supervision,” 2022.

[2] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” arXiv preprint arXiv:2207.02696, 2022.

[3] M. Kulkarni, D. Mahata, R. Arora, and R. Bhowmik, “Learning rich representation of keyphrases from text,” arXiv preprint arXiv:2112.08547, 2021.

[4] D. Miller, “Leveraging BERT for extractive text summarization on lectures,” arXiv preprint arXiv:1906.04165, 2019.

The team

Jing Hua

Ayaka

Jing Qiang

TrAVis - Transformer Attention Visualiser

2022-09-28T00:00:00.000Z

How we created an in-browser BERT attention visualiser without a server

What is this?

TrAVis is a Transformer Attention Visualiser. The idea of visualising the attention matrices is inspired by Neural Machine Translation by Jointly Learning to Align and Translate.

The original paper of the Transformer model was named Attention Is All You Need, demonstrating the centrality of the attention mechanism to Transformer-based models. These models generate attention matrices during the computation of the attention mechanism, which indicate how the models process the input data, and can therefore be seen as a concrete representation of the mechanism.

In the BERT Base Uncased model, for example, there are 12 transformer layers, each layer contains 12 heads, and each head generates one attention matrix. TrAVis is the tool for visualising these attention matrices.
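
To show what one such matrix contains, the sketch below computes the scaled dot-product attention weights, softmax(QK^T / sqrt(d_k)), described in Attention Is All You Need. It is written in plain Python only to stay self-contained, which is an assumption of this example; the toy query and key vectors are made up, and this is not the TrAVis source code.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(queries: Matrix, keys: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d_k)); row i holds the weights for query token i."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]
    return [softmax(row) for row in scores]


NUM_LAYERS = 12   # BERT Base Uncased
NUM_HEADS = 12    # attention heads per layer
TOTAL_MATRICES = NUM_LAYERS * NUM_HEADS  # one attention matrix per head per layer

if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.0, 1.0]]  # two tokens with 2-dimensional vectors
    for row in attention_matrix(toy, toy):
        print(row)
    print(TOTAL_MATRICES)
```

Each row sums to 1, which is what makes the matrix natural to draw as a heatmap.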

Why is it important?

Despite the popularity of Transformer-based models, people often utilise them by simply running the training scripts, ignoring what is going on inside the model. TrAVis helps us to better understand how Transformer-based models work internally, thus enabling us to better exploit them to solve our problems and, furthermore, giving us inspiration to make improvements to the model architecture.

How did we do it?

The project consists of 4 parts.

Firstly, we implemented the BART model from scratch using JAX. We chose JAX because it is an amazing deep learning framework that enables us to write clear source code, and it can be easily converted to NumPy, which can be executed in-browser. We chose the BART model because it is a complete encoder-decoder model, so it can be easily adapted to other models, such as BERT, by simply taking a subset of the source code.

Secondly, we implemented the HuggingFace BERT Tokeniser in pure Python, as it can be more easily executed in-browser. Moreover, we have optimised the tokenisation algorithm so that it is faster than the original HuggingFace implementation.
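
For context, BERT tokenisers split each word into subword pieces with WordPiece, a greedy longest-match-first search over a vocabulary. The sketch below shows that baseline idea in pure Python. It is only an illustration: the toy vocabulary is made up, and this is the textbook greedy procedure, not our optimised algorithm.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking from the right
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens


if __name__ == "__main__":
    toy_vocab = {"un", "##aff", "##able", "play", "##ing"}  # hypothetical vocabulary
    print(wordpiece_tokenize("unaffable", toy_vocab))
    print(wordpiece_tokenize("playing", toy_vocab))
```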

Thirdly, we use Pyodide to run our Python code in the browser. Pyodide supports all Python libraries implemented in pure Python, with additional support for a number of other libraries such as NumPy and SciPy.

Fourthly, we visualise the attention matrices in our web application using d3.js.

Result

The result is that our Transformer model can run entirely in the browser without the need for a server.

When users input sentences into our web application, the loaded model will generate the attention matrices of the sentences, which will then be visualised as a heatmap. Subsequently, users can select which Transformer layer and attention head to visualise, by utilising the range slider.

Source code

The source code of our visualiser is published on GitHub

The team

Jing Hua

Ayaka
