
CN119815266A - A method, device, equipment and medium for playing sound and picture in the same position - Google Patents

A method, device, equipment and medium for playing sound and picture in the same position

Info

Publication number
CN119815266A
CN119815266A (application CN202411772963.3A)
Authority
CN
China
Prior art keywords
sound
image
sound source
playing
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411772963.3A
Other languages
Chinese (zh)
Inventor
王会军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Cloud Technology Co Ltd
Original Assignee
China Telecom Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Cloud Technology Co Ltd filed Critical China Telecom Cloud Technology Co Ltd
Priority to CN202411772963.3A priority Critical patent/CN119815266A/en
Publication of CN119815266A publication Critical patent/CN119815266A/en
Pending legal-status Critical Current

Landscapes

  • Stereophonic System (AREA)

Abstract


The present invention relates to a method, apparatus, device, and medium for co-located playback of sound and picture, in the field of audio and video signal processing. The method comprises: dividing the screen from the center to the edge into a plurality of sound image areas in order of increasing length; acquiring first position information of a first sound source image in the screen; determining, according to the first position information, the first sound image area to which the first sound source image belongs; obtaining a first target azimuth according to the first sound image area; and generating a virtual sound image of the first sound source image at the first target azimuth and playing sound through the virtual sound image. By dividing the screen into a limited number of fixed areas, determining in which area the sound source image is displayed, and setting the target azimuth of the virtual sound image to the geometric midpoint of that area, the scheme achieves sound-picture co-location.

Description

Sound and picture co-located playing method, device, equipment and medium
Technical Field
The present invention relates to the field of audio and video signal processing, and in particular to a method, apparatus, device, and medium for co-located playback of sound and picture.
Background
Immersive audio-visual technology is mature in the film industry: in a movie theatre, virtual sound image positioning combined with 3D video can give a viewer the immersive impression of, for example, a train roaring past overhead. In the fields of video conferencing, remote teaching, and video lectures, however, current products still fall far short of a sense of presence that satisfies users.
One existing approach displays the image of a given sound source on a fixed screen that is bound to a single nearby loudspeaker; when the sound source shown on that screen is confirmed to be sounding, the corresponding loudspeaker plays. This gives a user in front of the screen some sense of sound-picture co-location, but the sound-picture deviation angle is large, and especially when the screen is large, the viewer still perceives the sound and the picture as separated.
Disclosure of Invention
In view of this, there is a need for a method, apparatus, device, and medium for co-located sound and picture playback that solve the problem of the prior approach: when the sound source displayed on the screen sounds through its corresponding loudspeaker, the sound-picture deviation angle is large and the viewer perceives the sound and picture as separated.
In order to solve the above problems, the present invention provides a playback method for sound-picture co-location, comprising:
dividing the screen from the center to the edge into a plurality of sound image areas in order of increasing length;
acquiring first position information of a first sound source image in a screen;
determining a first sound image area to which the first sound source image belongs in the screen according to the first position information;
obtaining a first target azimuth according to the first sound image area;
and generating a virtual sound image of the first sound source image at the first target azimuth, and playing sound through the virtual sound image.
Optionally, after generating a virtual sound image of the first sound source image at the first target azimuth and playing sound through the virtual sound image, the method further includes:
detecting the position information of the first sound source image in real time, and continuing to play sound through the virtual sound image at the first target azimuth when the first sound source image moves from the first position to a second position still belonging to the first sound image area.
Optionally, after generating a virtual sound image of the first sound source image at the first target azimuth and playing sound through the virtual sound image, the method further includes:
detecting the position information of the first sound source image in real time, and, when the first sound source image changes from the first position information to third position information belonging to a second sound image area:
obtaining a second target azimuth according to the second sound image area;
and moving the virtual sound image from the first target azimuth to the second target azimuth, and playing sound through the virtual sound image at the second target azimuth.
Optionally, the detecting, in real time, the position information of the first sound source image specifically includes:
determining detection frequency according to the response time of the audience to the picture and the delay time of the audio;
and detecting the position information of the first sound source image according to the detection frequency.
Optionally, the obtaining a first target azimuth according to the first sound image area specifically includes:
and determining a geometric midpoint of the first sound image area, and determining the geometric midpoint as the first target azimuth.
Optionally, the dividing the screen from the center to the edge into a plurality of sound image areas in order of increasing length specifically includes:
dividing the screen in a centrally symmetric manner into M transverse areas and N longitudinal areas to obtain a plurality of sound image areas, where M and N are each odd, the transverse lengths of the transverse areas gradually increase from the center to the edge, and the longitudinal lengths of the longitudinal areas gradually increase from the center to the edge.
Optionally, the acquiring the first position information of the first sound source image in the screen specifically includes:
when the first sound source image is a human or animal, the first position information is a mouth key point;
when the first sound source image is an object, the first position information is the lateral geometric center of the object.
The invention also provides a sound and picture co-located playing device, which comprises a region dividing module, a sound source position obtaining module, a judging module, a target azimuth calculating module and a virtual sound image generating and playing module;
the region dividing module is used for dividing the screen into a plurality of sound image regions from the center to the edge according to the length from short to long;
the sound source position acquisition module is used for acquiring first position information of a first sound source image in a screen;
the judging module is used for determining a first sound image area of the first sound source image in a screen according to the first position information;
the target azimuth calculation module is used for obtaining a first target azimuth according to the first sound image area;
The virtual sound image generation and playing module is used for generating a virtual sound image of the first sound source image in the first target azimuth, and playing sound through the virtual sound image.
Another technical solution of the present invention for solving the above technical problems is an electronic device, including a memory and a processor, wherein,
The memory is used for storing programs;
the processor is coupled to the memory, and is configured to execute the program stored in the memory, so as to implement the steps in the sound-picture co-location playing method according to any of the above schemes.
Another technical solution of the present invention to solve the above technical problems is a computer readable storage medium storing a computer readable program or instructions which, when executed by a processor, implement the steps in the sound-picture co-location playing method according to any of the above technical solutions.
By applying the technical scheme provided by the embodiments of the application, dividing the screen into a limited number of fixed areas reduces the cost of hardware and of the algorithm. The target azimuth of the virtual sound image is obtained from the sound image area in which the sound source image lies on the screen, and sound is played through a virtual sound image generated at that azimuth. Because the position of the virtual sound image tracks the position of the sound source image on the screen, the audience perceives the sound as coming from the corresponding position on the screen, achieving the effect of sound-picture co-location.
Drawings
FIG. 1 is a flowchart illustrating the steps of a sound-picture co-location playing method according to an embodiment of the present invention;
FIG. 2 is a schematic view of sound source image coordinates and corresponding virtual sound image target azimuths provided in an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the steps of a sound-picture co-location playing method according to another embodiment of the present invention;
FIG. 4 is a flowchart illustrating the steps of a sound-picture co-location playing method according to another embodiment of the present invention;
fig. 5 is a flowchart of steps for detecting position information of a first sound source image provided in an embodiment of the present invention;
fig. 6 is a schematic view of a horizontal plane projection when M=5, N=1 provided in an embodiment of the present invention;
FIG. 7 is a flowchart illustrating steps of a method for sound-picture co-location according to an embodiment of the present invention;
Fig. 8 is a block diagram of a playback apparatus for sound-picture co-location according to an embodiment of the present invention;
Fig. 9 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention may be readily understood, embodiments of the invention are described in more detail below with reference to the appended drawings. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to fig. 1, a flowchart of the steps of the sound-picture co-location playing method is shown. The method can be applied to audio and video signal processing and, as shown in fig. 1, may specifically include the following steps:
Step S101, dividing the screen from the center to the edge into a plurality of sound image areas in order of increasing length;
In this embodiment, step S101 may include non-uniformly dividing the screen display area from short to long into a number of areas with dense middle and sparse edges.
The screen is divided into M transverse areas and N longitudinal areas in a centrally symmetric manner to obtain a plurality of sound image areas, where 10 ≥ M ≥ 3 and 10 ≥ N ≥ 1, and M and N are each odd; the transverse lengths of the transverse areas gradually increase from the center to the edge, and the longitudinal lengths of the longitudinal areas gradually increase from the center to the edge. Each sound image area may be rectangular.
Step S102, acquiring first position information of a first sound source image in a screen;
In this embodiment, referring to fig. 2, a schematic diagram of the sound source image coordinates and the corresponding virtual sound image target azimuth is shown; the first sound source image may be a girl image, and the first position information may be the center position of the mouth of the girl image.
Step S103, determining a first sound image area of the first sound source image in a screen according to the first position information;
In the present embodiment, referring to fig. 2, which shows a schematic diagram of the sound source image coordinates and the corresponding virtual sound image target azimuths, the screen is divided into 5 areas, in order from left to right: virtual sound image area 1, virtual sound image area 2, virtual sound image area 3, virtual sound image area 4, and virtual sound image area 5. The center position of the mouth of the girl image lies in virtual sound image area 4, so the first sound image area is virtual sound image area 4 in the figure.
Step S104, obtaining a first target azimuth according to the first sound image area;
In this embodiment, referring to fig. 2, the geometric center of virtual sound image area 4 may be determined as the first target azimuth.
Step S105, generating a virtual sound image of the first sound source image in the first target azimuth, and playing sound by the virtual sound image.
In this embodiment, virtual sound image localization means that after the sound emitted by several loudspeakers reaches the ears, the listener perceives sound sources at positions where no physical loudspeaker exists. These sources are constructed by the listener's own auditory system and are therefore called virtual sound images, or phantom sources. By adjusting the output signals of the loudspeakers with amplitude panning, delay adjustment, or mixing methods, the listener can be made to perceive the virtual sound image at a target azimuth; corresponding techniques include VBAP (vector-base amplitude panning) and DBAP (distance-based amplitude panning), and this process may also be called virtual sound image generation.
In this embodiment, virtual sound image generation may be performed at the target azimuth by using a virtual sound image positioning algorithm such as VBAP or DBAP, according to the speaker configuration of the device where the screen is located. An example with M=5, N=1 is shown in fig. 2. The virtual sound image generation method may include calculating a different volume for each speaker according to the angles between the speakers, the virtual sound image's target azimuth, and the target listening point, converting the volumes into gains, and then processing the audio signals to be played.
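As an illustration of the angle-to-gain step above: in two dimensions, vector-base amplitude panning reduces to the tangent panning law for a symmetric loudspeaker pair. The sketch below is an assumption for a ±30° stereo pair with power-normalized gains, not the patent's exact algorithm; the function name is my own.

```python
import math

def stereo_pan_gains(target_deg, speaker_deg=30.0):
    """Tangent-law amplitude panning for a symmetric stereo pair at
    +/-speaker_deg. Returns (left, right) gains with unit total power,
    so the perceived loudness stays constant as the image moves."""
    t = math.tan(math.radians(target_deg)) / math.tan(math.radians(speaker_deg))
    # Solve (gR - gL) / (gR + gL) = t subject to gL^2 + gR^2 = 1.
    norm = math.sqrt(2.0 * (1.0 + t * t))
    return (1.0 - t) / norm, (1.0 + t) / norm
```

A target azimuth of 0° yields equal gains (center image); a target of +30° routes the full signal to the right loudspeaker.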
In one scenario, the sound-to-picture co-location may be that when video is played through a screen, the virtual sound image location is adjusted to the image location of the sound source on the screen by the virtual sound image location technique, so that the viewer perceives that the sound comes from the corresponding sound source image location on the screen.
In an example, referring to fig. 3, a flowchart illustrating steps of a method for playing a sound and picture co-location is shown, as shown in fig. 3, specifically the method may include the following steps:
Step S301, dividing a screen into a plurality of sound image areas from the center to the edge according to the length from short to long;
The screen is divided into M transverse areas and N longitudinal areas in a centrally symmetric manner to obtain a plurality of sound image areas, where 10 ≥ M ≥ 3 and 10 ≥ N ≥ 1, and M and N are each odd; the transverse lengths of the transverse areas gradually increase from the center to the edge, and the longitudinal lengths of the longitudinal areas gradually increase from the center to the edge. Each sound image area may be rectangular.
Step S302, acquiring first position information of a first sound source image in a screen;
In this embodiment, referring to fig. 2, a schematic diagram of the sound source image coordinates and the corresponding virtual sound image target azimuth is shown; the first sound source image may be a girl image, and the first position information may be the center position of the mouth of the girl image.
Step S303, determining a first sound image area of the first sound source image in a screen according to the first position information;
In the present embodiment, referring to fig. 2, which shows a schematic diagram of the sound source image coordinates and the corresponding virtual sound image target azimuths, the screen is divided into 5 areas, in order from left to right: virtual sound image area 1, virtual sound image area 2, virtual sound image area 3, virtual sound image area 4, and virtual sound image area 5. The center position of the mouth of the girl image lies in virtual sound image area 4, so the first sound image area is virtual sound image area 4 in the figure.
Step S304, obtaining a first target azimuth according to the first sound image area;
In this embodiment, referring to fig. 2, the geometric center of virtual sound image area 4 may be determined as the first target azimuth.
Step S305, generating a virtual sound image of the first sound source image in the first target azimuth, and playing sound through the virtual sound image;
Step S306, detecting the position information of the first sound source image in real time, and continuing to play sound through the virtual sound image at the first target azimuth when the first sound source image moves from the first position to a second position still belonging to the first sound image area.
In this embodiment, the first sound source image changing from the first position to second position information belonging to the first sound image area may mean, as shown in fig. 2, that the girl image moves within virtual sound image area 4 without leaving it, so the first target azimuth is unchanged.
In an example, referring to fig. 4, a flowchart illustrating steps of a method for playing a sound and picture co-location is shown, as shown in fig. 4, specifically the method may include the following steps:
S401, dividing a screen into a plurality of sound image areas from the center to the edge according to the length from short to long;
In this embodiment, step S401 may include non-uniformly dividing the screen display area from short to long into a number of areas with dense middle and sparse edges.
The screen is divided into M transverse areas and N longitudinal areas in a centrally symmetric manner to obtain a plurality of sound image areas, where 10 ≥ M ≥ 3 and 10 ≥ N ≥ 1, and M and N are each odd; the transverse lengths of the transverse areas gradually increase from the center to the edge, and the longitudinal lengths of the longitudinal areas gradually increase from the center to the edge. Each sound image area may be rectangular.
Step S402, acquiring first position information of a first sound source image in a screen;
In this embodiment, referring to fig. 2, a schematic diagram of the sound source image coordinates and the corresponding virtual sound image target azimuth is shown; the first sound source image may be a girl image, and the first position information may be the center position of the mouth of the girl image.
Step S403, determining a first sound image area of the first sound source image in a screen according to the first position information;
In the present embodiment, referring to fig. 2, which shows a schematic diagram of the sound source image coordinates and the corresponding virtual sound image target azimuths, the screen is divided into 5 areas, in order from left to right: virtual sound image area 1, virtual sound image area 2, virtual sound image area 3, virtual sound image area 4, and virtual sound image area 5. The center position of the mouth of the girl image lies in virtual sound image area 4, so the first sound image area is virtual sound image area 4 in the figure.
Step S404, obtaining a first target azimuth according to the first sound image area;
In this embodiment, referring to fig. 2, the geometric center of virtual sound image area 4 may be determined as the first target azimuth.
Step S405, generating a virtual sound image of the first sound source image in the first target azimuth, and playing sound through the virtual sound image;
In this embodiment, virtual sound image generation may be performed at the target azimuth by using a virtual sound image positioning algorithm such as VBAP or DBAP, according to the speaker configuration of the device where the screen is located. An example with M=5, N=1 is shown in fig. 2. The virtual sound image generation method may include calculating a different volume for each speaker according to the angles between the speakers, the virtual sound image's target azimuth, and the target listening point, converting the volumes into gains, and then processing the audio signals to be played.
Step S406, detecting the position information of the first sound source image in real time, when the first sound source image is changed from the first position information to third position information belonging to a second sound image area;
step S407, obtaining a second target azimuth according to the second sound image area;
Step S408, moving the virtual sound image from the first target azimuth to the second target azimuth, and playing sound through the virtual sound image at the second target azimuth.
In this embodiment, the first sound source image changing from the first position information to third position information belonging to a second sound image area may mean, as shown in fig. 2, that the talking girl image moves out of virtual sound image area 4 into virtual sound image area 5, in which case virtual sound image area 5 is the second sound image area. As another example, when the sounding subject shifts from the talking girl to a drum whose lateral geometric center lies in virtual sound image area 1, virtual sound image area 1 is the second sound image area and the drum is the second sound source image.
In one example, referring to fig. 5, a flowchart illustrating a step of detecting position information of a first sound source image, as shown in fig. 5, may specifically include the steps of:
Step S501, determining detection frequency according to the response time of the audience to the picture and the delay time of the audio;
Step S502, detecting the position information of the first sound source image according to the detection frequency.
In one scenario, according to the related literature, an ordinary viewer can perceive sound that lags the picture by 80 ms, and the delay of spatial audio renderers in the headphone field is typically below 60 ms; the frequency at which the video signal processing module updates the sound source image coordinates is therefore set above 20 Hz, preferably 30 Hz.
In one example, calculating the first target bearing may include the steps of:
and determining a geometric midpoint of the first sound image area, and determining the geometric midpoint as a first target position.
In the present embodiment, referring to fig. 2, which shows a schematic diagram of the sound source image coordinates and the corresponding virtual sound image target azimuths, the screen is divided into 5 areas, in order from left to right: virtual sound image area 1, virtual sound image area 2, virtual sound image area 3, virtual sound image area 4, and virtual sound image area 5. The center position of the mouth of the girl image lies in virtual sound image area 4, so the first sound image area is virtual sound image area 4 in the figure. The geometric center of virtual sound image area 4 is the intersection of the diagonals of the rectangular area.
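Concretely, the diagonal intersection of a rectangular area is the mean of two opposite corners; a minimal sketch (corner coordinate names are illustrative):

```python
def region_midpoint(x0, y0, x1, y1):
    """Geometric center of a rectangular sound-image region,
    i.e. the intersection of its two diagonals."""
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
```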
In one example, the screen is divided into M transverse areas and N longitudinal areas in a central symmetry mode to obtain a plurality of sound image areas, M and N are respectively odd, the transverse lengths of the transverse areas gradually increase from the center to the edge, and the longitudinal lengths of the longitudinal areas gradually increase from the center to the edge. The sound image area may be rectangular.
In one scenario, the human auditory system localizes sound images most accurately in the horizontal direction directly ahead, and accuracy falls off rapidly from the middle toward the two sides. The related literature indicates that, when listening indoors, the localization blur of a virtual sound image perceived in the middle of the horizontal dimension is about 7% of the loudspeaker spacing D, growing to about 20% D as the image approaches a loudspeaker. The number of distinct horizontal virtual sound images the auditory system can perceive at the sweet spot is about 7 to 10. The auditory system is insensitive to sound image movement in elevation: an angular displacement of up to 60° at the subject can sometimes go unnoticed.
By way of example, the screen display area is divided horizontally and vertically from the center toward the two sides into non-uniform regions of increasing size, forming M transverse areas and N longitudinal areas that are dense in the middle and sparse at the edges, where 10 ≥ M ≥ 3 and 10 ≥ N ≥ 1.
As another example, weighing the complexity of the algorithm against the perceptual accuracy of the viewer, for a screen of lateral width L with areas numbered 1 to M from left to right, the width d of the center-most area satisfies d ≤ 10%·L, that is:
when M is odd, the width d of the (M+1)/2-th area is at most 10%·L;
when M is even, the widths of the M/2-th and (M/2+1)-th areas are equal and at most 10%·L;
the widths of the other areas increase from the center toward the two sides in a graded progression. Preferably, M and N take odd values so that the sound image positions in the middle differ from those at the two sides.
In one example, referring to fig. 6, a schematic view of a horizontal projection is shown; when M=5, N=1, the horizontal width of each area is 30%, 15%, 10%, 15%, 30% of the screen width, as shown in fig. 6.
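The M=5 split in fig. 6 can be expressed as cumulative width fractions; the sketch below (fraction values taken from the example, function names are my own) maps a horizontal screen coordinate to the index of the area containing it:

```python
def region_widths(total_width, fractions=(0.30, 0.15, 0.10, 0.15, 0.30)):
    """Widths of the M=5 lateral sound-image areas of a screen:
    narrow in the middle, wide at the edges (30/15/10/15/30 %)."""
    return [f * total_width for f in fractions]

def region_index(x, total_width, fractions=(0.30, 0.15, 0.10, 0.15, 0.30)):
    """Return the 0-based index of the area containing horizontal
    coordinate x (origin at the left edge of the screen)."""
    edge = 0.0
    for i, w in enumerate(region_widths(total_width, fractions)):
        edge += w
        if x < edge:
            return i
    return len(fractions) - 1  # clamp to the right-most area
```

For a 1000-pixel-wide screen, the boundaries fall at 300, 450, 550, and 700 pixels, so x = 500 lands in the small central area.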
In one example, the acquiring the first location information may specifically include the following steps:
when the first sound source image is a human or animal, the first position information is a mouth key point;
When the first sound source image is an object, the first position information is a lateral geometric center of the object.
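The two cases above can be sketched as a single selection function. The dict layout (`kind`, `mouth`, `bbox`) is a hypothetical representation of the video signal processing module's output, not something the patent specifies:

```python
def source_anchor(source):
    """Pick the screen coordinate that represents a sound source:
    the mouth keypoint for people and animals, the lateral geometric
    center for objects. `source` is a hypothetical dict such as
    {"kind": "person", "mouth": (412, 288)} or
    {"kind": "object", "bbox": (x0, y0, x1, y1)}."""
    if source["kind"] in ("person", "animal"):
        return source["mouth"]
    x0, y0, x1, y1 = source["bbox"]
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
```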
In one example, referring to FIG. 7, a flowchart of the steps of a sound-picture co-location method is shown; as shown in FIG. 7, the method specifically includes the following steps: non-uniformly dividing the screen from the center to the edge into a plurality of areas of increasing length;
In this embodiment, a coordinate system is established for the display area of the screen; the coordinate range of each area is determined in the screen's display coordinate system, together with the coordinates of each area's geometric center;
when the screen displays a video image, the video signal processing module supplies the coordinate point of the sound source.
For the speaker's image, the mouth center point coordinates are given;
For a non-human sound source, the coordinates of the midpoint of its image transverse dimension are given.
For example, a Cartesian coordinate system may be established with the lower left corner of the screen as the origin.
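The coordinate-range and geometric-center bookkeeping described above can be sketched as follows, with the origin at the screen's lower-left corner; the M=5, N=1 width fractions are carried over from the earlier example, and the function name is my own:

```python
def build_regions(width, height,
                  x_fracs=(0.30, 0.15, 0.10, 0.15, 0.30),
                  y_fracs=(1.0,)):
    """Build every sound-image region of an M x N partition in a
    Cartesian screen coordinate system (origin: lower-left corner).
    Returns a list of ((x0, y0, x1, y1), (cx, cy)) tuples pairing
    each region's coordinate range with its geometric center."""
    regions = []
    y0 = 0.0
    for fy in y_fracs:
        y1 = y0 + fy * height
        x0 = 0.0
        for fx in x_fracs:
            x1 = x0 + fx * width
            regions.append(((x0, y0, x1, y1),
                            ((x0 + x1) / 2.0, (y0 + y1) / 2.0)))
            x0 = x1
        y0 = y1
    return regions
```

Computing the table once at startup means the per-frame work reduces to a lookup.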
Judging which region the sound source image is in, and taking the geometric midpoint of the current region as a target azimuth;
In this embodiment, virtual sound image generation may be performed at the target azimuth by using a virtual sound image positioning algorithm such as VBAP or DBAP, according to the speaker configuration of the device where the screen is located. An example with M=5, N=1 is shown in fig. 2. The virtual sound image generation method may include calculating a different volume for each speaker according to the angles between the speakers, the virtual sound image's target azimuth, and the target listening point, converting the volumes into gains, and then processing the audio signals to be played.
Refreshing coordinates of a sound source image on a screen;
It is then judged whether the sound source image has left the previous area; if so, the geometric midpoint of the new area is taken as the target azimuth, the virtual sound image is regenerated, and the coordinates of the sound source image on the screen are refreshed.
If not, the geometric midpoint of the current area remains the target azimuth, the virtual sound image is generated, and the coordinates of the sound source image on the screen are refreshed.
In this embodiment, the video signal processing module dynamically updates the coordinates of the sound source image at a fixed frequency, and the audio signal processing module determines whether to update the target generation azimuth of the virtual sound image according to the coordinates.
When the displacement of the sound source image on the screen is large, i.e. the sound source image leaves its currently mapped area, sound-picture separation would arise; the audio signal processing module then regenerates the virtual sound image according to the new coordinates of the sound source image. Otherwise, the position of the virtual sound image need not be adjusted.
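The decision above — regenerate the virtual sound image only when the refreshed coordinate has left its mapped area — can be sketched as follows. The width fractions and region-center list are assumptions carried over from the M=5, N=1 example, and the function names are my own:

```python
def region_index(x, total_width, fractions=(0.30, 0.15, 0.10, 0.15, 0.30)):
    """0-based index of the lateral area containing coordinate x."""
    edge = 0.0
    for i, f in enumerate(fractions):
        edge += f * total_width
        if x < edge:
            return i
    return len(fractions) - 1

def maybe_update_azimuth(prev_region, x, total_width, centers):
    """Called at each coordinate refresh. Returns (region, new_azimuth),
    where new_azimuth is None when the source stayed in its previous
    area (no re-rendering needed) and the new area's geometric center
    (centers[i]) when it moved to a different area."""
    region = region_index(x, total_width)
    if region == prev_region:
        return region, None
    return region, centers[region]
```

Keeping the azimuth fixed while the source stays inside one area is what limits the re-rendering rate despite the high coordinate refresh frequency.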
Specifically, according to the related literature, an ordinary listener can perceive a sound that lags the picture by 80 ms, and the delay of spatial audio renderers in the headphone field is typically less than 60 ms; the frequency at which the video signal processing module updates the sound source image coordinates is therefore set to be greater than 20 Hz, preferably 30 Hz.
When the embodiment of the present application is applied to lecture video playback, the screen display area is unevenly divided, from the center outward by length from short to long, into a plurality of rectangular areas that are dense in the middle and sparse at the edges; the area in which the lecturer's sound source image is located on the screen is dynamically identified; a virtual sound image is then generated by a virtual sound image positioning technique with the geometric center of that area as the target azimuth; and the output signals of the multiple loudspeakers are adjusted by amplitude panning, delay adjustment, or mixing, so that the viewer perceives the virtual sound image at its target azimuth.
When the embodiment of the present application is applied to video conference playback, the screen display area is unevenly divided, from the center outward by length from short to long, into a plurality of rectangular areas that are dense in the middle and sparse at the edges; the area in which a conference sound source image is located on the screen is dynamically identified; a virtual sound image is then generated by a virtual sound image positioning technique with the geometric center of that area as the target azimuth; and the output signals of the multiple loudspeakers are adjusted by amplitude panning, delay adjustment, or mixing, so that the conference viewer perceives the virtual sound image at its target azimuth.
When the embodiment of the present application is applied to live streaming, the screen display area is unevenly divided, from the center outward by length from short to long, into a plurality of rectangular areas that are dense in the middle and sparse at the edges; the area in which a live sound source image is located on the screen is dynamically identified; a virtual sound image is then generated by a virtual sound image positioning technique with the geometric center of that area as the target azimuth; and the output signals of the multiple loudspeakers are adjusted by amplitude panning, delay adjustment, or mixing, so that the live stream viewer perceives the virtual sound image at its target azimuth.
When the embodiment of the present application is applied to remote teaching playback, the screen display area is unevenly divided, from the center outward by length from short to long, into a plurality of rectangular areas that are dense in the middle and sparse at the edges; the area in which the instructor's sound source image is located on the screen is dynamically identified; a virtual sound image is then generated by a virtual sound image positioning technique with the geometric center of that area as the target azimuth; and the output signals of the multiple loudspeakers are adjusted by amplitude panning, delay adjustment, or mixing, so that the remote learner perceives the virtual sound image at its target azimuth.
In one example, referring to fig. 8, there is shown a block diagram of a sound and picture co-location playback apparatus according to an embodiment of the present invention; as shown in fig. 8, the apparatus may specifically include the following modules:
a region dividing module 801, configured to divide the screen from the center to the edge into a plurality of sound image regions of lengths increasing from short to long;
a sound source position acquisition module 802, configured to acquire first position information of a first sound source image in a screen;
a judging module 803, configured to determine, according to the first position information, a first sound image area to which the first sound source image belongs in the screen;
a target azimuth calculation module 804, configured to obtain a first target azimuth according to the first sound image area;
and a virtual sound image generating and playing module 805, configured to generate a virtual sound image of the first sound source image in the first target azimuth, and play sound through the virtual sound image.
In one example, the apparatus may specifically further include the following modules:
and a real-time monitoring module, configured to detect the position information of the first sound source image in real time, and, when the first sound source image changes from the first position information to second position information still belonging to the first sound image area, to continue playing sound through the virtual sound image in the first target azimuth.
In one example, the apparatus may specifically further include the following modules:
the real-time monitoring module is further configured to detect the position information of the first sound source image in real time, and, when the first sound source image changes from the first position information to third position information belonging to a second sound image area,
to obtain a second target azimuth according to the second sound image area;
and to move the virtual sound image from the first target azimuth to the second target azimuth, playing sound through the virtual sound image in the second target azimuth.
In one example, the real-time monitoring module is further specifically configured to determine a detection frequency according to a response time of the viewer to the picture and a delay time of the audio;
and detecting the position information of the first sound source image according to the detection frequency.
In one example, the target azimuth calculation module is specifically configured to determine a geometric midpoint of the first sound image area, and determine the geometric midpoint as the first target azimuth.
In one example, the region dividing module is specifically configured to divide the screen into M lateral regions and N longitudinal regions in a central symmetry manner to obtain a plurality of sound image regions, where M and N are respectively odd numbers, lateral lengths of the lateral regions gradually increase from the center to the edge, and longitudinal lengths of the longitudinal regions gradually increase from the center to the edge.
In one example, the sound source position acquisition module is specifically configured such that, when the first sound source image is a human or an animal, the first position information is a mouth key point;
and when the first sound source image is an object, the first position information is the lateral geometric center of the object.
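This choice of anchor point can be sketched as follows; the `Source` descriptor and its field names are illustrative assumptions, and the sketch returns the bounding-box center for objects, whereas the embodiment specifies the lateral geometric center:

```python
from collections import namedtuple

# Hypothetical detection result; field names are assumptions.
Source = namedtuple("Source", ["kind", "mouth_keypoint", "bbox"])

def anchor_point(source):
    """Pick the screen coordinate used as the sound source position:
    the mouth keypoint for people and animals, the geometric center
    of the bounding box (x0, y0, x1, y1) for objects."""
    if source.kind in ("human", "animal"):
        return source.mouth_keypoint
    x0, y0, x1, y1 = source.bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)
```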
The foregoing embodiment of the present invention provides a sound and picture co-location playback apparatus, which can implement the technical solutions described in the foregoing method embodiments; for the specific implementation principles of the foregoing modules or units, reference may be made to the corresponding content in the foregoing method embodiments, which is not repeated here.
In one example, referring to fig. 9, a block diagram of an electronic device 900 according to an embodiment of the present invention is shown, where the electronic device 900 includes a processor 901, a memory 902, and a display 903, as shown in fig. 9. Fig. 9 shows only some of the components of the electronic device 900, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead.
The memory 902 may, in some embodiments, be an internal storage unit of the electronic device 900, such as a hard disk or memory of the electronic device 900. In other embodiments, the memory 902 may also be an external storage device of the electronic device 900, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 900.
Further, the memory 902 may also include both internal storage units and external storage devices of the electronic device 900. The memory 902 is used for storing application software and various types of data for installing the electronic device 900.
The processor 901 may, in some embodiments, be a Central Processing Unit (CPU), a microprocessor, or another data processing chip, configured to execute program code or process data stored in the memory 902, for example the sound and picture co-location playing method of the present invention.
The display 903 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like in some embodiments. The display 903 is used to display information at the electronic device 900 and to display a visual user interface. The components 901-903 of the electronic device 900 communicate with each other over a system bus.
In some embodiments of the present invention, when the processor 901 executes the sound and picture co-location playback program in the memory 902, the following steps may be implemented:
dividing a screen from the center to the edge into a plurality of sound image areas of lengths increasing from short to long; acquiring first position information of a first sound source image in the screen; determining, according to the first position information, the first sound image area to which the first sound source image belongs in the screen; and obtaining a first target azimuth according to the first sound image area;
and generating a virtual sound image of the first sound source image in the first target azimuth, and playing sound through the virtual sound image.
It should be understood that, when executing the sound and picture co-location playback program in the memory 902, the processor 901 may perform other functions in addition to the above; for details, reference is made to the description of the corresponding method embodiments above.
Further, the type of the electronic device 900 is not particularly limited; the electronic device 900 may be a mobile phone, a tablet computer, a personal digital assistant (PDA), a wearable device, a laptop computer, or another portable electronic device. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running iOS, Android, Microsoft, or other operating systems. The portable electronic device may also be another portable electronic device having a touch-sensitive surface (e.g., a touch panel), such as a laptop computer. It should also be appreciated that, in other embodiments of the invention, the electronic device 900 may not be a portable electronic device, but rather a desktop computer having a touch-sensitive surface (e.g., a touch panel).
In still another aspect, the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the sound and picture co-location playing method provided above, the method including: dividing a screen from the center to the edge into a plurality of sound image areas of lengths increasing from short to long; acquiring first position information of a first sound source image in the screen; determining, according to the first position information, a first sound image area to which the first sound source image belongs in the screen; obtaining a first target azimuth according to the first sound image area; generating a virtual sound image of the first sound source image in the first target azimuth; and playing sound through the virtual sound image.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by a computer program instructing associated hardware, and the program may be stored in a computer readable storage medium. The computer readable storage medium may be a magnetic disk, an optical disc, a read-only memory, or a random access memory.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or terminal device comprising that element.
The foregoing describes the principles and embodiments of the present invention in detail using specific examples; the above examples are provided only to assist in understanding the method and core idea of the present invention. Meanwhile, those skilled in the art may, in light of the idea of the present invention, make changes to the specific embodiments and the scope of application; in view of the above, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A sound and picture co-located playing method is characterized by comprising the following steps:
dividing the screen into a plurality of sound image areas from the center to the edge according to the length from short to long;
acquiring first position information of a first sound source image in a screen;
Determining a first sound image area of the first sound source image in a screen according to the first position information;
obtaining a first target azimuth according to the first sound image area;
and generating a virtual sound image of the first sound source image in the first target azimuth, and playing sound through the virtual sound image.
2. The playing method according to claim 1, wherein after the generating a virtual sound image of the first sound source image in the first target azimuth and playing sound through the virtual sound image, the method further comprises:
detecting the position information of the first sound source image in real time, and when the first sound source image changes from the first position information to second position information belonging to the first sound image area, continuing to play sound through the virtual sound image in the first target azimuth.
3. The sound and picture co-location playing method according to claim 1 or 2, wherein after the generating a virtual sound image of the first sound source image in the first target azimuth and playing sound through the virtual sound image, the method further comprises:
detecting the position information of the first sound source image in real time, and when the first sound source image changes from the first position information to third position information belonging to a second sound image area:
obtaining a second target azimuth according to the second sound image area;
and moving the virtual sound image from the first target azimuth to the second target azimuth, and playing sound through the virtual sound image in the second target azimuth.
4. The sound and picture co-location playing method according to claim 3, wherein the detecting the position information of the first sound source image in real time specifically comprises:
determining detection frequency according to the response time of the audience to the picture and the delay time of the audio;
and detecting the position information of the first sound source image according to the detection frequency.
5. The sound and picture co-location playing method according to claim 1 or 2, wherein the obtaining a first target azimuth according to the first sound image area specifically comprises:
determining a geometric midpoint of the first sound image area, and determining the geometric midpoint as the first target azimuth.
6. The sound and picture co-location playing method according to claim 1 or 2, wherein the dividing the screen from the center to the edge into a plurality of sound image areas of lengths increasing from short to long specifically comprises:
The screen is divided into M transverse areas and N longitudinal areas in a central symmetry mode to obtain a plurality of sound image areas, M and N are respectively odd numbers, the transverse lengths of the transverse areas gradually increase from the center to the edge, and the longitudinal lengths of the longitudinal areas gradually increase from the center to the edge.
7. The sound and picture co-location playing method according to claim 1 or 2, wherein the acquiring first position information of a first sound source image in the screen specifically comprises:
when the first sound source image is a human or an animal, the first position information is a mouth key point;
and when the first sound source image is an object, the first position information is a lateral geometric center of the object.
8. The sound and picture co-located playing device is characterized by comprising a region dividing module, a sound source position acquisition module, a judging module, a target azimuth calculation module and a virtual sound image generation playing module;
the region dividing module is used for dividing the screen into a plurality of sound image regions from the center to the edge according to the length from short to long;
the sound source position acquisition module is used for acquiring first position information of a first sound source image in a screen;
the judging module is used for determining a first sound image area of the first sound source image in a screen according to the first position information;
the target azimuth calculation module is used for obtaining a first target azimuth according to the first sound image area;
The virtual sound image generation and playing module is used for generating a virtual sound image of the first sound source image in the first target azimuth, and playing sound through the virtual sound image.
9. An electronic device comprising a memory and a processor, wherein,
The memory is used for storing programs;
The processor is coupled to the memory and configured to execute the program stored in the memory, so as to implement the steps of the sound and picture co-location playing method according to any one of claims 1 to 7.
10. A computer readable storage medium storing a computer readable program or instructions which, when executed by a processor, implement the steps of the sound and picture co-location playing method according to any one of claims 1 to 7.
CN202411772963.3A 2024-12-04 2024-12-04 A method, device, equipment and medium for playing sound and picture in the same position Pending CN119815266A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411772963.3A CN119815266A (en) 2024-12-04 2024-12-04 A method, device, equipment and medium for playing sound and picture in the same position


Publications (1)

Publication Number Publication Date
CN119815266A true CN119815266A (en) 2025-04-11

Family

ID=95263197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411772963.3A Pending CN119815266A (en) 2024-12-04 2024-12-04 A method, device, equipment and medium for playing sound and picture in the same position

Country Status (1)

Country Link
CN (1) CN119815266A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012004994A (en) * 2010-06-18 2012-01-05 Jvc Kenwood Corp Sound playback system and sound playback method
CN113810837A (en) * 2020-06-16 2021-12-17 京东方科技集团股份有限公司 A synchronous sound control method of a display device and related equipment
CN114416014A (en) * 2022-01-05 2022-04-29 歌尔科技有限公司 Screen sounding method and device, display equipment and computer readable storage medium
CN116347320A (en) * 2022-09-07 2023-06-27 荣耀终端有限公司 Audio playback method and electronic device
CN116567322A (en) * 2022-01-28 2023-08-08 纬创资通股份有限公司 Multimedia system and multimedia operation method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIN Ying: "What Can Sound-Picture Synchronization Technology Bring to Outdoor Advertising? A Dialogue with Shao Lei, Chief Technology Officer of Shanghai Nixi Computer Technology Co., Ltd.", China Advertising, 31 December 2017 (2017-12-31) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination