From carson@siggraph.org Fri Mar 6 06:11:17 1998
Received: from siggraph.org (siggraph.org [205.168.252.205]) by dkuug.dk (8.6.12/8.6.12) with ESMTP id GAA00297 for ; Fri, 6 Mar 1998 06:11:09 +0100
Received: from study.huntleigh.net by siggraph.org (SMI-8.6/SMI-SVR4) id WAA06027; Thu, 5 Mar 1998 22:09:07 -0700
Message-Id: <3.0.32.19980305221853.00fc66a0@siggraph.org>
X-Sender: carson@siggraph.org
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Thu, 05 Mar 1998 22:19:31 -0700
To: SC24@dkuug.dk, vrml-mpeg4@vrml.org
From: Steve Carson
Subject: Text only version of comments on MPEG-4
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

The SC 24 Secretary will submit these comments to SC 29 at the close of business today, Friday 6 March 1998. Please e-mail any concerns or corrections to both myself and Dick Puk (carson@siggraph.org and puk@igraphics.com) as soon as possible. Some formatting will not be correct in this text version so an MS Word version follows.

Comments from ISO/IEC JTC 1/SC 24 on
ISO/IEC CD 14496-1 (MPEG-4)
(ISO/IEC JTC 1/SC 29 N2291)

The following comments were prepared by the national bodies of ISO/IEC JTC 1/SC 24 and the VRML Consortium through its Category C Liaison with ISO/IEC JTC 1/SC 24.

General Comments:

1.
Since CD 14496-1 (MPEG-4 Systems) makes normative reference to both the abstract syntax and the semantics of ISO/IEC 14772-1 (VRML), there must be a normative mapping included within CD 14496-1 (perhaps in a normative annex) that precisely defines how:
a) the abstract syntax of the CD 14496-1 data stream maps to the abstract syntax of VRML in sufficient detail to determine that the conformance requirements of ISO/IEC 14772-1 regarding abstract syntax are being met;
b) nodes that are listed in Clause 7.2 as "common" between CD 14496-1 and ISO/IEC 14772-1 map;
c) nodes that are unique to CD 14496-1 (and not present in VRML) but that have functionality related to presentation of and interaction with information can be realised as conforming extensions of ISO/IEC 14772-1 using the available extension mechanisms (PROTO and EXTERNPROTO).
The above information should be provided for the 2D, 3D, VRML and complete profiles as defined in Sub-clauses 7.8.1.1 through 7.8.1.4, but not necessarily for the audio profile as defined in Sub-clause 7.8.1.5.
2. There is no RER document accompanying this specification as required by JTC1 Procedures. This document should be created and should accompany the FCD ballot.
3. It is not clear how the MPEG-4 standard plans to cope with the evolution of the VRML standard. In consequence, it is likely that incompatibilities between VRML and MPEG-4 will arise as VRML is extended. For example, what happens if a VRML extension is similar to, but incompatible with, an extension defined by CD 14496-1? As the normative parts of this document refer frequently to the VRML specification, VRML must be a normative reference. As a consequence, SC29 should investigate how the two should stay in step. SC24 and the VRML Consortium believe that simply referencing a specific version of ISO/IEC 14772-1 is an insufficient alignment strategy, since both VRML and the many commercial products that conform to it will evolve in an orderly and planned way.
A plan for maintaining alignment should be formulated and published.
4. The way the semantics for the new nodes is described in the MPEG-4 document is muddled with the details of the encoding. The semantics and abstract syntax of nodes defined in MPEG-4 should be given first with the encoding of those nodes separately defined later in the standard. This will make it much easier to distinguish between semantics, syntax, and encoding.
5. New nodes to support 2D have been included in CD 14496-1 to satisfy some perceived requirements of MPEG-4. Many of these requirements are also useful for users of VRML. However, the design provided in CD 14496-1 does not integrate well with the ISO-standardised VRML functionality. The VRML Consortium has initiated a project to carefully architect 2D functionality into VRML which not only will satisfy the MPEG-4 requirements but also satisfy on-going requirements of the VRML Community. A preliminary design for these nodes is attached as Exhibit A which is considered an inherent part of these comments. It should be emphasised that this preliminary design is only an initial draft, but it is the base document for the first planned amendment to ISO/IEC 14772-1. Work on PDAM 1 to IS 14772-1 is underway but could not be completed in time for these comments. The intent of the 2D design presented in Exhibit A is to minimise the number of nodes while still meeting all MPEG-4 requirements and allowing the specification of a 2D-only profile for VRML. While the encoding of the new nodes in binary format will not be part of this amendment, the same techniques used by MPEG-4 for creating a binary encoding of the 3D nodes should be straightforward.
6. Throughout: References to IS 14772-1 refer to "Sections". However, International Standards refer to such subdivisions as "Clauses". See Part 3 of the ISO Directives for rules on specifying references to sub-clauses of other standards.
7.
For all Semantic Tables: There is semantic information in the font used to specify the field/event names as defined in IS 14772-1. This same semantic information has been lost from the semantic tables contained in CD 14496-1 and should be recovered.
8. For all Semantic Tables: Events (both eventIns and eventOuts) do not have default values. The entry in the tables for these items should be blank.
9. Each of the new nodes being introduced which define geometry (e.g., Circle and Curve2D) should have accompanying figures illustrating the geometry.
10. For all Semantic Tables: A format for this table was agreed at the Fribourg meeting but was not used in the preparation of CD 14496-1. Since the details of the agreed format are critical for the binary encoding, an additional ballot round at CD (not FCD) is requested to provide adequate review time.
11. It is absolutely crucial that ALL nodes defined in IS 14772-1 be supported in the first ISO/IEC 14496-1 standard. Since several nodes have been left out of this specification, it must be considered incomplete. Unless all nodes are supported, use of VRML content will not be possible and implementations of IS 14496-1 will not be able to be considered conforming implementations of IS 14772-1. This would be unfortunate for both MPEG-4 and VRML.
12. A critical part of the VRML functionality has been left out of CD 14496-1. This is the PROTO and EXTERNPROTO mechanism. These nodes are needed to support effective world authoring. PROTO increases the compression ratio of a single BIFS stream if there are repeats of similar scene graph parts. The implementation is not particularly difficult because it is quite similar to macros in other programming languages. EXTERNPROTO increases the compression ratio of multiple BIFS streams if there are repeats of similar scene graph parts in them. Again, the implementation is not particularly difficult because it is quite similar to macros in other programming languages.
The special form of the EXTERNPROTO mechanism which allows browser-specific implementations need not be supported. The EXTERNPROTO mechanism is a highly effective means of pre-caching node definitions at the end user site (i.e., the terminal) so that they need not be downloaded with every world. The VRML Consortium is even now in the process of identifying a standard object library which can be downloaded once and then used by all worlds which wish to access the objects. Such objects include PROTO definitions, textures, and audio clips. Most VRML content uses the PROTO and EXTERNPROTO facility to make authoring easier and more efficient.
13. Pixel coordinate addressing is incompatible with VRML and should not be used. The Image(2D) (sub-clause 7.2.5.2.2.8) and VideoImage(2D) (sub-clause 7.2.5.2.2.24) nodes can define the rendering of their content such that the content elements are mapped to single pixels. However, the positioning of these nodes should be in standard transformation units.
14. The portion of the specification concerned with Facial Animation (in particular, sub-clause 7.2.3.3) effectively fills the need for which it was designed. However, this design uses one rather narrowly focused animation technique and does not allow for easy integration with other techniques. It is difficult to author and places a heavy burden on authors who wish to use other techniques. Other valid techniques include (but are not limited to) keyframed CoordinateInterpolators, morph targets, joint animation, texture map animation, and animated "cut-out" shapes. There are also other parameter systems for facial animation. It is important that animators who want to use other techniques not have to carry the burden of FAPs (or any other high-level technique) by default. A structure such as the one being developed by the VRML Consortium H-Anim Working Group should be adopted to allow selection of the most appropriate technique.

Technical Comments:

1.
Figure 7-1: The terms DMIF and CB have not been previously defined.
2. Clause 7.2.1.2: In the second sentence, the term "syntax" is not correct. Both the syntax and the semantics of the constructs must be specified for a scene description to be properly presented.
3. Clause 7.2.2.1: The term "attribute" is typically used in computer graphics to specify modifiers of geometry. It is suggested that the more general term "property" be used to avoid confusion. Note that, in VRML, nodes have both geometric and appearance properties. Typically, the appearance properties take on the term attributes. Thus, bullet item 1 in this sub-clause becomes more clear. As currently stated, there is no bullet item indicating that the geometry of a node (or the pixels in an image texture) is part of the scene description when clearly it is. It is also unclear whether geometric nodes such as a sphere have "audio/video properties". Typically, throughout the industry the term "audio/video" refers to aural and visual (video) streams only.
4. Clause 7.2.2.2, 1st paragraph: This paragraph is inaccurate. VRML specifies a complete spectrum of audio/video as well as synthetic graphics elements within a scene description. This paragraph should be reworded as:
"The BIFS scenes are described conforming to the provisions of IS 14772-1 with additional BIFS-specific nodes. The combination provides the following features:
· 2D only primitives
· 3D only primitives
· A mix of 2D and 3D primitives, in several ways:
  · 2D and 3D complete scenes layered in a 2D space with depth
  · 2D and 3D scenes used as texture maps for 2D or 3D primitives
  · 2D scenes drawn in the local X-Y plane of the local coordinate system in a 3D scene"
In addition, the term "primitives" is not used in VRML and should not be used in BIFS. It is suggested that this term be replaced by "geometric nodes".
Note also that there is nothing in VRML that keeps a properly formed VRML world from being composed only of aural nodes with no geometric nodes present.
5. Clause 7.2.2.3: The "2D coordinate system" should not be finite in extent. In fact, 2D coordinate systems should be considered a degenerate case of 3D coordinate systems. When a Viewport2D node is specified, any limitations on a conceptually infinite 2D coordinate system should then be stated as part of this environmental construct.
6. Clause 7.2.2.3, 1st paragraph: This paragraph implies that coordinate units may be non-square. This should, in fact, only happen when non-uniform scaling is being applied by a Transform node. Conceptually, all coordinate units should be considered of the same size unless such scaling is being applied. The mapping of any coordinate system (2D or 3D) to the rendering surface should assume that no such scaling occurs during this mapping. It should be up to the implementation of the compositor to adapt for any non-square pixels in the implementation hardware.
7. There is no clear definition of what should happen when the aspect ratios of the top Layer2D and the device display area are different. The possible mappings from the top Layer2D to the device rendering area are:
· fit the longer edge keeping the aspect ratio and allow blank area besides shorter edges;
· fit the shorter edge keeping the aspect ratio and allow clipping for the longer edge direction;
· fit both edges without keeping the aspect ratio;
· leave it as it is described in Layer2D;
· allow the aspect ratio and extent of the Layer to be different and independent of the display surface.
The above mappings should be defined, and choosable from the content.
8. Clause 7.2.2.5: The first row of the table is not correct. IS 14772-1 considers all coordinates to be specified in meters. There should be no difference between 2D and 3D coordinate units.
9. Clause 7.2.2.8.2: The term "runtime" is nowhere defined.
10.
Clause 7.2.2.9: 1024 is an inadequate number of identifiers. It is quite easy to create complicated worlds that require many more than that number. Note that there are essentially an infinite number in VRML since arbitrary combinations of the allowed characters may be used. It is suggested that a BIFS control node be provided which specifies the number of bits which are to be used for this purpose. Alternatively, a 32-bit field would effectively remove this restriction.
11. Clause 7.2.2.12.1.1: The use in this paragraph of "For instance" is inappropriate in an ISO/IEC standard. Each such item required by BIFS should be enumerated.
12. Clause 7.2.2.13.7.1: "ROUTEs" are not nodes. The third action makes no sense as currently written.
13. Clause 7.2.2.14.1: In the 6th paragraph, it is not clear what the text "[number 1 through k(1)]" means. It would seem that the range being indicated has one data type (1) on one end and another data type (k(1)) on the other. Should this be "1 .. j" or "k(1) .. k(j)"?
14. Clause 7.2.2.14.5: This concept is useful and is not limited to 2D. But placing ordering information in the child nodes leads to inefficient traversals. This information should be in a parent node explicitly designed for ordering. An OrderedGroup node should be specified which would behave identically to the Group node (and can contain both 2D and 3D nodes within the group) but would also specify a drawing order field. See Exhibit A for more information.
15. Figure 7-10 is highly confusing and nowhere described. It should both be referenced from the text and its content described.
16. Clause 7.2.3.1.2: This description does not conform to VRML. VRML requires that multiple WorldInfo nodes be supported and that they occur anywhere. These nodes are the means of providing copyright and various other information which must accompany the world content.
They are also used by many VRML worlds for parametrizing the worlds via PROTOs so that the parameters can be accessed by Scripts.
17. Clause 7.2.4.1.10: In the 3rd paragraph past the equations, the reference to ANSI C is inappropriate. In the case of a binary encoding, all floating point numbers should be encoded as IEEE Floating Point with an appropriate entry in Clause 2 for the IEEE standard for floating point numbers. The ANSI C float specification applies only to a string representation of a floating point number.
18. Clause 7.2.4.1.10: The bullet list should actually be an enumerated list. In the first bullet item, it is not clear that the encoding used retains all precision of the associated floating point value. Such precision must not be lost since these normals are used in the lighting equations which can be quite sensitive to the values provided.
19. Clause 7.2.4.2.3.1: The 3rd sentence should be removed since it adds nothing to the description and gives the false implication that only 2D nodes need this facility. In actual fact, VRML requires that nodes in a group be identified by position in the children list.
20. Clause 7.2.5.1.2.1.2: The differences between AnimationStream and MovieTexture nodes should be enumerated here. It is difficult to identify these differences from reading the detailed semantics. It is also not clear what is being streamed by the AnimationStream node.
21. Clause 7.2.5.1.2.1.3: Since MovieTexture nodes are only used for textures, it is not clear that the same semantics apply to AnimationStream nodes, which seem to be a series of commands which may make alterations to the scene graph and perform other actions. If the AnimationStream only applies to textures, it should be explicitly so stated; otherwise, the comparison with MovieTexture nodes should be removed and the exact behaviour of AnimationStream nodes described.
The current detailed semantic seems to intermix the concepts of AnimationStream node and MovieTexture node indiscriminately.
22. Clause 7.2.5.1.2.10.3, 2nd paragraph: Character value 13 only refers to carriage return. It does not imply a linefeed.
23. Clause 7.2.5.1.2.10.3, 3rd paragraph: There is no such thing as a FontStyle2D node. The correct node type is FontStyle.
24. Clause 7.2.5.1.2.11: While this node is unnecessary, it could be left in even though more flexible and powerful functionality is already supported using Script nodes. As it is, the description is unclear as to exactly what the semantic of this node is. In addition, this node is misnamed. The term Valuator in international standards when applied to information presentation refers to an input class which can return a continuous value usually derived from some physical input device such as a dial. This node should be either removed (since Script nodes can provide the same functionality more flexibly) or, at a minimum, renamed to be something else (e.g., EventMapper).
25. Clause 7.2.5.1.3.2.1: The first field of the VRML node is missing and must be supported for conformance.
26. Clause 7.2.5.1.3.2.3: The 2nd paragraph mentions a children field but there is no children field in this node. In any case, that statement is meaningless since an AudioClip node only accesses a single sound source.
27. Clause 7.2.5.1.3.7.2: This sub-clause contains statements which attempt to duplicate information in IS 14772-1 but may actually change the semantic of the node so as to be non-conforming. Only information which restricts the semantic in a manner unique to MPEG-4 should be included.
28. Clause 7.2.5.1.3.7.2: The question "(CH: What do we do if it is not available?)" was posed in the text. This is obviously an editorial comment left over from an early draft. The answer should not occur or the world is non-conforming.
29. Clause 7.2.5.1.3.10.3, 2nd paragraph: A list of valid nodes should be specified.
30.
Clause 7.2.5.1.3.10.4, 2nd paragraph: The text "apparent spatialization position" should be replaced by "geometry".
31. Clause 7.2.5.2.2.1: This node is unnecessary and should be removed since it duplicates the functionality of the Background node. It is easy to define a PROTO which restricts the Background node to the desired functionality. Note that it has been suggested that if the skyColor field is set to an empty Color node, a reasonable interpretation would be to define the sky as being transparent. This is allowable in ISO/IEC 14772-1 and could be mandated in CD 14496-1.
32. Clause 7.2.5.2.2.6.3: This description discusses width and height fields when the semantic table only contains a size field. The detailed semantics should be rewritten appropriately to refer to the size field.
33. Clause 7.2.5.2.2.7: This node is unnecessary and should be removed. The current Group node is perfectly adequate for the concept.
34. Clause 7.2.5.2.2.8.1: This node is unduly restrictive. There should be an additional field which allows the position of the image to be specified.
35. Clause 7.2.5.2.2.8.2: This node has nothing to do with the ImageTexture node since this is not a texture node. References to the ImageTexture node should be removed from the Detailed Semantics.
36. Clause 7.2.5.2.2.8.3, 2nd paragraph: It is more common for this type of node to be positioned using the upper left corner.
37. Clause 7.2.5.2.2.8.3: It is not clear from the semantics whether this node behaves like a Billboard node (always orienting its image to the screen plane) or like other 2D nodes where the image is painted on the current z=0 plane and thus is subject to transformation.
38. Clause 7.2.5.2.2.9: This node is unnecessary and should be removed. It has the same parametrization as the current IndexedFaceSet node. The dimensionality can easily be determined by the Coordinate node in the coord field.
39. Clause 7.2.5.2.2.10: This node is unnecessary and should be removed.
It has the same parametrization as the current IndexedLineSet node. The dimensionality can easily be determined by the Coordinate node in the coord field.
40. Clause 7.2.5.2.2.11: This node is unnecessary and should be removed. The current Inline node is perfectly capable of inlining either a 2D or 3D world.
41. Clause 7.2.5.2.2.12.3, 3rd paragraph: The list of preferred text breaking points should include "after hyphens".
42. Clause 7.2.5.2.2.13: If linewidth is to be specified, it is also necessary to specify fields defining cap and join style.
43. Clause 7.2.5.2.2.13.2: This node should apply to all nodes which generate lines. Whether they generate the lines in a single plane or not is independent of the dimensionality.
44. Clause 7.2.5.2.2.13.3: The lineStyle field provides a very restrictive set of line styles. There is already an international standard for specifying line widths. See the specification in IS 9592-1:1997 (PHIGS). A similar mechanism is specified in IS 8632:1992 (CGM). It is not necessary to use an indirect specification, but the technique for describing an arbitrary dash pattern is what is needed. It is suggested that a lineStyle node be created which contains the "dash cycle repeat length" (in meters) and the list of dash segment lengths specified in arbitrary units relative to one dash cycle. Then the lineProperties node would contain fields specifying the Adaptability, Continuity, and Offset along with a field for the lineStyle.
45. Clause 7.2.5.2.2.13.3: In the 2nd paragraph after the bullet list, it is noted that widths are considered geometric entities which are affected by transformations. This usually prohibits the use of any hardware assisted wide line capability. In addition, this assumption is computationally expensive and hence is likely to impact performance. This is especially true since the width specification is in local coordinate systems, which means that the width can change over the extent of the line.
It is suggested that the width specification instead be considered cosmetic and unaffected by transformation. This is typically done by allowing an implementation to choose a nominal width of line to which is applied a multiplicative linewidth scale factor specified in a field.
46. Clause 7.2.5.2.2.14.3, 1st paragraph: The diffuseColor field should probably be an emissiveColor field since the diffuseColor field is only applied when lighting is in effect.
47. Clause 7.2.5.2.2.14.3, 2nd paragraph: The filled field is inappropriate for a Material node. It should be part of the geometric node definition.
48. Clause 7.2.5.2.2.14.3, 3rd paragraph: The default behavior when this field is not specified is ill-formed. It forces an implementation to use an entire pixel for the width, thus preventing any effective anti-aliasing from being applied. Instead a nominal width line should be specified.
49. Clause 7.2.5.2.2.14.3, 4th paragraph: The Shadow node renders a second polygon under the first, offset by the amount given in the given colour. This sort of approximate shadow generation has been widely used in presentation graphics for many years. Unfortunately, it is neither a powerful nor a useful technique by modern standards and fundamentally conflicts with shadows from true 3D lighting techniques. While it is true that some present systems do not implement shadows as part of their lighting models, the sort of approximation that this node suggests is best left out of a standard and added manually by the artist using an authoring system when needed. Note that this shadowing technique would be a good area for a PROTO implementation. Why is the semantic table specified differently from the others? An SC24 representative spent considerable time producing semantic tables in this form by the end of the Fribourg meeting but then they were not used when all that was needed was to cut and paste them. Please use one form or the other but not both.
50.
Clause 7.2.5.2.2.15.2: This node is not a texture and hence should not be described in terms of texture nodes. In fact, it is a peculiar form of geometry which is not well-specified inasmuch as it is not clear whether the resultant display is affected by the associated transforms, is displayed parallel to the screen plane or in the current Z=0 plane, or even where it is positioned.
51. Clause 7.2.5.2.2.16: This node is unnecessary and should be removed. The current PlaneSensor node provides all of the functionality.
52. Clause 7.2.5.2.2.17: This node is unnecessary and should be removed. The PointSet node can be used with the dimensionality determined by the content of the coord field.
53. Clause 7.2.5.2.2.18: This node is unnecessary and should be removed. The PositionInterpolator node can provide this functionality.
54. Clause 7.2.5.2.2.21: This node is unnecessary and should be removed. Shadows should be computed based on the effects of lights in the scene. VRML allows such shadows to be rendered if a browser wishes. This node interferes with that at least on a conceptual level.
55. Clause 7.2.5.2.2.23: This node is unnecessary as the Transform node can be constrained to perform only 2D effects.
56. Clause 7.2.5.2.2.24: This clause seems to duplicate clause 7.2.5.2.2.15 but is more consistently presented. However, it still has the same problems in its .2 sub-clause.
57. Clause 7.2.5.3.3.13: The semantics of this node are specified in IS 14772-1. Only the restrictions should be specified in this clause.
58. Clause 7.2.5.3.3.15.2: Only the ability to reference BIFS-Updates and BIFS-Anim should be described here. The second sentence should be replaced by "The external source may produce BIFS-Updates and BIFS-Anim frames."
59. Clause 7.2.5.3.3.28.2, last paragraph: Here the semantics are undefined yet in the Group node the semantics are TBD. Both should be specified as being undefined.
60.
Clause 7.2.5.3.3.29.2: This paragraph breaks most current VRML content and is not conforming to ISO/IEC 14772-1. The semantic should be identical to that defined in ISO/IEC 14772-1.
61. Clauses 7.2.5.4.2.1 and 7.2.5.4.2.2: These nodes should only control the layering effect and should be combined into a single layer node. Ordering effects should be controlled if desired by encapsulating the children in an OrderedGroup node. There is no need to have two nodes for this control. There should be a single Layer node which can handle children of any dimensionality. In particular, each layer should be separately rendered including resetting the depth buffer between layers. We are still investigating the issue of possibly separating clipping from layering.
62. Clauses 7.2.5.4.2.1 and 7.2.5.4.2.2: The size field, including its implied behaviour, must be better defined. This node would be better termed a Viewport.
63. Clauses 7.2.5.4.2.3 and 7.2.5.4.2.4: These should be combined into a single CompositeTexture node which defines the scene to be used as a texture. Any valid set of children nodes should be allowed. This functionality is independent of dimensionality.
64. Clauses 7.2.5.4.2.3 and 7.2.5.4.2.4: It is not clear why there should be any restriction on the attachment of sensors to an object to which a SceneTexture is applied. However, it is reasonable for there to be a restriction on the interpretation of sensors in the children subtree of the CompositeTexture node. It is better to allow such sensors to exist but be ignored than to prohibit them. In this manner, existing worlds can be used to produce a texture by in-lining them. Note that the children of a CompositeTexture node are not really part of the geometry of the scene graph of which the CompositeTexture node is a part. Instead, there is a separate local scene graph used only to produce the texture represented by the CompositeTexture node.
65.
Clause 7.2.5.4.2.5: This node is redundant and duplicates the functionality already provided by the CompositeTexture node. This unnecessarily increases the footprint of the compositor.

Editorial Comments:

1. Throughout: The presentation of bullet and numbered lists should be made consistent and should follow the style defined in Part 3 of the ISO Directives. Note that these directives indicate that enumerated lists should use lower case letters for the first enumeration level.
2. Throughout: The Normal style should be set up to provide leading before and/or after each paragraph so that proper and consistent leading occurs between paragraphs. Extra empty paragraphs should not be used to provide leading between paragraphs.
3. Throughout: The document should be spell checked.
4. Throughout: The phrase "enables to" is not acceptable English. It is used many times throughout the document. Each occurrence should be found and the surrounding text rewritten to remove it.
5. Throughout: Several different representation styles for the contractions used for the terms "two-dimensional" and "three-dimensional" are used. A single, consistent abbreviation format should be chosen for both abbreviations and then applied throughout.
6. Throughout: Equations are typically centred on their own lines in international standards. It is suggested that this be done in this standard to ensure clarity and proper layout of the equations. See Part 3 of the ISO Directives for guidance.
7. Table of Contents: Section 7.2.5.3.3.27 (Semantic Table) is actually a subsection of the previous section on the Spotlight node.
8. Clause 0.2.2: The phrase "a known amount of receiver buffers" is poor English. It is suggested that it be replaced by "a known amount of receiver buffer resource".
9. Clause 0.2.3: There is an inappropriate paragraph mark in the middle of the paragraph.
10. Throughout: The word "must" should be replaced by "shall" or otherwise reworded.
11.
Clause 0.5.1: The phrase "must be properly identified" should be replaced by "require proper identification".
12. Clause 0.5.2: The second paragraph should have the following text appended: "BIFS is an encoding of the elements of IS 14772-1, the Virtual Reality Modeling Language (VRML) with additional elements and constructs."
13. Clause 0.5.3: There is an inappropriate paragraph mark in the middle of the paragraph.
14. Clause 3: According to Part 3 of the ISO Directives, the contents of this clause belong in the preceding clause since they are all normative references. Clause 3 should then be removed. Also, "DIS 14722-1" (reference 2) is now ISO/IEC 14772-1. This occurred in December 1997.
15. Clause 3: Reference 7 is inappropriate and should be replaced by a reference to ISO/IEC 10646 of which it is a part.
16. Clause 2/3: Non-ISO standards being used in MPEG-4 have not been referenced. These include applicable IETF recommended practices such as the one for Uniform Resource Locators. See ISO/IEC 14772-1 for proper references.
17. Clause 7: Why does this clause start on a new page while previous clauses do not? The document should be consistent.
18. Clause 7.1.2.2: The occurrences of the abbreviation (e.g.) differ in presentation format (i.e., "e.g.:" vs. "e.g. "). A consistent format should be used.
19. Clause 7.1.2.3: Part 3 of the ISO Directives states that notes should be in a font size two points smaller than the standard presentation size.
20. Clause 7.2: Why is there so much white space before this sub-clause? 7.2 should immediately follow 7.1.4.3.
21. Clause 7.2.1.3.3: The use of 1st person constructs is inappropriate in an international standard.
22. Figure 7-8: This is a Table not a Figure and should be properly labelled as such. See Part 3 of the ISO Directives.
23. Clause 7.2.2.6: The first occurrence of the word "different" should be "differently".
24.
Clause 7.2.2.8.1: The 1st paragraph should be reworded as follows: "For each of the basic data types, single field and multiple field data types are defined in IS 14772-1:1997, Clause 5.2. Some further restrictions are described herein."
25. Clause 7.2.2.13.3: Is this a "Working Draft" as stated herein or a "Committee Draft" as stated elsewhere? This is apparently left over from a previous version.
26. Clause 7.2.2.13.5: The spelling of "audio-visual" varies throughout the document. A consistent spelling should be used.
27. Clause 7.2.2.13.6: What does the construct "Sub-clause …" mean? It appears that there is some sort of linked and embedded resource that was not included within the file.
28. Clause 7.2.2.13.8: The phrase "allow to trigger events" is poor English. It should be replaced by "allow triggering of events".
29. Clause 7.2.2.14: The two bullet items are indented too far.
30. Clause 7.2.2.14.1: The font used for this document does not clearly differentiate between the numeral "1" and the lower case letter "l" thus making the sixth paragraph confusing. It is suggested that the example index be changed from "l" to "j" to avoid this problem.
31. Clause 7.2.2.14.2.1: The table in this sub-clause does not have a table number or table title. It should be both numbered and titled and then referenced by number in the text. See Part 3 of the ISO Directives for table title positioning and format.
32. Clause 7.2.2.16: The term "browser" has not been previously defined. Since this term is not typically used in MPEG-4, it is suggested that it be replaced by the term "compositor".
33. Clause 7.2.2.17.2: The first sentence of this sub-clause is poorly written. It is suggested that the phrase "enables to change" be replaced by "supports external changes to".
34. Clause 7.2.2.17.2, 2nd Sentence: The term "aspect" is incorrect. A better term would be "appearance".
35. Clause 7.2.2.17.2, last Sentence: The term "behaviour" is unclear.
It is suggested that this be replaced by "ROUTEs".
36. Clause 7.2.2.17.2.1: The phrase "time instant in time" should be "instant in time".
37. Clause 7.2.2.17.2.1: The 2nd sentence is poor English and should be rewritten as follows: "However, continuous changes of the parameters of the scene are best provided using the animation scheme described in 7.2.2.17.3."
38. Clause 7.2.2.17.2.1: The text "The Repeat Scene command enables to repeat all the updates from the last Replace Scene." is poorly written. It should be replaced by "The Repeat Scene command may be used to replay all updates since the last Replace Scene command.".
39. Clause 7.2.2.17.2.1, last sentence: The text "identification of node field" should be "identification of a node field".
40. Clause 7.2.2.17.2.2: The term "BIFS –Update" should be "BIFS-Update". The entire document should be checked for consistent usage and presentation of terms.
41. Clause 7.2.2.17.3: The text ‘" BIFS-Anim "’ seems to have unnecessary spaces within the quotation marks.
42. Clause 7.2.2.17.3.1: The text associated with the enumerated list is improperly formatted.
43. Clause 7.2.2.17.3.1: When two enumerated lists exist in the same sub-clause, the second list begins its enumeration with the first available enumerant after the end of the previous list.
44. Clause 7.2.4.1.1: The term "associated to" is poor English. It should be "associated with".
45. Clause 7.2.4.1.3, 2nd sentence: "ony" should be "only".
46. Clause 7.2.4.1.4: In the 1st sentence, the term "def field" is confusing. Should this not be "field defined for the node"? Then the following use of "defined" should be replaced by "provided".
47. Clause 7.2.4.2.5.5: The phrase "consists in" is poor English. It should be replaced by "consists of".
48. Clause 7.2.4.3: In the penultimate sentence of the 1st paragraph, the "i.e.," should be removed or the entire parenthetical expression enclosed in parentheses instead of commas.
49.
Clause 7.2.4.3: The 2nd paragraph cannot be understood. It needs rewriting.
50. Clause 7.2.4.3.2.4: See Part 3 of the ISO Directives for the proper format for referencing other parts of the same standard.
51. Clause 7.2.4.3.2.4: The table should have a table title, which is then used in the reference within the text. See Part 3 of the ISO Directives for information about table titles.
52. Clause 7.2.5.1.2.1.3: What is a "VOP"? It is not clear from the context and is not included in the list of abbreviations.
53. Clause 7.2.5.2.2.1.1: The content of this sub-clause should be kept together.

Exhibit A

2D in VRML
A Contribution of the VRML Consortium
5 March 1998

Overview

CD 14496-1 (MPEG-4 Systems) adopts the architecture, abstract syntax and semantics (including node structure) of ISO/IEC 14772-1 (VRML) as the basis for BIFS. CD 14496-1 also includes additional nodes designed to satisfy identified requirements of MPEG-4 for 2D representations and for additional control structures. After review of the CD 14496-1 specification, the VRML Consortium finds that the functionality inherent in these extensions is of general utility to the VRML community.

This contribution defines a different 2D architecture that can be viewed as a refinement of the present CD 14496-1 architecture. This redefinition both satisfies the requirements of MPEG-4 and more closely matches the VRML architecture.

This contribution is the base document for a future amendment 1 to IS 14772-1. Adopting 2D functionality and a 2D-only profile of ISO/IEC 14772-1 is a major aspect of this first amendment. Note that this amendment will also change ISO/IEC 14772-1 to require mandatory support for the scripting languages whose interfaces are defined in Annexes B and C. In fact, this decision was made by the VRML Consortium in July 1997 but only recently has the state of product offerings matured sufficiently so that such mandatory support could be required without severe interoperability problems.
This contribution reduces the number of 2D and 2D/3D integration nodes without loss of functionality or significant additional overhead of transmission. It makes the nodes more consistent with the current VRML nodes and allows for efficient implementation of either a 2D-only subset or a full 2D/3D set.

Details

New MPEG-4 Nodes for 2D

In sub-clause 7.2.5.2, CD 14496-1 defines these nodes for 2D:

  Background2D
  Circle
  Coordinate2D
  Curve2D
  DiscSensor
  Form
  Group2D
  Image2D
  IndexedFaceSet2D
  IndexedLineSet2D
  Inline2D
  Layout
  LineProperties
  Material2D
  PlaneSensor2D
  PointSet2D
  Position2DInterpolator
  Proximity2DSensor
  Rectangle
  ShadowProperties
  Switch2D
  Transform2D
  VideoObject2D

Additionally, sub-clause 7.2.5.4 defines several nodes supporting the integration of 2D and 3D:

  Layer2D
  Layer3D
  Composite2DTexture
  Composite3DTexture
  CompositeMap

The rationale for a full set of 2D nodes appears to be based on these three assertions:

1. A full set of nodes allows a separate 2D-only profile.
2. A full set of nodes allows specifying coordinates in 2-space rather than 3-space.
3. A full set of nodes allows optimised transformation and rendering of 2D shapes.

One of the motivations for these nodes as cited in discussions between the VRML and MPEG-4 communities is to allow for drawing interaction controls on the MPEG-4 terminal display. The user can then use this "dashboard" to interact with the world. In VRML, this is normally done outside the world using browser controls or using HTML in other frames. We understand that these options are not considered possible for many uses of MPEG-4.

Regarding the first assertion, it is certainly possible to have a 2D-only profile comprising a subset of 3D nodes, perhaps even with diminished functionality (e.g., IndexedFaceSets must have z=0).

The intent of the second assertion can be provided by defining a Coordinate2D node which has only two coordinate components.
This can then be binary encoded by having a QuantizationParameter specifying that all 3D coordinates are specified with 2 values (x and y) with an implied z value of 0. Alternatively, the implied z value could be specified in the QuantizationParameter for greater flexibility.

Regarding the third assertion, a 2D-only profile would have sufficient restrictions and defaults to allow an optimised transformation and renderer to be created. For instance, a Transform node would be required to have:

  translation - z must equal 0
  rotation - vector must be 0 0 1 (rotations must be about z)
  scale - z must equal 1
  center - z must equal 0
  scaleOrientation - vector must be 0 0 1

This leaves the "drawingOrder" field as the only remaining difference between Transform and Transform2D. The manner in which this can be resolved is described below.

2D Node Disposition

Based on the rationale given above, many 2D nodes can be removed from sub-clauses 7.2.5.2 and 7.2.5.4 of CD 14496-1 without loss of functionality or efficiency. The nodes that we believe can be eliminated and the suggested replacement strategy for each are given below:

Background2D
Use the Background node. A 2D-only profile would only allow the frontUrl to be specified. One requirement of MPEG-4 is a method of allowing underlying content to show through. Currently, VRML forces sky as a minimum. The interpretation of the Background node field skyColor when empty would now be that the "sky" is transparent. This is compatible with current IS 14772-1 which does not state an interpretation for the case of this field being empty.

Coordinate2D
It may be possible to remove this node also by degenerating the Coordinate node to handle 2D coordinates as well as 3D coordinates. This can certainly be done using the QuantizationParameter node of the binary encoding. Whether there is a way of doing the same thing for the UTF-8 encoding is being investigated.

DiscSensor
Use the CylinderSensor.
Restricted in a 2D-only profile to provide the functionality of the DiscSensor.

Group2D
Use Group.

IndexedFaceSet2D
Use IndexedFaceSet. The coordinate overhead is taken care of by using a Coordinate2D node in the coord field, and a 2D-only profile can specify that the concept of "solid" is ignored.

IndexedLineSet2D
Use IndexedLineSet. The coordinate overhead is taken care of by using a Coordinate2D node in the coord field.

Inline2D
Use Inline.

LineProperties
See Material2D.

Material2D
This node has a "filled" field, which causes IndexedFaceSet to be filled or unfilled. But that duplicates the functionality of IndexedLineSet. A filled flag is needed for Rectangle and Circle (and it is assumed it applies to Curve2D as well). It is agreed that the Circle, Rectangle, and Curve primitives need to be filled or unfilled. There are two alternatives: separate nodes, like IndexedFaceSet and IndexedLineSet, or a "filled" field in each node. The latter is better. This would be conceptually similar to the "beginCap" field or the "solid" field, both of which are contained in the geometric nodes they control.

The lineStyle is an SFInt32 with 6 line styles (such as solid or dashed-dotted-dotted). Other line styles which might be needed are not supported. A better mechanism is to support dash definitions as described in Technical Comment 44 of the main document. Also, this node defines the line width, but not the join style and cap style of the lines. This is very important once lines are wider than about 3 pixels.

It is agreed that Circle, Rectangle, and Curve primitives are needed. These primitives also need to be filled or unfilled. There are two alternatives: separate nodes, like IndexedFaceSet and IndexedLineSet, or a "filled" field in each node. The latter is better. There seems to be no good reason for extra nodes, and this field is similar to the "beginCap" field or the "solid" field, both of which are in the nodes they control.
However, the meaning of "filled" in the case of the Curve node must be defined.

Next is the issue of controlling the line style. The idea of a LineStyle node (even a limited lineStyle field) is a good one, but it should be referenced by the nodes using it, rather than the Material2D node. This makes it look like the Text node, which references the FontStyle node. So, the following 3 nodes are required:

Circle {
  field SFFloat radius 1
  field SFNode lineStyle NULL
  field SFBool filled TRUE
}

Curve {
  exposedField SFNode point NULL
  exposedField SFInt32 fineness 0
  field SFNode lineStyle NULL
  field SFBool filled TRUE
}

Rectangle {
  field SFVec2f size 2 2
  field SFNode lineStyle NULL
  field SFBool filled TRUE
}

Then the Material2D property should be removed and the Material property be used in its place. For 2D-only profiles (which do not include lighting) only the emissiveColor and transparency fields would be used.

LineProperties
This node should be renamed LineProperty to fit with the rest of VRML.

PlaneSensor2D
Use PlaneSensor. Normal restrictions would apply for a 2D-only profile.

PointSet2D
Use PointSet.

Position2DInterpolator
Use PositionInterpolator.

Proximity2DSensor
Note that this node has NO description. In a 2D profile without navigation, what use is a ProximitySensor? This node should be removed. No replacement functionality is needed for a 2D-only profile.

Rectangle
Redesigned as above.

ShadowProperties
Is this really necessary? Is it that much more expensive to reuse the Rectangle, Curve, etc. nodes transformed and with a different Material? This would be especially inexpensive if they had the ability to create a "ShadowedObject" PROTO.
Switch2D
Use Switch.

Transform2D
Use Transform (see discussion on drawingOrder below).

If the nodes listed above are eliminated from sub-clauses 7.2.5.2 and 7.2.5.4 of CD 14496-1, the set of remaining 2D nodes is:

  Circle
  Curve
  Form
  Layout
  LineProperty
  Rectangle
  VideoObject2D
  Image2D

A new node should be added to sub-clause 7.2.5.4 to handle the issue of drawing order. This would have semantics for both 2D and 3D scenes as follows:

OrderedGroup {
  eventIn MFNode addChildren
  eventIn MFNode removeChildren
  exposedField MFNode children []
  exposedField MFInt32 order []
  field SFVec3f bboxCenter 0 0 0
  field SFVec3f bboxSize -1 -1 -1
}

This is simply a group with an extra "order" field. This field specifies the desired drawing order, with one value per child. Children with the lowest value are drawn first, highest are last. Children with the same order value are drawn earliest child first. Any children without a value are drawn earliest child first. That makes the default order (empty) draw children from first to last. For 2D scenes, this simply layers higher ordered children on top of lower ordered children. For 3D scenes this properly layers children with identical z values without producing z tearing. For instance, this would allow a rectangle with a texture of a painting to be placed on a wall without that rectangle z tearing. There are well known algorithms for doing this in a z-buffer renderer (in fact OpenGL has a special extension to handle it).

OrderedGroup would perform the same job as the "drawingOrder" field of Transform2D, but it would do it more efficiently, and it would be useful in 3D scenes as well.

2D/3D Node Disposition

The Layer2D and Layer3D nodes in sub-clause 7.2.5.4 should be replaced by a single Layer node. It is not clear why the Layer3D node has "background", "fog", "navigationInfo", and "viewpoint" fields. These should be handled by normal VRML semantics of the nodes in the "children" field of the Layer node.
That is really the only difference between these nodes. Also, the depth field should be replaced by the "order" concept from OrderedGroup. It is much simpler for a parent node to have knowledge of the desired rendering order of its children rather than having to traverse each child twice, once to find out the rendering order and again to do the actual rendering. So, the following node is defined to replace the Layer2D and Layer3D nodes in the present sub-clauses 7.2.5.4.2.1 and 7.2.5.4.2.2:

Layer {
  field SFNode child NULL
  exposedField MFNode childrenLayer []
  exposedField SFVec2f translation 0 0
  exposedField SFVec2f size -1 -1
}

Note that the children MFNode has been replaced with a private "child" SFNode field. This is because making this node have children makes it like a Group node. As such it would need to have bbox fields and addChildren/removeChildren events. A single "child" field avoids that. If access to the children is needed, the "child" field can be a Group node and the children accessed through it.

The Composite2DTexture and Composite3DTexture nodes in sub-clauses 7.2.5.4.2.3 and 7.2.5.4.2.4 should be combined as well. A better name for this node would be SceneTexture since it makes it more obvious what it does. The following defines this node:

SceneTexture {
  field SFNode child NULL
  exposedField SFVec2f size -1 -1
  field SFBool repeatS TRUE
  field SFBool repeatT TRUE
}

It is not clear why the CompositeMap is needed. If it is an optimisation to place a SceneTexture onto a simple rectangle it seems to be overkill. It is not clear how this would optimise the rendering of 2D objects into a 3D scene. In order to maintain correct perspective you would pretty much need to separately render and texture the result. Even if such an optimisation were devised, how hard is it to detect that the geometry was a simple rectangle (or whatever simplified geometric shape the optimisation could be applied to)? Sub-clause 7.2.5.4.2.5 defining CompositeMap should be removed.
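
The "order" rule proposed for OrderedGroup (and reused by Layer in place of the depth field) can be illustrated with a short sketch. This is not part of CD 14496-1 or IS 14772-1; the function name drawing_sequence and the default_order parameter are hypothetical. The text above does not say where children lacking an order value fall relative to children that have one, so the sketch assumes they are treated as having order 0.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def drawing_sequence(children: Sequence[T],
                     order: Sequence[int],
                     default_order: int = 0) -> List[T]:
    """Return children in the order a renderer would draw them.

    Lowest order value is drawn first, highest last. Children that share
    a value keep their position in the children field (earliest first).
    ASSUMPTION: a child with no entry in `order` is given `default_order`;
    the proposal itself leaves this case open.
    """
    keyed = []
    for index, child in enumerate(children):
        value = order[index] if index < len(order) else default_order
        keyed.append((value, index, child))
    # Sort on (order value, child index) so that ties fall back to child order.
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [child for _, _, child in keyed]


if __name__ == "__main__":
    # A painting layered on a wall: the wall is drawn first, the painting on top.
    print(drawing_sequence(["painting", "wall"], [1, 0]))
```

With an empty order field every child receives the same value, so the children are drawn from first to last, which matches the default behaviour described for OrderedGroup.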
To satisfy the requirement for an orthographic projection, an OrthoViewpoint node would be added. This node would have the following definition:

OrthoViewpoint {
  eventIn SFBool set_bind
  exposedField SFFloat aspectRatio 0
  exposedField SFFloat height 2
  exposedField SFBool jump TRUE
  exposedField SFRotation orientation 0 0 1 0 # [-1,1],(-∞,∞)
  exposedField SFVec3f position 0 0 10 # (-∞,∞)
  field SFString description ""
  field SFBool adjustViewport FALSE
  eventOut SFTime bindTime
  eventOut SFBool isBound
}

This is simply a Viewpoint node with the added "adjustViewport" field, and the "fieldOfView" field replaced with "aspectRatio" and "height". The defaults would create a viewpoint where 0,0,0 would be in the centre of the viewport and the corners of a perfectly square viewport would be (1,1) and (-1,-1). A non-square viewport would adjust the width (and therefore the corner coordinates) to be wider or narrower than the specified height field. In all other ways (binding, description, etc.) this node would behave just like a Viewpoint node.

Note that I have left in the 3D position and orientation fields. I think this is important for consistency. In a pure 2D environment their default values would give reasonable results. Changing the position, and even the orientation, in a pure 2D environment gives very useful effects and their overhead is quite minimal. Moreover, they are needed for this node to be used in 3D.

The result of making these changes is a re-written sub-clause 7.2.5.4.2 containing only three new sub-clauses, each defining one of these new nodes:

  7.2.5.4.2.1 OrderedGroup
  7.2.5.4.2.2 Layer
  7.2.5.4.2.3 SceneTexture

Conclusions

The above 2D node architecture provides all the functionality of the current CD 14496-1 design plus a bit more. It also removes the artificial separation of 2D and 3D nodes. Removal of this artificial separation is a primary goal of the VRML Consortium.
Note: along with the above redesign, the QuantizationParameter node would need a few added fields to take into account optimisations to coordinate and transformation parameters.

Steve Carson
Chair, ISO/IEC JTC 1/SC 24
Computer Graphics and Image Processing

---------------------------------------------------------
Steve Carson               phone:  +1-505-521-7399
GSC Associates Inc.        fax:    +1-505-521-9321
5272 Redman Road           e-mail: carson@siggraph.org
Las Cruces, NM 88011 USA
---------------------------------------------------------