Feature Article

DPFrag: Trainable Stroke Fragmentation Based on Dynamic Programming

R. Sinan Tümen and T. Metin Sezgin ■ Koç University

Many computer graphics applications require approximation of digital curves using sets of prespecified geometric primitives. For example, CAD and freehand modeling applications require fitting Nurbs (nonuniform rational B-splines) to digital curves.1,2 Graphics applications that support sketch-based interaction typically require fitting line and arc segments to hand-drawn curves to set the stage for sketch recognition.3 In all these applications, the goal is to find the best way to represent a digital curve using geometric primitives that collectively produce an accurate approximation. The sketch recognition community calls this problem stroke fragmentation.

Stroke fragmentation methods should convert strokes into 2D geometric primitives while meeting criteria for accuracy, efficiency, and adaptability. Unfortunately, no existing method meets all these criteria. To alleviate this problem, we developed DPFrag, which has two main components. First, a dynamic-programming framework reduces the exponential runtime complexity of naive search to O(n²) in terms of stroke length. Second, an adaptive cost function enables DPFrag to adapt to different sets of primitives, contexts, and user styles. An evaluation with five datasets showed that DPFrag met the fragmentation method criteria and was more accurate than other state-of-the-art methods.

Fragmentation Method Criteria

Here, we look a little more closely at the fragmentation method criteria.

DPFrag is an efficient, globally optimal fragmentation method that learns segmentation parameters from data and produces fragmentations by combining primitive recognizers in a dynamic-programming framework. The fragmentation is fast and doesn't require laborious and tedious parameter tuning. In experiments, it beat state-of-the-art methods on standard databases.

The first criterion is high accuracy. We measure accuracy by comparing a method's fragmentations to the ground truth, which human annotators provide.

The second criterion is runtime efficiency. A fragmentation system must discover an optimal solution in an acceptable time. A naive brute-force search for that solution is impractical because the number of fragmentations increases exponentially with the stroke length.

The third criterion is adaptation to fragmentation preferences. What constitutes a correct fragmentation for a given stroke can be highly subjective. For example, two users might prefer different fragmentations for a particular stroke (see Figure 1).

The fourth criterion is adaptation to sketching styles. Users' drawing styles vary. For instance, the level of messiness, noise, and drawing speed are subject to change.

The final criterion is adaptation to specific sets of primitives. The set of primitives allowed for building fragmentations might vary across domains and applications. For example, in a scenario involving recognition of hand-drawn circuit diagrams, a set of primitives that includes helices might aid the recognition of inductor symbols. So, the set of primitives used in generating fragmentations should be easy to customize.

0272-1716/13/$31.00 © 2013 IEEE ■ Published by the IEEE Computer Society

Figure 1. Two valid fragmentations of a stroke. A fragmentation system should be able to adapt to different user preferences.

Figure 2. A fragmentation example. (a) The raw points for stroke S, numbered 1 through 13. (b) The correct fragmentation F = {S1:3, S3:11, S11:13}, which generates three primitives: two lines and an arc.

Terminology

A stroke S is an ordered set of time-stamped points taken between pen-down and pen-up events. In Matlab notation, the range from the ith to the jth stroke point is Si:j. A primitive is a simple 2D geometric object with a few parameters (for example, lines, elliptical arcs, splines, Bézier curves, spirals, and helices). Each primitive has an ordered subset of stroke points. A fragmentation F is an ordered partition of the stroke points into nonempty sets. We express a fragmentation in terms of a set of corner points, C = {c1, c2, …, cn}, and denote it as F = {Sc1:c2, Sc2:c3, …, Scn−1:cn}. In the context of Figure 2, a plausible fragmentation is F = {S1:3, S3:11, S11:13}.

Optimal Fragmentation

We can assess the optimality of a particular F by checking how well its subsets look like individual primitives:

P(isOptimal(F) | S) = P(isPrim(Sc1:c2), isPrim(Sc2:c3), …, isPrim(Scn−1:cn) | S),

where isOptimal(F) denotes that F is optimal, and isPrim(Sci:cj) denotes that the stroke segment from point ci to point cj is a primitive. We rewrite the previous probability as a product of probabilities of the individual segments obtained through independent classification decisions:

P(isOptimal(F) | S) = P(isPrim(Sc1:c2) | S) P(isPrim(Sc2:c3) | S) … P(isPrim(Scn−1:cn) | S) = ∏i=1…n−1 P(isPrim(Sci:ci+1) | S).

The fragmentation problem's optimality criterion is

Fopt = arg maxF∈G P(isOptimal(F) | S) = arg maxF∈G ∏i=1…n−1 P(isPrim(Sci:ci+1) | S),

where G is the set of all possible fragmentations. To avoid the numerical underflow of probabilities, we replace the probability term with the log probability:

Fopt = arg maxF∈G ∑i=1…n−1 log P(isPrim(Sci:ci+1) | S).

Then, a cost function C(F, S), which is minimized for optimal solutions, is

C(F, S) = −∑i=1…n−1 log P(isPrim(Sci:ci+1) | S).

We can restate the optimality criterion in terms of C(F, S):

Fopt = arg minF∈G C(F, S) = arg minF∈G −∑i=1…n−1 log P(isPrim(Sci:ci+1) | S).   (1)

Next, we explain how DPFrag calculates P(isPrim(Sci:ci+1) | S), which is a segment's likelihood of being a primitive.

The DPFrag Implementation

DPFrag comprises two modules. The classifier learning module lets DPFrag learn optimal fragmentation settings and produces the DPFrag classifier. The stroke fragmentation module takes the DPFrag classifier and computes optimal fragmentations of new strokes, using dynamic programming.
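As a concrete illustration, the cost that the stroke fragmentation module minimizes, C(F, S) = −∑ log P(isPrim(Sci:ci+1) | S), can be sketched in a few lines of Python. The probability list stands in for a trained classifier's outputs; the scores below are hypothetical, not values from the paper.

```python
import math

def fragmentation_cost(prim_probs):
    """Cost C(F, S) of a candidate fragmentation, given the classifier's
    estimate P(isPrim(S_ci:ci+1) | S) for each of its segments.
    Lower cost means the segments look more like single primitives."""
    # C(F, S) = -sum_i log P(isPrim(S_ci:ci+1) | S)
    return -sum(math.log(p) for p in prim_probs)

# Hypothetical scores for two candidate fragmentations of one stroke:
good = fragmentation_cost([0.9, 0.8, 0.95])  # segments look like primitives
bad = fragmentation_cost([0.3, 0.2, 0.4])    # segments straddle corners
assert good < bad
```

Because the cost is a sum of per-segment terms, it decomposes over segments, which is exactly what the dynamic program described later exploits.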

Classifier Learning

Stroke segments fall into three classes:

■ Primitive. The segment forms only one primitive.
■ Subprimitive. The segment covers part of a primitive.
■ Multiprimitive. The segment contains multiple primitives.

We train the DPFrag classifier to estimate which of these classes matches a given stroke segment. The classifier lets us estimate P(isPrim(Sci:ci+1) | S). Classifier learning has five stages (see Figure 3).

Figure 3. The classifier-learning module's five stages: annotated strokes feed into segment extraction, training-set generation, feature extraction, automatic label assignment, and classifier training. This module produces the DPFrag classifier, which estimates the likelihood of a segment being a single primitive.

Segment extraction and training-set generation. To train the DPFrag classifier, we need examples of primitives, subprimitives, and multiprimitives. Consecutive sequences of linear segments in an annotated stroke provide the examples. We use the well-known Douglas-Peucker algorithm to extract linear segments. This algorithm recursively divides a curve into short linear segments until each point on the stroke is within a distance ϵ of the line connecting its left and right segment points. We automatically set ϵ such that no Douglas-Peucker segment in the training set contains multiprimitives. This is a conservative threshold, so the probability of missing corners in the test data owing to a large ϵ is negligible. After extracting the Douglas-Peucker segments, we generate the training set using all their connected subsets (see Figure 4) and feed the training set into feature extraction and label assignment.

Feature extraction. The DPFrag classifier deals with primitives, so we use features that encode the primitives' shape, curvature, and speed. Some features are based on prior fragmentation research; we designed the others. The most distinctive property of primitives is visual appearance. So, we mainly use fit-based features to characterize the similarity between the hand-drawn ink and a set of idealized geometric templates such as lines and elliptical arcs. Measuring this similarity lets us discriminate the three cases:

■ For primitives, the fit-based error for a specific template should be lower than for multiprimitives. The concatenation of a new segment to either endpoint of the sequence significantly increases the fit-based error.
■ For subprimitives, the fit-based error for a specific template should be lower than for multiprimitives. Furthermore, extending the stroke segment on either end shouldn't substantially increase the fit error.
■ Multiprimitives should have a large fit error for all primitive templates.

On the basis of the properties we just described, we use these shape-based features:

■ Δavg(Sci:cj) and Δmax(Sci:cj) for a line fit,
■ min(Δavg(Sci−:cj), Δavg(Sci:cj+)) for a line fit,
■ min(Δmax(Sci−:cj), Δmax(Sci:cj+)) for a line fit,
■ Δavg(Sci:cj) and Δmax(Sci:cj) for an ellipse fit,
■ min(Δavg(Sci−:cj), Δavg(Sci:cj+)) for an ellipse fit,
■ min(Δmax(Sci−:cj), Δmax(Sci:cj+)) for an ellipse fit,
■ the chord length,
■ the Euclidean distance between the starting points and endpoints,
■ the bounding-box width,
■ the bounding-box height, and
■ the number of segments.

Δmax(Sci:cj) and Δavg(Sci:cj) represent the maximum and average least-squares fit errors for the stroke segment. Points ci+ and ci− are the closest points to ci on either side, as found by the Douglas-Peucker algorithm.

Extensive research on stroke fragmentation has indicated curvature and speed features' utility. To design such features, we took the following properties into account:

■ For primitives, the endpoints will likely have corner characteristics (maxima in curvature and minima in speed).
■ For subprimitives, only one endpoint has corner characteristics.
■ For multiprimitives, other points besides the endpoints will likely have corner characteristics.

We ended up with these features:

■ the maximum absolute curvature at the segment's start,
■ the minimum speed at the segment's start,
■ the minimum straw value at the segment's start,
■ the maximum absolute curvature at the segment's midpoints,
■ the minimum speed at the segment's midpoints,
■ the minimum straw value for the segment's midpoints,
■ the maximum absolute curvature at the segment's end,
■ the minimum speed at the segment's end, and
■ the minimum straw value at the segment's end.

(A straw is the Euclidean distance between a point's right and left neighbors.) To estimate the curvature at a given point, we use this formula from analytical geometry:

Curvaturei = (ẋiÿi − ẍiẏi) / (ẋi² + ẏi²)^(3/2),

where (xi, yi) is the ith point's coordinates. We estimate the speed at a given point using finite differences:

Speedi = (di+1 − di−1) / (ti+1 − ti−1),

where di is the chord length between the stroke's first and ith points, and ti is the ith point's time stamp.

To better estimate the curvature and speed at the segment's ends, we pick points within a radius of ϵ/2 of the segment endpoints (see Figure 5) and compute the curvature maxima and speed minima in these regions. A radius of ϵ/2 guarantees that the support regions for computing the curvature and speed features remain disjoint.

Figure 4. Training-set generation. We generate examples of primitive, subprimitive, and multiprimitive segments using the consecutive Douglas-Peucker segments. Circles indicate human-annotated corners for the training example; filled dots indicate segments extracted by the Douglas-Peucker algorithm.

Figure 5. To get reliable curvature and speed features, we compute the curvature extrema separately in three regions: two circular regions with a radius of ϵ/2 around the stroke segment's starting point and endpoint, and a region between the two circular regions.

Label assignment. DPFrag automatically labels the extracted examples as primitives, subprimitives, or multiprimitives on the basis of a human annotator's manual fragmentations (see Figure 4). If an annotated segment contains corners only at both ends, it's a primitive. If it contains only one corner at an endpoint, it's a subprimitive. Otherwise, it's a multiprimitive.

Classifier training. Using examples of the three classes, we train a probabilistic multiclass support vector machine (SVM) with a radial-basis-function kernel. The SVM classifier outputs a posterior-probability estimate of the class labels. We use the LIBSVM toolbox to learn pairwise-probability estimates by maximizing a likelihood function using the training data and predicted labels. Following the optimization, LIBSVM computes multiclass probabilities by combining the pairwise class probabilities. We train the classifier via fivefold cross validation and use it to estimate P(isPrim(Sci:ci+1) | S) for a given stroke segment.
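The curvature, speed, and straw estimates described above can be sketched as follows. This is a minimal illustration only: the central-difference derivative estimates and the neighbor offset k are our assumptions, since the article gives the formulas but not the discretization details.

```python
import math

def curvature(pts, i):
    """Curvature_i = (x'y'' - x''y') / (x'^2 + y'^2)^(3/2), with the
    derivatives estimated by central finite differences (an assumption;
    the article doesn't specify the derivative estimator)."""
    (x0, y0), (x1, y1), (x2, y2) = pts[i - 1], pts[i], pts[i + 1]
    dx, dy = (x2 - x0) / 2.0, (y2 - y0) / 2.0      # first derivatives
    ddx, ddy = x2 - 2 * x1 + x0, y2 - 2 * y1 + y0  # second derivatives
    denom = (dx * dx + dy * dy) ** 1.5
    return (dx * ddy - ddx * dy) / denom if denom else 0.0

def speed(chord, t, i):
    """Speed_i = (d_{i+1} - d_{i-1}) / (t_{i+1} - t_{i-1}), where chord[i]
    is the arc length from the stroke's first point to point i."""
    return (chord[i + 1] - chord[i - 1]) / (t[i + 1] - t[i - 1])

def straw(pts, i, k=1):
    """A point's straw: the Euclidean distance between its kth left and
    right neighbors (small straw values suggest a corner). k = 1 matches
    the definition in the text; wider windows are common in practice."""
    (xl, yl), (xr, yr) = pts[i - k], pts[i + k]
    return math.hypot(xr - xl, yr - yl)

# Sanity check: for points sampled on a unit circle, the estimated
# curvature should be close to 1.
circle = [(math.cos(0.01 * i), math.sin(0.01 * i)) for i in range(200)]
assert abs(curvature(circle, 100) - 1.0) < 1e-3
```

On a straight, evenly timed stroke these estimates behave as expected: the curvature is zero, the speed is constant, and the straw equals the neighbor-to-neighbor distance.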

Stroke Fragmentation

We can use the trained DPFrag classifier to estimate any given fragmentation's cost. We could try to enumerate all possible fragmentations G of a given stroke S and then declare the least expensive fragmentation as optimal. However, because the number of fragmentations grows exponentially, exhaustive enumeration isn't feasible. Fortunately, the optimal-substructure property of dynamic programming holds for our problem, which lets us use dynamic programming to efficiently search for optimal fragmentations.

A problem exhibits optimal substructure if an optimal solution to the problem contains optimal solutions to the subproblems. Suppose that an optimal fragmentation for segment S1:n splits the segment at point Sk. Then, the fragmentation of S1:k within this optimal fragmentation must also be optimal. Otherwise, we could have substituted a less costly fragmentation of S1:k to obtain a better segmentation, which is a contradiction. This shows that any optimal solution contains optimal solutions to subproblems and that the optimal-substructure property holds for the fragmentation problem with the optimality criterion in Equation 1. So, we can build an optimal solution by solving subproblems, using dynamic programming.

We define an optimal solution's cost recursively in terms of optimal solutions to the subproblems. For the trivial case of no segment, the cost is zero. Assuming that the fragmentation of S1:j splits the segment at Sk, where k ≤ j, the optimal fragmentation's cost is

Cost[j] = Cost[k] − log P(isPrim(Sk:j) | S).

So, our recursive definition for the minimum cost is

Cost[j] = 0 if j = 0;
Cost[j] = min over 0 ≤ k < j of {Cost[k] − log P(isPrim(Sk:j) | S)} if j > 0.

Instead of computing the solution by recurrence, we can iteratively compute the optimal cost using a bottom-up approach. This lets us solve the fragmentation problem in O(n²), which is far less than naive search methods' exponential complexity. Figure 6 shows the stroke fragmentation module and how it connects to the DPFrag classifier.
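A minimal bottom-up version of this dynamic program can be sketched in Python. The caller-supplied log_prim(k, j) stands in for the classifier's log P(isPrim(Sk:j) | S), and the toy scores in the example are hypothetical.

```python
import math

def fragment(n, log_prim):
    """Bottom-up dynamic program over stroke points 0..n-1.
    log_prim(k, j) must return log P(isPrim(S_k:j) | S) for the segment
    spanning points k..j. Returns the optimal corner indices; the table
    is filled with O(n^2) calls to log_prim."""
    INF = float("inf")
    cost = [0.0] + [INF] * (n - 1)   # Cost[0] = 0 (empty prefix)
    back = [0] * n                   # backpointers for corner recovery
    for j in range(1, n):
        for k in range(j):
            c = cost[k] - log_prim(k, j)   # Cost[j] = min_k Cost[k] - log P
            if c < cost[j]:
                cost[j], back[j] = c, k
    corners = [n - 1]
    while corners[-1] != 0:
        corners.append(back[corners[-1]])
    return corners[::-1]

# Toy example: a "stroke" of 7 points where only segments 0-3 and 3-6
# look like primitives (hypothetical classifier scores).
def log_prim(k, j):
    return math.log(0.9) if (k, j) in {(0, 3), (3, 6)} else math.log(0.1)

assert fragment(7, log_prim) == [0, 3, 6]
```

In the real system the classifier is evaluated on Douglas-Peucker segment boundaries rather than on every raw point, which keeps n, and thus the O(n²) table, small.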

Figure 6. The stroke fragmentation module efficiently calculates and combines the primitive-matching costs: a stroke instance passes through segment extraction (Douglas-Peucker) and feature extraction for the candidate segments S1:c2, S1:c3, S1:c4, …, S1:cn, which the DPFrag classifier scores for the calculation of the primitive match cost, yielding the fragmented stroke.

Evaluation

We evaluated DPFrag's accuracy, performance, and adaptability on five datasets: three out-of-context datasets and two in-context datasets (see Table 1).

The Out-of-Context Datasets

These datasets contained geometric shapes unassociated with a specific domain. The standard datasets we used were ShortStraw, IStraw, and SpeedSeg. Each dataset contains symbols designed to evaluate the fragmentation method of the same name. (For more on these methods, see the sidebar.) ShortStraw contains polyline strokes; IStraw and SpeedSeg shapes contain line and arc primitives. Figures 7a, 7b, and 7c show representative symbols from the datasets.

The In-Context Datasets

These datasets contained sketches of symbols that have a meaning in a domain. Here, we used the well-known COAD (Course of Action Diagrams)4 and NicIcon datasets. COAD contains symbols that soldiers draw on maps when planning military operations. NicIcon contains symbols for emergency management applications. Figures 7d and 7e show representative symbols from both datasets. Symbols in these datasets weren't collected for evaluating specific fragmentation methods, so they let us test our method with more realistic input. This is important; previous stroke fragmentation research hasn't reported accuracy and performance on in-context datasets. In particular, accuracy is what really matters because these datasets are more representative of what strokes look like when people draw symbols from real domains, rather than just reproducing simple strokes with specific constructions.

Both datasets can be fragmented using lines and elliptical arcs. Because these datasets contain an excessive number of examples, we annotated and used random subsets of them. Table 1 lists the number of manually annotated strokes in each dataset.

Table 1. The datasets we used to evaluate DPFrag.

Dataset     Type            No. of strokes   No. of symbols   No. of symbol types   No. of subjects
ShortStraw  Out-of-context     600              600               11                   15
IStraw      Out-of-context     400              400               10                   10
SpeedSeg    Out-of-context   2,311            2,311               10                   14
COAD*       In-context       1,507              400               20                    8
NicIcon     In-context       1,204              400               14                   32

*Course of Action Diagrams.

Sidebar: Related Work in Stroke Fragmentation

Generally, stroke fragmentation methods fall into two categories. The first category finds fragmentations by detecting corners; the other computes piecewise approximations of given digital curves. Both categories employ carefully designed heuristics.

Corner Detection

To detect corners, these methods rely mainly on the maxima in curvature and the minima in speed. However, speed and curvature data contain substantial noise due to digitization and imperfect motor control of the hand, which must be filtered out before corner detection. To address this issue, many researchers have tried to determine optimal smoothing parameters and then locate corners. To distinguish small-scale curvature variations due to noise from the larger variations due to real corners, early research employed multiscale approaches.

Recent corner detection methods still employ speed and curvature metrics. For example, ShortStraw declares points with straw values less than some threshold as corners, where a point's straw is the Euclidean distance between its right and left neighbors.1 IStraw, an extension of ShortStraw, also uses straw values but exploits speed features.2

ShortStraw and IStraw rely on local stroke features. Their parameters must be manually tuned to adapt them to different users, contexts, and primitive types. This requirement is their biggest disadvantage. Although they can achieve high accuracy in datasets for which their parameters have been tuned, they perform poorly on other databases unless they go through tedious iterative manual tuning. An exception is ClassySeg, which uses machine learning.3 However, ClassySeg relies on local decisions to build a corner set and doesn't find a globally optimal solution.

Piecewise Approximation

Generally, these methods identify an initial set of segments and then split or merge them according to heuristics. For instance, Bo Yu used mean-shift clustering for noise reduction and then approximated the stroke with arcs, lines, and ellipses using visual features.4 The sort-merge-repeat method merges the initial segments on the basis of fit errors.5

Some piecewise fragmentation methods have used dynamic programming. For example, Vincenzo Deufemia and Michele Risi used it to match a given stroke to templates.6 Liu Yin and colleagues used it with a deterministic cost function.7 Alexander Kolesnikov used it in a minimum-description-length setup.8 These methods were either limited to predefined templates or used deterministic cost functions, which don't meet the fragmentation problem's adaptability requirements. Unlike these methods, ours (see the main article) learns primitive-level models from data.

After identifying a set of initial segments, SpeedSeg merges and splits them using a set of improved heuristics; however, it relies on preset parameters.9 In contrast, our method finds piecewise approximations via dynamic programming combined with an adaptive cost function.

References

1. A. Wolin and T. Hammond, "ShortStraw: A Simple and Effective Corner Finder for Polylines," Proc. Eurographics 5th Ann. Conf. Sketch-Based Interfaces and Modeling (SBIM 08), Eurographics Assoc., 2008, pp. 33–40.
2. Y. Xiong and J.J. LaViola Jr., "A ShortStraw-Based Algorithm for Corner Finding in Sketch-Based Interfaces," Computers & Graphics, vol. 34, no. 5, 2010, pp. 513–527.
3. J. Herold and T.F. Stahovich, "ClassySeg: A Machine Learning Approach to Automatic Stroke Segmentation," Proc. 8th Eurographics Symp. Sketch-Based Interfaces and Modeling (SBIM 11), ACM, 2011, pp. 109–116.
4. B. Yu, "Recognition of Freehand Sketches Using Mean Shift," Proc. 8th Int'l Conf. Intelligent User Interfaces (IUI 03), ACM, 2003, pp. 204–210.
5. A. Wolin, B. Paulson, and T. Hammond, "Sort, Merge, Repeat: An Algorithm for Effectively Finding Corners in Hand-Sketched Strokes," Proc. 6th Eurographics Symp. Sketch-Based Interfaces and Modeling (SBIM 09), ACM, 2009, pp. 93–99.
6. V. Deufemia and M. Risi, "A Dynamic Stroke Segmentation Technique for Sketched Symbol Recognition," Pattern Recognition and Image Analysis, LNCS 3523, Springer, 2005, pp. 335–357.
7. L. Yin, Y. Yajie, and L. Wenyin, "Online Segmentation of Freehand Stroke by Dynamic Programming," Proc. 8th Int'l Conf. Document Analysis and Recognition (ICDAR 05), IEEE CS, 2005, pp. 197–201.
8. A. Kolesnikov, "Segmentation and Multi-model Approximation of Digital Curves," Pattern Recognition Letters, vol. 33, no. 9, 2012, pp. 1171–1179.
9. J. Herold and T.F. Stahovich, "SpeedSeg: A Technique for Segmenting Pen Strokes Using Pen Speed," Computers & Graphics, vol. 35, no. 2, 2011, pp. 250–264.

Accuracy

The most important criterion for measuring a fragmentation method's utility is accuracy. We used a randomly selected subset of each dataset to train DPFrag and the rest of each dataset to test its accuracy. Figure 8 shows DPFrag's average accuracy for different training-data sizes over 10 randomized runs. The error bars represent one standard deviation for the recorded accuracies of the 10 repeated measurements. DPFrag reached excellent accuracy with as few as 25 to 50 annotated examples in all datasets.

Figure 7. Vector drawings of the shapes from (a) ShortStraw, (b) IStraw, (c) SpeedSeg, (d) COAD (Course of Action Diagrams), and (e) NicIcon. ShortStraw, IStraw, and SpeedSeg are out-of-context datasets, which contain geometric shapes unassociated with a specific domain. COAD and NicIcon are in-context datasets, which contain sketches of symbols that have a meaning in a domain.

Table 2 compares DPFrag with four methods: IStraw, sort-merge-repeat, SpeedSeg, and ClassySeg. (For more on these methods, see the sidebar.) Again, the DPFrag accuracies are the average accuracies of 10 trials. DPFrag either matched the other methods' performance (IStraw) or surpassed that performance by a wide margin (by 17 to 60 percent for sort-merge-repeat, SpeedSeg, and ClassySeg). DPFrag's superior performance was especially pronounced with the in-context datasets. In particular, as in the case of IStraw, existing methods are highly tuned for their respective datasets and can't generalize to data from a different dataset. DPFrag, on the other hand, can generalize to new datasets with only a handful of training examples.

The inability to adapt to different datasets is common in methods needing hand tuning. This problem is aggravated by the fact that manual parameter tuning is hard, especially when there are too many parameters to tune. For example, SpeedSeg has 12 free parameters, and it performed worse than DPFrag on its own dataset. Its performance on the ShortStraw dataset was also lower than expected.

Using machine learning alone doesn't result in better performance unless it takes global features into account and searches for a globally optimal solution. This is illustrated by ClassySeg, which classifies the candidate corner points using local features. Its accuracy was also below that of DPFrag on the SpeedSeg dataset. DPFrag's superior performance shows the utility of using global features and searching for a global optimum in the space of all fragmentations, rather than relying on local decisions.
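The article doesn't spell out the accuracy measure in this section; one common choice in the fragmentation literature, used here only as an illustrative stand-in, is "all-or-nothing" accuracy over strokes, where a stroke counts as correct only if its predicted corner set matches the annotation exactly. The corner sets below are hypothetical.

```python
def all_or_nothing_accuracy(predicted, ground_truth):
    """Fraction of strokes whose predicted corner set matches the
    annotated corners exactly -- an assumed 'all-or-nothing' measure,
    not necessarily the metric the article used."""
    hits = sum(set(p) == set(g) for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)

# Hypothetical results for three strokes: two fragmented correctly.
pred = [[0, 3, 6], [0, 5], [0, 2, 9]]
truth = [[0, 3, 6], [0, 5], [0, 4, 9]]
assert abs(all_or_nothing_accuracy(pred, truth) - 2 / 3) < 1e-12
```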

Figure 8. DPFrag's learning curves for (a) ShortStraw, (b) IStraw, (c) SpeedSeg, (d) COAD, and (e) NicIcon, plotting accuracy against training size (5 to 350 symbols). The error bars indicate one standard deviation. DPFrag reached its best performance with at most 50 examples.

Table 2. Comparing DPFrag with other methods.*

Dataset      DPFrag (25 examples)   DPFrag (50)   DPFrag (350)   IStraw   Sort-merge-repeat   SpeedSeg (user agnostic)   ClassySeg (user agnostic)
ShortStraw   0.99                   0.99          0.99           0.99     0.97                0.86                       n/a
IStraw       0.92                   0.94          0.96           0.96     0.78                n/a                        n/a
SpeedSeg     0.77                   0.87          0.89           0.72     n/a                 0.78                       0.75
COAD         0.95                   0.95          0.97           0.82     n/a                 n/a                        n/a
NicIcon      0.78                   0.82          0.84           0.24     n/a                 n/a                        n/a

*Only the IStraw method was available for testing. The SpeedSeg and ClassySeg methods aren't publicly available, so we only list accuracies reported in the literature.5,6

Speed

To facilitate real-time interaction, fragmentation methods should be efficient and fast. DPFrag is particularly suited for real-time processing of strokes because it can start processing a stroke even before the user completes that stroke. Figure 9 shows the average time for processing strokes with varying numbers of segments. The times are discounted by the time spent drawing each stroke. Negative values indicate cases in which DPFrag computed fragmentations faster than the users could draw. We performed the speed measurements on a standard laptop using a nonoptimized Matlab implementation of DPFrag. So, in practice, there's substantial room for improving the speed.

The Feature Subsets' Utility

Fragmentation methods use speed, curvature, and shape features extensively in many forms. However, researchers haven't investigated these features' utility for fragmentation. So, we compared accuracy after training the DPFrag classifier with individual feature subsets. Figure 10 shows the results. Shape features outperformed speed-and-curvature features for all datasets except IStraw. The shape features' superiority was more pronounced for the in-context datasets, in which strokes might have more corners with subtle drops in speed. Feature-level combination of the feature subsets boosted fragmentation accuracy for all five datasets. The complementary nature of shape, speed, and curvature features let DPFrag achieve excellent performance with only a few training examples.

We presented an efficient, highly accurate digital curve fragmentation algorithm that can easily adapt to different settings by learning from data. We plan to use DPFrag components and concepts for object-level recognition of sketches and scenes of digital curves.

Figure 9. The net average difference between the time for processing strokes (tfrag) and drawing them (tstroke), in seconds, as a function of the number of segments, for ShortStraw, IStraw, SpeedSeg, COAD, and NicIcon. Negative values indicate cases in which DPFrag computed fragmentations faster than the users could draw. The error bars indicate one standard deviation.

Figure 10. The accuracy of shape and speed-and-curvature features for (a) ShortStraw, (b) IStraw, (c) SpeedSeg, (d) COAD, and (e) NicIcon. For clarity of presentation, we omitted curves representing accuracies obtained by all features; Figure 8 presents those results.

References

1. D.I.S. Adi, S.M.b. Shamsuddin, and S.Z.M. Hashim, "Nurbs Curve Approximation Using Particle Swarm Optimization," Proc. 7th Int'l Conf. Computer Graphics, Imaging, and Visualization, IEEE CS, 2010, pp. 73–79.
2. G. Orbay and L.B. Kara, "Beautification of Design Sketches Using Trainable Stroke Clustering and Curve Fitting," IEEE Trans. Visualization and Computer Graphics, vol. 17, no. 5, 2011, pp. 694–708.
3. C. Alvarado and R. Davis, "SketchRead: A Multidomain Sketch Recognition Engine," Proc. 17th Ann. ACM Symp. User Interface Software and Technology (UIST 04), ACM, 2004, pp. 23–32.
4. Operations Terms and Graphics, US Army Field Manual FM 1-02, Dept. of the Army, 2004; http://armypubs.army.mil/doctrine/DR_pubs/dr_a/pdf/fm1_02.pdf.
5. J. Herold and T.F. Stahovich, "ClassySeg: A Machine Learning Approach to Automatic Stroke Segmentation," Proc. 8th Eurographics Symp. Sketch-Based Interfaces and Modeling (SBIM 11), ACM, 2011, pp. 109–116.
6. J. Herold and T.F. Stahovich, "SpeedSeg: A Technique for Segmenting Pen Strokes Using Pen Speed," Computers & Graphics, vol. 35, no. 2, 2011, pp. 250–264.

R. Sinan Tümen is an engineer at the Turkish Navy Research Center and a PhD student in computer engineering at Koç University. His research interests include intelligent user interfaces and machine learning. Tümen received an MS in computer science from Johns Hopkins University. Contact him at [email protected].

T. Metin Sezgin is an assistant professor of computer engineering at Koç University. His research interests are intelligent interfaces and applying pattern recognition, computer vision, and machine learning to real-world problems. Sezgin received a PhD in electrical engineering and computer science from the Massachusetts Institute of Technology. Contact him at [email protected].

Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.